Latest news and articles about model inference
Total: 1 article found
Mianbi Intelligence has released SALA, a hybrid sparse‑linear attention architecture, and a 9B‑parameter model called MiniCPM‑SALA. The company claims large inference speed gains and support for contexts of up to one million tokens. If independently validated, the design could make very long‑context applications feasible on mid‑sized models and across a range of inference hardware.