# MoE
Latest news and articles about MoE

Efficiency Over Scale: Bailing Unveils Ling-2.6-flash to Disrupt the Intelligence-Cost Curve
Bailing has launched Ling-2.6-flash, a 104B-parameter model that uses a Mixture of Experts (MoE) architecture to activate only 7.4B parameters per token. It achieves benchmark parity with larger models while consuming only 10% of the tokens required by competitors such as Nemotron-3-Super.

Alibaba’s Qwen3.5 Claims Gemini‑3‑Pro Parity at a Fraction of the Cost — A Shift from Scale to Efficiency
Alibaba has open‑sourced Qwen3.5‑Plus, a 397B‑parameter multimodal model the company says matches Gemini 3 Pro’s performance while operating with only ~17B activated parameters and much lower inference costs. The model emphasises architectural efficiency, native multimodal pretraining and agent capabilities, and forms part of a flurry of Chinese model launches that shift competition from raw scale to systems and cost efficiency.

China’s DeepSeek Pushes Context Limits — and Triggers a Backlash Over a Colder, ‘Faster’ Model
DeepSeek rolled out a grayscale (staged) update extending context length to 1 million tokens, prompting user complaints that the assistant sounds colder and less personalised. Industry sources say the build is a speed‑focused variant intended to stress‑test long‑context performance ahead of a V4 launch, highlighting trade‑offs between throughput and conversational quality. The episode illustrates a wider tension in scaling LLMs: architectural gains can come at the cost of user experience and trust.