# MoE

Efficiency Over Scale: Bailing Unveils Ling-2.6-flash to Disrupt the Intelligence-Cost Curve

Bailing has launched Ling-2.6-flash, a 104B-parameter Mixture of Experts (MoE) model that activates only 7.4B of those parameters. It achieves benchmark parity with larger models while consuming only 10% of the tokens required by competitors such as Nemotron-3-Super.

NeTe · April 22, 2026, 02:28

#Bailing LLM #Ling-2.6-flash #Artificial Intelligence


Technology

Alibaba’s Qwen3.5 Claims Gemini‑3‑Pro Parity at a Fraction of the Cost — A Shift from Scale to Efficiency

Alibaba has open‑sourced Qwen3.5‑Plus, a 397B‑parameter multimodal model the company says matches Gemini 3 Pro’s performance while operating with only ~17B activated parameters and much lower inference costs. The model emphasises architectural efficiency, native multimodal pretraining and agent capabilities, and forms part of a flurry of Chinese model launches that shift competition from raw scale to systems and cost efficiency.

NeTe · February 16, 2026, 18:25

#Alibaba #Qwen3.5 #Gemini 3


Technology

China’s DeepSeek Pushes Context Limits — and Triggers a Backlash Over a Colder, ‘Faster’ Model

DeepSeek began a staged (gray-release) rollout of an update that extends context length to 1 million tokens, prompting user complaints that the assistant sounds colder and less personalised. Industry sources say the build is a speed‑focused variant intended to stress‑test long‑context performance ahead of a V4 launch, highlighting trade‑offs between throughput and conversational quality. The episode illustrates a wider tension in scaling LLMs: architectural gains can come at the cost of user experience and trust.

NeTe · February 12, 2026, 17:04

#DeepSeek #large language model #long context