Alibaba unveiled Qwen3.5‑Plus on Lunar New Year’s Eve, pitching the new open‑source model as a turning point in the large‑model era. The company says the 397‑billion‑parameter model achieves performance comparable to Google’s Gemini 3 Pro while running with dramatically lower activated parameters and far cheaper inference costs, a message aimed squarely at both enterprise customers and open‑source communities.
Qwen3.5‑Plus departs from Qwen’s previous text‑only pretraining by using mixed visual‑and‑text tokens in its base training. Alibaba describes a hybrid architecture that blends linear attention with a sparse mixture‑of‑experts (MoE) design and a proprietary gating mechanism — work the team says was described in a NeurIPS best‑paper contribution. The headline technical claim is that the model’s 397 billion total parameters require only about 17 billion active parameters at inference, delivering large‑model performance with much smaller memory and compute footprints.
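Alibaba has not published the router details, but the arithmetic behind "397 billion total, ~17 billion active" follows the standard sparse-MoE pattern: every token passes through shared weights (attention, embeddings), while a gating network activates only a few of the many expert feed-forward blocks. A minimal illustrative sketch, with all sizes hypothetical rather than Qwen3.5's actual configuration:

```python
# Illustrative sparse-MoE parameter accounting. All numbers below are
# hypothetical and chosen only to land near the reported ratio; they are
# NOT Qwen3.5's actual configuration.

def moe_params(n_experts: int, params_per_expert: float,
               top_k: int, shared_params: float) -> tuple[float, float]:
    """Return (total, active) parameter counts for a sparse-MoE model.

    Every token runs through the shared weights, but the router
    activates only `top_k` of `n_experts` expert FFNs per token.
    """
    total = shared_params + n_experts * params_per_expert
    active = shared_params + top_k * params_per_expert
    return total, active

# Toy configuration: 128 experts of 3B parameters each, 4 active per
# token, plus 5B shared parameters.
total, active = moe_params(n_experts=128, params_per_expert=3.0e9,
                           top_k=4, shared_params=5.0e9)
print(f"total  ≈ {total / 1e9:.0f}B parameters")   # total  ≈ 389B parameters
print(f"active ≈ {active / 1e9:.0f}B parameters")  # active ≈ 17B parameters
```

The point of the pattern is that memory must hold all experts, but per-token compute scales with the active count, which is why a 397B-parameter model can run with the inference cost of a model a fraction of its size.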
That efficiency shows up in Alibaba's throughput claims: relative to its larger Qwen3‑Max base, Qwen3.5 reduces deployment memory by roughly 60% and raises inference throughput substantially, roughly 8.6× in common 32K‑token contexts and up to 19× in 256K‑token scenarios. Alibaba also emphasizes training innovations, including FP8/FP32 precision strategies and stability tweaks, that it says cut activation memory by about half and accelerated training by roughly 10% on mixed text, image and video token workloads.
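The halving claim is consistent with simple back-of-envelope arithmetic: FP8 stores one byte per value versus two for BF16/FP16 and four for FP32. A rough weight-memory sketch (illustrative only; real deployments also need KV-cache, activations and framework overhead, and Alibaba's ~50% figure refers specifically to training activations):

```python
# Back-of-envelope weight-memory arithmetic for the reported 397B total
# parameter count. Illustrative only: actual deployment memory also
# includes KV-cache, activations and runtime overhead.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}

def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Gigabytes needed just to store the weights at a given precision."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

total_params = 397e9  # reported total parameter count
print(f"bf16 weights: {weight_memory_gb(total_params, 'bf16'):.0f} GB")  # bf16 weights: 794 GB
print(f"fp8  weights: {weight_memory_gb(total_params, 'fp8'):.0f} GB")   # fp8  weights: 397 GB
```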
Alibaba is positioning Qwen3.5 as a native multimodal model. The company reports best‑in‑class scores on a range of established multimodal benchmarks — from visual reasoning and VQA to OCR, spatial understanding and video tasks — and advertises stronger capabilities across reasoning, STEM and multilingual datasets. A cheaper API price (reported at 0.8 RMB per million tokens, roughly one‑eighteenth of Gemini 3 Pro’s advertised rate) and the model’s open‑source release are central to Alibaba’s competitive pitch.
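The pricing gap compounds quickly at enterprise volumes. A quick sketch of what the advertised rates imply, using a hypothetical monthly workload (advertised list prices only; real bills depend on input/output token splits and volume discounts):

```python
# Cost comparison implied by the reported pricing. The workload size is
# hypothetical; rates are advertised list prices, not negotiated ones.

qwen_rmb_per_mtok = 0.8
# "roughly one-eighteenth" of the rival's advertised rate implies:
implied_rival_rmb_per_mtok = qwen_rmb_per_mtok * 18  # 14.4 RMB per million tokens

monthly_tokens = 5e9  # hypothetical enterprise workload: 5B tokens/month
mtok = monthly_tokens / 1e6

print(f"Qwen3.5-Plus:       {mtok * qwen_rmb_per_mtok:,.0f} RMB/month")           # 4,000
print(f"Implied rival rate: {mtok * implied_rival_rmb_per_mtok:,.0f} RMB/month")  # 72,000
```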
Beyond benchmarks, Alibaba highlights practical agent and automation capabilities: Qwen3.5 can autonomously operate smartphones and PCs, orchestrate cross‑application workflows, and run plugin‑based agents at much larger scale thanks to an asynchronous reinforcement‑learning framework said to accelerate agent training three‑ to five‑fold. The smaller active footprint also makes high‑function agents more plausible on mobile and enterprise edge deployments.
The Qwen3.5 launch arrives amid a flurry of Chinese model announcements. Rival domestic players — from ByteDance’s Doubao 2.0 and Seedance 2.0 to MiniMax’s M2.5 and other open‑source flagships — have rolled out upgrades in recent weeks, signalling an aggressive domestic push to commercialize and localize advanced LLM capabilities.
Strategically, Qwen3.5 crystallizes a broader pivot in the industry away from raw parameter counts toward architectural and systems efficiency. If Alibaba’s performance and cost claims hold up under independent tests, lower inference prices and reduced hardware needs could democratize access for Chinese enterprises and startups, compress margin pools for cloud inference services, and force incumbents to rework pricing or feature strategies.
There are important caveats. Alibaba's claims rest on self‑reported benchmark results, and real‑world behaviour (safety, hallucination rates, adversarial robustness and long‑context coherence) will determine adoption at scale. The open‑source release also raises familiar dual‑use concerns: lower cost and easier deployment can accelerate both benign innovation and misuse, complicating regulatory and export‑control debates.
For now, Qwen3.5 represents a tangible bet by Alibaba: that architectural innovation plus cloud infrastructure can outflank the sheer‑scale approach of some Western rivals. Whether it proves a durable formula for competitiveness will depend on independent evaluations, third‑party integrations and how the market responds to a new wave of lower‑cost, high‑performance models.
