China’s Mianbi Intelligence, a startup co-founded by Tsinghua computer scientist Liu Zhiyuan and CEO Li Dahai, says an era in which machines interact with people as naturally as other humans do is approaching. In interviews this week the pair argued that recent advances in so‑called full‑modality models — systems that can listen, see and speak in parallel — are exposing a clear path toward embodied intelligence, even if the transition will be incremental rather than instantaneous.
Their message is both technical and practical. Liu, who is Mianbi’s chief scientist and a tenured professor, pointed out a basic mismatch between human perception and most contemporary machine interfaces: people can look, listen and talk simultaneously, whereas many AI applications still force serial, turn‑based exchanges. This limitation, he said, is a structural bottleneck for robots, wearables and smart devices that must operate in the messy, real world rather than behind a screen.
Li framed the challenge as an engineering trade‑off between cloud and edge models. Centralised cloud models offer power and breadth but raise privacy concerns and run into latency limits; on‑device models preserve privacy and responsiveness but are constrained by compute and power budgets. Mianbi argues that the commercial and technical path forward lies in steady improvement at both ends — denser, smaller models that can run closer to users while remaining integrated with more capable cloud brains.
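To make the trade‑off concrete, the sketch below shows one simple way a hybrid system might decide between an on‑device model and a cloud model. It is an illustrative assumption, not Mianbi's actual design; the thresholds, latency figures and routing rules are hypothetical.

```python
# Illustrative sketch only: an edge-first dispatch policy for the cloud/edge
# trade-off described above. All thresholds and latencies are hypothetical.

from dataclasses import dataclass

@dataclass
class Request:
    tokens: int              # rough size of the task
    privacy_sensitive: bool  # does the input contain personal data?
    latency_budget_ms: int   # how long the user is willing to wait

def route(req: Request) -> str:
    """Prefer the on-device model; fall back to the cloud for heavy tasks."""
    EDGE_TOKEN_LIMIT = 2_000   # assumed capacity of a small on-device model
    EDGE_LATENCY_MS = 50       # assumed on-device response time
    CLOUD_LATENCY_MS = 400     # assumed network round-trip plus inference

    if req.privacy_sensitive:
        return "edge"          # keep sensitive data on the device
    if req.tokens <= EDGE_TOKEN_LIMIT and req.latency_budget_ms >= EDGE_LATENCY_MS:
        return "edge"          # small, latency-critical work stays local
    if req.latency_budget_ms >= CLOUD_LATENCY_MS:
        return "cloud"         # heavy work goes to the larger cloud model
    return "edge"              # degrade gracefully rather than miss the budget

print(route(Request(tokens=500, privacy_sensitive=False, latency_budget_ms=100)))     # edge
print(route(Request(tokens=20_000, privacy_sensitive=False, latency_budget_ms=1_000))) # cloud
```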
The company has a name for that view. Where many in the industry fixate on Scaling Laws — the idea that larger models improve predictably with scale — Mianbi promotes a complementary “Densing Law.” It claims model capability density can double roughly every 100 days, making continual retraining and incremental efficiency gains the crucial capability, and likens its own role to that of a “photolithography machine” turning out successive generations of high‑density models.
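A back‑of‑the‑envelope reading of that claim: if capability density doubles every 100 days, a model of fixed quality should need roughly half the parameters each cycle. The figures below are illustrative only, assuming a hypothetical 7B‑parameter baseline, and simply work through the arithmetic the claim implies.

```python
# Illustrative arithmetic for the "Densing Law" claim: capability density
# assumed to double every 100 days. The 7B baseline is a hypothetical example.

DOUBLING_PERIOD_DAYS = 100

def density_multiplier(days: float) -> float:
    """Capability density growth after `days`, assuming doubling every 100 days."""
    return 2 ** (days / DOUBLING_PERIOD_DAYS)

def params_needed(baseline_params_b: float, days: float) -> float:
    """Parameters (in billions) needed later to match the baseline's capability."""
    return baseline_params_b / density_multiplier(days)

for days in (100, 200, 365):
    print(f"after {days:3d} days: x{density_multiplier(days):.1f} density, "
          f"~{params_needed(7.0, days):.1f}B params for the same capability")
```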
Products already hint at the new paradigm. Li cited the example of the Doubao phone, which uses one of the sector’s stronger models to let an intelligent agent operate the device on a user’s behalf and complete multi‑step tasks. Yet he was candid about limits: even the best current models do not reliably reach acceptable completion rates on complex, real‑world tasks, and adding vision and continuous listening raises power and privacy hurdles on battery‑constrained phones.
That constraint helps explain why Mianbi and others see early commercial promise in automobiles, robots and other platforms with more generous power and compute budgets. In those settings, full‑modality models can exploit richer sensor suites and sustained power to build continuous contextual awareness — a prerequisite for embodied intelligence that can act autonomously and safely in the physical world.
Liu sketched near‑ and longer‑term horizons. Within one to two years he expects a broader shift toward autonomous, self‑improving agents; over the next two to three years, rapid iteration in capabilities driven by denser models and better on‑device execution. Over five to ten years he envisions many agents coordinating — a form of multi‑agent group intelligence that could change how systems handle complex, distributed tasks.
For entrepreneurs, Li stressed that the competitive landscape is still open despite big tech’s participation. He advised startups to pick their strategy: take a slice of a very large market or seek leadership within a narrower niche. Commercialisation of edge models, he argued, will depend less on single product wins and more on ecosystems of developers and partners capable of reaching billions of endpoints.
Why this matters: a practical, multi‑modal shift in human‑machine interaction would remake user experience across industries. It would permit devices to maintain ongoing context, reduce friction in multi‑step tasks, and enable robots and cars to reason about the physical world more like humans do. That, in turn, raises urgent policy questions about data privacy, on‑device security, power consumption and standards for safe autonomous behaviour.
Mianbi’s rhetoric and positioning reflect a broader theme in AI engineering: the move from scale‑as‑size to scale‑as‑proximity. The startup’s focus on continuous model improvement, compact architectures and developer ecosystems is an attempt to weaponise agility against the raw resource dominance of larger firms. Whether that strategy yields a breakthrough depends on hardware progress, software co‑design, and business models that align incentives for data, privacy and distribution.
For international observers the Chinese debate is instructive. It underscores that the next phase of AI is no longer just about bigger models hosted in giant data centres, but about practical deployments that must negotiate trade‑offs of latency, privacy and power. Companies that can compress capability into efficient on‑device systems while coordinating with cloud intelligence will likely shape how billions of people experience AI in everyday life.
