At the annual Yabuli China Entrepreneurs Forum on March 17, Wang Xingxing, founder and CEO of Yushu Technology (Unitree Robotics), set out a crisp yardstick for what he calls a “ChatGPT moment” for embodied intelligence: robots that, placed in unfamiliar environments, can complete about 80% of tasks in about 80% of those scenes when instructed by voice or text. Wang cautioned that the industry is not there yet. He estimates it will take at least two to three years to reach that threshold, while allowing that progress could accelerate unexpectedly.
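Wang did not describe a scoring protocol, so the yardstick can be read in more than one way. The sketch below assumes one plausible reading: a scene "passes" if the robot completes at least 80% of the commanded tasks in it, and the milestone is met if at least 80% of unfamiliar scenes pass. The `SceneResult` record, the `meets_yardstick` function and the example scenes are hypothetical illustrations, not a published benchmark.

```python
from dataclasses import dataclass
from fractions import Fraction

# Assumed reading of Wang's "80% of tasks in 80% of scenes" yardstick.
# Fractions avoid floating-point edge cases at exactly 80%.
TASK_THRESHOLD = Fraction(4, 5)   # share of tasks completed within one scene
SCENE_THRESHOLD = Fraction(4, 5)  # share of unfamiliar scenes that must pass


@dataclass(frozen=True)
class SceneResult:
    """Hypothetical record of one evaluation in an unfamiliar scene."""
    scene: str
    tasks_attempted: int
    tasks_completed: int

    def passes(self) -> bool:
        # A scene with no attempted tasks cannot demonstrate competence.
        if self.tasks_attempted <= 0:
            return False
        return Fraction(self.tasks_completed, self.tasks_attempted) >= TASK_THRESHOLD


def meets_yardstick(results: list[SceneResult]) -> bool:
    """True if at least 80% of scenes reach at least 80% task completion."""
    if not results:
        return False
    passing = sum(1 for r in results if r.passes())
    return Fraction(passing, len(results)) >= SCENE_THRESHOLD


# Illustrative (invented) evaluation data.
results = [
    SceneResult("kitchen", 10, 9),
    SceneResult("office", 10, 8),
    SceneResult("warehouse", 10, 8),
    SceneResult("garden", 10, 10),
    SceneResult("stairwell", 10, 5),
]
print(meets_yardstick(results))  # True: 4 of 5 scenes (80%) reach 80% task success
```

Treating the metric as a two-level threshold, rather than averaging task success across all scenes, means a robot that excels in a few settings but fails badly in others would not qualify, which fits Wang's emphasis on unfamiliar environments.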
Wang used the definition to underscore a practical engineering view: motion capabilities are the gatekeeper for useful robotics. “Movement and doing work must advance in parallel,” he said, arguing that a rich repertoire of reliable physical actions is a precondition for robots to perform real-world tasks. In Wang’s framing, once humanoid platforms can execute a wide variety of elementary actions robustly, task-level utility follows by composing those actions under higher-level control.
The remark lands in an industry caught halfway between spectacular lab demos and broad commercial adoption. Large language models showed how suddenly capability gains can seem to arrive: the so-called ChatGPT moment brought conversational AI agents to a mass audience. Embodied intelligence, however, fuses perception, actuation, control and learning, and each of those pieces still has notable gaps. Progress in locomotion, manipulation, sensor fusion and long-horizon planning has been steady, yet generalization across unstructured human environments remains a technical bottleneck.
Commercial deployments so far favor constrained settings such as logistics, warehousing and repetitive industrial tasks, where environments are controlled and safety envelopes are well defined. Consumer and service robots face a steeper path because homes and public spaces are far more variable and raise harder safety questions. Hardware limits (battery life, power-to-weight ratio, actuator durability), together with software challenges in object understanding and adaptive manipulation, help explain why Wang tempers his optimism with a modest timeline.
China’s robotics ecosystem adds context to Wang’s projection. A flurry of investment, new data-collection initiatives for embodied AI and a cluster of startups scaling humanoid and legged platforms have built momentum, reinforced by national industrial priorities aimed at capturing higher-value manufacturing and automation markets. With global competition ranging from Boston Dynamics to several North American and European startups, breakthroughs will carry commercial as well as geopolitical weight.
If Wang’s two-to-three-year horizon proves optimistic, the likely result is a staged rollout of capabilities: narrowly competent embodied agents first, then progressively more general systems as datasets, simulation fidelity and on-device compute improve. That pathway would produce meaningful economic effects long before fully general humanoid assistants arrive, shifting the labor mix in logistics, elder care and light industrial roles, while also focusing attention on safety, liability and regulatory frameworks.
Wang’s remarks are a useful corrective to both hype and pessimism. They sketch a near-term, measurable milestone rather than unspecified promises of humanlike robots. Whether the industry hits such a milestone within his timeframe depends on continued investment in integrated hardware-software systems, richer real-world training data and advances in robust control and perception. Even if the exact timetable slips, the strategic direction is clear: embodied intelligence is moving from research curiosity toward industrial reality.
