At the Aabori forum this month, Wang Xingxing, founder of Shanghai-based Yushi Technology, painted an assertive near-term picture for humanoid robots while warning that truly generalised, language-driven robotic intelligence remains some way off.
Wang said Yushi’s second-generation machine, launched in 2024, has become one of the top-selling humanoid platforms worldwide: the company estimates roughly 5,000 humanoid robots shipped globally last year, with its 1.3-metre, lightweight model accounting for most deployments. The smaller size and a reported entry price of about RMB 30,000 (roughly $4,000) have encouraged wider adoption for research, performance and pilot commercial uses.
Sporting benchmarks have captured public imagination. At a Beijing humanoid-robot athletics meet in August, Yushi machines won the 1,500m, 400m and the 4×100m obstacle relay; Wang said the 1,500m prototype ran in just over six minutes, faster than most of his staff. He predicted that by mid-year humanoids — especially Chinese models — would be able to outpace humans over sprint distances, with 100-metre times dipping under ten seconds.
Yushi’s team also emphasised cultural projection: its robots performed a kung-fu routine on the Spring Festival Gala that included complex martial-arts moves and acrobatics. Wang framed the programme as both a technical showcase and a form of cultural diplomacy, with clips of the performance drawing substantial attention abroad.
On the technical front, Wang was candid about limits. Current systems lack robust generalisation: machines trained for a handful of scenes succeed reliably there but fail when environments change. He set out a concrete test for an embodied-AI ‘ChatGPT moment’ — a robot that, when dropped into an unfamiliar scene, can carry out roughly 80% of language-directed tasks about 80% of the time without prior mapping or rehearsal. He thinks that moment is likely two to three years away, a more cautious estimate than some in the industry.
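Wang’s threshold can be made concrete as a simple evaluation rule. The sketch below is purely illustrative — the function name, the task log and the data are hypothetical, not anything Yushi has published — but it shows what “80% of tasks, 80% of the time, in an unfamiliar scene” would mean as a measurable test:

```python
from collections import defaultdict

def passes_embodied_threshold(trials, task_frac=0.8, success_rate=0.8):
    """Check the '80/80' criterion: at least `task_frac` of the
    language-directed task types succeed at least `success_rate`
    of the time. `trials` is a list of (task_name, succeeded)
    pairs gathered in scenes the robot has never mapped or rehearsed."""
    totals = defaultdict(lambda: [0, 0])  # task -> [successes, attempts]
    for task, ok in trials:
        totals[task][1] += 1
        if ok:
            totals[task][0] += 1
    reliable = sum(1 for s, n in totals.values() if s / n >= success_rate)
    return reliable / len(totals) >= task_frac

# Hypothetical log: three task types with mixed outcomes in a novel scene.
log = [("fetch cup", True)] * 9 + [("fetch cup", False)] \
    + [("open door", True)] * 8 + [("open door", False)] * 2 \
    + [("sort parts", True)] * 5 + [("sort parts", False)] * 5
print(passes_embodied_threshold(log))  # only 2 of 3 tasks hit 80% -> False
```

By this framing, a robot that aces a few rehearsed routines but collapses on the third task type still fails the test — which is exactly the generalisation gap Wang describes.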
His favoured route to that leap is the ‘world model’ approach built from video generation. Wang highlighted ByteDance’s January release of Seedance 2.0 as a landmark in video-generation quality and argued that aligning generated video with robot motion could produce the generalised embodied models robots need.
Data scarcity is the immediate operational constraint, he said. To address it, Yushi plans to deploy thousands of robots — potentially up to ten thousand — by year-end and to collect around ten hours of telemetry per unit per day. Those data, Wang argued, would rapidly accelerate training and close a major bottleneck for embodied learning.
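The scale of that pipeline is easy to ballpark from Wang’s own figures (the numbers below simply restate his stated fleet size and collection rate; nothing else is assumed):

```python
robots = 10_000      # upper end of the planned year-end fleet
hours_per_day = 10   # telemetry collected per unit per day, per Wang

robot_hours_per_day = robots * hours_per_day
print(robot_hours_per_day)        # 100,000 robot-hours of telemetry daily
print(robot_hours_per_day * 365)  # ~36.5 million robot-hours per year
```

At full deployment that is on the order of a hundred thousand robot-hours of embodied interaction data every day — the kind of volume Wang argues is needed to train generalised models.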
Despite showy demos and rising shipments, industrial deployment remains largely pilot-stage. Wang acknowledged that efficiency and success rates for factory or logistics tasks lag what is needed for large-scale commercial rollout. He therefore positions Yushi’s strategy as parallel development of locomotion and manipulation: movement capability must be solved before robots can reliably “do the work” at scale.
Wang ended with a market hypothesis: if embodied intelligence crosses a critical threshold, demand could spike from thousands to millions of units a year. He said China’s current first-mover position gives domestic companies a rare window of opportunity to shape that market.
