At Cisco’s AI Summit in the early hours of 4 February 2026, Fei‑Fei Li — the computer vision pioneer often described as the “godmother of AI” — set out a compact but consequential thesis: the path to more capable, useful and humane artificial intelligence runs through three dimensions, not just words. Li, now founder of World Labs, argued that the next fundamental advance in AI will be spatial intelligence — models that genuinely understand, simulate and interact with physical 3D environments — and she unveiled Marble, World Labs’ first “world model.”
Marble is not another video‑generation tool. Li framed it as a persistent, navigable, physically consistent 3D environment generator that accepts multimodal prompts and produces worlds with geometric structure. That distinction matters: a photographic video that “looks” plausible does not provide the stable, manipulable representation a robot or a design team needs. Marble is intended to be a training ground and testbed, a virtual lab where agents can practice manipulation and where architects, filmmakers and clinicians can prototype spaces that behave according to geometric and physical rules.
The list of early use cases Li cited is eclectic and telling. Game studios and visual‑effects houses are already experimenting with rapid world construction; robotics teams are using Marble’s simulated environments to teach manipulation and navigation; architects translate floor plans into walkable 3D mockups; and — perhaps most strikingly — psychologists are exploring immersive simulations as a therapeutic tool for patients with obsessive‑compulsive disorder. Those examples suggest that spatial models could bridge consumer entertainment, industrial automation and medical research.
Li was candid about the practical barriers. Training a full‑blown world model runs into acute data and compute constraints that differ from those of language models. High‑quality, physically annotated 3D data is scarce compared with the oceans of text used to train large language models, and Marble today operates at several orders of magnitude less compute than the biggest LLMs. World Labs therefore uses a hybrid data strategy: web‑scale images and video, synthetic simulation, and real‑world capture data of the kind familiar to autonomous‑vehicle developers.
On robotics, Li urged sobriety. She warned against conflating the steady progress in narrow, task‑specific robots with a near‑term leap to fully general, dexterous machines. Autonomous vehicles operate largely on a constrained, roughly two‑dimensional plane; robots that must touch and manipulate objects in three dimensions face a vastly harder problem of control, safety and reliability. The reality, she suggested, is a long march of incremental advances rather than a sudden arrival of general‑purpose humanoid aides.
Beyond the technicalities, Li offered a civic measure of success. Borrowing the analogy of electrification, she argued that AI’s value should be judged by whether it advances civilisation — helping people thrive with dignity, prosperity and happiness — rather than by marvels of engineering alone. That normative framing pushes a human‑centred agenda into the engineering ethos of a technology ecosystem often fixated on benchmarks and market share.
For international observers, Li’s talk crystallises two shifts in the AI landscape. First, the field’s attention is moving from purely linguistic competence to embodied, physical competence; second, the competitive and regulatory stakes will hinge on access to specialised datasets, sensor hardware and simulation platforms as much as on raw compute. Companies that dominate cloud GPUs and LLM toolchains will need to partner with sensor‑rich industries — robotics, automotive, construction — to capture the rich, annotated data world models require.
If Marble and models like it scale, the consequences will be broad. Industries from construction to healthcare could lower prototyping costs and risk by rehearsing procedures in physics‑faithful virtual spaces. Robotics may accelerate in narrow domains that can be exhaustively simulated. At the same time, the appetite for real‑world capture raises new questions about privacy, intellectual property and standards for simulation fidelity. Li’s insistence that AI serve human dignity is as much a political intervention as a moral one: it signals to funders, policymakers and engineers that debates over safety and benefit belong at the centre of how spatial models are deployed.
Fei‑Fei Li’s intervention is a reminder that the architecture of AI’s next ascent may not be a single breakthrough but a stacking of modalities — perception, geometry, physics and language — into systems that operate in, and on, the actual world. Whether Marble becomes the first reliable brick in that edifice depends on data, compute, partnerships and a steady commitment to the societal ends Li says should define success.
