For two years the consensus held that physical intelligence could only be learned from real interaction data. Simulation hit a wall: as Sergey Levine argued, stronger models get better at detecting the seams in surrogate data, so the skills that transfer to reality keep shrinking. Real data was the only path, and real data is slow, expensive, and scarce.
That framing is breaking, and two things changed at once. World models now learn “what happens next” from internet-scale video, giving robots physics priors without hand-coded simulators or armies of teleoperators. And a thin layer of real demonstrations now bootstraps enormous synthetic scale. NVIDIA’s Manipulation-Augmented dataset turns 10 human-teleoperated demos into 1,000 domain-randomized examples, a 100-fold expansion. The real world becomes the seed, not the substrate.
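To make the seed-to-scale idea concrete, here is a minimal sketch of domain randomization: each recorded demo is copied many times with its scene parameters resampled. Everything in it is an assumption made for illustration, including the `Demo` fields, the parameter ranges, and the 100 variants per demo. It is not NVIDIA's pipeline, which would re-render or replay each variant in a simulator rather than reuse the recorded actions unchanged.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Demo:
    """One teleoperated demo plus the (hypothetical) scene it was recorded in."""
    actions: tuple                 # recorded action sequence, reused unchanged
    friction: float = 0.5          # illustrative scene parameters
    light_intensity: float = 1.0
    object_offset_cm: float = 0.0


def randomize(demo, rng):
    """Return a copy of the demo with its scene parameters resampled."""
    return replace(
        demo,
        friction=rng.uniform(0.2, 1.0),
        light_intensity=rng.uniform(0.5, 1.5),
        object_offset_cm=rng.uniform(-2.0, 2.0),
    )


def augment(demos, variants_per_demo, seed=0):
    """Expand a small set of real demos into many randomized variants."""
    rng = random.Random(seed)
    return [randomize(d, rng) for d in demos for _ in range(variants_per_demo)]


real_demos = [Demo(actions=(i, i + 1, i + 2)) for i in range(10)]
synthetic = augment(real_demos, variants_per_demo=100)
print(len(real_demos), "->", len(synthetic))  # prints "10 -> 1000"
```

The real demonstrations supply the behavior and the randomization supplies the variety, which is why a handful of teleoperated sessions can anchor a much larger training set.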
The data confirms the shift. Robotics went from 1,145 datasets on Hugging Face in 2024 to 26,991 in 2025, a roughly 24-fold increase that took it from rank 44 to rank 1, with synthetic generation as the primary driver. And this week, NVIDIA launched Cosmos 3, an open model that collapses vision reasoning, world generation, and action prediction into a single system. Cosmos runs as a physics-grounded simulator: it proposes candidate approaches, evaluates them in a closed loop, and converges on a behavior without real-world risk.
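One way to picture a predict, evaluate, converge loop inside a learned simulator is the toy planner below. It is not Cosmos's API or method. It swaps in the cross-entropy method on a one-dimensional task, and `learned_model`, `score`, and `GOAL` are hypothetical stand-ins for a real world model and reward.

```python
import random
import statistics

GOAL = 1.0  # hypothetical target state


def learned_model(state, action):
    """Stand-in for a learned world model: predicts the next state."""
    return state + 0.5 * action  # toy dynamics, assumed for illustration


def score(state):
    """Higher is better: negative distance from the goal."""
    return -abs(GOAL - state)


def plan(start, iterations=20, samples=64, elites=8, seed=0):
    """Cross-entropy method: sample actions, score them in the model, refit."""
    rng = random.Random(seed)
    mean, std = 0.0, 2.0
    for _ in range(iterations):
        candidates = [rng.gauss(mean, std) for _ in range(samples)]
        # Evaluate every candidate inside the model, never on hardware.
        ranked = sorted(
            candidates,
            key=lambda a: score(learned_model(start, a)),
            reverse=True,
        )
        best = ranked[:elites]
        # Narrow the sampling distribution around the best candidates.
        mean = statistics.fmean(best)
        std = statistics.pstdev(best) + 1e-6
    return mean


action = plan(start=0.0)
print("chosen action:", round(action, 3))
print("predicted state:", round(learned_model(0.0, action), 3))
```

Every trial in the loop costs compute rather than robot time, so only the final chosen action needs to be checked against reality.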
We think the moat is moving. Not to whoever owns the best model or the most teleoperators, but to whoever builds the best learned simulator. Reality becomes the verification step, and the data flywheel that used to live on customer floors now runs in software. The teams that own the simulator own what comes out of it.

