February 17, 2026
Why Sim-to-Real Transfer Fails — And What's Actually Missing
The sim-to-real gap is widely acknowledged as a critical bottleneck in robotics and autonomous systems. The standard solution, domain randomization, randomizes simulation parameters during training so that the real world looks like one more variation. But what if the gap isn't random at all?
The Standard Approach and Its Limits
When a robot trained in simulation encounters the real world, its performance degrades. This is the sim-to-real gap, and it is one of the most persistent challenges in physical AI development.
The dominant approach to addressing this gap is domain randomization — varying textures, lighting, physics parameters, and object placements during training so that the model becomes robust to variation. The theory is straightforward: if the simulation covers a wide enough range of conditions, the real world will fall somewhere within that range.
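As a rough illustration of the idea, domain randomization can be sketched as drawing every scene parameter independently from a wide range before each training episode. The parameter names, ranges, and the `SceneParams` container below are assumptions made for this example; they are not taken from any particular simulator.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    """Parameters for one simulated episode (hypothetical names and units)."""
    light_intensity: float  # relative brightness, 1.0 = nominal
    friction: float         # floor friction coefficient
    texture_id: int         # index into an assumed bank of 10 textures
    obstacle_x: float       # obstacle position along the corridor, in metres


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one randomized scene. Every range here is illustrative."""
    return SceneParams(
        light_intensity=rng.uniform(0.5, 1.5),
        friction=rng.uniform(0.4, 1.0),
        texture_id=rng.randrange(10),
        obstacle_x=rng.uniform(0.0, 20.0),
    )


# A seeded generator keeps the sampled training set reproducible.
rng = random.Random(0)
scenes = [sample_scene(rng) for _ in range(1000)]
```

Each parameter is sampled independently of the others, which is what makes this kind of variation effectively random: nothing in the sampler links an obstacle's position to the time of day or to who placed it there.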
This works for some things. Lighting variation, texture changes, basic sensor noise — these are effectively random, and random variation in training handles them well.
But much of what makes the real world different from simulation is not random. It is structured, meaningful, and human.
The Human Layer
Consider a delivery robot navigating a hotel corridor. In simulation, the corridor is empty or populated with agents moving along predictable paths. In reality:
A housekeeper has parked her cart at an angle that blocks exactly 60% of the corridor — not because she was careless, but because that specific angle allows her to reach both the room and the cart without extra steps. This positioning was never taught. It was optimized through years of physical practice.
Two guests are standing in a way that technically allows passage but socially does not — they are having a private conversation and their body language signals that interruption would be unwelcome. No physics simulation captures this.
The lighting in this section is different at 3 PM because the afternoon sun hits a window that was not in the building's original design — it was added during a renovation that no dataset documents.
None of these situations are random. They are the product of human behavior, social dynamics, and environmental adaptation, and we believe they account for much of why sim-to-real transfer fails in human environments.
Random Noise vs. Structured Variation
Domain randomization treats the sim-to-real gap as a problem of insufficient variation. Add more noise, and the model becomes more robust. But there is a fundamental difference between random noise and meaningful variation.
Random noise carries no information about the environment's structure. Training on it makes models tolerant of unpredictable conditions, but it does not teach them how the world actually works.
Structured variation — what we call Yuragi — carries information. The housekeeper's cart angle, the guests' body language, the afternoon lighting pattern — these are not random. They are selected patterns, behaviors that survived because they work. Training on these patterns teaches an AI system something fundamentally different from training on noise.
The sim-to-real gap is not a randomness problem. It is a missing data problem — and the missing data is human reality.
A Different Approach to Transfer
Instead of adding random variation to simulated environments, what if we added real variation captured from human environments?
This is the Yuragi approach to sim-to-real transfer. Rather than generating synthetic noise, we capture the actual patterns of human behavior, social dynamics, and environmental adaptation that define real-world conditions. This data is structured, annotated, and designed for integration with existing world model architectures.
The result is not a replacement for domain randomization — it is a complementary layer. Physics-based variation handles the physical gap. Yuragi data handles the human gap. Together, they address both dimensions of the sim-to-real problem.
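One way to picture the two layers working together is an episode sampler that draws physical parameters at random and draws the human situation from recorded observations. This is only a sketch under stated assumptions: the `HumanObservation` schema, its field names, and the three placeholder records are invented for illustration and do not reflect the actual Yuragi data format or any real measurements.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class HumanObservation:
    """One recorded corridor situation (hypothetical schema)."""
    blocked_fraction: float  # share of corridor width that is occupied
    socially_passable: bool  # whether passing through would be acceptable
    hour_of_day: int


# Placeholder records standing in for captured data; not real measurements.
OBSERVATIONS = [
    HumanObservation(0.6, True, 10),   # housekeeping cart parked at an angle
    HumanObservation(0.3, False, 15),  # two guests in a private conversation
    HumanObservation(0.0, True, 15),   # empty corridor in afternoon sun
]


def sample_episode(rng: random.Random):
    """Combine a randomized physical layer with a recorded human layer."""
    # Physical layer: independent random draws, as in domain randomization.
    physical = {
        "light_intensity": rng.uniform(0.5, 1.5),
        "friction": rng.uniform(0.4, 1.0),
    }
    # Human layer: a whole observed situation, so its fields stay consistent
    # with each other instead of being randomized one by one.
    human = rng.choice(OBSERVATIONS)
    return physical, human


rng = random.Random(0)
physical, human = sample_episode(rng)
```

The design choice worth noting is that the human layer is sampled as a unit. Drawing `blocked_fraction` and `socially_passable` separately would turn structured variation back into noise.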