RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots

This paper introduces RoboCasa365, a large-scale simulation framework featuring 365 everyday tasks across 2,500 diverse kitchen environments and extensive human and synthetic demonstration data, designed to provide a reproducible benchmark for evaluating and advancing generalist robot policies through systematic analysis of task diversity, dataset scale, and environment variation.

Soroush Nasiriany, Sepehr Nasiriany, Abhiram Maddukuri, Yuke Zhu

Published 2026-03-05

Imagine you want to teach a robot to be the ultimate housekeeper. You want it to be able to make toast, organize the fridge, wash dishes, and maybe even cook a full dinner, all while navigating a messy kitchen.

The problem? Teaching a robot all of this is incredibly hard, expensive, and slow. If you try to teach it in a real kitchen, you might break a lot of dishes, the robot might get stuck, and you'd need thousands of hours of human time to show it what to do. Plus, how do you know if the robot is actually getting smarter, or if it just got lucky with one specific kitchen layout?

Enter RoboCasa365.

Think of RoboCasa365 as a massive virtual kitchen simulator designed specifically to train and test these general-purpose robots. It's like a video game for robots, but instead of fighting dragons, the robot is learning to make a sandwich.

Here is a breakdown of what makes this paper special, using some everyday analogies:

1. The "Gym" for Robots (The Environment)

Imagine a gym where a human athlete trains. To get strong, they need to lift different weights, run on different terrains, and face different obstacles.

  • The Old Way: Many earlier robot benchmarks were like a single, tiny room with one chair and one table. The robot learned to push that one chair, but if you put it in a real kitchen with a fridge and a stove, it was lost.
  • The RoboCasa365 Way: This framework is like a gym with 2,500 different rooms. Some kitchens are small and cluttered; others are huge and modern. Some have red cabinets, others have wood. The robot gets to practice in thousands of different "versions" of a kitchen so it learns the concept of a kitchen, not just one specific room.
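To make the "thousands of kitchens" idea concrete, here is a minimal Python sketch of how one might combine floor plans and decorative styles into many scene variants, then hold some combinations out so a policy is graded on kitchens it never trained in. The layout and style names, the `KitchenScene` class, and `split_scenes` are all hypothetical; this is not how RoboCasa365 actually builds its 2,500 kitchens.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class KitchenScene:
    layout: str  # hypothetical floor-plan label, e.g. "galley" or "L-shaped"
    style: str   # hypothetical look-and-feel label, e.g. "red cabinets" or "wood"


def split_scenes(layouts, styles, test_fraction=0.2, seed=0):
    """Enumerate every layout/style combination and hold some out for testing."""
    scenes = [KitchenScene(layout, style)
              for layout, style in itertools.product(layouts, styles)]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(scenes)
    n_test = max(1, int(len(scenes) * test_fraction))
    return scenes[n_test:], scenes[:n_test]  # (train scenes, held-out scenes)


layouts = ["galley", "L-shaped", "U-shaped", "island", "one-wall"]
styles = ["red cabinets", "wood", "modern white", "industrial"]
train, test = split_scenes(layouts, styles)
print(len(train), len(test))  # 16 4
```

Holding out whole kitchens, rather than testing in the same rooms the robot practiced in, is what lets you tell "learned the concept of a kitchen" apart from "memorized one room."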

2. The "365 Days of Cooking" (The Tasks)

The name "365" isn't random. It represents 365 different everyday tasks, one for every day of the year.

  • The Menu: The robot has to learn everything from simple things like "close the fridge" to complex, multi-step chores like "make a hot dog."
  • The Complexity: Making a hot dog isn't just one move. It's a chain reaction: Open the fridge -> Grab the sausage -> Put it on a plate -> Open the mustard -> Squeeze mustard -> Put the bun on the plate.
  • The Challenge: The paper tests if the robot can handle these long chains of events without forgetting the first step by the time it gets to the last one.
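Why are long chains so hard? A back-of-the-envelope calculation helps: if every step has to succeed, small per-step error rates compound quickly. The sketch below assumes, purely for illustration, that steps fail independently and that each one succeeds 90% of the time; neither assumption comes from the paper.

```python
from math import prod


def chain_success_rate(step_success_rates):
    """Probability that every step of a multi-step task succeeds,
    under the simplifying assumption that steps fail independently."""
    return prod(step_success_rates)


hot_dog_steps = [
    "open fridge", "grab sausage", "place on plate",
    "open mustard", "squeeze mustard", "place bun",
]
# Illustrative per-step success rate, not a number from the paper.
rate = chain_success_rate([0.9] * len(hot_dog_steps))
print(f"{rate:.3f}")  # 0.531
```

Six steps at 90% each already turn the whole hot dog into roughly a coin flip, which is why being good at each individual move doesn't guarantee success on the full chore.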

3. The "Tutor" (The Data)

You can't learn to play piano just by reading a book; you need to watch a master and then practice.

  • Human Teachers: The researchers recorded over 600 hours of real humans doing these tasks with a robot arm. This is the "master class."
  • The AI Copycats: To get even more practice, they used a clever tool called MimicGen. Think of this as a photocopier for robot movements. It took the human demonstrations and generated 1,600+ hours of new, slightly different variations.
    • Analogy: If a human shows the robot once how to pour milk into a cup, MimicGen can generate many new demonstrations of that same pour, adapted to scenes where the cup sits on the left or on the right, or where the milk carton starts somewhere else. This teaches the robot to be flexible.
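The "photocopier" works roughly like this: MimicGen splits a human demonstration into object-centric pieces (for example, "reach for the cup") and replays each piece relative to wherever that object sits in the new scene. Below is a heavily simplified 2D Python sketch of just that retargeting step, using NumPy. The function names are hypothetical and this is not MimicGen's API; the real system works with full 3D gripper poses and keeps only the generated demonstrations that actually succeed when replayed in simulation.

```python
import numpy as np


def se2(x, y, theta):
    """3x3 homogeneous matrix for a 2D pose: position (x, y), heading theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])


def retarget_segment(src_eef_poses, src_obj_pose, new_obj_pose):
    """Move a demo segment along with its object (hypothetical helper).

    Each gripper pose is expressed in the source object's frame, then
    placed in the new object's frame:
        T_new_eef = T_new_obj @ inv(T_src_obj) @ T_src_eef
    """
    offset = new_obj_pose @ np.linalg.inv(src_obj_pose)
    return [offset @ pose for pose in src_eef_poses]


# Source demo: the cup is at (0.5, 0.0) and the gripper approaches it head-on.
src_cup = se2(0.5, 0.0, 0.0)
src_path = [se2(0.3, 0.0, 0.0), se2(0.4, 0.0, 0.0), se2(0.45, 0.0, 0.0)]

# New scene: the cup has moved 0.3 m to the side and is rotated by 90 degrees.
new_cup = se2(0.5, 0.3, np.pi / 2)
new_path = retarget_segment(src_path, src_cup, new_cup)
print(np.round(new_path[-1][:2, 2], 3))  # final gripper position near the moved cup
```

The key invariant is that the gripper's path relative to the cup is unchanged; only where that path sits in the kitchen moves with the cup.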

4. The "Report Card" (The Benchmarks)

How do you know if the robot is actually smart? You need a standardized test.

  • The Exam: The paper sets up three different types of exams:
    1. Multi-Task Learning: Can the robot learn 300 different tasks at once without getting confused?
    2. Foundation Model Training: Can the robot learn a "general knowledge" base from the massive dataset, and then quickly learn a new, specific task with just a little bit of extra practice? (This is like learning general physics so you can easily learn how to build a specific bridge).
    3. Lifelong Learning: Can the robot learn a new skill today without forgetting how to do the skills it learned yesterday? (This is one of the hardest parts: losing old skills while learning new ones is known as "catastrophic forgetting" in AI.)
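For the lifelong-learning exam, one common way to put a number on catastrophic forgetting, borrowed from the continual-learning literature, is to re-test every earlier skill after each new one is learned and measure how far each skill fell from its best score. Here is a minimal Python sketch of that metric. The success numbers are made up for illustration, and whether RoboCasa365 reports this exact metric is an assumption, not something stated here.

```python
def average_forgetting(success):
    """success[i][j] = success rate on task j, measured after training on task i
    (tasks are learned in order 0, 1, 2, ...). A task's forgetting is its best
    score before the final stage minus its final score, averaged over all
    tasks except the last one learned."""
    n = len(success)
    if n < 2:
        return 0.0  # nothing has had a chance to be forgotten yet
    final = success[-1]
    drops = []
    for j in range(n - 1):
        best_before = max(success[i][j] for i in range(j, n - 1))
        drops.append(best_before - final[j])
    return sum(drops) / len(drops)


# Toy numbers (not results from the paper): three skills learned one after another.
success = [
    [0.80, 0.00, 0.00],  # after learning task 0
    [0.60, 0.75, 0.00],  # after learning task 1
    [0.40, 0.55, 0.70],  # after learning task 2
]
print(round(average_forgetting(success), 3))  # 0.3
```

A score near zero means old skills survived; here task 0 slipped from 0.80 to 0.40 and task 1 from 0.75 to 0.55, for an average drop of 0.30.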

5. The Results: What Did They Find?

The researchers tested a range of state-of-the-art robot brains (AI policy models) in this new "gym."

  • Big Data Works: They found that training on huge, diverse datasets makes robots much better at generalizing.
  • Pre-training is Key: Just like a human student who reads a library of books before taking a specific exam, robots that were "pre-trained" on the massive RoboCasa365 data learned new tasks much faster and with less data than robots that started from scratch.
  • The Gap: Even with all this data, robots still struggle with very long, complex tasks (like making a full meal) and sometimes forget old skills when learning new ones. But the paper makes a strong case that large-scale simulation is a practical way to measure, and eventually close, that gap.

The Bottom Line

RoboCasa365 is a massive, open-source playground. It's not just a dataset; it's a complete ecosystem that allows researchers to stop building their own tiny, broken kitchens and start testing their robot brains in a realistic, diverse, and huge virtual world.

It's the difference between teaching a child to swim in a bathtub versus teaching them in a massive, wave-filled ocean. RoboCasa365 is that ocean, and it's helping us figure out how to build robots that can actually live and work in our homes.