Open-Source Robotics Learning Datasets

A curated catalog of open-source datasets for robot manipulation, imitation learning, and reinforcement learning — with links to official sources.

Datasets for Robot Learning

Each dataset has a dedicated page with description, scale, access links, and citations.

RSS 2024

DROID

76K trajectories, 350 hours, 86 tasks. In-the-wild manipulation from 50 collectors across 564 scenes. TensorFlow Datasets, Hugging Face.
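The headline numbers above can be turned into a rough per-trajectory figure when judging whether the data suits a given task horizon. The sketch below is only back-of-the-envelope arithmetic on the rounded values quoted in this card (76K trajectories, 350 hours); the function name is hypothetical and the result is an average, not a statement about the actual length distribution.

```python
def mean_trajectory_seconds(total_hours: float, num_trajectories: int) -> float:
    """Average trajectory duration in seconds from aggregate dataset stats."""
    if num_trajectories <= 0:
        raise ValueError("num_trajectories must be positive")
    return total_hours * 3600.0 / num_trajectories


# Rounded figures from the catalog card: 350 hours over 76K trajectories.
droid_mean_s = mean_trajectory_seconds(total_hours=350, num_trajectories=76_000)
print(f"{droid_mean_s:.1f} s per trajectory on average")  # prints "16.6 s per trajectory on average"
```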

2023

BridgeData V2

60K trajectories, 24 environments, 13 manipulation skills. Low-cost WidowX robot. Natural language labels, multi-task learning.

Google DeepMind

Open X-Embodiment

1M+ episodes, 22 robot types, 500+ skills. Unified RLDS format. RT-X models. 33 institutions.
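For readers unfamiliar with RLDS, the sketch below mimics its episode layout with plain Python dictionaries: an episode holds a sequence of steps, and each step carries an observation, an action, a reward, and the `is_first` / `is_last` / `is_terminal` flags. This is a toy illustration only, not the TensorFlow Datasets loading API; the helper names and the `"state"` observation key are assumptions, and real datasets in the collection define their own observation and action fields.

```python
from typing import Any, Dict, List

Step = Dict[str, Any]
Episode = Dict[str, List[Step]]


def make_toy_episode(num_steps: int) -> Episode:
    """Build a fake episode shaped like an RLDS episode (illustrative data only)."""
    steps: List[Step] = []
    for t in range(num_steps):
        last = t == num_steps - 1
        steps.append({
            "observation": {"state": [float(t)]},  # hypothetical observation key
            "action": [0.0],
            "reward": 1.0 if last else 0.0,        # sparse reward on the final step
            "is_first": t == 0,
            "is_last": last,
            "is_terminal": last,
        })
    return {"steps": steps}


def episode_return(episode: Episode) -> float:
    """Sum of rewards over all steps in one episode."""
    return sum(step["reward"] for step in episode["steps"])


def count_steps(episodes: List[Episode]) -> int:
    """Total number of steps across a list of episodes."""
    return sum(len(ep["steps"]) for ep in episodes)


episodes = [make_toy_episode(3), make_toy_episode(5)]
print(count_steps(episodes), episode_return(episodes[0]))
```

Because every dataset shares this outer episode/step structure, code like `count_steps` can stay the same across robots while only the observation and action handling changes.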

Stanford / NVIDIA

ALOHA

Bimanual teleoperation. ALOHA-Cosmos-Policy, baseline datasets. HDF5, Hugging Face. Open hardware.

Benchmark

LIBERO

130 tasks, 65K demos. Lifelong learning benchmark. Spatial, object, goal suites. RoboSuite simulation.

Stanford / Berkeley

RoboNet

15M frames from 7 robot platforms, including Sawyer, Franka, Baxter, Fetch, and WidowX. Built for multi-robot transfer.

Reazon Research

OpenArm

Open-hardware bimanual manipulation platform with reference teleoperation datasets. Reproducible build, low-cost teleop.

NVIDIA

MimicGen

50K+ demos synthesized from ~200 human demos. Task suites for robust imitation learning in simulation and on real robots.

ARISE Initiative

RoboMimic

Framework + reference datasets for learning from human demonstrations. Simulation + real. MIT license.

Google DeepMind

RT-X

Cross-embodiment RT-1-X / RT-2-X policy training data, derived from Open X-Embodiment. Foundation-model scale.

Hugging Face

LeRobot

Standardized format + hub. DROID-100, ALOHA, SO-100. PyTorch, streaming. "ImageNet of robotics."


Models & Tools You Can Pair with These Datasets

Research-Ready Curation

We highlight scale, format, and access details needed for quick evaluation.

Cross-Stack Compatibility

Datasets are mapped to practical model and tool ecosystems.

Deployment Context

Dataset choices are linked with real robot execution constraints.

Scale-up Path

When open data is not enough, we support custom collection pipelines.

Need Custom Data?

We collect high-quality, learning-ready data for your specific tasks and hardware.