Open-Source Robotics Learning Datasets

A curated catalog of open-source datasets for robot manipulation, imitation learning, and reinforcement learning — with links to official sources.

Collection

Real-World Manipulation Data

Datasets with in-the-wild robot interactions and long-horizon tasks.

Collection

Benchmark-Centric Datasets

Suites designed for reproducible evaluation and cross-paper comparison.

Collection

Cross-Robot Ecosystems

Shared formats and multi-embodiment data for foundation model training.

Topic Clusters

High-Intent Dataset Guides

Built for users who search by workflow, industry, or decision intent rather than by a single named dataset.

Dataset Guide

Teleoperation datasets

Operator demos, retries, and bootstrapping workflows.

Dataset Guide

Contact-rich datasets

Tactile, force, and failure-heavy manipulation signals.

Industry Guide

Warehouse datasets

SKU variation, exception handling, and throughput context.

Industry Guide

Lab automation datasets

Repeatable protocols and benchmarkable workflows.

Pilot Guide

Humanoid datasets

Deployment-oriented data choices for humanoid teams.

OpenArm Guide

OpenArm datasets

Collection and packaging workflows around OpenArm.

Catalog

Datasets for Robot Learning

Each dataset has a dedicated page with description, scale, access links, and citations.

RSS 2024

DROID

76K trajectories, 350 hours, 86 tasks. In-the-wild manipulation from 50 collectors across 564 scenes. TensorFlow Datasets, Hugging Face.
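As a rough sanity check on scale, the headline figures above can be turned into per-trajectory and per-scene averages. The sketch below uses only the rounded numbers quoted in this entry, so the results are approximate; the function and constant names are illustrative and not part of any DROID tooling.

```python
# Rounded figures quoted in the DROID entry above.
TRAJECTORIES = 76_000
HOURS = 350
SCENES = 564


def mean_trajectory_seconds(total_hours: float, num_trajectories: int) -> float:
    """Average duration of one trajectory, in seconds."""
    return total_hours * 3600 / num_trajectories


def mean_trajectories_per_scene(num_trajectories: int, num_scenes: int) -> float:
    """Average number of trajectories recorded in each scene."""
    return num_trajectories / num_scenes


avg_seconds = mean_trajectory_seconds(HOURS, TRAJECTORIES)
avg_per_scene = mean_trajectories_per_scene(TRAJECTORIES, SCENES)

print(f"~{avg_seconds:.1f} s per trajectory")
print(f"~{avg_per_scene:.0f} trajectories per scene")
```

Averages like these help when budgeting storage or estimating how many frames a training run will see, but they say nothing about the spread across tasks.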

2023

BridgeData v2

60K trajectories, 24 environments, 13 manipulation skills. Low-cost WidowX robot. Natural language labels, multi-task learning.

Google DeepMind

Open X-Embodiment

1M+ episodes, 22 robot types, 500+ skills. Unified RLDS format. Used to train the RT-X models. Pooled with partners from 33 academic labs.
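For readers new to RLDS, an episode is a sequence of steps, and each step carries an observation, an action, a reward, a discount, and the flags `is_first`, `is_last`, and `is_terminal`. The sketch below mirrors that layout with plain Python dictionaries. It is not the TensorFlow Datasets API; the observation keys, the 7-dimensional action, and the helper names are hypothetical placeholders, and real datasets in the collection differ in their observation and action contents.

```python
from typing import Any, Dict, List

Step = Dict[str, Any]
Episode = Dict[str, List[Step]]


def make_episode(num_steps: int) -> Episode:
    """Build a toy episode whose steps use the standard RLDS step fields."""
    steps: List[Step] = []
    for t in range(num_steps):
        last = t == num_steps - 1
        steps.append({
            # Placeholder observation; real datasets store images, proprioception, etc.
            "observation": {"image": f"frame_{t}", "state": [0.0] * 7},
            "action": [0.0] * 7,           # hypothetical 7-DoF action
            "reward": 1.0 if last else 0.0,
            "discount": 1.0,
            "is_first": t == 0,
            "is_last": last,
            "is_terminal": last,
        })
    return {"steps": steps}


def is_well_formed(episode: Episode) -> bool:
    """Check that only the first step is flagged is_first and only the last is_last."""
    steps = episode["steps"]
    if not steps:
        return False
    first_ok = steps[0]["is_first"] and not any(s["is_first"] for s in steps[1:])
    last_ok = steps[-1]["is_last"] and not any(s["is_last"] for s in steps[:-1])
    return first_ok and last_ok


episode = make_episode(5)
print(len(episode["steps"]), is_well_formed(episode))
```

A shared step layout of this kind is what lets data from different robots be concatenated and batched by one training pipeline.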

Stanford / NVIDIA

ALOHA

Bimanual teleoperation. ALOHA-Cosmos-Policy, baseline datasets. HDF5, Hugging Face. Open hardware.

Benchmark

LIBERO

130 tasks, 6.5K demos (50 per task). Lifelong learning benchmark. Spatial, object, goal suites. robosuite simulation.

Stanford / Berkeley

RoboNet

15M frames, 7 robot platforms. Multi-robot transfer. Platforms include Sawyer, Franka, Baxter, Fetch, and WidowX.

Reazon Research

OpenArm

Open-hardware bimanual manipulation platform with reference teleoperation datasets. Reproducible build, low-cost teleop.

NVIDIA

MimicGen

50K+ demos synthesized from ~200 human demos. Task suites for robust imitation learning in simulation and on real robots.

ARISE Initiative

RoboMimic

Framework + reference datasets for learning from human demonstrations. Simulation + real. MIT license.

Google DeepMind

RT-X

Cross-embodiment RT-1-X / RT-2-X policy training data, derived from Open X-Embodiment. Foundation-model scale.

Hugging Face

LeRobot

Standardized format + hub. DROID-100, ALOHA, SO-100. PyTorch, streaming. "ImageNet of robotics."

Linked Resources

Models & Tools You Can Pair

Research-Ready Curation

We highlight scale, format, and access details needed for quick evaluation.

Cross-Stack Compatibility

Datasets are mapped to practical model and tool ecosystems.

Deployment Context

Dataset choices are linked with real robot execution constraints.

Scale-up Path

When open data is not enough, we support custom collection pipelines.

Need Custom Data?

We collect high-quality, learning-ready data for your specific tasks and hardware.
