Embodied AI Grand Challenge 2026 | World Model Challenge for Embodied Intelligence

About The Challenge

EAGC 2026 evaluates world model capabilities for embodied intelligence in standardized simulated environments.

The challenge asks whether participating systems can construct executable environment or task representations from continuous observations and then use those representations for planning, execution, replanning, and recovery.

Unlike benchmarks that mainly measure final task success, EAGC 2026 also evaluates whether a system can maintain a structured internal model, update it over time, expose it through verifiable files, and use it when unexpected changes occur.

Track 1

Unseen Environment Exploration and Closed-Loop Task Execution

Autonomous exploration, spatial-semantic modeling, planning, replanning, and exception recovery in hidden indoor environments.

Track 2

One-Shot Demonstration-Driven Rapid Adaptation

Task abstraction, state-transition modeling, action-precondition reasoning, and rapid adaptation from one expert demonstration.

Why This Challenge Matters

Embodied AI needs reusable, queryable, and action-guiding world models, not only reactive task execution.

Recent progress has advanced policy learning and task execution, but many systems still rely on reactive behavior, shortcut learning, or task-specific engineering. EAGC 2026 focuses on world models that can support structured memory, task-oriented reasoning, uncertainty handling, and transfer from limited supervision.

Executable Representation

Systems should build models that support queries, prediction, and decision-making rather than only describing observations.

Closed-Loop Reasoning

Systems should continuously use internal state for planning, execution, model updates, and recovery after unexpected changes.

Competition Tracks

The two tracks evaluate complementary world-model capabilities: environment modeling and task transfer.

Track 1: Unseen Environment Exploration and Closed-Loop Task Execution

Track 1 is a 90-minute closed-loop challenge in partially unseen indoor environments. The system must autonomously explore, build an executable world model, receive a natural-language task, and complete the task while handling a controlled exception.

Evaluation Focus

Spatial topology discovery, key object recognition, and interactive state modeling.
Affordance modeling, planning, replanning, and structured world model output.
Stable exception recovery under controlled changes.

Official Procedure

Stage	Time Budget	Requirement
Autonomous exploration	20 minutes	Discover room connectivity, key objects, and interactive states, and build a queryable world model.
Task reception and planning	10 minutes	Generate an execution plan from a natural-language instruction and the system's own world model.
Autonomous execution	60 minutes	Execute the task, update the model, replan after exceptions, and complete the final objective.

Each official hidden episode is evaluated as a single continuous closed-loop run. Unrestricted resets, repeated trial-and-error exploration, and manual intervention are not allowed unless explicitly specified by the official evaluator.

Task completion alone is not the focus. The evaluation asks whether a system actually constructs and updates an executable world model, including spatial topology, object states, affordances, task constraints, exception events, and recovery plans, rather than relying on reactive navigation or fixed scripted behaviors.

Controlled Exception Library

The library covers door-lock state changes, object relocation, target container unavailability, and tool substitution. At most one exception is triggered in each official episode, and every exception preserves at least one recoverable solution path.
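
Replanning after a door-lock exception can be sketched on a room-connectivity graph. This is a minimal illustration, not the official evaluator's logic: the room names, the `shortest_path` helper, and the edge-set representation are all assumptions made for the example.

```python
from collections import deque

# Hypothetical room-connectivity graph: each undirected edge is a traversable door.
DOORS = {
    frozenset({"hall", "kitchen"}),
    frozenset({"hall", "living_room"}),
    frozenset({"living_room", "kitchen"}),
    frozenset({"kitchen", "pantry"}),
}


def shortest_path(doors, start, goal):
    """Breadth-first search over open doors; returns a room list or None."""
    neighbors = {}
    for door in doors:
        a, b = tuple(door)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbors.get(path[-1], ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# Initial plan: the direct hall-kitchen door is open.
plan = shortest_path(DOORS, "hall", "pantry")

# Controlled exception: the hall-kitchen door becomes locked mid-episode.
# The world model is updated first, then the planner is re-run on it.
open_doors = DOORS - {frozenset({"hall", "kitchen"})}
recovery_plan = shortest_path(open_doors, "hall", "pantry")
```

Here `plan` is `["hall", "kitchen", "pantry"]`, and `recovery_plan` detours through `living_room`. Because the planner reads only from the stored graph, the recovery depends on the model being updated, which is the behavior the exception library is meant to probe.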

Track 2: One-Shot Demonstration-Driven Rapid Adaptation

Track 2 is a one-demonstration learning challenge over constrained task families. The system receives one standardized expert demonstration and must extract a transferable task model before attempting an unseen instance.

Evaluation Focus

Demonstration abstraction, state-transition modeling, and action precondition reasoning.
Rapid online adaptation, task model generation, and few-shot generalization.
Robust execution under hidden geometry, pose, friction, tool, or goal variations.

Official Procedure

Stage	Time Budget	Requirement
Demonstration reception	5 minutes	Receive one standardized expert demonstration.
Task parsing and adaptation	15-20 minutes	Parse the demonstration, infer task state variables, and adapt within the official budget.
Autonomous execution	25-30 minutes	Execute the task on the hidden instance.

Teams may make no more than three official attempts per instance. The evaluation emphasizes whether the system extracts task variables, constraints, action preconditions, and success criteria rather than merely reproducing an observed trajectory.
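
The difference between extracting a task model and replaying a trajectory can be illustrated with a small precondition check. The sketch below assumes a hypothetical STRIPS-style representation: `Action`, `validate_plan`, and all predicate names are invented for illustration and are not part of any official EAGC format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical symbolic action: preconditions, add effects, delete effects."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset = frozenset()


def validate_plan(initial_state, plan, goal):
    """Simulate the plan symbolically.

    Returns (ok, failing_action_name). The name is None when every
    precondition held, even if the goal was not reached.
    """
    state = set(initial_state)
    for action in plan:
        if not action.preconditions <= state:
            return False, action.name
        state -= action.del_effects
        state |= action.add_effects
    return goal <= state, None


open_drawer = Action("open_drawer", frozenset({"gripper_empty"}), frozenset({"drawer_open"}))
grasp_block = Action(
    "grasp_block",
    frozenset({"gripper_empty"}),
    frozenset({"holding_block"}),
    frozenset({"gripper_empty"}),
)
place_in_drawer = Action(
    "place_in_drawer",
    frozenset({"holding_block", "drawer_open"}),
    frozenset({"block_in_drawer", "gripper_empty"}),
    frozenset({"holding_block"}),
)
goal = frozenset({"block_in_drawer"})

# Demonstrated order, drawer initially closed.
demo_result = validate_plan({"gripper_empty"}, [open_drawer, grasp_block, place_in_drawer], goal)

# Dropping the open step violates a precondition of place_in_drawer.
broken_result = validate_plan({"gripper_empty"}, [grasp_block, place_in_drawer], goal)

# Unseen instance where the drawer is already open: the shorter plan is valid.
adapted_result = validate_plan(
    {"gripper_empty", "drawer_open"}, [grasp_block, place_in_drawer], goal
)
```

A system that only replays the demonstrated trajectory would still try to open an already open drawer in the third case, while a model with explicit preconditions can recognize that the step is unnecessary.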

Registration

Team registration is now open for EAGC 2026.

All participating teams must complete registration before submitting results. Teams may register for Track 1, Track 2, or both tracks.

Register for Track 1
Register for Track 2

Runtime And Dataset Policy

Official scores are produced only in the EAGC runtime stack on hidden evaluation environments.

Allowed During Evaluation

Inference.
Memory updates.
Lightweight online adaptation within the official time budget.
Local open-source models and declared model weights.

Not Allowed During Evaluation

Internet access or online model APIs.
Hidden-ground-truth access.
Manual remote control.
Large-scale retraining.
Reverse engineering or redistributing hidden assets.

Recommended Training Resources

Public datasets should be used for training and task-design reference, not copied directly into official hidden tests.

Track 1 Resources

ProcTHOR / ProcTHOR-10K: primary public training environment and procedural reference.
Habitat Challenge / ReplicaCAD: mobile manipulation and hidden-test protocol reference.
ALFRED: language task templates and long-horizon task structure.
BEHAVIOR-1K: high-difficulty long-horizon household interaction reference.

Recommended combination: ProcTHOR + Habitat Challenge + ALFRED.

Track 2 Resources

RLBench: primary public training set and baseline platform.
MimicGen: demonstration expansion and task-variant generation.
LIBERO: transfer across object, goal, and spatial variations.
ManiSkill2 and RoboCasa: extended resources for object variation and future editions.

Recommended combination: RLBench + MimicGen + LIBERO.

Submission Requirements

Teams submit executable systems and structured model files for centralized hidden evaluation.

Executable Docker container containing the inference system, memory update module, online adaptation module, and planner or policy module.
Structured model files: Track 1 submits world_model.json; Track 2 submits task_model.json; both tracks submit episode_log.jsonl.
Technical report in PDF format describing the method, world model design, training resources, compute budget, failure analysis, and reproducibility statement.
Training-resource disclosure covering public datasets, private data, synthetic data, pretrained models, and external dependencies.
Open-source statement. Code release is optional but strongly encouraged.
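
The structured model files named above could be produced along the following lines. The official schemas are not given in this overview, so every field name below (`rooms`, `connections`, `objects`, `step`, `action`) is a placeholder assumption. Only the file names `world_model.json` and `episode_log.jsonl` come from the requirements.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical world model content; the official schema is not specified here.
world_model = {
    "rooms": ["hall", "kitchen"],
    "connections": [{"from": "hall", "to": "kitchen", "door_state": "open"}],
    "objects": [{"id": "mug_1", "room": "kitchen", "state": "on_counter"}],
}


def write_world_model(path, model):
    """Write the full model snapshot as one JSON document."""
    path.write_text(json.dumps(model, indent=2, sort_keys=True), encoding="utf-8")


def append_log_event(path, event):
    """Append one event as a single line, following the JSON Lines convention."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")


def read_log(path):
    """Read back all events, one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# A temporary directory stands in for the submission output directory.
out_dir = Path(tempfile.mkdtemp())
write_world_model(out_dir / "world_model.json", world_model)
append_log_event(out_dir / "episode_log.jsonl", {"step": 0, "action": "explore", "room": "hall"})
append_log_event(out_dir / "episode_log.jsonl", {"step": 1, "action": "move", "room": "kitchen"})
events = read_log(out_dir / "episode_log.jsonl")
```

Appending to the log one line at a time keeps earlier events intact if the run is interrupted, while the model file is rewritten as a whole snapshot on each update.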

Evaluation Protocol

All official scores are determined by the official hidden checker to ensure fairness, reproducibility, and resistance to rule exploitation.

The evaluator may query submitted model files during and after an episode to check temporal consistency, topology and object-state correctness, affordance or precondition prediction, recovery traces, and consistency between model files and action logs.
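
One of the checks listed above, consistency between model files and action logs, can be approximated locally before submission. The sketch below is an assumption about what such a check might look like, not the hidden checker itself. The `find_inconsistencies` helper and the `object_id`, `room`, and `step` fields are hypothetical.

```python
def find_inconsistencies(world_model, log_events):
    """Compare each object's modeled room with the last room logged for it.

    Returns a list of (object_id, modeled_room, last_logged_room) tuples.
    """
    last_logged_room = {}
    for event in sorted(log_events, key=lambda e: e["step"]):
        if event.get("object_id") is not None:
            last_logged_room[event["object_id"]] = event["room"]
    problems = []
    for obj in world_model["objects"]:
        logged = last_logged_room.get(obj["id"])
        if logged is not None and logged != obj["room"]:
            problems.append((obj["id"], obj["room"], logged))
    return problems


log_events = [
    {"step": 0, "action": "observe", "object_id": "mug_1", "room": "kitchen"},
    {"step": 1, "action": "place", "object_id": "mug_1", "room": "living_room"},
    {"step": 2, "action": "observe", "object_id": "key_1", "room": "hall"},
]

# The stale model was never updated after the mug was moved.
stale_model = {"objects": [{"id": "mug_1", "room": "kitchen"}, {"id": "key_1", "room": "hall"}]}
fresh_model = {"objects": [{"id": "mug_1", "room": "living_room"}, {"id": "key_1", "room": "hall"}]}

stale_problems = find_inconsistencies(stale_model, log_events)
fresh_problems = find_inconsistencies(fresh_model, log_events)
```

The stale model yields one mismatch for `mug_1`, and the updated model yields none.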

Track 1 Scoring

Task completion	40%
World model quality	20%
Exception recovery and replanning	20%
Execution efficiency	10%
Robustness and safety	10%

Track 2 Scoring

Task success rate	45%
Rapid adaptation capability	20%
Generalization quality	15%
Task model quality	10%
Execution stability	10%
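
The weights for both tracks can be combined into a single total. The weights below are taken from the Track 1 and Track 2 scoring lists. Two things are assumptions, since this overview does not define them: that each component is scored on a 0 to 1 scale, and that the components are combined as a linear weighted sum. The dictionary keys and the example scores are illustrative.

```python
# Weights from the Track 1 and Track 2 scoring lists (percent).
TRACK1_WEIGHTS = {
    "task_completion": 40,
    "world_model_quality": 20,
    "exception_recovery_and_replanning": 20,
    "execution_efficiency": 10,
    "robustness_and_safety": 10,
}
TRACK2_WEIGHTS = {
    "task_success_rate": 45,
    "rapid_adaptation_capability": 20,
    "generalization_quality": 15,
    "task_model_quality": 10,
    "execution_stability": 10,
}


def weighted_total(weights, component_scores):
    """Combine per-component scores in [0, 1] into a 0-100 total."""
    if set(weights) != set(component_scores):
        raise ValueError("component names must match the weight table")
    for name, score in component_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {score}")
    return sum(weights[name] * component_scores[name] for name in weights)


# Hypothetical component scores for one Track 1 run.
example = {
    "task_completion": 1.0,
    "world_model_quality": 0.5,
    "exception_recovery_and_replanning": 0.5,
    "execution_efficiency": 1.0,
    "robustness_and_safety": 0.0,
}
total = weighted_total(TRACK1_WEIGHTS, example)
```

Under these assumptions the example run totals 70 out of 100. Full task completion contributes 40 points, so the remaining 60 points in Track 1 depend on model quality, recovery, efficiency, and safety.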

Timeline

All submission deadlines use 23:59 Anywhere on Earth (AoE) unless otherwise specified.

Milestone	Date	Role In Ranking
Challenge announcement and website launch	May 8, 2026	Not counted
Registration opens	May 8, 2026	Not counted
Public dataset, task rules, and evaluation protocol release	May 19, 2026	Not counted
Trial / dry-run submission	May 30-June 5, 2026	Not counted
Trial testing period	June 6-June 9, 2026	Not counted
Registration deadline	June 15, 2026	Not counted
Environment v1.0 freeze	June 16, 2026	Not counted
Qualification submission	June 17-June 26, 2026	Used for qualification and finalist selection
Hidden validation testing	June 27-July 3, 2026	Used for qualification and finalist selection
Finalists announced	July 5, 2026	Not counted
Final frozen submission	July 8-July 15, 2026	Counted
Official final testing	July 16-July 24, 2026	Counted
Log audit, rerun, and reproducibility check	July 25-July 28, 2026	Counted
Final ranking released to teams	July 29, 2026	Counted
Public results and on-site final presentation	Early August 2026	Final presentation
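
Because deadlines are stated in Anywhere on Earth time, it can help to convert them to UTC. AoE corresponds to a fixed UTC-12 offset. The `aoe_deadline_utc` helper below is a hypothetical convenience, and the example assumes that the last day of a submission window is its deadline.

```python
from datetime import datetime, timedelta, timezone

# Anywhere on Earth (AoE) corresponds to a fixed UTC-12 offset.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_utc(year, month, day):
    """Return 23:59 AoE on the given date, expressed in UTC."""
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)


# Last day of the final frozen submission window: July 15, 2026.
final_submission_deadline = aoe_deadline_utc(2026, 7, 15)
```

Under that assumption, the final frozen submission window closes at 11:59 UTC on July 16, 2026.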

Awards

EAGC 2026 recognizes overall performance, track-specific excellence, reproducibility, and research contribution.

Main Awards

Grand Champion.
Track 1 First, Second, and Third Prize.
Track 2 First, Second, and Third Prize.

Special Awards

Best World Model Design Award.
Best Open-Source Contribution Award.
Best Technical Report Award.
Best Academic Innovation Award.

FAQ

Key participation and evaluation questions.

Are private datasets allowed?

Yes. Teams may use private datasets, but all training data sources and usage must be disclosed in the technical report.

Is online learning allowed during official evaluation?

Lightweight online adaptation within the official time and compute budget is allowed. Unrestricted trial-and-error training or large-scale retraining is not allowed.

Are online model APIs allowed during official evaluation?

No. Internet access and online model APIs are not allowed during official evaluation.

Is code release mandatory?

No. Code release is not mandatory, but it is strongly encouraged. Special awards will prioritize reproducibility and community contribution.

Can undergraduate teams participate?

Yes. The challenge is open to university teams, industry research teams, and independent developers worldwide.