Embodied AI Grand Challenge 2026

World Model Challenge for Embodied Intelligence

Beyond perception-centric benchmarks: executable environment understanding, closed-loop decision-making, exception recovery, and rapid task adaptation.

Explore: Topology, objects, states
Model: Queryable world state
Plan: Subgoals and paths
Adapt: One-shot task transfer
Execute: Closed-loop control
Recover: Exception-aware replanning

Event Format: Online evaluation and on-site final presentation
Final Presentation: Early August 2026, Changchun, China
Challenge Type: Simulation-based international algorithmic challenge

About The Challenge

EAGC 2026 evaluates world model capabilities for embodied intelligence in standardized simulated environments.

The challenge asks whether participating systems can construct executable environment or task representations from continuous observations and then use those representations for planning, execution, replanning, and recovery.

Unlike benchmarks that mainly measure final task success, EAGC 2026 also evaluates whether a system can maintain a structured internal model, update it over time, expose it through verifiable files, and use it when unexpected changes occur.

Track 1

Unseen Environment Exploration and Closed-Loop Task Execution

Autonomous exploration, spatial-semantic modeling, planning, replanning, and exception recovery in hidden indoor environments.

Track 2

One-Shot Demonstration-Driven Rapid Adaptation

Task abstraction, state-transition modeling, action-precondition reasoning, and rapid adaptation from one expert demonstration.

Why This Challenge Matters

Embodied AI needs reusable, queryable, and action-guiding world models, not only reactive task execution.

Recent progress has advanced policy learning and task execution, but many systems still rely on reactive behavior, shortcut learning, or task-specific engineering. EAGC 2026 focuses on world models that can support structured memory, task-oriented reasoning, uncertainty handling, and transfer from limited supervision.

Executable Representation

Systems should build models that support queries, prediction, and decision-making rather than only describing observations.

Closed-Loop Reasoning

Systems should continuously use internal state for planning, execution, model updates, and recovery after unexpected changes.

Competition Tracks

The two tracks evaluate complementary world-model capabilities: environment modeling and task transfer.

Track 1: Unseen Environment Exploration and Closed-Loop Task Execution

Track 1 is a 90-minute closed-loop challenge in partially unseen indoor environments. The system must autonomously explore, build an executable world model, receive a natural-language task, and complete the task while handling a controlled exception.

Evaluation Focus

  • Spatial topology discovery, key object recognition, and interactive state modeling.
  • Affordance modeling, planning, replanning, and structured world model output.
  • Stable exception recovery under controlled changes.

Official Procedure

Stage | Time Budget | Requirement
Autonomous exploration | 20 minutes | Discover room connectivity, key objects, and interactive states, and build a queryable world model.
Task reception and planning | 10 minutes | Generate an execution plan from a natural-language instruction and the system's own world model.
Autonomous execution | 60 minutes | Execute the task, update the model, replan after exceptions, and complete the final objective.

Each official hidden episode is evaluated as a single continuous closed-loop run. Unrestricted resets, repeated trial-and-error exploration, or manual intervention are not allowed unless explicitly specified by the official evaluator. The focus is not task completion alone. Instead, the evaluation asks whether a system truly constructs and updates an executable world model, including spatial topology, object states, affordances, task constraints, exception events, and recovery plans, rather than relying on reactive navigation or fixed scripted behaviors.
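As an illustration of what "queryable" can mean in practice, the sketch below keeps room connectivity, object states, and affordances in one structure and answers two simple queries. It is a minimal sketch only: the class names, field names, and room and object labels are assumptions made for this example and are not part of any official EAGC interface.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectState:
    """One tracked object; every field name here is illustrative."""
    room: str
    state: str                      # e.g. "closed", "on_counter"
    affordances: set = field(default_factory=set)


@dataclass
class WorldModel:
    """Toy queryable model: door passability plus object states."""
    doors: dict = field(default_factory=dict)    # frozenset({room_a, room_b}) -> passable?
    objects: dict = field(default_factory=dict)  # object name -> ObjectState

    def observe_door(self, a, b, passable):
        self.doors[frozenset((a, b))] = passable

    def observe_object(self, name, room, state, affordances=()):
        self.objects[name] = ObjectState(room, state, set(affordances))

    def neighbors(self, room):
        """Rooms reachable from `room` through a door believed passable."""
        return sorted(
            other
            for pair, passable in self.doors.items()
            if passable and room in pair
            for other in pair - {room}
        )

    def objects_with(self, affordance):
        """Names of objects that currently offer the given affordance."""
        return sorted(
            name for name, obj in self.objects.items()
            if affordance in obj.affordances
        )


wm = WorldModel()
wm.observe_door("hall", "kitchen", True)
wm.observe_door("hall", "bedroom", False)   # seen, but locked
wm.observe_object("fridge", "kitchen", "closed", {"open", "close"})
wm.observe_object("mug", "kitchen", "on_counter", {"pick_up"})

print(wm.neighbors("hall"))      # ['kitchen']
print(wm.objects_with("open"))   # ['fridge']
```

Because observations overwrite earlier beliefs, the same structure can be updated during execution and queried again after a change.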

Controlled Exception Library

Door-lock state changes, object relocation, target container unavailability, and tool substitution. At most one exception will be triggered in each official episode, and every exception will preserve at least one recoverable solution path.
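A door-lock change is the easiest of these exceptions to picture. The sketch below is a hedged illustration, not an official baseline: it assumes a hypothetical three-room layout, stores door states in a plain dictionary, and replans with breadth-first search once the belief about one door is updated.

```python
from collections import deque


def shortest_path(doors, start, goal):
    """Breadth-first search over rooms, using only doors believed unlocked."""
    graph = {}
    for (a, b), unlocked in doors.items():
        if unlocked:
            graph.setdefault(a, []).append(b)
            graph.setdefault(b, []).append(a)
    queue, parent = deque([start]), {start: None}
    while queue:
        room = queue.popleft()
        if room == goal:
            path = []
            while room is not None:
                path.append(room)
                room = parent[room]
            return path[::-1]
        for nxt in graph.get(room, []):
            if nxt not in parent:
                parent[nxt] = room
                queue.append(nxt)
    return None  # no route under the current belief


# Hypothetical layout: two routes from the hall to the kitchen.
doors = {
    ("hall", "kitchen"): True,
    ("hall", "dining"): True,
    ("dining", "kitchen"): True,
}
plan = shortest_path(doors, "hall", "kitchen")

# Exception: the direct door turns out to be locked.
# Update the model first, then replan from the updated belief.
doors[("hall", "kitchen")] = False
recovery = shortest_path(doors, "hall", "kitchen")

print(plan)      # ['hall', 'kitchen']
print(recovery)  # ['hall', 'dining', 'kitchen']
```

The point of the example is the ordering: the model is corrected before the planner runs again, so the recovery route follows from the updated state rather than from a scripted fallback.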

Track 2: One-Shot Demonstration-Driven Rapid Adaptation

Track 2 is a one-demonstration learning challenge over constrained task families. The system receives one standardized expert demonstration and must extract a transferable task model before attempting an unseen instance.

Evaluation Focus

  • Demonstration abstraction, state-transition modeling, and action precondition reasoning.
  • Rapid online adaptation, task model generation, and few-shot generalization.
  • Robust execution under hidden geometry, pose, friction, tool, or goal variations.

Official Procedure

Stage | Time Budget | Requirement
Demonstration reception | 5 minutes | Receive one standardized expert demonstration.
Task parsing and adaptation | 15-20 minutes | Parse the demonstration, infer task state variables, and adapt within the official budget.
Autonomous execution | 25-30 minutes | Execute the task on the hidden instance.

Teams may make no more than three official attempts per instance. The evaluation emphasizes whether the system extracts task variables, constraints, action preconditions, and success criteria rather than merely reproducing an observed trajectory.
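The difference between replaying a trajectory and using a task model can be shown with a small symbolic example. This is a minimal sketch under stated assumptions: the drawer task, the fact names, and the `Action`, `simulate`, and `skip_satisfied` helpers are all hypothetical and do not reflect the official task families or the task_model.json format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset          # facts that must hold before the action
    add: frozenset                    # facts the action makes true
    remove: frozenset = frozenset()   # facts the action makes false


def facts(*names):
    return frozenset(names)


def simulate(actions, state):
    """Apply actions in order; return (state, None) or (state, name_of_failed_action)."""
    state = set(state)
    for act in actions:
        if not act.preconditions <= state:
            return state, act.name
        state = (state - act.remove) | act.add
    return state, None


def skip_satisfied(actions, state):
    """Naive adaptation: drop steps whose effects already hold in the initial state."""
    return [a for a in actions if not a.add <= set(state)]


# Hypothetical model abstracted from one "put block in drawer" demonstration.
task_model = [
    Action("open_drawer", facts("drawer_closed"), facts("drawer_open"), facts("drawer_closed")),
    Action("grasp_block", facts("hand_empty"), facts("holding_block"), facts("hand_empty")),
    Action("place_in_drawer", facts("holding_block", "drawer_open"),
           facts("block_in_drawer", "hand_empty"), facts("holding_block")),
]
goal = {"block_in_drawer"}

# Same conditions as the demonstration: the sequence succeeds.
final, failed = simulate(task_model, {"drawer_closed", "hand_empty"})

# Unseen instance: the drawer starts open, so blind replay fails at step one.
variant = {"drawer_open", "hand_empty"}
_, failed_replay = simulate(task_model, variant)

# Using preconditions and effects, the redundant step is dropped.
adapted_final, adapted_failed = simulate(skip_satisfied(task_model, variant), variant)

print(failed, failed_replay, adapted_failed)   # None open_drawer None
```

A real system would infer these variables from the demonstration and use a proper planner; the sketch only shows why explicit preconditions and success criteria make the failure of pure replay detectable before execution.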

Registration

Team registration is now open for EAGC 2026.

All participating teams must complete registration before submitting results. Teams may register for Track 1, Track 2, or both tracks.

Runtime And Dataset Policy

Official scores are produced only in the EAGC runtime stack on hidden evaluation environments.

Allowed During Evaluation

  • Inference.
  • Memory updates.
  • Lightweight online adaptation within the official time budget.
  • Local open-source models and declared model weights.

Not Allowed During Evaluation

  • Internet access or online model APIs.
  • Hidden-ground-truth access.
  • Manual remote control.
  • Large-scale retraining.
  • Reverse engineering or redistributing hidden assets.

Recommended Training Resources

Public datasets should be used for training and task-design reference, not copied directly into official hidden tests.

Track 1 Resources

  • ProcTHOR / ProcTHOR-10K: primary public training environment and procedural reference.
  • Habitat Challenge / ReplicaCAD: mobile manipulation and hidden-test protocol reference.
  • ALFRED: language task templates and long-horizon task structure.
  • BEHAVIOR-1K: high-difficulty long-horizon household interaction reference.

Recommended combination: ProcTHOR + Habitat Challenge + ALFRED.

Track 2 Resources

  • RLBench: primary public training set and baseline platform.
  • MimicGen: demonstration expansion and task-variant generation.
  • LIBERO: transfer across object, goal, and spatial variations.
  • ManiSkill2 and RoboCasa: extended resources for object variation and future editions.

Recommended combination: RLBench + MimicGen + LIBERO.

Submission Requirements

Teams submit executable systems and structured model files for centralized hidden evaluation.

  1. Executable Docker container containing the inference system, memory update module, online adaptation module, and planner or policy module.
  2. Structured model files: Track 1 submits world_model.json; Track 2 submits task_model.json; both tracks submit episode_log.jsonl.
  3. Technical report in PDF format describing the method, world model design, training resources, compute budget, failure analysis, and reproducibility statement.
  4. Training-resource disclosure covering public datasets, private data, synthetic data, pretrained models, and external dependencies.
  5. Open-source statement. Code release is optional but strongly encouraged.
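The .jsonl extension conventionally means JSON Lines, that is, one JSON object per line. The sketch below shows one way a team might append to and read back such a log while developing locally. The record fields (`step`, `event`, and the payload keys) are assumptions for illustration only; this document does not define the official schema for episode_log.jsonl.

```python
import json
import os
import tempfile


def append_event(path, step, event, **payload):
    """Append one record as a single JSON line."""
    record = {"step": step, "event": event, **payload}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")


def read_events(path):
    """Load every non-empty line back into a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Write to a temporary directory so the example leaves no files behind in the project.
log_path = os.path.join(tempfile.mkdtemp(), "episode_log.jsonl")
append_event(log_path, 0, "observe", room="hall")
append_event(log_path, 1, "action", name="open_door", target="hall-kitchen")
append_event(log_path, 2, "exception", kind="door_locked", target="hall-kitchen")

events = read_events(log_path)
print(len(events), events[-1]["event"])   # 3 exception
```

Appending line by line keeps earlier records intact if a run stops partway through, which is one reason the format suits episode logs.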

Evaluation Protocol

All official scores are determined by the official hidden checker, which is intended to ensure fairness, reproducibility, and resistance to rule exploitation.

The evaluator may query submitted model files during and after an episode to check temporal consistency, topology and object-state correctness, affordance or precondition prediction, recovery traces, and consistency between model files and action logs.
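The official checker is hidden, so its tests cannot be reproduced here. As a rough self-check a team might run before submitting, the sketch below looks for two simple inconsistencies between a model file and an action log: steps that do not strictly increase, and actions that target objects absent from the model. The dictionary layout and the `check_consistency` function are hypothetical and are not the evaluator's logic.

```python
def check_consistency(world_model, log):
    """Return a list of human-readable problems; an empty list means the toy checks passed."""
    problems = []
    known = set(world_model.get("objects", {}))
    last_step = -1
    for record in log:
        step = record["step"]
        if step <= last_step:
            problems.append(f"step {step} is not strictly increasing")
        last_step = step
        target = record.get("target")
        if target is not None and target not in known:
            problems.append(f"step {step} references unknown object {target!r}")
    return problems


# Hypothetical model and log contents.
world_model = {"objects": {"fridge": {"state": "open"}, "mug": {"state": "held"}}}
log = [
    {"step": 0, "action": "open", "target": "fridge"},
    {"step": 1, "action": "pick_up", "target": "mug"},
    {"step": 1, "action": "place", "target": "tray"},   # repeated step, unknown object
]

for problem in check_consistency(world_model, log):
    print(problem)
```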

Track 1 Scoring

  • Task completion: 40%
  • World model quality: 20%
  • Exception recovery and replanning: 20%
  • Execution efficiency: 10%
  • Robustness and safety: 10%

Track 2 Scoring

  • Task success rate: 45%
  • Rapid adaptation capability: 20%
  • Generalization quality: 15%
  • Task model quality: 10%
  • Execution stability: 10%
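The weights above combine naturally into a single weighted total. The sketch below uses the published percentages, but how each component is measured and normalized is not stated in this document; the example assumes every sub-score has already been scaled to the range 0 to 1, and the dictionary keys are shorthand invented for this example.

```python
# Weights in percent, taken from the scoring lists above.
TRACK1_WEIGHTS = {
    "task_completion": 40,
    "world_model_quality": 20,
    "exception_recovery": 20,
    "execution_efficiency": 10,
    "robustness_safety": 10,
}
TRACK2_WEIGHTS = {
    "task_success_rate": 45,
    "rapid_adaptation": 20,
    "generalization": 15,
    "task_model_quality": 10,
    "execution_stability": 10,
}


def weighted_total(weights, sub_scores):
    """Combine sub-scores in [0, 1] into a total on a 0-100 scale."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    if set(weights) != set(sub_scores):
        raise ValueError("sub-scores must match the weight names exactly")
    for name, value in sub_scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return sum(weights[name] * sub_scores[name] for name in weights)


# Hypothetical sub-scores for one run per track.
track1 = weighted_total(TRACK1_WEIGHTS, {
    "task_completion": 1.0,
    "world_model_quality": 0.5,
    "exception_recovery": 0.5,
    "execution_efficiency": 0.8,
    "robustness_safety": 1.0,
})
track2 = weighted_total(TRACK2_WEIGHTS, {
    "task_success_rate": 0.5,
    "rapid_adaptation": 1.0,
    "generalization": 0.5,
    "task_model_quality": 1.0,
    "execution_stability": 0.5,
})
print(track1, track2)   # 78.0 65.0
```

Under these weights, a Track 1 system that completes the task perfectly but scores zero on world model quality and exception recovery cannot exceed 60 points, which reflects the challenge's emphasis on more than final task success.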

Timeline

All submission deadlines use 23:59 Anywhere on Earth (AoE) unless otherwise specified.

Milestone | Date | Role In Ranking
Challenge announcement and website launch | May 8, 2026 | Not counted
Registration opens | May 8, 2026 | Not counted
Public dataset, task rules, and evaluation protocol release | May 19, 2026 | Not counted
Trial / dry-run submission | May 30-June 5, 2026 | Not counted
Trial testing period | June 6-June 9, 2026 | Not counted
Registration deadline | June 15, 2026 | Not counted
Environment v1.0 freeze | June 16, 2026 | Not counted
Qualification submission | June 17-June 26, 2026 | Used for qualification and finalist selection
Hidden validation testing | June 27-July 3, 2026 | Used for qualification and finalist selection
Finalists announced | July 5, 2026 | Not counted
Final frozen submission | July 8-July 15, 2026 | Counted
Official final testing | July 16-July 24, 2026 | Counted
Log audit, rerun, and reproducibility check | July 25-July 28, 2026 | Counted
Final ranking released to teams | July 29, 2026 | Counted
Public results and on-site final presentation | Early August 2026 | Final presentation
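Anywhere on Earth (AoE) corresponds to UTC-12, so a 23:59 AoE deadline falls at 11:59 UTC on the following calendar day. The sketch below converts one deadline from the timeline into UTC using the Python standard library; the helper name `deadline_utc` is made up for this example, and teams should still confirm exact cut-off times with the organizers.

```python
from datetime import datetime, timedelta, timezone

# AoE is a fixed offset of UTC-12 with no daylight saving time.
AOE = timezone(timedelta(hours=-12), name="AoE")


def deadline_utc(year, month, day):
    """Return 23:59 AoE on the given date, expressed in UTC."""
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)


# Last day of the qualification submission window (June 26, 2026).
qualification_close = deadline_utc(2026, 6, 26)
print(qualification_close.isoformat())   # 2026-06-27T11:59:00+00:00
```

Calling `.astimezone()` with another `timezone` or `zoneinfo.ZoneInfo` object gives the same instant in a team's local time.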

Awards

EAGC 2026 recognizes overall performance, track-specific excellence, reproducibility, and research contribution.

Main Awards

  • Grand Champion.
  • Track 1 First, Second, and Third Prize.
  • Track 2 First, Second, and Third Prize.

Special Awards

  • Best World Model Design Award.
  • Best Open-Source Contribution Award.
  • Best Technical Report Award.
  • Best Academic Innovation Award.

FAQ

Key participation and evaluation questions.

Are private datasets allowed?

Yes. Teams may use private datasets, but all training data sources and usage must be disclosed in the technical report.

Is online learning allowed during official evaluation?

Lightweight online adaptation within the official time and compute budget is allowed. Unrestricted trial-and-error training or large-scale retraining is not allowed.

Are online model APIs allowed during official evaluation?

No. Internet access and online model APIs are not allowed during official evaluation.

Is code release mandatory?

No. Code release is not mandatory, but it is strongly encouraged. Special awards will prioritize reproducibility and community contribution.

Can undergraduate teams participate?

Yes. The challenge is open to university teams, industry research teams, and independent developers worldwide.