World Model Challenge for Embodied Intelligence
Beyond perception-centric benchmarks: executable environment understanding, closed-loop decision-making, exception recovery, and rapid task adaptation.
EAGC 2026 evaluates world model capabilities for embodied intelligence in standardized simulated environments.
The challenge asks whether participating systems can construct executable environment or task representations from continuous observations and then use those representations for planning, execution, replanning, and recovery.
Unlike benchmarks that mainly measure final task success, EAGC 2026 also evaluates whether a system can maintain a structured internal model, update it over time, expose it through verifiable files, and use it when unexpected changes occur.
Track 1: Unseen Environment Exploration and Closed-Loop Task Execution
Autonomous exploration, spatial-semantic modeling, planning, replanning, and exception recovery in hidden indoor environments.
Track 2: One-Shot Demonstration-Driven Rapid Adaptation
Task abstraction, state-transition modeling, action-precondition reasoning, and rapid adaptation from one expert demonstration.
Embodied AI needs reusable, queryable, and action-guiding world models, not only reactive task execution.
Recent progress has advanced policy learning and task execution, but many systems still rely on reactive behavior, shortcut learning, or task-specific engineering. EAGC 2026 focuses on world models that can support structured memory, task-oriented reasoning, uncertainty handling, and transfer from limited supervision.
Systems should build models that support queries, prediction, and decision-making rather than only describing observations.
Systems should continuously use internal state for planning, execution, model updates, and recovery after unexpected changes.
The two tracks evaluate complementary world-model capabilities: environment modeling and task transfer.
Track 1 is a 90-minute closed-loop challenge in partially unseen indoor environments. The system must autonomously explore, build an executable world model, receive a natural-language task, and complete the task while handling a controlled exception.
| Stage | Time Budget | Requirement |
|---|---|---|
| Autonomous exploration | 20 minutes | Discover room connectivity, key objects, and interactive object states, and build a queryable world model. |
| Task reception and planning | 10 minutes | Generate an execution plan from a natural-language instruction and the system's own world model. |
| Autonomous execution | 60 minutes | Execute the task, update the model, replan after exceptions, and complete the final objective. |
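The three Track 1 stages add up to the 90-minute episode. The following is a minimal sketch of an agent-side budget tracker. It assumes the stages run back to back in the order listed. The `Stage`, `stage_at`, and `minutes_left_in_stage` names are hypothetical and not part of the official runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    minutes: int


# Stage budgets copied from the Track 1 table; assumed to run back to back.
TRACK1_STAGES = [
    Stage("Autonomous exploration", 20),
    Stage("Task reception and planning", 10),
    Stage("Autonomous execution", 60),
]


def total_budget(stages):
    """Total episode length in minutes."""
    return sum(s.minutes for s in stages)


def stage_at(stages, elapsed_min):
    """Name of the stage active `elapsed_min` minutes into the episode,
    or None once the whole budget has been used."""
    if elapsed_min < 0:
        raise ValueError("elapsed time must be non-negative")
    boundary = 0
    for stage in stages:
        boundary += stage.minutes
        if elapsed_min < boundary:
            return stage.name
    return None


def minutes_left_in_stage(stages, elapsed_min):
    """Minutes remaining before the current stage's budget runs out (0 if over)."""
    boundary = 0
    for stage in stages:
        boundary += stage.minutes
        if elapsed_min < boundary:
            return boundary - elapsed_min
    return 0


print(total_budget(TRACK1_STAGES))               # 90
print(stage_at(TRACK1_STAGES, 25))               # Task reception and planning
print(minutes_left_in_stage(TRACK1_STAGES, 25))  # 5
```

An agent could poll a check like this to decide when to stop exploring and commit to a plan. The official evaluator, not the agent, remains the authority on timing.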
Each official hidden episode is evaluated as a single continuous closed-loop run. Unrestricted resets, repeated trial-and-error exploration, or manual intervention are not allowed unless explicitly specified by the official evaluator. The focus is not task completion alone. Instead, the evaluation asks whether a system truly constructs and updates an executable world model, including spatial topology, object states, affordances, task constraints, exception events, and recovery plans, rather than relying on reactive navigation or fixed scripted behaviors.
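Because the evaluator queries the model, it helps to keep the listed components as explicit, serializable fields rather than implicit policy state. The following is a minimal sketch of one possible in-memory layout. The field names, the `ObjectState` and `WorldModel` classes, and the mapping to a JSON snapshot are all hypothetical, since this section does not give the official `world_model.json` schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ObjectState:
    room: str                                        # where the object was last observed
    state: dict = field(default_factory=dict)        # e.g. {"open": False}
    affordances: list = field(default_factory=list)  # e.g. ["open", "pick_up"]


@dataclass
class WorldModel:
    topology: dict = field(default_factory=dict)     # room -> adjacent rooms
    objects: dict = field(default_factory=dict)      # object id -> ObjectState
    task_constraints: list = field(default_factory=list)
    exception_events: list = field(default_factory=list)
    recovery_plans: list = field(default_factory=list)

    def connect(self, a, b):
        """Record an undirected connection (e.g. a doorway) between two rooms."""
        for x, y in ((a, b), (b, a)):
            neighbors = self.topology.setdefault(x, [])
            if y not in neighbors:
                neighbors.append(y)

    def objects_in(self, room):
        """Query: ids of objects currently believed to be in `room`."""
        return sorted(oid for oid, obj in self.objects.items() if obj.room == room)

    def affords(self, oid, action):
        """Query: does the model predict that `action` is possible on `oid`?"""
        obj = self.objects.get(oid)
        return obj is not None and action in obj.affordances

    def to_json(self):
        """Serialize the whole model (asdict recurses into nested dataclasses)."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


wm = WorldModel()
wm.connect("kitchen", "hallway")
wm.objects["mug_1"] = ObjectState("kitchen", {"clean": False}, ["pick_up"])
print(wm.objects_in("kitchen"))     # ['mug_1']
print(wm.affords("mug_1", "open"))  # False
```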
Controlled exceptions include door-lock state changes, object relocation, target container unavailability, and tool substitution. At most one exception is triggered in each official episode, and every exception preserves at least one recoverable solution path.
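A door-lock change is the simplest exception to picture. The topology in the world model loses an edge, and the route must be recomputed from the updated model rather than replayed. The following is a minimal sketch using breadth-first search. It assumes that room connectivity is stored as an adjacency dictionary and that a locked door blocks exactly one room-to-room edge. Both are illustrative assumptions, not the official representation.

```python
from collections import deque


def shortest_route(topology, start, goal, locked=frozenset()):
    """Breadth-first search for a fewest-doors route between rooms.

    topology: dict mapping each room to a list of adjacent rooms.
    locked:   set of frozenset({room_a, room_b}) pairs whose door is locked.
    Returns the list of rooms from start to goal, or None if unreachable.
    """
    if start == goal:
        return [start]
    parent = {start: None}
    queue = deque([start])
    while queue:
        room = queue.popleft()
        for nxt in topology.get(room, ()):
            if nxt in parent or frozenset((room, nxt)) in locked:
                continue
            parent[nxt] = room
            if nxt == goal:
                # Walk parent links back to the start, then reverse.
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


topology = {
    "hallway": ["kitchen", "living_room"],
    "kitchen": ["hallway", "pantry"],
    "living_room": ["hallway", "pantry"],
    "pantry": ["kitchen", "living_room"],
}

print(shortest_route(topology, "hallway", "pantry"))
# ['hallway', 'kitchen', 'pantry']

# Exception: the kitchen-pantry door becomes locked mid-episode. Record it in
# the model and replan over the remaining connectivity.
locked = {frozenset(("kitchen", "pantry"))}
print(shortest_route(topology, "hallway", "pantry", locked))
# ['hallway', 'living_room', 'pantry']
```

Because each exception is guaranteed to leave a recoverable path, a `None` result here would signal that the model itself is stale (for example, an unexplored doorway) rather than that the task is impossible.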
Track 2 is a one-demonstration learning challenge over constrained task families. The system receives one standardized expert demonstration and must extract a transferable task model before attempting an unseen instance.
| Stage | Time Budget | Requirement |
|---|---|---|
| Demonstration reception | 5 minutes | Receive one standardized expert demonstration. |
| Task parsing and adaptation | 15-20 minutes | Parse the demonstration, infer task state variables, and adapt within the official budget. |
| Autonomous execution | 25-30 minutes | Execute the task on the hidden instance. |
Teams may make no more than three official attempts per instance. The evaluation emphasizes whether the system extracts task variables, constraints, action preconditions, and success criteria rather than merely reproducing an observed trajectory.
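One way to go beyond trajectory replay is to treat each demonstrated step as a state transition and read off STRIPS-style preconditions, effects, and a candidate success condition. The following is a minimal sketch. It assumes the demonstration can be segmented into symbolic (before, action, after) states. The predicate vocabulary, the argument-overlap heuristic for preconditions, and all names are hypothetical. A single demonstration also cannot fully separate necessary preconditions from incidental ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: tuple      # e.g. ("pick", "cup")
    before: frozenset  # predicates true before the action
    after: frozenset   # predicates true after the action


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset


def extract_operator(step):
    """Read a STRIPS-style operator off one demonstrated transition."""
    name, *args = step.action
    # Heuristic: keep zero-arity facts and facts that mention an action argument.
    pre = frozenset(
        p for p in step.before if len(p) == 1 or any(a in p[1:] for a in args)
    )
    return Operator(name, pre, step.after - step.before, step.before - step.after)


def success_criteria(demo):
    """Candidate goal: facts true at the end that were not true at the start."""
    return demo[-1].after - demo[0].before


s0 = frozenset({("on", "cup", "table"), ("hand_empty",), ("open", "cabinet")})
s1 = frozenset({("holding", "cup"), ("open", "cabinet")})
s2 = frozenset({("in", "cup", "cabinet"), ("hand_empty",), ("open", "cabinet")})
demo = [Step(("pick", "cup"), s0, s1), Step(("place", "cup", "cabinet"), s1, s2)]

for op in map(extract_operator, demo):
    print(op.name, sorted(op.preconditions), sorted(op.add_effects))
print(sorted(success_criteria(demo)))  # [('in', 'cup', 'cabinet')]
```

The extracted operators and goal are expressed over predicates rather than poses. In principle, they can therefore be re-grounded with the objects of an unseen instance during the adaptation stage.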
Team registration is now open for EAGC 2026.
All participating teams must complete registration before submitting results. Teams may register for Track 1, Track 2, or both tracks.
Official scores are produced only in the EAGC runtime stack on hidden evaluation environments.
Public datasets should be used for training and task-design reference, not copied directly into official hidden tests.
Recommended combination for Track 1: ProcTHOR + Habitat Challenge + ALFRED.
Recommended combination for Track 2: RLBench + MimicGen + LIBERO.
Teams submit executable systems and structured model files for centralized hidden evaluation.
Track 1 submits world_model.json; Track 2 submits task_model.json; both tracks submit episode_log.jsonl.
All official scores are determined by the official hidden checker for fairness, reproducibility, and resistance to rule exploitation.
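The `.jsonl` extension conventionally denotes JSON Lines: one JSON object per line, appended as events happen, so a partial log stays parseable. The following is a minimal sketch of writing and reading such a log. The field names (`t`, `action`, `result`, `exception`, `new_route`) are hypothetical. The required fields come from the official evaluation protocol, not from this example.

```python
import io
import json


def append_event(fh, t, action, result, **extra):
    """Append one event as a single JSON object on its own line (JSON Lines)."""
    record = {"t": t, "action": action, "result": result, **extra}
    fh.write(json.dumps(record, sort_keys=True) + "\n")


def read_events(fh):
    """Parse a JSON Lines stream back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in fh if line.strip()]


# An in-memory buffer stands in for open("episode_log.jsonl", "a").
log = io.StringIO()
append_event(log, 12.5, "open(door_kitchen)", "success")
append_event(log, 31.0, "navigate(pantry)", "blocked", exception="door_locked")
append_event(log, 31.2, "replan", "success", new_route=["hallway", "living_room", "pantry"])

log.seek(0)
events = read_events(log)
print(len(events), events[1]["exception"])  # 3 door_locked
```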
The evaluator may query submitted model files during and after an episode to check temporal consistency, topology and object-state correctness, affordance or precondition prediction, recovery traces, and consistency between model files and action logs.
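Two of these checks are easy to illustrate: timestamps in the action log should never go backwards, and the object states in the model file should agree with the last state change the log records. The following is a minimal sketch of such a self-check. It is not the official checker. The `object` and `new_state` log keys and the flat object-to-state mapping are hypothetical.

```python
def check_log_consistency(events, model_object_states):
    """Return a list of problems; an empty list means no inconsistency was found.

    events: parsed episode-log records, each with a timestamp "t" and, for
            state-changing actions, hypothetical "object" and "new_state" keys.
    model_object_states: object id -> state as recorded in the model file.
    """
    problems = []
    last_t = float("-inf")
    last_logged_state = {}
    for i, ev in enumerate(events):
        # Temporal consistency: timestamps must never go backwards.
        if ev["t"] < last_t:
            problems.append(f"event {i}: timestamp {ev['t']} precedes {last_t}")
        last_t = max(last_t, ev["t"])
        if "object" in ev and "new_state" in ev:
            last_logged_state[ev["object"]] = ev["new_state"]
    # Model/log agreement: the model should reflect the last logged state change.
    for obj, state in last_logged_state.items():
        recorded = model_object_states.get(obj)
        if recorded != state:
            problems.append(f"{obj}: log ends with {state!r}, model has {recorded!r}")
    return problems


events = [
    {"t": 10.0, "object": "door_1", "new_state": "open"},
    {"t": 42.0, "object": "door_1", "new_state": "locked"},
]
print(check_log_consistency(events, {"door_1": "locked"}))  # []
print(check_log_consistency(events, {"door_1": "open"}))
# ["door_1: log ends with 'locked', model has 'open'"]
```

Running a check like this locally before submission can catch a model that was never updated after an exception, which is exactly the mismatch the evaluator looks for.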
All submission deadlines use 23:59 Anywhere on Earth (AoE) unless otherwise specified.
| Milestone | Date | Role in Ranking |
|---|---|---|
| Challenge announcement and website launch | May 8, 2026 | Not counted |
| Registration opens | May 8, 2026 | Not counted |
| Public dataset, task rules, and evaluation protocol release | May 19, 2026 | Not counted |
| Trial / dry-run submission | May 30-June 5, 2026 | Not counted |
| Trial testing period | June 6-June 9, 2026 | Not counted |
| Registration deadline | June 15, 2026 | Not counted |
| Environment v1.0 freeze | June 16, 2026 | Not counted |
| Qualification submission | June 17-June 26, 2026 | Used for qualification and finalist selection |
| Hidden validation testing | June 27-July 3, 2026 | Used for qualification and finalist selection |
| Finalists announced | July 5, 2026 | Not counted |
| Final frozen submission | July 8-July 15, 2026 | Counted |
| Official final testing | July 16-July 24, 2026 | Counted |
| Log audit, rerun, and reproducibility check | July 25-July 28, 2026 | Counted |
| Final ranking released to teams | July 29, 2026 | Counted |
| Public results and on-site final presentation | Early August 2026 | Final presentation |
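Anywhere on Earth is the fixed offset UTC-12, so 23:59 AoE on a given date is 11:59 UTC on the following day. The following is a small worked example that converts the registration deadline from the table. The helper name `aoe_deadline` is illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Anywhere on Earth is the fixed offset UTC-12: once a deadline has passed
# there, it has passed everywhere.
AOE = timezone(timedelta(hours=-12), "AoE")


def aoe_deadline(year, month, day):
    """23:59 AoE on the given calendar date, as a timezone-aware datetime."""
    return datetime(year, month, day, 23, 59, tzinfo=AOE)


registration = aoe_deadline(2026, 6, 15)  # registration deadline from the table
print(registration.astimezone(timezone.utc).isoformat())
# 2026-06-16T11:59:00+00:00
print(registration.astimezone(timezone(timedelta(hours=8))).strftime("%Y-%m-%d %H:%M"))
# 2026-06-16 19:59  (UTC+8)
```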
EAGC 2026 recognizes overall performance, track-specific excellence, reproducibility, and research contribution.
Key participation and evaluation questions.
Teams may use private datasets, but all training data sources and their usage must be disclosed in the technical report.
Lightweight online adaptation within the official time and compute budget is allowed. Unrestricted trial-and-error training or large-scale retraining is not allowed.
Internet access and online model APIs are not allowed during official evaluation.
Code release is not mandatory, but it is strongly encouraged. Special awards will prioritize reproducibility and community contribution.
The challenge is open to university teams, industry research teams, and independent developers worldwide.