Breakthrough Gradient-Based Planner Enables Long-Horizon AI Control with World Models

Groundbreaking Planner Solves Long-Horizon Planning Fragility

A new gradient-based planner named GRASP aims to make long-horizon planning with large learned world models practical, addressing a bottleneck that has held back AI control systems. The method, developed by researchers at Meta and leading universities, targets three problems that have made planning over many steps brittle and unreliable.

Breakthrough Gradient-Based Planner Enables Long-Horizon AI Control with World Models
Source: bair.berkeley.edu

“GRASP introduces a way to optimize trajectories that is both parallel and robust, avoiding the ill-conditioning and bad local minima that plague existing approaches,” said Mike Rabbat, a co-author on the project. “It’s a significant step toward using powerful world models as general-purpose simulators for real-time decision-making.”

The work, led by researchers including Yann LeCun and Amir Bar, reports that lifting trajectories into virtual states, adding stochasticity directly to the state iterates, and reshaping gradients together make long-horizon planning substantially more stable.

Background: The Promise and Pain of World Models

World models are learned models that predict how an environment evolves in response to actions. Over the past few years, they have grown enormously in capacity, predicting high-dimensional video sequences and generalizing across tasks. However, using them for control, especially over long horizons, has remained notoriously fragile.

“Having a powerful predictive model is not the same as being able to use it effectively,” explained Aditi Krishnapriyan, another co-author. “Long-horizon planning creates ill-conditioned optimization landscapes, and the high-dimensional latent spaces introduce subtle failure modes that are hard to diagnose.”

Standard gradient-based planning, which iteratively adjusts a sequence of actions to minimize a cost on the model's predicted future states (for example, distance to a goal), becomes brittle as the number of steps grows. The GRASP team identified three core culprits: poor gradient flow through vision models, non-greedy structure that creates local minima, and a lack of exploration in the planning loop.
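To make the baseline concrete, here is a minimal sketch of standard "shooting" planning on a toy problem. Everything in it is an assumption chosen for illustration: the scalar linear model s_{t+1} = a*s_t + b*u_t stands in for a learned world model, and the function names are hypothetical. It is not code from the GRASP project.

```python
def rollout(s0, actions, a=0.9, b=1.0):
    """Roll a toy scalar world model s_{t+1} = a*s_t + b*u_t forward."""
    states = [s0]
    for u in actions:
        states.append(a * states[-1] + b * u)
    return states


def shooting_gradient(s0, actions, goal, a=0.9, b=1.0):
    """Gradient of 0.5*(s_T - goal)^2 w.r.t. each action, by backprop through time."""
    err = rollout(s0, actions, a, b)[-1] - goal
    T = len(actions)
    # d s_T / d u_t = b * a^(T-1-t): the signal for action t must pass
    # back through T-1-t applications of the model.
    return [err * b * a ** (T - 1 - t) for t in range(T)]


def plan_by_shooting(s0, goal, horizon=20, lr=0.05, iters=500, a=0.9, b=1.0):
    """Plain gradient descent on the action sequence."""
    actions = [0.0] * horizon
    for _ in range(iters):
        grad = shooting_gradient(s0, actions, goal, a, b)
        actions = [u - lr * g for u, g in zip(actions, grad)]
    return actions
```

Even in this benign setting, the gradient reaching the first action is scaled by a^(T-1), roughly 0.135 for a = 0.9 and a horizon of 20. With a nonlinear vision-based model, the same chain of Jacobian products can shrink or blow up far more severely, which is one way to picture the ill-conditioning described above.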

How GRASP Overcomes the Fragility

First, the planner lifts the trajectory into a set of virtual states, allowing optimization to proceed in parallel across all time steps. This parallelism accelerates convergence and reduces the impact of ill-conditioning.
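The idea of lifting can be sketched with a generic collocation-style objective. This is an assumption about the general shape of such methods, not GRASP's actual loss: the intermediate states become free variables, the start and goal are pinned, and the optimizer minimizes the one-step mismatch between each virtual state and the model's prediction from its predecessor. The toy linear model and the name `lifted_plan` are hypothetical.

```python
def lifted_plan(s0, goal, horizon=20, lr=0.2, iters=300, a=0.9, b=1.0):
    """Optimize virtual states and actions jointly on the toy model
    s_{t+1} = a*s_t + b*u_t by minimizing 0.5 * sum_t res_t^2."""
    T = horizon
    # states[0] and states[T] are pinned; states[1..T-1] are free virtual states.
    states = [s0 + (goal - s0) * t / T for t in range(T + 1)]
    actions = [0.0] * T
    for _ in range(iters):
        # One-step residuals. Each depends only on neighbouring variables,
        # so all of them could be evaluated in parallel.
        res = [states[t + 1] - (a * states[t] + b * actions[t]) for t in range(T)]
        grad_u = [-b * res[t] for t in range(T)]
        grad_s = [res[t - 1] - a * res[t] for t in range(1, T)]
        actions = [u - lr * g for u, g in zip(actions, grad_u)]
        for t in range(1, T):
            states[t] -= lr * grad_s[t - 1]
    return states, actions
```

No gradient in this sketch passes through more than one model step, so the size of each update does not depend on a long product of Jacobians. Once the residuals are driven to zero, the virtual states coincide with a real rollout of the returned actions.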

Second, it injects stochasticity directly into the state iterates, mimicking exploration during planning itself. This helps the optimizer escape shallow local minima that would trap deterministic methods.
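A rough sketch of where such noise could enter is shown below. It assumes a lifted trajectory on a toy linear model s_{t+1} = a*s_t + b*u_t, Gaussian noise on the free states only, and a linear annealing schedule; the schedule, the noise scale, and the function names are all illustrative assumptions rather than details from the GRASP work. The toy problem is convex, so the noise is not needed to solve it. The point is only to show the mechanics.

```python
import random


def noisy_lifted_step(states, actions, sigma, rng, lr=0.2, a=0.9, b=1.0):
    """One update on a lifted trajectory, with Gaussian noise added to the
    free (virtual) states only. Actions get a plain gradient step."""
    T = len(actions)
    res = [states[t + 1] - (a * states[t] + b * actions[t]) for t in range(T)]
    new_actions = [actions[t] + lr * b * res[t] for t in range(T)]
    new_states = list(states)
    for t in range(1, T):  # endpoints stay pinned to the start and the goal
        grad = res[t - 1] - a * res[t]
        new_states[t] = states[t] - lr * grad + sigma * rng.gauss(0.0, 1.0)
    return new_states, new_actions


def noisy_lifted_plan(s0, goal, horizon=20, iters=400, sigma0=0.1, seed=0):
    rng = random.Random(seed)
    states = [s0 + (goal - s0) * t / horizon for t in range(horizon + 1)]
    actions = [0.0] * horizon
    for k in range(iters):
        # Assumed schedule: anneal the noise linearly to zero over the first
        # half of the iterations, then finish with noise-free updates.
        sigma = sigma0 * max(0.0, 1.0 - 2.0 * k / iters)
        states, actions = noisy_lifted_step(states, actions, sigma, rng)
    return states, actions
```

Because the perturbation is applied to the state iterates rather than to the actions, the trajectory itself is jostled during the early iterations, and the noise-free tail of the schedule lets the residuals settle afterwards.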

Third, it reshapes gradients so that action updates receive clean, informative signals—bypassing the brittle “state-input” gradients that often result from high-dimensional vision models. “We essentially decouple the gradient flow so that action optimization doesn’t have to fight through the vision encoder,” said lead author Amir Bar in a statement.
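One plausible reading of this decoupling, offered here as an assumption rather than as GRASP's exact rule, is a stop-gradient on the state that is fed into the model. In a lifted trajectory, each virtual state is then pulled toward the model's prediction from the previous step, while actions still receive their own one-step gradient. The sketch below uses a hypothetical toy linear model s_{t+1} = a*s_t + b*u_t, where the dropped term is simply `a * res[t]`; with a vision model it would be the Jacobian of the predictor with respect to its high-dimensional input.

```python
def lifted_gradients(states, actions, a=0.9, b=1.0, stop_state_input=True):
    """Gradients of 0.5 * sum_t res_t^2 on a lifted trajectory, optionally
    dropping the part that flows through the model's state input."""
    T = len(actions)
    res = [states[t + 1] - (a * states[t] + b * actions[t]) for t in range(T)]
    grad_u = [-b * res[t] for t in range(T)]  # action gradients are kept as-is
    grad_s = []
    for t in range(1, T):
        g = res[t - 1]  # pull s_t toward the model's prediction from step t-1
        if not stop_state_input:
            g -= a * res[t]  # term that passes through the model's state input
        grad_s.append(g)
    return grad_s, grad_u


def plan_with_reshaped_gradients(s0, goal, horizon=20, lr=0.2, iters=300):
    states = [s0 + (goal - s0) * t / horizon for t in range(horizon + 1)]
    actions = [0.0] * horizon
    for _ in range(iters):
        grad_s, grad_u = lifted_gradients(states, actions)
        actions = [u - lr * g for u, g in zip(actions, grad_u)]
        for t in range(1, horizon):
            states[t] -= lr * grad_s[t - 1]
    return states, actions
```

On this toy problem the reshaped update still drives the one-step residuals to zero, so the returned actions reach the goal when rolled out. Whether the same holds for a large learned model depends on details this sketch does not capture.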

The combined effect: GRASP can plan effectively over dozens or even hundreds of time steps in complex visual environments—a regime where previous planners consistently fail.

‘A Major Step for Model-Based Reinforcement Learning’

“This work directly addresses a fundamental open problem in model-based reinforcement learning: how to reliably plan over long horizons with learned dynamics,” said Yann LeCun, Chief AI Scientist at Meta. “GRASP provides a principled solution that is both elegant and practical.”

Other AI researchers not involved in the project have called the results “impressive” and “timely.” The approach is reported to work with a variety of world model architectures, including those using latent state spaces.

What This Means for AI and Embodied Systems

The immediate impact is on robotics, autonomous driving, and any system that must plan sequences of actions based on visual input. With GRASP, a robot could use a learned world model to optimize a long sequence of actions, updating every time step in parallel, and steer toward a long-term goal.

In the longer term, the ability to plan reliably with large world models opens the door to using them as general-purpose simulators for adaptation and learning. “This is a key enabler for agents that can reason about the future and make decisions that are robust over time,” added Mike Rabbat.

The research was performed in collaboration with Mike Rabbat, Aditi Krishnapriyan, Yann LeCun, and Amir Bar, with equal advisorship. A preprint is expected to be released shortly.

Remaining Challenges and Future Work

While GRASP dramatically improves robustness, the team acknowledges that scaling to very high-dimensional policies or real-time constraints remains future work. The method currently assumes a known reward function; extending it to learned rewards is an open direction.

Another area for future research is combining GRASP with model uncertainty, to handle cases where the world model itself is inaccurate. Nonetheless, the core contribution—making gradient-based planning tractable for long horizons—is a major milestone.

“We’re excited to see how the community builds on this,” said LeCun.
