Multi-step planning

Reactive agents take it one step at a time. Planning agents look ahead. Both are useful — knowing when to use which is the skill.

40 minutes · One hands-on exercise · Claude or ChatGPT required

By the end of this lesson, you will:

Understand the difference between reactive and planning agents.
Know when planning helps and when it gets in the way.
Have asked an agent to plan a small project, then watched it execute the plan.

The two ways an agent can work

Through Lessons 2 and 3, every agent we have seen worked one step at a time. The model thought, acted, observed, then thought again. It never looked beyond the next action.

This is called a reactive agent. It is simple, easy to reason about, and works well for short tasks. But it has a weakness: at any moment, the model only sees the immediate next step. If the task needs ten steps, the model might wander, repeat itself, or miss something obvious.

A planning agent works differently. Before it starts acting, it writes a plan — a list of sub-goals or actions to work through. Then it executes the plan, in order, observing results and updating the plan as it goes.

Planning agents trade simplicity for foresight. They are more reliable on long tasks. They are also harder to debug, slower to start, and more likely to over-plan trivial work.
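The contrast between the two loops can be sketched in a few lines of Python. This is a minimal illustration, not a real agent: next_action, make_plan, and act are hypothetical stand-ins for calls to a language model and a tool, and the toy versions at the bottom exist only so the sketch runs. Plan revision is left out here to keep the contrast visible.

```python
from typing import Callable, List


def reactive_agent(next_action: Callable[[List[str]], str],
                   act: Callable[[str], str],
                   max_steps: int = 10) -> List[str]:
    """Think, act, observe, repeat. Only the next step is ever decided."""
    history: List[str] = []
    for _ in range(max_steps):
        action = next_action(history)  # hypothetical model call
        if action == "DONE":
            break
        history.append(f"{action} -> {act(action)}")
    return history


def planning_agent(make_plan: Callable[[str], List[str]],
                   act: Callable[[str], str],
                   goal: str) -> List[str]:
    """Write the whole plan first, then work through it in order."""
    plan = make_plan(goal)  # hypothetical model call
    history: List[str] = []
    for step in plan:
        history.append(f"{step} -> {act(step)}")
    return history


# Toy stand-ins (assumptions, for illustration only).
def toy_next_action(history: List[str]) -> str:
    return "search" if len(history) < 2 else "DONE"


def toy_plan(goal: str) -> List[str]:
    return ["search background", "search concerns", "write brief"]


def toy_act(action: str) -> str:
    return f"result of {action}"


reactive_trace = reactive_agent(toy_next_action, toy_act)
planned_trace = planning_agent(toy_plan, toy_act, "carbon offsets brief")
```

The reactive loop asks for one action per iteration and stops when the model says it is done. The planning loop commits to a list of steps before the first tool call.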

A worked example

Imagine the goal is: "Prepare a one-page brief for tomorrow's meeting on whether we should invest in carbon offsets."

A reactive agent might begin by searching for "carbon offset investment". It would read a few results, summarise, and then — depending on what it read — either produce a one-page brief or ask follow-up questions. The result might be good, or it might be unfocused.

A planning agent, given the same goal, would first produce a plan. Something like:

Plan:
1. Search for "what are carbon offsets and how do they work", to ground the brief in facts.
2. Search for "carbon offset markets 2026 quality and integrity", to find current concerns.
3. Search for "corporate carbon offset strategies case studies", to find concrete examples.
4. Synthesise the findings into three sections: what offsets are, what they cost, the case for and against.
5. Write a one-page brief.
6. Review for length and clarity.

Then it would execute step 1, observe what it found, and move to step 2. After each step, the model can update the plan — add a step, remove one, or rephrase the next step based on what it just learned.
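One way to picture "updating the plan" is as a small data structure with three operations: add, remove, and rephrase. The sketch below is an assumption about how you might represent it, not how any particular framework does. The Plan class and its method names are hypothetical, and the specific revisions are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Plan:
    steps: List[str]                               # remaining steps, in order
    done: List[str] = field(default_factory=list)  # steps already executed

    def next_step(self) -> str:
        return self.steps[0]

    def complete_next(self) -> None:
        """Move the first remaining step to the done list."""
        self.done.append(self.steps.pop(0))

    # Indices below refer to positions in the remaining steps.
    def add_step(self, index: int, text: str) -> None:
        self.steps.insert(index, text)

    def remove_step(self, index: int) -> None:
        del self.steps[index]

    def rephrase_step(self, index: int, text: str) -> None:
        self.steps[index] = text


plan = Plan(steps=[
    'Search for "what are carbon offsets and how do they work"',
    'Search for "carbon offset markets 2026 quality and integrity"',
    'Search for "corporate carbon offset strategies case studies"',
    "Synthesise the findings into three sections",
    "Write a one-page brief",
    "Review for length and clarity",
])

plan.complete_next()  # step 1 has been executed
# The revisions below are illustrative, not taken from a real run.
plan.rephrase_step(0, 'Search for "carbon offset integrity standards"')
plan.remove_step(1)   # case studies judged unnecessary
plan.add_step(1, "Check the cost figures against a second source")
```

Keeping executed steps in a separate list makes it easy to show the model both what it has done and what remains when you ask it to re-check the plan.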

The planning agent's brief will usually be more focused and better structured. The reactive agent's will sometimes be excellent, sometimes scattered.

When planning helps and when it hurts

Planning helps when the task is genuinely multi-step, when the steps depend on each other, when getting the structure right matters more than getting started fast, and when the cost of going down the wrong path is high.

Planning hurts when the task is short and obvious, when the task is exploratory and you do not know the right shape in advance, when conditions change rapidly and any plan becomes stale quickly, and when over-planning becomes a way for the agent to avoid actually doing the work.

The honest answer for most production agents is a hybrid. Start with a light plan, execute the first few steps, then re-plan based on what you found. Agent frameworks such as LangGraph let you make this explicit: you define a planning node and an execution node, and the system loops between them.
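The plan-execute-replan loop can be sketched as a tiny two-node state machine. This is plain Python written for illustration and does not use the LangGraph API. AgentState, run_hybrid, toy_planner, and toy_tool are all hypothetical names, and the planner here is a stub where a real system would call a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentState:
    goal: str
    plan: List[str] = field(default_factory=list)
    observations: List[str] = field(default_factory=list)


Planner = Callable[[str, List[str]], List[str]]  # (goal, observations) -> remaining steps
Tool = Callable[[str], str]                      # step -> observation


def run_hybrid(goal: str, planner: Planner, tool: Tool,
               max_rounds: int = 10) -> AgentState:
    """Alternate between a planning node and an execution node."""
    state = AgentState(goal=goal)
    node = "plan"
    # Each round is one visit to "plan" plus one visit to "execute".
    for _ in range(max_rounds * 2):
        if node == "plan":
            # Re-plan from scratch using everything observed so far.
            state.plan = planner(state.goal, state.observations)
            node = "execute" if state.plan else "end"
        elif node == "execute":
            step = state.plan.pop(0)  # run only the first step
            state.observations.append(tool(step))
            node = "plan"             # always return to planning
        else:
            break
    return state


# Toy stand-ins (assumptions, for illustration only).
def toy_planner(goal: str, observations: List[str]) -> List[str]:
    full_plan = ["search basics", "search risks", "write brief"]
    return full_plan[len(observations):]


def toy_tool(step: str) -> str:
    return f"did: {step}"


final = run_hybrid("offsets brief", toy_planner, toy_tool)
```

The design choice worth noticing is that the executor runs a single step and then hands control back to the planner, so a stale plan never survives more than one action. The max_rounds cap is a safety net against a planner that never returns an empty plan.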

Aside · Tree of Thoughts and other research

There is a richer body of research on agent planning than this lesson covers. "Tree of Thoughts" (Yao et al., 2023) has the model explore several reasoning branches, evaluate them, and continue from the most promising. "Reflexion" (Shinn et al., 2023) has the agent write a critique of each attempt and use that critique to do better on the next one. "ReWOO" (Xu et al., 2023) separates planning from execution, so the plan is written before any tool results come back. These approaches are useful when you need higher reliability and can afford the extra compute. For the work we are doing in this course, simple sequential planning is enough.

Common planning failure modes

Planning agents fail in characteristic ways. A few worth knowing about, because they are surprisingly common.

Over-planning. The agent writes an enormous plan with twenty steps for a task that needed three. The cure is to tell the agent explicitly: "Write a plan with no more than five steps. Bias toward fewer steps."

Frozen plan. The agent writes a plan, starts executing, finds early that the plan is wrong — and continues anyway, ignoring what it learned. The cure is to explicitly tell the agent to re-evaluate the plan after each step, and to update it if needed.

Plan-but-do-not-execute. The agent produces an elaborate plan and then, instead of executing it, gets stuck refining the plan or just summarises the plan as if it were the answer. The cure is to prompt the agent to "now execute step 1" explicitly, and to flag this as a failure mode the agent should avoid.

Step explosion. The agent decomposes a sub-step into more sub-steps, which become more sub-steps, until the plan is fractal. The cure is to set a hard depth limit and a hard step count limit.
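Prompt instructions help, but the step-count and depth limits can also be enforced in code before a plan is accepted. The sketch below assumes the plan has already been parsed into a nested structure. Step, count_steps, depth, and check_plan are hypothetical names, and the default limits (five steps, two levels) are illustrative choices rather than recommended values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    title: str
    substeps: List["Step"] = field(default_factory=list)


def count_steps(steps: List[Step]) -> int:
    """Total number of steps, including every nested sub-step."""
    return sum(1 + count_steps(s.substeps) for s in steps)


def depth(steps: List[Step]) -> int:
    """Nesting depth: 0 for an empty plan, 1 for a flat list, and so on."""
    if not steps:
        return 0
    return 1 + max(depth(s.substeps) for s in steps)


def check_plan(steps: List[Step], max_steps: int = 5,
               max_depth: int = 2) -> List[str]:
    """Return a list of problems; an empty list means the plan is within limits."""
    problems: List[str] = []
    n = count_steps(steps)
    if n > max_steps:
        problems.append(f"too many steps: {n} > {max_steps}")
    d = depth(steps)
    if d > max_depth:
        problems.append(f"too deep: {d} > {max_depth}")
    return problems


flat_plan = [Step("search"), Step("write")]
fractal_plan = [Step("research", [Step("search", [Step("pick query")])])]

flat_problems = check_plan(flat_plan)
fractal_problems = check_plan(fractal_plan)
```

When check_plan returns problems, one option is to send them back to the model as feedback and ask for a shorter, flatter plan.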

Hands-on time

Exercise 4.1 · 25 minutes

Plan-then-execute a small project

You will compare a reactive run and a planning run on the same task. Pay attention to which one produces a better result.

Open a fresh Claude.ai or ChatGPT conversation.
Reactive first. Paste this prompt:
You are an agent. Your goal: produce a one-page briefing note for the board on whether our startup should adopt OpenTelemetry as our observability standard. You have one tool: search(query). Use it freely. Take one action at a time. For each step: think (one or two sentences), act, then wait for me to give the observation. Begin.
Walk it through, just as in Lessons 2 and 3. For each search, give it a plausible-looking result. After about 4–5 search results, ask it for the final brief.
Save the brief somewhere (notes, scratchpad, anywhere).
Planning version. Start a new conversation. Paste this prompt:
You are a planning agent. Your goal: produce a one-page briefing note for the board on whether our startup should adopt OpenTelemetry as our observability standard. Available tools: search(query), write(text). Before you take any action, produce a plan with no more than 5 steps. Then execute the plan, one step at a time. After each step, briefly check whether the plan still makes sense. Update the plan if needed. Continue until the brief is produced. Begin with the plan.
Walk it through the same way. Note where the model updates the plan, where it sticks to it, and where it goes off-script.
Compare the two briefs. Which is better? Which is more focused? Did the planning version miss anything that the reactive version caught? Did the reactive version wander where the planning version stayed on track?

Tools required: Claude.ai or ChatGPT (free tier is fine).

What you should notice

Most of the time, the planning version will produce a more structured brief. Three clear sections, balanced arguments, a recommendation. The reactive version will sometimes be excellent — particularly if the searches happened to land on good material — but it will also sometimes drift.

The interesting moment is when the planning agent updates its plan. If you watched closely, the model probably said something like "I had planned to search for cost data next, but the previous search already gave me costs — I will skip that step and go directly to writing." That dynamic re-planning is what makes a planning agent more than a glorified to-do list.

You may also have seen the failure modes from the section above. If the model wrote an over-elaborate plan, or refused to leave the plan once it started executing, or just produced the plan as if it were the answer, you have seen the real challenges of planning agents in practice.

How real production agents balance the two

Many of the more reliable agents today use a hybrid pattern. The model is asked to write a high-level plan of three to five steps, then it executes the first step reactively. After each step, the agent re-checks whether the plan still applies. If not, it revises. Variations of this pattern show up in agents built on Claude, in agents built on OpenAI's o-series models with tool use, and in LangGraph-based production systems.

The reason it works is that humans do the same thing. You plan a day, but you do not pre-plan every micro-decision. You revise as you go. An agent that plans-then-revises is, in a small way, mirroring how humans actually work.

Self-check

What is the difference between a reactive and a planning agent?
Give one example where planning helps, and one where it hurts.
Name two common failure modes of planning agents and how you would mitigate them.
What does the hybrid plan-then-revise pattern look like, in your own words?

Looking ahead

We have one lesson left. In Lesson 5 we look at where agents go wrong in the real world — and the patterns that keep them out of trouble. By the end of it, you will have written a one-paragraph governance plan for the agent you have been building.
