The agent loop

Every modern agent runs the same three-step loop: think, act, observe. Once you can see it, agents stop being magic.

30 minutes · One hands-on exercise · Claude or ChatGPT required

By the end of this lesson, you will:

Understand the think-act-observe loop and why it works.
Be able to read an agent transcript and identify what is happening at each step.
Have run your own mini-agent in the browser and watched the loop in action.

The loop, in three words

Every agent — from the simplest research bot to the most sophisticated coding assistant — runs the same loop. Three steps, repeated until the goal is reached:

Think. The language model looks at the goal, looks at what has happened so far, and decides what to do next.
Act. The model produces an action — a search query, a function call, a piece of code, a message to send.
Observe. Something in the outside world performs that action and returns a result. The result is fed back to the model.

And then the loop starts again. Think, act, observe. Think, act, observe. Until the model decides it has reached the goal — or runs out of steps.
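To make the three steps concrete, here is a minimal sketch of the loop in Python. Everything in it is a stand-in I have made up for illustration: scripted_model is a hard-coded function playing the part of the language model, and the search tool returns a canned string rather than searching anything. A real agent would call an actual model and actual tools at those points.

```python
from typing import Callable, Dict, List, Optional, Tuple


def scripted_model(history: List[str]) -> Tuple[str, str, str]:
    """Hypothetical stand-in for a language model (not a real API).

    Reads the history and returns (thought, action name, action argument).
    """
    observations = [line for line in history if line.startswith("Observe: ")]
    if not observations:
        return ("I need a fact first.", "search", "height of Mount Everest")
    fact = observations[-1][len("Observe: "):]
    return ("I have what I need.", "answer", fact)


# Hypothetical tool table: action name -> function that performs the action.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: "about 8,849 metres",  # canned result, no real search
}


def run_loop(goal: str, model, tools, max_steps: int = 5) -> Optional[str]:
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        thought, name, argument = model(history)          # think
        history.append(f"Think: {thought}")
        history.append(f"Act: {name}({argument!r})")      # act
        if name == "answer":
            return argument                               # goal reached
        history.append(f"Observe: {tools[name](argument)}")  # observe
    return None  # ran out of steps without an answer


result = run_loop("How tall is Mount Everest?", scripted_model, TOOLS)
print(result)
```

Notice that the model function never touches the tools directly. It only returns a description of the action, and the loop around it does the executing.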

This pattern has a name from a 2022 paper by Yao and colleagues: ReAct, short for Reason + Act. The same loop underlies most agent frameworks in use today, including LangChain and LangGraph, and it is the pattern that Anthropic's tool use and OpenAI's function calling are built to support. Once you can see the loop, you can read any agent transcript and follow what is happening.

A worked example

Let me show you what the loop looks like in practice. Imagine an agent has been given this goal:

Goal: Find the population of New Zealand and tell me what fraction of the UK's population that represents.

Here is one run of the loop. I have annotated each step so you can see what is think, what is act, and what is observe.

Step 1 · Think: I need two numbers: the population of New Zealand and the population of the UK. Let me search for them.
Step 1 · Act: search("population of New Zealand 2026")
Step 1 · Observe: "As of 2026, New Zealand has an estimated population of 5.27 million."

Step 2 · Think: Good. Now the UK.
Step 2 · Act: search("population of United Kingdom 2026")
Step 2 · Observe: "The UK population in 2026 is estimated at 68.4 million."

Step 3 · Think: I have both numbers. Now I need to calculate 5.27 ÷ 68.4 and express it as a fraction or percentage.
Step 3 · Act: calculator("5.27 / 68.4")
Step 3 · Observe: 0.0770...

Step 4 · Think: That is roughly 7.7%. I can express this as "about one thirteenth" or "roughly 7.7%". I will give both. The goal is reached; I can answer.
Step 4 · Act: answer("New Zealand's population is about 5.27 million; the UK's is about 68.4 million. So New Zealand is roughly 7.7% of the UK's population, or about one thirteenth.")

That is the entire loop. Four think steps, four actions (the last being the answer itself), three observations, and the agent has answered. The agent decided which actions to take. Each action moved the work forward. The final answer drew on everything the agent had observed.
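If you want to check the arithmetic in the transcript yourself, a few lines of Python will do it. The two population figures are the ones from the example run above, not independently sourced data.

```python
# Figures taken from the example transcript, in millions.
nz_millions = 5.27
uk_millions = 68.4

ratio = nz_millions / uk_millions
print(f"{ratio:.4f}")        # the calculator observation, to four places
print(f"{ratio:.1%}")        # the same number as a percentage
print(round(1 / ratio))      # the "one in N" way of saying it
```

The last line is where "about one thirteenth" comes from: the UK's population is a little under thirteen times New Zealand's.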

What the model actually does at each step

The "think" step is the most important and the least visible. What the model is actually doing is reading the entire conversation history — the goal, every previous action, every previous observation — and deciding, out of all the things it could do next, which one is most likely to move it closer to the goal.

The model does this in natural language. It reasons in words: "I have the New Zealand number; now I need the UK number." That reasoning step is what makes the agent able to recover when things go wrong. If the first search had returned no results, the model would have read that, thought "I need to search differently," and rephrased the query.
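Here is a rough sketch of that recovery as control flow. A real model reasons its way to a rephrased query in natural language; the think function below is a hard-coded rule I have invented to stand in for that reasoning, and fake_search is a pretend tool that fails whenever the query contains a year.

```python
from typing import List, Tuple


def think(first_query: str, history: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Hypothetical stand-in for the think step: returns (action name, argument)."""
    if not history:
        return ("search", first_query)
    last_query, last_observation = history[-1]
    if last_observation == "no results found":
        # "I need to search differently": drop the year and try again.
        rephrased = " ".join(w for w in last_query.split() if not w.isdigit())
        return ("search", rephrased)
    return ("answer", last_observation)


def fake_search(query: str) -> str:
    """Pretend search tool: fails on queries that mention 2026."""
    return "no results found" if "2026" in query else "5.27 million"


query = "population of New Zealand 2026"
history: List[Tuple[str, str]] = []   # (query, observation) pairs so far
final = None
for _ in range(5):                    # small step budget so the demo always stops
    name, argument = think(query, history)
    if name == "answer":
        final = argument
        break
    history.append((argument, fake_search(argument)))

print(history)
print(final)
```

The point is only that the failed observation is part of the history the next think step reads, so the next decision can react to it.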

The "act" step is where the model commits to a specific action. In modern frameworks, this is usually a structured output — a function name with arguments — but at the model's level it is just text. The framework then parses that text and actually executes the action.
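As a sketch of what "parses that text" can mean, here is a tiny parser for the name("argument") format used in the transcript above. This format and the parse_action helper are assumptions for this lesson only; production frameworks generally use structured formats such as JSON rather than a regular expression like this one.

```python
import re
from typing import Tuple

# Matches text shaped like: name("argument")
ACTION_PATTERN = re.compile(r'^\s*(\w+)\(\s*"(.*)"\s*\)\s*$', re.DOTALL)


def parse_action(text: str) -> Tuple[str, str]:
    """Split action text into (function name, argument), or raise ValueError."""
    match = ACTION_PATTERN.match(text)
    if match is None:
        raise ValueError(f"cannot parse action: {text!r}")
    return match.group(1), match.group(2)


print(parse_action('search("population of New Zealand 2026")'))
print(parse_action('calculator("5.27 / 68.4")'))
```

If the model writes something that does not fit the pattern, parse_action raises an error, and it is then up to the framework to decide whether to retry or give up.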

The "observe" step is the agent's window onto the world. Whatever the action did, the result of doing it comes back as text and is added to the conversation history. The model now has more context, and the loop starts again.
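One small detail hides in "comes back as text": tools often return numbers or structured data, and something has to turn that into text before it joins the history. A minimal sketch, assuming a hypothetical to_observation helper and made-up tool results:

```python
import json


def to_observation(result: object) -> str:
    """Render any tool result as text (hypothetical helper for illustration)."""
    if isinstance(result, str):
        return result
    return json.dumps(result, ensure_ascii=False)


history = ["Goal: compare two populations"]
history.append("Observe: " + to_observation("5.27 million"))
history.append("Observe: " + to_observation(0.077))
history.append("Observe: " + to_observation({"country": "UK", "millions": 68.4}))

for line in history:
    print(line)
```

Each pass through the loop makes the history longer, which is why the model has more to work with at every think step.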

Aside · Why this is a loop and not a tree

Many agents in the literature look like trees — the model considers several possible actions, picks one, branches, and so on. ReAct collapses this into a loop: at each step the model decides one action, executes it, and moves on. The reason is practical. Loops are easier to reason about, easier to debug, and easier to deploy. Tree search is sometimes used when the cost of a wrong action is high, but for most production agents, the loop is enough.

When the loop ends

An agent loop terminates when one of three things happens.

The agent decides it is done. The model produces an "answer" or "complete" action that signals the goal has been reached. This is the normal case.
The agent hits a step limit. The system has a budget — say, fifteen steps maximum — and stops the agent when the budget runs out. This is a safety mechanism, because models sometimes get stuck in loops and would run forever.
The agent encounters an error it cannot recover from. A required tool fails, or the model produces output the framework cannot parse. The framework gives up and returns what it has.

A well-designed agent ends the first way, by deciding it is done, the vast majority of the time. If you find it hitting the step limit often, either the agent is under-specified (it does not know when the goal has been reached) or the goal is too big for the budget.
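The three endings can be sketched as three exits from the same function. The names here (StopReason, run, and the three toy model functions) are invented for illustration, and the budget of fifteen steps simply echoes the example figure above.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class StopReason(Enum):
    ANSWERED = "answered"
    STEP_LIMIT = "step limit"
    ERROR = "error"


def run(
    model_step: Callable[[List[str]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 15,
) -> Tuple[StopReason, Optional[str]]:
    observations: List[str] = []
    for _ in range(max_steps):
        try:
            name, argument = model_step(observations)
            if name == "answer":
                return StopReason.ANSWERED, argument      # the agent decides it is done
            observations.append(tools[name](argument))
        except Exception as error:                        # unknown tool, bad output, tool failure
            return StopReason.ERROR, repr(error)
    return StopReason.STEP_LIMIT, None                    # budget ran out


tools = {"search": lambda query: "no results found"}

done = run(lambda obs: ("answer", "42"), tools)           # answers straight away
stuck = run(lambda obs: ("search", "same query"), tools)  # repeats itself forever
broken = run(lambda obs: ("email", "hello"), tools)       # asks for a tool that does not exist

print(done[0], stuck[0], broken[0])
```

Returning the reason alongside the result is a design choice worth copying: it lets you count how often each ending occurs, which is exactly the signal described above.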

Hands-on time

Exercise 2.1 · 20 minutes

Run an agent loop in Claude or ChatGPT

You will play the role of the framework. The model will play the role of the model, obviously. You will manually run the think-act-observe loop, step by step.

Open Claude.ai or ChatGPT in a new conversation.
Paste this exact starting prompt into the chat:
You are an agent. You will reach a goal by taking actions in a loop. At each step, you will:
1. Think: explain in one or two sentences what you need to do next.
2. Act: produce exactly one action from this list:
   - search("query"): to look something up
   - calculator("expression"): to compute a number
   - answer("text"): when you have reached the goal
3. Then stop and wait for me to give you the result.
After I give you the result of your action, you continue with the next think-act step.
Your goal: Find out the height of Mount Everest in metres, and tell me how many double-decker buses (each 4.4 metres tall) stacked on top of each other would equal that height.
Begin with your first think and act step.
The model will produce its first think and act step. Read it carefully. Is the think step coherent? Is the action correct?
You now play the observe step. Make up a realistic-looking result for the action it took. For a search action, give it a reasonable number. For calculator, just compute it for real.
Paste your observation back as a reply, prefixed with Observation: and the result.
The model will then produce its next think and act step. Continue the loop. Keep going until the model produces an answer(...) action.
When you are done, scroll back and identify each think, act, and observe step in the conversation.

Tools required: Claude.ai or ChatGPT (free tier is fine).

What you should have seen

If you ran the exercise, you should have seen the model run a clean loop. Something like: search for Everest's height (about 8,849 metres), then a calculator action dividing by 4.4, then an answer of about 2,011 buses. Three or four steps in total.
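You can check the expected answer with a quick calculation. The height is the approximate figure quoted above and the bus height comes from the exercise prompt; your own run may differ slightly if you made up a different height in the observe step.

```python
import math

everest_m = 8849   # approximate height from the paragraph above
bus_m = 4.4        # bus height given in the exercise prompt

buses = everest_m / bus_m
whole_buses = math.floor(buses)
leftover_m = everest_m - whole_buses * bus_m

print(round(buses))                  # nearest whole number of buses
print(whole_buses)                   # complete buses that fit under the summit
print(f"{leftover_m:.1f} metres")    # height left over above the last bus
```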

What is more interesting is what happens when you feed the model a slightly off observation. If you have not tried it yet, rerun the exercise and reply "Observation: no results found" to the search. Does the model rephrase and try again? It almost certainly will. That recovery (reading the observation, realising it was useless, and adjusting) is the loop doing its real work.

Why the loop is a big deal

Before the loop, what we had was an LLM that you could chat with. It could answer questions, but if a question required new information or multiple steps, you had to coordinate it yourself. You searched. You pasted the result back. You asked it to summarise. You then asked the next thing.

With the loop, the model coordinates itself. You give a goal. It decides what to do, does it, learns from the result, decides what to do next. The work that you used to do as a human in the middle is now done by the model. That is the unlock.

Of course, this only works when the model is good enough at deciding. We will see in Lesson 5 that the loop also makes failures more interesting — there are many ways an unsupervised loop can go wrong. But the foundation is here.

Self-check

What are the three steps of the agent loop?
What does the "think" step actually do, mechanically?
What are the three ways an agent loop can terminate, and which is the desirable one?
In your own words, why is the loop a bigger deal than just "an LLM that takes one turn"?

Looking ahead

In Lesson 3 we will look at what the "act" step actually does. We will give your agent real tools — a calculator, a search, a pretend email-sender — and watch it learn to use them. This is where agents start to feel powerful.
