Giving an agent tools

Tools are how an agent acts on the world. Without them, the model just thinks. With them, it can do.

30 minutesOne hands-on exerciseClaude or ChatGPT required

By the end of this lesson, you will:

Know what a "tool" means in the context of an agent.
Understand how function-calling works at a conceptual level.
Have given your own agent two tools and watched it use them.

What is a tool?

A tool is a function the model is allowed to call. The model cannot call it directly — instead, it produces output that says "I would like to call this tool with these arguments", and the framework around the model actually calls it and returns the result.

A tool can be almost anything. The most common are:

Search. Take a query, return a list of results.
Calculator. Take an expression, return a number.
Retrieval. Take a query, look it up in a knowledge base, return matching documents.
Code execution. Take a snippet of code, run it, return the output.
API call. Send a request to some external service — Stripe, Salesforce, your own database — and return the response.
Email or message. Compose and send a message to a person.

From the model's perspective, all tools look the same: a name, a description of what the tool does, and a list of arguments it accepts. The model decides which tool to call, with which arguments, based on the goal and the conversation so far.

Why give the model tools at all?

Language models are very good at reasoning, writing, and summarising. They are not good at things that require fresh information, exact computation, or real-world side effects.

Fresh information is the obvious one. The model's training data has a cutoff. It does not know who won yesterday's match or what your bank balance is. Search and retrieval tools fix that.

Exact computation matters because models are approximating language patterns, not running arithmetic. Ask a model to multiply two seven-digit numbers and you will often get a wrong answer with high confidence. Give it a calculator tool and it will produce the right answer every time.

Real-world side effects — sending an email, booking a flight, updating a database — are things the model cannot do at all without tools. The model is a function from text to text. To affect anything outside that, it needs a tool that does the affecting.

How function-calling works, at a glance

Every major LLM provider — Anthropic, OpenAI, Google, Meta — supports function-calling in some form. The mechanics are nearly identical.

The developer tells the model, in advance, which tools are available. This is done in a structured way: for each tool, the developer gives the model a name, a one-line description, and a list of arguments with their types and what they mean. This is called the tool schema.

{ "name": "search", "description": "Search the web for current information. Returns up to 5 results.", "parameters": { "query": "the search query (string)" } }

When the model decides to use a tool, instead of producing prose, it produces a structured output naming the tool and the arguments. Something like:

{ "tool": "search", "arguments": { "query": "population of New Zealand 2026" } }

The framework intercepts this, runs the actual search function with that query, gets a result, and feeds the result back to the model as an observation. The model continues the loop with the result available.

The model never executes anything on its own. It only ever produces text or structured output. The framework — which is just normal software, written by a developer — does the actual execution. This separation is what makes agents controllable: the developer decides which tools exist, and the framework decides whether to allow each call.

Aside · Tools versus skills versus plugins

Different ecosystems use different words for the same thing. OpenAI calls them functions or tools. Anthropic calls them tools. LangChain calls them tools or sometimes skills. ChatGPT plugins were an early version of the same idea. Underneath, they are all the same pattern: a function the model is allowed to invoke through structured output.

Choosing tools well

This is where engineering judgement starts to matter. The temptation is to give the model every tool you can think of. The right approach is to give it the smallest set of tools that lets it accomplish its goals.

Two reasons. First, every additional tool is more for the model to consider at each step. With three tools, the model has a clear choice. With thirty, it gets confused, picks wrong, or wastes steps. Second, every tool is a security surface. If an agent has a tool that can send money, the worst case is much worse than if it only has a tool that can search the web.

A reasonable starting set for a research agent is: search, retrieval, and answer. For a coding agent: file-read, file-write, run-tests. For a customer support agent: look-up-customer, look-up-order, send-message, escalate-to-human.

The pattern is to start small, watch the agent work, and add tools only when you see it fail because a needed capability is missing.

The hands-on bit

Exercise 3.1 · 20 minutes

Give your agent two tools

You will repeat last lesson's exercise, but this time you will give the model two tools, watch it pick between them, and see it produce a goal-specific result.

Open a fresh Claude.ai or ChatGPT conversation.
Paste this prompt:
You are an agent with access to two tools. You will reach a goal by taking actions in a loop. Available tools: - calculator(expression) — computes any mathematical expression. Returns a number. - weather(city, date) — returns the forecast for a city on a date. Returns a short string like "sunny, 22°C, light winds". At each step you will: 1. Think (one or two sentences): what do I need to do next? 2. Act: produce exactly one of: calculator(...), weather(...), or answer(...). 3. Stop and wait for me. I will give you observations. You continue with the next think-act step. Goal: I am hosting a small outdoor dinner in Stockholm on Saturday for 8 people. I need to know the weather (so I can decide on tarpaulin and heaters), and I need to know how much firewood to buy — assume each guest needs 1.5 kg, and add 20% for spares. Begin.
The model will produce a first think-act. It might call weather("Stockholm", "Saturday") first, or it might call calculator first. Either is fine.
You play the observe step. For weather, just make up something plausible like "Cloudy, 12°C, light rain, 5–8 m/s wind". For calculator, just compute the answer.
Continue until the model produces an answer(...). The answer should give both the weather (so you can plan tarpaulin and heaters) and the total firewood needed (which should be 8 × 1.5 × 1.2 = 14.4 kg).
Now try the same prompt again, but this time give the model only the calculator tool. Remove weather from the list. See how it behaves. Does it admit it cannot answer the weather part? Does it try to make up an answer? What does it do well, and what does it do badly?

Tools required: Claude.ai or ChatGPT (free tier is fine).

What you should notice

Two things worth watching for.

First, the model chose tools sensibly. It used weather for weather questions and calculator for arithmetic. This is not magic — the model is using the descriptions you gave it to decide. If your descriptions are unclear, the model will pick the wrong tool, or pick the right tool but with wrong arguments. A well-written tool schema is half of agent engineering.

Second, when you removed the weather tool, the model usually does one of three things: it admits it cannot answer the weather part of the question, it tries to make up plausible weather, or it asks you to provide the weather. The first behaviour is what you want from a well-behaved agent. The second is dangerous — the agent inventing facts is a failure mode we will look at in detail in Lesson 5.

What tools cannot fix

Tools are powerful but they are not a substitute for a good agent design. A few things they will not solve.

Bad reasoning. If the model picks the wrong tool, or picks the right tool but at the wrong time, more tools will not help. The fix is a clearer goal or a better model.

Bad arguments. If the model calls weather("Stockholm", "next Saturday") but your weather tool expects an ISO date, it will fail. Either fix the tool to be more lenient, or train the model with examples of the correct format.

Hallucinated tool calls. Sometimes the model invents a tool that does not exist, or uses a real tool with non-existent arguments. The framework should reject these and tell the model what went wrong, so the model can correct itself.

Privilege escalation. A tool that can send email can be used to send spam. A tool that can run code can be used to delete files. The tools you give the agent define what damage it can do. Choose carefully — and we will come back to this in Lesson 5.

Self-check

What is a tool, in agent terms?
Why give an LLM a calculator when it can do arithmetic anyway?
How does function-calling work in two sentences?
Why is "give the agent every tool you can think of" a bad idea?

Looking ahead

So far, our agents have all been reactive: they take one action at a time, observe the result, and decide the next action. In Lesson 4 we will introduce planning — getting the agent to break a big goal into a sequence of sub-goals before it starts. This is what makes agents useful for tasks that take ten steps instead of three.

← Lesson 2 Lesson 4 — Multi-step planning →