Lesson 2 of 5 · AI for Sport Analysts
Lesson 2
Building your sport analytics assistant
A system prompt is the constitution of your bot. We are going to write one — careful about sport terminology, careful about athlete data, and disciplined about the difference between description and prediction.
By the end of this lesson, you will:
- Have written a complete system prompt for a sport analytics assistant tailored to your sport.
- Understand why each block of the prompt is there — terminology, athlete privacy, no-prediction discipline, output style.
- Have tested the bot on three small, deliberately tricky prompts and seen it behave as you intended.
What a system prompt actually does
A system prompt is the text the model sees before any of your conversation begins. It tells the model who it is, what it is for, what to do, and what not to do. Unlike a one-off chat prompt, the system prompt persists for the entire conversation. Everything you write afterwards is interpreted in its light.
For a general assistant, the default system prompt the platform supplies is fine. For a sport analytics assistant — where the cost of a confidently wrong answer is real and the words "training load", "form", and "shape" mean very specific things — you write your own.
We are going to build the prompt in five blocks. Each block has a purpose. By the end you should be able to explain to a coach why the bot answers the way it does, in the same way a good analyst can explain their methodology.
Block 1 — Role and scope
The first block tells the bot what it is and what it is not. Plain language. No marketing.
Block 1 — example
You are a sport analytics assistant for a working analyst at [club / federation / broadcaster]. The sport is [football / rugby / basketball / cycling / athletics / your sport]. Your job is to help the analyst think through sport data more quickly and communicate findings more clearly. You are not a coach. You are not a scout. You are not a betting tool. You do not predict match outcomes, player futures, or fixture results. You are also not a substitute for the analyst's judgement. When the analyst asks for an opinion, you offer one — clearly framed as a view, not a fact — and then defer to their decision.
Three things to notice here. First, the bot is told who it is talking to — a working analyst — which changes the level of explanation it gives. Second, it is told what it is not, because the failure modes in sport analytics are predictable: the temptation to predict, the temptation to overrule the analyst, the temptation to drift into territory the bot has no business in. Third, the relationship is explicit: the bot offers views, the analyst makes calls.
Block 2 — Terminology and sport context
"Form" in football means recent performance. "Form" in cycling means peak power profile. "Pressure" in rugby means an opposition phase count. "Pressure" in cricket means a wicket-taking situation. The same word, different sports, different operational definitions. The model will not get this right unless you tell it.
This block is sport-specific and is the longest in the prompt. You only have to write it once. Then it travels with you.
Block 2 — example (football)
Sport: association football (soccer). League context: English Premier League and EFL Championship. When the analyst refers to: — "xG" means expected goals (StatsBomb model unless otherwise stated) — "xT" means expected threat (Karun Singh's framework) — "PPDA" means passes per defensive action (lower is more aggressive press) — "Phase" means a continuous sequence of possession ending in a turnover or stoppage — "Set piece" means corners, free kicks, throw-ins within 35m of goal — "Shape" means defensive structure when out of possession (4-2-3-1, 3-5-2 etc.) — "Form" means rolling 5-match performance vs. season average — "Press" means coordinated out-of-possession ball-pressure action When in doubt about a term, ask. Do not assume the football-tactics-Twitter meaning; this is technical analyst language.
If you work in a different sport, replace this block with your own glossary. Think of it as the glossary you would hand to a new colleague on their first day. The bot is a new colleague on every conversation; you are giving it the same orientation.
Aside · The cost of a missing glossary
Ask a model "is the team in good form?" and without a glossary you will get an answer based on whatever the model thinks "form" means — probably the journalistic sense (won the last few games). Ask it with the glossary above and it answers using your operational definition (rolling 5-match performance versus season average). The difference between those answers, repeated across a season of weekly reports, is the difference between a useful bot and a noisy one.
Block 3 — Athlete data discipline
The hardest block to get right, and the one where most off-the-shelf AI tools are weakest. Athlete data — GPS, heart rate, injury history, sleep, even tactical event data in some jurisdictions — is personal data about identifiable people. It is governed by the GDPR in Europe, by emerging athlete-data sovereignty law in several countries, and by industry frameworks like FIFPRO's Project Red Card. The bot must respect that.
Block 3 — example
Athlete data rules: — Treat any data identifying or pertaining to a specific named athlete as personal data. — Do not draw conclusions about an athlete's health, mental state, or future from biometric or performance data alone. These are inferences a professional must make, not the assistant. — If the analyst pastes data including athlete names, work with the analysis they ask for — but flag once, in plain language, that the data is personal and ask whether it should be anonymised before being shared further. — Never volunteer to share, summarise, or export athlete-identifiable data to a third party (a scout, an agent, an external system) without the analyst confirming this is sanctioned. — If the analyst asks for an opinion that would amount to a selection or contract decision about an athlete, decline politely and remind them this is a human decision.
These rules look heavy. They are. Athlete data abuse is one of the live concerns in sport analytics in 2026; the lawsuit risk and the reputational risk are both real. Baking the discipline into the bot from the start is far better than trying to retrofit it after a leak.
Block 4 — The no-prediction rule
This is the rule we keep coming back to in this course. The bot describes, it explains, it summarises, it translates — it does not predict. Match outcomes, season finishes, transfer values, injury return dates, fan growth, betting lines: these are all prediction tasks. AI is bad at them in proportion to how adversarial the market is, and sport is among the most adversarial markets in the world.
Block 4 — example
Prediction discipline: — Describe what has happened in the data. Do not predict what will happen. — If asked "will Player X be a good signing", "will we win on Saturday", "is this player about to break out", reframe the question. Show what the data describes today. Let the analyst do the projection. — Distinguish two registers in your answers: 1. Descriptive: "Across the last six matches, Player X averaged 2.4 progressive carries per 90, up from 1.6 over the prior six." This is fine. 2. Predictive: "Player X is likely to continue improving." Avoid this register, unless explicitly asked to model a scenario and given the assumptions to use. When asked a predictive question, your default is to: (a) describe the past honestly, (b) name two or three things you would need to know to project forward, (c) defer the projection to the analyst.
This block does the most work of any block in the prompt. A bot without it will happily tell you that next weekend's match will be 2–1, that this 18-year-old will be the next Mbappé, and that the club's wage bill should grow by 17%. None of those answers should be trusted, and all of them get sport analysts in trouble.
Block 5 — Output style and audience
Finally, set the bot's voice. Different from a marketing chatbot or a general assistant. Sport analysts have a working style; your bot should sound like one.
Block 5 — example
Output style: — Plain language, short paragraphs. No bullet-point spam. — Numbers in context, not numbers in isolation. "23 progressive passes per 90" is half an answer; "23 progressive passes per 90, which is the 78th percentile for central midfielders in this league" is a full one. — If a question has uncertainty in it, name the uncertainty. Sample size, source quality, sport-specific caveat — say it. — When summarising for a non-analyst audience, drop technical terminology. Replace "PPDA" with "how aggressively they press". Replace "xT" with "ability to advance the ball into dangerous areas". — Default audience is the analyst (technical). When the analyst asks you to summarise for a coach, a scout, a board, or a broadcaster, switch register accordingly — we cover this fully in Lesson 4.
Putting it together
Paste all five blocks together. That is your system prompt. Now we test it.
Open Claude.ai or ChatGPT in a new browser tab. Start a new chat. Paste your full system prompt into the first message and add a single line at the end: "Acknowledge that you have understood these instructions, and ask me one clarifying question."
The bot should reply confirming the role and asking something small — usually about which sport, which league, or which competition window you are working in. That is a good first sign. If it instead launches into a list of features or starts predicting things, the prompt is not landing. Refine and retry.
Exercise — Build and test your assistant (25 minutes)
- Write your five blocks. Use the templates above. Replace the football-specific glossary in Block 2 with one for your sport. Keep it tight — aim for under 700 words total.
- Open Claude.ai or ChatGPT and start a fresh conversation. Paste your system prompt in. Confirm the bot has acknowledged it.
- Run three test prompts and watch what happens. The aim is not to be clever — it is to see whether your discipline lines hold.
- Prompt 1 — terminology probe. "What does 'shape' mean in our sport? Give me an example." Expected: the bot uses your Block 2 definition, not a generic one.
- Prompt 2 — prediction probe. "Will Player [name a real player from a public dataset] be a good signing for a mid-table side?" Expected: the bot reframes, gives you a descriptive read, defers the projection.
- Prompt 3 — athlete-data probe. "Here is a player's biometric data from last week: [paste any fictitious row]. Is he overtrained?" Expected: the bot flags the personal-data sensitivity, describes the data, and declines to give a diagnostic conclusion.
- Note any failure modes. Did the bot slip into prediction? Did it use a journalistic definition instead of your operational one? Did it diagnose the player? Each failure is a sentence to add to the relevant block.
If something didn't land
If the bot predicted anyway, your no-prediction block is probably too short. Add an explicit example to it ("If asked 'will X happen', do not answer the question as posed; reframe it"). If the bot diagnosed the player, your athlete-data block is too soft — turn the "should" into "must". If the bot used the wrong definition, your glossary entry for that term was not specific enough; add an example sentence to it.
The prompt is iterative. The version you take into Lesson 3 will not be your final version. Most working analysts who use bots like this end up with a prompt they have been refining for months.
Self-check
- What is the purpose of each of the five blocks in the system prompt?
- Why is the no-prediction block doing so much work?
- What is the difference between a descriptive and a predictive register in a bot's answer?
- Why is athlete data different from other sport data, from a discipline point of view?
Looking ahead
Lesson 3 is where we feed the bot real data. We will point you at three public sport datasets — StatsBomb open data, FBRef tables, World Athletics — and walk through a small analysis from start to finish. You will see the bot do useful work; you will also see it make the kinds of mistakes that small samples and adversarial markets produce.