ROMEOADVANCED ACADEMY

Lesson 1 of 5 · AI for Sport Analysts

The shape of sport analytics in 2026

Four domains. One data revolution that started twenty years ago and is now hitting a different gear. A clear-eyed view of what AI changes and what it does not.

30 minutes · Reading and thinking · No tools required

By the end of this lesson, you will:

  • Know the four working domains of sport analytics and what each one does.
  • Understand the layered data stack that any sport analytics workflow sits on top of.
  • Be able to describe, in two sentences, what AI genuinely changes about sport analytics — and what it does not.

Sport analytics, before we add AI

Sport analytics as a working discipline is roughly twenty years old. Its early reference points were Billy Beane's Oakland Athletics in baseball and Sam Allardyce's Bolton in football. It became a profession when clubs and federations started hiring people whose entire job was to read the data that cameras and clipboards had been generating for decades. By 2026, every elite club in every elite sport has at least one analyst on payroll. Many of them have departments.

The work falls into four domains. You almost certainly recognise the one you work in; the value of the map is in seeing where your domain fits relative to the others.

Domain 1 — Performance analytics

How fit are our athletes? Are they at risk of injury? What is their training load? When are they ready to play again? This is the domain of strength and conditioning staff, sport scientists, and physios. The data is mostly biometric (heart rate, GPS, accelerometers, sleep, blood markers) plus session logs from training. The output is usually a daily or weekly readiness brief for the coaching staff.

Domain 2 — Tactical analytics

What happened in the match? What is the opposition likely to do? How do we adjust? This is the domain of performance analysts working with coaches. The data is event data (every pass, tackle, shot), tracking data (where every player was for every frame), and video. The output is opponent reports, set-piece libraries, and in-match interventions.

Domain 3 — Scouting and recruitment

Who should we sign? At what price? Is this draft prospect what their numbers say they are? This is the domain of scouts and recruitment analysts. The data is league-wide player stats, contract markets, draft histories, and increasingly biomechanical and psychological profiles. The output is shortlists, valuation models, and transfer-window decisions.

Domain 4 — Fan, broadcast, and integrity

This domain has grown faster than the others. How do we personalise content for a fan? What do we show on the broadcast graphic? How do we detect match-fixing? What is the audience signal telling us about the next deal? The data is broadcast feeds, ticketing, betting markets, and social. The output is broadcast products, fan engagement campaigns, and integrity alerts.

Most working analysts sit in one of these domains. Some — particularly at smaller clubs and federations — wear two or three hats. The growth in the field is in domain 4, where commercial interest has poured in, and in domain 1, where biometric technology has made athletes the most-measured workers in the world.

The data stack underneath all four

Underneath the four domains is the same kind of data stack, even where the specific data differs. Working from the bottom up, it looks like this.

Sensors and feeds. The raw data sources. GPS trackers, accelerometers on jerseys, optical tracking systems in venues, manual event-coding, broadcast cameras, social-media APIs, betting feeds. Each is a stream.

Ingestion and normalisation. Data arrives in many formats. It has to be cleaned, deduplicated, anchored to a shared timeline, joined to player and match identifiers. This unglamorous layer is where most analytics projects either work or fail.
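As a minimal sketch of the steps named above (clean, deduplicate, anchor to a shared timeline, join to player identifiers), assume two hypothetical feeds: an event feed timed in seconds from kick-off and a GPS feed timed in wall-clock UTC. The field names, the kick-off time, and the player-ID mapping are all invented for illustration; a real pipeline would take them from your own providers.

```python
from datetime import datetime, timezone

# Hypothetical kick-off time used to anchor both feeds to one timeline.
KICKOFF = datetime(2026, 1, 10, 15, 0, 0, tzinfo=timezone.utc)

# Hypothetical mapping from each vendor's player label to a shared ID.
PLAYER_IDS = {"J. Smith": "p001", "SMITH_J": "p001", "A. Jones": "p002"}


def normalise_events(rows):
    """Drop exact duplicates and attach the shared player ID."""
    seen = set()
    out = []
    for row in rows:
        key = (row["player"], row["type"], row["seconds"])
        if key in seen:  # the same event delivered twice by the feed
            continue
        seen.add(key)
        out.append({
            "player_id": PLAYER_IDS[row["player"]],
            "type": row["type"],
            "t": float(row["seconds"]),  # seconds from kick-off
        })
    return sorted(out, key=lambda r: r["t"])


def normalise_gps(rows):
    """Convert wall-clock GPS timestamps to seconds from kick-off."""
    out = []
    for row in rows:
        ts = datetime.fromisoformat(row["utc"])
        out.append({
            "player_id": PLAYER_IDS[row["tag"]],
            "speed": float(row["speed_ms"]),
            "t": (ts - KICKOFF).total_seconds(),
        })
    return out


events = [
    {"player": "J. Smith", "type": "pass", "seconds": "62"},
    {"player": "J. Smith", "type": "pass", "seconds": "62"},  # duplicate
    {"player": "A. Jones", "type": "shot", "seconds": "65"},
]
gps = [
    {"tag": "SMITH_J", "utc": "2026-01-10T15:01:02+00:00", "speed_ms": "5.4"},
]

clean_events = normalise_events(events)
clean_gps = normalise_gps(gps)
```

Once both feeds share `player_id` and `t`, matching a pass to the speed the player was moving at is a lookup rather than a reconciliation exercise.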

Storage and structures. A warehouse, a data lake, a relational database, or all three. The data is structured according to the entities of the sport — players, matches, teams, events, possessions.

Models. The classical sport-analytics models live here. Expected goals (xG), expected threat (xT), expected points added (EPA), VAEP (Valuing Actions by Estimating Probabilities), possession-value frameworks. Most of the field's productive output for the last decade has come from this layer, and most of it has been classical statistical modelling, not deep learning.
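To give a feel for how small a classical model can be, here is a toy expected-goals sketch. It assumes one common form, a logistic function of shot distance and visible goal angle. The coefficients are invented for illustration and not fitted to any data; real xG models are fitted to large shot datasets and usually use more features.

```python
import math

# Illustrative coefficients only: invented for this sketch, not fitted.
INTERCEPT = -0.5
COEF_DISTANCE = -0.12  # per metre from goal
COEF_ANGLE = 1.1       # per radian of visible goal angle


def expected_goal(distance_m, angle_rad):
    """Toy xG: logistic function of shot distance and goal angle."""
    z = INTERCEPT + COEF_DISTANCE * distance_m + COEF_ANGLE * angle_rad
    return 1.0 / (1.0 + math.exp(-z))


# Three hypothetical shots as (distance in metres, angle in radians).
shots = [(6.0, 1.0), (18.0, 0.4), (30.0, 0.2)]
shot_xg = [expected_goal(d, a) for d, a in shots]

# A team's xG for a match is the sum of its shots' values.
total_xg = sum(shot_xg)
```

The closer, more central shot gets the higher value. That is all xG says: how often shots like this one have historically gone in.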

Workflows. The way analysts actually use the data day-to-day. SQL queries, R or Python scripts, Tableau dashboards, slide-deck templates, post-match meeting structures, recruitment-meeting agendas.

Communication. The output. A coach's whiteboard, a scout's PDF, a broadcaster's graphics overlay, a federation's reform recommendation. The data has to leave the analyst and land somewhere it can change a decision.

If you have worked in sport analytics for any length of time, this stack will be familiar. The job is mostly the workflows and communication layers — keeping a pipeline of timely, accurate, decision-relevant insight flowing from sensors to coaches under unrealistic time pressure.

What AI changes — honestly

AI does not change the four domains. The questions are the same: how fit, how to play, who to sign, how to engage fans. What AI changes is how the work gets done, mostly in the ingestion, workflows, and communication layers, and it does so in four specific ways.

The boring layers get cheaper. Ingesting a CSV someone emailed you. Reconciling two opposition reports. Writing the standard paragraphs of a weekly briefing. Producing the same graphic for three different audiences. AI accelerates all of these meaningfully. The hours you spend on these layers each week are the hours you reclaim for the layer above.

Video analytics scales. Five years ago, a club analyst with a tagging panel could code two matches a day. Computer vision now produces structured event data from raw broadcast feeds at scale. The first wave of this changed scouting — clubs now have league-wide event data for divisions they could never afford to scout manually. The second wave is changing tactical analytics, as automated tracking from a single broadcast camera becomes good enough to use.

The natural-language layer becomes possible. Asking the data a question in English and getting back a sensible answer is a new thing. It changes who can do basic analytics — a coach without a quant background can now interrogate a dataset directly, where previously they had to wait for the analyst to do it.

Cross-domain analytics gets easier. Joining a performance-analytics question to a scouting question — "what kind of midfielder, given how we want to play, given our injury history" — used to require an analyst who understood both. AI helps the analyst who understands one bring in the other.

What AI does not change

It does not produce judgement. A coach makes selection decisions on imperfect information weighted by relationships, dressing-room dynamics, and a thousand things the data does not see. AI is not going to replace that. It is going to give the coach better information to weigh.

It does not solve the small-sample problem. A 20-team football league gives each team 38 league matches a season. A season's worth of data on a single player is sometimes 90 minutes total. Most of the questions analysts care about are inherently low-sample. AI does not fix this; it can sometimes make it worse, by producing confident-sounding answers about patterns that are not statistically real.
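To make the small-sample point concrete, here is a sketch using the Wilson score interval, a standard way to put an uncertainty range on a proportion. The shot counts are invented for illustration: two strikers with the same 30% conversion rate, one measured over 10 shots and one over 100.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half, centre + half


# Same observed rate, very different amounts of evidence.
small = wilson_interval(3, 10)     # 3 goals from 10 shots
large = wilson_interval(30, 100)   # 30 goals from 100 shots

small_width = small[1] - small[0]
large_width = large[1] - large[0]
```

With 10 shots the plausible range runs from roughly 11% to 60%; with 100 shots it narrows to roughly 22% to 40%. A fluent answer built on the first sample is no more reliable than the sample itself.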

It does not predict outcomes. This is the most important thing. Asked "will Player X be a good signing", an AI can pattern-match against past signings and produce an answer. The answer carries the confidence of an oracle and the reliability of a coin toss. Sport, like markets in the trading-research course, is adversarial. Any edge that an off-the-shelf AI can find from public data has almost certainly been found already by someone else.

It does not understand your sport. The model knows what it has read on the internet. Your specific sport's nuances — the way a particular league plays on a particular surface in a particular month, the rivalries that shape decisions, the bias in your data — are things you teach the AI, not the other way around.

Aside · Why this matters for athletes

Through the course we will keep coming back to athlete data ethics. The data that powers sport analytics — heart rate, GPS, biometrics, even tactical event data — is in many jurisdictions the personal data of the athletes who generated it. FIFPRO, WADA, and several national legislators have started taking athlete data sovereignty seriously. We cover it properly in Lesson 5. The signal to take into Lesson 2 is: the bot we build will not store athlete data unnecessarily, will not be shared with people who should not see it, and will not be used to make decisions about athletes without those athletes' awareness.

What we are going to build

In Lesson 2 we start building a sport analytics assistant — a bot that can take a piece of sport data and help you analyse it. By the end of Lesson 5 you will have used it on real data, asked it to translate the analysis for three different audiences, and written a use policy for your own work.

The bot will not replace your judgement, your knowledge of your sport, or your relationships with the people who use your work. It will not predict matches. It will be a copilot for the parts of the workflow that the discipline has been complaining about for years: the ingestion, the boilerplate writing, the cross-checking, the audience-tailoring.

Self-check

  1. What are the four domains of sport analytics, in your own words?
  2. In which domain do you sit? Which adjacent domain has the most overlap with yours?
  3. Name two things AI genuinely changes about sport analytics, and two it does not.
  4. Why is the small-sample problem particularly important for sport?

Looking ahead

Lesson 2 is the build lesson. You will write a system prompt for a sport analytics assistant — including the athlete-data discipline, the no-prediction rule, and the sport-specific terminology setup. By the end of it the bot is ready to take real data.