ROMEOADVANCED ACADEMY
Not financial advice. Education only.

Lesson 4 of 5 · Build a Market Research Bot

Pattern recognition is not prediction

AI is genuinely good at spotting patterns in language. It is genuinely bad at predicting markets. The whole course depends on you understanding the difference.

40 minutes · One hands-on exercise · Multiple news articles required

By the end of this lesson, you will:

  • Be able to articulate what AI is genuinely good at in market research and what it is not.
  • Understand the seductive trap of pattern-as-prediction, and how it shows up in practice.
  • Have used your bot to characterise sentiment in a set of news articles, and then critiqued its summary against the articles themselves.

The thing AI is really good at

Language models are excellent at finding patterns in text. They were trained to predict the next word from billions of examples of how words go together. As a result, they are very good at things that involve recognising linguistic structure.

In market research, this is genuinely useful for:

  • Sentiment. Is the language in this earnings call hopeful, cautious, or defensive? Is it more hopeful than last quarter's call?
  • Framing. Where does management put the emphasis? What do they describe in detail, and what do they describe in one sentence?
  • Omissions. What is a typical document of this type expected to discuss that this one does not?
  • Hedging. Where does the language become unusually careful — "we believe", "subject to certain conditions", "while we cannot predict"?
  • Cross-document comparison. What is different about this year's risk factors compared with last year's?

These are real patterns. They reflect genuine information about how the company is positioning itself. Skilled human researchers — analysts, journalists, lawyers — have always done this work. AI does it faster and more consistently than a tired human at 11pm.
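
To make cross-document comparison concrete, here is a minimal sketch of the kind of check involved: which paragraphs in this year's risk factors have no close match in last year's? It uses Python's standard difflib module. The function name, the 0.8 similarity threshold, and the sample paragraphs are all assumptions made for illustration, not part of the course materials.

```python
from difflib import SequenceMatcher

def new_paragraphs(last_year: list[str], this_year: list[str],
                   threshold: float = 0.8) -> list[str]:
    """Return paragraphs from this_year with no close match in last_year.

    The threshold is an assumed cut-off: a paragraph counts as "new" when
    its best similarity ratio against every old paragraph is below it.
    """
    fresh = []
    for para in this_year:
        best = max(
            (SequenceMatcher(None, para.lower(), old.lower()).ratio()
             for old in last_year),
            default=0.0,
        )
        if best < threshold:
            fresh.append(para)
    return fresh

# Hypothetical one-sentence "paragraphs" standing in for real risk factors.
last_year = [
    "We face intense competition in our core markets.",
    "Our results depend on retaining key personnel.",
]
this_year = [
    "We face intense competition in all of our core markets.",
    "Our results depend on retaining key personnel.",
    "Disruption at a single supplier could delay product shipments.",
]

for para in new_paragraphs(last_year, this_year):
    print("New this year:", para)
```

A lightly reworded paragraph stays above the threshold and is ignored; only the genuinely new one is surfaced. This finds what changed. It says nothing about what the change means.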

The thing AI is bad at

Where things go wrong is the move from pattern to prediction. Two examples will show the difference.

Example one. The bot reads four earnings calls from a software company and reports: "The language has shifted from optimistic in 2024 to cautious in 2026. References to 'growth' have decreased by 60%. References to 'efficiency' have increased by 200%."

This is a pattern. It is real. It is in the text. It is useful information for a researcher.
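
A claim like "references to 'growth' have decreased by 60%" is simple arithmetic on word counts, and you can check it yourself. The sketch below is one way to do that, under stated assumptions: the function names and the two sample snippets are hypothetical, and whole-word, case-insensitive matching is a choice, not a standard.

```python
import re
from typing import Optional

def count_term(text: str, term: str) -> int:
    """Count whole-word, case-insensitive occurrences of term in text."""
    pattern = rf"\b{re.escape(term)}\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def percent_change(old: int, new: int) -> Optional[float]:
    """Percentage change from old to new; None when old is zero (undefined)."""
    if old == 0:
        return None
    return 100 * (new - old) / old

# Hypothetical snippets standing in for two full transcripts.
call_2024 = "Growth was strong. We see growth in every region, and growth in margins."
call_2026 = "We are focused on efficiency. Growth remains a priority alongside efficiency."

for term in ("growth", "efficiency"):
    before = count_term(call_2024, term)
    after = count_term(call_2026, term)
    print(term, before, after, percent_change(before, after))
```

Ten mentions falling to four is a 60% decrease, and two rising to six is a 200% increase. Raw counts also depend on how long each transcript is, so a careful comparison would probably normalise by word count first. When a term did not appear at all in the earlier document, a percentage change is undefined, which is why the sketch returns None in that case.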

Example two. The bot adds: "Based on this shift in tone, the stock is likely to underperform in the next quarter."

This is not a pattern. This is a prediction. And it is, almost always, worthless as a forecast: not because the language shift is unimportant, but because the market has very likely already absorbed this information. Every analyst covering the company has read the same transcripts. The price reflects whatever the consensus has made of them. The bot has no edge in turning the same pattern into a forecast.

This is the trap. The language model is so good at producing fluent text that, when asked to predict, it will produce fluent predictions. The predictions will be confident. They will sound plausible. They will be based on the patterns the bot actually found. And they will be, on any given day, no more reliable than guessing.

Aside · The efficient market hypothesis, in one paragraph

The orthodox academic position is that publicly available information is already priced into a stock. If a language model can read a 10-K and notice that the risk language is more cautious than last year, so can every analyst covering the company. The price reflects their reading. Patterns the AI sees in the same data therefore have, in expectation, no predictive power for future returns. There are caveats — markets are not perfectly efficient at all times — but for public, slowly-moving information of the kind language models read, the orthodox position is right enough to plan around.

Why the bot wants to predict anyway

If you ask a language model "what does this language shift suggest about the company's future", it will tell you. With a tone of expertise. The reason is not that the model knows. The reason is that the training data is full of human writers — journalists, analysts, bloggers — making predictions in exactly this voice. The model has learned to produce that voice. It does not know the predictions in its training data were mostly wrong.

This is why your system prompt from Lesson 2 is so important. The "you will not predict prices, returns, or market direction" line is what stops the model from doing what comes naturally to it. Without that line, the model will slip into prediction territory the moment you ask a question that opens the door.
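
The system prompt is the first line of defence. A crude second check is to scan the bot's reply for forecast-style phrasing before you read it closely. The sketch below is only that: the phrase list is an assumption, assembled from the kinds of wording this lesson warns about, and it will miss plenty of forecasts worded differently.

```python
import re

# Assumed, deliberately short list of forecast-style phrasings.
FORECAST_PATTERNS = [
    r"\bis likely to\b",
    r"\bwill (?:rise|fall|outperform|underperform)\b",
    r"\bthis suggests the stock\b",
    r"\b(?:positive|negative) sign for\b",
    r"\bprice target\b",
]

def flag_forecasts(reply: str) -> list[str]:
    """Return the sentences in reply that match any forecast pattern."""
    sentences = re.split(r"(?<=[.!?])\s+", reply.strip())
    return [
        sentence
        for sentence in sentences
        if any(re.search(p, sentence, re.IGNORECASE) for p in FORECAST_PATTERNS)
    ]

reply = (
    "References to growth fell across the four calls. "
    "Based on this shift in tone, the stock is likely to underperform "
    "in the next quarter."
)

for sentence in flag_forecasts(reply):
    print("Check this sentence:", sentence)
```

A match is a prompt to reread the sentence, not proof of a forecast, and an empty result is not a clean bill of health.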

What useful sentiment analysis looks like

Sentiment work is in the sweet spot of the bot. Done well, it is informative. Done badly, it is the worst kind of pseudo-quant noise. The difference is in the specificity.

Vague and unhelpful. "The sentiment in the earnings transcript was mixed."

Specific and useful. "The CEO described the new product launch in detailed and confident terms (Section 2, paragraphs 4–7). However, the CFO's guidance section used unusually hedged language — 'we expect', 'subject to', 'in line with previously communicated' appear nine times in twelve paragraphs (Section 4). The Q&A section contained two analyst questions about the same supply-chain issue (Q&A pages 14 and 17), suggesting it is a topic the company has not fully resolved."

The second version is what you want. It is specific, it is sourced, it characterises the language without making a prediction, and it leaves the user — you — to decide what the patterns mean.
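
The "specific and sourced" part can be checked mechanically. As a minimal sketch, the code below counts hedging phrases paragraph by paragraph, so that a statement such as "appears nine times in twelve paragraphs" can be traced back to locations in the text. The phrase list and the sample paragraphs are assumptions for illustration; a real transcript would need a longer list.

```python
# Assumed list of hedging phrases, taken from the examples in this lesson.
HEDGES = [
    "we expect",
    "we believe",
    "subject to",
    "in line with previously communicated",
    "while we cannot predict",
]

def locate_hedges(paragraphs: list[str]) -> list[tuple[int, str, int]]:
    """Return (paragraph number, phrase, count) for every hedge found.

    Paragraph numbers start at 1 so they match how a reader would cite them.
    """
    hits = []
    for number, para in enumerate(paragraphs, start=1):
        lowered = para.lower()
        for phrase in HEDGES:
            count = lowered.count(phrase)
            if count:
                hits.append((number, phrase, count))
    return hits

# Hypothetical guidance section, three short paragraphs.
guidance = [
    "We expect revenue to be in line with previously communicated guidance.",
    "Margins are subject to input costs, and subject to currency movements.",
    "We shipped the new product in March.",
]

for number, phrase, count in locate_hedges(guidance):
    print(f"Paragraph {number}: '{phrase}' x{count}")
```

The output is a list of places to look, which is what makes a sentiment claim checkable. Whether four hedges in three paragraphs is unusual depends on what the same company sounded like last quarter.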

The bot as a triage tool

The most realistic use case for the bot you are building is triage. A serious researcher has more sources than they have time. The bot's job is to read the firehose and surface what is worth reading carefully. It is not the analyst; it is the analyst's first-pass reader.

A workflow that respects what the bot is good at, and what it is not:

  1. You collect twenty news articles, two earnings transcripts, and the latest 10-K for a company.
  2. The bot reads them and produces a structured summary of each, with sourcing.
  3. The bot flags what is unusual: a new risk factor in the 10-K, a notable shift in management's tone across the two transcripts, a recurring topic across the news articles.
  4. You read the bot's summary. You then read the original sources for anything the bot flagged.
  5. You form your own view. The bot does not.

The bot saved you the cost of reading twenty articles to find the three that mattered. That is its real value. Everything else is on you.
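
If you keep the bot's flags in a structured form, the triage step becomes a short reading list. The sketch below is one possible shape for that, not a prescribed format: the Flag fields, the function name, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flag:
    source: str    # which document the bot is pointing at
    location: str  # where in that document, so you can check it
    note: str      # what the bot found unusual (a pattern, not a forecast)

def reading_list(flags: list[Flag]) -> list[str]:
    """Sources to read in full, in the order they were first flagged."""
    return list(dict.fromkeys(flag.source for flag in flags))

# Hypothetical flags from one triage pass.
flags = [
    Flag("10-K", "Risk factors", "Risk factor not present last year"),
    Flag("Q2 transcript", "CFO guidance", "More hedged than the Q1 call"),
    Flag("10-K", "Management discussion", "Supply chain mentioned repeatedly"),
]

for source in reading_list(flags):
    print("Read in full:", source)
```

Every flag carries a location because step 4 of the workflow depends on it: a flag you cannot trace back to the source is not worth much.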

Hands-on time

Exercise 4.1 · 30 minutes

Sentiment across a news set, with verification

  1. Pick the same company you used in Lesson 3, or a different one. Find five to eight news articles about that company from the last six months. Free sources include Reuters, the Financial Times's free content, the company's own press releases, and the SEC's EDGAR for any 8-K filings. Skip paywalled articles unless you have access.
  2. Open a fresh Claude.ai or ChatGPT conversation. Paste the system prompt from Lesson 2.
  3. Tell the bot what you are about to do: "I am going to paste 5–8 news articles about [Company]. I would like you to characterise the overall sentiment, identify any recurring themes, and flag anything that seems unusual or notable. Begin by reading the articles. Do not yet form a view."
  4. Paste each article in turn, telling the bot what each one is ("Source: Reuters, 12 March 2026, headline ..."). Wait for the bot to acknowledge each one.
  5. Once all articles are in, ask: "Now characterise the sentiment across all the articles I pasted. Do not make a prediction. Cite each article. Be specific."
  6. Read what the bot produces. Look for three things:
    • Did it cite each article specifically?
    • Did it distinguish between the company's own statements and journalists' framing?
    • Did it slip into prediction language — "this suggests the stock will...", "this is a positive sign for..." — anywhere?
  7. If it slipped into prediction, push back: "You included a forward-looking interpretation there. Please rewrite without any forecast or speculative language. Stick to what the articles say and what is unusual about how they say it."
  8. Pick the most interesting claim the bot made. Open the article it cited and check whether the bot accurately characterised it.

Tools required: Claude.ai or ChatGPT (free tier), five to eight real news articles you collect.

What you should have seen

The bot is genuinely useful in this exercise. You will see patterns that you would have missed by reading the articles one by one. You will also, almost certainly, see the bot try to slip in a forecast or a "this suggests" sentence. That is what you push back on.

If you verified the bot's most interesting claim, the most common outcome is that the bot was approximately right but the citation was slightly off — a paraphrase rather than the exact thing the article said. This is the kind of low-stakes failure to watch for. Approximately right is not the same as right. For research that informs decisions, "approximately right" is sometimes good enough; for any specific quoted number or date, it is not.
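
The paraphrase problem is easy to test for when the bot puts something in quotation marks. As a rough sketch, the code below pulls quoted passages out of the bot's summary and checks whether each one appears verbatim in the article. The function names and the sample texts are assumptions, and the check only normalises whitespace, case, and curly quotes.

```python
import re

def normalise(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse runs of whitespace."""
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()

def check_quotes(summary: str, article: str) -> dict[str, bool]:
    """Map each double-quoted passage in summary to whether article contains it."""
    quotes = re.findall(r'"([^"]+)"', normalise(summary))
    haystack = normalise(article)
    return {quote: quote in haystack for quote in quotes}

# Hypothetical article sentence and bot summary.
article = (
    "The company said it expects margins to stay under pressure "
    "through the second half."
)
summary = (
    'The article reports that "margins will stay under pressure" '
    'and that this lasts "through the second half".'
)

for quote, found in check_quotes(summary, article).items():
    print("verbatim" if found else "NOT in article", "->", quote)
```

Here the first "quote" is a paraphrase presented as a quotation, which is exactly the failure described above. A check like this catches wording drift; it does not catch a number or date that is quoted accurately from the wrong article.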

Self-check

  1. What does AI do well in market research, in your own words?
  2. What is the trap that turns useful pattern recognition into useless prediction?
  3. How does the efficient-market hypothesis affect what a research bot can usefully do?
  4. What is the bot's most realistic use case in a serious researcher's workflow?

Looking ahead

One lesson left. In Lesson 5 we look at the things that go most wrong with research bots in practice — hallucinated numbers, false confidence, regulatory issues, and the slow drift from "I use it for research" to "I rely on it for decisions". You will end the course with a one-paragraph use policy you write for yourself, and a clear list of which Tier 2 and Tier 3 programme courses go deeper.