Lesson 3 of 5 · Build a Market Research Bot
Feeding it real material
The bot only knows what you give it. The skill is choosing the right material, and making the bot stick to it.
By the end of this lesson, you will:
- Know the main types of source material a serious researcher uses, and the strengths and weaknesses of each.
- Be able to feed a long document to your bot and get a useful structured summary back.
- Have answered five specific factual questions about a real company filing, with the bot citing its sources.
What the bot does not know
Before we feed the bot anything, it is important to understand what it already knows — and what it does not.
Your bot's "brain" is a language model trained on a snapshot of public internet data with a cutoff date. It has read a lot of general information about most companies up to that date. But it does not have today's news. It does not have last quarter's earnings. On its own, it cannot look anything up. It cannot read a PDF you have on your desktop unless you give it the text. It cannot follow a hyperlink you paste unless that capability has been wired in.
The model is also, sometimes, wrong about things it thinks it knows. Asked a question about a specific number — "what was Tesla's free cash flow in Q1 2024" — the model will often produce a number with full confidence. The number is sometimes correct. It is also sometimes invented. You cannot tell which by looking. The only fix is to make the bot work from material you have actually given it.
This is why every interaction with the bot will start with you providing material.
The main types of source material
Serious investors and analysts work from a small set of source types. Here are the main ones, with what each is good for.
Annual reports and 10-K filings. Long, detailed, legally vetted documents that public companies are required to file each year. They contain the business overview, risk factors, financial statements, and management's discussion of results. The source of truth for most factual claims about a company. Free, public, and available from the company's investor relations page or, for US-listed companies, the SEC's EDGAR system. Often 100 pages or more.
Earnings transcripts. Word-for-word transcripts of the quarterly calls where company executives discuss results with analysts. Source of the tone and language management uses about the business. Free, public, available on sites like Seeking Alpha or directly from companies' investor pages. 10–40 pages.
Earnings releases. The short press release accompanying quarterly results. Contains the headline numbers and management's interpretation. 3–10 pages.
News articles. Reuters, Bloomberg, Financial Times, Wall Street Journal, sector-specific trade press. Good for context, recent developments, and what the market is paying attention to. Variable quality. Often behind paywalls.
Regulatory filings other than the 10-K. 8-K filings (material events), 10-Q (quarterly reports), proxy statements (corporate governance). All on EDGAR, free.
Analyst reports. Detailed write-ups from sell-side analysts at investment banks. Usually not free. Available to clients of those banks; sometimes summarised in news coverage.
Industry research. Reports from McKinsey, BCG, Gartner, IDC, Bain, plus sector specialists. Useful for market context. Some free, much paid.
For this lesson, we will use the annual report or earnings transcript route — they are free, public, and substantial enough to be a real test of the bot.
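For US-listed companies, the "find the latest 10-K on EDGAR" step can also be scripted. The sketch below is a minimal illustration, not an official client. It assumes the SEC's public JSON filing index at data.sec.gov, with a `filings.recent` object holding parallel lists named `form`, `accessionNumber`, and `primaryDocument`, ordered newest first. Those field names and that ordering are assumptions to check against the SEC's own documentation. The function names are hypothetical, and the network request itself is deliberately left out.

```python
from typing import Optional

# URL patterns assumed from the SEC's public EDGAR endpoints.
SEC_SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"
SEC_ARCHIVE_URL = "https://www.sec.gov/Archives/edgar/data/{cik}/{accession}/{document}"


def submissions_url(cik: int) -> str:
    """URL of a company's JSON filing index, given its numeric CIK (zero-padded to 10 digits)."""
    return SEC_SUBMISSIONS_URL.format(cik=cik)


def latest_filing_url(submissions: dict, cik: int, form: str = "10-K") -> Optional[str]:
    """Return the document URL of the first filing of the given form type, or None.

    `submissions` is the already-parsed JSON index. The lists are assumed
    to be ordered newest first, so the first match is the latest filing.
    """
    recent = submissions["filings"]["recent"]
    rows = zip(recent["form"], recent["accessionNumber"], recent["primaryDocument"])
    for filed_form, accession, document in rows:
        if filed_form == form:
            return SEC_ARCHIVE_URL.format(
                cik=cik,
                accession=accession.replace("-", ""),  # archive paths drop the dashes
                document=document,
            )
    return None
```

If you do fetch the index with a script, the SEC asks automated clients to identify themselves with a descriptive User-Agent header. For a single company, clicking through EDGAR by hand is just as good.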
Aside · Choosing a company for your practice
Pick a company you find genuinely interesting — your employer, a company in a sector you care about, or a household name. The exercises work better when you bring some prior interest. We will avoid suggesting specific companies in this course; it is not our place to imply a company is worth researching as an investment. Pick one yourself.
How to feed a long document to the bot
Both Claude.ai and ChatGPT will accept very long documents pasted directly into the chat. As of 2026, Claude can comfortably handle a 50-page document in a single message; ChatGPT can do the same in its longer-context modes.
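To get a rough sense of whether a paste will fit, you can estimate its size before sending it. The sketch below uses the common rule of thumb of about four characters per token for English text. That ratio is an approximation, not a property of any particular model, and `context_limit_tokens` is a placeholder you would fill in from your chat tool's own documentation.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate: characters divided by an assumed average token length."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text: str, context_limit_tokens: int, reserve_tokens: int = 2000) -> bool:
    """True if the text, plus room reserved for your questions and the reply, fits the limit."""
    return estimate_tokens(text) + reserve_tokens <= context_limit_tokens
```

If the estimate lands close to the limit, treat that as a reason to paste individual sections instead of the whole filing.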
For an annual report (often 100+ pages), the practical approach is to paste the sections you care about, not the whole thing. The most useful sections in a typical 10-K are:
- Item 1 — Business. Description of what the company does, products, customers, competition.
- Item 1A — Risk Factors. The risks management is required to disclose. Often the most honest part of the document.
- Item 7 — Management's Discussion and Analysis. Management's own narrative on the results. Both the tone and what is emphasised matter.
- Financial statements summary. The numerical highlights. Detailed line items are usually too dense for a useful summary.
Paste these sections. Ask the bot to acknowledge it has received them. Then start asking questions.
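If you have the filing as plain text, pulling out one Item can be automated. The sketch below is one possible approach, under the assumption that each Item heading starts its own line. Because a 10-K's table of contents repeats the same headings, it keeps the longest span between the start heading and the next end heading, which is usually the body rather than the contents entry. The function name is hypothetical, and real filings vary enough that the result should always be checked by eye.

```python
import re


def extract_section(text: str, start_heading: str, end_heading: str) -> str:
    """Return the longest span from a start_heading line up to the next end_heading line.

    Headings are given without trailing punctuation, e.g. "Item 1A".
    Returns an empty string if the start heading is never found.
    """
    def heading_pattern(heading: str) -> re.Pattern:
        # Match at the start of a line, ignoring case and extra whitespace.
        # The trailing \b stops "Item 1" from matching "Item 1A".
        words = [re.escape(word) for word in heading.split()]
        return re.compile(r"^[ \t]*" + r"\s+".join(words) + r"\b",
                          re.IGNORECASE | re.MULTILINE)

    start_re = heading_pattern(start_heading)
    end_re = heading_pattern(end_heading)

    best = ""
    for start in start_re.finditer(text):
        end = end_re.search(text, start.end())
        stop = end.start() if end else len(text)
        candidate = text[start.start():stop].strip()
        if len(candidate) > len(best):
            best = candidate
    return best
```

For example, `extract_section(filing_text, "Item 1A", "Item 1B")` would aim to return the Risk Factors section, where `filing_text` is the filing's text that you have already loaded.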
A worked example: the structured summary prompt
Once you have pasted a section, the most useful first prompt is to ask for a structured summary. Here is a prompt template that produces a much more useful result than just "summarise this".
That template asks for three different things: a factual summary, the bot's own inferences, and follow-up questions. The separation is deliberate. It pushes the model to keep these three categories apart, which is exactly what a good researcher does.
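As an illustration of the three-part structure described above, here is one way such a prompt could be assembled. The wording is an assumption written for this sketch, not the course's own template, and the function name and `<document>` delimiters are hypothetical choices.

```python
def build_summary_prompt(section_label: str, section_text: str) -> str:
    """Assemble a three-part summary request around a pasted document section."""
    return (
        f"Below is {section_label}.\n"
        "Work only from this text. Respond in three separate sections:\n"
        "1. FACTUAL SUMMARY: what the document states, each point with a short "
        "supporting quote or the heading it appears under.\n"
        "2. INFERENCE: your own observations, clearly marked as inference, not fact.\n"
        "3. FOLLOW-UP QUESTIONS: things worth researching that this text does not answer.\n"
        "If something is not in the text, say so instead of guessing.\n\n"
        "<document>\n"
        f"{section_text}\n"
        "</document>"
    )
```

Naming the section in the first line, as in "Item 1A, Risk Factors, from the company's latest 10-K", gives the bot something concrete to cite against.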
Hands-on time
Exercise 3.1 · 35 minutes
Research one section of a real company filing
- Choose a publicly listed company. Go to its investor relations page or, for US-listed companies, find its latest 10-K on SEC EDGAR.
- Download or open the most recent 10-K. Find Item 1A — Risk Factors. This is usually 10–30 pages.
- Open a fresh Claude.ai or ChatGPT conversation. Paste the system prompt from Lesson 2 first.
- Once the bot has acknowledged the system prompt, paste the entire Risk Factors section. Tell the bot what it is: "This is Item 1A — Risk Factors from [Company]'s [Year] 10-K, filed [date]."
- Ask the bot to acknowledge it has received the material. Make sure it has — sometimes the chat truncates very long pastes.
- Use the structured-summary prompt from this lesson to get a first pass.
- Now ask five specific factual questions. Use the bot to find specific things in the document. Examples:
  - "What does the company say about cybersecurity risk?"
  - "Which risk is described as the most material to the business?"
  - "Are there any risks specific to one geography?"
  - "Is there language about regulatory risk in the EU?"
  - "Has the company added any new risk factors compared to typical filings of this type?"
- For at least one answer, open the actual document and verify the bot's citation. Is the bot accurately representing what the document says? Or is it summarising, paraphrasing, or inferring?
Tools required: Claude.ai or ChatGPT (free tier — though Claude's Pro tier handles longer documents more comfortably), a 10-K or annual report you choose.
What you should notice
If the bot did its job well, you got a summary that distinguished what the document said from what the bot inferred. The factual claims were cited. The notable observations were marked as inference. The follow-up questions were genuine research leads.
If the bot did its job badly, look at what went wrong. Did it produce a summary that mixed fact and interpretation? Did it confidently make claims without citations? Did it summarise things that were not in the document at all? Each of those is a failure, and the last is an outright hallucination. Each is also a reason why a research bot is a useful assistant but a terrible decision-maker.
The verification step is the most important part of the exercise. The bot is a faster reader than you are. It is not a better reader than you are. When the stakes matter, you have to verify.
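Part of that verification can be mechanised. When the bot gives a direct quote as its citation, you can check whether those words actually appear in the text you pasted. The sketch below is a minimal version with hypothetical function names. It ignores differences in case, line breaks, and curly versus straight quotes, which are common when text is copied out of a PDF.

```python
import re

# Map curly quotes to straight ones so copied text compares cleanly.
_QUOTE_MAP = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def normalise(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.translate(_QUOTE_MAP)).strip().lower()


def quote_in_document(quote: str, document: str) -> bool:
    """True if the quoted words appear in the document after normalisation."""
    needle = normalise(quote)
    return bool(needle) and needle in normalise(document)


def check_citations(quotes: list[str], document: str) -> dict[str, bool]:
    """Map each quote the bot gave to whether it was found in the document."""
    return {quote: quote_in_document(quote, document) for quote in quotes}
```

A match only shows that the words are in the document. It says nothing about whether the bot's reading of them is fair, so you still need to read the surrounding passage yourself.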
Self-check
- Name three free, public sources of information about a listed company.
- Why is "give me a structured summary with these three sections" usually more useful than "summarise this"?
- What does it mean for the bot to "cite its sources", and why does it matter?
- What should you do if the bot's answer is not cited?
Looking ahead
In Lesson 4 we look at the limits of pattern recognition in market research. AI is genuinely good at spotting patterns in language — sentiment, framing, omissions. It is genuinely bad at predicting what those patterns mean for the future. Knowing the difference is the most important judgement skill in this course.