Three Things AI Will Confidently Get Wrong About Your Investments
The model sounds right even when it isn't. That's the whole problem.
The large language models powering ChatGPT, Claude, Grok, and the rest are not calculators. They’re predictive text engines. Given a sequence of words, they predict the next one, and then the next one after that. This architecture is extraordinary for some kinds of work and structurally bad for others, and the gap between those two categories is where most individual investors using these tools are going to lose money over the next few years.
I’ve used AI heavily for investment research for close to a year now, across all the major models. I wrote about the workflow that’s emerged in “AI for Investment Research,” and most of it still holds. But there are three specific failures I’ve run into enough times to name.
It’s not a calculator
A calculator does math. An LLM generates sequences that look like the output of a calculation. Sometimes the sequence is correct because the training data contained the right answer to a similar problem. Sometimes it’s correct because the model walked through the reasoning in a way that happened to produce the right number. And sometimes the reasoning sounds right and a digit moves anyway.
I’ve caught this on dilution math, on weighted averages, on CAGR calculations, on discounted cash flow models. In every case the model set up the problem correctly. The variables were defined, the formula was right, the steps were ordered the way I would have ordered them. The error was somewhere in the middle of the arithmetic, and nothing about the surrounding output flagged it.
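For reference, three of the calculations named above are each a single line of arithmetic once written down deterministically. The following is a minimal Python sketch; the function names and the sample inputs are illustrative assumptions, not figures from any real position or filing.

```python
def cagr(begin_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / begin) ** (1 / years) - 1."""
    if begin_value <= 0 or years <= 0:
        raise ValueError("begin_value and years must be positive")
    return (end_value / begin_value) ** (1 / years) - 1


def weighted_average(values: list[float], weights: list[float]) -> float:
    """Sum of value * weight, divided by the sum of the weights."""
    total_weight = sum(weights)
    if len(values) != len(weights) or total_weight == 0:
        raise ValueError("need equal-length inputs and a nonzero total weight")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def ownership_after_dilution(
    shares_held: float, shares_outstanding: float, new_shares: float
) -> float:
    """Fraction of the company owned after new_shares are issued to others."""
    return shares_held / (shares_outstanding + new_shares)


# Hypothetical inputs, chosen so the results are easy to check by hand.
growth = cagr(100.0, 200.0, 3)                            # doubling over 3 years
avg_cost = weighted_average([10.0, 20.0], [100, 300])     # two purchase lots
stake = ownership_after_dilution(1_000, 100_000, 25_000)  # 1,000 of 125,000 shares

print(f"CAGR: {growth:.4%}")
print(f"Average cost: {avg_cost:.2f}")
print(f"Ownership after dilution: {stake:.4%}")
```

None of this is sophisticated. The point is that the same inputs give the same output every time, which is the property the model’s prose arithmetic lacks.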
This is what makes it dangerous. A junior analyst who got the math wrong would also probably get the setup wrong, or hedge their answer, or flag uncertainty. The model does none of that. It produces the confident, polished output of someone who has checked their work, except it hasn’t.
So I use AI to structure the analysis and lay out which variables I need, and then I do the math in a spreadsheet. Any number that’s going to inform a decision gets recomputed outside the model.
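The same recompute-and-compare habit can be sketched in Python instead of a spreadsheet. This is an illustration under stated assumptions: `present_value` and `matches` are hypothetical helper names, the cash flows and discount rate are made up, and the tolerance is an arbitrary choice rather than a standard.

```python
import math


def present_value(cash_flows: list[float], discount_rate: float) -> float:
    """Discount year-end cash flows (year 1, 2, ...) back to today."""
    return sum(
        cf / (1 + discount_rate) ** year
        for year, cf in enumerate(cash_flows, start=1)
    )


def matches(model_figure: float, recomputed: float, rel_tol: float = 1e-4) -> bool:
    """True if the model's stated figure agrees with the recomputed one."""
    return math.isclose(model_figure, recomputed, rel_tol=rel_tol)


# Hypothetical three-year forecast discounted at 10%.
flows = [100.0, 110.0, 121.0]
pv = present_value(flows, 0.10)

# Two hypothetical figures a model might state for the same calculation:
# one rounded correctly, one with two digits transposed.
rounded_ok = 272.73
digit_moved = 272.37

print(f"Recomputed PV: {pv:.2f}")
print("rounded_ok agrees:", matches(rounded_ok, pv))
print("digit_moved agrees:", matches(digit_moved, pv))
```

The tolerance is loose enough to accept ordinary rounding and tight enough to reject a transposed digit. Where to set it depends on how much precision the decision actually needs.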
It will use information that isn’t true
The second failure mode is harder to catch because it doesn’t have the clean tell of bad arithmetic. The model will confidently use information that is outdated, partially correct, or pulled from a context where it doesn’t apply.
Ask one of these tools about a company’s most recent quarter and there’s a real chance it gives you the quarter from two years ago, or mixes up two segments, or cites a revenue number that was correct on a different reporting basis. The training data has a cutoff, the web search results are noisy, and the model doesn’t distinguish well between “this fact was true in 2023” and “this fact is true now.” It treats them with the same confidence.
What’s changed how I use these tools is that I almost never let them work from their training data anymore for anything where the answer depends on what’s true right now. If I want analysis of a quarter, I upload the 10-Q. If I want help with an earnings call, I paste the transcript. If I want to understand a competitor matrix, I feed it the filings. The model is excellent at synthesizing across documents you’ve given it. It’s much worse at retrieving the right documents on its own, and worst of all at pretending it has retrieved them when it’s actually filling in gaps from training data.
This is the inversion of how most people use these tools. Most people ask a question and let the model figure out where to get the answer. For investment research that’s the wrong direction. You have to do the retrieval. The model does the synthesis.
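One crude way to catch the model filling in gaps is to check that every number in its answer actually appears in the document you supplied. The sketch below is a rough heuristic with assumed names (`numbers_in`, `unsupported_numbers`) and made-up sample text. It will not catch a real number used in the wrong context, or a figure restated in different units, so treat a clean result as a first pass rather than a verification.

```python
import re

# Matches integers and decimals, with optional thousands separators.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def numbers_in(text: str) -> set[str]:
    """Extract numeric tokens, normalized by stripping commas."""
    return {m.group().replace(",", "") for m in NUMBER.finditer(text)}


def unsupported_numbers(answer: str, source: str) -> set[str]:
    """Numbers that appear in the model's answer but nowhere in the source."""
    return numbers_in(answer) - numbers_in(source)


# Hypothetical excerpt from an uploaded filing, and a hypothetical model answer.
source = "Revenue was $1,234.5 million, up 8% from the prior-year quarter."
answer = "Revenue of $1,234.5 million grew 9% year over year."

flagged = unsupported_numbers(answer, source)
print("Not found in source:", sorted(flagged))
```

Anything the check flags is a number to trace back by hand before it goes anywhere near a decision.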
It doesn’t know what’s material
A good analyst reading a 10-K knows that one paragraph in the MD&A about a specific customer concentration matters more than four pages of risk factors that every company in the sector includes verbatim. She knows the auditor’s report is boilerplate ninety-nine percent of the time and that the one time it isn’t, that’s the most important page in the document. She knows the segment disclosures often tell you more than the headline numbers.
The model doesn’t have any of that. It tends to treat every sentence in a filing as roughly equally weighted, because nothing in the text itself marks one line as more material than another. When you ask it to summarize a 10-K, it gives you a competent overview that hits the major sections and misses the specific lines that would actually move your thesis. The summary is correct. It just isn’t useful, because materiality isn’t a property of the text. It’s a property of what an experienced reader knows to look for.
This is the place where the gap between AI and a trained analyst is largest, and it’s also the place that’s hardest to see from the outside. The output looks like analysis. It has the texture of analysis. What it doesn’t have is judgment about which facts in the document deserve attention, and that judgment is most of what a good analyst is being paid for.
The way I use AI now reflects this. I treat it as a powerful search tool for finding information across documents I’ve supplied, not as an analyst for telling me what the information means. If I ask it whether a company has mentioned a specific topic in the last four quarters, the answer is fast and reliable. If I ask it what’s most important in the most recent quarter, the answer is generic and shaped like every other answer. The first question is a lookup within documents I’ve already given it, and the model is great at that. The second is a judgment problem and the model has no judgment to apply.
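The “has this topic come up in the last four quarters” question is mechanical enough to sketch as plain text search, which is part of why it’s a safe thing to delegate. A minimal Python illustration follows; the function name and the transcript snippets are hypothetical, and a real version would need to handle synonyms and phrasing that a literal substring match misses.

```python
import re


def mentions_by_quarter(transcripts: dict[str, str], topic: str) -> dict[str, list[str]]:
    """Return, per quarter, the sentences that mention the topic (case-insensitive)."""
    needle = topic.lower()
    hits: dict[str, list[str]] = {}
    for quarter, text in transcripts.items():
        # Split on whitespace that follows sentence-ending punctuation.
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        matched = [s for s in sentences if needle in s.lower()]
        if matched:
            hits[quarter] = matched
    return hits


# Hypothetical snippets standing in for four quarters of call transcripts.
transcripts = {
    "Q1": "Demand was steady. Pricing held across regions.",
    "Q2": "We saw tariff pressure on imported components. Margins dipped.",
    "Q3": "Tariff costs were passed through to customers.",
    "Q4": "Backlog grew. Hiring slowed.",
}

hits = mentions_by_quarter(transcripts, "tariff")
for quarter, sentences in hits.items():
    print(quarter, "->", sentences)
```

Notice what the function cannot do: it returns every mention with equal standing and has no way to say which one matters. That part stays with the reader.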
The shared failure mode
By the time you notice the model was wrong, you’ve already used it to make a decision. That’s true for the arithmetic, the outdated facts, and the missed materiality, and it’s the reason all three matter more than they would in a domain with faster feedback loops, where a wrong answer shows up before you’ve acted on it.
The time savings are real. The week I used to spend reading filings and building competitor matrices is now a couple of hours. The synthesis across documents I’ve uploaded is excellent. The drafting of bull and bear cases as a starting point is useful. The arithmetic, the currency of the information, and the judgment about what matters are still on me.
Anyone selling you AI as a complete research tool is either not using it on real positions or hoping you don’t notice when the model is wrong.
What to Read Next
📖 Co-Intelligence by Ethan Mollick. Mollick’s “jagged frontier” framing is that AI is shockingly capable in some domains and shockingly weak in others. If you’re going to use these tools seriously, start here.
📖 Superforecasting by Philip Tetlock. Tetlock’s research on what separates accurate forecasters from confident ones is directly relevant to working with AI, because how confident the output sounds tells you little about whether it’s accurate, and you need a framework for judging it that doesn’t depend on how persuasive it is.
📖 The Signal and the Noise by Nate Silver. Silver’s chapters on how experts in different fields handle uncertainty are useful background for understanding why “the model sounds confident” is the worst possible reason to trust a number.
As an Amazon Associate, I earn from qualifying purchases.


