Why Chatbots Fail Without Governance
"Copilot sucks."
That's not a complaint about Microsoft. It's what a CTO at a mid-sized asset manager told us after six months of trying to get useful answers out of their AI deployment. The tool was technically functional. The underlying data was not.
This is the story playing out at financial services firms everywhere right now. Sophisticated AI tools deployed on top of data that was never prepared for retrieval. The result: a tool that answers questions with confidence while getting the answers wrong, and an organization that has quietly learned not to trust it.
The failure mode has a name. It's called being "confidently wrong." And it's more dangerous than no AI at all.
The Problem Isn't the Model
When a financial services AI deployment underperforms, the instinct is to blame the tool. Get a better model. Switch vendors. Add more compute.
The actual problem, in 95% of cases, is the data (Deloitte). Not the model sitting on top of it.
Here's what that looks like in practice: A portfolio manager asks their AI tool, "What was our exposure to regional bank debt during the 2022 rate cycle?" The tool returns an answer. The answer cites sources. The answer is wrong because the documents it retrieved were from a SharePoint folder with inconsistent naming conventions, no metadata tags, and a mix of current and outdated versions.
The model did exactly what it was designed to do. It retrieved content and synthesized an answer. What it couldn't do was distinguish between "the definitive 2022 rate cycle report" and "a draft someone uploaded and forgot about."
That's a governance problem. Not a model problem.
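To make the distinction concrete, here is a minimal sketch of what the model was missing: a rule for picking the authoritative copy from a set of retrieved candidates. The `Document` fields, the "final"/"draft" status vocabulary, and the document ids are all assumptions for illustration, not a description of any particular system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    doc_id: str    # hypothetical identifier
    title: str
    status: str    # assumed vocabulary: "final" or "draft"
    version: int
    as_of: date


def authoritative(candidates):
    """Return the highest-version final document, or None if only drafts exist."""
    finals = [d for d in candidates if d.status == "final"]
    if not finals:
        return None
    # Prefer the highest version; break ties with the most recent date.
    return max(finals, key=lambda d: (d.version, d.as_of))


candidates = [
    Document("rc-2022-draft", "2022 rate cycle report (draft)", "draft", 3, date(2022, 11, 2)),
    Document("rc-2022-v1", "2022 rate cycle report", "final", 1, date(2022, 12, 15)),
    Document("rc-2022-v2", "2022 rate cycle report", "final", 2, date(2023, 1, 20)),
]

chosen = authoritative(candidates)
print(chosen.doc_id)
```

The rule itself is trivial. It only works if every document carries a status and a version in the first place, which is exactly what an untagged folder lacks.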
Why Financial Data Is Uniquely Difficult
Every industry has data challenges. Financial services has a particular set that makes AI governance harder than most.
The data is distributed. Holdings sit in Bloomberg. Transactions sit in SEI or State Street. Research notes live in SharePoint. Compliance precedents are in email archives. Each system has its own schema, its own naming conventions, and its own access model. When a firm deploys a generic LLM across this environment, the model reaches some of the data and misses most of it.
The data is time-sensitive. A research note from 2021 and a research note from 2024 about the same position may say opposite things. An AI that can't distinguish between them (by date, by author, by version) will synthesize contradictory information and return a confident-sounding answer.
The data is sensitive. Holdings, transaction history, and client data are subject to strict access controls. An AI that doesn't respect existing RBAC permissions isn't just technically broken; it's a compliance risk.
One operations leader we spoke with described having "160,000 files from SharePoint with zero tags." There's nothing a language model can do with that corpus that will produce trustworthy answers. The problem is upstream.
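A rough way to measure how far upstream the problem sits is to audit metadata before anything gets indexed. The sketch below assumes a hypothetical set of required fields (`tags`, `version`, `as_of`, `owner`) and made-up file names; the right fields will differ by firm.

```python
# Assumed required metadata; adjust to the firm's own tagging standard.
REQUIRED_FIELDS = ("tags", "version", "as_of", "owner")


def missing_fields(metadata):
    """List the required fields that are absent or empty for one document."""
    return [f for f in REQUIRED_FIELDS if metadata.get(f) in (None, "", [])]


def audit(corpus):
    """Split a corpus into indexable document ids and a report of incomplete ones."""
    ready, blocked = [], {}
    for doc_id, metadata in corpus.items():
        gaps = missing_fields(metadata)
        if gaps:
            blocked[doc_id] = gaps
        else:
            ready.append(doc_id)
    return ready, blocked


# Hypothetical two-file corpus: one tagged document, one untagged upload.
corpus = {
    "research/regional-banks.docx": {
        "tags": ["credit", "banks"],
        "version": 2,
        "as_of": "2024-03-01",
        "owner": "research",
    },
    "Copy of Final_v3 (2).docx": {"tags": [], "version": None, "as_of": None, "owner": None},
}

ready, blocked = audit(corpus)
print(ready)
print(blocked)
```

Run against a real export, a report like `blocked` gives a firm a number to work with: how much of the corpus is not yet fit for retrieval, and which fields are missing.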
What Governance Actually Means for LLMs
Governance in the context of LLM deployments means something specific: ensuring the model only retrieves from a structured, access-controlled corpus and returns answers with cited, auditable sources.
It breaks down into three things:
First: the data must be prepared. Every document in the retrieval corpus needs to be ingested, tagged, versioned, and indexed. This is the work most firms skip because it's not the exciting part. But it's the part that determines whether the AI returns accurate answers or fabricates plausible-sounding ones.
Second: the retrieval layer must be structured. Keyword search is not enough. A vector-based retrieval layer lets the AI find conceptually relevant documents, not just ones that contain the exact words in the query. Snowflake Cortex handles this natively for firms already on the Snowflake platform, eliminating the need for a third-party AI layer.
Third: the output must be governed. The model's answers should be grounded in retrieved documents, not in training data. Every response should cite its sources. Every query should be logged. Every answer should be access-controlled so a portfolio manager and a compliance officer get the information they're each authorized to see, not everything in the corpus.
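The second and third points can be sketched together in a few lines: rank documents by vector similarity, but only among documents the caller's role may see, and record every query with the sources it returned. Everything here is an assumption for illustration. The three-dimensional vectors are hand-made stand-ins for real embeddings, the role names and document ids are invented, and this is plain Python rather than any vendor's API.

```python
import math
from datetime import datetime, timezone

# Hypothetical index. Real vectors would come from an embedding model.
INDEX = [
    {"doc_id": "research-2024-regional-banks", "roles": {"pm", "compliance"}, "vector": [0.9, 0.1, 0.0]},
    {"doc_id": "compliance-precedent-117", "roles": {"compliance"}, "vector": [0.8, 0.2, 0.1]},
    {"doc_id": "office-move-memo", "roles": {"pm", "compliance"}, "vector": [0.0, 0.1, 0.9]},
]
AUDIT_LOG = []


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector, role, top_k=2):
    """Return the ids of the top_k most similar documents the role may see."""
    # Access control is applied before ranking, so restricted documents
    # never reach the model at all.
    permitted = [d for d in INDEX if role in d["roles"]]
    ranked = sorted(permitted, key=lambda d: cosine(query_vector, d["vector"]), reverse=True)
    citations = [d["doc_id"] for d in ranked[:top_k]]
    # Every query is logged with the sources it surfaced.
    AUDIT_LOG.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "citations": citations,
    })
    return citations


query = [1.0, 0.1, 0.0]  # stand-in for an embedded question
pm_sources = retrieve(query, "pm")
compliance_sources = retrieve(query, "compliance")
print(pm_sources)
print(compliance_sources)
```

The same question returns different sources for the two roles, and the audit log holds one entry per query. Filtering before ranking is a deliberate choice in this sketch: a document the user cannot see should not influence the answer.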
The Three Ways AI Deployments Break
Firms that skip governance fail in predictable ways.
The first failure: no data foundation. The model has access to raw, unstructured documents with no consistent tagging or versioning. It retrieves whatever it can find, regardless of relevance or accuracy.
The second failure: no retrieval architecture. The firm uses keyword search on top of an unindexed corpus. The model finds documents that match the words in the query but misses the most relevant sources because they used different terminology.
The third failure: no output governance. The model generates answers from training data when it can't find relevant documents in the corpus, a behavior called hallucination. Without a grounding architecture, the model can't distinguish between "I found relevant data" and "I'm going to synthesize something plausible."
Each failure is fixable. None of them are fixed by buying a better model.
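The third fix can be as simple as a gate in front of the model. The sketch below assumes retrieval has already scored each passage, and it refuses to answer when nothing clears a threshold. The `MIN_SCORE` value, the passage ids, and the `fake_generate` stand-in for a model call are all hypothetical.

```python
MIN_SCORE = 0.75  # assumed threshold; a real one would need tuning


def grounded_answer(question, scored_passages, generate):
    """Answer only from passages that clear MIN_SCORE; otherwise decline.

    scored_passages: list of (doc_id, text, score) tuples from retrieval.
    generate: callable taking (question, texts) and returning a string.
    """
    supported = [(doc_id, text) for doc_id, text, score in scored_passages if score >= MIN_SCORE]
    if not supported:
        # Say so explicitly instead of letting the model improvise.
        return {"answer": None, "sources": [], "note": "No supporting documents found in the corpus."}
    texts = [text for _, text in supported]
    return {
        "answer": generate(question, texts),
        "sources": [doc_id for doc_id, _ in supported],
        "note": "",
    }


def fake_generate(question, texts):
    """Stand-in for a model call; it simply echoes the retrieved text."""
    return " ".join(texts)


weak = grounded_answer("Q?", [("memo-1", "Placeholder passage.", 0.12)], fake_generate)
strong = grounded_answer("Q?", [("exposure-2022", "Placeholder passage.", 0.91)], fake_generate)
print(weak["note"])
print(strong["sources"])
```

With this gate in place, "I found relevant data" and "I found nothing" become two different code paths rather than two answers that look the same to the user.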
The next post in this series lays out the specific framework (data foundation, retrieval architecture, output governance) that the firms getting this right are building. If you'd rather start now, the Knowledge Assistant Playbook has the full architecture in a format designed for internal review.