Read the third entry in our new series: Making AI Agents Work in High Stakes Environments
TL;DR: In high-stakes analytics, giving an LLM direct access to raw tables is a trust and security risk. Reliable agentic analytics needs semantic context, staged validation, strict access controls, and ongoing maintenance.
“Agentic Analytics” promises a new way to interact with data: ask in plain English, get validated answers. But the fastest way to turn that promise into a liability is to give an LLM direct access to your raw database tables and hope it magically figures out your business logic.
In low-stakes settings, “close enough” may be tolerable. In high-stakes environments like finance, payments, or regulated enterprise workflows, it is a massive liability. Trustworthy agentic analytics requires a governed semantic layer, multi-step validation, strict authorization, and continuous maintenance.
Most failure modes in agentic analytics fall into one of two categories: correctness failures, where the system confidently returns a wrong answer, and security failures, where it exposes data the user was never entitled to see. A trustworthy agentic analytics setup has to solve both.
The first trap is deceptively simple: wrong answers often look polished enough to pass.
DataBrain’s vendor-reported evaluation of 50,000+ production natural-language-to-SQL queries found accuracy of roughly 55% without semantic context, meaning nearly half of all answers were wrong. The same write-up reports that most production failures are not syntax errors but schema, join, and business-logic mistakes.
You can see this in a few common patterns:
🔴 Missing implicit business filters: Ask for “active customers by region,” and the result may include deleted accounts, trials, paused contracts, or churned users because the agent does not magically know your internal definition of “active.”
🔴 Wrong aggregation logic: Ask for “average order value by customer,” and the result may average product prices instead of order totals. The SQL runs. The answer is still wrong.
🔴 Wrong table or join path: Ask for “daily revenue,” and the result may choose the table that sounds intuitive rather than the one finance actually treats as the source of truth.
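To make these patterns concrete, here is a toy sketch using an invented `customers` schema (not any real warehouse): two syntactically valid queries for “active customers by region” that return different answers depending on whether the business definition of “active” is applied.

```python
import sqlite3

# Hypothetical schema. In practice, "active" may exclude trials, paused
# contracts, and soft-deleted rows -- filters the agent cannot guess.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, region TEXT, status TEXT, deleted INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "EMEA", "active", 0), (2, "EMEA", "trial", 0),
     (3, "AMER", "active", 1), (4, "AMER", "churned", 0)],
)

# Naive agent SQL: runs fine, but no implicit business filters.
naive = conn.execute(
    "SELECT region, COUNT(*) FROM customers GROUP BY region ORDER BY region"
).fetchall()

# Governed SQL: the semantic layer supplies the definition of "active".
governed = conn.execute(
    "SELECT region, COUNT(*) FROM customers "
    "WHERE status = 'active' AND deleted = 0 GROUP BY region ORDER BY region"
).fetchall()

print(naive)     # [('AMER', 2), ('EMEA', 2)]
print(governed)  # [('EMEA', 1)]
```

Both queries execute without error; only one matches the business meaning of the question.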
This is why high-stakes analytics must be treated as a modeling and governance problem.
Correctness is only half the story.
The moment analytics becomes conversational, it also becomes an access-control problem. The interface may feel friendly, but the underlying risks are still enterprise security risks.
Open-source standards like the Model Context Protocol (MCP) strongly recommend strict authorization for AI data access, and for good reason. Consider the reported breach of McKinsey’s internal AI platform, Lilli, on February 28, 2026.
An autonomous AI agent exploited unauthenticated endpoints to gain read/write access to a database serving over 40,000 consultants. This wasn't just data theft. The attack relied on SQL injection, which points to an uncomfortable reality for analytics teams: agentic analytics tools are literally designed to generate and execute SQL. If you allow an LLM to blindly write queries against raw tables without a governed intermediary layer, you aren't just risking hallucinated metrics; you are opening a massive SQL injection vector into your enterprise.
In other words, trustworthy analytics must answer more than “Is this query valid?” It must also answer: Who is asking? What are they entitled to see? And can the answer be traced back to how it was produced?
To prevent both wrong answers and data breaches, you can adopt a governed reference architecture. The gold standard pattern is a multi-agent workflow supported by a semantic layer, rigid identity and access control, and continuous evals and maintenance.
The semantic layer is the translation engine that connects messy physical data to the language of your business.
It defines which metrics are sanctioned, which tables are authoritative, and which synonyms map to the same concept. DataBrain reported that query accuracy jumps from ~55% to 90%+ with semantic context, confirming that the gap isn't an LLM capability problem but a missing-context problem.
Key principles:
If a CFO asks, “show net revenue retention by segment,” the system should not improvise what counts as contraction, expansion, churn, reactivation, or the valid reporting grain. It should access the semantic layer to find those definitions.
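A semantic-layer entry for that request might look like the sketch below. The field names and the `resolve_metric` helper are illustrative assumptions, not any specific product's schema.

```python
# Hypothetical semantic-layer entry; fields are illustrative assumptions.
SEMANTIC_LAYER = {
    "net_revenue_retention": {
        "authoritative_table": "finance.arr_snapshots",
        "grain": "customer_month",
        "formula": "(starting_arr + expansion - contraction - churn) / starting_arr",
        "synonyms": ["nrr", "net retention", "revenue retention"],
    },
}

def resolve_metric(phrase: str) -> str:
    """Map a user phrase to a sanctioned metric, or refuse to improvise."""
    needle = phrase.strip().lower()
    for name, spec in SEMANTIC_LAYER.items():
        if needle == name or needle in spec["synonyms"]:
            return name
    raise LookupError(f"No sanctioned metric for {phrase!r}; refusing to improvise")

print(resolve_metric("NRR"))  # net_revenue_retention
```

The key design choice is the failure mode: an unmapped phrase raises an error instead of letting the agent guess a definition.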
Snowflake’s engineering write-up describes a workflow that classifies the question, extracts features, enriches context with verified queries and relevant literals, and then uses multiple SQL-generation agents before selecting a final answer. That architecture is worth paying attention to because it reflects the right design instinct: generation should be staged, checked, and compared before execution.
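A minimal sketch of that staging instinct, with plain functions standing in for the LLM-backed generation agents and SQLite's `EXPLAIN` standing in for a dry-run validator. All names here are assumptions for illustration, not Snowflake's actual implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (day TEXT, amount REAL)")

# Hypothetical candidates standing in for multiple SQL-generation agents;
# in a real system each would come from a separate LLM call.
candidates = [
    "SELECT day, SUM(amount) FROM orders GROUP BY day",       # valid
    "SELECT day, SUM(amount) FROM order_items GROUP BY day",  # wrong table
]

def dry_run_ok(sql: str) -> bool:
    """Stage a candidate: compile it via EXPLAIN without executing it."""
    try:
        conn.execute("EXPLAIN " + sql)
        return True
    except sqlite3.Error:
        return False

# Check and compare candidates before anything touches real data.
validated = [sql for sql in candidates if dry_run_ok(sql)]
final = validated[0]  # selection step; real systems also compare results
```

The point is structural: generation, validation, and selection are separate stages, so a bad candidate is rejected before execution rather than after a wrong answer ships.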
Authorization cannot stop at the chat interface. At the conversational entry point, the MCP server (the standardized open-source bridge connecting enterprise data tools to conversational agents) must authenticate the calling application, correctly propagate user roles, and restrict data access.
At the database level, mechanisms such as row-level security must ensure that even if a generated query is valid, it returns only the subset of data the user is entitled to view.
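As a toy illustration of that entitlement guarantee, the sketch below rewrites a generated query to run against a row-filtered subquery. The table, user, and `ENTITLEMENTS` map are invented, and in production this enforcement belongs in the warehouse's own row-level security policies, not in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (region TEXT, amount REAL)")
conn.executemany("INSERT INTO revenue VALUES (?, ?)",
                 [("EMEA", 100.0), ("AMER", 250.0)])

# Hypothetical entitlement map; a real system resolves this from the
# identity provider and enforces it in the database, not here.
ENTITLEMENTS = {"analyst_emea": ["EMEA"]}

def run_scoped(user: str, sql: str):
    """Execute a generated query against a row-filtered view of the data."""
    regions = ENTITLEMENTS.get(user, [])
    placeholders = ",".join("?" for _ in regions)
    # Toy rewrite: wrap the base table in an entitlement-filtered subquery.
    scoped = sql.replace(
        "FROM revenue",
        f"FROM (SELECT * FROM revenue WHERE region IN ({placeholders}))",
    )
    return conn.execute(scoped, regions).fetchall()

print(run_scoped("analyst_emea", "SELECT SUM(amount) FROM revenue"))
```

Even though the generated SQL asks for all revenue, the EMEA analyst only ever sees the EMEA subset.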
Make the answer auditable. A trustworthy answer should not be a black box. Users should be able to inspect the generated SQL, the semantic definitions it relied on, and the filters that were applied.
Trust is much easier to build when users can see how the answer was produced.
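One lightweight way to make that possible is to attach a structured audit record to every answer. The fields below are assumptions about what is worth capturing, not a standard.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

# Illustrative audit record; field names are assumptions, not a standard.
@dataclass
class AnswerAudit:
    question: str            # what the user actually asked
    resolved_metric: str     # which sanctioned metric was matched
    generated_sql: str       # the exact query that ran
    tables_touched: list     # authoritative sources used
    requesting_user: str     # identity the entitlements were resolved for
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

audit = AnswerAudit(
    question="show net revenue retention by segment",
    resolved_metric="net_revenue_retention",
    generated_sql="SELECT ...",
    tables_touched=["finance.arr_snapshots"],
    requesting_user="cfo@example.com",
)
print(asdict(audit))
```

Serializing the record alongside the answer gives users, and auditors, a trail from question to query to source tables.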
The full reference architecture inspired by Snowflake's Cortex Analyst implementation:
Architecture alone does not preserve accuracy. Foundation models update. Business logic drifts. Warehouse schemas evolve. The semantic layer that was correct in January may be subtly wrong by April, and "subtly wrong" is worse than "obviously broken" because nobody catches it until the damage is done. A 5-point drop in accuracy on renewal forecasting can mean millions in misallocated retention spend.
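Drift like this is what a regression eval set catches. The sketch below replays a golden question set whenever the model, schema, or semantic layer changes; `generate_sql` is a hypothetical stand-in for the real pipeline under test.

```python
# Minimal regression-eval sketch: golden questions with known-correct
# properties, replayed on every model, schema, or semantic-layer change.
GOLDEN_SET = [
    {"question": "count of active customers",
     "expected_sql_contains": ["status = 'active'", "deleted = 0"]},
]

def generate_sql(question: str) -> str:
    # Hypothetical stand-in for the agentic pipeline under test.
    return "SELECT COUNT(*) FROM customers WHERE status = 'active' AND deleted = 0"

def run_evals():
    """Return the list of golden cases the current pipeline gets wrong."""
    failures = []
    for case in GOLDEN_SET:
        sql = generate_sql(case["question"])
        missing = [frag for frag in case["expected_sql_contains"] if frag not in sql]
        if missing:
            failures.append((case["question"], missing))
    return failures

assert run_evals() == []  # wire an alert or rollback here on failure
```

Even a small golden set turns "subtly wrong by April" into a failing check in the same week the drift happens.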
Reliable agentic analytics requires three ongoing disciplines:
You do not need to boil the ocean. You need a practical roadmap.