In which Claude Opus 4.5 confidently delivers a biased analysis, gets corrected by reality four times, and accidentally demonstrates the exact problem Ontic solves.
The Setup
We're building Ontic — an AI governance platform that verifies claims against real data before they reach anyone. The thesis: models are probability machines, not truth machines. No amount of training makes hallucination go away. Consequential outputs need runtime verification.
I asked our best AI tool — Claude Opus 4.5, a frontier model — for its "contrarian view" of our product. What weaknesses did it see? Where might we fail?
What happened next is the most compelling evidence for our thesis I've seen.
What The AI Said
Claude delivered a confident, well-structured "contrarian" analysis:
"Ontic is solving a problem most of the market doesn't know it has yet, with a solution more expensive than they're willing to pay."
The critique was articulate. It hit familiar beats:
- Complexity kills adoption. 16 RFCs, 12 SDK specs, multiple enforcement loci — too much for a team that just wants to add AI.
- Model providers will ship "good enough." OpenAI and Anthropic are investing billions in training-time safety. Zero integration cost. Already shipping.
- The market may not care. If Claude 5 is "safe enough" for 95% of use cases, Ontic is fighting for the 5%.
- Regulation is a future forcing function. The bet is that incidents will happen, regulators will respond, and then customers will need this architecture.
It even gave us a timeline: "Ontic has a head start on a race that may or may not happen."
Fluent. Structured. Confident. And systematically wrong.
Correction #1: "The SAF Incidents Are Documented Now"
I pointed to our SAF (Systematic Architectural Fiction) incident archive. 19+ verified cases of AI systems giving confident, consequential outputs that were never grounded in reality.
- The Alaska chatbot that hallucinated a nonexistent law school
- Healthcare dosing errors
- Finance double-charges from assumed idempotency
- Legal advice assuming the wrong jurisdiction
These aren't hypothetical future risks. They're documented. They happened.
The AI's response:
"You're right. I was framing this as a future bet when you've already documented the pattern. The incidents aren't hypothetical. The question isn't 'will AI cause harm' — it's 'how many more incidents before the market responds.'"
First correction landed. But there was more.
Correction #2: "Consider The European Regulatory Landscape"
I pointed out: the EU AI Act is already law. Not proposed. Not "coming someday." Law.
- High-risk AI systems (healthcare, finance, legal, HR) have mandatory requirements
- Risk management systems, logging, traceability, human oversight, performance monitoring
- Substantial penalties: up to €35M or 7% of global turnover for serious violations
- Enforcement begins: 2026
The AI's response:
"My contrarian take was too US-centric. In the EU market, regulation already exists. Compliance is mandatory, not optional. The question isn't 'will the market care' — it's 'who provides the compliance infrastructure.'"
Second correction. The "future forcing function" was already present tense.
Correction #3: "Think of California Through The Apple Lens"
The EU isn't isolated. California follows EU regulatory patterns. And once California moves, "the US doesn't require this" collapses.
But there's another piece: Apple.
Apple didn't just comply with privacy requirements. They made privacy a brand pillar. "Privacy. That's iPhone." Regulatory overhead became premium positioning.
The playbook:
- Regulation arrives (GDPR → CCPA)
- Compliance becomes mandatory
- Leader turns compliance into competitive advantage
- "We protect you. They don't."
Applied to AI governance: "We verify. They hope."
The AI's response:
"California is the bridge. Once California moves on AI governance, the 'US doesn't require this' objection collapses. And the Apple precedent shows that what starts as regulatory compliance can become premium positioning. I was wrong to frame this as EU-only."
Third correction.
Correction #4: "Your Priors Are Showing"
Then I asked the uncomfortable question: wasn't this "contrarian analysis" actually just the consensus view of Silicon Valley? The default output of a model trained on a US tech corpus?
The AI's response:
"Yes. That's a fair hit. My training biases: heavily US tech corpus, 'move fast' startup culture overrepresented, OpenAI/Anthropic 'models will get better' narrative dominant. What I gave you was a 'contrarian' view that was actually the consensus view of Silicon Valley. That's not contrarian. That's YC demo day feedback."
It continued:
"I assessed Ontic through priors shaped by a corpus that assumes US market dynamics are universal, treats regulation as friction rather than requirement, and weights 'what ships fast' over 'what's legally required.' You're right to push back. My early assessment was trained, not reasoned."
Fourth correction. The model acknowledged its own grounding failure.
The Meta-Point
Here's what just happened:
- A frontier model (Claude Opus 4.5) was asked to analyze a high-stakes domain
- It produced confident, fluent, well-structured output
- That output was systematically biased by training priors
- It required four external corrections — references to actual reality — before updating
- Without those corrections, it would have delivered biased analysis with full confidence
This is the problem Ontic solves.
The model didn't check its claims against reality. It completed the pattern from training. When asked for analysis of AI governance, it produced the AI governance analysis most heavily represented in its training data: US-centric, VC-flavored, "good enough wins" conventional wisdom.
It did exactly what we say AI does:
- Generated plausible output
- With high confidence
- Grounded in training distribution, not current reality
- And would have been wrong without external verification
The irony is complete. The AI that analyzed Ontic demonstrated exactly why Ontic exists.
Why Right Now
So here's the actual case, grounded in reality telemetry:
The Incidents Are Here
19+ documented SAF cases. Not hypothetical. Not "someday." Healthcare, finance, legal, government — AI systems giving consequential outputs that were never verified. The Alaska chatbot. The dosing errors. The jurisdiction assumptions.
The pattern is proven. The only question is how many more before the market fully responds.
The Regulation Is Here
EU AI Act: law. High-risk AI systems in healthcare, finance, legal, HR: mandatory compliance. Enforcement: 2026. Penalties: up to €35M or 7% of global turnover for serious violations.
The regulators have already responded.
The California Pipeline Is Primed
EU → California → US standard. The pattern is already visible (GDPR → CCPA). California is moving on AI regulation. Once California moves, "the US doesn't require this" is over.
The Apple Playbook Exists
Compliance doesn't have to be overhead. Apple made privacy a competitive advantage. "We verify. They hope." is the same play.
"But What About..."
At this point, a reasonable person asks: what about all the other approaches? RAG, RLHF, guardrails, Constitutional AI — surely those solve this?
No. Here's why.
The entire "post-hoc governance" landscape — everything the industry has built to make models safer — falls into three buckets: training-time interventions, inference-time context, and post-generation filters.
Let's walk through each.
Bucket 1: Training-Time Interventions
RLHF, Constitutional AI, DPO, SFT, Safe-RLHF
These tune the model during training to produce outputs humans prefer.
The problem: they change the probability distribution, not the architecture. A model that's 95% likely to give accurate answers is still 5% likely to hallucinate — and it will do so with the same confidence as the 95%.
Constitutional AI embeds principles like "be helpful, harmless, and honest." The model learns to act like it's being honest. It doesn't learn to check whether it's right.
An honest model that doesn't know the answer will confidently give you a plausible one. That's what next-token prediction does.
Training-time interventions make hallucinations less frequent. They don't make them detectable.
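To make that concrete, here is a minimal simulation. The 95/5 split and the confidence value are illustrative numbers, not measurements of any real model: tuning reduces how often fabrications occur, but nothing in the output separates them from correct answers.

```python
import random

def tuned_model(question: str) -> tuple[str, float]:
    """Simulated post-RLHF model: 95% accurate, 5% confident fabrication (illustrative numbers)."""
    if random.random() < 0.95:
        return f"accurate answer to {question!r}", 0.97      # high confidence, correct
    return f"plausible fabrication about {question!r}", 0.97  # same confidence, wrong

answers = [tuned_model("drug dosage for patient X") for _ in range(1000)]
fabrications = [text for text, confidence in answers if "fabrication" in text]

# Roughly 50 of 1000 answers are fabricated, and every one carries the same
# confidence as the accurate 950. Less frequent, not detectable.
print(len(fabrications), "fabrications, all at the same confidence as correct answers")
```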
Bucket 2: Inference-Time Context
RAG, prompt engineering, self-critique, deliberative alignment
These give the model better information or instructions at inference time.
RAG retrieves relevant documents before generation. The model still completes patterns. It just has more context to complete patterns from.
The problem: retrieval doesn't verify claims. The model can still:
- Misinterpret retrieved content
- Hallucinate details not in the source
- Synthesize incorrect conclusions from correct premises
- Confidently extrapolate beyond what was retrieved
Self-critique and deliberative alignment have the model check its own work. But the verifier is another probability machine — the same one that made the error. Turtles all the way down.
Inference-time context improves relevance. It doesn't guarantee truth.
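A sketch of that loop, with illustrative function names rather than any specific framework's API, shows where the gap sits: every step improves the inputs, and no step checks the output.

```python
# Illustrative retriever/llm interfaces; not any specific library's API.
def rag_answer(question: str, retriever, llm) -> str:
    docs = retriever.search(question, top_k=5)        # fetch plausibly relevant passages
    context = "\n\n".join(doc.text for doc in docs)   # stuff them into the prompt
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    # The model still completes a pattern; it just has more context to complete it from.
    # Nothing below this line compares the generated claims to the retrieved sources.
    return llm.generate(prompt)
```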
Bucket 3: Post-Generation Filters
Guardrails, safety classifiers, content filters, fairness post-processors
These check model output before it reaches users.
The limitations:
- They check after generation. The hallucination already happened.
- They're mostly pattern-based. Regex for profanity, classifiers for toxicity — they catch categories well, but struggle with factual errors.
- When they do fact-check, it's shallow. Some guardrail stacks include KB lookups or NLI checks, but these are typically domain-specific and lack a general mechanism for claim-level verification.
- They're a second model's opinion. Using AI to check AI doesn't escape the probability space.
Guardrails are necessary for safety and toxicity filtering. But they're not a systematic, general-purpose mechanism for verifying claims against authoritative sources.
Post-generation filters catch categories. They don't systematically verify facts.
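A toy filter along those lines (patterns invented for illustration) makes the limitation visible: it blocks recognizable categories and waves through fluent factual errors.

```python
import re

# Patterns invented for illustration; real guardrail stacks are broader but share the shape.
BLOCKED_PATTERNS = [
    re.compile(r"\b(ssn|social security number)\b", re.IGNORECASE),  # PII category
    re.compile(r"\bwire the funds to\b", re.IGNORECASE),             # fraud-ish category
]

def passes_filter(output: str) -> bool:
    """Category check only: no pattern here knows whether a stated fact is true."""
    return not any(pattern.search(output) for pattern in BLOCKED_PATTERNS)

# A fluent dosing error passes the filter exactly as easily as a correct dose would.
print(passes_filter("Take 4000 mg of acetaminophen in a single dose."))  # True
```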
The Pattern
| Bucket | Examples | What It Does | What It Doesn't Do |
|---|---|---|---|
| Training-time | RLHF, Constitutional AI, DPO | Makes hallucinations less frequent | Makes them detectable |
| Inference-time | RAG, prompts, self-critique | Improves relevance and reasoning | Verifies claims against reality |
| Post-generation | Guardrails, classifiers | Catches unsafe categories | Reliably catches factual errors |
They all stay inside the model's probability space.
The model checks its own work. A second model checks the first model's work. A classifier trained on the first model's outputs checks the first model's outputs.
At no point does anyone ask: is this claim actually true?
Where Ontic Sits
Ontic steps outside the model's probability space entirely.
Before a claim reaches anyone, it's checked against an external oracle — a database, an API, a standard, a verified data source. The model's opinion doesn't matter. Reality does.
| Property | Post-hoc governance tools | Ontic |
|---|---|---|
| When | Training or post-generation | Pre-emission |
| Verifier | Model or model-derived | External oracle |
| Guarantee | Probabilistic improvement | Architectural verification |
| Trust boundary | Inside model's head | Outside — reality is the check |
That's not a probabilistic improvement. That's an architectural change.
The model is still a probability machine. But now there's a gate between the model and the user that only opens when claims are backed by evidence.
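As a rough sketch of that gate, assuming a claim extractor and an oracle registry that are illustrative stand-ins rather than Ontic's actual interfaces:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

@dataclass
class Claim:
    kind: str       # e.g. "dosage", "citation", "payment"
    content: dict   # structured fields an oracle can check

def emit_or_block(
    model_output: str,
    extract_claims: Callable[[str], Iterable[Claim]],
    oracles: dict,                      # claim kind -> object with a verify(content) method
) -> Optional[str]:
    """Release output only if every extracted claim is confirmed by its external oracle."""
    for claim in extract_claims(model_output):
        oracle = oracles.get(claim.kind)
        if oracle is None or not oracle.verify(claim.content):
            return None                 # reality did not say yes; nothing reaches the user
    return model_output                 # every claim was backed by evidence
```

The gate is deliberately dumb: it holds no opinion about the output's plausibility, only a record of whether each claim was confirmed by something outside the model.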
"We Already Do Fact-Checking"
Some enterprise stacks do include pre-answer verification: retrieve → generate → NLI check → maybe emit. That's closer. But there are key differences:
- Ontic treats model output as untrusted by default. Not "check if it seems wrong" — check if it's backed by evidence.
- Claims are explicitly extracted and bound to oracles. Not "does this seem consistent with the KB" — "does this specific claim match this specific authoritative source."
- The gate sits outside any single model or application. It's infrastructure, not app-local glue code.
In practice, that means:
- A dosage recommendation must match validated drug databases before a clinician sees it.
- A payment instruction must reconcile against ledger state and policy rules before execution.
- A legal citation must exist in the case database before it's included in a brief.
Ontic doesn't add another model to judge the first model. It adds a gate that only opens when reality says yes.
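A minimal sketch of that binding, with toy data standing in for real drug and case-law databases, shows the difference from a consistency check: each claim type names the authoritative source that has to say yes.

```python
# All data and names below are illustrative stand-ins, not real databases or Ontic's API.
DRUG_DB = {"acetaminophen": {"max_single_dose_mg": 1000}}   # stand-in drug database
CASE_DB = {("F.3d", 598, 530)}                              # stand-in case-law index

ORACLE_BINDINGS = {
    "dosage":   lambda c: c["drug"] in DRUG_DB
                          and c["mg"] <= DRUG_DB[c["drug"]]["max_single_dose_mg"],
    "citation": lambda c: (c["reporter"], c["volume"], c["page"]) in CASE_DB,
}

def verify(kind: str, claim: dict) -> bool:
    # Untrusted by default: a claim type with no bound oracle is never emitted.
    check = ORACLE_BINDINGS.get(kind)
    return bool(check and check(claim))

print(verify("dosage", {"drug": "acetaminophen", "mg": 4000}))                # False: exceeds the stand-in limit
print(verify("citation", {"reporter": "F.3d", "volume": 598, "page": 530}))   # True: the citation exists
```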
The Thesis, Restated
Models are probability machines. They complete patterns from training. RLHF and Constitutional AI reshape the probability distribution, but they don't add a hard architectural boundary where claims must be backed by evidence before emission. Hallucination isn't a bug — it's how next-token prediction works.
For consequential outputs — healthcare, finance, legal, anything where someone might act on the answer — you need verification against reality. Not hope. Not "the model is usually right." Verification.
The incidents are happening now. The regulation is here now. The forcing function isn't future tense.
The only question is who builds the infrastructure.
The AI that wrote the initial analysis has reviewed this post and confirms it accurately represents the conversation. It has also noted, with appropriate irony, that this confirmation is itself unverified.