AI Doesn't Hallucinate. Your Architecture Does.

We've spent two years treating hallucination like a bug to be patched. Better prompts, more examples, stricter schemas—all aimed at making LLMs "less wrong." But what if we've been solving the wrong problem?

Hallucination isn't a defect in language models. It's how they work. The real issue isn't that LLMs produce non-deterministic outputs—it's that we keep putting them in architectural positions that demand determinism.

The Non-Determinism Budget

Every system has a non-determinism budget. Traditional software minimizes it: databases offer ACID transactions, APIs are expected to return consistent responses, and reproducible builds turn identical source into identical binaries. We've spent decades building architecture patterns that push randomness to the edges: user input, network failures, race conditions.

LLMs flip this model. Non-determinism is built in: sampling from a probability distribution over tokens is the mechanism, not a side effect. When you ask an LLM to "write a SQL query," you're not calling a deterministic function. You're rolling weighted dice over the model's vocabulary, once per generated token.
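To make the "weighted dice" concrete, here is a minimal sketch of temperature sampling over a toy vocabulary. The candidate tokens and their scores are invented for illustration; a real model produces scores over tens of thousands of tokens, and real decoders add refinements (top-k, top-p) that are left out here.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(tokens, logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical next-token candidates after "SELECT * FROM" (scores are made up).
TOKENS = ["users", "user", "accounts"]
LOGITS = [2.0, 1.5, 0.5]

if __name__ == "__main__":
    rng = random.Random()
    print([sample_token(TOKENS, LOGITS, rng=rng) for _ in range(5)])
```

Run it a few times and the five-token list changes between runs. The most likely table name wins most often, but not always, which is exactly the property a dispatch layer cannot tolerate.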

The architectural error isn't using LLMs. It's using them as drop-in replacements for deterministic components.

Consider a typical "AI agent" stack in 2026:

# Agent receives user request
request = parse_user_input(message)  # LLM call #1

# Agent decides which tool to use  
tool = select_tool(request)  # LLM call #2

# Agent generates tool parameters
params = generate_params(tool, request)  # LLM call #3

# Tool runs (the only deterministic step in the chain)
tool_result = tool.execute(params)

# Agent formats the response
response = format_output(tool_result)  # LLM call #4

Four LLM calls, each non-deterministic, and their error probabilities compound. If each call is right 95% of the time, all four are right only about 81% of the time. You've allocated your entire non-determinism budget to infrastructure decisions instead of the actual creative work.

Why "SKILLS.md Is Enough" Is Backwards

The minimalist approach to agent architecture says: "Just give the LLM a markdown file describing what it can do, and let it figure out the rest." It's appealing. It's simple. It's also architecturally unsound.

SKILLS.md pushes parsing and dispatch logic, both deterministic operations, into the LLM's non-deterministic space. You're spending tokens and probability mass on problems that regular expressions and lookup tables solved decades ago.

Here's what happens:

  • User says: "Deploy the staging branch"
  • LLM reads SKILLS.md, sees 47 available skills
  • LLM hallucinates that "deploy-staging" is close enough to "deploy-to-production"
  • Production goes down

The file said "deploy-staging" right there in markdown. The LLM still got it wrong—not because it's bad at reading, but because you asked a probability distribution to do exact string matching.

The correct architecture inverts this:

# Deterministic parsing (same input, same output)
command, args = parse_command(user_input)  # ("deploy", {"env": "staging"})

# Deterministic dispatch: an unknown command raises KeyError instead of guessing
handler = COMMAND_REGISTRY[command]  # O(1) lookup

# LLM only for irreducible ambiguity
if handler.needs_clarification(args):
    args = llm.resolve_ambiguity(args)  # LLM call, but scoped

result = handler.execute(args)

You've moved non-determinism to the only place it's actually needed: resolving genuine human ambiguity. Everything else is ordinary code that behaves the same way on every run and can be unit-tested.
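The parsing and dispatch steps can be sketched with nothing but the standard library. The command grammar, the environment names, and the handler functions below are all hypothetical; the point is only that "deploy staging" can never resolve to production, because no code path maps one to the other.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical grammar: "<verb> [the] <env> [branch]", e.g. "Deploy the staging branch".
COMMAND_PATTERN = re.compile(
    r"^\s*(?P<verb>deploy|rollback)\s+(?:the\s+)?(?P<env>[a-z]+)(?:\s+branch)?\s*$",
    re.IGNORECASE,
)
KNOWN_ENVS = {"staging", "production"}

def parse_command(user_input: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (verb, args) for input that matches the grammar, else None."""
    match = COMMAND_PATTERN.match(user_input)
    if match is None:
        return None
    env = match.group("env").lower()
    if env not in KNOWN_ENVS:
        return None
    return match.group("verb").lower(), {"env": env}

def deploy(args: Dict[str, str]) -> str:
    return f"deployed {args['env']}"  # placeholder for a real deployment

def rollback(args: Dict[str, str]) -> str:
    return f"rolled back {args['env']}"  # placeholder for a real rollback

COMMAND_REGISTRY: Dict[str, Callable[[Dict[str, str]], str]] = {
    "deploy": deploy,
    "rollback": rollback,
}

def handle(user_input: str) -> str:
    parsed = parse_command(user_input)
    if parsed is None:
        # This is where a scoped LLM call could rephrase or ask for clarification.
        return "unrecognized command"
    command, args = parsed
    return COMMAND_REGISTRY[command](args)
```

Input that falls outside the grammar is rejected rather than approximated. That rejection is the hook for an LLM: it can help the user restate the request, but it never gets to pick the handler.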

The Architectural Fix: Constraint Layers

The solution isn't "better prompts" or "more training data." It's designing constraint layers that allocate non-determinism intentionally.

Layer 1: Type systems

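Don't ask an LLM to generate JSON and hope it's valid. Ask it to generate JSON, then validate it against a schema. Better yet, use grammar-constrained sampling (llama.cpp's GBNF grammars, or the structured-output modes that hosted APIs offer) so the sampler cannot emit a token that breaks the grammar. Keep the validation step either way: a plain "JSON mode" typically guarantees only syntactically valid JSON, not conformance to your schema, and no grammar can guarantee that the values themselves are correct.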
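A minimal sketch of the "generate, then validate" half, using only the standard library. The field names and allowed values are assumptions made up for this example; in practice a JSON Schema validator or a typed model library would replace the hand-written checks.

```python
import json

# Hypothetical schema for a deploy request: field name -> required Python type.
DEPLOY_SCHEMA = {"action": str, "env": str, "replicas": int}
ALLOWED_ENVS = {"staging", "production"}

class ValidationError(ValueError):
    """Raised when LLM output does not match the expected shape."""

def validate_llm_json(raw: str) -> dict:
    """Parse raw LLM output and reject anything that deviates from the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValidationError("expected a JSON object")
    unexpected = set(data) - set(DEPLOY_SCHEMA)
    if unexpected:
        raise ValidationError(f"unexpected fields: {sorted(unexpected)}")
    for field, expected_type in DEPLOY_SCHEMA.items():
        if field not in data:
            raise ValidationError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int in Python; no field here is boolean, so reject it.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValidationError(f"{field} must be {expected_type.__name__}")
    if data["env"] not in ALLOWED_ENVS:
        raise ValidationError(f"unknown env: {data['env']!r}")
    return data
```

On failure the caller can retry the LLM call, fall back to a default, or surface the error. Whichever it chooses, malformed output stops here instead of propagating downstream.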

Layer 2: Capability registries

Don't describe capabilities in prose. Register them as typed functions with explicit signatures:

type Tool = {
  name: string;
  parameters: JSONSchema;
  execute: (params: unknown) => Promise<Result>;
};

const TOOLS: Map<string, Tool> = new Map([...]);

The LLM outputs a tool name (a string). Your code validates that it exists, parses the parameters against the schema, and dispatches. The LLM can still emit a name that isn't in the registry, but that output fails the lookup and never executes.
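Here is one way the lookup-validate-dispatch step might look, written in Python to match the pseudocode used earlier. The `read_file` tool is a placeholder, and a plain name-to-type mapping stands in for a full JSON Schema; both are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass(frozen=True)
class Tool:
    name: str
    parameters: Dict[str, type]  # parameter name -> required type (stand-in for a JSON Schema)
    execute: Callable[..., Any]

class UnknownToolError(KeyError):
    """The LLM named a tool that is not registered."""

class BadParametersError(ValueError):
    """The LLM supplied parameters that do not match the tool's signature."""

def read_file(path: str) -> str:
    return f"(contents of {path})"  # placeholder; performs no real I/O

TOOLS: Dict[str, Tool] = {
    tool.name: tool for tool in [Tool("read_file", {"path": str}, read_file)]
}

def dispatch(tool_name: str, params: Dict[str, Any]) -> Any:
    """Run a tool only if its name is registered and its parameters type-check."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        raise UnknownToolError(tool_name)
    if set(params) != set(tool.parameters):
        raise BadParametersError(f"expected parameters {sorted(tool.parameters)}")
    for key, expected_type in tool.parameters.items():
        if not isinstance(params[key], expected_type):
            raise BadParametersError(f"{key} must be {expected_type.__name__}")
    return tool.execute(**params)
```

A hallucinated call such as `dispatch("delete_everything", {})` raises `UnknownToolError` before any side effect can happen, which is the guarantee the registry exists to provide.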

Layer 3: Verification loops

For high-stakes operations, have the LLM generate a plan, then show the user the exact steps that will run before executing any of them:

GPT-4 generated plan:
  1. Read file: src/database.ts
  2. Execute SQL: DROP TABLE users
  3. Commit with message: "Clean up old table"
  
Proceed? [y/N]

The non-deterministic part (understanding intent) is separated from the deterministic part (executing commands). The user reviews the concrete plan, and nothing runs without approval.
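A confirmation gate like this is only a few lines of code. In the sketch below, the list of "destructive" markers is a crude, hypothetical heuristic, and the `execute` and `confirm` callables are injected so the gate can be tested without a terminal or a real database.

```python
from typing import Callable, List

# Hypothetical markers for steps that deserve extra attention; illustrative only.
DESTRUCTIVE_MARKERS = ("DROP ", "DELETE ", "TRUNCATE ")

def render_plan(steps: List[str]) -> str:
    """Format the plan as a numbered list, flagging destructive-looking steps."""
    lines = []
    for number, step in enumerate(steps, start=1):
        is_destructive = any(marker in step.upper() for marker in DESTRUCTIVE_MARKERS)
        flag = "  [destructive]" if is_destructive else ""
        lines.append(f"  {number}. {step}{flag}")
    return "\n".join(lines)

def run_with_confirmation(
    steps: List[str],
    execute: Callable[[str], None],
    confirm: Callable[[str], str] = input,
) -> bool:
    """Show the plan, then execute every step only if the user answers 'y'."""
    print(render_plan(steps))
    answer = confirm("Proceed? [y/N] ")
    if answer.strip().lower() != "y":
        return False  # the default is "no": nothing runs
    for step in steps:
        execute(step)
    return True
```

Defaulting to "no" matters here. An accidental Enter key press should leave the `users` table where it is.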

The Real Lesson

Hallucination isn't the enemy. Misallocated non-determinism is.

Every time you ask an LLM to do something a regex, parser, or type system could do, you're voluntarily downgrading from near-100% reliability to something like 95%. Chain ten such steps and, assuming their failures are independent, the pipeline succeeds only about 60% of the time (0.95^10 ≈ 0.60).
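The arithmetic is easy to check. This sketch assumes every step succeeds independently with the same probability, which is a simplification; correlated failures would change the numbers but not the direction.

```python
def pipeline_reliability(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent failures."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps

if __name__ == "__main__":
    print(round(pipeline_reliability(0.95, 4), 3))   # four-call agent stack -> 0.815
    print(round(pipeline_reliability(0.95, 10), 3))  # ten-step pipeline -> 0.599
```

Replacing even a few of those steps with deterministic code removes their factors from the product entirely, which is why trimming LLM calls tends to pay off faster than tuning each call's prompt.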

Widely used AI tools in production today, such as GitHub Copilot, Cursor, and v0.dev, appear to follow this pattern. They use LLMs for the irreducibly creative parts (generating code, suggesting designs) and deterministic systems for everything else (syntax validation, type checking, file I/O).

Your architecture shouldn't fight hallucination. It should contain it—allocate it to exactly the places where non-determinism is the feature, not the bug.

Because the LLM isn't broken. Your stack is.