Structured Output From Local LLMs: Getting Reliable JSON with Ollama and Zod

If you've worked with local language models, you've hit this wall: you ask for JSON, the model returns almost JSON, and your parser chokes on a missing bracket or an extra comma. A 1.5B parameter model running on your laptop is impressive—until it closes an object with ] instead of }.

The promise of local LLMs is privacy, cost savings, and no API rate limits. The reality has been brittle output that requires retry logic, regex cleanup, or just crossing your fingers. Recent versions of Ollama (0.5.0 and later) ship with schema-based structured output support that removes most of that brittleness.

Why Local LLMs Struggle With JSON

Language models don't "understand" JSON—they predict the next token based on probability. When you prompt a model with "Return a JSON object with name and age fields," it's doing its best to mimic the pattern it saw during training. Most of the time, that works. Sometimes you get:

{
  "name": "Alice",
  "age": 30
  // forgot the closing brace

Or the model adds helpful commentary:

{
  "name": "Bob",
  "age": 25
}

Here's the JSON you requested!

Smaller models (under 7B parameters) are especially prone to this. You can add stricter prompts ("ONLY return valid JSON, nothing else"), but you're still relying on probabilistic text generation to follow syntactic rules.
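To make the cost of that approach concrete, here is a minimal sketch of the kind of cleanup code these failures push you toward. The helper name extractJson is hypothetical, not part of any library. It slices from the first { to the last } and attempts a parse, so it recovers the "helpful commentary" case but can do nothing about a missing brace.

```typescript
// Hypothetical helper: best-effort recovery of a JSON object from chatty model output.
function extractJson(raw: string): Record<string, unknown> | null {
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end <= start) return null; // no complete-looking object
  try {
    return JSON.parse(raw.slice(start, end + 1)) as Record<string, unknown>;
  } catch {
    return null; // still malformed (trailing comma, wrong bracket, ...)
  }
}

// The two failure modes shown above.
const chatty = '{\n  "name": "Bob",\n  "age": 25\n}\n\nHere\'s the JSON you requested!';
const truncated = '{\n  "name": "Alice",\n  "age": 30\n';

console.log(extractJson(chatty));    // the parsed object: commentary stripped
console.log(extractJson(truncated)); // null: there is no closing brace to slice to
```

Every caller now has to handle the null case, which is exactly the retry-or-give-up decision that prompting alone never removes.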

How Ollama's Structured Output Works

Ollama 0.5.0 introduced schema-constrained decoding via the format parameter. Instead of hoping the model produces valid JSON, you provide a JSON schema, and Ollama restricts generation so the output conforms to it: at each step, the model can only emit tokens that keep the output schema-valid.

Here's the minimal example:

import ollama from 'ollama';

const response = await ollama.chat({
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'Generate a user profile for Alice, age 30' }],
  format: {
    type: 'object',
    properties: {
      name: { type: 'string' },
      age: { type: 'number' }
    },
    required: ['name', 'age']
  }
});

console.log(response.message.content);
// {"name":"Alice","age":30}

No elaborate prompt engineering. No retry loop for malformed JSON. Unless the response is cut off by a token limit, the model cannot emit anything that breaks the structure the schema describes.
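If "can only generate allowed tokens" sounds abstract, a toy version may help. This is not Ollama's implementation. It is a sketch of the general idea, with a made-up six-token vocabulary, fixed scores standing in for a model, and a "grammar" that accepts only the shape {"age":<digits>}. All names here are hypothetical.

```typescript
// Toy illustration only: a real engine masks token probabilities against a
// grammar derived from the schema. Here the grammar is one fixed shape.
const HEAD = '{"age":';

// True if `text` could still grow into a string of the form {"age":<digits>}.
function isValidPrefix(text: string): boolean {
  if (text.length <= HEAD.length) return HEAD.startsWith(text);
  return text.startsWith(HEAD) && /^\d+\}?$/.test(text.slice(HEAD.length));
}

function isComplete(text: string): boolean {
  return /^\{"age":\d+\}$/.test(text);
}

// Stand-in for a model: fixed scores that favor tokens which would break the JSON.
const scores: Record<string, number> = {
  ' Sure!': 0.9, ']': 0.8, '}': 0.5, '3': 0.4, '0': 0.3, [HEAD]: 0.1,
};

function decode(constrained: boolean, maxSteps = 8): string {
  let out = '';
  for (let step = 0; step < maxSteps && !isComplete(out); step++) {
    // When constrained, drop every token that would make the prefix invalid.
    const candidates = Object.keys(scores)
      .filter((tok) => !constrained || isValidPrefix(out + tok));
    if (candidates.length === 0) break;
    // Greedy pick: highest-scoring token among those still allowed.
    candidates.sort((a, b) => scores[b] - scores[a]);
    out += candidates[0];
  }
  return out;
}

console.log(decode(true));  // {"age":3}
console.log(decode(false)); // " Sure!" repeated until maxSteps runs out
```

The stand-in model prefers the wrong token at every step, yet the constrained run still ends on valid JSON, because the wrong tokens are never on the menu. Note the maxSteps cap: if it were hit mid-object, the output would be a valid prefix but not valid JSON, which is the toy equivalent of a response truncated by a token limit.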

Combining Ollama with Zod for End-to-End Type Safety

Structured output solves the syntax problem, but what about TypeScript types? You want the schema to live in your code, not duplicated between JSON Schema and your type definitions. That's where Zod comes in.

Zod is a TypeScript-first schema validation library. You define schemas that double as runtime validators and static types. The zod-to-json-schema package converts Zod schemas to JSON Schema format that Ollama accepts.

Here's the full pipeline:

import ollama from 'ollama';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

// Define your schema once
const UserSchema = z.object({
  name: z.string(),
  age: z.number().int().positive(),
  email: z.string().email(),
  interests: z.array(z.string()).max(5)
});

// Infer the TypeScript type
type User = z.infer<typeof UserSchema>;

// Convert to JSON Schema for Ollama. No name argument: passing one wraps the
// schema in a $ref/definitions pair instead of emitting it inline.
const jsonSchema = zodToJsonSchema(UserSchema);

async function generateUser(prompt: string): Promise<User> {
  const response = await ollama.chat({
    model: 'llama3.2',
    messages: [{ role: 'user', content: prompt }],
    format: jsonSchema
  });

  const parsed = JSON.parse(response.message.content);
  
  // Validate with Zod: catches rules the decoder may not enforce, such as the email format
  return UserSchema.parse(parsed);
}

const user = await generateUser('Create a profile for a software engineer named Sam');
console.log(user.email); // TypeScript knows this exists and is a string

You get:

  • One source of truth: The Zod schema defines both the validation rules and TypeScript types
  • Constrained structure: Ollama's decoder keeps the output syntactically valid and shaped like your schema
  • Runtime safety: Zod validates the parsed output, catching any unexpected values
  • Type safety: TypeScript knows exactly what fields exist and their types
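One caveat on the runtime-safety point: UserSchema.parse throws when validation fails. Whether rules like .email() or .max(5) are enforced during decoding is an assumption worth testing against your Ollama version, so it is worth deciding what happens when Zod says no. Below is a minimal sketch of a bounded retry. The helper generateValidated is hypothetical, and its two function parameters are injected so the example runs without Ollama or Zod installed; in the pipeline above they would wrap ollama.chat and UserSchema.parse.

```typescript
// Hypothetical helper: regenerate when the parsed value fails validation.
async function generateValidated<T>(
  generate: () => Promise<string>,  // e.g. a wrapper around ollama.chat
  validate: (value: unknown) => T,  // e.g. (v) => UserSchema.parse(v)
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const raw = await generate();
      return validate(JSON.parse(raw)); // throws on bad JSON or failed validation
    } catch (err) {
      lastError = err;
    }
  }
  throw new Error(`No valid output after ${maxAttempts} attempts: ${String(lastError)}`);
}

// Stand-ins for the model and the validator, for demonstration only.
const replies = ['{"email":"sam-at-example"}', '{"email":"sam@example.com"}'];
const makeFakeGenerate = () => {
  let i = 0;
  return async () => replies[i++];
};
const fakeValidate = (value: unknown): { email: string } => {
  const email = (value as { email?: unknown }).email;
  if (typeof email !== 'string' || !email.includes('@')) throw new Error('invalid email');
  return { email };
};

// The first reply is rejected, the second passes.
generateValidated(makeFakeGenerate(), fakeValidate)
  .then((user) => console.log(user.email));
```

Keeping maxAttempts small matters: a schema the model cannot satisfy should fail loudly rather than loop.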

Real-World Example: Extracting Structured Data from Text

Here's a practical use case—parsing unstructured meeting notes into actionable items:

const ActionItemSchema = z.object({
  task: z.string(),
  assignee: z.string().nullable(),
  dueDate: z.string().nullable(),
  priority: z.enum(['low', 'medium', 'high'])
});

const MeetingNotesSchema = z.object({
  summary: z.string().max(200),
  actionItems: z.array(ActionItemSchema),
  nextMeetingDate: z.string().nullable()
});

const notes = `
Discussed Q3 roadmap. Alice will draft the API spec by Friday.
Bob needs to review the security audit—high priority.
Next meeting: June 20th.
`;

const structured = await ollama.chat({
  model: 'llama3.2',
  messages: [{
    role: 'user',
    content: `Extract structured data from these meeting notes:\n${notes}`
  }],
  format: zodToJsonSchema(MeetingNotesSchema)
});

const parsed = MeetingNotesSchema.parse(JSON.parse(structured.message.content));
// parsed.actionItems[0].priority is guaranteed to be 'low' | 'medium' | 'high'

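The model extracts entities, infers priorities, and formats everything, all while constrained to your schema. The constraint covers shape, not accuracy: a well-formed action item can still name the wrong assignee, so spot-check extractions against the source notes.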
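Two small adjustments tend to help extraction tasks like this one: restating the expected shape in the prompt, and lowering the temperature. The sketch below builds the request object separately so it can be inspected before it is sent. The helper buildExtractionRequest and its types are hypothetical; options.temperature is Ollama's sampling option, and the value 0 is a suggested starting point rather than a requirement. The schema is hand-written here so the block has no dependencies; in the example above it would come from zodToJsonSchema(MeetingNotesSchema).

```typescript
// Hypothetical helper: assembles the argument for ollama.chat without calling it.
type JsonSchema = Record<string, unknown>;

interface ExtractionRequest {
  model: string;
  messages: { role: 'user'; content: string }[];
  format: JsonSchema;
  options: { temperature: number };
}

function buildExtractionRequest(
  notes: string,
  schema: JsonSchema,
  model = 'llama3.2',
): ExtractionRequest {
  return {
    model,
    messages: [{
      role: 'user',
      // Restating the schema in the prompt gives the model context that the
      // decoding constraint alone does not provide.
      content: 'Extract structured data from these meeting notes. ' +
        `Respond as JSON matching this schema: ${JSON.stringify(schema)}\n${notes.trim()}`,
    }],
    format: schema,
    options: { temperature: 0 }, // less randomness for extraction
  };
}

const request = buildExtractionRequest('Alice will draft the API spec by Friday.', {
  type: 'object',
  properties: { task: { type: 'string' }, assignee: { type: ['string', 'null'] } },
  required: ['task', 'assignee'],
});

console.log(request.options.temperature);                    // 0
console.log(request.messages[0].content.includes('"task"')); // true
```

The resulting object has the same fields as the inline call above, plus options, so it could be passed straight to ollama.chat(request).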

The Takeaway

Local LLMs are finally practical for structured data extraction and generation. Ollama's constrained decoding eliminates the "almost valid JSON" problem, and Zod bridges the gap between JSON Schema and TypeScript types.

The combination means you can:

  • Run inference locally (no API costs, no data leaving your machine)
  • Get schema-conforming JSON (no regex cleanup or parsing hacks)
  • Maintain type safety end-to-end (schema → validation → types)

This works with any model Ollama can run (Llama 3.2, Mistral, Phi, etc.). Smaller models tend to benefit most from constrained decoding, since they are the ones most likely to break JSON syntax on their own. On format reliability, a 1.5B model with a schema can beat a 7B model that relies on prompting alone.

If you've been avoiding local LLMs because of output reliability, it's time for a second look. The tooling has caught up to the promise.