Fix Laggy AI Chatbot Responses with Server-Sent Events (SSE)
If you've built an AI chatbot recently, you've probably hit this UX nightmare: your user asks a question, the spinner spins for 3–8 seconds, then the entire response dumps onto the screen at once. It feels broken, even when it's working perfectly.
The issue isn't your AI model; it's your transport layer. Most developers default to plain request-response: send a prompt, wait for the complete answer, render it. But modern AI APIs (OpenAI, Anthropic, Cohere) can stream tokens as they're generated. If you're not forwarding those tokens to your frontend as they arrive, you're making users wait unnecessarily.
Enter Server-Sent Events (SSE)—a browser-native technology that lets servers push updates to clients over a single HTTP connection. Unlike polling (wasteful) or WebSockets (overkill for one-way data), SSE is purpose-built for real-time server-to-client updates. Here's how to use it to make your AI chatbot feel instant.
What Are Server-Sent Events?
SSE is a web standard, exposed in browsers as the EventSource API, that keeps an HTTP connection open from your browser to your server. The server can push text data whenever it wants, and the browser fires an event each time a message arrives. Chrome, Firefox, and Safari have supported it since around 2011. Internet Explorer never did (nobody misses it), and Edge only gained it when it moved to Chromium.
A basic SSE connection looks like this:
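// EventSource always issues a GET request to the given URL
const eventSource = new EventSource('/api/chat/stream');
eventSource.onmessage = (event) => {
  console.log('Received:', event.data);
};
eventSource.onerror = () => {
  console.error('Connection lost');
  // Calling close() also turns off the built-in automatic reconnect
  eventSource.close();
};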
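On the server side, you respond with Content-Type: text/event-stream and write messages made of data: lines, ending each message with a blank line:
data: Hello

data: World

Each message triggers an onmessage event in the browser. The blank line matters: consecutive data: lines with no blank line between them are joined into a single multi-line message. Apart from a few optional fields (event:, id:, retry:), that's the entire protocol. Simple, reliable, and built into every modern browser.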
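To make the wire format concrete, here is a minimal sketch of a serializer for one SSE message. The helper name formatSseMessage and its argument shape are assumptions for illustration, not part of any library; it only shows how the fields and the terminating blank line fit together.

```javascript
// Hypothetical helper (not from any library): serialize one SSE message.
function formatSseMessage({ data, event, id }) {
  const lines = [];
  if (event !== undefined) lines.push(`event: ${event}`);
  if (id !== undefined) lines.push(`id: ${id}`);
  // A payload containing newlines is sent as one data: line per line of text;
  // the browser rejoins them with "\n" before firing the event.
  for (const line of String(data).split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }
  // The trailing blank line is what terminates the message.
  return lines.join('\n') + '\n\n';
}

console.log(JSON.stringify(formatSseMessage({ data: 'Hello' })));
// prints "data: Hello\n\n"
```

On a server you would pass the returned string straight to res.write().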
Why SSE Beats Polling and WebSockets for AI Streaming
Polling is the naive approach: send a request every 500ms asking "got the answer yet?" This creates server load, burns mobile batteries, and still feels laggy because you're sampling at discrete intervals.
WebSockets are bidirectional and low-latency, but they're complex. You need a persistent connection handler, a separate protocol upgrade, and careful state management. For AI chatbots, the server does 95% of the talking—you don't need bidirectional communication.
SSE is the Goldilocks solution:
- One-way streaming: Perfect for AI responses where the server generates tokens and the client displays them.
- Auto-reconnect: EventSource reconnects on its own and sends a Last-Event-ID header so the server can resume where it left off.
- HTTP-native: Works through proxies, CDNs, and corporate firewalls that block WebSocket upgrades.
- Simpler code: No protocol negotiation, no keep-alive pings, no binary framing.
If your use case is "server generates data, client displays it," SSE is the right tool.
Implementation: Streaming OpenAI Responses with SSE
Let's build a real example. We'll create a Node.js backend that streams OpenAI completions via SSE, and a vanilla JS frontend that renders tokens as they arrive.
Backend (Node.js + Express)
First, install dependencies:
npm install express openai
Then set up the streaming endpoint:
const express = require('express');
const OpenAI = require('openai');
const app = express();
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
app.use(express.json());
app.post('/api/chat/stream', async (req, res) => {
  const { prompt } = req.body;
  // Set SSE headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  try {
    // stream: true makes the SDK return an async iterable of chunks
    const stream = await openai.chat.completions.create({
      model: 'gpt-4',
      messages: [{ role: 'user', content: prompt }],
      stream: true,
    });
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        // Send each token as an SSE message; JSON encoding keeps newlines
        // inside a token from breaking the SSE framing
        res.write(`data: ${JSON.stringify({ token: content })}\n\n`);
      }
    }
    // Sentinel message so the client knows the stream finished normally
    res.write('data: [DONE]\n\n');
    res.end();
  } catch (error) {
    // Headers are already sent, so report the error as a final SSE message
    res.write(`data: ${JSON.stringify({ error: error.message })}\n\n`);
    res.end();
  }
});
app.listen(3000, () => console.log('Server running on port 3000'));
Frontend (Vanilla JavaScript)
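Now connect from the client. EventSource only issues GET requests and can't send a JSON body, so to POST the prompt we use fetch and read the event stream from the response body ourselves: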
const chatForm = document.getElementById('chat-form');
const promptInput = document.getElementById('prompt');
const responseDiv = document.getElementById('response');
chatForm.addEventListener('submit', async (e) => {
  e.preventDefault();
  const prompt = promptInput.value;
  responseDiv.textContent = '';
  // POST the prompt; the response body itself is the SSE stream
  const response = await fetch('/api/chat/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ''; // holds a trailing partial line between reads
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    // The last element is an incomplete line (or ''); keep it for the next read
    buffer = lines.pop();
    for (const line of lines) {
      if (!line.startsWith('data:')) continue;
      const data = line.replace(/^data: ?/, '');
      if (data === '[DONE]') return;
      try {
        const { token, error } = JSON.parse(data);
        if (error) {
          console.error(error);
          return;
        }
        if (token) responseDiv.textContent += token;
      } catch (err) {
        // Skip lines that are not valid JSON
      }
    }
  }
});
What Just Happened?
- User submits a prompt via a form.
- Frontend POSTs the prompt and reads the response body as a stream.
- Backend calls OpenAI with stream: true, which returns an async iterator.
- Each token from OpenAI is immediately written to the response as data: {"token":"..."}\n\n.
- Frontend decodes the stream, parses each SSE message, and appends tokens to the DOM in real-time.
The result: text appears token by token as the model generates it, so the user starts reading as soon as the first token arrives instead of staring at a spinner until the full response is ready.
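The "parses each SSE message" step has one subtlety: network chunks do not line up with message boundaries, so a single message can arrive split across two reads. Below is a minimal sketch of an incremental parser that copes with this. The name createSseParser is hypothetical, and the sketch only handles data: lines (event:, id:, and retry: are ignored) with \n or \r\n line endings.

```javascript
// Hypothetical helper: incremental parser for the data: subset of SSE.
function createSseParser(onMessage) {
  let buffer = '';    // text received but not yet terminated by a newline
  let dataLines = []; // data: lines of the message being assembled

  return function feed(text) {
    buffer += text;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // incomplete last line (or '') waits for more input
    for (const rawLine of lines) {
      const line = rawLine.endsWith('\r') ? rawLine.slice(0, -1) : rawLine;
      if (line === '') {
        // Blank line: the current message is complete.
        if (dataLines.length > 0) onMessage(dataLines.join('\n'));
        dataLines = [];
      } else if (line.startsWith('data:')) {
        // One optional space is allowed after the colon.
        const value = line.slice(5);
        dataLines.push(value.startsWith(' ') ? value.slice(1) : value);
      }
      // Comment lines (starting with ':') and other fields are skipped.
    }
  };
}

// Two messages delivered in three awkwardly split chunks.
const received = [];
const feed = createSseParser((data) => received.push(data));
feed('data: {"tok');
feed('en":"Hel"}\n\nda');
feed('ta: {"token":"lo"}\n\n');
console.log(received); // two complete JSON strings, one per message
```

Keeping the parser separate from the DOM code also makes it easy to unit test with hand-written chunks like the ones above.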
Handling Edge Cases
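Reconnection: EventSource reconnects automatically if the connection drops. If your messages carry an id: field, the browser sends the most recent one back in a Last-Event-ID request header so the server can resume from there. The fetch-based reader above gets none of this for free: if the stream breaks, you have to detect it and retry yourself.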
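As a rough sketch of the server side of resuming, assume the server keeps the messages it has already sent for a conversation in an array of { id, data } objects. Both that storage scheme and the helper name framesAfter are assumptions for illustration.

```javascript
// Hypothetical helper: return the SSE frames a reconnecting client has not seen.
// `history` is an array of { id, data } in the order the messages were sent.
function framesAfter(history, lastEventId) {
  const index = history.findIndex(
    (msg) => String(msg.id) === String(lastEventId)
  );
  // findIndex returns -1 for an unknown or missing id, so slice(0) replays everything.
  return history
    .slice(index + 1)
    .map((msg) => `id: ${msg.id}\ndata: ${msg.data}\n\n`);
}

const history = [
  { id: 1, data: 'Hel' },
  { id: 2, data: 'lo' },
  { id: 3, data: '!' },
];
console.log(framesAfter(history, '1').length); // 2

// In an Express handler this could be used as:
//   framesAfter(history, req.get('Last-Event-ID')).forEach((f) => res.write(f));
```

This only helps clients that connect with EventSource, since that is what sends the Last-Event-ID header on reconnect.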
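CORS: If your frontend and backend are on different origins, add CORS headers. A POST with a JSON body triggers a preflight, so the server also has to answer the OPTIONS request and allow the Content-Type header. In production, replace * with your frontend's origin:
res.setHeader('Access-Control-Allow-Origin', '*');
res.setHeader('Access-Control-Allow-Methods', 'POST');
res.setHeader('Access-Control-Allow-Headers', 'Content-Type');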
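Multi-byte characters: A network chunk can end in the middle of a multi-byte UTF-8 character (emoji, accented letters, CJK text). Use one TextDecoder per stream and call decode(value, { stream: true }) so it holds back the partial sequence until the rest arrives, instead of rendering mojibake.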
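To see the effect in isolation, the sketch below splits a two-byte character across two chunks and decodes it both ways. The variable names are illustrative; TextEncoder and TextDecoder are the standard globals available in browsers and Node.js.

```javascript
// "é" (U+00E9) is two bytes in UTF-8: 0xC3 0xA9.
const bytes = new TextEncoder().encode('caf\u00e9');
const firstChunk = bytes.slice(0, 4); // 'c', 'a', 'f', and the first byte of 'é'
const secondChunk = bytes.slice(4);   // the second byte of 'é'

// Without { stream: true } each call is treated as a complete input,
// so the dangling bytes are replaced with U+FFFD.
const naive = new TextDecoder();
const broken = naive.decode(firstChunk) + naive.decode(secondChunk);

// With { stream: true } the decoder remembers the partial sequence.
const streaming = new TextDecoder();
const intact =
  streaming.decode(firstChunk, { stream: true }) +
  streaming.decode(secondChunk, { stream: true });

console.log(broken === 'caf\u00e9'); // false
console.log(intact === 'caf\u00e9'); // true
```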
Nginx/Proxy buffering: If you're behind Nginx, disable buffering for SSE endpoints:
location /api/chat/stream {
    proxy_pass http://localhost:3000;
    proxy_buffering off;
}
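If you can't edit the Nginx config, the application can ask for the same behavior per response: Nginx honors an X-Accel-Buffering: no response header from the upstream. A minimal sketch, where the helper name sseHeaders is hypothetical and simply bundles the headers used by the endpoint:

```javascript
// Hypothetical helper: response headers for an SSE endpoint behind Nginx.
function sseHeaders() {
  return {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    // Tells Nginx not to buffer this particular response.
    'X-Accel-Buffering': 'no',
  };
}

console.log(sseHeaders()['X-Accel-Buffering']); // no

// In an Express handler: res.set(sseHeaders());
```

Other proxies and CDNs have their own buffering settings, so check their documentation if tokens still arrive in bursts.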
The Takeaway
Server-Sent Events are underrated. They cover most one-way streaming use cases without the complexity of WebSockets. For AI chatbots specifically, SSE turns a frustrating multi-second wait into a smooth, ChatGPT-like experience.
If you're building anything that involves server-generated data flowing to the client—AI responses, live logs, progress updates, notifications—reach for SSE before polling or WebSockets. Your users will feel the difference.
The full code examples above work with OpenAI, but the pattern applies to any streaming AI API (Anthropic Claude, Google Gemini, Cohere). Swap the provider, keep the transport.
Now go make your chatbot feel instant.