Streaming
OpenViber streams LLM responses end-to-end using the Vercel AI SDK. The SDK handles token generation, tool calls, and UI rendering — OpenViber’s job is to relay the stream from daemon to browser through the gateway.
1. End-to-End Flow
Agent (AI SDK streamText)
→ streamResult.toUIMessageStreamResponse() ← AI SDK generates SSE bytes
→ Controller reads SSE, sends task:stream-chunk over WebSocket
→ Gateway buffers and pipes to SSE endpoint
→ Web API route pipes gateway SSE to browser
→ @ai-sdk/svelte Chat class renders UI Every hop is a byte-level passthrough of the AI SDK’s UI Message Stream format. OpenViber doesn’t parse, transform, or re-encode the stream — it just relays it.
2. How Each Layer Works
Daemon (Controller)
The controller calls runTask() which invokes AI SDK streamText(). The result is converted to an SSE response and piped chunk-by-chunk over WebSocket:
const { streamResult } = await runTask(goal, options, messages);
// AI SDK converts the stream to SSE format
const response = streamResult.toUIMessageStreamResponse();
const reader = response.body.getReader();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value, { stream: true });
// Relay raw SSE bytes to gateway
ws.send(JSON.stringify({
type: "task:stream-chunk",
taskId,
chunk,
}));
} Gateway
The gateway holds SSE connections open for web app subscribers. When task:stream-chunk messages arrive from the daemon, it writes them directly to subscribers:
GET /api/tasks/:id/stream
Headers: x-vercel-ai-ui-message-stream: v1
Content-Type: text/event-stream The gateway buffers chunks so that late-connecting subscribers can catch up. When the task completes, it closes all subscriber connections.
Web App API Route
The SvelteKit API route at /api/tasks/[id]/chat submits the task to the gateway, then pipes the gateway’s SSE stream to the browser:
// Submit task
const { taskId } = await gatewayClient.submitTask(goal, viberId, messages);
// Connect to gateway SSE stream and pipe to frontend
const streamResponse = await fetch(`${GATEWAY_URL}/api/tasks/${taskId}/stream`);
return new Response(streamResponse.body, {
headers: { "x-vercel-ai-ui-message-stream": "v1" },
}); Frontend
The @ai-sdk/svelte Chat class consumes the SSE stream automatically:
const chat = new Chat({
transport: new DefaultChatTransport({
api: `/api/tasks/${taskId}/chat`,
}),
}); The Chat class handles text deltas, tool call rendering, and state management. OpenViber’s frontend code focuses on UI — not stream parsing.
3. What the AI SDK Handles
| Concern | Handled By |
|---|---|
| Token-by-token streaming | AI SDK streamText() |
| SSE encoding | AI SDK toUIMessageStreamResponse() |
| Client-side state management | @ai-sdk/svelte Chat class |
| Tool call / result rendering | AI SDK message parts |
| Multi-step tool loops | AI SDK maxSteps / stepCountIs() |
| Backpressure | Built-in to SSE + ReadableStream |
4. What OpenViber Adds
WebSocket Relay
The AI SDK is designed for direct HTTP (browser → server → LLM). OpenViber adds a relay layer because the daemon runs on a separate machine from the web server:
Browser ←SSE→ Web App ←SSE→ Gateway ←WS→ Daemon ←HTTP→ LLM The gateway bridges WebSocket (daemon side) and SSE (browser side). This is the core infrastructure OpenViber provides on top of the AI SDK.
Chunk Buffering
The gateway buffers stream chunks per task so that:
- Late-connecting SSE subscribers catch up.
- Completed tasks can replay their full stream on request.
- Network interruptions don’t lose data.
Task Lifecycle Messages
Beyond the stream relay, the gateway tracks task state transitions (pending → running → completed | error | stopped) and provides REST endpoints for task management.
5. Future: Block Streaming for Chat Channels
Chat platforms (DingTalk, WeCom, Slack) cannot consume SSE streams. For these channels, a block chunking layer will coalesce token deltas into completed text blocks:
- Buffer tokens until a paragraph/sentence boundary.
- Respect per-channel message length limits.
- Never split inside code fences.
- Coalesce small bursts to reduce message spam.
This layer sits between the AI SDK stream and channel delivery — it doesn’t change the core streaming architecture.