Overview
Viber provides first-class support for streaming LLM responses, enabling real-time UI updates and progressive content display.
Channel block streaming (chat surfaces)
Many chat apps do not support token-delta updates, so streaming there is effectively block streaming (coarse chunks). This is a key reason the OpenViber Board web UI remains the preferred surface for high-fidelity streaming UX.
- Block streaming: send completed text blocks as the model generates them (coarse chunks), rather than token-by-token deltas.
- Channel caps: respect per-channel message limits (text length, max lines).
- Chunking rules: avoid splitting inside code fences; prefer paragraph/newline/sentence boundaries before hard cuts.
- Coalescing: allow a small idle window to merge tiny chunks to reduce spam.
This keeps output responsive on chat surfaces that cannot display token deltas.
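As a rough illustration, a block emitter can buffer model output, cut at paragraph boundaries, and use a short idle window to coalesce small chunks. The sketch below is not part of the Viber API: `sendBlock` is a hypothetical transport call and the constants are illustrative.

```ts
const IDLE_MS = 400;  // idle window for coalescing small chunks
const MIN_CHARS = 80; // don't emit blocks smaller than this unless flushing

// Buffers token deltas and emits coarse blocks to a chat surface.
async function streamAsBlocks(
  textStream: AsyncIterable<string>,
  sendBlock: (text: string) => Promise<void>, // hypothetical transport call
) {
  let buffer = '';
  let timer: ReturnType<typeof setTimeout> | undefined;

  const flush = async () => {
    if (timer) clearTimeout(timer);
    timer = undefined;
    if (buffer.trim()) await sendBlock(buffer);
    buffer = '';
  };

  for await (const chunk of textStream) {
    if (timer) clearTimeout(timer);
    buffer += chunk;
    // Prefer paragraph breaks as block boundaries once the minimum is reached.
    if (buffer.length >= MIN_CHARS) {
      const cut = buffer.lastIndexOf('\n\n');
      if (cut > 0) {
        await sendBlock(buffer.slice(0, cut));
        buffer = buffer.slice(cut + 2);
      }
    }
    // If no further chunk arrives within the idle window, emit what we have.
    timer = setTimeout(() => void flush(), IDLE_MS);
  }
  await flush();
}
```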
Streaming mode selection (auto-detect)
The streaming mode should be auto-detected per channel:
- Web UI: prefer token-delta streaming for the best visual effect.
- Chat apps: fall back to block streaming when token deltas aren’t supported.
The transport layer should expose a capability flag so the agent runtime can pick the appropriate mode without manual configuration.
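For example, the capability flag and mode selection could look like the sketch below (`ChannelCapabilities`, `supportsTokenDeltas`, and `pickStreamingMode` are illustrative names, not part of the Viber API):

```ts
// Advertised by each channel transport.
interface ChannelCapabilities {
  supportsTokenDeltas: boolean; // true for the web UI, false for most chat apps
}

type StreamingMode = 'token-delta' | 'block';

// The agent runtime picks a mode from the capability flag alone,
// so no per-channel manual configuration is needed.
function pickStreamingMode(caps: ChannelCapabilities): StreamingMode {
  return caps.supportsTokenDeltas ? 'token-delta' : 'block';
}
```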
Basic Streaming
```ts
import { Agent } from 'viber'; // import path assumed, mirroring 'viber/svelte' below

const agent = new Agent({
  name: 'Assistant',
  model: 'openai:gpt-4o',
});

const result = await agent.streamText({
  messages: [{ role: 'user', content: 'Tell me a story' }],
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```
Stream Events
The streaming result provides multiple event streams:
```ts
const result = await agent.streamText({ messages });

// Text chunks
for await (const text of result.textStream) {
  console.log('Text:', text);
}

// Or access the full stream with metadata
for await (const event of result.fullStream) {
  switch (event.type) {
    case 'text-delta':
      console.log('Text:', event.textDelta);
      break;
    case 'tool-call':
      console.log('Tool called:', event.toolName);
      break;
    case 'tool-result':
      console.log('Tool result:', event.result);
      break;
  }
}
```
React Integration
```tsx
import { useChat } from 'viber/react'; // import path assumed, mirroring 'viber/svelte'

function Chat() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    model: 'openai:gpt-4o',
  });

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} disabled={isLoading} />
      </form>
    </div>
  );
}
```
Svelte Integration
```svelte
<script lang="ts">
  import { createChatStore } from 'viber/svelte';

  const chat = createChatStore({
    model: 'openai:gpt-4o',
  });
</script>

{#each $chat.messages as message}
  <div>
    <strong>{message.role}:</strong>
    {message.content}
  </div>
{/each}

<form on:submit|preventDefault={() => chat.submit()}>
  <input bind:value={$chat.input} disabled={$chat.isLoading} />
</form>
```
Tip
Both the React and Svelte integrations handle streaming automatically, updating the UI as chunks arrive.
Server-Side Streaming
For HTTP endpoints, use the streaming response helpers:
```ts
import { streamToResponse } from 'viber'; // import path for the helper assumed

export async function POST(request: Request) {
  const { messages } = await request.json();
  const result = await agent.streamText({ messages }); // `agent` defined at module scope
  return streamToResponse(result);
}
```
Chunking guidelines (for chat channels)
When block streaming is enabled, use a chunker with low/high bounds:
- Low bound: don’t emit until a minimum character count is reached.
- High bound: split at a preferred boundary before the max size; if none exists, hard-cut at the max size.
- Boundary preference: paragraph → newline → sentence → whitespace → hard break.
- Code fences: never split inside a fence; if forced, close + reopen the fence.
Channel-level overrides should allow per-channel chunk sizes and chunking modes.
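A minimal chunker along these lines might look like the following sketch; the constants and helpers (`LOW`, `HIGH`, `insideCodeFence`, `splitChunk`) are illustrative rather than part of the Viber API. A caller would feed it the buffered stream and pair it with the coalescing idle window described earlier.

````ts
const LOW = 200;   // low bound: minimum characters before a chunk may be emitted
const HIGH = 1800; // high bound: split before this; hard-cut here only when forced

// Boundary preference, most to least preferred:
// paragraph -> newline -> sentence -> whitespace.
const BOUNDARIES = [/\n\n/g, /\n/g, /[.!?] /g, /\s/g];

function insideCodeFence(text: string): boolean {
  // An odd number of ``` markers means a fence is still open.
  return ((text.match(/```/g) ?? []).length % 2) === 1;
}

// Returns [emit, rest] when a chunk can be cut, or null to keep buffering.
function splitChunk(buffer: string): [string, string] | null {
  if (buffer.length < LOW) return null; // low bound not reached yet
  const window = buffer.slice(0, HIGH);
  for (const boundary of BOUNDARIES) {
    let cut = -1;
    for (const m of window.matchAll(boundary)) cut = m.index ?? cut; // last match wins
    if (cut > LOW && !insideCodeFence(window.slice(0, cut))) {
      return [buffer.slice(0, cut), buffer.slice(cut)];
    }
  }
  if (buffer.length < HIGH) return null; // keep buffering
  // Forced hard cut at max size: close and reopen an open code fence.
  let head = buffer.slice(0, HIGH);
  let rest = buffer.slice(HIGH);
  if (insideCodeFence(head)) {
    head += '\n```';
    rest = '```\n' + rest;
  }
  return [head, rest];
}
````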