Local LLM Setup

Run OpenViber with local language models for maximum privacy and zero API costs.

Ollama

Ollama is the easiest way to run local models.

Installation

# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Or with Homebrew
brew install ollama
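
On Linux the install script typically registers Ollama as a background service, and the macOS app starts it automatically; if the server isn't running, you can start it manually:

# Starts the Ollama server in the foreground (listens on port 11434 by default)
ollama serve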

Pull a Model

ollama pull llama3.2
ollama pull codellama
ollama pull deepseek-coder-v2
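
To confirm the downloads and try a model directly from the terminal:

# Show the models installed locally
ollama list

# One-off prompt as a quick smoke test
ollama run llama3.2 "Say hello"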

Configure OpenViber

Set the environment variable:

export OLLAMA_BASE_URL="http://localhost:11434"

Then in your agent config (~/.openviber/agents/default.yaml):

provider: ollama
model: llama3.2
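
A quick way to verify that OpenViber will be able to reach Ollama at that URL is to hit its /api/tags endpoint, which lists the installed models:

# Should return a JSON object describing the models pulled above
curl http://localhost:11434/api/tags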

vLLM

vLLM provides high-throughput inference behind an OpenAI-compatible API.

Installation

pip install vllm
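
vLLM pulls in heavy CUDA dependencies, so installing it into a dedicated virtual environment is usually worthwhile; a minimal sketch:

# Create and activate an isolated environment, then install vLLM
python -m venv vllm-env
source vllm-env/bin/activate
pip install vllm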

Start Server

vllm serve meta-llama/Llama-3.1-70B-Instruct --port 8000
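
A 70B model will not fit on a single consumer GPU; vLLM can shard it across several GPUs with --tensor-parallel-size. For example, on a machine with 4 GPUs:

vllm serve meta-llama/Llama-3.1-70B-Instruct --port 8000 --tensor-parallel-size 4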

Configure OpenViber

export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="not-needed"

Agent config:

provider: openai
model: meta-llama/Llama-3.1-70B-Instruct
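
Because vLLM speaks the OpenAI API, you can sanity-check the endpoint with curl before pointing OpenViber at it (any placeholder key works unless the server is configured to require one):

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer not-needed" \
  -d '{"model": "meta-llama/Llama-3.1-70B-Instruct",
       "messages": [{"role": "user", "content": "Hello"}]}'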

LM Studio

LM Studio provides a GUI for running local models and exposes an OpenAI-compatible API.

  1. Download and install LM Studio
  2. Download a model (e.g., Llama 3.2, DeepSeek Coder)
  3. Start the local server (default port: 1234)
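
LM Studio also ships a command-line companion, lms, which can start the same local server without the GUI; a sketch, assuming the CLI has been installed from within LM Studio and is on your PATH:

# Start LM Studio's local server headlessly
lms server start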

Configure OpenViber:

export OPENAI_BASE_URL="http://localhost:1234/v1"
export OPENAI_API_KEY="lm-studio"
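
As with vLLM, you can confirm the server is reachable before launching OpenViber:

# Lists the model(s) currently loaded in LM Studio
curl http://localhost:1234/v1/models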

Model Recommendations

Use Case        Recommended Model      Size
General chat    llama3.2               3B
Coding          deepseek-coder-v2      16B
Long context    qwen2.5:32b            32B
Fastest         phi3:mini              3.8B

Tips

  • VRAM: Most models need 8-16GB VRAM for good performance
  • Quantization: Use Q4 or Q5 quantized models to reduce memory usage
  • Context: Local models often default to 4K-8K context windows even when they support more; see the Ollama sketch below for raising the limit
  • CPU fallback: Works but significantly slower than GPU inference
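
If you need a larger context window with Ollama, you can bake one into a custom model variant via a Modelfile (the 8192 value and the llama3.2-8k name below are just examples; stay within what the base model and your VRAM can handle):

# Modelfile
FROM llama3.2
PARAMETER num_ctx 8192

# Build the variant, then reference it as the model in your agent config
ollama create llama3.2-8k -f Modelfile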