Model providers

Odysseus model provider guide

Odysseus is most useful when the model backend matches the workflow. Ollama is great for starting locally, vLLM is built for serving throughput, llama.cpp is practical on modest hardware, and OpenRouter is useful when you want hosted models without running inference yourself.

Ollama: best first local provider

Use Ollama when you want the quickest path to a working local model. It is friendly for individual users, Mac users, and homelab experiments.

ollama pull qwen2.5:7b
ollama list

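A native Odysseus install can usually reach Ollama at http://localhost:11434/v1. When Odysseus runs in Docker, localhost refers to the container itself, so it often needs http://host.docker.internal:11434/v1 to reach Ollama on the host.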
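The choice between the two URLs can be made in code. The sketch below is illustrative only: the function names are hypothetical, and checking for /.dockerenv is a common heuristic for detecting a Docker container, not something Odysseus is known to do.

```python
import os

OLLAMA_PORT = 11434  # port taken from the URLs in this guide


def ollama_base_url(in_docker: bool) -> str:
    """Return the Ollama base URL for a native or Docker environment."""
    # Inside a container, "localhost" is the container itself, so the
    # host machine is addressed as host.docker.internal instead.
    host = "host.docker.internal" if in_docker else "localhost"
    return f"http://{host}:{OLLAMA_PORT}/v1"


def running_in_docker() -> bool:
    """Heuristic (assumption): Docker usually creates /.dockerenv in containers."""
    return os.path.exists("/.dockerenv")
```

Calling ollama_base_url(running_in_docker()) yields the URL to try first. If the heuristic guesses wrong in your setup, pass the flag explicitly.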

vLLM: better for GPU serving

Use vLLM when you have a capable NVIDIA GPU and need higher throughput or concurrent requests. It is more server-like than Ollama, so expect more setup but better scaling.

Best for strong GPUs and shared model endpoints.
Good fit for agent workloads that make many calls.
Requires more careful CUDA, driver, and model memory planning.
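Agent workloads tend to issue many requests at once, which is where a backend built for concurrency helps. The following is a minimal client-side sketch, not vLLM's API: run_concurrently and fake_send are hypothetical names, and fake_send stands in for whatever HTTP call your setup makes to the model endpoint.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_concurrently(
    prompts: List[str],
    send: Callable[[str], str],
    max_workers: int = 8,
) -> List[str]:
    """Send every prompt through `send` in parallel threads.

    Results are returned in the same order as `prompts`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, prompts))


def fake_send(prompt: str) -> str:
    """Placeholder for a real request to a model endpoint (assumption)."""
    return f"echo: {prompt}"


replies = run_concurrently(["plan", "search", "summarize"], fake_send)
```

Swapping fake_send for a real request function lets you compare how each backend behaves as max_workers grows.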

llama.cpp: practical for CPU and quantized models

Use llama.cpp when you want lightweight local inference, GGUF quantized models, or CPU-friendly testing. It is a strong choice for older hardware and small servers.
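A rough calculation shows why quantization matters on modest hardware. The sketch below estimates the size of the weights alone from parameter count and bits per weight. The function name is hypothetical, and the result is a lower bound: real quantized files store extra scaling data, and the KV cache and runtime overhead are not counted.

```python
def weight_size_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Compare a 7B-parameter model at three nominal precisions.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"7B at {label}: ~{weight_size_gib(7, bits):.1f} GiB")
```

By this estimate a 7B model shrinks from roughly 13 GiB at 16-bit to a little over 3 GiB at 4-bit, which is the difference between needing a large GPU and fitting in ordinary system RAM.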

OpenRouter: hosted fallback

Use OpenRouter when you do not want to run model inference locally, or when you need quick access to hosted frontier and open models. Keep API keys private and monitor cost.
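One way to keep the key private is to read it from the environment instead of writing it into config files or source. The sketch below assumes an environment variable named OPENROUTER_API_KEY, which is a common convention and not a documented Odysseus setting. The function name is hypothetical.

```python
import os
from typing import Dict, Mapping

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def openrouter_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build request headers from an API key held in the environment.

    The variable name OPENROUTER_API_KEY is an assumption for this sketch.
    """
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        # Fail early so a missing key is never sent as an empty token.
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Failing early when the key is missing makes a misconfiguration obvious before any request leaves the machine.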

Quick decision table

Provider     Best for                       Tradeoff
Ollama       Fast local setup               Less tuned for high throughput
vLLM         GPU serving and concurrency    More setup complexity
llama.cpp    CPU, GGUF, modest hardware     Lower speed on heavy models
OpenRouter   Hosted model access            API cost and external dependency