Odysseus model provider guide
Odysseus works best when the model backend matches the workflow. Ollama is the quickest way to start locally, vLLM is built for serving throughput, llama.cpp is practical on modest hardware, and OpenRouter gives you hosted models without running inference yourself.
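The rules of thumb above can be written as a small decision helper. This is only an illustrative sketch: the function name, its flags, and the order of the checks are assumptions made for this example, not part of Odysseus.

```python
def suggest_provider(
    *,
    wants_hosted: bool = False,
    has_nvidia_gpu: bool = False,
    needs_concurrency: bool = False,
    cpu_only: bool = False,
) -> str:
    """Map a rough description of the workflow to one of the four providers."""
    if wants_hosted:
        # No local inference at all: use hosted models.
        return "openrouter"
    if has_nvidia_gpu and needs_concurrency:
        # Strong GPU plus many parallel calls: throughput matters most.
        return "vllm"
    if cpu_only:
        # Modest hardware: quantized GGUF models on CPU.
        return "llama.cpp"
    # Default: the quickest path to a working local model.
    return "ollama"


print(suggest_provider())
print(suggest_provider(has_nvidia_gpu=True, needs_concurrency=True))
```

Real setups often mix providers, for example Ollama for everyday use with OpenRouter as a fallback, so treat the result as a starting point.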
Ollama: best first local provider
Use Ollama when you want the quickest path to a working local model. It suits individual users, Mac users, and homelab experiments.
ollama pull qwen2.5:7b
ollama list
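When Odysseus runs natively on the host, it can usually reach Ollama at http://localhost:11434/v1. When Odysseus runs in Docker, localhost refers to the container itself, so it often needs http://host.docker.internal:11434/v1 instead.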
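The choice between the two URLs can be made in code. The sketch below is a minimal example, not something Odysseus ships: the function names are hypothetical, and checking for /.dockerenv is a common heuristic for detecting a Docker container rather than a guarantee.

```python
import os
from typing import Optional

OLLAMA_PORT = 11434  # port used in the URLs above


def running_in_docker() -> bool:
    # Heuristic only: Docker normally creates /.dockerenv inside containers.
    return os.path.exists("/.dockerenv")


def ollama_base_url(in_docker: Optional[bool] = None) -> str:
    """Return the Ollama /v1 URL that fits the current environment."""
    if in_docker is None:
        in_docker = running_in_docker()
    # Inside a container, localhost is the container, not the host machine.
    host = "host.docker.internal" if in_docker else "localhost"
    return f"http://{host}:{OLLAMA_PORT}/v1"


print(ollama_base_url(in_docker=False))
print(ollama_base_url(in_docker=True))
```

Passing in_docker explicitly keeps the choice predictable when the heuristic does not apply, for example with other container runtimes.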
vLLM: better for GPU serving
Use vLLM when you have a capable NVIDIA GPU and need higher throughput or concurrent requests. It runs as a dedicated inference server, so expect more setup than Ollama but better scaling under load.
- Best for strong GPUs and shared model endpoints.
- Good fit for agent workloads that make many calls.
- Requires more careful CUDA, driver, and model memory planning.
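For the memory planning mentioned in the last bullet, a back-of-the-envelope estimate is often enough to rule a model in or out. The sketch below counts weights only, assuming memory is roughly parameter count times bytes per parameter. The function names are hypothetical. The 0.9 usable fraction is meant to mirror vLLM's default GPU memory utilization setting, and real usage also depends on KV cache size, context length, and runtime overhead.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    # 1e9 parameters * bytes per parameter / 1e9 bytes per GB
    return params_billion * bytes_per_param


def headroom_gb(
    vram_gb: float,
    params_billion: float,
    bytes_per_param: float,
    usable_fraction: float = 0.9,  # assumed share of VRAM the server may use
) -> float:
    """VRAM left for KV cache and activations after loading the weights."""
    usable = vram_gb * usable_fraction
    return usable - weight_memory_gb(params_billion, bytes_per_param)


# Example: a 7B model in 16-bit precision (2 bytes per parameter) on a 24 GB GPU.
weights = weight_memory_gb(7, 2)
spare = headroom_gb(24, 7, 2)
print(f"weights ~{weights:.1f} GB, headroom ~{spare:.1f} GB")
```

A negative headroom means the weights alone do not fit, so a smaller or quantized model, or more GPUs, would be needed.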
llama.cpp: practical for CPU and quantized models
Use llama.cpp when you want lightweight local inference, GGUF quantized models, or CPU-friendly testing. It is a strong choice for older hardware and small servers.
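llama.cpp includes an HTTP server, llama-server, that can serve a GGUF model over an OpenAI-style /v1 API. The sketch below only builds the command line and the matching base URL. The helper names and the model path are hypothetical placeholders, and the flags shown should be checked against llama-server --help for your build.

```python
import shlex
from typing import List


def llama_server_command(
    model_path: str, port: int = 8080, ctx_size: int = 4096, threads: int = 4
) -> List[str]:
    """Build an argv list for launching llama-server with a GGUF model."""
    return [
        "llama-server",
        "--model", model_path,         # path to a GGUF file
        "--port", str(port),           # HTTP port to listen on
        "--ctx-size", str(ctx_size),   # context window in tokens
        "--threads", str(threads),     # CPU threads used for inference
    ]


def llama_server_base_url(port: int = 8080) -> str:
    """Base URL a client on the same machine would use."""
    return f"http://localhost:{port}/v1"


# Placeholder model path: substitute a GGUF file you have downloaded.
cmd = llama_server_command("models/example-q4_k_m.gguf")
print(shlex.join(cmd))
print(llama_server_base_url())
```

On older hardware, a lower thread count and a smaller context size reduce memory pressure at the cost of speed and usable prompt length.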
OpenRouter: hosted fallback
Use OpenRouter when you do not want to run model inference locally, or when you need quick access to hosted frontier and open models. Keep API keys out of source control and logs, and monitor usage costs.
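One way to keep the key private is to read it from the environment instead of writing it into files. The sketch below assumes the key is stored in a variable named OPENROUTER_API_KEY, which is a naming choice made for this example, and the helper names are hypothetical. OpenRouter's API is served under https://openrouter.ai/api/v1 and uses a bearer token.

```python
import os
from typing import Dict, Mapping, Optional

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
API_KEY_VAR = "OPENROUTER_API_KEY"  # assumed variable name for this sketch


def openrouter_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build request headers from the environment, failing early without a key."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{API_KEY_VAR} is not set")
    return {"Authorization": f"Bearer {key}", "Content-Type": "application/json"}


def redact(key: str) -> str:
    """Shorten a key for logging so the full secret never appears in output."""
    return key[:4] + "..." if len(key) > 4 else "..."


# Demo with a placeholder value; real code would call openrouter_headers().
demo = openrouter_headers({API_KEY_VAR: "sk-placeholder"})
print(sorted(demo))
print(redact("sk-placeholder"))
```

Failing early on a missing key avoids sending unauthenticated requests, and redacting before logging keeps the secret out of terminal history and log files.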