The LLM Gateway is Nova’s model routing layer. It exposes a unified API that translates requests to any configured provider — Anthropic, OpenAI, Ollama, Groq, Gemini, Cerebras, OpenRouter, GitHub Models, and subscription-based providers (Claude Max, ChatGPT Plus).
| Property | Value |
|---|---|
| Port | 8001 |
| Framework | FastAPI + LiteLLM |
| State store | Redis (db 1) |
| Source | `llm-gateway/` |
- **Model routing** — resolve model IDs to provider instances and forward requests
- **OpenAI compatibility** — expose `/v1/chat/completions` and `/v1/models` so any OpenAI-compatible tool works out of the box
- **Subscription auth** — use Claude Max/Pro and ChatGPT Plus/Pro subscriptions as zero-cost providers
- **Rate limiting** — per-provider daily quotas enforced via a Redis sliding window
- **Response caching** — cache deterministic (`temperature=0`) completions to avoid duplicate API calls
- **Ollama sync** — auto-discover locally pulled Ollama models and register them at startup
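The sliding-window quota check above can be illustrated with a small in-memory analog. This is a hedged sketch, not the gateway's actual code: the real service keeps the window in Redis, and the class and method names here are illustrative only.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Illustrative in-memory analog of a Redis sliding-window limiter.

    In Redis this is typically a sorted set per provider: ZADD a timestamp
    per request, ZREMRANGEBYSCORE to drop entries older than the window,
    then ZCARD to count what remains.
    """

    def __init__(self, limit: int, window_seconds: float = 86400.0):
        self.limit = limit
        self.window = window_seconds
        self.events: deque = deque()

    def allow(self, now: float = None) -> bool:
        now = time.time() if now is None else now
        # Evict timestamps that have slid out of the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        if len(self.events) >= self.limit:
            return False  # the gateway would answer HTTP 429 here
        self.events.append(now)
        return True

limiter = SlidingWindowLimiter(limit=3, window_seconds=60)
print([limiter.allow(now=t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(limiter.allow(now=61))                         # True: oldest events expired
```

A per-provider daily quota is just `limit=<daily limit>` with the default 24-hour window.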
The routing strategy is configurable at runtime via the platform config:
| Strategy | Behavior |
|---|---|
| `local-only` | Only use Ollama. Fail if offline. |
| `local-first` | Try Ollama first, fall back to cloud. (default) |
| `cloud-only` | Skip Ollama entirely, use cloud providers. |
| `cloud-first` | Try cloud first, use Ollama as backup. |
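Each strategy reduces to an ordering over two provider groups. A minimal sketch of how that resolution might look (the function name and group labels are hypothetical, not taken from the source):

```python
def provider_order(strategy: str) -> list:
    """Map a routing strategy to the provider groups tried, in order.

    "ollama" stands for the local Ollama instance, "cloud" for all
    configured cloud providers; a group is skipped when unavailable.
    """
    orders = {
        "local-only":  ["ollama"],
        "local-first": ["ollama", "cloud"],   # default
        "cloud-only":  ["cloud"],
        "cloud-first": ["cloud", "ollama"],
    }
    try:
        return orders[strategy]
    except KeyError:
        raise ValueError(f"unknown routing strategy: {strategy}") from None

print(provider_order("local-first"))  # ['ollama', 'cloud']
```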
| Provider | Setup | Model prefix |
|---|---|---|
| Claude Max/Pro | Run `claude setup-token` or auto-read from `~/.claude/.credentials.json` | `claude-max/` |
| ChatGPT Plus/Pro | Run `codex login` or auto-read from `~/.codex/auth.json` | `chatgpt/` |
| Provider | Daily limit | Env var |
|---|---|---|
| Ollama | Unlimited (local) | — |
| Groq | 14,400 req/day | `GROQ_API_KEY` |
| Gemini | 250 req/day | `GEMINI_API_KEY` |
| Cerebras | 1M tokens/day | `CEREBRAS_API_KEY` |
| OpenRouter | 50+ req/day | `OPENROUTER_API_KEY` |
| GitHub Models | 50-150 req/day | `GITHUB_TOKEN` |
| Provider | Env var |
|---|---|
| Anthropic | `ANTHROPIC_API_KEY` |
| OpenAI | `OPENAI_API_KEY` |
| Method | Path | Description |
|---|---|---|
| POST | `/complete` | Non-streaming LLM completion |
| POST | `/stream` | SSE streaming completion |
| POST | `/embed` | Generate text embeddings |
| Method | Path | Description |
|---|---|---|
| POST | `/v1/chat/completions` | Chat completions (streaming and non-streaming) |
| GET | `/v1/models` | List all registered model IDs |
| Method | Path | Description |
|---|---|---|
| GET | `/v1/models/discover` | Discover available models from all providers |
| GET | `/v1/models/ollama/*` | Ollama model management |
| Method | Path | Description |
|---|---|---|
| GET | `/health/live` | Liveness probe |
| GET | `/health/ready` | Readiness probe |
| Variable | Description | Default |
|---|---|---|
| `ANTHROPIC_API_KEY` | Anthropic API key | — |
| `OPENAI_API_KEY` | OpenAI API key | — |
| `OLLAMA_BASE_URL` | Ollama API URL | `http://ollama:11434` |
| `GROQ_API_KEY` | Groq API key | — |
| `GEMINI_API_KEY` | Gemini API key | — |
| `CEREBRAS_API_KEY` | Cerebras API key | — |
| `OPENROUTER_API_KEY` | OpenRouter API key | — |
| `GITHUB_TOKEN` | GitHub PAT for GitHub Models | — |
| `REDIS_URL` | Redis connection string | `redis://redis:6379/1` |
| `LOG_LEVEL` | Logging level | `INFO` |
| `CORS_ALLOWED_ORIGINS` | Comma-separated allowed origins | `*` |
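A minimal `.env` for a local-plus-Groq setup might look like the fragment below. Values are placeholders; set only the providers you actually use, and the unset ones are simply not registered at startup.

```shell
# Local Ollama plus one free-tier cloud provider
OLLAMA_BASE_URL=http://ollama:11434
GROQ_API_KEY=gsk_your_key_here
REDIS_URL=redis://redis:6379/1
LOG_LEVEL=INFO
CORS_ALLOWED_ORIGINS=*
```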
```bash
# List registered models
curl http://localhost:8001/v1/models | jq '.data[].id'

# OpenAI-compatible completion
curl http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-max/claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello from Nova"}]
  }'

# Nova internal completion
curl http://localhost:8001/complete \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-max/claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
- **LiteLLM abstraction** — all provider calls go through LiteLLM for unified request/response translation
- **Provider auto-detection** — providers are registered at startup based on available credentials (env vars, credential files, keychain)
- **Rate limiting** — per-provider daily quotas tracked in Redis; returns HTTP 429 when exhausted
- **Response cache** — `temperature=0` requests are cached to avoid redundant API calls; the cache is keyed on the full request body (excluding metadata)
- **Translation layer** — `openai_compat.py` converts between the OpenAI wire format and Nova's internal `CompleteRequest`/`CompleteResponse` types
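The cache-key rule above can be sketched as canonical-JSON hashing: drop the metadata field, serialize with sorted keys, and hash the result. The field name `metadata` and the choice of SHA-256 are assumptions for illustration, not confirmed details of the gateway:

```python
import hashlib
import json

def cache_key(request_body: dict) -> str:
    """Derive a deterministic cache key from a completion request.

    Excludes "metadata" (assumed field name) and serializes with sorted
    keys so semantically identical requests hash identically.
    """
    relevant = {k: v for k, v in request_body.items() if k != "metadata"}
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

a = cache_key({"model": "m", "temperature": 0, "metadata": {"trace": "1"}})
b = cache_key({"metadata": {"trace": "2"}, "temperature": 0, "model": "m"})
print(a == b)  # True: key order and metadata do not affect the key
```

Sorted-key serialization is what makes the key stable across clients that emit the same fields in different orders.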