OpenAI- and Anthropic-compatible LLM proxy on Bun and TypeScript. Route logical model names to multiple upstream providers with API-key rotation, cooldowns, format conversion, streaming, optional tool-call enforcement, and a built-in admin UI.
- Logical models: `config/models/<name>.json` maps a client-facing id to one or more provider routes and fallbacks
- Multi-provider routing: OpenAI-compatible and Anthropic wire protocols (Groq, Cerebras, Gemini, OpenRouter, Nahcrof, etc.)
- API key fallback: rotate keys and cool down failed keys per provider
- Format conversion: OpenAI ↔ Anthropic at the proxy boundary
- Streaming: SSE for chat completions
- Context window metadata: `GET /v1/models` exposes `context_window`, `context_length`, and `limit.context` for harness compaction
- Audio: OpenAI-style `/v1/audio/transcriptions` with provider routing
- Admin UI: Next.js static app at `/setup/` (models, providers, env, test bench, bundle import/export)
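
The key-fallback feature in the list above could look roughly like the sketch below. It is an illustration only, not the proxy's actual code: `KeyPool`, `nextKey`, and `markFailed` are hypothetical names, and the clock is injected so the behaviour is easy to test.

```typescript
// Hypothetical sketch of per-provider key rotation with cooldowns.
// None of these names are taken from the project source.
class KeyPool {
  private readonly keys: string[];
  private readonly cooldownMs: number;
  private readonly now: () => number;
  private readonly cooldownUntil = new Map<string, number>();
  private cursor = 0;

  constructor(keys: string[], cooldownMs: number, now: () => number = () => Date.now()) {
    this.keys = keys;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Return the next key that is not cooling down, or undefined if all are.
  nextKey(): string | undefined {
    for (let i = 0; i < this.keys.length; i++) {
      const index = (this.cursor + i) % this.keys.length;
      const key = this.keys[index];
      if (key === undefined) continue;
      if ((this.cooldownUntil.get(key) ?? 0) <= this.now()) {
        // Advance past the key we hand out so healthy keys are used in turn.
        this.cursor = (index + 1) % this.keys.length;
        return key;
      }
    }
    return undefined;
  }

  // Put a key on cooldown after an upstream failure.
  markFailed(key: string): void {
    this.cooldownUntil.set(key, this.now() + this.cooldownMs);
  }
}
```

A caller would ask for `nextKey()` before each upstream request and call `markFailed(key)` when that request fails; when `nextKey()` returns `undefined`, every key for the provider is cooling down and the next route can be tried.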
- Bun ≥ 1.1 (runtime and tests)
- Docker optional (recommended for production)
cp .env.example .env
# Edit .env: set CLIENT_API_KEY and provider API keys
docker compose build
docker compose up -d
curl -s http://127.0.0.1:9876/health
curl -s -H "Authorization: Bearer $CLIENT_API_KEY" http://127.0.0.1:9876/v1/models

Default listen address: http://127.0.0.1:9876
Admin UI: http://127.0.0.1:9876/setup/
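
The same Bearer-authenticated access shown with curl can be used from TypeScript. The helper below is a minimal sketch: `buildChatRequest` and the example key are made up for illustration, while the path `/v1/chat/completions`, the default address, and the Bearer scheme come from this README.

```typescript
// Hypothetical helper that prepares a chat completion request for the proxy.
interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  prompt: string,
): ChatRequest {
  return {
    // Strip trailing slashes so the path is joined cleanly.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // value of CLIENT_API_KEY
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model, // logical model name defined under config/models/
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

const request = buildChatRequest("http://127.0.0.1:9876", "example-client-key", "turbo", "Hello");
// To send it against a running proxy:
//   const res = await fetch(request.url, request.init);
```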
cp .env.example .env
bun install
# Terminal 1 — API server (hot reload)
bun run dev
# Terminal 2 — admin UI (optional; or rely on Docker-built web-static)
cd web && bun install && bun run dev

With only the API process, open /setup/ after building the UI once:
cd web && bun install && bun run build
# Serves from web/out when MODEL_PROXY_WEB_ROOT is unset

| Path | Purpose |
|---|---|
| `src/` | Hono server, routing, providers, CLI entry |
| `shared/schemas/` | Zod schemas for config and wire formats |
| `web/` | Current Next.js admin UI (exported to `web/out`, copied as `web-static` in Docker) |
| `config/providers/` | Provider endpoint + auth JSON (often gitignored locally; samples may ship in repo) |
| `config/models/` | Per logical model routing JSON (gitignored locally) |
| `config/templates/` | Templates for new provider/model files |
| `config/audio-models/` | Audio transcription routing |
| `tests/` | `bun test` integration tests |

There is no Python application in this tree. The v1 FastAPI codebase was replaced by this v2 TypeScript implementation.

| Variable | Description |
|---|---|
| `CLIENT_API_KEY` | Required. Bearer token clients must send |
| `HOST` / `PORT` | Bind address (default `127.0.0.1:9876`) |
| `CORS_ORIGINS` | Comma-separated origins or `*` |
| `LOG_LEVEL` | `debug` \| `info` \| `warn` \| `error` |
| `DEFAULT_CONTEXT_WINDOW` | Fallback context size (tokens) when upstream/config omit it |
| `UPSTREAM_MODELS_CACHE_TTL_SECONDS` | Cache TTL for provider `/v1/models` catalogs (default 3600) |
| `UPSTREAM_MODELS_FETCH_TIMEOUT_MS` | Max wait on first upstream catalog fetch (default 2000) |
| `KEY_COOLDOWN_SECONDS` | API key cooldown after failures |
| `ENFORCE_TOOL_CALL_*` | Global tool-call enforcement defaults |
| Provider keys | e.g. `GROQ_API_KEY`, `CEREBRAS_API_KEY`, `ANTHROPIC_API_KEY` |

See .env.example for the full list.
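
To make the documented defaults concrete, here is a minimal sketch of how these variables might be read. It is not the project's loader: `loadSettings` and `intOr` are hypothetical names, the `info` default for `LOG_LEVEL` is an assumption, and variables whose defaults are not stated above are left out.

```typescript
// Hypothetical settings loader covering only the variables with documented defaults.
type LogLevel = "debug" | "info" | "warn" | "error";
type Env = Record<string, string | undefined>;

interface Settings {
  clientApiKey: string;
  host: string;
  port: number;
  corsOrigins: string[];
  logLevel: LogLevel;
  upstreamModelsCacheTtlSeconds: number;
  upstreamModelsFetchTimeoutMs: number;
}

// Parse a positive integer, falling back when the value is unset or invalid.
function intOr(value: string | undefined, fallback: number): number {
  const n = Number.parseInt(value ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

function loadSettings(env: Env): Settings {
  const clientApiKey = env["CLIENT_API_KEY"];
  if (!clientApiKey) throw new Error("CLIENT_API_KEY is required");

  const levels: LogLevel[] = ["debug", "info", "warn", "error"];
  const level = levels.find((l) => l === env["LOG_LEVEL"]);

  return {
    clientApiKey,
    host: env["HOST"] || "127.0.0.1",
    port: intOr(env["PORT"], 9876),
    // "a, b" becomes ["a", "b"]; "*" becomes ["*"]; unset becomes [].
    corsOrigins: (env["CORS_ORIGINS"] ?? "")
      .split(",")
      .map((origin) => origin.trim())
      .filter((origin) => origin.length > 0),
    logLevel: level ?? "info", // assumed default, not stated in this README
    upstreamModelsCacheTtlSeconds: intOr(env["UPSTREAM_MODELS_CACHE_TTL_SECONDS"], 3600),
    upstreamModelsFetchTimeoutMs: intOr(env["UPSTREAM_MODELS_FETCH_TIMEOUT_MS"], 2000),
  };
}
```

Under Bun this could be called as `loadSettings(process.env)`; failing fast on a missing `CLIENT_API_KEY` mirrors the "Required" note in the table.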
config/models/turbo.json:
{
"logical_name": "turbo",
"timeout_seconds": 20,
"default_cooldown_seconds": 10,
"context_window": 131072,
"model_routings": [
{ "provider": "cerebras", "model": "zai-glm-4.7" }
],
"fallback_model_routings": []
}

Optional `context_window` on the model or on a route overrides discovery when upstream metadata is missing.
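
The shape of this file can be described with a TypeScript type, as in the sketch below. The real project validates config with Zod schemas under `shared/schemas/`; the interfaces here are inferred only from the example above, and `routeCandidates` is a hypothetical helper that assumes fallback routes are tried after the primary ones, which the field names suggest but this README does not spell out.

```typescript
// Types inferred from the turbo.json example; not copied from the project's schemas.
interface ModelRoute {
  provider: string;
  model: string;
  context_window?: number; // optional per-route override
}

interface LogicalModelConfig {
  logical_name: string;
  timeout_seconds: number;
  default_cooldown_seconds: number;
  context_window?: number; // optional model-level override
  model_routings: ModelRoute[];
  fallback_model_routings: ModelRoute[];
}

// Assumed attempt order: primary routes first, then fallbacks.
function routeCandidates(config: LogicalModelConfig): ModelRoute[] {
  return [...config.model_routings, ...config.fallback_model_routings];
}

const turbo: LogicalModelConfig = {
  logical_name: "turbo",
  timeout_seconds: 20,
  default_cooldown_seconds: 10,
  context_window: 131072,
  model_routings: [{ provider: "cerebras", model: "zai-glm-4.7" }],
  // Hypothetical fallback entry, added only to show the ordering.
  fallback_model_routings: [{ provider: "groq", model: "example-fallback-model" }],
};

const candidates = routeCandidates(turbo);
```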
| Method | Path | Auth | Notes |
|---|---|---|---|
| GET | `/health` | No | Liveness |
| GET | `/health/detailed` | No | Models/providers counts |
| GET | `/v1/models` | Bearer | OpenAI list + context metadata |
| POST | `/v1/chat/completions` | Bearer | OpenAI chat |
| POST | `/v1/chat/completions/stream` | Bearer | Forces `stream: true` |
| POST | `/v1/messages` | Bearer | Anthropic messages |
| POST | `/v1/audio/transcriptions` | Bearer | Audio STT |
| GET | `/setup/*` | Session or Bearer | Admin UI static assets |
| | `/v1/admin/*` | Session or Bearer | Config CRUD, logs, bundle import |

Chat responses keep the logical model id the client requested.
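
Because both `/v1/chat/completions` and `/v1/messages` are served, requests sometimes have to be translated between the two wire formats. The sketch below shows one direction for plain text messages only. It is not the proxy's converter: `toAnthropic` and the `defaultMaxTokens` value of 1024 are assumptions, and tool calls, images, and other content types are ignored.

```typescript
// Simplified request shapes: text-only messages, a small subset of each API.
interface OpenAIMessage {
  role: "system" | "user" | "assistant";
  content: string;
}
interface OpenAIChatRequest {
  model: string;
  messages: OpenAIMessage[];
  max_tokens?: number;
  stream?: boolean;
}
interface AnthropicMessage {
  role: "user" | "assistant";
  content: string;
}
interface AnthropicMessagesRequest {
  model: string;
  max_tokens: number; // required by the Anthropic Messages API
  system?: string;
  messages: AnthropicMessage[];
  stream?: boolean;
}

// Hypothetical converter: system messages move to the top-level `system` field.
function toAnthropic(
  req: OpenAIChatRequest,
  upstreamModel: string,
  defaultMaxTokens = 1024, // assumed fallback, not a project default
): AnthropicMessagesRequest {
  const systemParts: string[] = [];
  const messages: AnthropicMessage[] = [];
  for (const m of req.messages) {
    if (m.role === "system") {
      systemParts.push(m.content);
    } else {
      messages.push({ role: m.role, content: m.content });
    }
  }
  const out: AnthropicMessagesRequest = {
    model: upstreamModel,
    max_tokens: req.max_tokens ?? defaultMaxTokens,
    messages,
  };
  if (systemParts.length > 0) out.system = systemParts.join("\n\n");
  if (req.stream !== undefined) out.stream = req.stream;
  return out;
}
```

The upstream model id is passed separately because the client sends the logical name, while the provider expects its own model id; on the way back, the response would carry the logical name again.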
For each logical model (primary route `model_routings[0]`):

- Upstream provider `GET /v1/models` (cached)
- `provider.models.<id>.context_length` in provider JSON
- Route or model `context_window` in config
- `DEFAULT_CONTEXT_WINDOW` env
- `128000` system default

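
Read as a fallback chain, the sources above can be sketched as a single function. This assumes the list is in order of precedence and that a route-level `context_window` is consulted before the model-level one; both are assumptions, and `resolveContextWindow` and `ContextSources` are hypothetical names.

```typescript
// Hypothetical container for the values each source may provide.
interface ContextSources {
  upstream?: number; // from the provider's cached GET /v1/models
  providerJson?: number; // provider.models.<id>.context_length
  routeConfig?: number; // context_window on the primary route
  modelConfig?: number; // context_window on the logical model
  envDefault?: number; // DEFAULT_CONTEXT_WINDOW
}

const SYSTEM_DEFAULT_CONTEXT_WINDOW = 128000;

// Return the first positive value in the assumed precedence order.
function resolveContextWindow(sources: ContextSources): number {
  const candidates = [
    sources.upstream,
    sources.providerJson,
    sources.routeConfig,
    sources.modelConfig,
    sources.envDefault,
  ];
  for (const candidate of candidates) {
    if (candidate !== undefined && candidate > 0) return candidate;
  }
  return SYSTEM_DEFAULT_CONTEXT_WINDOW;
}
```

For example, a model with only `context_window: 131072` in its config and no upstream metadata would resolve to 131072, and a model with no source at all would fall through to 128000.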
The process entrypoint is Bun, not a separate Python package:
bun run start
# or
bun run ./src/cli/main.ts --host 0.0.0.0 --port 9876 --log-level info

Docker CMD uses the same entrypoint. Supported flags: `--host`, `--port`, `--log-level`. The optional `start` positional argument is accepted for compatibility.

bun run dev # API with --hot
bun run start # API production mode
bun test # test suite
bun run typecheck # tsc --noEmit
bun run build:web # build admin UI → web/out

- Image: `model-proxy:v2` (see Dockerfile)
- Compose (dev): docker-compose.yml (bind-mounts `./config` and `./.env`)
- Compose (prod): docker-compose.prod.yml (named volume for config)

docker compose up -d --build
docker compose -f docker-compose.prod.yml up -d --build

bun test
bun run typecheck

Tests live under tests/. Config loaders use a temp directory in tests; production config is read from the config/ search paths (cwd, ~/.model-proxy/config, package config/).

MIT (see repository license file if present).
