LiteLLM adapter for BlockRun — call x402-paid AI models through LiteLLM with zero changes to your existing code. Base and Solana chains supported.
📚 Full docs in docs/ — bilingual (English + Chinese):
- CUSTOMER-ONBOARDING (English / Chinese) — 5-minute walkthrough, both modes
- PROXY-FULL-SETUP (English / Chinese) — full deploy with admin UI + Postgres + troubleshooting
TL;DR — BlockRun's /v1/chat/completions is already OpenAI-compatible at the protocol level. The only thing that differs is authentication: BlockRun uses per-request x402 wallet signatures (non-custodial USDC micropayments on Base / Solana), not a Bearer API key. This package bridges that gap.
A condensed quick reference is at the bottom.
| Mode | Best for | What it looks like |
|---|---|---|
| 1. Custom provider (in-process) | Apps using the LiteLLM Python library | litellm.completion(model="blockrun/openai/gpt-5.5", ...) |
| 2. Local proxy (sidecar) | Apps using the LiteLLM Proxy Server (or any OpenAI client) | api_base="http://localhost:4001/v1" |
Both modes share the same underlying wallet/signing flow (via the blockrun-llm SDK), so they behave identically. Pick whichever fits your deployment.
Both modes have been validated against https://blockrun.ai/api using the free nvidia/deepseek-v4-flash model:
$ python -c "
> import litellm
> from blockrun_litellm import register; register()
> r = litellm.completion(
> model='blockrun/nvidia/deepseek-v4-flash',
> messages=[{'role':'user','content':'Reply with exactly: pong'}],
> max_tokens=20, temperature=0.0)
> print(r.choices[0].message.content)"
pong
$ curl -sS http://127.0.0.1:4001/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"nvidia/deepseek-v4-flash","messages":[{"role":"user","content":"Reply with exactly: proxy-ok"}]}'
{"id":"a710c144c68c42f7a319fb93e9b9b5a0","object":"chat.completion","model":"nvidia/deepseek-v4-flash",
"choices":[{"index":0,"message":{"role":"assistant","content":"proxy-ok"},...}],"usage":{...}}
# Base chain only — minimal
pip install blockrun-litellm
# Base chain + local OpenAI-compatible proxy (FastAPI/uvicorn)
pip install 'blockrun-litellm[proxy]'
# Base + Solana (adds the x402 SVM toolchain)
pip install 'blockrun-litellm[proxy,solana]'

Requires Python ≥ 3.9.
| Chain | Gateway URL | Wallet env var | Status |
|---|---|---|---|
| Base (USDC) | https://blockrun.ai/api (default) | BLOCKRUN_WALLET_KEY | sync + async, streaming |
| Solana (USDC) | https://sol.blockrun.ai/api | SOLANA_WALLET_KEY | sync + async; streaming on both (since 0.3.1) |
To route on Solana, pass api_base="https://sol.blockrun.ai/api" plus api_key=<solana-key> to litellm.completion(...) — the adapter detects the chain from the URL and uses the right SDK client.
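A minimal sketch of that routing (the key value is a placeholder):

```python
import litellm
from blockrun_litellm import register

register()

# Same call as on Base, but routed over Solana: the adapter keys off the sol. URL.
response = litellm.completion(
    model="blockrun/openai/gpt-5.5",
    messages=[{"role": "user", "content": "Hi"}],
    api_base="https://sol.blockrun.ai/api",  # Solana gateway
    api_key="YOUR_SOLANA_PRIVATE_KEY",       # or set SOLANA_WALLET_KEY instead
)
print(response.choices[0].message.content)
```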
The blockrun-llm SDK signs each request locally with an EVM (Base chain) private key. The key never leaves your machine. Three ways to provide it:
# Option A — environment variable (recommended for servers)
export BLOCKRUN_WALLET_KEY=0xYOUR_BASE_CHAIN_PRIVATE_KEY
# Option B — auto-create + fund a new wallet (interactive, shows QR for funding)
python -c "from blockrun_llm import setup_agent_wallet; setup_agent_wallet()"
# Option C — pass per-call (Python lib mode), see examples below

💡 To validate without spending real USDC, use a free model like nvidia/deepseek-v4-flash — same code path, same wallet flow, $0 settlement.
The shortest path if your app already calls litellm.completion() directly.
import litellm
from blockrun_litellm import register
register()  # idempotent; adds "blockrun" to litellm.custom_provider_map

response = litellm.completion(
model="blockrun/openai/gpt-5.5", # blockrun/<provider>/<model>
messages=[{"role": "user", "content": "What is the capital of France?"}],
max_tokens=128,
temperature=0.7,
)
print(response.choices[0].message.content)
print(response.usage)  # prompt_tokens / completion_tokens / total_tokens

The blockrun/ prefix is stripped before the request is sent to the BlockRun gateway, so openai/gpt-5.5, anthropic/claude-opus-4-5, google/gemini-3-pro, etc. all work — anything in BlockRun's catalog.
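Streaming works in provider mode too (details in the parameter table further down). A minimal sketch against the free model:

```python
# stream=True returns an iterator of OpenAI-style chunks.
stream = litellm.completion(
    model="blockrun/nvidia/deepseek-v4-flash",  # free model, $0 settlement
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

To charge a different wallet for a single call, pass the key directly: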
response = litellm.completion(
model="blockrun/openai/gpt-5.5",
messages=[...],
api_key="0xANOTHER_PRIVATE_KEY", # passed to blockrun-llm as wallet
)

Async works the same way through litellm.acompletion:

import asyncio
async def main():
response = await litellm.acompletion(
model="blockrun/openai/gpt-5.5",
messages=[{"role": "user", "content": "Hi"}],
)
print(response.choices[0].message.content)
asyncio.run(main())

If you're running the LiteLLM Proxy Server (litellm --config config.yaml), or any client that just speaks OpenAI HTTP, run our proxy as a sidecar.
export BLOCKRUN_WALLET_KEY=0xYOUR_KEY
blockrun-litellm-proxy --port 4001
# → uvicorn running at http://127.0.0.1:4001

Flags:
| Flag | Default | Purpose |
|---|---|---|
| --host | 127.0.0.1 | Bind interface. Keep loopback unless you set BLOCKRUN_PROXY_TOKEN. |
| --port | 4001 | Bind port |
| --api-url | https://blockrun.ai/api | Override the BlockRun gateway endpoint |
| --log-level | info | critical / error / warning / info / debug / trace |
Optional shared-secret guard:
export BLOCKRUN_PROXY_TOKEN=$(openssl rand -hex 32)
# clients must now send: Authorization: Bearer $BLOCKRUN_PROXY_TOKEN

Drop this into your config.yaml:
model_list:
  - model_name: gpt-5.5
    litellm_params:
      model: openai/openai/gpt-5.5   # first 'openai/' = LiteLLM provider; rest = BlockRun model id
      api_base: http://localhost:4001/v1
      api_key: "dummy"               # ignored if BLOCKRUN_PROXY_TOKEN is unset
  - model_name: claude-opus-4-5
    litellm_params:
      model: openai/anthropic/claude-opus-4-5
      api_base: http://localhost:4001/v1
      api_key: "dummy"
litellm_settings:
  drop_params: True  # silently drop OpenAI params BlockRun doesn't support

Run the LiteLLM Proxy as usual:
litellm --config config.yaml --port 4000

Then call it like any OpenAI endpoint:
curl http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.5",
"messages": [{"role": "user", "content": "Hello"}]
}'

The proxy speaks OpenAI HTTP, so anything that takes an api_base works:
# OpenAI Python SDK pointed straight at the BlockRun proxy
from openai import OpenAI
client = OpenAI(api_key="dummy", base_url="http://localhost:4001/v1")
resp = client.chat.completions.create(
model="openai/gpt-5.5",
messages=[{"role": "user", "content": "Hi"}],
)
print(resp.choices[0].message.content)

# Plain curl
curl http://localhost:4001/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "openai/gpt-5.5", "messages": [{"role":"user","content":"Hi"}]}'
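If you exported BLOCKRUN_PROXY_TOKEN, the same call just needs the bearer header:

```bash
curl http://localhost:4001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $BLOCKRUN_PROXY_TOKEN" \
  -d '{"model": "openai/gpt-5.5", "messages": [{"role":"user","content":"Hi"}]}'
```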
Sidecar endpoints:

| Method | Path | Notes |
|---|---|---|
| POST | /v1/chat/completions | OpenAI Chat Completions. stream=True returns text/event-stream; otherwise JSON. |
| GET | /v1/models | BlockRun model catalog |
| GET | /healthz | Liveness probe (no upstream call) |
| GET | /docs | Auto-generated Swagger UI |
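Two quick smoke tests, assuming the sidecar is running on its default port:

```bash
curl -sS http://127.0.0.1:4001/healthz    # liveness, no upstream call
curl -sS http://127.0.0.1:4001/v1/models  # BlockRun model catalog
```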
All of these are forwarded to BlockRun unchanged:
| OpenAI param | Supported | Notes |
|---|---|---|
model |
✅ | Any BlockRun model id, e.g. openai/gpt-5.5 |
messages |
✅ | Full role/content/tool_calls schema |
max_tokens |
✅ | Defaults to 1024 if omitted |
temperature |
✅ | 0–2 |
top_p |
✅ | |
tools / tool_choice |
✅ | Function calling |
stream |
✅ | OpenAI-style SSE (text/event-stream). Provider mode yields LiteLLM GenericStreamingChunk objects; proxy mode emits data: <json>\n\n events terminated by data: [DONE]. Free models stream directly; paid models stream after the in-band 402-sign-retry dance. |
frequency_penalty / presence_penalty / logprobs / n |
Silently dropped — enable litellm_settings.drop_params: True to suppress LiteLLM warnings |
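Because tools / tool_choice pass through unchanged, standard OpenAI function calling works as-is. A minimal provider-mode sketch (the get_weather tool is purely illustrative):

```python
import litellm
from blockrun_litellm import register

register()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = litellm.completion(
    model="blockrun/openai/gpt-5.5",
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)
print(response.choices[0].message.tool_calls)
```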
BlockRun-specific extras (also accepted):
| Param | Purpose |
|---|---|
| search: True | Enable xAI Live Search (for search-enabled models) |
| search_parameters: {...} | Full Live Search config |
| fallback_models: ["..."] | Auto-retry on transient upstream errors |
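The extras travel in the normal JSON body, so they work in proxy mode too. A sketch (the xAI model id is illustrative; search applies only to search-enabled models):

```bash
curl http://localhost:4001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "xai/grok-4",
    "messages": [{"role": "user", "content": "Summarize this week in AI"}],
    "search": true,
    "fallback_models": ["nvidia/deepseek-v4-flash"]
  }'
```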
Opt-in JSONL logger captures every call — works on both Base and Solana, sync and async, streaming and non-streaming.
The log path is resolved in this order:

| Source | Path |
|---|---|
| Explicit arg to enable_local_logging("...") | (whatever you pass) |
| BLOCKRUN_LITELLM_LOG env var | (whatever it points to) |
| Otherwise | ~/.blockrun/litellm_calls.jsonl |
Each record carries:

ts, iso, model, provider, messages, completion,
usage{prompt_tokens, completion_tokens, total_tokens},
latency_ms, stream, cost_usd, status, error_type, error_message, request_id
from blockrun_litellm import enable_local_logging

enable_local_logging()  # default path
# or enable_local_logging("/var/log/calls.jsonl")

For the LiteLLM Proxy Server, bridge the same logger in via a custom callback:

# custom_callbacks.py
from blockrun_litellm.logger import JSONLLogger
blockrun_logger = JSONLLogger()

# config.yaml
litellm_settings:
  callbacks: ["custom_callbacks.blockrun_logger"]
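Each line is standalone JSON, so post-processing needs nothing beyond the standard library. A quick spend report over the default log path, assuming the field names listed above:

```python
import json
from pathlib import Path

log = Path.home() / ".blockrun" / "litellm_calls.jsonl"
calls = tokens = 0
usd = 0.0
for line in log.read_text().splitlines():
    rec = json.loads(line)
    calls += 1
    tokens += (rec.get("usage") or {}).get("total_tokens", 0)
    usd += rec.get("cost_usd") or 0.0
print(f"{calls} calls, {tokens} tokens, ${usd:.4f} settled")
```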
| File / env var | What | Configurable? |
|---|---|---|
| BLOCKRUN_WALLET_KEY (env) | Base private key | yes |
| SOLANA_WALLET_KEY (env) | Solana private key | yes |
| ~/.blockrun/.session | Auto-created Base wallet | — |
| ~/.blockrun/.solana-session | Auto-created Solana wallet | — |
| ~/.blockrun/litellm_calls.jsonl | LiteLLM request log | BLOCKRUN_LITELLM_LOG env or enable_local_logging(path) |
| ~/.blockrun/cost_log.jsonl | USDC cost audit for paid calls (SDK) | — |
| ~/.blockrun/data/*.json | Full request/response archive for paid calls (SDK) | — |
| BLOCKRUN_PROXY_TOKEN (env) | Optional shared-secret guard on sidecar | yes |
The examples/ directory has copy-paste-ready snippets:
- examples/python_lib.py — full LiteLLM Python library usage
- examples/litellm_config.yaml — LiteLLM Proxy Server config
- examples/raw_openai_sdk.py — pointing the OpenAI SDK at the proxy
- examples/custom_callbacks.py — JSONL log bridge for Proxy mode
┌─────────────────┐ OpenAI dict ┌──────────────────────┐ POST /v1/chat/completions ┌────────────────┐
│ Your app / │ ─────────────────▶ │ blockrun-litellm │ ────────────────────────────▶ │ blockrun.ai │
│ LiteLLM / │ │ (provider OR proxy) │ ◀──── 402 + payment-required ─│ gateway │
│ OpenAI SDK │ │ ↓ │ │ │
└─────────────────┘ │ blockrun-llm SDK │ ───── EIP-712 signed retry ──▶│ │
│ (local signing) │ ◀──── 200 + chat response ────│ │
└──────────────────────┘ └────────────────┘
▲
│ private key (stays local, signs only)
┌──────────────────────┐
│ BLOCKRUN_WALLET_KEY │
│ or ~/.blockrun/ │
└──────────────────────┘
1. Caller sends an OpenAI Chat Completions dict.
2. blockrun-litellm whitelists the params and dispatches through blockrun-llm.
3. blockrun-llm posts to BlockRun, receives a 402 with payment requirements, signs an EIP-712 payment locally with your wallet, and retries (sketched below).
4. BlockRun verifies the signature on-chain, settles the USDC micropayment, runs the inference, and returns the response.
5. blockrun-litellm returns the dumped pydantic model as a plain OpenAI dict (or litellm.ModelResponse in provider mode).
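For intuition, here is the payment leg compressed into illustrative pseudocode. The real work happens inside blockrun-llm; the 402 payload field and header names below are assumptions for the sketch, not the SDK's actual wire format:

```python
import os
import requests
from eth_account import Account  # pip install eth-account

def paid_chat(payload: dict, url: str = "https://blockrun.ai/api/v1/chat/completions") -> dict:
    resp = requests.post(url, json=payload)
    if resp.status_code != 402:
        return resp.json()  # free model: no payment round-trip needed
    typed_data = resp.json()["paymentRequirements"]  # ASSUMED field name
    key = os.environ["BLOCKRUN_WALLET_KEY"]          # stays local, used only to sign
    signed = Account.sign_typed_data(key, full_message=typed_data)
    retry = requests.post(
        url, json=payload,
        headers={"PAYMENT-SIGNATURE": signed.signature.hex()},  # ASSUMED header name
    )
    return retry.json()
```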
Q: Does this support streaming?
Yes, as of v0.2.0. Pass stream=True and the adapter routes through blockrun-llm's chat_completion_stream() (SDK ≥ 0.20.0). The 402 → sign-locally → retry-with-PAYMENT-SIGNATURE dance happens before the first chunk; once the upstream switches to text/event-stream, chunks are forwarded straight through (provider mode → litellm.GenericStreamingChunk, proxy mode → OpenAI-style data: <json>\n\n SSE). Caveats inherited from the gateway: search_parameters and the Responses-API models (codex, gpt-5.4-pro) reject streaming server-side with 400.
Q: Where does my private key live?
On your machine only — BLOCKRUN_WALLET_KEY env var, or ~/.blockrun/.session if you used setup_agent_wallet(). The proxy and provider both read from those sources via blockrun-llm. Only EIP-712 signatures are transmitted.
Q: How do I switch between Base and Solana?
Base is the default (https://blockrun.ai/api, signed with BLOCKRUN_WALLET_KEY). For Solana, point api_base at https://sol.blockrun.ai/api and supply a Solana key (the SOLANA_WALLET_KEY env var, or api_key= per call); the adapter detects the chain from the URL and picks the matching blockrun-llm client. See the chain table above.
Q: Can I run the proxy in Docker / k8s?
Yes — it's a vanilla FastAPI app. Pass the wallet key via secret (env var), bind to 0.0.0.0 only inside a private network, and set BLOCKRUN_PROXY_TOKEN for an additional auth layer.
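No official image is published, but since the package is pip-installable, one illustrative way to containerize it reuses only the flags documented above:

```bash
docker run --rm -p 4001:4001 \
  -e BLOCKRUN_WALLET_KEY \
  -e BLOCKRUN_PROXY_TOKEN \
  python:3.12-slim \
  sh -c "pip install 'blockrun-litellm[proxy]' && blockrun-litellm-proxy --host 0.0.0.0 --port 4001"
```

Here --host 0.0.0.0 is reasonable because only the published container port is reachable; keep BLOCKRUN_PROXY_TOKEN set whenever that port is exposed beyond localhost.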
Q: Is this affiliated with LiteLLM (BerriAI)?
No — this is an independent adapter built by the BlockRun team. LiteLLM is a great project; we're just plugging into its custom-provider hooks.
git clone https://github.com/BlockRunAI/blockrun-litellm
cd blockrun-litellm
pip install -e '.[proxy,dev]'
pytest

MIT. See LICENSE.
LiteLLM adapter for BlockRun: call AI models on BlockRun through LiteLLM with zero changes to your code.

One-liner: BlockRun's /v1/chat/completions is OpenAI-compatible at the protocol level; the only difference is authentication. BlockRun uses x402 wallet signatures (per-call, non-custodial USDC micropayments) instead of a Bearer API key, and this package fills that gap.
| Mode | Best for | What it looks like |
|---|---|---|
| 1. Custom provider (in-process) | Apps using the LiteLLM Python library | litellm.completion(model="blockrun/openai/gpt-5.5", ...) |
| 2. Local proxy (sidecar) | Apps on the LiteLLM Proxy Server, or any OpenAI client | api_base="http://localhost:4001/v1" |
Both modes do the signing and x402 payment through the blockrun-llm SDK underneath, so they behave identically. Pick whichever fits your deployment.
# custom provider only
pip install blockrun-litellm
# with the local proxy (FastAPI/uvicorn)
pip install 'blockrun-litellm[proxy]'

# Option A — environment variable (recommended for servers)
export BLOCKRUN_WALLET_KEY=0xYOUR_BASE_CHAIN_PRIVATE_KEY
# Option B — auto-create a wallet and fund it via QR code (interactive)
python -c "from blockrun_llm import setup_agent_wallet; setup_agent_wallet()"

The private key is used only for local EIP-712 signing and never leaves your machine.
💡 Want to try it end to end at zero cost? Use the free model nvidia/deepseek-v4-flash: exactly the same code, the same wallet flow, $0 settlement.
import litellm
from blockrun_litellm import register

register()  # call once at startup

response = litellm.completion(
    model="blockrun/openai/gpt-5.5",  # blockrun/<provider>/<model>
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=128,
)
print(response.choices[0].message.content)

Async works the same way: await litellm.acompletion(...).
# 1) Start the sidecar
export BLOCKRUN_WALLET_KEY=0xYOUR_KEY
blockrun-litellm-proxy --port 4001

# 2) LiteLLM Proxy config (config.yaml)
model_list:
  - model_name: gpt-5.5
    litellm_params:
      model: openai/openai/gpt-5.5
      api_base: http://localhost:4001/v1
      api_key: "dummy"
litellm_settings:
  drop_params: True

Or point any OpenAI client at it directly:
from openai import OpenAI

client = OpenAI(api_key="dummy", base_url="http://localhost:4001/v1")
resp = client.chat.completions.create(
    model="openai/gpt-5.5",
    messages=[{"role": "user", "content": "Hello"}],
)

| OpenAI param | Supported | Notes |
|---|---|---|
| model / messages / max_tokens / temperature / top_p | ✅ | |
| tools / tool_choice | ✅ | Function calling |
| stream | ✅ | Standard OpenAI SSE (text/event-stream). Provider mode yields LiteLLM GenericStreamingChunk; proxy mode emits data: <json>\n\n events ending with data: [DONE]. Free models stream immediately; paid models stream after the in-band 402 → sign → retry. |
| frequency_penalty / presence_penalty / logprobs / n | Silently dropped | Set drop_params: True in LiteLLM to suppress warnings |
BlockRun extras:

| Param | Purpose |
|---|---|
| search: True | Enable xAI Live Search (search-enabled models) |
| search_parameters: {...} | Full Live Search config |
| fallback_models: ["..."] | Auto-retry list for transient upstream errors |
Q: Is streaming supported?
Fully, since v0.2.0. With stream=True the adapter goes through blockrun-llm's chat_completion_stream() (SDK ≥ 0.20.0); the 402 → local-sign → retry-with-PAYMENT-SIGNATURE chain completes before the first chunk, and once the upstream switches to text/event-stream, chunks pass straight through (provider mode → litellm.GenericStreamingChunk, proxy mode → standard OpenAI data: <json>\n\n SSE). Limits inherited from the backend: search_parameters and the Responses-API models (codex, gpt-5.4-pro) reject streaming server-side with 400.
Q: Where does my private key live?
Locally only: the BLOCKRUN_WALLET_KEY env var, or the ~/.blockrun/.session file created by setup_agent_wallet(). Both provider and proxy read it via blockrun-llm. The chain only ever sees signatures, never the key.
Q: Docker / k8s deployment?
The proxy is a plain FastAPI app. Inject the key as a secret, expose the port only on a private network, and optionally set BLOCKRUN_PROXY_TOKEN for an extra Bearer auth layer.
Q: What's the relationship with BerriAI?
None. This is an adapter maintained independently by the BlockRun team, plugged into LiteLLM's custom provider hooks.
git clone https://github.com/BlockRunAI/blockrun-litellm
cd blockrun-litellm
pip install -e '.[proxy,dev]'
pytest

MIT