blockrun-litellm


LiteLLM adapter for BlockRun — call x402-paid AI models through LiteLLM with zero changes to your existing code. Base and Solana chains supported.

📚 Full docs in docs/ — bilingual (English + Chinese).

TL;DR — BlockRun's /v1/chat/completions is already OpenAI-compatible at the protocol level. The only thing that differs is authentication: BlockRun uses per-request x402 wallet signatures (non-custodial USDC micropayments on Base / Solana), not a Bearer API key. This package bridges that gap.

Chinese quick-start docs at the bottom.


Two ways to integrate

| Mode | Best for | What it looks like |
|---|---|---|
| 1. Custom provider (in-process) | Apps using the LiteLLM Python library | litellm.completion(model="blockrun/openai/gpt-5.5", ...) |
| 2. Local proxy (sidecar) | Apps using the LiteLLM Proxy Server (or any OpenAI client) | api_base="http://localhost:4001/v1" |

Both modes share the same underlying wallet/signing flow (via the blockrun-llm SDK), so they behave identically. Pick whichever fits your deployment.

Verified end-to-end against the live BlockRun gateway

Both modes have been validated against https://blockrun.ai/api using the free nvidia/deepseek-v4-flash model:

$ python -c "
> import litellm
> from blockrun_litellm import register; register()
> r = litellm.completion(
>     model='blockrun/nvidia/deepseek-v4-flash',
>     messages=[{'role':'user','content':'Reply with exactly: pong'}],
>     max_tokens=20, temperature=0.0)
> print(r.choices[0].message.content)"
pong

$ curl -sS http://127.0.0.1:4001/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"nvidia/deepseek-v4-flash","messages":[{"role":"user","content":"Reply with exactly: proxy-ok"}]}'
{"id":"a710c144c68c42f7a319fb93e9b9b5a0","object":"chat.completion","model":"nvidia/deepseek-v4-flash",
 "choices":[{"index":0,"message":{"role":"assistant","content":"proxy-ok"},...}],"usage":{...}}

Install

# Base chain only — minimal
pip install blockrun-litellm

# Base chain + local OpenAI-compatible proxy (FastAPI/uvicorn)
pip install 'blockrun-litellm[proxy]'

# Base + Solana (adds the x402 SVM toolchain)
pip install 'blockrun-litellm[proxy,solana]'

Requires Python ≥ 3.9.

Chains supported

| Chain | Gateway URL | Wallet env var | Status |
|---|---|---|---|
| Base (USDC) | https://blockrun.ai/api (default) | BLOCKRUN_WALLET_KEY | sync + async, streaming |
| Solana (USDC) | https://sol.blockrun.ai/api | SOLANA_WALLET_KEY | sync + async, streaming (since 0.3.1) |

To route on Solana, pass api_base="https://sol.blockrun.ai/api" plus api_key=<solana-key> to litellm.completion(...) — the adapter detects the chain from the URL and uses the right SDK client.
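
A Solana-routed call, as a minimal sketch (assumes register() has run, the [solana] extra is installed, and SOLANA_WALLET_KEY is set; the model id is just an example):

import os
import litellm
from blockrun_litellm import register

register()

response = litellm.completion(
    model="blockrun/nvidia/deepseek-v4-flash",
    messages=[{"role": "user", "content": "ping"}],
    api_base="https://sol.blockrun.ai/api",      # Solana gateway; adapter picks the SVM client
    api_key=os.environ["SOLANA_WALLET_KEY"],     # Solana wallet key instead of the Base key
)
print(response.choices[0].message.content)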


Configure your wallet (one-time)

The blockrun-llm SDK signs each request locally with an EVM (Base chain) private key. The key never leaves your machine. Three ways to provide it:

# Option A — environment variable (recommended for servers)
export BLOCKRUN_WALLET_KEY=0xYOUR_BASE_CHAIN_PRIVATE_KEY

# Option B — auto-create + fund a new wallet (interactive, shows QR for funding)
python -c "from blockrun_llm import setup_agent_wallet; setup_agent_wallet()"

# Option C — pass per-call (Python lib mode), see examples below

💡 To validate without spending real USDC, use a free model like nvidia/deepseek-v4-flash — same code path, same wallet flow, $0 settlement.


Mode 1 — Custom provider (Python library)

The shortest path if your app already calls litellm.completion() directly.

1a. Register once at startup

import litellm
from blockrun_litellm import register

register()  # idempotent; adds "blockrun" to litellm.custom_provider_map

1b. Call with a blockrun/ model prefix

response = litellm.completion(
    model="blockrun/openai/gpt-5.5",        # blockrun/<provider>/<model>
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=128,
    temperature=0.7,
)

print(response.choices[0].message.content)
print(response.usage)  # prompt_tokens / completion_tokens / total_tokens

The blockrun/ prefix is stripped before being sent to the BlockRun gateway, so openai/gpt-5.5, anthropic/claude-opus-4-5, google/gemini-3-pro, etc. all work — anything in BlockRun's catalog.

1c. Override the wallet per-call (optional)

response = litellm.completion(
    model="blockrun/openai/gpt-5.5",
    messages=[...],
    api_key="0xANOTHER_PRIVATE_KEY",          # passed to blockrun-llm as wallet
)

1d. Async

import asyncio

async def main():
    response = await litellm.acompletion(
        model="blockrun/openai/gpt-5.5",
        messages=[{"role": "user", "content": "Hi"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())

Mode 2 — Local proxy (LiteLLM Proxy Server, langchain, raw curl, …)

If you're running the LiteLLM Proxy Server (litellm --config config.yaml), or any client that just speaks OpenAI HTTP, run our proxy as a sidecar.

2a. Start the proxy

export BLOCKRUN_WALLET_KEY=0xYOUR_KEY
blockrun-litellm-proxy --port 4001
# → uvicorn running at http://127.0.0.1:4001

Flags:

| Flag | Default | Purpose |
|---|---|---|
| --host | 127.0.0.1 | Bind interface. Keep loopback unless you set BLOCKRUN_PROXY_TOKEN. |
| --port | 4001 | Bind port |
| --api-url | https://blockrun.ai/api | Override the BlockRun gateway endpoint |
| --log-level | info | critical / error / warning / info / debug / trace |

Optional shared-secret guard:

export BLOCKRUN_PROXY_TOKEN=$(openssl rand -hex 32)
# clients must now send:  Authorization: Bearer $BLOCKRUN_PROXY_TOKEN
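
With the token set, clients authenticate with the standard Bearer header, e.g.:

curl -sS http://127.0.0.1:4001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $BLOCKRUN_PROXY_TOKEN" \
  -d '{"model":"nvidia/deepseek-v4-flash","messages":[{"role":"user","content":"Hi"}]}'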

2b. Point LiteLLM Proxy at it

Drop this into your config.yaml:

model_list:
  - model_name: gpt-5.5
    litellm_params:
      model: openai/openai/gpt-5.5   # first 'openai/' = LiteLLM provider; rest = BlockRun model id
      api_base: http://localhost:4001/v1
      api_key: "dummy"                # ignored if BLOCKRUN_PROXY_TOKEN is unset

  - model_name: claude-opus-4-5
    litellm_params:
      model: openai/anthropic/claude-opus-4-5
      api_base: http://localhost:4001/v1
      api_key: "dummy"

litellm_settings:
  drop_params: True   # silently drop OpenAI params BlockRun doesn't support

Run LiteLLM Proxy as usual:

litellm --config config.yaml --port 4000

Then call it like any OpenAI endpoint:

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

2c. Or skip LiteLLM entirely

The proxy speaks OpenAI HTTP, so anything that takes an api_base works:

# OpenAI Python SDK pointed straight at the BlockRun proxy
from openai import OpenAI

client = OpenAI(api_key="dummy", base_url="http://localhost:4001/v1")
resp = client.chat.completions.create(
    model="openai/gpt-5.5",
    messages=[{"role": "user", "content": "Hi"}],
)
print(resp.choices[0].message.content)

# Plain curl
curl http://localhost:4001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-5.5", "messages": [{"role":"user","content":"Hi"}]}'

2d. Endpoints exposed

| Method | Path | Notes |
|---|---|---|
| POST | /v1/chat/completions | OpenAI Chat Completions. stream=True returns text/event-stream; otherwise JSON. |
| GET | /v1/models | BlockRun model catalog |
| GET | /healthz | Liveness probe (no upstream call) |
| GET | /docs | Auto-generated Swagger UI |
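
A quick smoke test of the non-chat endpoints (responses omitted here):

curl -s http://127.0.0.1:4001/healthz
curl -s http://127.0.0.1:4001/v1/models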

Supported parameters

All of these are forwarded to BlockRun unchanged:

| OpenAI param | Supported | Notes |
|---|---|---|
| model | ✅ | Any BlockRun model id, e.g. openai/gpt-5.5 |
| messages | ✅ | Full role/content/tool_calls schema |
| max_tokens | ✅ | Defaults to 1024 if omitted |
| temperature | ✅ | 0–2 |
| top_p | ✅ | |
| tools / tool_choice | ✅ | Function calling |
| stream | ✅ | OpenAI-style SSE (text/event-stream). Provider mode yields LiteLLM GenericStreamingChunk objects; proxy mode emits data: <json>\n\n events terminated by data: [DONE]. Free models stream directly; paid models stream after the in-band 402-sign-retry dance. See the sketch after this table. |
| frequency_penalty / presence_penalty / logprobs / n | ⚠️ | Silently dropped — enable litellm_settings.drop_params: True to suppress LiteLLM warnings |
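
Provider-mode streaming, as a short sketch (LiteLLM re-wraps the chunks, so iteration looks like any other litellm stream; a free model is used so the run settles at $0):

import litellm
from blockrun_litellm import register

register()

stream = litellm.completion(
    model="blockrun/nvidia/deepseek-v4-flash",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)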

BlockRun-specific extras (also accepted):

| Param | Purpose |
|---|---|
| search: True | Enable xAI Live Search (for search-enabled models) |
| search_parameters: {...} | Full Live Search config |
| fallback_models: ["..."] | Auto-retry on transient upstream errors |
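
Through the sidecar these ride along as ordinary JSON fields. A sketch (model id and fallback list are illustrative):

curl -sS http://127.0.0.1:4001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.5",
    "messages": [{"role": "user", "content": "Hi"}],
    "fallback_models": ["anthropic/claude-opus-4-5"]
  }'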

Local request log (input/output tokens, latency, cost)

Opt-in JSONL logger captures every call — works on both Base and Solana, sync and async, streaming and non-streaming.

Where the log lives

| Source | Path |
|---|---|
| Explicit argument to enable_local_logging("...") | whatever you pass |
| BLOCKRUN_LITELLM_LOG env var | whatever it points to |
| Otherwise (default) | ~/.blockrun/litellm_calls.jsonl |
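
For example, to redirect the log without touching code (path illustrative):

export BLOCKRUN_LITELLM_LOG=/var/log/blockrun/calls.jsonl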

Each row contains

ts, iso, model, provider, messages, completion,
usage{prompt_tokens, completion_tokens, total_tokens},
latency_ms, stream, cost_usd, status, error_type, error_message, request_id

Mode 1 — one line

from blockrun_litellm import enable_local_logging
enable_local_logging()                       # default path
# or enable_local_logging("/var/log/calls.jsonl")
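
With logging enabled, a few lines tally spend and tokens from the JSONL (a sketch assuming the default path and the row fields listed above):

import json
from pathlib import Path

log = Path.home() / ".blockrun" / "litellm_calls.jsonl"
rows = [json.loads(line) for line in log.read_text().splitlines() if line.strip()]

total_cost = sum(r.get("cost_usd") or 0 for r in rows)
total_tokens = sum((r.get("usage") or {}).get("total_tokens", 0) for r in rows)
print(f"{len(rows)} calls, {total_tokens} tokens, ${total_cost:.4f}")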

Mode 2 — drop a bridge file next to config.yaml

# custom_callbacks.py
from blockrun_litellm.logger import JSONLLogger

blockrun_logger = JSONLLogger()

Then reference it in config.yaml:

litellm_settings:
  callbacks: ["custom_callbacks.blockrun_logger"]

Where everything is stored

| File / env var | What | Configurable? |
|---|---|---|
| BLOCKRUN_WALLET_KEY (env) | Base private key | yes |
| SOLANA_WALLET_KEY (env) | Solana private key | yes |
| ~/.blockrun/.session | Auto-created Base wallet | |
| ~/.blockrun/.solana-session | Auto-created Solana wallet | |
| ~/.blockrun/litellm_calls.jsonl | LiteLLM request log | BLOCKRUN_LITELLM_LOG env or enable_local_logging(path) |
| ~/.blockrun/cost_log.jsonl | USDC cost audit for paid calls (SDK) | |
| ~/.blockrun/data/*.json | Full request/response archive for paid calls (SDK) | |
| BLOCKRUN_PROXY_TOKEN (env) | Optional shared-secret guard on the sidecar | yes |

Examples

The examples/ directory has copy-paste-ready snippets.


How it works (under the hood)

┌─────────────────┐    OpenAI dict     ┌──────────────────────┐    POST /v1/chat/completions  ┌────────────────┐
│ Your app /      │ ─────────────────▶ │  blockrun-litellm    │ ────────────────────────────▶ │  blockrun.ai   │
│ LiteLLM /       │                    │  (provider OR proxy) │ ◀──── 402 + payment-required ─│  gateway       │
│ OpenAI SDK      │                    │  ↓                   │                               │                │
└─────────────────┘                    │  blockrun-llm SDK    │ ───── EIP-712 signed retry ──▶│                │
                                       │  (local signing)     │ ◀──── 200 + chat response ────│                │
                                       └──────────────────────┘                               └────────────────┘
                                                ▲
                                                │ private key (stays local, signs only)
                                       ┌──────────────────────┐
                                       │ BLOCKRUN_WALLET_KEY  │
                                       │   or ~/.blockrun/    │
                                       └──────────────────────┘
  1. Caller sends an OpenAI Chat Completions dict.
  2. blockrun-litellm whitelists the params and dispatches through blockrun-llm.
  3. blockrun-llm posts to BlockRun, receives a 402 with payment requirements, signs an EIP-712 payment locally with your wallet, and retries (sketched below).
  4. BlockRun verifies the signature on-chain, settles the USDC micropayment, runs the inference, and returns the response.
  5. blockrun-litellm returns the dumped pydantic model as a plain OpenAI dict (or litellm.ModelResponse in provider mode).
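
The handshake in step 3, reduced to a hypothetical sketch. The real signing, header names, and payload shapes live inside blockrun-llm; everything below is a simplified illustration, not the SDK's actual API:

import httpx

API = "https://blockrun.ai/api/v1/chat/completions"

def call_with_x402(payload: dict, sign) -> dict:
    """sign() stands in for the SDK's local EIP-712 signer (hypothetical helper)."""
    r = httpx.post(API, json=payload)
    if r.status_code == 402:                 # gateway quotes its payment requirements
        signature = sign(r.json())           # signed locally; the private key is never sent
        r = httpx.post(API, json=payload,
                       headers={"PAYMENT-SIGNATURE": signature})  # header name simplified, per the FAQ's description
    r.raise_for_status()
    return r.json()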

FAQ

Q: Does this support streaming? Yes, as of v0.2.0. Pass stream=True and the adapter routes through blockrun-llm's chat_completion_stream() (SDK ≥ 0.20.0). The 402 → sign-locally → retry-with-PAYMENT-SIGNATURE dance happens before the first chunk; once the upstream switches to text/event-stream, chunks are forwarded straight through (provider mode → litellm.GenericStreamingChunk, proxy mode → OpenAI-style data: <json>\n\n SSE). Caveats inherited from the gateway: search_parameters and the Responses-API models (codex, gpt-5.4-pro) reject streaming server-side with 400.

Q: Where does my private key live? On your machine only — BLOCKRUN_WALLET_KEY env var, or ~/.blockrun/.session if you used setup_agent_wallet(). The proxy and provider both read from those sources via blockrun-llm. Only EIP-712 signatures are transmitted.

Q: How do I switch between Base and Solana? Install the [solana] extra, set SOLANA_WALLET_KEY, and pass api_base="https://sol.blockrun.ai/api" (plus the Solana key as api_key in provider mode, or point the sidecar's --api-url at the Solana gateway); the adapter detects the chain from the URL and routes through blockrun-llm's SolanaLLMClient. The default gateway https://blockrun.ai/api settles USDC on Base.

Q: Can I run the proxy in Docker / k8s? Yes — it's a vanilla FastAPI app. Pass the wallet key via secret (env var), bind to 0.0.0.0 only inside a private network, and set BLOCKRUN_PROXY_TOKEN for an additional auth layer.

Q: Is this affiliated with LiteLLM (BerriAI)? No — this is an independent adapter built by the BlockRun team. LiteLLM is a great project; we're just plugging into its custom-provider hooks.


Development

git clone https://github.com/BlockRunAI/blockrun-litellm
cd blockrun-litellm
pip install -e '.[proxy,dev]'
pytest

License

MIT. See LICENSE.


中文文档 (Chinese documentation)

BlockRun's LiteLLM adapter: call AI models on BlockRun through LiteLLM with zero changes to your code.

In one sentence: BlockRun's /v1/chat/completions is already OpenAI-compatible at the protocol level; the only difference is authentication. BlockRun uses x402 wallet signatures (per-request, non-custodial USDC micropayments) instead of a Bearer API key. This package fills exactly that gap.

Two integration modes

| Mode | Best for | What it looks like |
|---|---|---|
| 1. Custom provider (in-process) | Apps using the LiteLLM Python library | litellm.completion(model="blockrun/openai/gpt-5.5", ...) |
| 2. Local proxy (sidecar) | Apps using the LiteLLM Proxy Server, or any OpenAI client | api_base="http://localhost:4001/v1" |

Both modes run signing and x402 payment through the blockrun-llm SDK, so they behave identically. Pick whichever fits your deployment.

Quick start

Install

# Custom provider only
pip install blockrun-litellm

# Plus the local proxy (FastAPI/uvicorn)
pip install 'blockrun-litellm[proxy]'

Configure your wallet (one-time)

# Option A — environment variable (recommended for servers)
export BLOCKRUN_WALLET_KEY=0xYOUR_BASE_CHAIN_PRIVATE_KEY

# Option B — auto-create a wallet and fund it by QR code (interactive)
python -c "from blockrun_llm import setup_agent_wallet; setup_agent_wallet()"

The private key is only used locally for EIP-712 signing and never leaves your machine.

💡 Want a zero-cost dry run? Use the free model nvidia/deepseek-v4-flash: identical code, identical wallet flow, $0 settlement.

Mode 1: Custom provider

import litellm
from blockrun_litellm import register

register()  # call once at startup

response = litellm.completion(
    model="blockrun/openai/gpt-5.5",   # blockrun/<provider>/<model>
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=128,
)
print(response.choices[0].message.content)

Async: await litellm.acompletion(...) works the same way.

Mode 2: Local proxy

# 1) Start the sidecar
export BLOCKRUN_WALLET_KEY=0xYOUR_KEY
blockrun-litellm-proxy --port 4001

# 2) LiteLLM Proxy config (config.yaml)
model_list:
  - model_name: gpt-5.5
    litellm_params:
      model: openai/openai/gpt-5.5
      api_base: http://localhost:4001/v1
      api_key: "dummy"

litellm_settings:
  drop_params: True

Or point any OpenAI client at it directly:

from openai import OpenAI
client = OpenAI(api_key="dummy", base_url="http://localhost:4001/v1")
resp = client.chat.completions.create(
    model="openai/gpt-5.5",
    messages=[{"role": "user", "content": "Hello"}],
)

Supported parameters

| OpenAI param | Supported | Notes |
|---|---|---|
| model / messages / max_tokens / temperature / top_p | ✅ | |
| tools / tool_choice | ✅ | Function calling |
| stream | ✅ | OpenAI-standard SSE (text/event-stream). Provider mode yields LiteLLM GenericStreamingChunk; proxy mode emits data: <json>\n\n events ending with data: [DONE]. Free models stream immediately; paid models stream after the in-band 402 → sign → retry handshake. |
| frequency_penalty / presence_penalty / logprobs / n | ⚠️ | Silently dropped; set drop_params: True in LiteLLM to suppress the warnings |

BlockRun extras:

| Param | Purpose |
|---|---|
| search: True | Enable xAI Live Search (for search-enabled models) |
| search_parameters: {...} | Full Live Search config |
| fallback_models: ["..."] | Auto-retry list for transient upstream errors |

FAQ

Q: Is streaming supported? Fully, since v0.2.0. With stream=True the adapter goes through blockrun-llm's chat_completion_stream() (SDK ≥ 0.20.0); the 402 → local signing → retry-with-PAYMENT-SIGNATURE chain completes before the first chunk, and once the upstream switches to text/event-stream the chunks pass straight through (provider mode → litellm.GenericStreamingChunk, proxy mode → OpenAI-standard data: <json>\n\n). Limits inherited from the backend: search_parameters and the Responses-API models (codex, gpt-5.4-pro) reject streaming server-side (400).

Q: Where does my private key live? Locally only: the BLOCKRUN_WALLET_KEY environment variable, or the ~/.blockrun/.session created by setup_agent_wallet(). Both the provider and the proxy read it via blockrun-llm. The chain only ever sees signatures, never the key.

Q: Docker / k8s deployment? The proxy is a plain FastAPI app. Inject the key as a secret, expose it only inside a private network, and optionally set BLOCKRUN_PROXY_TOKEN for an extra Bearer auth layer.

Q: What is the relationship with BerriAI? None. This is an adapter independently maintained by the BlockRun team, hooked into LiteLLM's custom provider interface.

Development

git clone https://github.com/BlockRunAI/blockrun-litellm
cd blockrun-litellm
pip install -e '.[proxy,dev]'
pytest

License

MIT
