A Model Context Protocol (MCP) server that lets AI assistants transcribe audio using FunASR.
    pip install funasr

Claude Code (~/.claude.json):

    {
      "mcpServers": {
        "funasr": {
          "command": "python",
          "args": ["/path/to/examples/mcp_server/funasr_mcp.py"],
          "env": {"FUNASR_DEVICE": "cuda"}
        }
      }
    }

Claude Desktop (claude_desktop_config.json):

    {
      "mcpServers": {
        "funasr": {
          "command": "python",
          "args": ["/path/to/funasr_mcp.py"],
          "env": {"FUNASR_DEVICE": "cpu"}
        }
      }
    }

Cursor (Settings → MCP Servers → Add):
- Command: `python /path/to/funasr_mcp.py`
- Environment: `FUNASR_DEVICE=cuda`

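The JSON-based client configurations share one shape: a `command`, the script path in `args`, and a `FUNASR_DEVICE` entry in `env`. As a minimal sketch, the Python helper below adds such an entry to a JSON config file while leaving any other settings and servers in that file untouched. The name `add_funasr_server` and its parameters are hypothetical and are not part of FunASR or of any MCP client; the script path is the same placeholder used in the configs.

```python
import json
from pathlib import Path


def add_funasr_server(config_path, script_path, device="cpu"):
    """Add or update a "funasr" entry under "mcpServers" in a JSON config file.

    Hypothetical helper for illustration; not part of FunASR or any MCP client.
    """
    path = Path(config_path).expanduser()
    # Start from the existing config (if any) so other settings are preserved.
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["funasr"] = {
        "command": "python",
        "args": [str(script_path)],
        "env": {"FUNASR_DEVICE": device},
    }
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `add_funasr_server("~/.claude.json", "/path/to/funasr_mcp.py", device="cuda")` would write an entry of the shape shown for Claude Code. Back up the config file first, since the helper rewrites it in place.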
Transcribe a speech audio file to text.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| `audio_path` | string | Yes | Path to audio file (wav, mp3, flac, m4a, ogg) |
| `language` | string | No | Language hint (auto-detected by default) |
Returns: Transcribed text with timestamps and speaker labels (when available).
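
Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request from the two parameters documented above and rejects file extensions outside the listed formats. The tool name `transcribe_audio` is an assumption (the name the server actually advertises can be checked via `tools/list`), and `build_transcribe_request` is a hypothetical helper, not part of FunASR.

```python
import json
from pathlib import Path

# Extensions listed in the parameter description above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}


def build_transcribe_request(audio_path, language=None, request_id=1,
                             tool_name="transcribe_audio"):
    """Build a JSON-RPC 2.0 "tools/call" request for the transcription tool.

    tool_name is an assumed placeholder; the real name comes from the server.
    """
    suffix = Path(audio_path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {suffix or '(none)'}")
    arguments = {"audio_path": str(audio_path)}
    # language is optional; omit it to let the server auto-detect.
    if language is not None:
        arguments["language"] = language
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


print(json.dumps(build_transcribe_request("/path/to/interview.mp3"), indent=2))
```

In normal use the AI assistant constructs this request itself; the sketch is mainly useful for debugging the server outside of a client.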
Once configured, ask your AI assistant:
- "Transcribe the meeting recording at ~/Downloads/meeting.wav"
- "What was said in this audio file? /path/to/interview.mp3"
- "Convert this voice memo to text: ~/voice_note.m4a"
| Variable | Default | Description |
|---|---|---|
| `FUNASR_DEVICE` | `cpu` | Device: cuda, cpu, or mps |
| `FUNASR_MODEL` | `iic/SenseVoiceSmall` | ASR model to use |
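
To illustrate how these variables and their defaults fit together, here is a minimal sketch of reading them in Python. It is not taken from `funasr_mcp.py`: the function name `load_settings` is hypothetical, and the strict check against the three listed device names is an assumption (the actual server may accept other values).

```python
import os

# Device names listed in the table above.
VALID_DEVICES = {"cuda", "cpu", "mps"}


def load_settings(environ=None):
    """Read FUNASR_DEVICE and FUNASR_MODEL, applying the documented defaults.

    Hypothetical helper; the strict device validation is an assumption.
    """
    env = os.environ if environ is None else environ
    device = env.get("FUNASR_DEVICE", "cpu").strip().lower()
    if device not in VALID_DEVICES:
        raise ValueError(
            f"FUNASR_DEVICE must be one of {sorted(VALID_DEVICES)}, got {device!r}"
        )
    model = env.get("FUNASR_MODEL", "iic/SenseVoiceSmall")
    return {"device": device, "model": model}


print(load_settings())
```

Passing a plain dictionary instead of relying on `os.environ` makes the defaults easy to check in isolation, for example `load_settings({"FUNASR_DEVICE": "cuda"})`.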
- 50+ languages with automatic detection
- Speaker diarization — identifies who said what
- Timestamps — per-segment timing
- 170x realtime on GPU, 17x on CPU
- No API key needed — fully local inference
- MIT licensed, privacy-friendly (audio never leaves your machine)
| Tool | Status |
|---|---|
| Claude Code | ✅ Tested |
| Claude Desktop | ✅ Compatible |
| Cursor | ✅ Compatible |
| Windsurf | ✅ Compatible |
| Any MCP client | ✅ Standard protocol |