This project is a simple implementation of live captioning and translation using Azure Cognitive Services. The project uses the Speech SDK to capture audio from the microphone, send it to the Azure Speech Service for live transcription and translation, and then display the transcribed and translated text in the browser.
Parts of the code are adapted from the sample repository for the Microsoft Cognitive Services Speech SDK.
Demo Video:
Azure.Live.Speech.Translation.Demo.mov
Requirements:

- Azure Speech Service subscription key and region
- Python 3.12 or later
- uv
Features:

- Live captioning
- Live translation
- Auto-detect the speaker's language with continuous language identification
- OBS integration
- Mobile mode
- TV mode
Installation:

1. Clone the repository.

   If you already know how to clone a repository, you can skip to the next step.

2. Install uv.

   Use uv to set up the required version of Python.

   For macOS / Linux users:

   ```sh
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```

   For Windows users:

   ```powershell
   powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
   ```

3. Set up Python and install the dependencies:

   ```sh
   uv sync
   ```
4. Set the Azure Speech Service subscription key and region in the `.env` file:

   ```
   AZURE_SPEECH_KEY=<your-subscription-key>
   AZURE_SPEECH_REGION=<your-region>
   ```
5. Set the list of candidate languages and target languages for translation in the `main.py` file:

   ```python
   config = {
       ...
       "detect_languages": ["en-US", "zh-TW", "ja-JP"],
       "target_languages": ["zh-Hant", "en"],
       ...
   }
   ```

   Please note that the target languages must be among the languages supported by the Azure Speech Service. You can find the list of supported languages here.

   Note: If you want to transcribe speech only, set `"target_languages"` to `[]`.

   A sketch of how these settings typically map onto the Speech SDK is shown below.
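For orientation only, here is a minimal, self-contained sketch of how candidate languages, target languages, and continuous language identification typically map onto the Azure Speech SDK. This is not this project's code: only the environment variable names and the example language lists come from this README; the rest assumes the `azure-cognitiveservices-speech` package and its translation APIs.

```python
import os
import azure.cognitiveservices.speech as speechsdk

# Credentials come from the same variables configured in the .env file above.
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription=os.environ["AZURE_SPEECH_KEY"],
    region=os.environ["AZURE_SPEECH_REGION"],
)

# Roughly what "target_languages" controls: the languages to translate into.
for lang in ["zh-Hant", "en"]:
    translation_config.add_target_language(lang)

# Roughly what "detect_languages" controls: the candidate source languages.
auto_detect = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
    languages=["en-US", "zh-TW", "ja-JP"]
)

# Request continuous (rather than at-start) language identification.
# Per the Azure docs, continuous LID may additionally require the v2 endpoint.
translation_config.set_property(
    speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous"
)

recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config,
    auto_detect_source_language_config=auto_detect,
    audio_config=speechsdk.audio.AudioConfig(use_default_microphone=True),
)

# Print each final recognition result together with its translations.
recognizer.recognized.connect(
    lambda evt: print(evt.result.text, dict(evt.result.translations))
)

recognizer.start_continuous_recognition()
input("Listening... press Enter to stop\n")
recognizer.stop_continuous_recognition()
```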
Run the following command to start the application:

```sh
uv run --env-file=.env python main.py
```

Then open your browser and go to http://127.0.0.1:3000/ to see the live caption and translation. You can also open the page as a browser source in a broadcasting application such as OBS to show the live caption and translation in your live stream.
You can select the displayed languages by adding a query string such as ?language=original,en to the URL. The first language is shown at the bottom and the second language at the top.
Note that the application picks up your system's default microphone.
You can also use the mobile mode by accessing http://127.0.0.1:3000/mobile. You can select the speaker's original language or a target language from the dropdown in the top-left corner.
You can also use the TV mode by accessing http://127.0.0.1:3000/tv. In this mode, the text is displayed in a larger font size on a black background, and all selected languages are shown in the same block. As with the other views, you can select languages with a query string such as ?language=original,en in the URL.
You can also run the application in a client-server architecture. In this case, the translation service runs on your local machine, the client pages are hosted on an external server, and the service sends the live caption and translation to the clients through an external Socket.IO server, as sketched after the diagram below.
```mermaid
graph LR
    B[Live Translation]
    C[External Socket.IO/Web Server]
    B --> |emit| C
    subgraph Clients
        D[TV]
        E[Mobile]
        F[OBS]
    end
    C --> |broadcast| D
    C --> |broadcast| E
    C --> |broadcast| F
```
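To make the emit arrow above concrete, here is a hedged sketch of how a translation service could push captions to the external Socket.IO server. It uses the `python-socketio` client package; the `caption` event name and the payload shape are illustrative assumptions, not this project's actual protocol.

```python
import socketio

# Connect to the external Socket.IO server that the web clients also use.
sio = socketio.Client()
sio.connect("http://127.0.0.1:3000", socketio_path="/socket.io")

def publish_caption(original: str, translations: dict) -> None:
    # Hypothetical "caption" event: forward the latest recognition result
    # to the room that the TV / Mobile / OBS pages are listening in.
    sio.emit("caption", {
        "room": "9d2b8c9b-6ae9-45e9-81be-8f3d4d549fdd",  # matches "roomid" in main.py
        "original": original,
        "translations": translations,
    })

publish_caption("Hello everyone", {"zh-Hant": "大家好"})
```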
1. Set the remote Socket.IO server endpoint and path in the `main.py` file, and configure the room ID that will receive the caption and translation from the translation service (a sketch of a client listening in this room follows after these steps):

   ```python
   config = {
       ...
       "socketio": {"endpoint": "http://127.0.0.1:3000", "path": "/socket.io"},
       "roomid": "9d2b8c9b-6ae9-45e9-81be-8f3d4d549fdd",
       ...
   }
   ```
2. Build the client with the server URL and host it on the external server. The files will be generated in the `build` folder:

   ```sh
   uv run --env-file=.env python main.py --build
   ```
3. Start the translation service without the local web server:

   ```sh
   uv run --env-file=.env python main.py --disable-server
   ```
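For illustration of the broadcast arrows in the diagram, a custom consumer of the captions could look like the sketch below. This is again a hedged example using `python-socketio`; the `join` and `caption` event names are assumptions rather than the project's documented protocol, and the real consumers are the TV, Mobile, and OBS pages built above.

```python
import socketio

sio = socketio.Client()

@sio.event
def connect():
    # Hypothetical "join" event: subscribe to the room configured as
    # "roomid" in main.py so the server broadcasts captions to us.
    sio.emit("join", {"room": "9d2b8c9b-6ae9-45e9-81be-8f3d4d549fdd"})

@sio.on("caption")
def on_caption(data):
    # Hypothetical "caption" event carrying the original text and translations.
    print(data)

# Use the same endpoint and path configured under "socketio" in main.py.
sio.connect("http://127.0.0.1:3000", socketio_path="/socket.io")
sio.wait()
```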


