4 changes: 4 additions & 0 deletions Web/AmbientScribe/.env.example
@@ -0,0 +1,4 @@
CORTI_TENANT_NAME=your_tenant_name_here
CORTI_CLIENT_ID=your_client_id_here
CORTI_CLIENT_SECRET=your_client_secret_here
PORT=3000
6 changes: 6 additions & 0 deletions Web/AmbientScribe/.gitignore
@@ -0,0 +1,6 @@
node_modules/
dist/
.env
.env.local
*.log
.DS_Store
330 changes: 189 additions & 141 deletions Web/AmbientScribe/README.md
@@ -1,189 +1,237 @@
# Corti AI Platform – Live Transcription & Fact-Based Documentation

A single demo app using the [`@corti/sdk`](https://www.npmjs.com/package/@corti/sdk) for **live audio transcription**, **fact extraction**, and **clinical document generation**. Toggle between two modes from the UI:

- **Single Microphone** – one audio source with automatic speaker diarization.
- **Virtual Consultation** – local microphone (doctor) + remote audio (patient) merged into a multi-channel stream. The remote audio can come from either a **WebRTC peer connection** or **screen/tab capture** (`getDisplayMedia`).

After a consultation ends, generate a structured clinical document from the extracted facts with a single click.

The demo is split into **server** (auth, interaction management, document generation) and **client** (audio capture, streaming, event display, document creation).

---

## Quick Start

**Prerequisites:** Node.js 18+

**Setup (3 steps):**

```bash
cp .env.example .env
# Edit .env with your Corti credentials (CORTI_TENANT_NAME, CORTI_CLIENT_ID, CORTI_CLIENT_SECRET)

npm install
npm run dev
```

Open http://localhost:3000 in your browser. Transcript and fact events appear in the browser console.

---

## Installation (Manual)

If you are setting up the project from scratch, without the provided `package.json` scripts, install the dependencies directly:

```bash
npm i @corti/sdk express
npm i -D typescript ts-node @types/express @types/node
```

---

## File Structure

```
AmbientScribe/
server.ts # Server-side: OAuth2 auth, interaction creation, scoped token, document generation
client.ts # Client-side: stream connection, audio capture, event handling, document creation
audio.ts # Audio utilities: getMicrophoneStream(), getRemoteParticipantStream(), getDisplayMediaStream(), mergeMediaStreams()
index.html # Minimal UI with mode toggle, consultation controls, and document output
README.md
```

---

## Server (`server.ts`)

Runs on your backend. Responsible for:

1. **Creating a `CortiClient`** with OAuth2 client credentials (never exposed to the browser).
2. **Creating an interaction** via the REST API.
3. **Minting a scoped stream token** (only grants WebSocket streaming access).
4. **Generating a clinical document** from the facts collected during a consultation.

```ts
import { CortiClient, CortiAuth, CortiEnvironment } from "@corti/sdk";

// Full-privilege client — server-side only
const client = new CortiClient({
environment: CortiEnvironment.Eu,
tenantName: "YOUR_TENANT_NAME",
auth: { clientId: "YOUR_CLIENT_ID", clientSecret: "YOUR_CLIENT_SECRET" },
});

// Create an interaction
const interaction = await client.interactions.create({
encounter: { identifier: randomUUID(), status: "planned", type: "first_consultation" },
});

// Mint a token scoped to streaming only
const auth = new CortiAuth({ environment: CortiEnvironment.Eu, tenantName: "YOUR_TENANT_NAME" });
const streamToken = await auth.getToken({
clientId: "YOUR_CLIENT_ID",
clientSecret: "YOUR_CLIENT_SECRET",
scopes: ["stream"],
});

// Send interaction.id + streamToken.accessToken to the client
```

### Document Generation

After a consultation ends, the server fetches the extracted facts and generates a structured clinical document:

```ts
// 1. Fetch facts collected during the consultation
const facts = await client.facts.list(interactionId);

// 2. Create a document from the facts
const document = await client.documents.create(interactionId, {
context: [
{
type: "facts",
data: facts.map((fact) => ({
text: fact.text,
group: fact.group,
source: fact.source,
})),
},
],
template: {
sections: [
{ key: "corti-hpi" },
{ key: "corti-allergies" },
{ key: "corti-social-history" },
{ key: "corti-plan" },
],
},
outputLanguage: "en",
name: "Consultation Document",
documentationMode: "routed_parallel",
});
```

---

## Audio Utilities (`audio.ts`)

Three methods for obtaining audio streams, plus a merge utility:

```ts
// 1. Local microphone
const micStream = await getMicrophoneStream();

// 2a. Remote participant from a WebRTC peer connection
const remoteStream = getRemoteParticipantStream(peerConnection);

// 2b. OR: screen / tab capture (alternative when you don't control the peer connection,
// e.g. the video-call app runs in another browser tab)
// const remoteStream = await getDisplayMediaStream();

// 3. Merge into a single multi-channel stream (virtual consultation mode)
const { stream, endStream } = mergeMediaStreams([micStream, remoteStream]);
```
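Under the hood, the merge utility can be built on the Web Audio API: each input stream feeds one input of a `ChannelMergerNode`, producing a single multi-channel `MediaStream`. A sketch of `audio.ts` along these lines (`getRemoteParticipantStream()` and `getDisplayMediaStream()` omitted for brevity):

```typescript
// getMicrophoneStream() wraps getUserMedia; mergeMediaStreams() routes each
// input stream to its own channel of a ChannelMergerNode, producing one
// multi-channel MediaStream (index 0 = doctor, index 1 = patient).

export const getMicrophoneStream = async (): Promise<MediaStream> =>
  navigator.mediaDevices.getUserMedia({ audio: true });

export const mergeMediaStreams = (
  mediaStreams: MediaStream[],
): { stream: MediaStream; endStream: () => void } => {
  const audioContext = new AudioContext();
  const destination = audioContext.createMediaStreamDestination();
  const merger = audioContext.createChannelMerger(mediaStreams.length);

  // Each input stream becomes one channel of the merged output.
  mediaStreams.forEach((stream, index) => {
    audioContext.createMediaStreamSource(stream).connect(merger, 0, index);
  });

  merger.connect(destination);

  return {
    stream: destination.stream,
    endStream: () => {
      // Stop the merged tracks and release the AudioContext.
      destination.stream.getAudioTracks().forEach((track) => track.stop());
      audioContext.close();
    },
  };
};
```

Calling `endStream()` both stops the merged tracks and closes the `AudioContext`, releasing the audio hardware when the consultation ends.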

---

## Client (`client.ts`)

Receives the scoped token + interaction ID from the server, then:

1. Creates a `CortiClient` with the stream-scoped token.
2. Connects via `client.stream.connect()`.
3. Acquires audio — just the mic in single mode, or mic + remote merged in virtual mode.
4. Streams audio in 200 ms chunks via `MediaRecorder`.
5. Logs transcript and fact events to the console.

```ts
const client = new CortiClient({
environment: CortiEnvironment.Eu,
tenantName: "YOUR_TENANT_NAME",
auth: { accessToken }, // stream scope only
});

const streamSocket = await client.stream.connect({ id: interactionId });

// With a stream-scoped token, only streaming works:
// await client.interactions.list(); // Error — outside scope
// await client.transcribe.connect(); // Error — outside scope
```

### Single Microphone Mode

```ts
const microphoneStream = await getMicrophoneStream();
const mediaRecorder = new MediaRecorder(microphoneStream);
mediaRecorder.ondataavailable = (e) => streamSocket.send(e.data);
mediaRecorder.start(200);
```

### Virtual Consultation Mode

The remote audio source is selected from the UI — either a WebRTC peer connection or screen/tab capture:

```ts
const microphoneStream = await getMicrophoneStream();

// Option A: WebRTC
const remoteStream = getRemoteParticipantStream(peerConnection);

// Option B: Screen / tab capture (getDisplayMedia)
// const remoteStream = await getDisplayMediaStream();

// channel 0 = doctor, channel 1 = patient
const { stream, endStream } = mergeMediaStreams([microphoneStream, remoteStream]);

const mediaRecorder = new MediaRecorder(stream);
mediaRecorder.ondataavailable = (e) => streamSocket.send(e.data);
mediaRecorder.start(200);
```

### Event Handling

```ts
streamSocket.on("transcript", (data) => console.log("Transcript:", data));
streamSocket.on("fact", (data) => console.log("Fact:", data));
```

---

## UI (`index.html`)

A minimal page with:

- Radio buttons to toggle between **Single Microphone** and **Virtual Consultation** mode.
- When **Virtual Consultation** is selected, a second radio group appears to choose between **WebRTC** and **Screen / tab capture** as the remote audio source.
- **Start Consultation** / **End Consultation** buttons to control the streaming session.
- **Create Document** button — enabled after a consultation ends. Calls the server to fetch facts and generate a clinical document, then displays the result on the page.
- Transcript and fact events are logged to the browser console.
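The glue between those controls and `client.ts` can be sketched as follows. The element IDs (`#start`, `#end`, `#create-doc`, `#document-output`) and the `startConsultation`/`createDocument` function names are illustrative, not the demo's actual markup or API:

```typescript
// Hypothetical wiring of the UI controls to the streaming session.
// startConsultation()/createDocument() stand in for the client.ts logic
// described above; element IDs are illustrative.
type Session = { end: () => void };

export const wireUi = (
  startConsultation: (mode: "single" | "virtual") => Promise<Session>,
  createDocument: () => Promise<string>,
) => {
  let session: Session | null = null;

  document.querySelector("#start")?.addEventListener("click", async () => {
    // Read the selected mode from the radio group.
    const mode = (document.querySelector<HTMLInputElement>(
      "input[name=mode]:checked",
    )?.value ?? "single") as "single" | "virtual";
    session = await startConsultation(mode);
  });

  document.querySelector("#end")?.addEventListener("click", () => {
    session?.end();
    session = null;
    // "Create Document" becomes available once the consultation has ended.
    document.querySelector("#create-doc")?.removeAttribute("disabled");
  });

  document.querySelector("#create-doc")?.addEventListener("click", async () => {
    const doc = await createDocument();
    document.querySelector("#document-output")!.textContent = doc;
  });
};
```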

---

## Production Build

For production deployment, compile and run the server:

```bash
npm run build # Compile TypeScript to dist/
npm start # Run compiled server
```

---

## Resources

- [`@corti/sdk` on npm](https://www.npmjs.com/package/@corti/sdk)
- [Corti API documentation](https://docs.corti.ai)