diff --git a/Web/AmbientScribe/.env.example b/Web/AmbientScribe/.env.example new file mode 100644 index 0000000..e4835a7 --- /dev/null +++ b/Web/AmbientScribe/.env.example @@ -0,0 +1,4 @@ +CORTI_TENANT_NAME=your_tenant_name_here +CORTI_CLIENT_ID=your_client_id_here +CORTI_CLIENT_SECRET=your_client_secret_here +PORT=3000 diff --git a/Web/AmbientScribe/.gitignore b/Web/AmbientScribe/.gitignore new file mode 100644 index 0000000..94362eb --- /dev/null +++ b/Web/AmbientScribe/.gitignore @@ -0,0 +1,6 @@ +node_modules/ +dist/ +.env +.env.local +*.log +.DS_Store diff --git a/Web/AmbientScribe/README.md b/Web/AmbientScribe/README.md index 3423131..cac0b67 100644 --- a/Web/AmbientScribe/README.md +++ b/Web/AmbientScribe/README.md @@ -1,189 +1,237 @@ -# Corti AI Platform – Live Transcription & Fact-Based Documentation +# Corti AI Platform – Live Transcription & Fact-Based Documentation -This README provides a guide on using the **Corti AI Platform** WebSocket API for **live audio transcription** and **fact-based documentation**. It includes two approaches: -1. **Single audio stream** – Capturing audio from a single microphone. -2. **Dual-channel merged streams** – Combining a **local microphone** and a **WebRTC stream** for doctor-patient scenarios. +A single demo app using the [`@corti/sdk`](https://www.npmjs.com/package/@corti/sdk) for **live audio transcription**, **fact extraction**, and **clinical document generation**. Toggle between two modes from the UI: + +- **Single Microphone** – one audio source with automatic speaker diarization. +- **Virtual Consultation** – local microphone (doctor) + remote audio (patient) merged into a multi-channel stream. The remote audio can come from either a **WebRTC peer connection** or **screen/tab capture** (`getDisplayMedia`). + +After a consultation ends, generate a structured clinical document from the extracted facts with a single click. 
+ +The demo is split into **server** (auth, interaction management, document generation) and **client** (audio capture, streaming, event display, document creation). --- -## **1. Overview of Configurations** ## Quick Start -### **Single Stream (Diarization Mode)** -This setup uses **one audio source** and **speaker diarization** to distinguish multiple speakers in the same channel automatically. **Prerequisites:** Node.js 18+ -```ts -const DEFAULT_CONFIG: Config = { - type: "config", - configuration: { - transcription: { - primaryLanguage: "en", - isDiarization: true, // AI automatically differentiates speakers - isMultichannel: false, - participants: [ - { - channel: 0, - role: "multiple", - }, - ], - }, - mode: { type: "facts", outputLocale: "en" }, - }, -}; **Setup (3 steps):** + +```bash +cp .env.example .env +# Edit .env with your Corti credentials (CORTI_TENANT_NAME, CORTI_CLIENT_ID, CORTI_CLIENT_SECRET) + +npm install +npm run dev ``` -### **Dual-Channel (Explicit Roles: Doctor & Patient)** -This setup **merges two separate audio streams** (e.g., a local microphone and a WebRTC stream). Instead of diarization, each stream is assigned a **fixed role** (Doctor or Patient). Open http://localhost:3000 in your browser. Transcript and fact events appear in the browser console. -```ts -const DEFAULT_CONFIG: Config = { - type: "config", - configuration: { - transcription: { - primaryLanguage: "en", - isDiarization: false, // No automatic speaker detection - isMultichannel: false, - participants: [ - { channel: 0, role: "doctor" }, - { channel: 0, role: "patient" }, - ], - }, - mode: { type: "facts", outputLocale: "en" }, - }, -}; + +--- + +## Installation (Manual) + +If setting up the project from scratch (without the provided `package.json` scripts), install the dependencies manually: + +```bash +npm i @corti/sdk express +npm i -D typescript ts-node @types/express @types/node +``` + --- -## **2. Capturing Audio Streams** ## File Structure -### **Single Microphone Access** -Retrieves and returns a **MediaStream** from the user's microphone. 
-```ts -const microphoneStream = await getMicrophoneStream(); ``` +AmbientScribe/ + server.ts # Server-side: OAuth2 auth, interaction creation, scoped token, document generation + client.ts # Client-side: stream connection, audio capture, event handling, document creation + audio.ts # Audio utilities: getMicrophoneStream(), getRemoteParticipantStream(), getDisplayMediaStream(), mergeMediaStreams() + index.html # Minimal UI with mode toggle, consultation controls, and document output + README.md +``` + +--- + +## Server (`server.ts`) + +Runs on your backend. Responsible for: + +1. **Creating a `CortiClient`** with OAuth2 client credentials (never exposed to the browser). +2. **Creating an interaction** via the REST API. +3. **Minting a scoped stream token** (only grants WebSocket streaming access). +4. **Generating a clinical document** from the facts collected during a consultation. -### **Merging Two Streams (Microphone + WebRTC)** -For doctor-patient conversations, we merge two separate audio sources. 
```ts -const { stream, endStream } = mergeMediaStreams([microphoneStream, webRTCStream]); +import { CortiClient, CortiAuth, CortiEnvironment } from "@corti/sdk"; + +// Full-privilege client — server-side only +const client = new CortiClient({ + environment: CortiEnvironment.Eu, + tenantName: "YOUR_TENANT_NAME", + auth: { clientId: "YOUR_CLIENT_ID", clientSecret: "YOUR_CLIENT_SECRET" }, +}); + +// Create an interaction +const interaction = await client.interactions.create({ + encounter: { identifier: randomUUID(), status: "planned", type: "first_consultation" }, +}); + +// Mint a token scoped to streaming only +const auth = new CortiAuth({ environment: CortiEnvironment.Eu, tenantName: "YOUR_TENANT_NAME" }); +const streamToken = await auth.getToken({ + clientId: "YOUR_CLIENT_ID", + clientSecret: "YOUR_CLIENT_SECRET", + scopes: ["stream"], +}); + +// Send interaction.id + streamToken.accessToken to the client ``` -**How Merging Works:** -- **Each stream is treated as a separate channel** -- **WebRTC provides the remote participant's audio** -- **The local microphone captures the speaker on-site** -- **The merged stream is sent to Corti’s API** +### Document Generation + +After a consultation ends, the server fetches the extracted facts and generates a structured clinical document: ```ts -export const mergeMediaStreams = (mediaStreams: MediaStream[]): { stream: MediaStream; endStream: () => void } => { - const audioContext = new AudioContext(); - const audioDestination = audioContext.createMediaStreamDestination(); - const channelMerger = audioContext.createChannelMerger(mediaStreams.length); - - mediaStreams.forEach((stream, index) => { - const source = audioContext.createMediaStreamSource(stream); - source.connect(channelMerger, 0, index); - }); - - channelMerger.connect(audioDestination); - - return { - stream: audioDestination.stream, - endStream: () => { - audioDestination.stream.getAudioTracks().forEach((track) => track.stop()); - audioContext.close(); - } - }; 
-}; +// 1. Fetch facts collected during the consultation +const facts = await client.facts.list(interactionId); + +// 2. Create a document from the facts +const document = await client.documents.create(interactionId, { + context: [ + { + type: "facts", + data: facts.map((fact) => ({ + text: fact.text, + group: fact.group, + source: fact.source, + })), + }, + ], + template: { + sections: [ + { key: "corti-hpi" }, + { key: "corti-allergies" }, + { key: "corti-social-history" }, + { key: "corti-plan" }, + ], + }, + outputLanguage: "en", + name: "Consultation Document", + documentationMode: "routed_parallel", +}); ``` --- -## **3. Establishing WebSocket Connection** -Once the audio stream is ready, we establish a WebSocket connection to Corti’s API. +## Audio Utilities (`audio.ts`) + +Three methods for obtaining audio streams, plus a merge utility: -### **Starting the Audio Flow** ```ts -const { stop } = await startAudioFlow(stream, authCreds, interactionId, handleNewMessage); +// 1. Local microphone +const micStream = await getMicrophoneStream(); + +// 2a. Remote participant from a WebRTC peer connection +const remoteStream = getRemoteParticipantStream(peerConnection); + +// 2b. OR: screen / tab capture (alternative when you don't control the peer connection, +// e.g. the video-call app runs in another browser tab) +const remoteStream = await getDisplayMediaStream(); + +// 3. Merge into a single multi-channel stream (virtual consultation mode) +const { stream, endStream } = mergeMediaStreams([micStream, remoteStream]); ``` -- **Sends real-time audio** -- **Receives transcription and facts** -- **Automatically starts when a CONFIG_ACCEPTED message is received** --- -## **4. Handling WebSocket Events (Transcripts & Facts)** -Each incoming WebSocket message is parsed and stored. +## Client (`client.ts`) + +Receives the scoped token + interaction ID from the server, then: + +1. Creates a `CortiClient` with the stream-scoped token. +2. 
Connects via `client.stream.connect()`. +3. Acquires audio — just the mic in single mode, or mic + remote merged in virtual mode. +4. Streams audio in 200 ms chunks via `MediaRecorder`. +5. Logs transcript and fact events to the console. ```ts -const transcripts: TranscriptEventData[] = []; -const facts: FactEventData[] = []; - -const handleNewMessage = (msg: MessageEvent) => { - const parsed = JSON.parse(msg.data); - if (parsed.type === "transcript") { - transcripts.push(parsed.data as TranscriptEventData); - } else if (parsed.type === "fact") { - facts.push(parsed.data as FactEventData); - } -}; -``` +const client = new CortiClient({ + environment: CortiEnvironment.Eu, + tenantName: "YOUR_TENANT_NAME", + auth: { accessToken }, // stream scope only +}); ---- +const streamSocket = await client.stream.connect({ id: interactionId }); + +// With a stream-scoped token, only streaming works: +// await client.interactions.list(); // Error — outside scope +// await client.transcribe.connect(); // Error — outside scope +``` -## **5. Stopping & Cleanup** -Ensure all resources (WebSocket, MediaRecorder, and merged streams) are properly closed. +### Single Microphone Mode ```ts -stop(); -microphoneStream.getAudioTracks().forEach((track) => track.stop()); -webRTCStream.getAudioTracks().forEach((track) => track.stop()); -endStream(); // Stops the merged audio -console.log("Call ended and resources cleaned up."); +const microphoneStream = await getMicrophoneStream(); +const mediaRecorder = new MediaRecorder(microphoneStream); +mediaRecorder.ondataavailable = (e) => streamSocket.send(e.data); +mediaRecorder.start(200); ``` ---- +### Virtual Consultation Mode + +The remote audio source is selected from the UI — either a WebRTC peer connection or screen/tab capture: -## **6. 
Full Flow Example** -### **Single-Stream (Diarization Mode)** ```ts -async function startSingleStreamCall() { - const microphoneStream = await getMicrophoneStream(); - const { stop } = await startAudioFlow(microphoneStream, authCreds, interactionId, handleNewMessage); - - return { - endCall: () => { - stop(); - microphoneStream.getAudioTracks().forEach((track) => track.stop()); - }, - }; -} +const microphoneStream = await getMicrophoneStream(); + +// Option A: WebRTC +const remoteStream = getRemoteParticipantStream(peerConnection); + +// Option B (alternative): Screen / tab capture (getDisplayMedia) +// const remoteStream = await getDisplayMediaStream(); + +// channel 0 = doctor, channel 1 = patient +const { stream, endStream } = mergeMediaStreams([microphoneStream, remoteStream]); + +const mediaRecorder = new MediaRecorder(stream); +mediaRecorder.ondataavailable = (e) => streamSocket.send(e.data); +mediaRecorder.start(200); ``` -### **Dual-Channel (Doctor-Patient Setup)** ```ts -async function startDualChannelCall() { - const microphoneStream = await getMicrophoneStream(); - const webRTCStream = new MediaStream(); // Example WebRTC stream - - const { stream, endStream } = mergeMediaStreams([microphoneStream, webRTCStream]); - const { stop } = await startAudioFlow(stream, authCreds, interactionId, handleNewMessage); - - return { - endCall: () => { - stop(); - endStream(); - microphoneStream.getAudioTracks().forEach((track) => track.stop()); - webRTCStream.getAudioTracks().forEach((track) => track.stop()); - }, - }; -} +### Event Handling + ```ts +streamSocket.on("transcript", (data) => console.log("Transcript:", data)); +streamSocket.on("fact", (data) => console.log("Fact:", data)); +``` + +--- + +## UI (`index.html`) + +A minimal page with: + +- Radio buttons to toggle between **Single Microphone** and **Virtual Consultation** mode. 
+- When **Virtual Consultation** is selected, a second radio group appears to choose between **WebRTC** and **Screen / tab capture** as the remote audio source. +- **Start Consultation** / **End Consultation** buttons to control the streaming session. +- **Create Document** button — enabled after a consultation ends. Calls the server to fetch facts and generate a clinical document, then displays the result on the page. +- Transcript and fact events are logged to the browser console. + +--- + +## Production Build + +For production deployment, compile and run the server: + +```bash +npm run build # Compile TypeScript to dist/ +npm start # Run compiled server ``` --- -## **7. Summary** -🚀 **Two streaming options** – single microphone **(diarization)** or **merged dual-channel streams** (doctor-patient). -✅ **Minimal setup** – simply plug in credentials and select a mode. -📡 **Real-time AI transcription & fact extraction** – powered by **Corti’s API**. +## Resources -For further details, refer to **Corti's API documentation**. \ No newline at end of file +- [`@corti/sdk` on npm](https://www.npmjs.com/package/@corti/sdk) +- [Corti API documentation](https://docs.corti.ai) diff --git a/Web/AmbientScribe/audio.ts b/Web/AmbientScribe/audio.ts new file mode 100644 index 0000000..44163e3 --- /dev/null +++ b/Web/AmbientScribe/audio.ts @@ -0,0 +1,177 @@ +/** + * audio.ts — Audio stream utilities for AmbientScribe. + * + * Exposes three methods for obtaining audio streams: + * 1. getMicrophoneStream() — local microphone (works in both modes) + * 2. getRemoteParticipantStream() — remote party via WebRTC (virtual consultations) + * 3. getDisplayMediaStream() — screen/tab/window audio via getDisplayMedia + * (alternative to WebRTC for virtual consultations, + * e.g. capturing audio from a video-call app) + * + * Also provides mergeMediaStreams() for combining multiple streams into a + * single multi-channel stream before sending to Corti. 
+ */ + +// --------------------------------------------------------------------------- +// 1. Local microphone +// --------------------------------------------------------------------------- + +/** + * Opens the user's microphone and returns the resulting MediaStream. + * + * @param deviceId Optional device ID if a specific microphone is desired. + * When omitted the browser's default audio input is used. + * @returns A MediaStream containing a single audio track from the microphone. + */ +export async function getMicrophoneStream( + deviceId?: string +): Promise { + if (!navigator.mediaDevices) { + throw new Error("Media Devices API not supported in this browser"); + } + + return navigator.mediaDevices.getUserMedia({ + audio: deviceId ? { deviceId: { exact: deviceId } } : true, + }); +} + +// --------------------------------------------------------------------------- +// 2. Remote participant (WebRTC) +// --------------------------------------------------------------------------- + +/** + * Extracts the remote participant's audio from an active WebRTC peer connection. + * + * In a virtual consultation the remote party's audio arrives via WebRTC. + * This helper collects all incoming audio tracks from the connection's + * receivers into a single MediaStream. + * + * @param peerConnection An RTCPeerConnection that already has remote audio tracks. + * @returns A MediaStream containing the remote participant's audio track(s). + * @throws If the peer connection has no remote audio tracks. 
+ */ +export function getRemoteParticipantStream( + peerConnection: RTCPeerConnection +): MediaStream { + const remoteStream = new MediaStream(); + + for (const receiver of peerConnection.getReceivers()) { + if (receiver.track.kind === "audio") { + remoteStream.addTrack(receiver.track); + } + } + + if (!remoteStream.getAudioTracks().length) { + throw new Error("No remote audio tracks found on the peer connection"); + } + + return remoteStream; +} + +// --------------------------------------------------------------------------- +// 3. Screen / tab audio capture (getDisplayMedia) +// --------------------------------------------------------------------------- + +/** + * Captures audio from a screen, window, or browser tab using getDisplayMedia. + * + * This is an alternative to getRemoteParticipantStream() for virtual + * consultations where the remote party's audio comes through a video-call + * app running in another tab or window rather than a direct WebRTC + * peer connection you control. + * + * The browser will show a picker dialog asking which screen/tab to share. + * We request both audio and video (some browsers require video to be + * requested for tab audio to work) and then strip the video track so only + * the audio track remains. + * + * @returns A MediaStream containing only the audio track from the selected + * screen / tab / window. + * @throws If the browser doesn't support getDisplayMedia, the user cancels + * the picker, or the selected source has no audio track. + */ +export async function getDisplayMediaStream(): Promise { + if (!navigator.mediaDevices?.getDisplayMedia) { + throw new Error("getDisplayMedia is not supported in this browser"); + } + + // Request both audio and video — some browsers (e.g. Chrome) only expose + // tab audio when video is also requested. + const stream = await navigator.mediaDevices.getDisplayMedia({ + audio: true, + video: true, + }); + + // Remove all video tracks — we only need the audio. 
+ for (const track of stream.getTracks()) { + if (track.kind === "video") { + track.stop(); + stream.removeTrack(track); + } + } + + if (!stream.getAudioTracks().length) { + throw new Error( + "The selected source does not have an audio track. " + + "Make sure to pick a browser tab that is playing audio." + ); + } + + return stream; +} + +// --------------------------------------------------------------------------- +// 4. Stream merging (used in virtual consultation mode) +// --------------------------------------------------------------------------- + +/** + * Merges multiple MediaStreams into a single multi-channel MediaStream. + * + * Each input stream is mapped to its own channel (by array index), so + * channel 0 = first stream, channel 1 = second stream, etc. + * This lets Corti attribute speech to the correct participant without + * relying on diarization. + * + * @param mediaStreams Array of MediaStreams to merge. Each must have at + * least one audio track. + * @returns An object with: + * - `stream` — the merged MediaStream to feed into MediaRecorder + * - `endStream` — cleanup function that stops tracks and closes the AudioContext + */ +export function mergeMediaStreams( + mediaStreams: MediaStream[] +): { stream: MediaStream; endStream: () => void } { + if (!mediaStreams.length) { + throw new Error("No media streams provided"); + } + + // Validate every stream has audio before we start wiring things up. + mediaStreams.forEach((stream, index) => { + if (!stream.getAudioTracks().length) { + throw new Error( + `MediaStream at index ${index} does not have an audio track` + ); + } + }); + + // Create an AudioContext and a ChannelMerger with one input per stream. + const audioContext = new AudioContext(); + const audioDestination = audioContext.createMediaStreamDestination(); + const channelMerger = audioContext.createChannelMerger(mediaStreams.length); + + // Wire each stream's first audio output into its dedicated merger channel. 
+ mediaStreams.forEach((stream, index) => { + const source = audioContext.createMediaStreamSource(stream); + source.connect(channelMerger, 0, index); + }); + + channelMerger.connect(audioDestination); + + return { + stream: audioDestination.stream, + endStream: () => { + audioDestination.stream.getAudioTracks().forEach((track) => track.stop()); + audioContext.close(); + }, + }; +} diff --git a/Web/AmbientScribe/client.ts b/Web/AmbientScribe/client.ts new file mode 100644 index 0000000..495a0f0 --- /dev/null +++ b/Web/AmbientScribe/client.ts @@ -0,0 +1,177 @@ +/** + * client.ts — Corti SDK streaming integration for AmbientScribe. + * + * Provides a single entry point — startSession() — that: + * 1. Creates a CortiClient with a stream-scoped access token. + * 2. Connects to the Corti streaming WebSocket. + * 3. Acquires audio depending on the selected mode. + * 4. Streams audio to Corti in 200 ms chunks. + * 5. Emits transcript and fact events via callbacks. + * + * This module has no DOM dependencies — all UI wiring lives in index.html. + */ + +import { CortiClient, CortiEnvironment } from "@corti/sdk"; +import { + getMicrophoneStream, + getRemoteParticipantStream, + getDisplayMediaStream, + mergeMediaStreams, +} from "./audio"; + +// --------------------------------------------------------------------------- +// Types +// --------------------------------------------------------------------------- + +export type Mode = "single" | "virtual"; + +/** How the remote participant's audio is captured in virtual mode. 
 */ +export type RemoteSource = "webrtc" | "display"; + +export interface SessionOptions { + accessToken: string; + interactionId: string; + tenantName: string; + mode: Mode; + remoteSource?: RemoteSource; + peerConnection?: RTCPeerConnection; + onTranscript?: (data: unknown) => void; + onFact?: (data: unknown) => void; +} + +export interface ActiveSession { + endConsultation: () => void; +} + +// --------------------------------------------------------------------------- +// startSession +// --------------------------------------------------------------------------- + +/** + * Starts a streaming session in the chosen mode. + * + * 1. Creates a CortiClient using the scoped access token from the server. + * 2. Connects to the streaming WebSocket via client.stream.connect(). + * 3. Acquires the appropriate audio stream(s) depending on the mode. + * 4. Pipes audio to Corti in 200 ms chunks via MediaRecorder. + * 5. Fires onTranscript / onFact callbacks for incoming events. + * + * @returns An object with an `endConsultation()` method for cleanup. + */ +export async function startSession( + options: SessionOptions +): Promise<ActiveSession> { + const { + accessToken, + interactionId, + tenantName, + mode, + remoteSource = "webrtc", + peerConnection, + onTranscript, + onFact, + } = options; + + // -- 1. Create a client scoped to streaming only ------------------------- + const client = new CortiClient({ + environment: CortiEnvironment.Eu, + tenantName, + auth: { + accessToken, // Token with "stream" scope only + }, + }); + + // With a stream-scoped token these would fail: + // await client.interactions.list(); // outside scope + // await client.transcribe.connect({ id: "..." }); // outside scope + + // -- 2. Connect to the Corti streaming WebSocket ------------------------- + const streamSocket = await client.stream.connect({ id: interactionId }); + + // -- 3. 
Acquire audio depending on mode ---------------------------------- + // "single" → just the local microphone + // "virtual" → local mic + remote audio (WebRTC or display), merged + + const microphoneStream = await getMicrophoneStream(); + console.log(`[${mode}] Microphone stream acquired`); + + // audioStream is what we feed into MediaRecorder. + // endMergedStream is only set when we merge (virtual mode). + let audioStream: MediaStream; + let remoteStream: MediaStream | undefined; + let endMergedStream: (() => void) | undefined; + + if (mode === "virtual") { + // Get the remote participant's audio from the chosen source. + if (remoteSource === "display") { + // Screen / tab capture — the browser will show a picker dialog. + // Useful when the video-call runs in another tab and you don't + // have direct access to the peer connection. + remoteStream = await getDisplayMediaStream(); + console.log("[virtual:display] Display media stream acquired"); + } else { + // WebRTC — pull audio tracks from an existing peer connection. + if (!peerConnection) { + throw new Error( + 'Virtual mode with remoteSource "webrtc" requires an RTCPeerConnection' + ); + } + remoteStream = getRemoteParticipantStream(peerConnection); + console.log("[virtual:webrtc] Remote participant stream acquired"); + } + + // Merge: channel 0 = doctor (mic), channel 1 = patient (remote) + const merged = mergeMediaStreams([microphoneStream, remoteStream]); + audioStream = merged.stream; + endMergedStream = merged.endStream; + } else { + audioStream = microphoneStream; + } + + // -- 4. Stream audio to Corti in 200 ms chunks -------------------------- + const mediaRecorder = new MediaRecorder(audioStream); + + mediaRecorder.ondataavailable = (event: BlobEvent) => { + if (event.data.size > 0) { + streamSocket.send(event.data); + } + }; + + mediaRecorder.start(200); + console.log(`[${mode}] MediaRecorder started — streaming audio to Corti`); + + // -- 5. 
Handle incoming events ------------------------------------------- + streamSocket.on("transcript", (data) => { + console.log("Transcript:", data); + onTranscript?.(data); + }); + + streamSocket.on("fact", (data) => { + console.log("Fact:", data); + onFact?.(data); + }); + + // -- 6. Return cleanup function ------------------------------------------ + return { + endConsultation: () => { + // Stop recording + if (mediaRecorder.state !== "inactive") { + mediaRecorder.stop(); + } + + // Close the stream socket + streamSocket.close(); + + // Release the merged stream (virtual mode only) + endMergedStream?.(); + + // Release the remote stream tracks (virtual mode only) + remoteStream?.getAudioTracks().forEach((track) => track.stop()); + + // Release the raw microphone track(s) + microphoneStream.getAudioTracks().forEach((track) => track.stop()); + + console.log(`[${mode}] Consultation ended — all resources cleaned up`); + }, + }; +} diff --git a/Web/AmbientScribe/index.html new file mode 100644 index 0000000..12c4bda --- /dev/null +++ b/Web/AmbientScribe/index.html @@ -0,0 +1,161 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="utf-8" />
+  <title>Corti AmbientScribe Demo</title>
+</head>
+<body>
+  <h1>AmbientScribe</h1>
+
+  <fieldset>
+    <legend>Mode</legend>
+    <label><input type="radio" name="mode" value="single" checked /> Single Microphone</label>
+    <label><input type="radio" name="mode" value="virtual" /> Virtual Consultation</label>
+  </fieldset>
+
+  <fieldset id="remote-source" hidden>
+    <legend>Remote audio source</legend>
+    <label><input type="radio" name="remoteSource" value="webrtc" checked /> WebRTC</label>
+    <label><input type="radio" name="remoteSource" value="display" /> Screen / tab capture</label>
+  </fieldset>
+
+  <button id="start">Start Consultation</button>
+  <button id="end" disabled>End Consultation</button>
+  <button id="createDocument" disabled>Create Document</button>
+
+  <p>Open the browser console to see transcripts and facts.</p>
+  <pre id="documentOutput"></pre>
+
+  <script src="dist/client.js"></script>
+  <script>
+    let session = null;
+    let interactionId = null;
+
+    // Show the remote-source picker only in Virtual Consultation mode.
+    document.querySelectorAll('input[name="mode"]').forEach((radio) =>
+      radio.addEventListener("change", () => {
+        document.getElementById("remote-source").hidden =
+          document.querySelector('input[name="mode"]:checked').value !== "virtual";
+      })
+    );
+
+    document.getElementById("start").addEventListener("click", async () => {
+      // Ask the server for an interaction + stream-scoped token.
+      const res = await fetch("/api/start-session", { method: "POST" });
+      const { interactionId: id, tenantName, accessToken } = await res.json();
+      interactionId = id;
+
+      session = await AmbientScribe.startSession({
+        accessToken,
+        interactionId,
+        tenantName,
+        mode: document.querySelector('input[name="mode"]:checked').value,
+        remoteSource: document.querySelector('input[name="remoteSource"]:checked').value,
+      });
+
+      document.getElementById("start").disabled = true;
+      document.getElementById("end").disabled = false;
+    });
+
+    document.getElementById("end").addEventListener("click", () => {
+      session?.endConsultation();
+      session = null;
+      document.getElementById("start").disabled = false;
+      document.getElementById("end").disabled = true;
+      document.getElementById("createDocument").disabled = false;
+    });
+
+    document.getElementById("createDocument").addEventListener("click", async () => {
+      const res = await fetch("/api/create-document", {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify({ interactionId }),
+      });
+      const { document: doc } = await res.json();
+      document.getElementById("documentOutput").textContent = JSON.stringify(doc, null, 2);
+    });
+  </script>
+</body>
+</html>
+ + + + + + + + diff --git a/Web/AmbientScribe/package.json b/Web/AmbientScribe/package.json new file mode 100644 index 0000000..f09c781 --- /dev/null +++ b/Web/AmbientScribe/package.json @@ -0,0 +1,28 @@ +{ + "name": "corti-ambientscribe-demo", + "version": "1.0.0", + "description": "Live audio transcription and clinical document generation demo using Corti SDK", + "main": "server.ts", + "type": "module", + "scripts": { + "build:client": "esbuild client.ts --bundle --outfile=dist/client.js --format=iife --global-name=AmbientScribe", + "dev": "npm run build:client && ts-node --esm server.ts", + "start": "node dist/server.js", + "build": "npm run build:client && tsc", + "clean": "rm -rf dist" + }, + "keywords": ["corti", "audio", "transcription", "clinical-documentation"], + "author": "", + "license": "MIT", + "dependencies": { + "@corti/sdk": "^1.0.0", + "express": "^4.18.2" + }, + "devDependencies": { + "@types/express": "^4.17.21", + "@types/node": "^20.10.6", + "esbuild": "^0.24.0", + "typescript": "^5.3.3", + "ts-node": "^10.9.2" + } +} diff --git a/Web/AmbientScribe/server.ts b/Web/AmbientScribe/server.ts new file mode 100644 index 0000000..e175908 --- /dev/null +++ b/Web/AmbientScribe/server.ts @@ -0,0 +1,174 @@ +/** + * server.ts — Express server for AmbientScribe. + * + * Responsible for: + * 1. Creating a fully-privileged CortiClient using OAuth2 client credentials. + * 2. Exposing a POST /api/start-session endpoint that: + * a. Creates an interaction via the Corti REST API. + * b. Mints a scoped stream token (WebSocket access only). + * c. Returns both to the client. + * 3. Serving the static front-end files (index.html, client.ts, audio.ts). + * + * IMPORTANT: Client credentials (CLIENT_ID / CLIENT_SECRET) must NEVER be + * exposed to the browser. Only the scoped stream token is sent to the client. 
+ */ + +import express from "express"; +import path from "path"; +import { fileURLToPath } from "url"; +import { CortiClient, CortiAuth, CortiEnvironment } from "@corti/sdk"; +import { randomUUID } from "crypto"; + +const __filename = fileURLToPath(import.meta.url); +const __dirname = path.dirname(__filename); + +// --------------------------------------------------------------------------- +// Configuration — replace with your own values or load from environment +// --------------------------------------------------------------------------- + +const TENANT_NAME = process.env.CORTI_TENANT_NAME ?? "YOUR_TENANT_NAME"; +const CLIENT_ID = process.env.CORTI_CLIENT_ID ?? "YOUR_CLIENT_ID"; +const CLIENT_SECRET = process.env.CORTI_CLIENT_SECRET ?? "YOUR_CLIENT_SECRET"; +const PORT = Number(process.env.PORT ?? 3000); + +// --------------------------------------------------------------------------- +// 1. Create a CortiClient authenticated with client credentials (OAuth2). +// This client has full API access and must only be used server-side. +// --------------------------------------------------------------------------- + +const client = new CortiClient({ + environment: CortiEnvironment.Eu, + tenantName: TENANT_NAME, + auth: { + clientId: CLIENT_ID, + clientSecret: CLIENT_SECRET, + }, +}); + +// --------------------------------------------------------------------------- +// 2. Helper: create an interaction. +// An interaction represents a single clinical encounter / session. +// --------------------------------------------------------------------------- + +async function createInteraction() { + const interaction = await client.interactions.create({ + encounter: { + identifier: randomUUID(), + status: "planned", + type: "first_consultation", + }, + }); + + console.log("Interaction created:", interaction.id); + return interaction; +} + +// --------------------------------------------------------------------------- +// 3. 
Helper: mint a scoped token with only the "stream" scope. +// This token lets the client connect to the streaming WebSocket but +// cannot list interactions, create documents, or call any other REST +// endpoint — keeping the blast radius minimal if it leaks. +// --------------------------------------------------------------------------- + +async function getScopedStreamToken() { + const auth = new CortiAuth({ + environment: CortiEnvironment.Eu, + tenantName: TENANT_NAME, + }); + + const streamToken = await auth.getToken({ + clientId: CLIENT_ID, + clientSecret: CLIENT_SECRET, + scopes: ["stream"], + }); + + return streamToken; +} + +// --------------------------------------------------------------------------- +// 4. Express app +// --------------------------------------------------------------------------- + +const app = express(); + +// Serve the front-end files (index.html, dist/client.js) from this directory. +app.use(express.static(path.join(__dirname))); +app.use(express.json()); + +// POST /api/start-session +// Creates an interaction + scoped token and returns them to the client. +app.post("/api/start-session", async (_req, res) => { + try { + const interaction = await createInteraction(); + const streamToken = await getScopedStreamToken(); + + // The client only receives the interaction ID, tenant name, and a limited-scope token. + res.json({ + interactionId: interaction.id, + tenantName: TENANT_NAME, + accessToken: streamToken.accessToken, + }); + } catch (err) { + console.error("Failed to start session:", err); + res.status(500).json({ error: "Failed to start session" }); + } +}); + +// --------------------------------------------------------------------------- +// 5. POST /api/create-document +// Fetches the facts collected during the consultation, then generates a +// clinical document from them using the Corti Documents API. 
+// --------------------------------------------------------------------------- + +app.post("/api/create-document", async (req, res) => { + try { + const { interactionId } = req.body; + + if (!interactionId) { + res.status(400).json({ error: "Missing interactionId" }); + return; + } + + // Step 1: Fetch facts collected during the consultation + const facts = await client.facts.list(interactionId); + console.log(`Fetched ${facts.length} facts for interaction ${interactionId}`); + + // Step 2: Map facts into the format expected by the Documents API + const factsContext = facts.map((fact: { text: string; group: string; source: string }) => ({ + text: fact.text, + group: fact.group, + source: fact.source, + })); + + // Step 3: Create a document using the collected facts + const document = await client.documents.create(interactionId, { + context: [ + { + type: "facts", + data: factsContext, + }, + ], + template: { + sections: [ + { key: "corti-hpi" }, + { key: "corti-allergies" }, + { key: "corti-social-history" }, + { key: "corti-plan" }, + ], + }, + outputLanguage: "en", + name: "Consultation Document", + documentationMode: "routed_parallel", + }); + + console.log("Document created:", document); + res.json({ document }); + } catch (err) { + console.error("Failed to create document:", err); + res.status(500).json({ error: "Failed to create document" }); + } +}); + +app.listen(PORT, () => { + console.log(`AmbientScribe server listening on http://localhost:${PORT}`); +}); diff --git a/Web/AmbientScribe/singleMicrophone.ts b/Web/AmbientScribe/singleMicrophone.ts deleted file mode 100644 index f35ecf8..0000000 --- a/Web/AmbientScribe/singleMicrophone.ts +++ /dev/null @@ -1,199 +0,0 @@ -import type { AuthCreds, Config, TranscriptEventData, FactEventData } from "./types"; - - const DEFAULT_CONFIG: Config = { - type: "config", - configuration: { - transcription: { - primaryLanguage: "en", - isDiarization: true, - isMultichannel: false, - participants: [ - { - channel: 0, - 
role: "multiple", - }, - ], - }, - mode: { - type: "facts", - outputLocale: "en", - }, - }, - }; - - - - /** - * Retrieves the user's microphone MediaStream. - * If a device ID is provided, attempts to use that specific microphone, otherwise uses the default. - * - * @param deviceId - Optional ID of the desired audio input device. - * @returns A Promise that resolves with the MediaStream. - * @throws An error if accessing the microphone fails. - */ -const getMicrophoneStream = async (deviceId?: string): Promise => { - if (!navigator.mediaDevices) { - throw new Error("Media Devices API not supported in this browser"); - } - try { - return await navigator.mediaDevices.getUserMedia({ - audio: deviceId ? { deviceId: { exact: deviceId } } : true, - }); - } catch (error) { - console.error("Error accessing microphone:", error); - throw error; - } - }; - - - /** - * Starts an audio flow by connecting a MediaStream to a WebSocket endpoint and sending a config. - * The flow begins once a CONFIG_ACCEPTED message is received, after which audio - * data is sent in 200ms chunks via a MediaRecorder. - * - * @param mediaStream - The audio MediaStream to send. - * @param authCreds - Authentication credentials containing environment, tenant, and token. - * @param interactionId - The interaction identifier used in the WebSocket URL. - * @param config - Optional configuration object; falls back to DEFAULT_CONFIG if not provided. - * @returns An object with a: - * - `recorderStarted` boolean indicating whether the MediaRecorder has started - * - `stop` method to end the flow and clean up resources - */ - async function startAudioFlow(mediaStream: MediaStream, authCreds: AuthCreds, interactionId: string, handleEvent: (arg0: MessageEvent) => void, config?: Config): Promise<{ recorderStarted: boolean, stop: () => void }> { - // 2. Set up configuration if not provided - if (!config) { - config = DEFAULT_CONFIG; - } - - // 3. 
Start WebSocket connection - const wsUrl = `wss://api.${authCreds.environment}.corti.app/audio-bridge/v2/interactions/${interactionId}/streams?tenant-name=${authCreds.tenant}&token=Bearer%20${authCreds.token}`; - const ws = new WebSocket(wsUrl); - let isOpen = false; - let recorderStarted = false; - let mediaRecorder: MediaRecorder; - - ws.onopen = () => { - ws.send(JSON.stringify(config)); - isOpen = true; - }; - - // 4. Wait for CONFIG_ACCEPTED message - ws.onmessage = (msg: MessageEvent) => { - try { - const data = JSON.parse(msg.data); - if (data.type === "CONFIG_ACCEPTED" && !recorderStarted) { - recorderStarted = true; - startMediaRecorder(); - } - handleEvent(msg); - } catch (err) { - console.error("Failed to parse WebSocket message:", err); - } - }; - - ws.onerror = (err: Event) => { - console.error("WebSocket encountered an error:", err); - // Optionally, call stop() to clean up resources - }; - - ws.onclose = (event: Event) => { - console.log("WebSocket closed:", event); - // Ensure cleanup is performed or notify the user - }; - - // 5. Start MediaRecorder with 200ms chunks and send data to WebSocket - function startMediaRecorder() { - mediaRecorder = new MediaRecorder(mediaStream); - mediaRecorder.ondataavailable = (event: BlobEvent) => { - if (isOpen) { - ws.send(event.data); - } - }; - mediaRecorder.start(200); - } - - // 6. End the flow - const stop = () => { - if (ws.readyState === WebSocket.OPEN) { - ws.send(JSON.stringify({ type: "end" })); - } - if (mediaRecorder && mediaRecorder.state !== "inactive") { - mediaRecorder.stop(); - } - setTimeout(() => { - ws.close(); - }, 10000); - }; - - return { recorderStarted, stop }; - } - - - - // Usage Example: - // Define authentication credentials and interaction identifier. 
- const authCreds: AuthCreds = { - environment: "us", - tenant: "your-tenant", - token: "your-token", - }; - const interactionId = "interaction-id"; - - const transcripts: TranscriptEventData[] = []; - const facts: FactEventData[] = []; - - const handleNewMessage = (msg: MessageEvent) => { - try { - const parsed = JSON.parse(msg.data); - - switch (parsed.type) { - case "transcript": - transcripts.push(parsed.data as TranscriptEventData); - break; - case "fact": - facts.push(parsed.data as FactEventData); - break; - default: - console.log("Unhandled WebSocket event type:", parsed.type); - } - } catch (err) { - console.error("Failed to parse WebSocket message:", err); - } - }; - - // Encapsulate the call setup in an async function. - async function startCall() { - try { - // Retrieve the user's microphone stream. - const microphoneStream = await getMicrophoneStream(); - - // Start the audio flow over a WebSocket connection. - // The returned `stop` method is used to end the audio flow gracefully. - const { stop } = await startAudioFlow(microphoneStream, authCreds, interactionId, handleNewMessage); - - // Define a cleanup function to end the call. - const endCall = () => { - // End the audio flow (closes WebSocket and stops MediaRecorder). - stop(); - // Optionally, stop original streams if no longer needed. - microphoneStream.getAudioTracks().forEach((track) => track.stop()); - console.log("Call ended and resources cleaned up."); - }; - - return { endCall }; - } catch (error) { - console.error("Error starting call:", error); - throw error; - } - } - - // Example usage: start a call and end it after 10 seconds. - startCall() - .then(({ endCall }) => { - setTimeout(endCall, 10000); - }) - .catch((error) => { - // Handle any errors that occurred during setup. 
- console.error(error); - }); - \ No newline at end of file diff --git a/Web/AmbientScribe/tsconfig.json b/Web/AmbientScribe/tsconfig.json new file mode 100644 index 0000000..81a0dfb --- /dev/null +++ b/Web/AmbientScribe/tsconfig.json @@ -0,0 +1,20 @@ +{ + "compilerOptions": { + "target": "ES2020", + "module": "ES2020", + "moduleResolution": "node", + "lib": ["ES2020", "DOM"], + "outDir": "./dist", + "rootDir": "./", + "strict": true, + "esModuleInterop": true, + "skipLibCheck": true, + "forceConsistentCasingInFileNames": true, + "resolveJsonModule": true, + "declaration": true, + "declarationMap": true, + "sourceMap": true + }, + "include": ["*.ts"], + "exclude": ["node_modules"] +} diff --git a/Web/AmbientScribe/types.ts b/Web/AmbientScribe/types.ts deleted file mode 100644 index 5f95576..0000000 --- a/Web/AmbientScribe/types.ts +++ /dev/null @@ -1,66 +0,0 @@ -export interface AuthCreds { - environment: string; - tenant: string; - token: string; -} - -export interface Config { - type: string; - configuration: { - transcription: { - primaryLanguage: string; - isDiarization: boolean; - isMultichannel: boolean; - participants: Array<{ - channel: number; - role: string; - }>; - }; - mode: { - type: string; - outputLocale: string; - }; - }; -} - -export interface TranscriptEventData { - id: string; - start: number; - duration: number; - transcript: string; - isFinal: boolean; - participant: { - channel: number; - role: string; - }; - time: { - start: number; - end: number; - }; -} - -export interface FactEventData { - id: string; - text: string; - createdAt: string; - createdAtTzOffset: string; - evidence?: Array; - group: string; - groupId: string; - isDiscarded: boolean; - source: "core" | "system" | "user"; - updatedAt: string; - updatedAtTzOffset: string; -} - -export interface TranscriptMessage { - type: "transcript"; - data: TranscriptEventData; -} - -export interface FactMessage { - type: "fact"; - data: FactEventData; -} - -export type WSSEvent = 
TranscriptMessage | FactMessage; diff --git a/Web/AmbientScribe/virtualConsultations.ts b/Web/AmbientScribe/virtualConsultations.ts deleted file mode 100644 index 84cf7fb..0000000 --- a/Web/AmbientScribe/virtualConsultations.ts +++ /dev/null @@ -1,267 +0,0 @@ -import type { AuthCreds, Config, TranscriptEventData, FactEventData } from "./types"; - -const DEFAULT_CONFIG: Config = { - type: "config", - configuration: { - transcription: { - primaryLanguage: "en", - isDiarization: false, - isMultichannel: false, - participants: [ - { - channel: 0, - role: "doctor", - }, - { - channel: 0, - role: "patient", - }, - ], - }, - mode: { - type: "facts", - outputLocale: "en", - }, - }, -}; - -/** - * Merges multiple audio MediaStreams into a single MediaStream and returns both - * the merged MediaStream and a cleanup method. - * The cleanup method stops the merged stream's audio tracks and closes the AudioContext. - * - * @param mediaStreams - Array of MediaStreams to merge. - * @returns An object containing: - * - stream: the merged MediaStream. - * - endStream: A method to end the merged stream and clean up resources. - * @throws Error if no streams are provided or if any stream lacks an audio track. - */ -const mergeMediaStreams = ( - mediaStreams: MediaStream[] -): { stream: MediaStream; endStream: () => void } => { - if (!mediaStreams.length) { - throw new Error("No media streams provided."); - } - - // Validate that each MediaStream has an audio track. - mediaStreams.forEach((stream, index) => { - if (!stream.getAudioTracks().length) { - throw new Error( - `MediaStream at index ${index} does not have an audio track.` - ); - } - }); - - // Each mediastream is added as a new channel in order of the array. 
- const audioContext = new AudioContext(); - const audioDestination = audioContext.createMediaStreamDestination(); - const channelMerger = audioContext.createChannelMerger(mediaStreams.length); - mediaStreams.forEach((stream, index) => { - const source = audioContext.createMediaStreamSource(stream); - source.connect(channelMerger, 0, index); - }); - channelMerger.connect(audioDestination); - - // Close the audio context and stop all tracks when the stream ends. - const endStream = () => { - audioDestination.stream.getAudioTracks().forEach((track) => { - track.stop(); - }); - audioContext.close(); - }; - - // Return the merged stream and the endStream method. - return { stream: audioDestination.stream, endStream }; -}; - -/** - * Retrieves the user's microphone MediaStream. - * If a device ID is provided, attempts to use that specific microphone, otherwise uses the default. - * - * @param deviceId - Optional ID of the desired audio input device. - * @returns A Promise that resolves with the MediaStream. - * @throws An error if accessing the microphone fails. - */ -export const getMicrophoneStream = async ( - deviceId?: string -): Promise => { - if (!navigator.mediaDevices) { - throw new Error("Media Devices API not supported in this browser"); - } - try { - return await navigator.mediaDevices.getUserMedia({ - audio: deviceId ? { deviceId: { exact: deviceId } } : true, - }); - } catch (error) { - console.error("Error accessing microphone:", error); - throw error; - } -}; - -/** - * Starts an audio flow by connecting a MediaStream to a WebSocket endpoint and sending a config. - * The flow begins once a CONFIG_ACCEPTED message is received, after which audio - * data is sent in 200ms chunks via a MediaRecorder. - * - * @param mediaStream - The audio MediaStream to send. - * @param authCreds - Authentication credentials containing environment, tenant, and token. - * @param interactionId - The interaction identifier used in the WebSocket URL. 
- * @param config - Optional configuration object; falls back to DEFAULT_CONFIG if not provided. - * @returns An object with a: - * - `recorderStarted` boolean indicating whether the MediaRecorder has started - * - `stop` method to end the flow and clean up resources - */ -async function startAudioFlow( - mediaStream: MediaStream, - authCreds: AuthCreds, - interactionId: string, - handleEvent: (arg0: MessageEvent) => void, - config?: Config -): Promise<{ recorderStarted: boolean; stop: () => void }> { - // 2. Set up configuration if not provided - if (!config) { - config = DEFAULT_CONFIG; - } - - // 3. Start WebSocket connection - const wsUrl = `wss://api.${authCreds.environment}.corti.app/audio-bridge/v2/interactions/${interactionId}/streams?tenant-name=${authCreds.tenant}&token=Bearer%20${authCreds.token}`; - const ws = new WebSocket(wsUrl); - let isOpen = false; - let recorderStarted = false; - let mediaRecorder: MediaRecorder; - - ws.onopen = () => { - ws.send(JSON.stringify(config)); - isOpen = true; - }; - - // 4. Wait for CONFIG_ACCEPTED message - ws.onmessage = (msg: MessageEvent) => { - try { - const data = JSON.parse(msg.data); - if (data.type === "CONFIG_ACCEPTED" && !recorderStarted) { - recorderStarted = true; - startMediaRecorder(); - } - handleEvent(msg); - } catch (err) { - console.error("Failed to parse WebSocket message:", err); - } - }; - - ws.onerror = (err: Event) => { - console.error("WebSocket encountered an error:", err); - // Optionally, call stop() to clean up resources - }; - - ws.onclose = (event: Event) => { - console.log("WebSocket closed:", event); - // Ensure cleanup is performed or notify the user - }; - - // 5. Start MediaRecorder with 200ms chunks and send data to WebSocket - function startMediaRecorder() { - mediaRecorder = new MediaRecorder(mediaStream); - mediaRecorder.ondataavailable = (event: BlobEvent) => { - if (isOpen) { - ws.send(event.data); - } - }; - mediaRecorder.start(200); - } - - // 6. 
End the flow - const stop = () => { - if (ws.readyState === WebSocket.OPEN) { - ws.send(JSON.stringify({ type: "end" })); - } - if (mediaRecorder && mediaRecorder.state !== "inactive") { - mediaRecorder.stop(); - } - setTimeout(() => { - ws.close(); - }, 10000); - }; - - return { recorderStarted, stop }; -} - - -// Usage Example: -// Define authentication credentials and interaction identifier. -const authCreds: AuthCreds = { - environment: "us", - tenant: "your-tenant", - token: "your-token", -}; -const interactionId = "interaction-id"; - const transcripts: TranscriptEventData[] = []; - const facts: FactEventData[] = []; - - const handleNewMessage = (msg: MessageEvent) => { - try { - const parsed = JSON.parse(msg.data); - - switch (parsed.type) { - case "transcript": - transcripts.push(parsed.data as TranscriptEventData); - break; - case "fact": - facts.push(parsed.data as FactEventData); - break; - default: - console.log("Unhandled WebSocket event type:", parsed.type); - } - } catch (err) { - console.error("Failed to parse WebSocket message:", err); - } - }; - -// Encapsulate the call setup in an async function. -async function startCall() { - try { - // Retrieve the user's microphone stream. - const microphoneStream = await getMicrophoneStream(); - - // Obtain the WebRTC stream (e.g., from a WebRTC connection). - const webRTCStream = new MediaStream(); - - // Merge the microphone and WebRTC streams. - // The order of the streams should match your default configuration. - const { stream, endStream } = mergeMediaStreams([ - microphoneStream, - webRTCStream, - ]); - - // Start the audio flow over a WebSocket connection. - // The returned `stop` method is used to end the audio flow gracefully. - const { stop } = await startAudioFlow(stream, authCreds, interactionId, handleNewMessage); - - // Define a cleanup function to end the call. - const endCall = () => { - // End the audio flow (closes WebSocket and stops MediaRecorder). - stop(); - // Stop the merged stream. 
- endStream(); - // Optionally, stop original streams if no longer needed. - microphoneStream.getAudioTracks().forEach((track) => track.stop()); - webRTCStream.getAudioTracks().forEach((track) => track.stop()); - console.log("Call ended and resources cleaned up."); - }; - - return { endCall }; - } catch (error) { - console.error("Error starting call:", error); - throw error; - } -} - -// Example usage: start a call and end it after 10 seconds. -startCall() - .then(({ endCall }) => { - setTimeout(endCall, 10000); - }) - .catch((error) => { - // Handle any errors that occurred during setup. - console.error(error); - });
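For reference, the fact-to-context mapping inside the new `POST /api/create-document` handler can be factored into a small pure helper. This is a sketch, not part of the diff: the `Fact` shape mirrors the `FactEventData` interface from the removed `types.ts`, and the `isDiscarded` filter is an extra safeguard the handler above does not currently apply.

```typescript
// Sketch of the fact → context mapping used by POST /api/create-document.
// Field names (text, group, source, isDiscarded) follow the FactEventData
// interface from the deleted types.ts; filtering on isDiscarded is an
// assumption/defensive addition, not present in the handler in the diff.

interface Fact {
  id: string;
  text: string;
  group: string;
  source: "core" | "system" | "user";
  isDiscarded?: boolean;
}

interface FactContextEntry {
  text: string;
  group: string;
  source: string;
}

// Keep only the fields the Documents API context expects, and drop any
// facts the user discarded during the consultation so they never reach
// the generated document.
function factsToContext(facts: Fact[]): FactContextEntry[] {
  return facts
    .filter((fact) => !fact.isDiscarded)
    .map(({ text, group, source }) => ({ text, group, source }));
}
```

The result would then be passed as `context: [{ type: "facts", data: factsToContext(facts) }]` in the `client.documents.create` call, exactly as the inline mapping does today.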