
OpenAI Realtime API with WebSockets and WebRTC
OpenAI's Realtime API, leveraging WebRTC and WebSockets, enables developers to create dynamic, low-latency, voice-driven applications for real-time interactions.
Introduction
Innovations in real-time AI interactions are reshaping how we build voice-driven applications. OpenAI’s Realtime API provides two powerful methods for enabling low-latency AI conversations:
- WebRTC for high-quality, real-time audio streaming.
- WebSockets for handling event-driven data transmission and signaling.
This article walks through both methods, exploring how they work and how developers can integrate them into AI-driven applications.
Real-Time Voice Interaction: The Game Changer
So, what makes real-time voice interaction a big deal? Immediacy and the ability to converse naturally. Instead of recording audio, uploading it, and waiting for a batched response, WebRTC streams audio directly between the client and the model endpoint, so replies feel almost instantaneous.
Obtain an API Key
To use OpenAI’s Realtime API from a browser, you need an ephemeral API key:
- Visit OpenAI’s API Key Page and generate a standard key.
- Exchange it server-side for an ephemeral key with realtime permissions, so your standard key never reaches the browser.
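As a sketch of that server-side exchange, the helper below builds the request for minting an ephemeral session key. The endpoint path, model name, and response shape are assumptions to verify against OpenAI’s current Realtime API documentation.

```javascript
// Sketch: build the request for minting an ephemeral Realtime session key.
// The endpoint path, model name, and response shape are assumptions; check
// them against OpenAI's current Realtime API reference before relying on them.
function buildSessionRequest(apiKey, model) {
  return {
    url: "https://api.openai.com/v1/realtime/sessions", // assumed endpoint
    options: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({ model })
    }
  };
}

// Server-side usage (never expose the standard key to the browser):
// const { url, options } = buildSessionRequest(process.env.OPENAI_API_KEY, "gpt-4o-realtime-preview");
// const session = await fetch(url, options).then(r => r.json());
// The ephemeral secret in the response is what you hand to the client.
```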
Method 1: WebRTC – Ultra-Low Latency Voice Streaming
How WebRTC Works with OpenAI’s API
WebRTC (Web Real-Time Communication) is an open standard that enables direct peer-to-peer communication between browsers, mobile applications, or other devices. OpenAI’s API uses WebRTC to capture and transmit live audio with minimal latency, making it ideal for real-time AI voice applications.
Steps to Implement WebRTC with OpenAI Realtime API
To start a WebRTC session with OpenAI’s API, create an RTCPeerConnection:
const peerConnection = new RTCPeerConnection({
iceServers: [{ urls: "stun:stun.l.google.com:19302" }]
});
// Add an audio track from the user's microphone
navigator.mediaDevices.getUserMedia({ audio: true }).then(stream => {
stream.getTracks().forEach(track => peerConnection.addTrack(track, stream));
});
// Handle incoming audio stream
peerConnection.ontrack = (event) => {
const [stream] = event.streams;
document.querySelector('#audioElement').srcObject = stream;
};
- Connect to OpenAI’s WebRTC API
After initializing WebRTC, establish a connection to OpenAI’s API:
// Reuse the peerConnection created above so the offer carries the microphone track
peerConnection.createOffer().then(offer => {
  return peerConnection.setLocalDescription(offer);
}).then(() => {
  // Send the SDP offer to OpenAI's WebRTC API
  return fetch('https://api.openai.com/realtime/webrtc', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${API_KEY}` },
    body: JSON.stringify({ sdp: peerConnection.localDescription })
  });
})
.then(response => response.json())
.then(data => peerConnection.setRemoteDescription(new RTCSessionDescription(data.sdp)));
- Handling Real-Time Audio
Once connected, the AI-generated speech will be streamed back via WebRTC:
peerConnection.ontrack = (event) => {
  document.querySelector('#aiAudio').srcObject = event.streams[0];
};
📌 Key Benefits of Using WebRTC with OpenAI:
✅ Ultra-low latency for real-time conversations.
✅ High-quality audio with direct streaming.
✅ Efficient bandwidth usage due to peer-to-peer communication.
Method 2: WebSockets – Flexible Real-Time Data Exchange
How WebSockets Work with OpenAI’s API
Unlike WebRTC, WebSockets provide a full-duplex communication channel over a single TCP connection. This makes them ideal for sending text-based commands, event-driven updates, or signaling for WebRTC connections.
Steps to Implement WebSockets with OpenAI Realtime API
To start a WebSocket session, connect to OpenAI’s WebSocket API using JavaScript:
const socket = new WebSocket('wss://api.openai.com/realtime');

// Event listener for when the connection opens
socket.onopen = () => {
  console.log("WebSocket connected to OpenAI Realtime API.");
};

// Event listener for receiving messages
socket.onmessage = (event) => {
  console.log("Received data: ", event.data);
};
- Sending and Receiving Messages
Send real-time audio or text data to OpenAI’s Realtime API via WebSockets:
const sendData = (message) => {
  if (socket.readyState === WebSocket.OPEN) {
    socket.send(JSON.stringify({ text: message }));
  }
};
// Example usage
sendData("Hello, OpenAI!");
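Note that sendData silently drops messages sent before the connection opens. A small outbound queue can buffer them until the socket is ready; the sketch below is generic (nothing OpenAI-specific) and works with any object exposing send():

```javascript
// Sketch: buffer outbound messages until the socket reports it is open.
// Works with any socket-like object that exposes send(); not OpenAI-specific.
class BufferedSender {
  constructor(socket) {
    this.socket = socket;
    this.queue = [];
    this.open = false;
  }

  // Call this from socket.onopen to flush anything queued before the handshake
  handleOpen() {
    this.open = true;
    for (const payload of this.queue) this.socket.send(payload);
    this.queue = [];
  }

  send(message) {
    const payload = JSON.stringify({ text: message });
    if (this.open) {
      this.socket.send(payload);
    } else {
      this.queue.push(payload); // held until handleOpen() fires
    }
  }
}
```

Wire it up with `socket.onopen = () => sender.handleOpen();` so early messages are flushed in order instead of being lost.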
- Handling AI Responses in Real-Time
Listen for responses from OpenAI’s API and process them in your app:
socket.onmessage = (event) => {
  const response = JSON.parse(event.data);
  console.log("AI Response: ", response);
};
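Realtime events generally arrive as JSON objects tagged with a type field. A small dispatcher keeps the onmessage handler tidy; the event names used in the example are illustrative placeholders, not a guaranteed part of the API.

```javascript
// Sketch: route incoming Realtime events by their "type" field.
// The specific event names used with this helper are placeholders;
// consult the API reference for the actual event taxonomy.
function makeDispatcher(handlers) {
  return (rawData) => {
    let event;
    try {
      event = JSON.parse(rawData);
    } catch {
      return { handled: false, reason: "invalid JSON" };
    }
    const handler = handlers[event.type];
    if (!handler) {
      return { handled: false, reason: `no handler for ${event.type}` };
    }
    handler(event);
    return { handled: true };
  };
}

// Usage: socket.onmessage = (e) => dispatch(e.data);
```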
📌 Key Benefits of Using WebSockets with OpenAI:
✅ Lightweight and efficient for text-based interactions.
✅ Low-latency updates for real-time AI responses.
✅ Perfect for managing signaling in WebRTC connections.
Choosing Between WebRTC and WebSockets
When to Use WebRTC
- You need real-time, high-quality voice streaming.
- You want minimal latency for live conversations.
- You are building a voice assistant, AI-driven phone system, or live translator.
When to Use WebSockets
- You need real-time event-driven updates.
- You are sending text-based interactions rather than voice.
- You want to manage WebRTC signaling efficiently.
Bridging Tools for Seamless Integration
In practice, the two methods complement each other: a WebSocket connection can carry the signaling messages (SDP offers and answers) that set up a WebRTC audio session, giving you WebRTC’s latency with WebSockets’ flexibility.
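One common bridge is to use a WebSocket as the signaling channel for a WebRTC session. As a minimal sketch, the helpers below frame SDP descriptions as JSON messages for that channel; the {kind, sdp} shape is our own convention here, not a defined OpenAI wire format.

```javascript
// Sketch: frame and unframe WebRTC signaling messages for transport over a
// WebSocket. The {kind, sdp} message shape is an assumed convention for this
// example, not an OpenAI protocol.
function frameSignal(kind, description) {
  if (kind !== "offer" && kind !== "answer") {
    throw new Error(`unsupported signal kind: ${kind}`);
  }
  return JSON.stringify({ kind, sdp: description.sdp });
}

function unframeSignal(raw) {
  const msg = JSON.parse(raw);
  // RTCSessionDescription expects a { type, sdp } shape
  return { type: msg.kind, sdp: msg.sdp };
}
```

On the sending side you would call `signalingSocket.send(frameSignal("offer", peerConnection.localDescription))`; the receiver passes the unframed result to setRemoteDescription.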
Building a Real-Time Voice App with OpenAI’s WebRTC API
Creating a real-time voice application using OpenAI’s WebRTC-based Realtime API is an exciting step toward AI-powered, low-latency voice interactions. Here’s a step-by-step guide to get you started.
- Setting Up Your Environment
Before diving in, ensure that your development environment is ready:
- Install Node.js and npm for handling server-side logic.
- Install the required dependencies (express for the server, node-fetch for calling OpenAI’s API):
npm install express node-fetch
- Backend Setup
Since WebRTC requires signaling to establish a connection, we’ll set up a basic Express.js server to handle API requests for initiating WebRTC sessions with OpenAI’s Realtime API.
Create a new file called server.js:
const express = require('express');
const fetch = require('node-fetch');
const app = express();
app.use(express.json());

const OPENAI_API_KEY = "your-api-key-here";

// Endpoint to initiate a WebRTC connection with OpenAI
app.post('/start-webrtc-session', async (req, res) => {
  try {
    const response = await fetch("https://api.openai.com/realtime/webrtc", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${OPENAI_API_KEY}`,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({})
    });
    const data = await response.json();
    res.json(data);
  } catch (error) {
    console.error("Error starting WebRTC session:", error);
    res.status(500).json({ error: "Failed to start session" });
  }
});

app.listen(3000, () => console.log("Server running on port 3000"));
📌 What This Does:
- Exposes an API endpoint (/start-webrtc-session) that calls OpenAI’s WebRTC API.
- Retrieves an SDP (Session Description Protocol) offer from OpenAI, which is required to establish a WebRTC connection.
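Before forwarding OpenAI’s reply to the browser, it is worth checking that the body actually contains an SDP payload, so the client-side handshake does not fail on a malformed response. The { sdp: "..." } shape below mirrors what the client code later destructures, but the real response format should be confirmed against OpenAI’s docs.

```javascript
// Sketch: validate that a session response carries a usable SDP payload.
// The { sdp: "..." } shape is assumed here; confirm the real response
// format against OpenAI's Realtime API documentation.
function extractSdp(body) {
  if (!body || typeof body.sdp !== "string" || body.sdp.length === 0) {
    throw new Error("Response did not include an SDP payload");
  }
  return body.sdp;
}

// Server-side usage before res.json(data):
// const sdp = extractSdp(data); // throws early on a malformed response
```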
- Implementing WebRTC on the Client Side
Now, let’s implement WebRTC in the frontend to exchange audio streams with OpenAI.
1️⃣ Create a Peer Connection
In your JavaScript frontend, establish a WebRTC peer connection with OpenAI.
const peerConnection = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }] // Use a STUN server
});
2️⃣ Capture Audio from User’s Microphone
navigator.mediaDevices.getUserMedia({ audio: true })
  .then(stream => {
    stream.getTracks().forEach(track => peerConnection.addTrack(track, stream));
  })
  .catch(error => console.error("Error accessing microphone:", error));
3️⃣ Connect to OpenAI’s WebRTC API
async function startSession() {
  const response = await fetch('http://localhost:3000/start-webrtc-session', { method: "POST" });
  const { sdp } = await response.json();
  await peerConnection.setRemoteDescription(new RTCSessionDescription(sdp));
  const answer = await peerConnection.createAnswer();
  await peerConnection.setLocalDescription(answer);
  // Note: the answer must also be sent back through your signaling channel
  // so the remote side can complete the handshake.
}
📌 What This Does:
- Calls the backend API to start a session with OpenAI.
- Receives an SDP offer from OpenAI and sets it as the remote description.
- Generates an answer and completes the WebRTC handshake.
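The handshake above walks the connection through RTCPeerConnection’s signaling states for the answerer role: setting the remote offer moves it from "stable" to "have-remote-offer", and setting the local answer returns it to "stable". A pure sketch of those transitions (the step names are our own labels for the two calls) makes the ordering explicit:

```javascript
// Sketch: the signalingState transitions the answerer-side handshake walks
// through. State names mirror RTCPeerConnection.signalingState; the step
// labels are our own shorthand for setRemoteDescription/setLocalDescription.
const transitions = {
  "stable": { setRemoteOffer: "have-remote-offer" },
  "have-remote-offer": { setLocalAnswer: "stable" }
};

function applySignalingStep(state, step) {
  const next = (transitions[state] || {})[step];
  if (!next) {
    throw new Error(`invalid step ${step} in state ${state}`);
  }
  return next;
}
```

Calling the two setDescription methods out of order is a common source of InvalidStateError exceptions, which is exactly what this tiny state machine rejects.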
4️⃣ Handle Incoming AI Audio
peerConnection.ontrack = (event) => {
  document.querySelector('#aiAudio').srcObject = event.streams[0];
};
📌 What This Does:
- Listens for incoming AI-generated speech from OpenAI.
- Plays the AI response directly in the browser using an <audio> element.
- Creating the Frontend UI
Now, let’s build a simple HTML interface for users to talk to the AI.
HTML
<div>
  <h1>Talk to AI</h1>
  <button onclick="startSession()">Start Conversation</button>
  <audio id="aiAudio" autoplay></audio>
</div>
Final Steps: Run and Test
- Start your backend server:
node server.js
- Open the frontend page in a browser and click Start Conversation to test the voice session end to end.
Looking Ahead: The Future of Real-Time AI
By integrating OpenAI’s Realtime API with WebRTC or WebSockets, developers can build:
✅ Voice-driven AI assistants
✅ Real-time interactive chatbots
✅ Live AI-driven language translation apps
✅ Next-gen AI-powered phone systems
🚀 Want to get started? Explore OpenAI’s official documentation for deeper insights.
👨‍💻 Happy coding! 🎙️✨