Realtime Streaming
This guide explains how to implement real-time speech recognition using the WebSocket-based streaming interface of the Klarisent STT SDK.
Establishing a Connection
Before streaming audio, you need to establish a connection:
import KlarisentSTT from 'klarisent-stt-sdk';
const stt = new KlarisentSTT({
api_key: 'your-api-key',
debug: true // Optional: Enable debugging
});
// Set up event handlers before establishing connection
stt.onTranscription((data) => {
console.log('Transcription:', data.transcript);
});
stt.onQuestion((data) => {
console.log('Questions detected:', data.questions);
});
stt.onError((error) => {
console.error('Error:', error);
});
// Establish WebSocket connection
async function connectToKlarisent() {
try {
const response = await stt.establishConnection({ replaceableObjects: [] });
console.log('Connection established:', response.message);
// Now ready to stream audio
} catch (error) {
console.error('Connection failed:', error);
}
}
connectToKlarisent();
Streaming Audio
Once connected, you can stream audio for real-time transcription:
// Example: Streaming from a microphone input
import { Readable } from 'stream';
// This is a simplified example. In a real application, you would
// capture audio from a microphone or other audio source
function getMicrophoneStream() {
// Create a readable stream from your audio source
const audioStream = new Readable({
read() {} // Implementation depends on your audio source
});
// Return the stream
return audioStream;
}
// Get audio stream
const micStream = getMicrophoneStream();
// Send the audio stream for transcription
stt.sendAudioStream(micStream);
// Your stream should push raw audio data in the format described
// under Audio Format Requirements below, for example:
// micStream.push(audioChunk);
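For a concrete starting point, here is a minimal sketch that streams a prerecorded raw PCM file instead of a live microphone. The file path is hypothetical, and the sketch assumes sendAudioStream() accepts any Node.js Readable that emits raw PCM chunks:
import fs from 'fs';

// Hypothetical file path: any source of 16 kHz, mono, 16-bit PCM audio works
const pcmFile = fs.createReadStream('./sample-16khz-mono.pcm', {
  highWaterMark: 3200 // ~100 ms of audio (16000 Hz * 2 bytes * 1 channel / 10)
});

// Assumption: sendAudioStream() accepts any Readable of raw PCM chunks
stt.sendAudioStream(pcmFile);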
Audio Format Requirements
For optimal recognition with streaming, your audio should be formatted as follows:
Sample Rate: 16000 Hz
Channels: 1 (mono)
Bit Depth: 16-bit
Format: PCM (Linear PCM)
To convert audio to this format using FFmpeg:
ffmpeg -i <your file> \
-ar 16000 \
-ac 1 \
-map 0:a \
-c:a pcm_s16le \
<output file name>.wav
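At this format the stream carries 16,000 samples x 2 bytes x 1 channel = 32,000 bytes per second, which is useful when sizing chunks. A minimal chunking sketch (the helper name and chunk duration are illustrative, not part of the SDK):
// Byte rate for the required format: 16 kHz * 16-bit (2 bytes) * mono
const BYTES_PER_SECOND = 16000 * 2 * 1; // 32,000 bytes/s

// Split a raw PCM buffer into fixed-duration chunks (default 100 ms)
function chunkPcm(buffer, chunkMs = 100) {
  const chunkSize = Math.floor(BYTES_PER_SECOND * (chunkMs / 1000));
  const chunks = [];
  for (let offset = 0; offset < buffer.length; offset += chunkSize) {
    chunks.push(buffer.subarray(offset, offset + chunkSize));
  }
  return chunks;
}
Each chunk can then be pushed to the stream passed to sendAudioStream().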
Voice Activity Detection
The SDK includes Voice Activity Detection (VAD) to optimize transcription by processing only the audio segments that contain speech:
// The SDK automatically handles VAD when you use sendAudioStream()
// You can adjust the pause duration during initialization:
const stt = new KlarisentSTT({
api_key: 'your-api-key',
pauseDuration: 0.8 // Longer pause threshold (in seconds, minimum 0.5)
});
Handling Events
Set up comprehensive event handlers to process all possible outcomes:
// Transcription events
stt.onTranscription((data) => {
console.log('Real-time transcription:', data.transcript);
// Update UI or process the transcription
});
// Question detection
stt.onQuestion((data) => {
console.log('Original transcript:', data.transcript);
console.log('Questions detected:', data.questions);
// Handle detected questions
});
// Error handling
stt.onError((error) => {
console.error('Transcription error:', error);
// Implement error recovery strategy
});
// Stream end notification
stt.onStreamEnd(() => {
console.log('Audio stream ended');
// Clean up or start a new stream
});
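As an example of an error recovery strategy, here is a minimal reconnect sketch with exponential backoff. It assumes establishConnection() can simply be called again on the same instance after a failure; the function name and retry limits are illustrative:
// Retry the connection with exponentially increasing delays
async function reconnectWithBackoff(stt, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await stt.establishConnection({ replaceableObjects: [] });
      console.log(`Reconnected on attempt ${attempt}`);
      return true;
    } catch (err) {
      const delayMs = 1000 * 2 ** (attempt - 1); // 1s, 2s, 4s, ...
      console.warn(`Attempt ${attempt} failed, retrying in ${delayMs} ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}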
Using a Custom Connection ID
To persist a session across runs, or to reconnect to an existing one, supply your own connection ID:
import { v4 as uuidv4 } from 'uuid';
// Generate a custom connection ID
const connectionId = uuidv4();
// Initialize with custom connection ID
const stt = new KlarisentSTT({
api_key: 'your-api-key',
connectionId: connectionId
});
// Save this connectionId for future reconnections
console.log('Save this connection ID:', connectionId);
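One way to persist the ID between runs in a Node.js process is to write it to disk. A minimal sketch (the file path and helper name are illustrative):
import fs from 'fs';
import { v4 as uuidv4 } from 'uuid';

const ID_FILE = './klarisent-connection-id'; // hypothetical storage location

// Reuse a previously saved connection ID, or create and persist a new one
function loadOrCreateConnectionId() {
  if (fs.existsSync(ID_FILE)) {
    return fs.readFileSync(ID_FILE, 'utf8').trim();
  }
  const id = uuidv4();
  fs.writeFileSync(ID_FILE, id);
  return id;
}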
Enabling Triggers
To enable specific analysis features during transcription:
import { TRIGGER } from 'klarisent-stt-sdk/trigger.enum';
// Once the connection is established, any triggers enabled for your
// account are applied automatically to every transcription in the session.
// Note: Triggers must be enabled in your Klarisent dashboard first.
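To see which trigger identifiers your SDK version exposes, you can inspect the enum at runtime. This minimal sketch assumes TRIGGER is exported as a plain enum-like object:
// List the available trigger identifiers exposed by the SDK
console.log('Available triggers:', Object.values(TRIGGER));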
Closing the Connection
When you're done with transcription, close the connection:
// Properly close the connection
stt.stop();
console.log('Connection closed');
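In a long-running Node.js process, it is also worth closing the connection on shutdown signals. A minimal sketch:
// Close the connection before the process exits so the server
// can end the session cleanly
process.on('SIGINT', () => {
  stt.stop();
  console.log('Connection closed, exiting');
  process.exit(0);
});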
Complete Example
Here's a complete browser-based example demonstrating real-time transcription:
import KlarisentSTT from 'klarisent-stt-sdk';
import { TRIGGER } from 'klarisent-stt-sdk/trigger.enum';
import { createMicrophoneStream } from './your-audio-utility';
async function startTranscription() {
// Initialize SDK
const stt = new KlarisentSTT({
api_key: 'your-api-key',
debug: true
});
// Set up event handlers
stt.onTranscription((data) => {
console.log('Transcription:', data.transcript);
document.getElementById('transcript').textContent = data.transcript;
});
stt.onQuestion((data) => {
console.log('Questions:', data.questions);
const questionsList = document.getElementById('questions');
questionsList.innerHTML = '';
data.questions.forEach(q => {
const li = document.createElement('li');
li.textContent = q;
questionsList.appendChild(li);
});
});
stt.onError((error) => {
console.error('Error:', error);
document.getElementById('status').textContent = 'Error: ' + error.message;
});
// Establish connection
try {
await stt.establishConnection({ replaceableObjects: [] });
document.getElementById('status').textContent = 'Connected';
// Get microphone stream
const micStream = createMicrophoneStream();
// Start streaming
stt.sendAudioStream(micStream);
document.getElementById('status').textContent = 'Streaming...';
// Add stop button functionality
document.getElementById('stop-button').addEventListener('click', () => {
stt.stop();
document.getElementById('status').textContent = 'Disconnected';
});
} catch (error) {
console.error('Connection failed:', error);
document.getElementById('status').textContent = 'Connection failed';
}
}
// Start when page loads
window.addEventListener('load', startTranscription);