Realtime Streaming

This guide explains how to implement real-time speech recognition with the Klarisent STT SDK, using its socket-based streaming approach.

Establishing a Connection

Before streaming audio, you need to establish a connection:

import KlarisentSTT from 'klarisent-stt-sdk';

const stt = new KlarisentSTT({
  api_key: 'your-api-key',
  debug: true  // Optional: Enable debugging
});

// Set up event handlers before establishing connection
stt.onTranscription((data) => {
  console.log('Transcription:', data.transcript);
});

stt.onQuestion((data) => {
  console.log('Questions detected:', data.questions);
});

stt.onError((error) => {
  console.error('Error:', error);
});

// Establish WebSocket connection
async function connectToKlarisent() {
  try {
    const response = await stt.establishConnection({ replaceableObjects: [] });
    console.log('Connection established:', response.message);
    // Now ready to stream audio
  } catch (error) {
    console.error('Connection failed:', error);
  }
}

connectToKlarisent();
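
Connections can fail transiently, so it may help to retry before giving up. The following is a minimal sketch, not part of the SDK: connectWithRetry, retries, and baseDelayMs are hypothetical names, and it assumes only that establishConnection returns a promise that rejects on failure, as the try/catch above suggests:

```javascript
// Hypothetical helper (not part of the SDK): retries an async connect
// function with exponential backoff between attempts.
async function connectWithRetry(connect, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await connect();
    } catch (error) {
      lastError = error;
      if (attempt === retries) break;
      // Delay doubles each time: baseDelayMs, 2x, 4x, ...
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  // All attempts failed: surface the most recent error
  throw lastError;
}

// Usage (assumes `stt` was created as shown above):
// const response = await connectWithRetry(
//   () => stt.establishConnection({ replaceableObjects: [] })
// );
```

Passing the connect call as a function keeps the helper independent of the SDK, so the same wrapper can be reused for other asynchronous setup steps.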

Streaming Audio

Once connected, you can stream audio for real-time transcription:

// Example: Streaming from a microphone input
import { Readable } from 'stream';

// This is a simplified example. In a real application, you would
// capture audio from a microphone or other audio source
function getMicrophoneStream() {
  // Create a readable stream from your audio source
  const audioStream = new Readable({
    read() {} // Implementation depends on your audio source
  });
  
  // Return the stream
  return audioStream;
}

// Get audio stream
const micStream = getMicrophoneStream();

// Send the audio stream for transcription
stt.sendAudioStream(micStream);

// Your stream should push audio data in the correct format
// Example:
// micStream.push(audioChunk);
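
If your audio arrives as one large buffer, you may want to push it in small, evenly sized chunks. The sketch below assumes 16 kHz, 16-bit, mono PCM (see Audio Format Requirements) and a Node.js environment where Buffer is available. The 100 ms chunk length and the helper names bytesForDuration and splitPcm are illustrative assumptions, not SDK requirements:

```javascript
// Assumed format: 16 kHz, 16-bit (2 bytes per sample), mono
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2;
const CHANNELS = 1;

// Number of bytes that hold `ms` milliseconds of audio in the assumed format
function bytesForDuration(ms) {
  return Math.floor((SAMPLE_RATE * ms) / 1000) * BYTES_PER_SAMPLE * CHANNELS;
}

// Split a Buffer of PCM audio into chunks of `chunkMs` milliseconds each.
// The last chunk may be shorter than the others.
function splitPcm(pcm, chunkMs = 100) {
  const size = bytesForDuration(chunkMs);
  if (size <= 0) {
    throw new RangeError('chunkMs must cover at least one sample');
  }
  const chunks = [];
  for (let offset = 0; offset < pcm.length; offset += size) {
    chunks.push(pcm.subarray(offset, offset + size));
  }
  return chunks;
}

// Usage (assumes `micStream` from the example above and a Buffer `pcm`):
// for (const chunk of splitPcm(pcm)) micStream.push(chunk);
```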

Audio Format Requirements

For optimal recognition with streaming, your audio should be formatted as follows:

  • Sample Rate: 16000 Hz
  • Channels: 1 (mono)
  • Bit Depth: 16-bit
  • Format: PCM (Linear PCM)
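
Some audio sources, such as the Web Audio API, deliver samples as 32-bit floats in the range -1 to 1 rather than 16-bit integers. The sketch below shows one way to convert such samples to 16-bit PCM. The helper name floatTo16BitPcm is hypothetical, and the little-endian byte order is an assumption; this guide does not state the byte order the service expects:

```javascript
// Hypothetical helper: convert float samples in [-1, 1] to
// 16-bit signed little-endian PCM (byte order is an assumption).
function floatTo16BitPcm(samples) {
  const pcm = Buffer.alloc(samples.length * 2);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to the valid range to avoid integer overflow
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767
    const value = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    pcm.writeInt16LE(Math.round(value), i * 2);
  }
  return pcm;
}
```

Resampling to 16000 Hz and downmixing to mono are separate steps that this sketch does not cover.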

To convert audio to this format using FFmpeg:

ffmpeg -i <your file> \
  -ar 16000 \
  -ac 1 \
  -map 0:a \
  -c:a pcm_s16le \
  <output file name>.wav
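
If your audio is stored as a WAV file, you can sanity-check its header against the requirements above before streaming it. This is a minimal sketch that assumes the canonical WAV layout, where the fmt chunk immediately follows the RIFF header. The helper names readWavFormat and meetsStreamingFormat are hypothetical:

```javascript
// Hypothetical helper: read format fields from the start of a WAV file.
// Assumes the "fmt " chunk is the first chunk after the RIFF header.
function readWavFormat(header) {
  if (
    header.length < 36 ||
    header.toString('ascii', 0, 4) !== 'RIFF' ||
    header.toString('ascii', 8, 12) !== 'WAVE' ||
    header.toString('ascii', 12, 16) !== 'fmt '
  ) {
    throw new Error('Not a WAV file with a leading fmt chunk');
  }
  return {
    audioFormat: header.readUInt16LE(20), // 1 means linear PCM
    channels: header.readUInt16LE(22),
    sampleRate: header.readUInt32LE(24),
    bitsPerSample: header.readUInt16LE(34),
  };
}

// True when the format matches 16 kHz, mono, 16-bit linear PCM
function meetsStreamingFormat(fmt) {
  return (
    fmt.audioFormat === 1 &&
    fmt.channels === 1 &&
    fmt.sampleRate === 16000 &&
    fmt.bitsPerSample === 16
  );
}
```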

Voice Activity Detection

The SDK includes Voice Activity Detection (VAD), which optimizes transcription by processing only the audio segments that contain speech:

// The SDK automatically handles VAD when you use sendAudioStream()
// You can adjust the pause duration during initialization:
const stt = new KlarisentSTT({
  api_key: 'your-api-key',
  pauseDuration: 0.8  // Longer pause threshold (in seconds, minimum 0.5)
});
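
If the pause duration comes from user settings or configuration, you may want to validate it before passing it to the constructor. The sketch below enforces the 0.5 second minimum mentioned above. normalizePauseDuration is a hypothetical helper, and how the SDK itself treats smaller values is not covered in this guide:

```javascript
// Minimum pause duration stated in this guide, in seconds
const MIN_PAUSE_SECONDS = 0.5;

// Hypothetical helper: reject non-numeric input and raise values
// below the minimum up to the minimum.
function normalizePauseDuration(seconds) {
  if (typeof seconds !== 'number' || Number.isNaN(seconds)) {
    throw new TypeError('pauseDuration must be a number of seconds');
  }
  return Math.max(MIN_PAUSE_SECONDS, seconds);
}

// Usage:
// new KlarisentSTT({
//   api_key: 'your-api-key',
//   pauseDuration: normalizePauseDuration(userSetting)
// });
```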

Handling Events

Register a handler for each event so that transcriptions, detected questions, errors, and the end of the stream are all handled:

// Transcription events
stt.onTranscription((data) => {
  console.log('Real-time transcription:', data.transcript);
  // Update UI or process the transcription
});

// Question detection
stt.onQuestion((data) => {
  console.log('Original transcript:', data.transcript);
  console.log('Questions detected:', data.questions);
  // Handle detected questions
});

// Error handling
stt.onError((error) => {
  console.error('Transcription error:', error);
  // Implement error recovery strategy
});

// Stream end notification
stt.onStreamEnd(() => {
  console.log('Audio stream ended');
  // Clean up or start a new stream
});
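
Handlers often need to accumulate results rather than just log them. The following sketch collects transcripts and de-duplicated questions for a session. It assumes that each transcription event carries a new segment in data.transcript and that data.questions is an array of strings. createTranscriptCollector is a hypothetical helper, not an SDK function:

```javascript
// Hypothetical helper: accumulates transcript segments and unique questions.
function createTranscriptCollector() {
  const segments = [];
  const questions = new Set();
  return {
    // Pass to stt.onTranscription
    handleTranscription(data) {
      if (data && typeof data.transcript === 'string' && data.transcript.trim() !== '') {
        segments.push(data.transcript.trim());
      }
    },
    // Pass to stt.onQuestion
    handleQuestion(data) {
      for (const question of (data && data.questions) || []) {
        questions.add(question);
      }
    },
    getTranscript() {
      return segments.join(' ');
    },
    getQuestions() {
      return [...questions];
    },
  };
}

// Usage (assumes `stt` was created as shown earlier):
// const collector = createTranscriptCollector();
// stt.onTranscription(collector.handleTranscription);
// stt.onQuestion(collector.handleQuestion);
```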

Using a Custom Connection ID

To keep a session persistent or to reconnect to an existing session, supply your own connection ID:

import { v4 as uuidv4 } from 'uuid';

// Generate a custom connection ID
const connectionId = uuidv4();

// Initialize with custom connection ID
const stt = new KlarisentSTT({
  api_key: 'your-api-key',
  connectionId: connectionId
});

// Save this connectionId for future reconnections
console.log('Save this connection ID:', connectionId);
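
To reuse the same ID across page loads or process restarts, it has to be stored somewhere. The sketch below reads the ID from a storage object and creates one only if none exists. getOrCreateConnectionId and the storage key are hypothetical; the storage argument is assumed to expose getItem and setItem, as the browser's localStorage does:

```javascript
// Hypothetical helper: return the stored connection ID, or generate,
// store, and return a new one if nothing is stored yet.
function getOrCreateConnectionId(storage, generateId, key = 'klarisent-connection-id') {
  const existing = storage.getItem(key);
  if (existing) return existing;
  const id = generateId();
  storage.setItem(key, id);
  return id;
}

// Usage in a browser (assumes uuidv4 is imported as shown above):
// const connectionId = getOrCreateConnectionId(window.localStorage, uuidv4);
// const stt = new KlarisentSTT({ api_key: 'your-api-key', connectionId });
```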

Enabling Triggers

To enable specific analysis features during transcription:

import { TRIGGER } from 'klarisent-stt-sdk/trigger.enum';

// Once the connection is established, the enabled triggers are applied
// automatically to all transcriptions in the session.
// Note: Triggers must first be enabled in your Klarisent dashboard.

Closing the Connection

When you're done with transcription, close the connection:

// Properly close the connection
stt.stop();
console.log('Connection closed');
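
In practice, several code paths may try to close the connection, for example a stop button, the end of the stream, and an error handler. This guide does not say whether calling stop() more than once is safe, so the sketch below guards against it. createStopOnce is a hypothetical helper:

```javascript
// Hypothetical helper: wraps stt.stop() so it runs at most once.
function createStopOnce(stt) {
  let stopped = false;
  return function stopOnce() {
    if (stopped) return false; // already closed, nothing to do
    stopped = true;
    stt.stop();
    return true;
  };
}

// Usage (assumes `stt` was created as shown earlier):
// const stopOnce = createStopOnce(stt);
// stt.onStreamEnd(stopOnce);
// stopButton.addEventListener('click', stopOnce);
```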

Complete Example

Here's a complete example demonstrating real-time transcription:

import KlarisentSTT from 'klarisent-stt-sdk';
import { TRIGGER } from 'klarisent-stt-sdk/trigger.enum';
import { createMicrophoneStream } from './your-audio-utility';

async function startTranscription() {
  // Initialize SDK
  const stt = new KlarisentSTT({
    api_key: 'your-api-key',
    debug: true
  });
  
  // Set up event handlers
  stt.onTranscription((data) => {
    console.log('Transcription:', data.transcript);
    document.getElementById('transcript').textContent = data.transcript;
  });
  
  stt.onQuestion((data) => {
    console.log('Questions:', data.questions);
    const questionsList = document.getElementById('questions');
    questionsList.innerHTML = '';
    data.questions.forEach(q => {
      const li = document.createElement('li');
      li.textContent = q;
      questionsList.appendChild(li);
    });
  });
  
  stt.onError((error) => {
    console.error('Error:', error);
    document.getElementById('status').textContent = 'Error: ' + error.message;
  });
  
  // Establish connection
  try {
    await stt.establishConnection({ replaceableObjects: [] });
    document.getElementById('status').textContent = 'Connected';
    
    // Get microphone stream
    const micStream = createMicrophoneStream();
    
    // Start streaming
    stt.sendAudioStream(micStream);
    document.getElementById('status').textContent = 'Streaming...';
    
    // Add stop button functionality
    document.getElementById('stop-button').addEventListener('click', () => {
      stt.stop();
      document.getElementById('status').textContent = 'Disconnected';
    });
  } catch (error) {
    console.error('Connection failed:', error);
    document.getElementById('status').textContent = 'Connection failed';
  }
}

// Start when page loads
window.addEventListener('load', startTranscription);
