audio-sentence-detector
1.0.5 • Public • Published

Audio Sentence Detector

A library that segments audio into sentences using voice activity detection, silence analysis, and acoustic features.

Installation

npm install audio-sentence-detector

Usage

const AudioSentenceDetector = require('audio-sentence-detector');

// Create detector with custom options
const detector = new AudioSentenceDetector({
    minSilenceDuration: 0.5,
    silenceThreshold: 0.01
});

// Process audio buffer (call this from inside an async function;
// CommonJS modules do not support top-level await)
const sentences = await detector.detect(audioBuffer);

Configuration Options

The AudioSentenceDetector constructor accepts an options object with the following parameters:

Basic Sentence Detection Options

| Option | Default | Description |
| --- | --- | --- |
| minSilenceDuration | 0.5 | Minimum duration of silence (in seconds) to be considered a sentence boundary |
| silenceThreshold | 0.01 | RMS threshold below which audio is considered silence |
| minSentenceLength | 1 | Minimum length of a sentence in seconds |
| maxSentenceLength | 15 | Maximum length of a sentence in seconds |
| windowSize | 2048 | Size of the analysis window in samples |
| idealSentenceLength | 5 | Ideal length of a sentence in seconds (used for probability calculations) |
| idealSilenceDuration | 0.8 | Ideal duration of silence between sentences |
| allowGaps | true | Whether to allow gaps between sentences |
| minSegmentLength | 0 | Minimum length for merged segments |
| alignToAudioBoundaries | false | Whether to align sentences with audio file boundaries |
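
To make the interaction between windowSize, silenceThreshold, and minSilenceDuration more concrete, here is a minimal sketch of RMS-based silence detection. It is not the library's implementation: rmsPerWindow and findSilences are hypothetical helpers, the samples are assumed to be mono floats in the range -1 to 1, and the windows are assumed not to overlap.

```javascript
// Hypothetical helpers for illustration only; not part of audio-sentence-detector.

// Compute the RMS energy of each non-overlapping window of `windowSize` samples.
function rmsPerWindow(samples, windowSize = 2048) {
    const values = [];
    for (let start = 0; start < samples.length; start += windowSize) {
        const end = Math.min(start + windowSize, samples.length);
        let sumSquares = 0;
        for (let i = start; i < end; i++) {
            sumSquares += samples[i] * samples[i];
        }
        values.push(Math.sqrt(sumSquares / (end - start)));
    }
    return values;
}

// Return runs of consecutive quiet windows that last at least `minSilenceDuration` seconds.
// The defaults mirror the documented option defaults.
function findSilences(
    samples,
    sampleRate,
    { windowSize = 2048, silenceThreshold = 0.01, minSilenceDuration = 0.5 } = {}
) {
    const rms = rmsPerWindow(samples, windowSize);
    const windowSeconds = windowSize / sampleRate;
    const totalSeconds = samples.length / sampleRate;
    const silences = [];
    let runStart = -1;
    // Iterate one step past the end so a trailing silent run is closed.
    for (let w = 0; w <= rms.length; w++) {
        const silent = w < rms.length && rms[w] < silenceThreshold;
        if (silent && runStart < 0) {
            runStart = w;
        } else if (!silent && runStart >= 0) {
            const start = runStart * windowSeconds;
            const end = Math.min(w * windowSeconds, totalSeconds);
            if (end - start >= minSilenceDuration) {
                silences.push({ start, end });
            }
            runStart = -1;
        }
    }
    return silences;
}
```

In this sketch the boundaries snap to window edges, so a smaller windowSize gives finer time resolution at the cost of noisier RMS values.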

Voice Detection Options

| Option | Default | Description |
| --- | --- | --- |
| fundamentalFreqMin | 85 | Minimum fundamental frequency for voice detection (Hz) |
| fundamentalFreqMax | 255 | Maximum fundamental frequency for voice detection (Hz) |
| voiceActivityThreshold | 0.4 | Threshold for voice activity detection |
| minVoiceActivityDuration | 0.1 | Minimum duration of voice activity (seconds) |
| energySmoothing | 0.95 | Smoothing factor for energy calculations |
| formantEmphasis | 0.7 | Emphasis factor for formant detection |
| zeroCrossingRateThreshold | 0.3 | Threshold for zero-crossing rate in voice detection |
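
For readers unfamiliar with two of the features named above, the following sketch shows one common way to compute a zero-crossing rate and to smooth an energy series exponentially. Both functions are hypothetical illustrations. How the library actually computes these values, and in which direction it compares them against zeroCrossingRateThreshold, is not documented here and should be treated as an assumption.

```javascript
// Hypothetical helpers for illustration only; not part of audio-sentence-detector.

// Fraction of adjacent sample pairs whose signs differ (0 = no crossings, 1 = every pair).
function zeroCrossingRate(frame) {
    if (frame.length < 2) return 0;
    let crossings = 0;
    for (let i = 1; i < frame.length; i++) {
        if ((frame[i - 1] >= 0) !== (frame[i] >= 0)) crossings++;
    }
    return crossings / (frame.length - 1);
}

// Exponential smoothing: each output keeps `factor` of the previous output
// and takes (1 - factor) of the new value. A factor near 1 reacts slowly.
function smoothEnergy(energies, factor = 0.95) {
    const smoothed = [];
    let previous = energies.length > 0 ? energies[0] : 0;
    for (const energy of energies) {
        previous = factor * previous + (1 - factor) * energy;
        smoothed.push(previous);
    }
    return smoothed;
}
```

Voiced speech tends to have a lower zero-crossing rate than noise or unvoiced consonants, which is why this feature is often combined with energy when deciding whether a frame contains voice.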

Debug Option

| Option | Default | Description |
| --- | --- | --- |
| debug | false | Enable debug logging |

Return Value

The detect() method returns an array of sentence objects, each containing:

{
    index: number,          // Index of the sentence
    start: number,          // Start time in seconds
    end: number,           // End time in seconds
    duration: number,      // Duration in seconds
    probability: number    // Confidence score (0-1)
}
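
As a small illustration of working with this shape, the sketch below filters sentence objects by their probability and formats them as timestamped lines. formatTime and describeSentences are hypothetical helpers, and the 0.5 cutoff is an arbitrary choice for the example. Only the fields documented above are used.

```javascript
// Hypothetical helpers for illustration only; not part of audio-sentence-detector.

// Format a time in seconds as m:ss.cc, rounding to centiseconds first
// so that values such as 59.999 roll over to 1:00.00.
function formatTime(seconds) {
    const totalCentis = Math.round(seconds * 100);
    const minutes = Math.floor(totalCentis / 6000);
    const rest = ((totalCentis % 6000) / 100).toFixed(2).padStart(5, '0');
    return `${minutes}:${rest}`;
}

// Keep sentences at or above `minProbability` and describe each on one line.
function describeSentences(sentences, minProbability = 0.5) {
    return sentences
        .filter((s) => s.probability >= minProbability)
        .map((s) => `#${s.index} ${formatTime(s.start)}-${formatTime(s.end)} (${s.duration.toFixed(2)}s)`);
}
```

The result of detector.detect() can be passed directly to describeSentences, for example to print a quick overview or to drop low-confidence segments before further processing.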

Example

const AudioSentenceDetector = require('audio-sentence-detector');

// Create detector with custom settings
const detector = new AudioSentenceDetector({
    minSilenceDuration: 0.3,
    silenceThreshold: 0.02,
    minSentenceLength: 1.5,
    maxSentenceLength: 10,
    debug: true
});

// Process audio file
const fs = require('fs');
const audioBuffer = fs.readFileSync('speech.wav');

// CommonJS modules do not support top-level await, so run inside an async function
async function main() {
    try {
        const sentences = await detector.detect(audioBuffer);
        console.log('Detected sentences:', sentences);
    } catch (error) {
        console.error('Error processing audio:', error);
    }
}

main();

License

MIT

Collaborators

  • im_justmatthew