web-vad
TypeScript icon, indicating that this package has built-in type declarations

0.0.6 • Public • Published

Web Voice Activity Detection (VAD)

Adaption of @ricky0123's vad library that slightly shifts the API to only support passing a media stream, addresses some Typescript issues and reduces the codebase where possible. The primary purpose of this adaption is to support realtime voice agents, such as those provided by Pipecat.

Getting started

npm install onnxruntime-web web-vad

Copy Silero model somewhere accessible

Ensure silero_vad.onnx (included in this repo here) is hosted somewhere accessible (e.g. a public / static path.)

Ensure audio worker is available globally

Browsers ensure worklets cannot be imported as modules for safety reasons. Either import it with your framework specific syntax (e.g. import AudioWorkletURL from "web-vad/dist/worklet.js?worker&url";) or include it manually in a <script> declaration (at a higher order.)

Example project

An barebones example is included in this repo:

cd test-site
yarn
yarn run build # Copies onnx wasm to dist directory
yarn run dev

Navigate to the URL shown in your terminal

Usage

import { VAD } from "web-vad";
import AudioWorkletURL from "web-vad/dist/worklet.js?worker&url";


const localAudioTrack = ... // Get mic or other audio track
const stream = new MediaStream([localAudioTrack!]);

const vad = new VAD({
    workletURL: AudioWorkletURL,
    modelUrl: "path-to-silero.onnx",
    stream,
    onSpeechStart: () => {
        console.log("speaking start");
    },
    onVADMisfire: () => {
        console.log("misfire");
    },
    onSpeechEnd: () => {
        console.log("speaking end");
    },
});

// Initalize and load models
await vad.init();

// Start when ready
vad.start();

console.log(vad.state); 
// > VADState.listening

Next / Vite support

Web VAD uses WASM files provided by ONNX. Whilst these can be loaded at runtime, it is recommended to copy these files to your build / deployment. Here is an example vite.config.js that copies these files across at build time:

// vite.config.js

export default defineConfig({
  assetsInclude: ["**/*.onnx"],
  server: {
    headers: {
      "Cross-Origin-Embedder-Policy": "require-corp",
      "Cross-Origin-Opener-Policy": "same-origin",
    },
  },
  resolve: {
    alias: {
      "@": path.resolve(__dirname, "./src"),
    },
  },
  plugins: [
    viteStaticCopy({
      targets: [
        {
          src: "node_modules/onnxruntime-web/dist/*.wasm",
          dest: "./",
        },
      ],
    }),
  ],
});

Precaching models

Both the Silero.onnx and ONNX runtime wasms are quite large in size (~10mb). The VAD class exposes a static method for precaching these:

import {VAD} from "web-vad";

async function run() {
  console.log("Precaching models");
  await VAD.precacheModels("/silero-vad.onnx");
  console.log("Download complete!");
  
  //...start()
}

References

[1] Silero Team. (2021). Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier. GitHub, GitHub repository, https://github.com/snakers4/silero-vad, hello@silero.ai.

[2] Ricky Samore. Original code, https://github.com/ricky0123/vad, rickycontact9@gmail.com

/web-vad/

    Package Sidebar

    Install

    npm i web-vad

    Weekly Downloads

    46

    Version

    0.0.6

    License

    ISC

    Unpacked Size

    64 kB

    Total Files

    8

    Last publish

    Collaborators

    • makeitmore