@sourcesync-sdk/whisper-web
0.1.2 • Public • Published

Overview

The @sourcesync-sdk/whisper-web package adds audio fingerprinting and matching to web applications. With it, developers can capture audio in real time, generate audio fingerprints, and match them against a database.

Installation

To install the package, run the following command in your project directory:

npm install sourcesync-sdk

Or install individual submodules as separate dependencies:

npm install @sourcesync-sdk/whisper-web @sourcesync-sdk/app

Basic Usage

Here's a quick guide to get started with the Whisper Web SDK:

  1. Import and initialize the SDK:
import { createWhisperFactory } from 'sourcesync-sdk/whisper-web';
import { initializeApp } from 'sourcesync-sdk/app';

const app = await initializeApp({
  appKey: 'your-app-key',
  // app config
});

const whisperFactory = await createWhisperFactory(app, {
  // set default options for Whisper instances
  defaultOpts: {
    apiUrl: 'YOUR_API_URL',
    apiToken: 'YOUR_API_TOKEN',
    apiKey: 'YOUR_API_KEY' // optional
  }
});
  2. Create a Whisper instance:
const whisper = whisperFactory.create();
await whisper.init();
  3. Set up fingerprinting:
await whisper.setCallback((fingerprint) => {
  // Handle the generated fingerprint
  console.log('Fingerprint generated:', fingerprint);
  // Request a match
  whisper.requestMatch(fingerprint)
    .then((matchResult) => {
      console.log('Match result:', matchResult);
    })
    .catch((err) => {
      // Handle errors (for example, network or authentication failures)
      console.error('Match request failed:', err);
    });
});

MatchResponse Object

The MatchResponse object contains the following properties:

 
interface MatchResponse {
  // device id from whisper platform if registered
  wdeviceid: number

  // session id from whisper platform if registered
  wsessionid: number

  // pass-through JSON string sent with the match request and returned unchanged
  // in the match response; use it for timestamps, custom device IDs, or anything else you need back
  requestJson: string

  // the matches, either grouped by type or as a flat array
  matches: MatchGroups | MatchItem[]
}
 
type MatchGroups = Record<string, MatchItem[]>

interface MatchItem {
  // reference content id as registered on the whisper platform
  wrefid: number

  // confidence score, 0 - 100, higher is better
  confidence: number

  // time of the match within the unknown (incoming) audio, based on the incoming fingerprint timestamp
  unknowntime: string

  // the time within the reference content that matched
  referencetime: string

  // JSON data that was registered with the content on the whisper platform
  // this could be channel id's, custom content id's, title, description, etc.
  refJson: string

  title?: string
}
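
Because matches can arrive either grouped by type or as a flat array, calling code usually has to normalize it before use. The following is a minimal sketch of one way to do that. The helper names (flattenMatches, bestMatch, parseJsonField) are hypothetical and not part of the SDK, and the types are copied from the definitions above.

```typescript
// Types copied from the MatchResponse definition above.
interface MatchItem {
  wrefid: number;
  confidence: number;
  unknowntime: string;
  referencetime: string;
  refJson: string;
  title?: string;
}
type MatchGroups = Record<string, MatchItem[]>;

// Hypothetical helper: flatten either shape of `matches` into one array.
function flattenMatches(matches: MatchGroups | MatchItem[]): MatchItem[] {
  if (Array.isArray(matches)) {
    return matches;
  }
  const all: MatchItem[] = [];
  for (const group of Object.keys(matches)) {
    all.push(...matches[group]);
  }
  return all;
}

// Hypothetical helper: the highest-confidence match, or undefined if there are none.
function bestMatch(matches: MatchGroups | MatchItem[]): MatchItem | undefined {
  let best: MatchItem | undefined;
  for (const item of flattenMatches(matches)) {
    if (best === undefined || item.confidence > best.confidence) {
      best = item;
    }
  }
  return best;
}

// Hypothetical helper: parse refJson or requestJson without throwing on malformed input.
function parseJsonField(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    return undefined;
  }
}
```

Inside the fingerprint callback, the result of requestMatch could then be reduced to a single item with bestMatch(matchResult.matches), assuming the result has the MatchResponse shape shown above.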
  4. Set up audio capture:
// attach the microphone
await whisper.attachMicrophone();

// or attach the video element
await whisper.attachVideoElement(videoElement);

// or attach the audio element
await whisper.attachAudioElement(audioElement);
  5. Start and stop audio capture:
// Start capturing
await whisper.start();

// Stop capturing
await whisper.stop();

Limitations and Requirements

Browser Audio/Video Access

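For the browser to expose the audio of media resources loaded from another origin (for example, the source of an attached video or audio element), the server hosting the media must include the following header in its response: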

Access-Control-Allow-Origin: *

Note: Using a wildcard (*) allows access from all origins. For production environments, consider specifying allowed origins explicitly for enhanced security.
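
The response header is only half of the picture: browsers request media in CORS mode only when the element has its crossOrigin property (the crossorigin attribute) set. The sketch below shows one way to set it before attaching the element. The helper loadWithCors and the example URL are hypothetical; whether the SDK sets this property itself is not documented here, so treat this as an assumption to verify.

```typescript
// Structural subset of HTMLMediaElement, so the helper type-checks without DOM typings.
interface CorsCapableMedia {
  crossOrigin: string | null;
  src: string;
}

// Hypothetical helper (not part of the SDK): load media in CORS mode.
// crossOrigin is assigned before src so that the fetch is made with CORS.
function loadWithCors<T extends CorsCapableMedia>(element: T, url: string): T {
  element.crossOrigin = 'anonymous';
  element.src = url;
  return element;
}

// Usage in a browser (URL is a placeholder):
//   loadWithCors(videoElement, 'https://media.example.com/clip.mp4');
//   await whisper.attachVideoElement(videoElement);
```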

WebAssembly (WASM) Requirements

When using the WASM build, specific headers are required to enable SharedArrayBuffer, which the WASM runtime needs in order to support pthreads via Web Workers.

Client Site (page embedding the WASM code)

The service must return these headers in the page response:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

WASM Host (e.g., CDN)

The service hosting the WASM code must return these headers in its responses, including responses to OPTIONS requests:

Access-Control-Allow-Origin: *
Cross-Origin-Resource-Policy: cross-origin
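
As an illustration, the two header sets above could be applied in a Node-based server roughly as follows. This is a sketch, not a required setup: the constant and function names are hypothetical, and HeaderSink is a minimal interface chosen so the helper works with any response object that has a setHeader method, such as Node's http.ServerResponse.

```typescript
// Minimal shape of a response object that can receive headers.
interface HeaderSink {
  setHeader(name: string, value: string): unknown;
}

// Headers for the page that embeds the WASM code (values from the section above).
const PAGE_ISOLATION_HEADERS: Record<string, string> = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'require-corp',
};

// Headers for the host serving the WASM files (values from the section above).
const WASM_HOST_HEADERS: Record<string, string> = {
  'Access-Control-Allow-Origin': '*',
  'Cross-Origin-Resource-Policy': 'cross-origin',
};

// Hypothetical helper: copy a header set onto a response.
function applyHeaders(res: HeaderSink, headers: Record<string, string>): void {
  for (const name of Object.keys(headers)) {
    res.setHeader(name, headers[name]);
  }
}

// Usage with Node's built-in http module:
//   http.createServer((req, res) => {
//     applyHeaders(res, PAGE_ISOLATION_HEADERS);
//     res.end(pageHtml);
//   });
```

The same values can be set in a CDN or reverse-proxy configuration instead; the code form is only one option.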

Important Notes

  1. These requirements may not cover all scenarios, especially when hosting components on different domains.
  2. Enabling these headers may potentially break existing website functionality.
  3. Without loading the WASM module, fingerprint generation in the browser is not possible.

Explanation of Headers

  1. Cross-Origin-Opener-Policy: same-origin

    • Isolates the browsing context to same-origin documents
    • Enhances security by preventing cross-origin popup manipulation
    • Reduces the risk of side-channel attacks
  2. Cross-Origin-Embedder-Policy: require-corp

    • Ensures cross-origin resources must explicitly grant permission to be loaded
    • Together with Cross-Origin-Opener-Policy, enables the use of SharedArrayBuffer, crucial for multi-threaded WebAssembly
    • Prevents unauthorized data access from other origins

These headers enable "cross-origin isolation", which is important for WebAssembly because it:

  • Allows use of high-precision timers and SharedArrayBuffer, improving performance
  • Creates a more isolated environment for WebAssembly code execution
  • Enables advanced WebAssembly features like threads in most browsers

Trade-offs

  • May break integrations with some third-party services relying on cross-origin access
  • Can complicate embedding your content in other sites or vice versa

For full WebAssembly functionality, ensure all resources (including .wasm files) are served from the same origin or have appropriate CORS headers.
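
Before initializing the SDK, it can help to check at runtime whether the page actually ended up cross-origin isolated. The sketch below relies on the standard crossOriginIsolated global and on the availability of SharedArrayBuffer. The function name canUseThreadedWasm is hypothetical, and whether the SDK performs a similar check internally is not stated here.

```typescript
// The two globals relevant to cross-origin isolation; both optional so that
// any global scope (window, worker, or a test double) can be passed in.
interface IsolationScope {
  crossOriginIsolated?: boolean;
  SharedArrayBuffer?: unknown;
}

// Hypothetical helper: true only when the scope reports cross-origin isolation
// and exposes the SharedArrayBuffer constructor.
function canUseThreadedWasm(scope: IsolationScope): boolean {
  return scope.crossOriginIsolated === true
    && typeof scope.SharedArrayBuffer === 'function';
}

// Usage in a browser:
//   if (!canUseThreadedWasm(window)) {
//     console.warn('Page is not cross-origin isolated; check the COOP/COEP headers.');
//   }
```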

API Reference

WhisperFactory

  • create(): Creates a new Whisper instance

WhisperWeb Instance

  • init(): Initializes the Whisper instance
  • attachMicrophone(): Attaches the microphone for audio capture
  • attachVideoElement(videoElement): Attaches a video element for audio capture
  • attachAudioElement(audioElement): Attaches an audio element for audio capture
  • setCallback(callback): Sets the callback for fingerprint generation
  • start(): Starts audio capture and fingerprinting
  • stop(): Stops audio capture
  • requestMatch(fingerprint): Requests a match for the given fingerprint

Best Practices

  • Ensure proper cleanup by calling stop() when audio capture is no longer needed
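
One way to make the cleanup above hard to forget is to wrap capture in a try/finally block. The following sketch assumes only the start() and stop() methods listed in the API reference. The helper withCapture and the Capturable interface are hypothetical and not part of the SDK.

```typescript
// Structural subset of a Whisper instance: only start() and stop().
// Return types are left as unknown because the SDK's exact signatures are not shown here.
interface Capturable {
  start(): unknown;
  stop(): unknown;
}

// Hypothetical helper: run `task` while capture is active and always call stop()
// afterwards, even if `task` throws. If start() itself fails, stop() is not called.
async function withCapture<T>(
  whisper: Capturable,
  task: () => Promise<T>,
): Promise<T> {
  await whisper.start();
  try {
    return await task();
  } finally {
    await whisper.stop();
  }
}

// Usage: capture for ten seconds, then stop.
//   await withCapture(whisper, () => new Promise<void>((resolve) => setTimeout(resolve, 10000)));
```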

License

This SDK is distributed under the Apache License, Version 2.0. The Apache 2.0 License applies to the SDK only and not to any other component of the SourceSync Platform.
