audio-transcripter

2.0.1 • Public • Published

🎙️ audio-transcripter


A lightweight TypeScript library for transcribing audio files using Google Gemini 2.0 models.

Supports local files, remote URLs, and in-memory buffers/blobs.

Ideal for meetings, interviews, podcasts, technical content, and more.


🚀 Installation

npm install audio-transcripter

🌟 Features

  • 🎧 Supports local files (.wav, .mp3, .aac, .flac, .ogg, .webm, etc.)

  • 🌐 Supports remote URLs (HTTP/HTTPS)

  • 📦 Supports Blobs / Buffers

  • ✨ Multiple transcription styles:

    • accurate
    • clean
    • structured
    • technical
    • conversational
  • 🔍 Verbose logging (optional)

  • ⚙️ Written in TypeScript with full type safety


🧑‍💻 Usage

1️⃣ Transcribe Local File

import { runTranscription } from "audio-transcripter";

const result = await runTranscription({
	audioFile: "./assets/audio.webm",
	style: "structured", // optional, default: 'conversational'
	language: "english", // optional
});

if (result.success) {
	console.log("Transcription:", result.transcription);
} else {
	console.error("Error:", result.error);
}

2️⃣ Transcribe Remote URL

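import { runTranscription } from "audio-transcripter";

// Same call as for a local file; audioFile is an HTTP/HTTPS URL instead of a path
const result = await runTranscription({
	audioFile: "https://example.com/audio.mp3",
	style: "clean",
	language: "english",
});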
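
Because audioFile accepts both a local path and a remote URL, it can help to know which kind of value you are passing before you call the library. The sketch below is one way to tell them apart. `isRemoteUrl` is a hypothetical helper written for this example, not part of audio-transcripter, and the package's own detection logic may differ.

```typescript
// Hypothetical helper, not part of audio-transcripter: reports whether an
// audioFile value looks like an HTTP/HTTPS URL rather than a local path.
function isRemoteUrl(audioFile: string): boolean {
	try {
		const { protocol } = new URL(audioFile);
		// Windows paths such as "C:\\audio.wav" parse as URLs with protocol "c:",
		// so only http: and https: count as remote.
		return protocol === "http:" || protocol === "https:";
	} catch {
		// Relative and POSIX paths are not valid absolute URLs and throw here.
		return false;
	}
}
```

A caller could use this to apply a longer timeout only to remote inputs, for example.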

3️⃣ Transcribe Blob / Buffer (for browser or Node.js)

import { runTranscriptionWithBlob } from "audio-transcripter";

// Example with a Node.js Buffer
const fs = await import("fs/promises");
const audioBuffer = await fs.readFile("./assets/audio.wav");

const result = await runTranscriptionWithBlob(audioBuffer, {
	style: "technical",
	language: "english",
});

if (result.success) {
	console.log("Transcription:", result.transcription);
} else {
	console.error("Error:", result.error);
}

📥 Configuration Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| audioFile | string | required | Local file path or remote URL |
| style | string | 'conversational' | Transcription style (see below) |
| language | string | 'english' | Language of the audio |
| verbose | boolean | true | Enable verbose console logs |
| timeout | number | 5000 (ms) | Timeout for remote URL HEAD check (if applicable) |
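
The timeout option applies to a HEAD check on remote URLs. The sketch below shows how such a check can be written with an AbortController. It is an illustration under assumptions, not the package's actual implementation: `isUrlReachable` and the `HeadFetcher` type are hypothetical names, and the fetch function is passed in so the logic can be exercised without a network.

```typescript
// Minimal shape of the fetch-like function this sketch needs (an assumption;
// the global fetch in Node 18+ and browsers is intended to fit it).
type HeadFetcher = (
	url: string,
	init: { method: string; signal: AbortSignal },
) => Promise<{ ok: boolean }>;

// Hypothetical helper: resolves true if the URL answers a HEAD request with a
// 2xx status before the timeout; false on timeout, network error, or non-2xx.
async function isUrlReachable(
	url: string,
	fetcher: HeadFetcher,
	timeoutMs = 5000,
): Promise<boolean> {
	const controller = new AbortController();
	// Abort the request once the timeout elapses.
	const timer = setTimeout(() => controller.abort(), timeoutMs);
	try {
		const response = await fetcher(url, {
			method: "HEAD",
			signal: controller.signal,
		});
		return response.ok;
	} catch {
		// Aborted requests and network failures both land here.
		return false;
	} finally {
		clearTimeout(timer);
	}
}
```

In Node 18+ or a browser, the global `fetch` can be passed as the second argument, for example `isUrlReachable(url, fetch, 5000)`.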

🎨 Supported Transcription Styles

| Style | Description |
| --- | --- |
| accurate | High accuracy, raw transcription including filler words |
| clean | Edited for readability (filler words removed, grammar fixed) |
| structured | Meeting/interview format with speakers and structure |
| technical | Technical content with jargon preserved |
| conversational | Casual, creative, natural conversation transcription |
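
When the style comes from user input (a CLI flag or a query parameter, say), it is worth validating it against the five supported values before passing it on. The following is a minimal sketch. `isTranscriptionStyle` and `toStyle` are hypothetical helpers, and the style type is redeclared locally so the snippet stands alone; in real code you would import `TranscriptionStyle` from the package.

```typescript
// The five styles listed above, redeclared locally for a self-contained example.
const TRANSCRIPTION_STYLES = [
	"accurate",
	"clean",
	"structured",
	"technical",
	"conversational",
] as const;

type TranscriptionStyle = (typeof TRANSCRIPTION_STYLES)[number];

// Hypothetical type guard: narrows an arbitrary string to a supported style.
function isTranscriptionStyle(value: string): value is TranscriptionStyle {
	return TRANSCRIPTION_STYLES.some((style) => style === value);
}

// Hypothetical helper: falls back to the documented default style
// when the input is missing or not a supported value.
function toStyle(value: string | undefined): TranscriptionStyle {
	return value !== undefined && isTranscriptionStyle(value)
		? value
		: "conversational";
}
```

Falling back silently is a design choice; rejecting unknown values with an error may suit stricter callers better.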

🗂️ Supported File Formats

  • .mp3
  • .wav
  • .aac
  • .flac
  • .ogg
  • .webm / .weba

Unknown formats fall back to audio/octet-stream.
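
The fallback behaviour can be pictured as a lookup from file extension to MIME type. This is only a sketch: `guessAudioMimeType` is a hypothetical helper, and the MIME strings in the table are the commonly used ones for each extension, assumed here rather than taken from the package source. Only the audio/octet-stream fallback comes from the text above.

```typescript
// Assumed extension-to-MIME mapping for the formats listed above.
const AUDIO_MIME_TYPES: Record<string, string> = {
	".mp3": "audio/mpeg",
	".wav": "audio/wav",
	".aac": "audio/aac",
	".flac": "audio/flac",
	".ogg": "audio/ogg",
	".webm": "audio/webm",
	".weba": "audio/webm",
};

// Hypothetical helper: picks a MIME type from the file extension,
// falling back to audio/octet-stream for anything unrecognised.
function guessAudioMimeType(fileName: string): string {
	const dot = fileName.lastIndexOf(".");
	const extension = dot === -1 ? "" : fileName.slice(dot).toLowerCase();
	return AUDIO_MIME_TYPES[extension] ?? "audio/octet-stream";
}
```

This version does not strip query strings, so a URL such as `audio.mp3?token=1` would hit the fallback.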


📚 API Reference

runTranscription(config: TranscriptionConfig)

Runs transcription on a local file path or remote URL.

Returns: Promise<RunTranscriptionResult>

type RunTranscriptionResult = {
	success: boolean;
	transcription?: string;
	error?: string;
};

runTranscriptionWithBlob(audioBlob: Blob | Buffer, options?)

Runs transcription on an in-memory Blob or Node.js Buffer.

Returns: Promise<RunTranscriptionResult>


🗂️ Type Definitions

export type TranscriptionStyle =
	| "accurate"
	| "clean"
	| "structured"
	| "technical"
	| "conversational";

export interface TranscriptionConfig {
	audioFile: string;
	style?: TranscriptionStyle;
	language?: string | null;
	verbose?: boolean;
	timeout?: number;
}

export interface RunTranscriptionResult {
	success: boolean;
	transcription?: string;
	error?: string;
}
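
Since both `transcription` and `error` are optional on the result, callers have to check `success` before using either. If you would rather work with exceptions, a small wrapper along these lines is one option. `unwrapTranscription` is a hypothetical helper, and the interface is repeated locally only so the snippet stands alone.

```typescript
// Same shape as the RunTranscriptionResult interface above.
interface RunTranscriptionResult {
	success: boolean;
	transcription?: string;
	error?: string;
}

// Hypothetical helper: returns the transcription text on success and
// throws with the reported error message otherwise.
function unwrapTranscription(result: RunTranscriptionResult): string {
	if (!result.success || result.transcription === undefined) {
		throw new Error(
			result.error ?? "Transcription failed without an error message",
		);
	}
	return result.transcription;
}
```

Usage would look like `const text = unwrapTranscription(await runTranscription({ audioFile }))` inside a try/catch.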

🔐 Authentication

This package requires a Gemini API Key.

1️⃣ Set TRANSCRIBER_KEY in your environment:

export TRANSCRIBER_KEY=your-gemini-api-key-here

or

2️⃣ Create a .env file:

TRANSCRIBER_KEY=your-gemini-api-key-here

Get your API key from Google AI Studio (formerly MakerSuite).
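
To fail fast with a clear message when the key is missing, you can check the environment yourself at startup. The sketch below assumes nothing about how the package reads the variable. `requireTranscriberKey` is a hypothetical helper, and the environment is passed in as a plain object so the check is easy to test.

```typescript
// Hypothetical startup check: returns the trimmed key or throws if
// TRANSCRIBER_KEY is missing or blank in the given environment object.
function requireTranscriberKey(
	env: Record<string, string | undefined>,
): string {
	const key = env["TRANSCRIBER_KEY"]?.trim();
	if (!key) {
		throw new Error(
			"TRANSCRIBER_KEY is not set; export it or add it to your .env file",
		);
	}
	return key;
}
```

In Node.js, calling `requireTranscriberKey(process.env)` before the first transcription gives a readable error early.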



📄 License

MIT License © 2025 Shriansh Agarwal


🙋 FAQ

Q: Does this upload my file to third-party storage?

A: No. Files are uploaded only to Gemini's File API endpoint.

Q: Can I use this in the browser?

A: Yes. Use runTranscriptionWithBlob, which accepts a browser Blob as well as a Node.js Buffer.

Q: What models are used?

A: The gemini-2.0-flash model, accessed through the Google GenAI SDK.


Summary

✅ Lightweight
✅ Flexible API
✅ Multiple transcription styles
✅ Works with files, URLs, and Blobs/Buffers
✅ Production-ready TypeScript types

