A lightweight TypeScript library for transcribing audio files using Google Gemini 2.0 models.
Supports local files, remote URLs, and in-memory buffers/blobs.
Ideal for meetings, interviews, podcasts, technical content, and more.
npm install audio-transcripter
- 🎧 Supports local files (`.wav`, `.mp3`, `.aac`, `.flac`, `.ogg`, `.webm`, etc.)
- 🌐 Supports remote URLs (HTTP/HTTPS)
- 📦 Supports Blobs / Buffers
- ✨ Multiple transcription styles:
  - `accurate`
  - `clean`
  - `structured`
  - `technical`
  - `conversational`
- 🔍 Verbose logging (optional)
- ⚙️ Written in TypeScript with full type safety
import { runTranscription } from "audio-transcripter";
const result = await runTranscription({
  audioFile: "./assets/audio.webm",
  style: "structured", // optional, default: 'conversational'
  language: "english", // optional, default: 'english'
});
if (result.success) {
  console.log("Transcription:", result.transcription);
} else {
  console.error("Error:", result.error);
}
// Remote URL (HTTP/HTTPS): same call, with the URL passed as audioFile
const result = await runTranscription({
  audioFile: "https://example.com/audio.mp3",
  style: "clean",
  language: "english",
});
import { readFile } from "fs/promises";
import { runTranscriptionWithBlob } from "audio-transcripter";
// Example with a Node.js Buffer
const audioBuffer = await readFile("./assets/audio.wav");
const result = await runTranscriptionWithBlob(audioBuffer, {
  style: "technical",
  language: "english",
});
if (result.success) {
  console.log("Transcription:", result.transcription);
} else {
  console.error("Error:", result.error);
}
| Option | Type | Default | Description |
|---|---|---|---|
| `audioFile` | `string` | required | Local file path or remote URL |
| `style` | `string` | `'conversational'` | Transcription style (see below) |
| `language` | `string` | `'english'` | Language of the audio |
| `verbose` | `boolean` | `true` | Enable verbose console logs |
| `timeout` | `number` | `5000` (ms) | Timeout for the remote URL HEAD check (if applicable) |
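The defaults in the table can also be made explicit in your own code, for example when logging the effective configuration. The sketch below mirrors the documented defaults; `ResolvedConfig` and `withDefaults` are hypothetical helpers, not part of the package's API, and the package applies its own defaults internally. How the package treats an explicit `language: null` is not documented here, so this sketch simply passes `null` through.

```typescript
// Re-declared from the package's exported types so the sketch is self-contained.
type TranscriptionStyle =
  | "accurate"
  | "clean"
  | "structured"
  | "technical"
  | "conversational";

interface TranscriptionConfig {
  audioFile: string;
  style?: TranscriptionStyle;
  language?: string | null;
  verbose?: boolean;
  timeout?: number;
}

// Hypothetical helper type: every optional field filled in.
type ResolvedConfig = Required<TranscriptionConfig>;

// Hypothetical helper: applies the defaults listed in the options table.
// `??` keeps explicit falsy values such as `verbose: false` or `timeout: 0`.
function withDefaults(config: TranscriptionConfig): ResolvedConfig {
  return {
    audioFile: config.audioFile,
    style: config.style ?? "conversational",
    // Only an omitted language gets the default; an explicit null passes through.
    language: config.language === undefined ? "english" : config.language,
    verbose: config.verbose ?? true,
    timeout: config.timeout ?? 5000, // ms, for the remote URL HEAD check
  };
}
```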
| Style | Description |
|---|---|
| `accurate` | High accuracy, raw transcription including filler words |
| `clean` | Edited for readability (filler words removed, grammar fixed) |
| `structured` | Meeting/interview format with speakers and structure |
| `technical` | Technical content with jargon preserved |
| `conversational` | Casual, creative, natural conversation transcription |
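When the style comes from user input (a CLI flag or a form field, say), it helps to validate it against the five styles above before calling the library. This is a minimal sketch; `isTranscriptionStyle` and `parseStyle` are hypothetical helpers, not exports of the package, and the case-insensitive matching is a choice made here rather than documented behavior.

```typescript
// Re-declared from the package's exported types so the sketch is self-contained.
type TranscriptionStyle =
  | "accurate"
  | "clean"
  | "structured"
  | "technical"
  | "conversational";

// Every valid style, in the order listed in the table above.
const STYLES: readonly TranscriptionStyle[] = [
  "accurate",
  "clean",
  "structured",
  "technical",
  "conversational",
];

// Type guard: narrows an arbitrary string to TranscriptionStyle.
function isTranscriptionStyle(value: string): value is TranscriptionStyle {
  return (STYLES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper: normalizes input and falls back to the documented default.
function parseStyle(input: string | undefined): TranscriptionStyle {
  if (input === undefined) return "conversational";
  const normalized = input.trim().toLowerCase();
  if (!isTranscriptionStyle(normalized)) {
    throw new Error(`Unknown style "${input}". Expected one of: ${STYLES.join(", ")}`);
  }
  return normalized; // narrowed to TranscriptionStyle by the guard above
}
```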
Supported audio formats:
- `.mp3`
- `.wav`
- `.aac`
- `.flac`
- `.ogg`
- `.webm` / `.weba`

Unknown formats fall back to `audio/octet-stream`.
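The fallback rule can be pictured as a lookup keyed by file extension. This is a sketch, not the package's source: the `mimeTypeFor` name and the specific MIME strings are assumptions (common types for each extension); only the `audio/octet-stream` fallback comes from this README.

```typescript
// Assumed extension -> MIME type table; the package's real mapping may differ.
const MIME_BY_EXTENSION: Record<string, string> = {
  ".mp3": "audio/mpeg",
  ".wav": "audio/wav",
  ".aac": "audio/aac",
  ".flac": "audio/flac",
  ".ogg": "audio/ogg",
  ".webm": "audio/webm",
  ".weba": "audio/webm",
};

// Fallback documented in this README for unrecognized extensions.
const FALLBACK_MIME = "audio/octet-stream";

// Hypothetical helper: resolves a MIME type from a local path or a URL.
function mimeTypeFor(pathOrUrl: string): string {
  // Drop any query string or fragment so "clip.mp3?sig=abc" still matches.
  const clean = pathOrUrl.split(/[?#]/)[0] ?? pathOrUrl;
  const dot = clean.lastIndexOf(".");
  const slash = Math.max(clean.lastIndexOf("/"), clean.lastIndexOf("\\"));
  // No extension if there is no dot, or the last dot sits in a directory or host name.
  if (dot === -1 || dot < slash) return FALLBACK_MIME;
  const ext = clean.slice(dot).toLowerCase();
  return MIME_BY_EXTENSION[ext] ?? FALLBACK_MIME;
}
```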
`runTranscription(config)` runs transcription on a local file path or remote URL.
Returns: `Promise<RunTranscriptionResult>`
type RunTranscriptionResult = {
  success: boolean;
  transcription?: string;
  error?: string;
};
`runTranscriptionWithBlob(blob, options)` runs transcription on an in-memory `Blob` or Node.js `Buffer`.
Returns: `Promise<RunTranscriptionResult>`
export type TranscriptionStyle =
  | "accurate"
  | "clean"
  | "structured"
  | "technical"
  | "conversational";
export interface TranscriptionConfig {
  audioFile: string;
  style?: TranscriptionStyle;
  language?: string | null;
  verbose?: boolean;
  timeout?: number;
}
export interface RunTranscriptionResult {
  success: boolean;
  transcription?: string;
  error?: string;
}
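Because `transcription` and `error` are both optional, callers must check `success` before reading either field. One way to centralize that check is a small unwrap helper that turns a failed result into a thrown `Error`; `unwrapTranscription` is hypothetical and not exported by the package, and treating a successful result with no text as a failure is an assumption of this sketch.

```typescript
// Same shape as the package's exported RunTranscriptionResult.
interface RunTranscriptionResult {
  success: boolean;
  transcription?: string;
  error?: string;
}

// Hypothetical helper: returns the transcript or throws a descriptive Error.
function unwrapTranscription(result: RunTranscriptionResult): string {
  if (result.success && result.transcription !== undefined) {
    return result.transcription;
  }
  // Assumption: success without any text is reported as a failure too.
  throw new Error(result.error ?? "Transcription failed without an error message");
}
```

Inside an async function, this lets you write `const text = unwrapTranscription(await runTranscription({ audioFile: "./assets/audio.webm" }));` and handle failures with an ordinary `try`/`catch`.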
This package requires a Gemini API key, supplied in either of two ways:
1️⃣ Set `TRANSCRIBER_KEY` in your environment:
export TRANSCRIBER_KEY=your-gemini-api-key-here
2️⃣ Or create a `.env` file containing:
TRANSCRIBER_KEY=your-gemini-api-key-here
Get your API key from Google AI Studio (formerly MakerSuite).
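Since the package reads `TRANSCRIBER_KEY` from the environment, a missing key is worth catching before any audio is sent. The `requireTranscriberKey` helper below is a hypothetical pre-flight check, not part of the package; it takes the environment as a parameter so it can be tested without touching `process.env`.

```typescript
// Minimal environment shape; pass `process.env` when running under Node.js.
type Env = Record<string, string | undefined>;

// Hypothetical pre-flight check: fails fast when the key is absent or blank.
function requireTranscriberKey(env: Env): string {
  const key = env["TRANSCRIBER_KEY"]?.trim();
  if (!key) {
    throw new Error(
      "TRANSCRIBER_KEY is not set. Export it in your shell or add it to a .env file."
    );
  }
  return key;
}
```

Call `requireTranscriberKey(process.env)` before `runTranscription(...)`. If you rely on a `.env` file, make sure it has been loaded into `process.env` (for example with the `dotenv` package) before this check runs.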
MIT License © 2025 Shriansh Agarwal
Q: Does this upload my file to third-party storage?
A: No. Files are uploaded only to Gemini's File API endpoint.
Q: Can I use this in the browser?
A: `runTranscriptionWithBlob` works with both browser `Blob` objects and Node.js `Buffer`s.
Q: What models are used?
A: The `gemini-2.0-flash` model, via the Google GenAI SDK.
✅ Lightweight
✅ Flexible API
✅ Multiple transcription styles
✅ Works with files, URLs, and Blobs/Buffers
✅ Production-ready TypeScript types