A TypeScript client library for interacting with Zyphra's text-to-speech API.
npm install @zyphra/client
# or
yarn add @zyphra/client
import { ZyphraClient } from '@zyphra/client';
// Initialize the client
const client = new ZyphraClient({ apiKey: 'your-api-key' });
// Generate speech
const audioBlob = await client.audio.speech.create({
  text: 'Hello, world!',
  speaking_rate: 15
});
// Save to file (browser)
const url = URL.createObjectURL(audioBlob);
const a = document.createElement('a');
a.href = url;
a.download = 'output.webm';
a.click();
URL.revokeObjectURL(url);
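The snippet above saves the audio in a browser. In Node.js there is no `document`, so a rough equivalent is to turn the Blob into a `Buffer` and write it to disk. The sketch below assumes the returned Blob exposes the standard `arrayBuffer()` method; `BlobLike`, `blobToBuffer`, and `saveAudio` are hypothetical names for illustration and are not part of `@zyphra/client`.

```typescript
import { writeFile } from 'node:fs/promises';

// Minimal structural type: anything with the standard Blob#arrayBuffer method.
interface BlobLike {
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Hypothetical helper: copy the Blob's bytes into a Node.js Buffer.
async function blobToBuffer(blob: BlobLike): Promise<Buffer> {
  return Buffer.from(await blob.arrayBuffer());
}

// Hypothetical helper: write the audio bytes to the given path.
async function saveAudio(blob: BlobLike, path: string): Promise<void> {
  await writeFile(path, await blobToBuffer(blob));
}
```

With these helpers, `await saveAudio(audioBlob, 'output.webm')` would write the generated speech next to the script.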
- Text-to-speech generation with customizable parameters
- Support for multiple languages and audio formats
- Voice cloning capabilities
- TypeScript types included
- Browser and Node.js support
- Returns audio as Blob for easy handling
The text-to-speech API accepts the following parameters:
interface TTSParams {
  text: string;               // The text to convert to speech
  speaker_audio?: string;     // Base64 audio for voice cloning
  seconds?: number;           // Duration in seconds (1-30)
  seed?: number;              // Random seed (-1 to 2147483647)
  speaking_rate?: number;     // Speaking rate (5-35)
  language_iso_code?: string; // Language code (e.g., "en-us", "fr-fr")
  mime_type?: string;         // Output audio format (e.g., "audio/webm")
}
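Because the ranges are documented only in comments, it can help to check them before sending a request. The following is a minimal sketch, not something the library provides: `validateTTSParams` is a hypothetical helper, and the rules that `text` must be non-empty and that `seed` must be an integer are assumptions rather than documented behavior.

```typescript
// Mirrors the TTSParams interface documented above.
interface TTSParams {
  text: string;
  speaker_audio?: string;
  seconds?: number;
  seed?: number;
  speaking_rate?: number;
  language_iso_code?: string;
  mime_type?: string;
}

// Hypothetical helper (not part of @zyphra/client): returns a list of
// human-readable problems, or an empty array when the params look valid.
function validateTTSParams(params: TTSParams): string[] {
  const problems: string[] = [];
  const inRange = (value: number, min: number, max: number): boolean =>
    value >= min && value <= max;

  // Assumption: empty text is rejected.
  if (params.text.trim().length === 0) {
    problems.push('text must not be empty');
  }
  if (params.seconds !== undefined && !inRange(params.seconds, 1, 30)) {
    problems.push('seconds must be between 1 and 30');
  }
  // Assumption: the seed is expected to be an integer.
  if (
    params.seed !== undefined &&
    (!Number.isInteger(params.seed) || !inRange(params.seed, -1, 2147483647))
  ) {
    problems.push('seed must be an integer between -1 and 2147483647');
  }
  if (params.speaking_rate !== undefined && !inRange(params.speaking_rate, 5, 35)) {
    problems.push('speaking_rate must be between 5 and 35');
  }
  return problems;
}
```

Calling `validateTTSParams({ text: 'Hello', speaking_rate: 50 })` returns one problem describing the out-of-range speaking rate, which can be surfaced to the user before any network call is made.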
The text-to-speech API supports the following languages:
- English (US): en-us
- French: fr-fr
- German: de
- Japanese: ja
- Korean: ko
- Mandarin Chinese: cmn
The API supports multiple output formats through the mime_type parameter:
- WebM (default): audio/webm
- Ogg: audio/ogg
- WAV: audio/wav
- MP3: audio/mp3 or audio/mpeg
- MP4/AAC: audio/mp4 or audio/aac
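When saving the result, the file extension should match the requested format. The sketch below maps each MIME type listed above to a conventional extension; `EXTENSION_BY_MIME` and `fileNameFor` are hypothetical names, and the extensions themselves are common conventions rather than anything the API specifies.

```typescript
// Conventional file extensions for the MIME types listed above (assumed, not
// defined by the API).
const EXTENSION_BY_MIME: Record<string, string> = {
  'audio/webm': 'webm',
  'audio/ogg': 'ogg',
  'audio/wav': 'wav',
  'audio/mp3': 'mp3',
  'audio/mpeg': 'mp3',
  'audio/mp4': 'mp4',
  'audio/aac': 'aac',
};

// Hypothetical helper: build a file name for the given output format.
// Falls back to WebM, the documented default, when no MIME type is given.
function fileNameFor(baseName: string, mimeType: string = 'audio/webm'): string {
  const extension = EXTENSION_BY_MIME[mimeType.toLowerCase()];
  if (extension === undefined) {
    throw new Error(`Unsupported mime_type: ${mimeType}`);
  }
  return `${baseName}.${extension}`;
}
```

For example, `fileNameFor('greeting', 'audio/mpeg')` yields `greeting.mp3`, and `fileNameFor('greeting')` yields `greeting.webm`.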
// Generate French speech in MP3 format
const frenchAudio = await client.audio.speech.create({
  text: 'Bonjour le monde!',
  language_iso_code: 'fr-fr',
  mime_type: 'audio/mp3',
  speaking_rate: 15
});
// Generate Japanese speech in WAV format
const japaneseAudio = await client.audio.speech.create({
  text: 'こんにちは世界!',
  language_iso_code: 'ja',
  mime_type: 'audio/wav',
  speaking_rate: 15
});
You can clone voices by providing a reference audio file as a base64 string:
// Node.js environment
import { readFileSync } from 'node:fs';
// Read the reference recording and encode it as base64
const audio_base64 = readFileSync('reference_voice.wav').toString('base64');
const audioBlob = await client.audio.speech.create({
  text: 'This will use the cloned voice',
  speaker_audio: audio_base64,
  speaking_rate: 15
});
// Browser environment
const fileInput = document.querySelector<HTMLInputElement>('input[type="file"]');
const file = fileInput?.files?.[0];
if (file) {
  const reader = new FileReader();
  reader.onload = async () => {
    // readAsDataURL yields "data:<mime>;base64,<payload>"; keep only the payload
    const base64 = (reader.result as string).split(',')[1];
    const audioBlob = await client.audio.speech.create({
      text: 'This will use the cloned voice',
      speaker_audio: base64,
      speaking_rate: 15
    });
  };
  reader.readAsDataURL(file);
}
import { ZyphraError } from '@zyphra/client';
try {
  const audioBlob = await client.audio.speech.create({
    text: 'Hello, world!',
    speaking_rate: 15
  });
} catch (error) {
  if (error instanceof ZyphraError) {
    console.error(`Error: ${error.statusCode} - ${error.response}`);
  }
}
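For transient failures it may be worth retrying instead of only logging. The following is a generic sketch, not a feature of the client: `withRetry` is a hypothetical helper, and the choice to retry on a missing `statusCode`, on 429, and on 5xx responses is an assumption about which errors are transient. It relies only on the `statusCode` property shown in the example above.

```typescript
// Hypothetical helper: retry an async operation with exponential backoff.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      const status = (error as { statusCode?: number } | undefined)?.statusCode;
      // Assumption: no status (e.g. network failure), 429, and 5xx are transient.
      const retryable = status === undefined || status === 429 || status >= 500;
      if (!retryable || attempt === maxAttempts) break;
      // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

A call such as `withRetry(() => client.audio.speech.create({ text: 'Hello, world!' }))` would then make up to three attempts before rethrowing the last error to the surrounding `try`/`catch`.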
- Zonos-0.1: Text-to-speech model with emotion control and voice cloning capabilities
MIT License