A TypeScript client library for interacting with Zyphra's text-to-speech API.
## Installation

```bash
npm install @zyphra/client
# or
yarn add @zyphra/client
```
## Quick Start

```typescript
import { ZyphraClient } from '@zyphra/client';

// Initialize the client
const client = new ZyphraClient({ apiKey: 'your-api-key' });

// Generate speech
const audioBlob = await client.audio.speech.create({
  text: 'Hello, world!',
  speaking_rate: 15,
  model: 'zonos-v0.1-transformer' // Default model
});

// Save to file (browser)
const url = URL.createObjectURL(audioBlob);
const a = document.createElement('a');
a.href = url;
a.download = 'output.webm';
a.click();
URL.revokeObjectURL(url);
```
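In Node.js, where `URL.createObjectURL` is unavailable, you can write the result to disk instead. A minimal sketch, assuming the returned Blob supports the standard `arrayBuffer()` method:

```typescript
// Save to file (Node.js) -- assumes the returned Blob implements arrayBuffer()
import { writeFile } from 'fs/promises';

const buffer = Buffer.from(await audioBlob.arrayBuffer());
await writeFile('output.webm', buffer);
```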
## Features

- Text-to-speech generation with customizable parameters
- Support for multiple languages and audio formats
- Voice cloning capabilities
- Multiple TTS models with specialized capabilities
- TypeScript types included
- Browser and Node.js support
- Returns audio as Blob for easy handling
- Support for default and custom voice selection
## TTS Parameters

The text-to-speech API accepts the following parameters:
```typescript
interface TTSParams {
  text: string;                // The text to convert to speech (required)
  speaker_audio?: string;      // Base64-encoded audio for voice cloning
  speaking_rate?: number;      // Speaking rate (5-35, default: 15.0)
  fmax?: number;               // Frequency max (0-24000, default: 22050)
  pitch_std?: number;          // Pitch standard deviation (0-500, default: 45.0) (transformer model only)
  emotion?: EmotionWeights;    // Emotion weights (transformer model only)
  language_iso_code?: string;  // Language code (e.g., "en-us", "fr-fr")
  mime_type?: string;          // Output audio format (e.g., "audio/webm")
  model?: SupportedModel;      // TTS model (default: 'zonos-v0.1-transformer')
  speaker_noised?: boolean;    // Denoises to improve stability (hybrid model only, default: true)
  default_voice_name?: string; // Name of a default voice to use
  voice_name?: string;         // Name of one of the user's custom voices to use
}

// Available models
type SupportedModel = 'zonos-v0.1-transformer' | 'zonos-v0.1-hybrid';

interface EmotionWeights {
  happiness: number; // default: 0.6
  sadness: number;   // default: 0.05
  disgust: number;   // default: 0.05
  fear: number;      // default: 0.05
  surprise: number;  // default: 0.05
  anger: number;     // default: 0.05
  other: number;     // default: 0.5
  neutral: number;   // default: 0.6
}
```
## Models

The API supports the following TTS models:

- `zonos-v0.1-transformer` (default): A standard transformer-based TTS model suitable for most applications. Supports the `pitch_std` and `emotion` parameters (see the sketch after this list).
- `zonos-v0.1-hybrid`: An advanced model with better support for certain languages (especially Japanese), support for the `speaker_noised` denoising parameter, and improved voice quality in some scenarios.
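Since `pitch_std` only applies to the transformer model, here is a short sketch of how you might raise it for more expressive intonation (the value chosen is illustrative, not a recommendation from the API docs):

```typescript
// Transformer model with more pitch variation (pitch_std value is illustrative)
const expressiveAudio = await client.audio.speech.create({
  text: 'A higher pitch_std makes the delivery more expressive.',
  model: 'zonos-v0.1-transformer',
  pitch_std: 120, // default is 45.0; higher values vary intonation more
  speaking_rate: 15
});
```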
## Supported Languages

The text-to-speech API supports the following languages:

- English (US) - `en-us`
- French - `fr-fr`
- German - `de`
- Japanese - `ja` (recommended with the `zonos-v0.1-hybrid` model)
- Korean - `ko`
- Mandarin Chinese - `cmn`
## Audio Formats

The API supports multiple output formats through the `mime_type` parameter:

- WebM (default) - `audio/webm`
- Ogg - `audio/ogg`
- WAV - `audio/wav`
- MP3 - `audio/mp3` or `audio/mpeg`
- MP4/AAC - `audio/mp4` or `audio/aac`
```typescript
// Generate French speech in MP3 format
const frenchAudio = await client.audio.speech.create({
  text: 'Bonjour le monde!',
  language_iso_code: 'fr-fr',
  mime_type: 'audio/mp3',
  speaking_rate: 15
});

// Generate Japanese speech with the hybrid model (recommended)
const japaneseAudio = await client.audio.speech.create({
  text: 'こんにちは世界!',
  language_iso_code: 'ja',
  mime_type: 'audio/wav',
  speaking_rate: 15,
  model: 'zonos-v0.1-hybrid' // Better for Japanese
});
```
## Voice Selection

You can use pre-defined default voices or your own custom voices:
```typescript
// Using a default voice
const defaultVoiceAudio = await client.audio.speech.create({
  text: 'This uses a default voice.',
  default_voice_name: 'american_female',
  speaking_rate: 15
});
```
The following default voices are available:

- `american_female` - Standard American English female voice
- `american_male` - Standard American English male voice
- `anime_girl` - Stylized anime girl character voice
- `british_female` - British English female voice
- `british_male` - British English male voice
- `energetic_boy` - Energetic young male voice
- `energetic_girl` - Energetic young female voice
- `japanese_female` - Japanese female voice
- `japanese_male` - Japanese male voice
You can use your own custom voices that have been created and stored in your account:
```typescript
// Using a custom voice you've created and stored
const customVoiceAudio = await client.audio.speech.create({
  text: 'This uses your custom voice.',
  voice_name: 'my_custom_voice',
  speaking_rate: 15
});
```
Note: When using custom voices, the `voice_name` parameter must exactly match the name as it appears in your voices list at playground.zyphra.com/audio. The name is case-sensitive.
## Hybrid Model Parameters

For the hybrid model (`zonos-v0.1-hybrid`), you can use additional parameters:
```typescript
// Using the hybrid model with its specific parameters
const hybridModelAudio = await client.audio.speech.create({
  text: 'This uses the hybrid model with special parameters.',
  model: 'zonos-v0.1-hybrid',
  speaker_noised: true, // Denoises to improve stability
  speaking_rate: 15
});
```
## Emotion Control

You can adjust the emotional tone of the speech:
```typescript
const emotionalSpeech = await client.audio.speech.create({
  text: 'This is a happy message!',
  emotion: {
    happiness: 0.8, // Increase happiness
    neutral: 0.3,   // Decrease neutrality
    sadness: 0.05,  // Keep other emotions at default values
    disgust: 0.05,
    fear: 0.05,
    surprise: 0.05,
    anger: 0.05,
    other: 0.5
  }
});
```
## Voice Cloning

You can clone voices by providing a reference audio file as a base64-encoded string:
```typescript
// Node.js environment
const fs = require('fs');
const audio_base64 = fs.readFileSync('reference_voice.wav').toString('base64');

const audioBlob = await client.audio.speech.create({
  text: 'This will use the cloned voice',
  speaker_audio: audio_base64,
  speaking_rate: 15
});
```
```typescript
// Browser environment
const fileInput = document.querySelector('input[type="file"]') as HTMLInputElement;
const file = fileInput.files![0];

const reader = new FileReader();
reader.onload = async () => {
  // Strip the "data:...;base64," prefix from the data URL
  const base64 = (reader.result as string).split(',')[1];
  const audioBlob = await client.audio.speech.create({
    text: 'This will use the cloned voice',
    speaker_audio: base64,
    speaking_rate: 15
  });
};
reader.readAsDataURL(file);
```
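If your reference audio is noisy, it may help to combine cloning with the hybrid model's `speaker_noised` flag. A sketch, assuming the flag applies to the provided `speaker_audio` (the parameter reference above only states that it denoises to improve stability):

```typescript
// Hypothetical combination: voice cloning plus hybrid-model denoising
const cleanedCloneAudio = await client.audio.speech.create({
  text: 'Cloned voice with denoising applied.',
  speaker_audio: audio_base64, // base64 reference audio from the Node.js example
  model: 'zonos-v0.1-hybrid',
  speaker_noised: true,        // assumption: denoises the reference audio
  speaking_rate: 15
});
```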
## Streaming

For streaming audio directly:
```typescript
const { stream, mimeType } = await client.audio.speech.createStream({
  text: 'This will be streamed to the client',
  speaking_rate: 15,
  model: 'zonos-v0.1-transformer'
});

// Collect the streamed chunks
const reader = stream.getReader();
const chunks: Uint8Array[] = [];
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}

// Play the assembled audio in the browser
const audioElement = document.createElement('audio');
audioElement.src = URL.createObjectURL(new Blob(chunks, { type: mimeType }));
audioElement.controls = true;
document.body.appendChild(audioElement);
```
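In Node.js you can write the stream to disk as it arrives instead. A minimal sketch, assuming `createStream` returns a web `ReadableStream` as used above:

```typescript
// Stream to a file (Node.js) -- assumes a web ReadableStream as above
import { createWriteStream } from 'fs';

const out = createWriteStream('streamed-output.webm');
const fileReader = stream.getReader();
while (true) {
  const { done, value } = await fileReader.read();
  if (done) break;
  out.write(value); // value is a Uint8Array chunk
}
out.end();
```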
## Progress Callbacks

You can also use callbacks to track progress during audio generation:
```typescript
const audioBlob = await client.audio.speech.create(
  {
    text: 'Audio with progress tracking',
    speaking_rate: 15,
    model: 'zonos-v0.1-transformer'
  },
  {
    onChunk: (chunk) => {
      console.log('Received chunk:', chunk.length, 'bytes');
    },
    onProgress: (totalBytes) => {
      console.log('Total bytes received:', totalBytes);
    },
    onComplete: (blob) => {
      console.log('Audio generation complete!', blob.size, 'bytes');
    }
  }
);
```
## Error Handling

```typescript
import { ZyphraError } from '@zyphra/client';

try {
  const audioBlob = await client.audio.speech.create({
    text: 'Hello, world!',
    speaking_rate: 15,
    model: 'zonos-v0.1-transformer'
  });
} catch (error) {
  if (error instanceof ZyphraError) {
    console.error(`Error: ${error.statusCode} - ${error.response}`);
  }
}
```
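For transient failures such as rate limits, you can layer a retry on top of the client. A sketch, assuming `ZyphraError.statusCode` carries the HTTP status code (the helper below is hypothetical, not part of the library):

```typescript
// Hypothetical retry helper: retries on 429 and 5xx with exponential backoff
async function createWithRetry(params: TTSParams, maxAttempts = 3): Promise<Blob> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await client.audio.speech.create(params);
    } catch (error) {
      const retryable =
        error instanceof ZyphraError &&
        (error.statusCode === 429 || error.statusCode >= 500);
      if (!retryable || attempt >= maxAttempts) throw error;
      await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 500));
    }
  }
}
```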
## Available Models

- `zonos-v0.1-transformer`: Default transformer-based TTS model
- `zonos-v0.1-hybrid`: Advanced hybrid TTS model with enhanced language support
## License

MIT License