@zyphra/client

1.0.5 • Public • Published

Zyphra TypeScript Client

A TypeScript client library for interacting with Zyphra's text-to-speech API.

Installation

npm install @zyphra/client
# or
yarn add @zyphra/client

Quick Start

import { ZyphraClient } from '@zyphra/client';

// Initialize the client
const client = new ZyphraClient({ apiKey: 'your-api-key' });

// Generate speech
const audioBlob = await client.audio.speech.create({
  text: 'Hello, world!',
  speaking_rate: 15,
  model: 'zonos-v0.1-transformer' // Default model
});

// Save to file (browser)
const url = URL.createObjectURL(audioBlob);
const a = document.createElement('a');
a.href = url;
a.download = 'output.webm';
a.click();
URL.revokeObjectURL(url);

Features

  • Text-to-speech generation with customizable parameters
  • Support for multiple languages and audio formats
  • Voice cloning capabilities
  • Multiple TTS models with specialized capabilities
  • TypeScript types included
  • Browser and Node.js support
  • Returns audio as Blob for easy handling
  • Support for default and custom voice selection

Parameters

The text-to-speech API accepts the following parameters:

interface TTSParams {
  text: string;                // The text to convert to speech (required)
  speaker_audio?: string;      // Base64 audio for voice cloning
  speaking_rate?: number;      // Speaking rate (5-35, default: 15.0)
  fmax?: number;               // Frequency max (0-24000, default: 22050)
  pitch_std?: number;          // Pitch standard deviation (0-500, default: 45.0) (transformer model only)
  emotion?: EmotionWeights;    // Emotional weights (transformer model only)
  language_iso_code?: string;  // Language code (e.g., "en-us", "fr-fr") 
  mime_type?: string;          // Output audio format (e.g., "audio/webm")
  model?: SupportedModel;      // TTS model (default: 'zonos-v0.1-transformer')
  speaker_noised?: boolean;    // Denoises to improve stability (hybrid model only, default: true)
  default_voice_name?: string; // Name of a default voice to use
  voice_name?: string;         // Name of one of the user's voices to use
}

// Available models
type SupportedModel = 'zonos-v0.1-transformer' | 'zonos-v0.1-hybrid';

interface EmotionWeights {
  happiness: number;  // default: 0.6
  sadness: number;    // default: 0.05
  disgust: number;    // default: 0.05
  fear: number;       // default: 0.05
  surprise: number;   // default: 0.05
  anger: number;      // default: 0.05
  other: number;      // default: 0.5
  neutral: number;    // default: 0.6
}
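Several numeric parameters have documented valid ranges. If you build request parameters dynamically (e.g. from a UI slider), it can help to clamp them first. A hypothetical helper, not part of the library, using the ranges listed above:

```typescript
// Documented ranges for the numeric TTS parameters (from the interface above).
const RANGES = {
  speaking_rate: { min: 5, max: 35 },
  fmax: { min: 0, max: 24000 },
  pitch_std: { min: 0, max: 500 },
} as const;

// Clamp a parameter value into its documented range before sending a request.
function clampParam(name: keyof typeof RANGES, value: number): number {
  const { min, max } = RANGES[name];
  return Math.min(max, Math.max(min, value));
}
```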

Detailed Usage

Supported TTS Models

The API supports the following TTS models:

  • zonos-v0.1-transformer (Default): A standard transformer-based TTS model suitable for most applications.
    • Supports pitch_std and emotions parameters
  • zonos-v0.1-hybrid: An advanced model with:
    • Better support for certain languages (especially Japanese)
    • Supports speaker_noised denoising parameter
    • Improved voice quality in some scenarios

Supported Languages

The text-to-speech API supports the following languages:

  • English (US) - en-us
  • French - fr-fr
  • German - de
  • Japanese - ja (recommended to use with zonos-v0.1-hybrid model)
  • Korean - ko
  • Mandarin Chinese - cmn
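Since the hybrid model is recommended for Japanese, one way to encode that guidance when the language is chosen at runtime (a sketch, not a library API):

```typescript
type SupportedModel = 'zonos-v0.1-transformer' | 'zonos-v0.1-hybrid';

// Pick the model recommended above for a given language code:
// the hybrid model for Japanese, the default transformer otherwise.
function recommendedModel(languageIsoCode: string): SupportedModel {
  return languageIsoCode === 'ja' ? 'zonos-v0.1-hybrid' : 'zonos-v0.1-transformer';
}
```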

Supported Audio Formats

The API supports multiple output formats through the mime_type parameter:

  • WebM (default) - audio/webm
  • Ogg - audio/ogg
  • WAV - audio/wav
  • MP3 - audio/mp3 or audio/mpeg
  • MP4/AAC - audio/mp4 or audio/aac
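When saving the result, the file extension should match the requested `mime_type`. A small lookup helper covering the formats listed above (hypothetical, not part of the library):

```typescript
// Map each supported mime_type to a sensible file extension.
const EXTENSIONS: Record<string, string> = {
  'audio/webm': 'webm',
  'audio/ogg': 'ogg',
  'audio/wav': 'wav',
  'audio/mp3': 'mp3',
  'audio/mpeg': 'mp3',
  'audio/mp4': 'mp4',
  'audio/aac': 'aac',
};

function extensionFor(mimeType: string): string {
  return EXTENSIONS[mimeType] ?? 'bin';
}
```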

Language and Format Examples

// Generate French speech in MP3 format
const frenchAudio = await client.audio.speech.create({
  text: 'Bonjour le monde!',
  language_iso_code: 'fr-fr',
  mime_type: 'audio/mp3',
  speaking_rate: 15
});

// Generate Japanese speech with hybrid model (recommended)
const japaneseAudio = await client.audio.speech.create({
  text: 'こんにちは世界!',
  language_iso_code: 'ja',
  mime_type: 'audio/wav',
  speaking_rate: 15,
  model: 'zonos-v0.1-hybrid' // Better for Japanese
});

Using Default and Custom Voices

You can use pre-defined default voices or your own custom voices:

// Using a default voice
const defaultVoiceAudio = await client.audio.speech.create({
  text: 'This uses a default voice.',
  default_voice_name: 'american_female',
  speaking_rate: 15
});

Available Default Voices

The following default voices are available:

  • american_female - Standard American English female voice
  • american_male - Standard American English male voice
  • anime_girl - Stylized anime girl character voice
  • british_female - British English female voice
  • british_male - British English male voice
  • energetic_boy - Energetic young male voice
  • energetic_girl - Energetic young female voice
  • japanese_female - Japanese female voice
  • japanese_male - Japanese male voice

Using Custom Voices

You can use your own custom voices that have been created and stored in your account:

// Using a custom voice you've created and stored
const customVoiceAudio = await client.audio.speech.create({
  text: 'This uses your custom voice.',
  voice_name: 'my_custom_voice',
  speaking_rate: 15
});

Note: When using custom voices, the voice_name parameter should exactly match the name as it appears in your voices list on playground.zyphra.com/audio. The name is case-sensitive.

Model-Specific Parameters

For the hybrid model (zonos-v0.1-hybrid), you can utilize additional parameters:

// Using the hybrid model with its specific parameters
const hybridModelAudio = await client.audio.speech.create({
  text: 'This uses the hybrid model with special parameters.',
  model: 'zonos-v0.1-hybrid',
  speaker_noised: true,   // Denoises to improve stability
  speaking_rate: 15
});

Emotion Control

You can adjust the emotional tone of the speech:

const emotionalSpeech = await client.audio.speech.create({
  text: 'This is a happy message!',
  emotion: {
    happiness: 0.8,  // Increase happiness
    neutral: 0.3,    // Decrease neutrality
    sadness: 0.05,   // Keep other emotions at default values
    disgust: 0.05,
    fear: 0.05,
    surprise: 0.05,
    anger: 0.05,
    other: 0.5
  }
});
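Since the example above passes all eight weights, a convenience helper can merge partial overrides with the documented defaults so you only specify the emotions you want to change. This helper is a sketch, not part of the library:

```typescript
interface EmotionWeights {
  happiness: number; sadness: number; disgust: number; fear: number;
  surprise: number; anger: number; other: number; neutral: number;
}

// Documented default weights (from the EmotionWeights interface above).
const DEFAULT_EMOTIONS: EmotionWeights = {
  happiness: 0.6, sadness: 0.05, disgust: 0.05, fear: 0.05,
  surprise: 0.05, anger: 0.05, other: 0.5, neutral: 0.6,
};

// Merge partial overrides with the defaults to produce a full weight set.
function withEmotionDefaults(overrides: Partial<EmotionWeights>): EmotionWeights {
  return { ...DEFAULT_EMOTIONS, ...overrides };
}

// withEmotionDefaults({ happiness: 0.8, neutral: 0.3 })
```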

Voice Cloning

You can clone voices by providing a reference audio file as a base64 string:

// Node.js environment
const fs = require('fs');
const audio_base64 = fs.readFileSync('reference_voice.wav').toString('base64');

const audioBlob = await client.audio.speech.create({
  text: 'This will use the cloned voice',
  speaker_audio: audio_base64,
  speaking_rate: 15
});

// Browser environment
const fileInput = document.querySelector('input[type="file"]');
const file = fileInput.files[0];
const reader = new FileReader();

reader.onload = async () => {
  const base64 = (reader.result as string).split(',')[1];
  
  const audioBlob = await client.audio.speech.create({
    text: 'This will use the cloned voice',
    speaker_audio: base64,
    speaking_rate: 15
  });
};

reader.readAsDataURL(file);
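If you prefer to avoid the callback-style `FileReader`, the same base64 conversion can be done with `Blob.arrayBuffer()`, which works in both modern browsers and Node.js 18+. A hypothetical `blobToBase64` helper (not a library API):

```typescript
// Convert an audio Blob/File to a base64 string without FileReader.
// Uses Blob.arrayBuffer() and btoa(), both available in browsers and Node 18+.
async function blobToBase64(blob: Blob): Promise<string> {
  const bytes = new Uint8Array(await blob.arrayBuffer());
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

// const speaker_audio = await blobToBase64(file);
```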

Streaming Support

For streaming audio directly:

const { stream, mimeType } = await client.audio.speech.createStream({
  text: 'This will be streamed to the client',
  speaking_rate: 15,
  model: 'zonos-v0.1-transformer'
});

// Collect the streamed chunks, then play them with an audio element in the browser
const chunks: Uint8Array[] = [];
const reader = stream.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}

const audioElement = document.createElement('audio');
audioElement.src = URL.createObjectURL(new Blob(chunks, { type: mimeType }));
audioElement.controls = true;

document.body.appendChild(audioElement);
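In Node.js, the same stream can be written to disk incrementally instead of buffered into a Blob. A sketch of a generic helper for any web `ReadableStream` of audio chunks (the `streamToFile` name is an assumption, not a library API):

```typescript
import { createWriteStream } from 'fs';
import { once } from 'events';

// Write a web ReadableStream of audio chunks to a file, returning total bytes.
async function streamToFile(
  stream: ReadableStream<Uint8Array>,
  path: string
): Promise<number> {
  const reader = stream.getReader();
  const out = createWriteStream(path);
  let totalBytes = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    out.write(value);
    totalBytes += value.length;
  }
  out.end();
  await once(out, 'finish'); // wait until all bytes are flushed to disk
  return totalBytes;
}

// const { stream } = await client.audio.speech.createStream({ text: '...' });
// await streamToFile(stream, 'output.webm');
```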

Callback Options

You can also use callbacks to track progress during audio generation:

const audioBlob = await client.audio.speech.create(
  {
    text: 'Audio with progress tracking',
    speaking_rate: 15,
    model: 'zonos-v0.1-transformer'
  },
  {
    onChunk: (chunk) => {
      console.log('Received chunk:', chunk.length, 'bytes');
    },
    onProgress: (totalBytes) => {
      console.log('Total bytes received:', totalBytes);
    },
    onComplete: (blob) => {
      console.log('Audio generation complete!', blob.size, 'bytes');
    }
  }
);

Error Handling

import { ZyphraError } from '@zyphra/client';

try {
  const audioBlob = await client.audio.speech.create({
    text: 'Hello, world!',
    speaking_rate: 15,
    model: 'zonos-v0.1-transformer'
  });
} catch (error) {
  if (error instanceof ZyphraError) {
    console.error(`Error: ${error.statusCode} - ${error.response}`);
  }
}
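For transient failures (e.g. rate limits or temporary server errors), a retry wrapper with exponential backoff can sit around any `create` call. A hypothetical helper, not part of the library:

```typescript
// Retry an async request up to `attempts` times with exponential backoff.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // 500ms, 1000ms, 2000ms, ... between attempts
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}

// const audioBlob = await withRetry(() =>
//   client.audio.speech.create({ text: 'Hello, world!' })
// );
```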

Available Models

Speech Models

  • zonos-v0.1-transformer: Default transformer-based TTS model
  • zonos-v0.1-hybrid: Advanced hybrid TTS model with enhanced language support

License

MIT License
