@omarimai/agents-plugin-google

1.1.13 • Public • Published

Google AI plugin for LiveKit Agents

Support for Gemini, Gemini Live, Cloud Speech-to-Text, and Cloud Text-to-Speech.

Installation

npm install @omarimai/agents-plugin-google

Usage

import { llm, multimodal } from '@livekit/agents';
import * as google from '@omarimai/agents-plugin-google';

const model = new google.realtime.RealtimeModel({
  apiKey: process.env.GOOGLE_API_KEY,
  voice: 'Puck',
});

// Function context holding the tools the agent may call (empty here)
const fncCtx = new llm.FunctionContext();

const agent = new multimodal.MultimodalAgent({
  model,
  fncCtx,
});

Configuration

Set your Google API key:

  • Set the GOOGLE_API_KEY environment variable, or
  • Pass the apiKey parameter to the RealtimeModel constructor

For VertexAI, also set:

  • GOOGLE_CLOUD_PROJECT environment variable
  • GOOGLE_APPLICATION_CREDENTIALS pointing to your service account key
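
The settings above can be checked before constructing the model. The following is a minimal sketch, not part of the plugin: missingGoogleSettings is a hypothetical helper, and it assumes the environment is passed in as a plain object (for example process.env) and that VertexAI is opted into with a boolean flag.

```typescript
// Hypothetical helper (not part of the plugin's API): reports which of the
// settings described above are still missing.
type EnvSketch = Record<string, string | undefined>;

interface AuthOptionsSketch {
  apiKey?: string | undefined; // key passed directly to the constructor
  vertexai?: boolean | undefined; // assumption: VertexAI is enabled by a flag
}

function missingGoogleSettings(env: EnvSketch, opts: AuthOptionsSketch = {}): string[] {
  const missing: string[] = [];
  // The API key may come from the constructor option or the environment.
  if (!opts.apiKey && !env['GOOGLE_API_KEY']) {
    missing.push('GOOGLE_API_KEY');
  }
  // The VertexAI variables are only required when VertexAI is requested.
  if (opts.vertexai) {
    if (!env['GOOGLE_CLOUD_PROJECT']) missing.push('GOOGLE_CLOUD_PROJECT');
    if (!env['GOOGLE_APPLICATION_CREDENTIALS']) missing.push('GOOGLE_APPLICATION_CREDENTIALS');
  }
  return missing;
}
```

Calling missingGoogleSettings(process.env, { vertexai: true }) at startup and failing fast on a non-empty result gives a clearer error than a rejected connection later on.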

Build and Test

Build the Project

pnpm build

Test the Integration

Create a simple test file to verify that the plugin works with MultimodalAgent:

// test.ts
import { multimodal, llm } from '@livekit/agents';
import * as google from './src/index.js';

const model = new google.realtime.RealtimeModel({
  apiKey: 'your-api-key',
  voice: 'Puck',
});

const fncCtx = new llm.FunctionContext();

const agent = new multimodal.MultimodalAgent({
  model,
  fncCtx,
});

console.log('Google plugin integrated successfully!');

Next Steps

  • Implement Google Live API Connection: Research Google's Live API documentation and implement the actual WebSocket connection
  • Add Authentication: Implement proper Google Cloud authentication
  • Complete Audio Processing: Finish the audio streaming implementation
  • Add Function Calling: Implement function calling support in the realtime session
  • Add Error Handling: Implement robust error handling and reconnection logic
  • Add Tests: Create comprehensive tests
  • Add LLM/STT/TTS: Complete the standard service implementations

The plugin structure is now ready and should integrate with the existing MultimodalAgent.

Google Gemini Live API TypeScript Plugin

A TypeScript implementation of the Google Gemini Live API for real-time audio conversations with advanced features including function calling, conversation management, and turn detection.

Features

  • Real-time audio streaming with Gemini Live API
  • Function calling and tool integration
  • Advanced conversation management with session.conversation.item.create()
  • Response generation control with session.response.create()
  • Server-side Voice Activity Detection (VAD) with adaptive thresholds
  • Multi-feature speech detection (audio level, energy, zero crossing rate)
  • Event-driven architecture with comprehensive event emission
  • Session management with recovery and error handling

Installation

npm install

Environment Setup

Set your Google API key:

export GOOGLE_API_KEY="your-api-key-here"

Basic Usage

import { llm } from '@livekit/agents';
import { RealtimeModel } from './src/realtime/realtime_model.js';

// Create a realtime model with advanced features
const model = new RealtimeModel({
  model: 'gemini-2.0-flash-live-001',
  voice: 'Puck',
  instructions: 'You are a helpful AI assistant.',
  turnDetection: {
    type: 'server_vad',
    threshold: 0.1,
    silence_duration_ms: 1000
  }
});

// Create a session
const session = model.session({
  fncCtx: {},
  chatCtx: new llm.ChatContext()
});

// Advanced conversation management
session.conversation.item.create({
  role: 'user',
  text: 'Hello, how are you?'
});

// Start response generation
session.response.create();

// Enhanced conversation management
const items = session.conversation.item.list();
console.log('Conversation items:', items);

// Update a conversation item
session.conversation.item.update('msg_1', {
  content: 'Updated message content'
});

// Delete a conversation item
session.conversation.item.delete('msg_1');

// Clear all conversation items
session.conversation.item.clear();
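
To make the semantics of the conversation.item calls concrete, here is a minimal in-memory sketch. It is an illustration only, not the plugin's implementation: ConversationStoreSketch, the msg_N id scheme, and the text field name are assumptions chosen to mirror the calls above.

```typescript
// Illustrative in-memory store; the plugin's real session keeps its own state.
interface ConversationItemSketch {
  id: string;
  role: 'user' | 'assistant';
  text: string;
}

class ConversationStoreSketch {
  private items = new Map<string, ConversationItemSketch>();
  private nextId = 1;

  // Adds an item and assigns it a sequential id (assumed scheme: msg_1, msg_2, ...).
  create(message: { role: 'user' | 'assistant'; text: string }): ConversationItemSketch {
    const item: ConversationItemSketch = { id: 'msg_' + this.nextId++, ...message };
    this.items.set(item.id, item);
    return item;
  }

  // Merges the given fields into an existing item; returns false if the id is unknown.
  update(id: string, updates: { role?: 'user' | 'assistant'; text?: string }): boolean {
    const existing = this.items.get(id);
    if (!existing) return false;
    this.items.set(id, { ...existing, ...updates });
    return true;
  }

  delete(id: string): boolean {
    return this.items.delete(id);
  }

  get(id: string): ConversationItemSketch | undefined {
    return this.items.get(id);
  }

  // Items are returned in creation order because Map preserves insertion order.
  list(): ConversationItemSketch[] {
    return Array.from(this.items.values());
  }

  clear(): void {
    this.items.clear();
  }
}
```

Returning a boolean from update and delete is one possible design; an implementation could equally throw on an unknown id.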

Advanced Turn Detection

The plugin includes sophisticated turn detection with multiple features:

const model = new RealtimeModel({
  turnDetection: {
    type: 'server_vad',
    threshold: 0.1,           // Audio level threshold
    silence_duration_ms: 1000, // Silence duration before turn end
    prefix_padding_ms: 200     // Padding before speech start
  }
});

// Listen for turn detection events (session is created with model.session(), as in Basic Usage)
session.on('turn_detected', (event) => {
  console.log('Turn detected:', event);
  // event.type: 'silence_threshold'
  // event.duration: silence duration in ms
  // event.timestamp: when the turn was detected
});

session.on('input_speech_started', (event) => {
  console.log('Speech started:', event);
  // event.audioLevel: current audio level
  // event.energyLevel: current energy level
  // event.threshold: adaptive threshold used
});
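
The threshold and silence_duration_ms options can be understood through a simplified detector. The sketch below is an assumption-laden illustration, not the plugin's VAD: it uses a fixed RMS threshold rather than an adaptive one, ignores prefix_padding_ms, and all names ending in Sketch are hypothetical.

```typescript
interface TurnDetectionSketch {
  threshold: number; // normalized audio level in 0..1
  silence_duration_ms: number; // silence needed to end the turn
}

type TurnEventSketch = 'input_speech_started' | 'turn_detected' | null;

// Root-mean-square level of 16-bit PCM, normalized to 0..1.
function rmsLevel(samples: Int16Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    const s = samples[i]! / 32768;
    sumSquares += s * s;
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Fraction of adjacent sample pairs whose sign differs.
function zeroCrossingRate(samples: Int16Array): number {
  if (samples.length < 2) return 0;
  let crossings = 0;
  for (let i = 1; i < samples.length; i++) {
    if ((samples[i - 1]! >= 0) !== (samples[i]! >= 0)) crossings++;
  }
  return crossings / (samples.length - 1);
}

class SilenceTurnDetectorSketch {
  private readonly opts: TurnDetectionSketch;
  private speaking = false;
  private silenceMs = 0;

  constructor(opts: TurnDetectionSketch) {
    this.opts = opts;
  }

  // Feeds one frame and reports the event it triggers, if any.
  process(frame: Int16Array, frameDurationMs: number): TurnEventSketch {
    if (rmsLevel(frame) >= this.opts.threshold) {
      this.silenceMs = 0;
      if (!this.speaking) {
        this.speaking = true;
        return 'input_speech_started';
      }
      return null;
    }
    if (!this.speaking) return null; // silence before any speech is ignored
    this.silenceMs += frameDurationMs;
    if (this.silenceMs >= this.opts.silence_duration_ms) {
      this.speaking = false;
      this.silenceMs = 0;
      return 'turn_detected';
    }
    return null;
  }
}
```

A multi-feature detector would combine rmsLevel with zeroCrossingRate (and an energy measure) before comparing against the threshold; how the plugin weights these features is not specified here.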

Function Calling

Register and use tools with the session:

// Register a tool
session.updateTools([
  {
    name: 'get_weather',
    description: 'Get current weather for a location',
    parameters: {
      type: 'object',
      properties: {
        location: { type: 'string' }
      }
    },
    handler: async (args) => {
      const { location } = args;
      // Placeholder result; a real handler would look up the weather for `location`
      return { temperature: '72°F', condition: 'sunny' };
    }
  }
]);

// Listen for tool calls
session.on('toolCall', (toolCall) => {
  console.log('Tool called:', toolCall);
});
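
Conceptually, a tool call is routed to the handler registered under the same name. The following sketch shows one way such dispatch could work; ToolRegistrySketch and its result shape are hypothetical and do not describe the plugin's internals.

```typescript
interface ToolSketch {
  name: string;
  description: string;
  handler: (args: Record<string, unknown>) => Promise<unknown>;
}

type ToolResultSketch =
  | { ok: true; result: unknown }
  | { ok: false; error: string };

class ToolRegistrySketch {
  private tools = new Map<string, ToolSketch>();

  // Replaces the full tool set, mirroring the shape of updateTools(tools).
  updateTools(tools: ToolSketch[]): void {
    this.tools.clear();
    for (const tool of tools) {
      this.tools.set(tool.name, tool);
    }
  }

  // Looks up the handler by name and converts failures into an error result
  // instead of letting them escape to the caller.
  async dispatch(call: { name: string; args: Record<string, unknown> }): Promise<ToolResultSketch> {
    const tool = this.tools.get(call.name);
    if (!tool) {
      return { ok: false, error: 'Unknown tool: ' + call.name };
    }
    try {
      return { ok: true, result: await tool.handler(call.args) };
    } catch (err) {
      return { ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  }
}
```

Returning an error result rather than throwing lets the session report the failure back to the model so it can respond to the user; whether the plugin does this is an assumption.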

Event System

The plugin emits comprehensive events:

// Transcript events
session.on('transcript', (event) => {
  console.log('Transcript:', event.transcript, 'Final:', event.isFinal);
});

// Generation events
session.on('generation_created', (event) => {
  console.log('Generation started:', event.messageId);
});

// Error handling
session.on('error', (error) => {
  console.error('Session error:', error);
});

// Metrics
session.on('metrics_collected', (metrics) => {
  console.log('Usage metrics:', metrics);
});
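
A common use of the transcript event is to build a running transcript from interim and final updates. The sketch below assumes that each interim event replaces the previous interim text and that a final event closes the current segment; this behavior and the TranscriptAccumulatorSketch class are assumptions, not documented plugin behavior.

```typescript
interface TranscriptEventSketch {
  transcript: string;
  isFinal: boolean;
}

class TranscriptAccumulatorSketch {
  private finalized: string[] = [];
  private interim = '';

  // Final events are appended; interim events overwrite the pending segment.
  handle(event: TranscriptEventSketch): void {
    if (event.isFinal) {
      this.finalized.push(event.transcript);
      this.interim = '';
    } else {
      this.interim = event.transcript;
    }
  }

  // Finalized segments followed by the pending interim segment, if any.
  text(): string {
    return this.finalized
      .concat(this.interim)
      .filter((segment) => segment.length > 0)
      .join(' ');
  }
}
```

It could be wired up as session.on('transcript', (event) => accumulator.handle(event)), with text() read whenever the UI needs to refresh.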

Session Management

Advanced session control features:

// Interrupt current generation
session.interrupt();

// Start user activity
session.startUserActivity();

// Truncate conversation at specific message
session.truncate('msg_5', 5000); // Truncate at message 5, audio end at 5s

// Update session options
session.updateOptions({
  temperature: 0.7,
  maxOutputTokens: 1000
});

// Update instructions
session.updateInstructions('You are now a coding assistant.');

// Clear audio buffer
session.clearAudio();

// Commit audio for processing
session.commitAudio();

Audio Processing

Handle audio frames with automatic resampling:

// Push audio frames (automatically resampled)
session.pushAudio(audioFrame);

// Push video frames
session.pushVideo(videoFrame);

// Get current audio buffer
const audioBuffer = session.inputAudioBuffer;
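
To illustrate what resampling a frame involves, here is a minimal linear-interpolation resampler for 16-bit mono PCM. It is a sketch only: the plugin's actual resampler and its target rate are not described here, resampleLinear is a hypothetical name, and no anti-aliasing filter is applied when downsampling.

```typescript
// Resamples mono 16-bit PCM by linear interpolation between neighboring samples.
function resampleLinear(input: Int16Array, fromRate: number, toRate: number): Int16Array {
  if (fromRate <= 0 || toRate <= 0) {
    throw new RangeError('Sample rates must be positive');
  }
  if (input.length === 0 || fromRate === toRate) {
    return input.slice(); // nothing to do; return a copy
  }
  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const output = new Int16Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    // Clamp indices so the last output samples reuse the final input sample.
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = Math.round(input[left]! * (1 - frac) + input[right]! * frac);
  }
  return output;
}
```

For production audio, a resampler with a proper low-pass filter is preferable when reducing the sample rate, since plain interpolation can introduce aliasing.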

Error Recovery

The plugin includes robust error recovery:

// Recover from text response
session.recoverFromTextResponse('item_123');

// Session automatically retries on connection failures
// Exponential backoff with configurable max retries
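
Exponential backoff doubles the wait between reconnection attempts up to a cap. The sketch below computes such a schedule; backoffDelaysMs and its default values are illustrative assumptions, not the plugin's actual settings.

```typescript
// Returns the wait (in ms) before each retry: base, 2x base, 4x base, ...
// capped at maxDelayMs. The defaults are illustrative, not the plugin's values.
function backoffDelaysMs(maxRetries: number, baseDelayMs = 500, maxDelayMs = 30000): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(maxDelayMs, baseDelayMs * Math.pow(2, attempt)));
  }
  return delays;
}
```

With maxRetries set to 5, a base of 500 ms, and a cap of 4000 ms, the schedule is 500, 1000, 2000, 4000, 4000. Adding random jitter to each delay is a common refinement when many clients may reconnect at once.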

Configuration Options

const model = new RealtimeModel({
  // Model configuration
  model: 'gemini-2.0-flash-live-001',
  voice: 'Puck',
  instructions: 'Custom instructions',
  
  // Generation parameters
  temperature: 0.8,
  maxOutputTokens: 1000,
  topP: 0.9,
  topK: 40,
  
  // Turn detection
  turnDetection: {
    type: 'server_vad',
    threshold: 0.1,
    silence_duration_ms: 1000
  },
  
  // Language and location
  language: 'en-US',
  location: 'us-central1',
  
  // VertexAI (optional)
  vertexai: false,
  project: process.env.GOOGLE_CLOUD_PROJECT
});
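
Generation parameters can be sanity-checked before they are passed to the model. The following sketch uses only conservative, generic bounds (non-negative temperature, topP as a probability, positive integer topK and maxOutputTokens); the exact limits accepted by the Gemini API may be narrower, and validateGenerationOptions is a hypothetical helper.

```typescript
interface GenerationOptionsSketch {
  temperature?: number;
  maxOutputTokens?: number;
  topP?: number;
  topK?: number;
}

// Returns a list of human-readable problems; an empty list means the options
// pass these generic checks (the API itself may enforce stricter limits).
function validateGenerationOptions(opts: GenerationOptionsSketch): string[] {
  const problems: string[] = [];
  const { temperature, maxOutputTokens, topP, topK } = opts;
  if (temperature !== undefined && !(temperature >= 0)) {
    problems.push('temperature must be >= 0');
  }
  if (topP !== undefined && !(topP >= 0 && topP <= 1)) {
    problems.push('topP must be between 0 and 1');
  }
  if (topK !== undefined && !(Number.isInteger(topK) && topK > 0)) {
    problems.push('topK must be a positive integer');
  }
  if (maxOutputTokens !== undefined && !(Number.isInteger(maxOutputTokens) && maxOutputTokens > 0)) {
    problems.push('maxOutputTokens must be a positive integer');
  }
  return problems;
}
```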

API Reference

RealtimeModel

  • session(options): Create a new session
  • close(): Close all sessions

RealtimeSession

Conversation Management

  • conversation.item.create(message): Create conversation item
  • conversation.item.update(id, updates): Update conversation item
  • conversation.item.delete(id): Delete conversation item
  • conversation.item.list(): List all conversation items
  • conversation.item.get(id): Get specific conversation item
  • conversation.item.clear(): Clear all conversation items

Response Management

  • response.create(): Start response generation

Audio Processing

  • pushAudio(frame): Push audio frame
  • pushVideo(frame): Push video frame
  • commitAudio(): Commit audio for processing
  • clearAudio(): Clear audio buffer

Session Control

  • interrupt(): Interrupt current generation
  • startUserActivity(): Start user activity
  • truncate(messageId, audioEndMs): Truncate conversation
  • updateOptions(options): Update session options
  • updateInstructions(instructions): Update instructions
  • updateTools(tools): Update available tools

Events

  • on(event, listener): Listen for events
  • off(event, listener): Remove event listener
  • emit(event, ...args): Emit event

Available events:

  • transcript: Text transcript updates
  • error: Error events
  • toolCall: Tool call events
  • generation_created: New generation started
  • input_audio_transcription_completed: Audio transcription completed
  • input_speech_started: Speech started
  • metrics_collected: Usage metrics
  • turn_detected: Turn detection events

License

Apache-2.0


Package Details

  • Weekly downloads: 2,903
  • Version: 1.1.13
  • License: ISC
  • Unpacked size: 176 kB
  • Total files: 27
  • Collaborators: omarimai