Gemini Live Phone

Verified

Bridge Twilio phone calls to Google Gemini Live API for real-time AI voice conversations. No STT/TTS middleware required. Includes VAD and echo suppression.

50 downloads
$ Add to .claude/skills/

About This Skill

# Gemini Live Phone Bridge

Real-time voice AI over phone calls using Google Gemini's native audio capabilities.

Architecture

```
Phone ↔ Twilio ↔ WebSocket (μ-law 8kHz) ↔ Bridge (PCM transcoding) ↔ Gemini Live API (24kHz PCM)
```
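The transcoding stage in the diagram can be sketched in pure Python. This is an illustration under stated assumptions, not the bridge's actual code: the function names are hypothetical, the decoder follows the standard G.711 μ-law expansion, and the downsampler is a crude average-and-decimate step for the 24kHz to 8kHz return path. Upsampling toward Gemini and μ-law re-encoding are left out.

```python
import struct

BIAS = 0x84  # bias constant used by G.711 mu-law


def mulaw_decode_byte(b: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit PCM sample."""
    b = ~b & 0xFF                      # mu-law bytes are stored inverted
    t = ((b & 0x0F) << 3) + BIAS       # mantissa plus bias
    t <<= (b & 0x70) >> 4              # apply the 3-bit exponent
    return BIAS - t if b & 0x80 else t - BIAS


def mulaw_to_pcm16(payload: bytes) -> list[int]:
    """Decode a mu-law payload (e.g. one Twilio frame) to PCM samples."""
    return [mulaw_decode_byte(b) for b in payload]


def downsample_24k_to_8k(samples: list[int]) -> list[int]:
    """Average each group of three samples, then keep one (decimate by 3)."""
    usable = len(samples) - len(samples) % 3   # drop a trailing partial group
    return [sum(samples[i:i + 3]) // 3 for i in range(0, usable, 3)]


def pcm16_to_bytes(samples: list[int]) -> bytes:
    """Pack samples as little-endian signed 16-bit PCM."""
    return struct.pack(f"<{len(samples)}h", *samples)
```

A production bridge would likely use a proper resampling filter; the averaging step here only keeps the example short.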

Quick Start

```bash
# Set required env vars
export GOOGLE_API_KEY="your-key"
export TWILIO_AUTH_TOKEN="your-token"

# Run the bridge
python scripts/bridge.py --port 3335
```

Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/gemini-live/status` | GET | Health check + active calls |
| `/gemini-live/incoming` | POST | TwiML for inbound calls (Twilio webhook) |
| `/gemini-live/stream` | WS | Twilio Media Stream WebSocket |
| `/gemini-live/call` | POST | Initiate outbound call |
| `/gemini-live/twiml` | POST | TwiML for outbound calls |
| `/gemini-live/call-status` | POST | Twilio call status webhook |
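For orientation, here is a minimal sketch of the kind of message handling the `/gemini-live/stream` WebSocket implies. It assumes Twilio's Media Streams JSON framing (an `event` field, a `streamSid`, and base64 audio under `media.payload`); the helper names are hypothetical and this is not taken from the bridge's source.

```python
import base64
import json


def parse_twilio_message(raw: str):
    """Return (event, stream_sid, audio_bytes) for one Media Streams message."""
    msg = json.loads(raw)
    event = msg.get("event")
    stream_sid = msg.get("streamSid")   # absent on the initial "connected" event
    audio = b""
    if event == "media":
        # Audio arrives as base64-encoded mu-law bytes.
        audio = base64.b64decode(msg["media"]["payload"])
    return event, stream_sid, audio


def build_twilio_media(stream_sid: str, mulaw_audio: bytes) -> str:
    """Wrap outgoing mu-law audio in a "media" message for Twilio."""
    payload = base64.b64encode(mulaw_audio).decode("ascii")
    return json.dumps(
        {"event": "media", "streamSid": stream_sid, "media": {"payload": payload}}
    )
```

Keeping parsing and framing as pure functions makes them easy to test without opening a socket.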

Outbound Call API

```bash
curl -X POST https://your-domain/gemini-live/call \
  -H 'Content-Type: application/json' \
  -d '{"to": "+1234567890", "greeting": "Hello! This is Marcia."}'
```
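The same request can be built from Python with only the standard library. This is a sketch: `build_call_request` is a hypothetical helper, and the response format of `/gemini-live/call` is not documented here, so the example stops at constructing the request.

```python
import json
import urllib.request


def build_call_request(base_url: str, to: str, greeting: str) -> urllib.request.Request:
    """Build the POST request for the outbound call endpoint."""
    body = json.dumps({"to": to, "greeting": greeting}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/gemini-live/call",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_call_request(
    "https://your-domain", "+1234567890", "Hello! This is Marcia."
)
# To send it: urllib.request.urlopen(req)
```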

Configuration

All settings via CLI args or environment variables:

Core

- `--model`: Gemini model (default: `gemini-2.5-flash-native-audio-latest`)
- `--voice`: Gemini voice: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr (default: Kore)
- `--from-number`: Twilio outbound number (default: env `TWILIO_FROM`)
- `--system-prompt`: AI persona system prompt
- `--max-duration`: Max call seconds (default: 300)
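One way the core flags and their defaults could be declared is with `argparse`, falling back to the environment where the list says so. This is a sketch, not the bridge's actual parser: `build_parser` is a hypothetical name, and the `None` default for `--system-prompt` is an assumption because no default is documented.

```python
import argparse
import os

VOICES = ["Puck", "Charon", "Kore", "Fenrir", "Aoede", "Leda", "Orus", "Zephyr"]


def build_parser() -> argparse.ArgumentParser:
    """Declare the core options with the defaults listed above."""
    p = argparse.ArgumentParser(description="Gemini Live phone bridge (sketch)")
    p.add_argument("--model", default="gemini-2.5-flash-native-audio-latest")
    p.add_argument("--voice", default="Kore", choices=VOICES)
    # Falls back to the TWILIO_FROM environment variable when the flag is absent.
    p.add_argument("--from-number", default=os.environ.get("TWILIO_FROM"))
    # Assumption: no documented default, so None is used here.
    p.add_argument("--system-prompt", default=None)
    p.add_argument("--max-duration", type=int, default=300)
    return p
```

For example, `build_parser().parse_args(["--voice", "Puck"])` yields a namespace with `voice == "Puck"` and every other option at its default.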

VAD (Voice Activity Detection)

- `--vad-enabled` / `--no-vad`: Toggle server-side VAD (default: on)
- `--vad-silence-ms`: Silence duration to trigger activityEnd (default: 500)
- `--vad-energy-threshold`: RMS energy threshold (default: 0.01)
- `--vad-speech-min-ms`: Min speech duration before activityStart (default: 100)
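To make the interaction of these three VAD parameters concrete, here is a minimal energy-based detector. It is a sketch under assumptions, not the bridge's implementation: the class name `EnergyVad` is hypothetical, frames are assumed to be 20ms of signed 16-bit PCM, and RMS is normalized to the 0..1 range so that it is comparable to a threshold like 0.01.

```python
import math
from typing import Optional, Sequence


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square energy of 16-bit samples, normalized to 0..1."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0


class EnergyVad:
    """Emit "activityStart" / "activityEnd" from per-frame energy."""

    def __init__(self, threshold=0.01, silence_ms=500, speech_min_ms=100, frame_ms=20):
        self.threshold = threshold
        self.silence_ms = silence_ms
        self.speech_min_ms = speech_min_ms
        self.frame_ms = frame_ms          # assumed frame duration
        self.speaking = False
        self._speech_ms = 0               # consecutive loud time while idle
        self._silence_ms = 0              # consecutive quiet time while speaking

    def process(self, samples: Sequence[int]) -> Optional[str]:
        loud = rms(samples) >= self.threshold
        if not self.speaking:
            self._speech_ms = self._speech_ms + self.frame_ms if loud else 0
            if self._speech_ms >= self.speech_min_ms:
                self.speaking = True
                self._silence_ms = 0
                return "activityStart"
        else:
            self._silence_ms = 0 if loud else self._silence_ms + self.frame_ms
            if self._silence_ms >= self.silence_ms:
                self.speaking = False
                self._speech_ms = 0
                return "activityEnd"
        return None
```

With the defaults, five consecutive loud 20ms frames trigger `activityStart`, and 25 consecutive quiet frames afterwards trigger `activityEnd`.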

Echo Suppression

- `--echo-multiplier`: VAD threshold multiplier during agent speech (default: 3.0)
- `--echo-decay-ms`: Decay time after agent stops speaking (default: 300)
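The two echo-suppression settings can be read as a scale factor applied to the VAD energy threshold. The sketch below assumes a linear decay back to 1.0 once the agent stops speaking; the actual decay curve is not documented, and `echo_threshold_scale` is a hypothetical name.

```python
def echo_threshold_scale(
    agent_speaking: bool,
    ms_since_agent_stopped: float,
    multiplier: float = 3.0,
    decay_ms: float = 300.0,
) -> float:
    """Factor by which to multiply the VAD energy threshold."""
    if agent_speaking:
        return multiplier                      # full suppression during playback
    if ms_since_agent_stopped >= decay_ms:
        return 1.0                             # decay finished, normal threshold
    # Assumption: linear ramp from `multiplier` down to 1.0 over `decay_ms`.
    remaining = 1.0 - ms_since_agent_stopped / decay_ms
    return 1.0 + (multiplier - 1.0) * remaining


# Effective threshold halfway through the decay window, with a base of 0.01:
effective = 0.01 * echo_threshold_scale(False, 150)
```

Raising the threshold while the agent talks makes it less likely that the agent's own audio, echoed back over the phone line, is mistaken for caller speech.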

Twilio Setup

  1. Buy a phone number on Twilio
  2. Set Voice webhook: `https://your-domain/gemini-live/incoming` (HTTP POST)
  3. Set Call status URL: `https://your-domain/gemini-live/call-status` (HTTP POST)
  4. Ensure geo-permissions are enabled for target countries
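When Twilio hits the Voice webhook, it expects TwiML in response. A plausible shape for that response is a `<Connect><Stream>` document pointing at the bridge's WebSocket endpoint. The sketch below builds one with the standard library; what the bridge actually returns is an assumption, and `incoming_twiml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def incoming_twiml(stream_url: str) -> str:
    """Build TwiML that connects the call audio to a Media Stream WebSocket."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=stream_url)
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


twiml = incoming_twiml("wss://your-domain/gemini-live/stream")
```

The stream URL uses `wss://` because Twilio Media Streams connect over secure WebSockets.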

Network Requirements

The bridge must be accessible from the internet (Twilio connects to it). Recommended: Caddy reverse proxy with WebSocket support.

```
# Caddy config example
handle /gemini-live/* {
    reverse_proxy localhost:3335 {
        flush_interval -1
        transport http {
            read_timeout 0
            write_timeout 0
        }
    }
}
```

Performance

Latency benchmarks (Gemini 2.5 Flash Native Audio):

| Config | Median | Min | Max |
|---|---|---|---|
| No VAD, 200ms buffer | 3,660ms | 2,360ms | 5,180ms |
| Server VAD, 50ms buffer | 2,500ms | 2,080ms | 6,980ms |

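In these benchmarks, the server-side VAD configuration (which also uses a smaller 50ms buffer) reduces median latency by ~32%, although its maximum observed latency is higher.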
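The ~32% figure follows directly from the two median values in the table, as this short check shows (the variable names are only for illustration):

```python
no_vad_median_ms = 3660       # "No VAD, 200ms buffer" median
server_vad_median_ms = 2500   # "Server VAD, 50ms buffer" median

# Relative reduction in median latency, as a percentage.
reduction_pct = (no_vad_median_ms - server_vad_median_ms) / no_vad_median_ms * 100
print(f"{reduction_pct:.1f}%")  # prints 31.7%
```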

Use Cases

  • Build live phone conversation applications powered by Gemini's voice capabilities
  • Create voice assistants that use Gemini for real-time speech understanding
  • Implement phone-based customer service bots with Gemini's natural language processing
  • Design interactive voice response systems with AI-powered conversation flow
  • Prototype voice-first applications using Gemini's audio processing

Pros & Cons

Pros

  • +Real-time voice processing enables natural phone conversation experiences
  • +Gemini's multimodal capabilities support both voice and text interactions
  • +Phone-based interface reaches users without app or web access

Cons

  • -Voice applications require telephony infrastructure beyond the Gemini API
  • -Only available on claude-code and openclaw platforms
  • -Real-time voice processing latency can affect conversation naturalness

FAQ

What does Gemini Live Phone do?
Bridge Twilio phone calls to Google Gemini Live API for real-time AI voice conversations. No STT/TTS middleware required. Includes VAD and echo suppression.
What platforms support Gemini Live Phone?
Gemini Live Phone is available on Claude Code and OpenClaw.
What are the use cases for Gemini Live Phone?
Build live phone conversation applications powered by Gemini's voice capabilities. Create voice assistants that use Gemini for real-time speech understanding. Implement phone-based customer service bots with Gemini's natural language processing.
