Gemini Live Phone
VerifiedBridge Twilio phone calls to Google Gemini Live API for real-time AI voice conversations. No STT/TTS middleware required. Includes VAD and echo suppression.
$ Add to .claude/skills/ About This Skill
# Gemini Live Phone Bridge
Real-time voice AI over phone calls using Google Gemini's native audio capabilities.
Architecture
``` Phone ↔ Twilio ↔ WebSocket (μ-law 8kHz) ↔ Bridge (PCM transcoding) ↔ Gemini Live API (24kHz PCM) ```
Quick Start
```bash # Set required env vars export GOOGLE_API_KEY="your-key" export TWILIO_AUTH_TOKEN="your-token"
# Run the bridge python scripts/bridge.py --port 3335 ```
Endpoints
| Endpoint | Method | Description | |---|---|---| | `/gemini-live/status` | GET | Health check + active calls | | `/gemini-live/incoming` | POST | TwiML for inbound calls (Twilio webhook) | | `/gemini-live/stream` | WS | Twilio Media Stream WebSocket | | `/gemini-live/call` | POST | Initiate outbound call | | `/gemini-live/twiml` | POST | TwiML for outbound calls | | `/gemini-live/call-status` | POST | Twilio call status webhook |
Outbound Call API
```bash curl -X POST https://your-domain/gemini-live/call \ -H 'Content-Type: application/json' \ -d '{"to": "+1234567890", "greeting": "Hello! This is Marcia."}' ```
Configuration
All settings via CLI args or environment variables:
Core - `--model` — Gemini model (default: `gemini-2.5-flash-native-audio-latest`) - `--voice` — Gemini voice: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr (default: Kore) - `--from-number` — Twilio outbound number (default: env `TWILIO_FROM`) - `--system-prompt` — AI persona system prompt - `--max-duration` — Max call seconds (default: 300)
VAD (Voice Activity Detection) - `--vad-enabled` / `--no-vad` — Toggle server-side VAD (default: on) - `--vad-silence-ms` — Silence duration to trigger activityEnd (default: 500) - `--vad-energy-threshold` — RMS energy threshold (default: 0.01) - `--vad-speech-min-ms` — Min speech duration before activityStart (default: 100)
Echo Suppression - `--echo-multiplier` — VAD threshold multiplier during agent speech (default: 3.0) - `--echo-decay-ms` — Decay time after agent stops speaking (default: 300)
Twilio Setup
- Buy a phone number on Twilio
- Set Voice webhook: `https://your-domain/gemini-live/incoming` (HTTP POST)
- Set Call status URL: `https://your-domain/gemini-live/call-status` (HTTP POST)
- Ensure geo-permissions are enabled for target countries
Network Requirements
The bridge must be accessible from the internet (Twilio connects to it). Recommended: Caddy reverse proxy with WebSocket support.
``` # Caddy config example handle /gemini-live/* { reverse_proxy localhost:3335 { flush_interval -1 transport http { read_timeout 0 write_timeout 0 } } } ```
Performance
Latency benchmarks (Gemini 2.5 Flash Native Audio):
| Config | Median | Min | Max | |---|---|---|---| | No VAD, 200ms buffer | 3,660ms | 2,360ms | 5,180ms | | Server VAD, 50ms buffer | 2,500ms | 2,080ms | 6,980ms |
Server-side VAD reduces median latency by ~32%.
Use Cases
- Build live phone conversation applications powered by Gemini's voice capabilities
- Create voice assistants that use Gemini for real-time speech understanding
- Implement phone-based customer service bots with Gemini's natural language processing
- Design interactive voice response systems with AI-powered conversation flow
- Prototype voice-first applications using Gemini's audio processing
Pros & Cons
Pros
- +Real-time voice processing enables natural phone conversation experiences
- +Gemini's multimodal capabilities support both voice and text interactions
- +Phone-based interface reaches users without app or web access
Cons
- -Voice applications require telephony infrastructure beyond the Gemini API
- -Only available on claude-code and openclaw platforms
- -Real-time voice processing latency can affect conversation naturalness
FAQ
What does Gemini Live Phone do?
What platforms support Gemini Live Phone?
What are the use cases for Gemini Live Phone?
100+ free AI tools
Writing, PDF, image, and developer tools — all in your browser.
Next Step
Use the skill detail page to evaluate fit and install steps. For a direct browser workflow, move into a focused tool route instead of staying in broader support surfaces.