Building with Gemini Live

Gemini Live is Google’s speech-to-speech API that enables natural, real-time voice conversations with AI. With Pipecat, you can build production-ready voice agents that leverage Gemini Live for telephony, web, and mobile applications.

API Reference: Gemini Live service documentation
Pipecat CLI: Scaffold and deploy projects

Capabilities

Pipecat’s Gemini Live integration supports multiple modalities and deployment targets:

Voice: Real-time speech-to-speech conversations with natural turn-taking and voice activity detection
Vision: Process video and screenshare alongside audio for multimodal interactions
Telephony: Build phone-based voice agents with Twilio WebSocket integration
Tool Use: Function calling support for external integrations and dynamic responses

Architecture

Pipecat manages the connections between your client and Gemini Live. The Pipecat server streams media to and from clients over WebRTC (web and mobile) or WebSockets (telephony), and keeps a persistent connection open to Gemini Live for real-time AI processing.
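The flow described above can be sketched without any Pipecat code. The following is a minimal illustration only: `TRANSPORT_BY_CLIENT`, `FakeLiveSession`, and `run_session` are hypothetical names invented for this sketch, not Pipecat or Gemini Live APIs. It shows the two ideas in the paragraph: the transport depends on the client type, and one connection to the model is opened once and reused for the whole session.

```python
import asyncio

# Hypothetical mapping taken from the paragraph above: which media transport
# the server uses for each kind of client.
TRANSPORT_BY_CLIENT = {
    "web": "webrtc",
    "mobile": "webrtc",
    "telephony": "websocket",
}


class FakeLiveSession:
    """Stand-in for the persistent Gemini Live connection (not a real API)."""

    def __init__(self) -> None:
        self.connections_opened = 0

    async def connect(self) -> None:
        self.connections_opened += 1

    async def send_audio(self, chunk: bytes) -> bytes:
        # A real session would return model-generated speech; this one
        # simply tags the input so the round trip is visible.
        return b"reply:" + chunk


async def run_session(client_type: str, incoming: list[bytes]) -> dict:
    transport = TRANSPORT_BY_CLIENT[client_type]
    session = FakeLiveSession()
    await session.connect()  # opened once, reused for every chunk
    outgoing = [await session.send_audio(chunk) for chunk in incoming]
    return {
        "transport": transport,
        "outgoing": outgoing,
        "connections": session.connections_opened,
    }


result = asyncio.run(run_session("telephony", [b"a", b"b"]))
print(result)
```

In a real project the transport and the Gemini Live service are provided by Pipecat; the API Reference lists the actual class names and parameters.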

Quick Start

The fastest way to start building is with the Pipecat CLI:

# Install the CLI
uv tool install pipecat-ai-cli

# Create a new project
pipecat init

The CLI will guide you through selecting:

Bot type: Gemini Live (speech-to-speech)
Transport: Daily WebRTC, Twilio, or others
Deployment target: Local development or Pipecat Cloud

All CLI commands can use either pipecat or the shorter pc alias.

Starter Projects

These complete examples demonstrate Gemini Live in production scenarios. Each includes local development setup and Pipecat Cloud deployment configuration.

Phone Bot (Twilio)

A telephone-based voice agent using Gemini Live with Twilio WebSockets. The demo plays “Two Truths and a Lie” to showcase natural conversation flow.

Phone Bot Starter: Build a production phone agent with Twilio integration

Try it now: Call 1-970-LIVE-API (1-970-548-3274) to talk to a live demo.

What you’ll learn:

Twilio WebSocket transport configuration
Google STT/TTS integration alongside Gemini Live
TwiML setup for incoming calls
Pipecat Cloud deployment with telephony
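As a rough idea of what the TwiML step involves: Twilio answers an incoming call by fetching a TwiML document, and a `<Connect><Stream>` element tells it to open a bidirectional media WebSocket to your server. The sketch below builds such a document with Python's standard library. The helper name `build_stream_twiml` and the URL `wss://example.com/ws` are placeholders chosen for this example; the starter project defines its own endpoint.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(websocket_url: str) -> str:
    """Return TwiML that connects the call's audio to a WebSocket server."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # The url attribute must point at the bot's public wss:// endpoint.
    ET.SubElement(connect, "Stream", url=websocket_url)
    return ET.tostring(response, encoding="unicode")


# Placeholder URL; replace with the address of your own bot server.
twiml = build_stream_twiml("wss://example.com/ws")
print(twiml)
```

The resulting string can be pasted into a TwiML Bin or returned from the webhook that Twilio calls for the phone number.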

Web Bot (Vision)

A browser-based agent with screensharing and vision capabilities, built with the Pipecat Voice UI Kit and Daily WebRTC transport.

Web Bot Starter: Build a web agent with vision and screensharing

What you’ll learn:

Daily WebRTC transport for web clients
Vision/screenshare processing with Gemini Live
Next.js client with Voice UI Kit components
Resizable panels and event logging

Deployment

Both starter projects include configuration for Pipecat Cloud, which handles scaling, monitoring, and global deployment.

# Authenticate with Pipecat Cloud
pipecat cloud auth login

# Build and push your Docker image
pipecat cloud docker build-push

# Deploy your agent
pipecat cloud deploy

Each starter includes a pcc-deploy.toml file with sensible defaults for agent configuration and scaling.

Pipecat Cloud Deployment: Learn more about deploying to production

Next Steps

Function Calling: Add external integrations and dynamic responses
React Client SDK: Build custom web interfaces
Telephony Guide: Deep dive into phone integrations
Core Concepts: Understand Pipecat pipelines and processors
