Video Understanding

Verified

Analyze and summarize videos from 1000+ sites using Google Gemini AI, providing transcripts, descriptions, summaries, and answers to questions.

About This Skill

# Video Understanding (Gemini)

Analyze videos using Google Gemini's multimodal video understanding. Supports 1000+ video sources via yt-dlp.

Requirements

  • `yt-dlp` — `brew install yt-dlp` / `pip install yt-dlp`
  • `ffmpeg` — `brew install ffmpeg` (for merging video+audio streams)
  • `GEMINI_API_KEY` environment variable

Default Output

Returns structured JSON with these fields:

  • `transcript`: verbatim transcript with `[MM:SS]` timestamps
  • `description`: visual description (people, setting, UI, text on screen, flow)
  • `summary`: 2-3 sentence summary
  • `duration_seconds`: estimated duration
  • `speakers`: identified speakers
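As a rough illustration of consuming this output, the sketch below parses a result and converts the `[MM:SS]` markers to seconds. The sample payload is invented for the example, and it assumes `transcript` is a single string with one timestamped line per utterance; the real shape of each field may differ. The helper names `timestamp_to_seconds` and `transcript_entries` are hypothetical.

```python
import json
import re
from typing import List, Tuple

# Hypothetical sample shaped like the fields listed above; real output may differ.
SAMPLE_OUTPUT = """
{
  "transcript": "[00:00] Welcome to the demo.\\n[01:15] Here is the dashboard.",
  "description": "A presenter walks through a web dashboard.",
  "summary": "A short product demo.",
  "duration_seconds": 90,
  "speakers": ["Presenter"]
}
"""

# Matches lines such as "[01:15] Here is the dashboard."
TIMESTAMP_LINE = re.compile(r"^\[(\d{1,3}):(\d{2})\]\s*(.*)$")


def timestamp_to_seconds(minutes: str, seconds: str) -> int:
    """Convert the two parts of an MM:SS marker into a second offset."""
    return int(minutes) * 60 + int(seconds)


def transcript_entries(transcript: str) -> List[Tuple[int, str]]:
    """Split a transcript into (offset_seconds, text) pairs, skipping unmarked lines."""
    entries = []
    for line in transcript.splitlines():
        match = TIMESTAMP_LINE.match(line.strip())
        if match:
            minutes, seconds, text = match.groups()
            entries.append((timestamp_to_seconds(minutes, seconds), text))
    return entries


result = json.loads(SAMPLE_OUTPUT)
entries = transcript_entries(result["transcript"])
for offset, text in entries:
    print(f"{offset:>4}s  {text}")
```

Second offsets are easier to sort, search, and hand to a video player than the raw `[MM:SS]` strings.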

Usage

Analyze a video (structured JSON output)

```bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>"
```

Ask a question (adds "answer" field)

```bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>" -q "What product is shown?"
```

Override prompt entirely

```bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>" -p "Custom prompt" --raw
```

Download only (no analysis)

```bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>" --download-only -o video.mp4
```

Options

| Flag | Description | Default |
|------|-------------|---------|
| `-q` / `--question` | Question to answer (added to default fields) | none |
| `-p` / `--prompt` | Override entire prompt (ignores `-q`) | structured JSON |
| `-m` / `--model` | Gemini model | gemini-2.5-flash |
| `-o` / `--output` | Save output to file | stdout |
| `--keep` | Keep downloaded video file | false |
| `--download-only` | Download only, skip analysis | false |
| `--max-size` | Max file size in MB | 500 |
| `--raw` | Raw text output instead of JSON | false |
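When calling the script from another program, it can help to assemble the argument list in one place. The following is a minimal sketch that uses only the flags listed above; the `build_command` helper, the example script path, and the example URL are hypothetical and not part of the skill.

```python
import shlex
from typing import List, Optional


def build_command(
    script_path: str,
    url: str,
    question: Optional[str] = None,
    prompt: Optional[str] = None,
    model: Optional[str] = None,
    output: Optional[str] = None,
    max_size_mb: Optional[int] = None,
    keep: bool = False,
    download_only: bool = False,
    raw: bool = False,
) -> List[str]:
    """Assemble an argv list using only the flags from the options table."""
    cmd = ["uv", "run", script_path, url]
    # Value-taking flags are added only when set, so the script's defaults apply otherwise.
    for flag, value in (
        ("-q", question),
        ("-p", prompt),
        ("-m", model),
        ("-o", output),
        ("--max-size", max_size_mb),
    ):
        if value is not None:
            cmd += [flag, str(value)]
    # Boolean switches carry no value.
    for flag, enabled in (
        ("--keep", keep),
        ("--download-only", download_only),
        ("--raw", raw),
    ):
        if enabled:
            cmd.append(flag)
    return cmd


# Hypothetical script location and URL, for illustration only.
cmd = build_command(
    "scripts/analyze_video.py",
    "https://example.com/video",
    question="What product is shown?",
    max_size_mb=200,
    keep=True,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Keeping it as a list rather than a single string avoids shell-quoting problems with URLs and questions that contain spaces.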

How It Works

  1. YouTube URLs → Passed directly to Gemini (no download needed)
  2. All other URLs → Downloaded via yt-dlp → uploaded to Gemini File API → poll until processed
  3. Gemini analyzes video with structured prompt → returns JSON
  4. Temp files and Gemini uploads cleaned up automatically
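The routing and polling steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: the hostname set, the `is_youtube_url`, `plan_steps`, and `wait_until_processed` helpers, and the `"PROCESSING"` state string are all assumptions, and the state lookup is injected as a plain callable so the example runs without any API access.

```python
import time
from typing import Callable, List
from urllib.parse import urlparse

# Assumed set of YouTube hostnames; the script's real check is not shown in this document.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Return True when the URL's hostname is one of the assumed YouTube hosts."""
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS


def plan_steps(url: str) -> List[str]:
    """Describe which path a URL would take through the pipeline."""
    if is_youtube_url(url):
        return ["pass URL directly to Gemini", "analyze"]
    return [
        "download with yt-dlp",
        "upload to Gemini File API",
        "poll until processed",
        "analyze",
        "clean up temp file and upload",
    ]


def wait_until_processed(
    get_state: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> str:
    """Call get_state until it stops returning "PROCESSING" (assumed state name)."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state != "PROCESSING":
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("file was still processing when the timeout expired")
        time.sleep(interval_s)


# Simulated upload that finishes processing on the third check.
fake_states = iter(["PROCESSING", "PROCESSING", "ACTIVE"])
final_state = wait_until_processed(lambda: next(fake_states), interval_s=0)
print(plan_steps("https://vimeo.com/123456"))
print(final_state)
```

Injecting `get_state` keeps the polling loop independent of any particular client library, and the timeout prevents a stalled upload from blocking indefinitely.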

Supported Sources

Any URL supported by yt-dlp: Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, Dailymotion, Twitch, and 1000+ more.

Tips

  • Use `-q` for targeted questions on top of the full analysis
  • YouTube is fastest (no download step)
  • Large videos (10min+) work fine: the Gemini File API supports up to 2GB (free) / 20GB (paid), but note that `--max-size` defaults to 500 MB
  • The script auto-installs Python dependencies via `uv`

Use Cases

  • Analyze and summarize video content from 1000+ supported websites
  • Generate transcripts from video audio for searchability and documentation
  • Answer specific questions about video content using AI analysis
  • Create video descriptions and summaries for content cataloging
  • Extract key moments and topics from long-form video content

Pros & Cons

Pros

  • Wide platform support: works with 1000+ video hosting sites
  • Multiple output types: transcripts, descriptions, summaries, and Q&A
  • Powered by Google Gemini AI for high-quality video understanding

Cons

  • Requires Google Gemini API access and credentials
  • Processing long videos may be slow and token-intensive

FAQ

What does Video Understanding do?

It analyzes and summarizes videos from 1000+ sites using Google Gemini AI, providing transcripts, descriptions, summaries, and answers to questions.

What platforms support Video Understanding?

Video Understanding is available on Claude Code and OpenClaw.

What are the use cases for Video Understanding?

Analyzing and summarizing video content from 1000+ supported websites, generating transcripts from video audio for searchability and documentation, and answering specific questions about video content using AI analysis.
