Video Understanding


Analyze and summarize videos from 1000+ sites using Google Gemini AI, providing transcripts, descriptions, summaries, and answers to questions.


$ Add to .claude/skills/

$ openclaw install

About This Skill

# Video Understanding (Gemini)

Analyze videos using Google Gemini's multimodal video understanding. Supports 1000+ video sources via yt-dlp.

Requirements

`yt-dlp` — `brew install yt-dlp` / `pip install yt-dlp`
`ffmpeg` — `brew install ffmpeg` (for merging video+audio streams)
`GEMINI_API_KEY` environment variable
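
A quick preflight check can confirm the three requirements above before running the script. The following is a minimal sketch, not part of the skill itself: the helper name `missing_requirements` and its injectable `env` / `which` parameters are assumptions made for illustration.

```python
import os
import shutil
from typing import Callable, Mapping, Optional

REQUIRED_TOOLS = ("yt-dlp", "ffmpeg")
REQUIRED_ENV = ("GEMINI_API_KEY",)


def missing_requirements(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the names of required tools or variables that are not available."""
    # A tool counts as missing when it cannot be found on PATH.
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    # A variable counts as missing when it is unset or empty.
    missing += [name for name in REQUIRED_ENV if not env.get(name)]
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    print("All requirements found." if not problems else "Missing: " + ", ".join(problems))
```

Passing `env` and `which` as parameters keeps the check easy to exercise without touching the real environment.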

Default Output

Returns structured JSON:
transcript — Verbatim transcript with `[MM:SS]` timestamps
description — Visual description (people, setting, UI, text on screen, flow)
summary — 2-3 sentence summary
duration_seconds — Estimated duration
speakers — Identified speakers
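
To make the output shape concrete, here is a hedged sketch of consuming the JSON in Python. The sample values are invented, and the field types (a single transcript string with one `[MM:SS]` entry per line, a list of speaker names) are assumptions; the actual output may differ. `iter_transcript` is a hypothetical helper.

```python
import json
import re
from typing import Iterator

# Matches a line such as "[01:12] Here is the dashboard."
TIMESTAMP_LINE = re.compile(r"\[(\d{1,3}):(\d{2})\]\s*(.*)")


def iter_transcript(transcript: str) -> Iterator[tuple[int, str]]:
    """Yield (offset_in_seconds, text) for each [MM:SS]-prefixed line."""
    for line in transcript.splitlines():
        match = TIMESTAMP_LINE.match(line.strip())
        if match:
            minutes, seconds, text = match.groups()
            yield int(minutes) * 60 + int(seconds), text


# Invented sample shaped like the documented fields; not real script output.
raw = json.dumps({
    "transcript": "[00:05] Welcome to the demo.\n[01:12] Here is the dashboard.",
    "description": "A presenter shares a browser window showing a dashboard.",
    "summary": "A short product walkthrough.",
    "duration_seconds": 95,
    "speakers": ["Presenter"],
})

result = json.loads(raw)
for offset, text in iter_transcript(result["transcript"]):
    print(f"{offset:>4}s  {text}")
```

Converting timestamps to seconds makes it straightforward to sort, filter, or link transcript lines to positions in the video.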

Usage

Analyze a video (structured JSON output)

```bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>"
```

Ask a question (adds "answer" field)

```bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>" -q "What product is shown?"
```

Override prompt entirely

```bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>" -p "Custom prompt" --raw
```

Download only (no analysis)

```bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>" --download-only -o video.mp4
```

Options

| Flag | Description | Default |
|------|-------------|---------|
| `-q` / `--question` | Question to answer (added to default fields) | none |
| `-p` / `--prompt` | Override entire prompt (ignores `-q`) | structured JSON |
| `-m` / `--model` | Gemini model | gemini-2.5-flash |
| `-o` / `--output` | Save output to file | stdout |
| `--keep` | Keep downloaded video file | false |
| `--download-only` | Download only, skip analysis | false |
| `--max-size` | Max file size in MB | 500 |
| `--raw` | Raw text output instead of JSON | false |
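
When calling the script from other Python code, the flags can be assembled programmatically. This is a sketch under assumptions: `build_command` is a hypothetical helper, `SCRIPT` stands in for `{baseDir}/scripts/analyze_video.py`, and `--max-size` is assumed to take its value as a separate argument.

```python
from typing import Optional

# Placeholder path; substitute the skill's actual {baseDir}.
SCRIPT = "scripts/analyze_video.py"


def build_command(
    url: str,
    question: Optional[str] = None,
    prompt: Optional[str] = None,
    model: Optional[str] = None,
    output: Optional[str] = None,
    keep: bool = False,
    download_only: bool = False,
    max_size_mb: Optional[int] = None,
    raw: bool = False,
) -> list[str]:
    """Build an argv list for the script using the documented flags."""
    cmd = ["uv", "run", SCRIPT, url]
    if prompt is not None:
        # -p overrides the whole prompt and ignores -q, so -q is dropped here.
        cmd += ["-p", prompt]
    elif question is not None:
        cmd += ["-q", question]
    if model is not None:
        cmd += ["-m", model]
    if output is not None:
        cmd += ["-o", output]
    if keep:
        cmd.append("--keep")
    if download_only:
        cmd.append("--download-only")
    if max_size_mb is not None:
        cmd += ["--max-size", str(max_size_mb)]
    if raw:
        cmd.append("--raw")
    return cmd


print(build_command("https://example.com/video", question="What product is shown?"))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`; building a list rather than a shell string avoids quoting problems with URLs and prompts.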

How It Works

YouTube URLs → Passed directly to Gemini (no download needed)
All other URLs → Downloaded via yt-dlp → uploaded to Gemini File API → poll until processed
Gemini analyzes video with structured prompt → returns JSON
Temp files and Gemini uploads cleaned up automatically
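
The routing and polling steps above can be sketched as follows. This is an illustration only, not the script's actual code: the host list, the helper names, and the use of the state strings `"PROCESSING"` and `"ACTIVE"` are assumptions, and `get_state` stands in for whatever call reports the uploaded file's status.

```python
import time
from typing import Callable
from urllib.parse import urlparse

# Assumed set of hosts treated as YouTube; the real script may check differently.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Return True when the URL can skip the download step."""
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS


def wait_until_processed(
    get_state: Callable[[], str],
    poll_seconds: float = 2.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it stops returning "PROCESSING" or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state != "PROCESSING":
            return state  # e.g. "ACTIVE" on success
        if time.monotonic() >= deadline:
            raise TimeoutError("uploaded file was still processing at the deadline")
        sleep(poll_seconds)


for url in ("https://youtu.be/abc123", "https://vimeo.com/123456"):
    route = "pass URL directly" if is_youtube_url(url) else "download, upload, poll"
    print(f"{url} -> {route}")
```

Injecting `get_state` and `sleep` keeps the polling loop independent of any particular client library and easy to test.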

Supported Sources

Any URL supported by yt-dlp: Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, Dailymotion, Twitch, and 1000+ more.

Tips

Use `-q` for targeted questions on top of the full analysis
YouTube is fastest (no download step)
Large videos (10min+) work fine: the Gemini File API supports up to 2GB (free) / 20GB (paid), though the script's own `--max-size` limit defaults to 500 MB
The script auto-installs Python dependencies via `uv`

Use Cases

Analyze and summarize video content from 1000+ supported websites
Generate transcripts from video audio for searchability and documentation
Answer specific questions about video content using AI analysis
Create video descriptions and summaries for content cataloging
Extract key moments and topics from long-form video content

Pros & Cons

Pros

+ Wide platform support: works with 1000+ video hosting sites
+ Multiple output types: transcripts, descriptions, summaries, and Q&A
+ Powered by Google Gemini AI for high-quality video understanding

Cons

- Requires Google Gemini API access and credentials
- Processing long videos may be slow and token-intensive

FAQ


What platforms support Video Understanding?

Video Understanding is available on Claude Code and OpenClaw.

