Video Understanding
Analyze and summarize videos from 1000+ sites using Google Gemini AI, providing transcripts, descriptions, summaries, and answers to questions.
# Video Understanding (Gemini)
Analyze videos using Google Gemini's multimodal video understanding. Supports 1000+ video sources via yt-dlp.
Requirements
- `yt-dlp` — `brew install yt-dlp` / `pip install yt-dlp`
- `ffmpeg` — `brew install ffmpeg` (for merging video+audio streams)
- `GEMINI_API_KEY` environment variable
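
A quick preflight check for the requirements above can be sketched in Python. This is an illustrative helper, not part of the skill: the function name `missing_requirements` and its parameters are hypothetical, and it only checks that the two tools are on `PATH` and that the environment variable is non-empty.

```python
import os
import shutil


def missing_requirements(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    `env` and `which` are injectable only to make the check easy to test.
    """
    env = os.environ if env is None else env
    problems = []
    # Both command-line tools must be discoverable on PATH.
    for tool in ("yt-dlp", "ffmpeg"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # The API key must be present and non-empty.
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(f"missing: {problem}")
```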
Default Output
Returns structured JSON with the following fields:
- `transcript`: Verbatim transcript with `[MM:SS]` timestamps
- `description`: Visual description (people, setting, UI, text on screen, flow)
- `summary`: 2-3 sentence summary
- `duration_seconds`: Estimated duration
- `speakers`: Identified speakers
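
To show how the default output might be consumed, here is a minimal Python sketch that parses the JSON and converts the `[MM:SS]` transcript timestamps to seconds. The sample document is invented for illustration and is not real script output. The sketch also assumes the transcript is a single string with one timestamped entry per line and that `speakers` is a list of strings; the actual shapes may differ.

```python
import json
import re
from dataclasses import dataclass

# Matches a line such as "[01:05] Here is the dashboard."
TIMESTAMP_LINE = re.compile(r"\[(\d{1,3}):(\d{2})\]\s*(.*)")

# Hypothetical sample with the five documented fields (not real output).
SAMPLE_OUTPUT = """
{
  "transcript": "[00:00] Welcome to the demo.\\n[01:05] Here is the dashboard.",
  "description": "Screen recording of a web dashboard.",
  "summary": "A short product demo.",
  "duration_seconds": 90,
  "speakers": ["Narrator"]
}
"""


@dataclass
class TranscriptLine:
    seconds: int  # offset from the start of the video
    text: str


def parse_transcript(transcript: str) -> list[TranscriptLine]:
    """Split a timestamped transcript into (seconds, text) entries."""
    lines = []
    for raw in transcript.splitlines():
        match = TIMESTAMP_LINE.match(raw.strip())
        if match:  # lines without a leading timestamp are skipped
            minutes, secs, text = match.groups()
            lines.append(TranscriptLine(int(minutes) * 60 + int(secs), text))
    return lines


result = json.loads(SAMPLE_OUTPUT)
for line in parse_transcript(result["transcript"]):
    print(line.seconds, line.text)
```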
Usage
Analyze a video (structured JSON output)
```bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>"
```
Ask a question (adds "answer" field)
```bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>" -q "What product is shown?"
```
Override prompt entirely
```bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>" -p "Custom prompt" --raw
```
Download only (no analysis)
```bash
uv run {baseDir}/scripts/analyze_video.py "<video-url>" --download-only -o video.mp4
```
Options
| Flag | Description | Default |
|------|-------------|---------|
| `-q` / `--question` | Question to answer (added to default fields) | none |
| `-p` / `--prompt` | Override entire prompt (ignores `-q`) | structured JSON |
| `-m` / `--model` | Gemini model | gemini-2.5-flash |
| `-o` / `--output` | Save output to file | stdout |
| `--keep` | Keep downloaded video file | false |
| `--download-only` | Download only, skip analysis | false |
| `--max-size` | Max file size in MB | 500 |
| `--raw` | Raw text output instead of JSON | false |
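
When calling the script from other Python code, the flags in the table can be assembled into an argument list. The sketch below is a hypothetical wrapper, not part of the skill: `build_command` and `run_analysis` are illustrative names, and it assumes value-taking flags accept their value as the next argument (for example `--max-size 500`).

```python
import subprocess
from typing import List, Optional


def build_command(
    script_path: str,
    url: str,
    question: Optional[str] = None,
    prompt: Optional[str] = None,
    model: Optional[str] = None,
    output: Optional[str] = None,
    keep: bool = False,
    download_only: bool = False,
    max_size_mb: Optional[int] = None,
    raw: bool = False,
) -> List[str]:
    """Build the `uv run` argument list from the documented flags."""
    cmd = ["uv", "run", script_path, url]
    # -p overrides the whole prompt and ignores -q, so pass only one of them.
    if prompt is not None:
        cmd += ["-p", prompt]
    elif question is not None:
        cmd += ["-q", question]
    if model is not None:
        cmd += ["-m", model]
    if output is not None:
        cmd += ["-o", output]
    if max_size_mb is not None:
        cmd += ["--max-size", str(max_size_mb)]
    # Boolean switches are added only when enabled.
    for enabled, flag in ((keep, "--keep"),
                          (download_only, "--download-only"),
                          (raw, "--raw")):
        if enabled:
            cmd.append(flag)
    return cmd


def run_analysis(cmd: List[str]) -> str:
    """Run the command and return its stdout; raises if the script fails."""
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout
```

Passing the arguments as a list avoids shell quoting problems with URLs and free-text questions.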
How It Works
- YouTube URLs → Passed directly to Gemini (no download needed)
- All other URLs → Downloaded via yt-dlp → uploaded to Gemini File API → poll until processed
- Gemini analyzes video with structured prompt → returns JSON
- Temp files and Gemini uploads cleaned up automatically
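
The routing described above can be sketched as a small Python function. How the script actually recognizes YouTube URLs is not documented here, so the host list and the names `is_youtube_url` and `plan_steps` are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Assumed set of YouTube hostnames; the real script may match differently.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Return True when the URL's hostname is a known YouTube host."""
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS


def plan_steps(url: str) -> list:
    """List the processing steps for a URL, following the flow above."""
    if is_youtube_url(url):
        # YouTube URLs skip the download and upload stages.
        return [
            "pass URL directly to Gemini",
            "analyze with structured prompt",
        ]
    return [
        "download with yt-dlp",
        "upload to Gemini File API",
        "poll until processed",
        "analyze with structured prompt",
        "clean up temp file and Gemini upload",
    ]


if __name__ == "__main__":
    for step in plan_steps("https://vimeo.com/123456"):
        print(step)
```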
Supported Sources
Any URL supported by yt-dlp: Loom, YouTube, TikTok, Vimeo, Twitter/X, Instagram, Dailymotion, Twitch, and 1000+ more.
Tips
- Use `-q` for targeted questions on top of the full analysis
- YouTube is fastest (no download step)
- Large videos (10 min+) work fine: the Gemini File API supports up to 2 GB (free) / 20 GB (paid), though the script's own `--max-size` limit defaults to 500 MB and may need raising
- The script auto-installs Python dependencies via `uv`
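
As a rough illustration of a size limit like `--max-size`, the following Python sketch compares a file size against a limit in MB. It is not the script's implementation: the helper names are hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
import os

BYTES_PER_MB = 1024 * 1024  # assumption: the script may use 10**6 instead


def exceeds_max_size(size_bytes: int, max_size_mb: int = 500) -> bool:
    """Return True when a size in bytes is above the limit in MB."""
    return size_bytes > max_size_mb * BYTES_PER_MB


def file_exceeds_max_size(path: str, max_size_mb: int = 500) -> bool:
    """Apply the same check to a file on disk."""
    return exceeds_max_size(os.path.getsize(path), max_size_mb)
```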
Use Cases
- Analyze and summarize video content from 1000+ supported websites
- Generate transcripts from video audio for searchability and documentation
- Answer specific questions about video content using AI analysis
- Create video descriptions and summaries for content cataloging
- Extract key moments and topics from long-form video content
Pros & Cons
Pros
- Wide platform support: works with 1000+ video hosting sites
- Multiple output types: transcripts, descriptions, summaries, and Q&A
- Powered by Google Gemini AI for high-quality video understanding
Cons
- Requires Google Gemini API access and credentials
- Processing long videos may be slow and token-intensive