OpenRouter Image Generation
VerifiedGenerate images using Google Gemini via OpenRouter API. Supports text-to-image and reference-image-guided generation. Use when the user asks to generate, cre...
$ Add to .claude/skills/ About This Skill
# Gemini Image Generation
Generate images via `google/gemini-3.1-flash-image-preview` on OpenRouter. Cheap ($0.25/M in, $1.5/M out), fast, good quality.
Quick Start
```bash python3 scripts/generate.py "a watercolor illustration of a cozy café" -o output.png ```
With reference image (style/character guidance): ```bash python3 scripts/generate.py "same character but waving hello" -o wave.png --ref reference.png ```
Script path: `skills/gemini-image-gen/scripts/generate.py`
Requirements
- `OPENROUTER_API_KEY` environment variable (or `--api-key` flag)
- Python 3.10+ (stdlib only, no pip installs needed)
How It Works
- Calls OpenRouter `/chat/completions` with `modalities: ["text", "image"]`
- Optionally encodes a reference image as base64 in the message
- Extracts generated image from `choices[0].message.images[0].image_url.url` (data:image/png;base64,...)
- Decodes and saves to output path
Prompt Engineering Tips (from experience)
Aspect Ratio & Composition - Gemini respects aspect ratio instructions in the prompt - For vertical (e.g. phone wallpaper, Xiaohongshu cover): add "vertical composition, 3:4 aspect ratio" - For horizontal (e.g. banner): add "horizontal composition, 16:9 aspect ratio" - For square: add "square composition, 1:1 aspect ratio" - **Always specify** — without it, Gemini defaults to roughly square and may crop awkwardly
Character Consistency - When using `--ref`, describe the character features explicitly in the prompt AND provide the reference image - Key details to specify: hair color/style, eye color, clothing, accessories, expression - Example: "same character from reference: silver-to-ice-blue gradient shoulder-length hair, ice-blue eyes, cream cardigan over light blue shirt, snowflake earring" - Gemini is decent at maintaining consistency but drifts on small details — always re-specify distinguishing features
Style Control - Name the art style explicitly: "soft watercolor illustration", "anime cel-shading", "photorealistic", "flat vector", "oil painting" - For warm/cozy tone: "warm color palette, cream and peach gradient background, bokeh light spots" - For dark/moody: "dark gradient background, deep navy to black, subtle glow effects" - Mentioning a well-known art style works: "in the style of Studio Ghibli", "Makoto Shinkai lighting"
Text in Images - Gemini can render short text in images but it's unreliable for CJK characters - For English text: works reasonably well if you specify font style ("bold sans-serif", "handwritten script") - For Chinese/Japanese: **avoid** — it usually garbles characters. Add text overlays with a separate tool (e.g. ImageMagick, Pillow) instead
Common Pitfalls - **Body proportions**: Gemini sometimes compresses/distorts figures. Add "natural human body proportions, do not squash or stretch" for character art - **Hands**: Still a weak spot. Minimize visible hands or describe hand pose explicitly - **Multiple subjects**: More than 2-3 subjects increases inconsistency. Keep scenes focused - **Batch generation**: For generating multiple variations, run the script multiple times — each call is independent. Do NOT ask for "4 options" in one prompt
Sending Images on Feishu
⚠️ Critical: Images must be saved to a path within `localRoots` (typically your OpenClaw workspace dir). `/tmp` is NOT whitelisted on Feishu.
```python # Save to workspace, not /tmp output_path = "my_image.png" # relative to workspace
# Send via message tool: # media: "file://<workspace_path>/my_image.png" # (use 'media' parameter, NOT 'filePath') ```
After sending, clean up temporary images to avoid workspace clutter.
Advanced: Calling from Python (without CLI)
```python import os, sys sys.path.insert(0, "skills/gemini-image-gen/scripts") from generate import generate
generate( prompt="a cute robot reading a philosophy book", output="robot.png", ref_image=None, # or path to reference image ) ```
Model Alternatives
| Model | Cost | Notes | |-------|------|-------| | `google/gemini-3.1-flash-image-preview` | $0.25/$1.5 per M tokens | Default. Best balance of cost and quality | | `google/gemini-3.1-pro-preview` | $2/$12 per M tokens | Higher quality but 8x more expensive | | `openai/gpt-image-1` | varies | OpenAI's image model, different API format — not supported by this script |
Troubleshooting
- "No image in response": Check `.debug.json` file created alongside output. Usually means the prompt triggered safety filters or the model returned text-only.
- Garbled/distorted output: Try rephrasing. Add "high quality, detailed" and be more specific about composition.
- API error 429: Rate limited. Wait 30s and retry.
- API error 402: Insufficient credits on OpenRouter.
Use Cases
- Generate images via Google Gemini on OpenRouter at low cost ($0.25/M input tokens)
- Create character-consistent illustrations using reference images for style guidance
- Control image composition with aspect ratio instructions for vertical, horizontal, or square outputs
- Generate art in specific styles — watercolor, anime cel-shading, photorealistic, flat vector
- Produce images from Python scripts or CLI for integration into automated workflows
Pros & Cons
Pros
- +Very cost-effective — Gemini Flash at $0.25/$1.5 per million tokens is much cheaper than OpenAI image models
- +Zero pip dependencies — uses Python stdlib only for maximum portability
- +Detailed prompt engineering tips from real experience covering aspect ratio, character consistency, and style control
Cons
- -CJK text rendering in images is unreliable — requires separate tools for Chinese/Japanese text overlays
- -Single-image generation per API call — no batch mode for generating multiple variations
FAQ
What does OpenRouter Image Generation do?
What platforms support OpenRouter Image Generation?
What are the use cases for OpenRouter Image Generation?
100+ free AI tools
Writing, PDF, image, and developer tools — all in your browser.
Next Step
Use the skill detail page to evaluate fit and install steps. For a direct browser workflow, move into a focused tool route instead of staying in broader support surfaces.