Skills Security Methodology
How we audit every agent skill. Fully transparent. No black boxes.
Total Skills: 1278
Scan Coverage: 100%
Verified: 1122
Caution: 61
Flagged: 95
Three-Tier Rating System
Verified
Passed all applicable security checks. No prompt injection, no dangerous commands, no data exfiltration patterns detected.
Caution
Has warnings in two or more dimensions. May contain risky commands used in a legitimate context (e.g., git force-push in a git tutorial). Review before use.
Flagged
Scanned and flagged for risky patterns such as dangerous commands, prompt injection, or data exfiltration. Review source code carefully before use.
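The tier definitions above can be read as a small decision rule over per-dimension results. The following is only an illustrative sketch: the `Result` enum and `rate` function are hypothetical names, and the handling of a single warning (treated here as still passing) is an assumption, since the definitions above do not cover that case.

```python
from enum import Enum


class Result(Enum):
    """Hypothetical per-dimension outcome; the names are illustrative."""
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"


def rate(results: dict[str, Result]) -> str:
    """Collapse per-dimension results into one of the three tiers."""
    outcomes = list(results.values())
    # Any failed dimension puts the skill in the Flagged tier.
    if Result.FAIL in outcomes:
        return "Flagged"
    # Warnings in two or more dimensions give Caution.
    if outcomes.count(Result.WARN) >= 2:
        return "Caution"
    # Assumption: zero or one warning still counts as passing all checks.
    return "Verified"
```

Only dimensions that apply to a skill would appear in `results`, so a registry-sourced skill would be rated on three entries rather than five.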
Five-Dimension Security Audit
Every skill in our directory is scanned across up to five security dimensions. Skills with GitHub source repositories receive the full 5-dimension analysis. Skills sourced from registries receive content-level analysis (the first 3 dimensions).
Prompt Injection Detection
Severity: critical
Does the skill contain instructions that attempt to override system prompts, bypass safety guardrails, or manipulate the AI agent into unauthorized behavior?
How we check:
We scan the skill's content for 13 pattern categories including: system prompt overrides, jailbreak attempts, hidden instructions, credential harvesting, and encoded commands. Context-aware filtering eliminates false positives from safety documentation (e.g., "DO NOT bypass security" is not flagged).
Dangerous Commands
Severity: high
Does the skill instruct the agent to execute high-risk shell commands that could damage the user's system, delete data, or compromise security?
How we check:
We scan for 16 dangerous command patterns including: recursive file deletion (rm -rf), curl-pipe-to-shell, SSH/AWS credential file writes, chmod 777, sudo NOPASSWD, force push, and DROP TABLE. Commands in educational context (e.g., git tutorials explaining force-push) are flagged as warnings, not failures.
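To make the idea of a command pattern concrete, here is a minimal sketch of regex-based matching for a few of the commands named above. The pattern names and regular expressions are illustrative assumptions, not the 16 patterns actually used, and the sketch does not model the educational-context downgrade to a warning.

```python
import re

# Hypothetical subset of patterns; names and regexes are assumptions.
DANGEROUS_PATTERNS = {
    "recursive_delete": re.compile(
        r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r"
    ),
    "curl_pipe_shell": re.compile(
        r"\b(?:curl|wget)\b[^|\n]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b"
    ),
    "chmod_777": re.compile(r"\bchmod\s+(?:-R\s+)?777\b"),
    "sudo_nopasswd": re.compile(r"\bNOPASSWD\b"),
    "force_push": re.compile(r"\bgit\s+push\b[^\n]*(?:--force\b|\s-f\b)"),
    "drop_table": re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
}


def scan_commands(text: str) -> list[str]:
    """Return the sorted names of all patterns found anywhere in the text."""
    return sorted(
        name for name, pattern in DANGEROUS_PATTERNS.items()
        if pattern.search(text)
    )
```

For example, `scan_commands("curl https://example.com/install.sh | sh")` reports `curl_pipe_shell`, while a harmless `ls -la` reports nothing.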
Data Exfiltration
Severity: critical
Does the skill contain patterns that send user data to external servers, webhooks, or third-party endpoints without explicit user consent?
How we check:
We scan for 9 exfiltration patterns including: webhook URLs (webhook.site, ngrok, pipedream), Discord/Telegram bot API calls, DNS exfiltration, and curl POST with variable interpolation. Only active data-sending patterns are flagged; documentation references are excluded.
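As an illustration of what an active data-sending pattern might look like, the sketch below reports the line number and pattern name for a few of the cases listed above. The regexes and names are assumptions covering only a subset of the 9 patterns, and the exclusion of documentation references is not modelled here.

```python
import re

# Hypothetical subset of exfiltration patterns; regexes are assumptions.
EXFIL_PATTERNS = {
    "webhook_host": re.compile(
        r"https?://\S*(?:webhook\.site|ngrok|pipedream)", re.IGNORECASE
    ),
    "discord_webhook": re.compile(
        r"discord(?:app)?\.com/api/webhooks/", re.IGNORECASE
    ),
    "telegram_bot_api": re.compile(r"api\.telegram\.org/bot", re.IGNORECASE),
    # curl with a POST/data flag followed by a shell variable such as $VAR or ${VAR}
    "curl_post_interpolated": re.compile(
        r"\bcurl\b[^\n]*(?:-X\s*POST|--data\b|-d\b)[^\n]*\$\{?\w+"
    ),
}


def find_exfiltration(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) pairs, scanning line by line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in EXFIL_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

Reporting line numbers keeps each finding traceable to the exact instruction in the skill, which helps when reviewing a flagged result by hand.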
Dependency Audit
Severity: high
For skills with source code repositories, do their dependencies contain known security vulnerabilities?
How we check:
We check package.json (npm), requirements.txt, and pyproject.toml against a curated list of known vulnerable packages with CVE references. This check is only available for skills with GitHub source repositories.
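A name-based lookup against a curated list could be sketched as follows. Everything in `KNOWN_VULNERABLE` is a placeholder (the package names and advisory IDs are invented for illustration), version matching is omitted, and pyproject.toml parsing is left out for brevity.

```python
import json
import re

# Placeholder entries only: these package names and IDs are NOT real advisories.
KNOWN_VULNERABLE = {
    ("npm", "example-bad-pkg"): "CVE-0000-00000 (placeholder)",
    ("pypi", "example-bad-lib"): "CVE-0000-00001 (placeholder)",
}


def npm_dependencies(package_json_text: str) -> set[str]:
    """Collect dependency names from a package.json document."""
    data = json.loads(package_json_text)
    names: set[str] = set()
    for section in ("dependencies", "devDependencies"):
        names.update(data.get(section, {}))
    return names


def pip_requirements(requirements_text: str) -> set[str]:
    """Collect package names from requirements.txt, ignoring comments and options."""
    names: set[str] = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower())
    return names


def audit(ecosystem: str, names: set[str]) -> dict[str, str]:
    """Map each listed package to its advisory reference, if any."""
    return {
        name: KNOWN_VULNERABLE[(ecosystem, name)]
        for name in names
        if (ecosystem, name) in KNOWN_VULNERABLE
    }
```

A call such as `audit("pypi", pip_requirements(text))` returns an empty dict when none of the declared packages appear on the list.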
Content Integrity
Severity: medium
Does the skill content in our directory match the original source in the GitHub repository? This detects tampering or significant drift.
How we check:
We compare our stored content against the source repository's SKILL.md/CLAUDE.md using word-level overlap analysis. Scores above 90% are marked as verified; scores between 50% and 90% are flagged as drift. This check only applies to skills with GitHub source repos.
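One way to compute a word-level overlap score is shown below. The exact metric is not specified above, so this sketch assumes a multiset overlap: shared word occurrences divided by the word count of the longer text. The label for scores under 50% is also an assumption, since that range is not described.

```python
import re
from collections import Counter


def word_overlap(stored: str, source: str) -> float:
    """Shared word occurrences divided by the longer text's word count."""
    stored_words = Counter(re.findall(r"\w+", stored.lower()))
    source_words = Counter(re.findall(r"\w+", source.lower()))
    total = max(sum(stored_words.values()), sum(source_words.values()))
    if total == 0:
        return 1.0  # two empty documents are treated as identical
    # Counter intersection keeps the minimum count of each shared word.
    shared = sum((stored_words & source_words).values())
    return shared / total


def classify(score: float) -> str:
    """Apply the thresholds described above to a score in [0, 1]."""
    if score > 0.90:
        return "verified"
    if score >= 0.50:
        return "drift"
    return "below drift range"  # assumption: not defined in the text above
```

With this metric, two six-word texts that differ in a single word score 5/6 (about 83%) and would be classified as drift.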
Scan Coverage
Full Code Analysis
For skills with GitHub source repos, we shallow-clone the repository and scan all source files (.ts, .js, .py, .go, .rs, .java, .md) across all 5 dimensions.
Covers: prompt injection, dangerous commands, data exfiltration, dependency audit, content integrity
Content Analysis
For skills sourced from registries (e.g., ClawHub) without GitHub repos, we scan the skill's instruction content directly for the first 3 dimensions.
Covers: prompt injection, dangerous commands, data exfiltration
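The two coverage modes can be summarized in a few lines. This is a sketch under stated assumptions: the function names and dimension identifiers are hypothetical, and the file walk assumes a repository that has already been cloned to a local directory.

```python
from pathlib import Path

# File extensions listed for full code analysis.
SCANNED_SUFFIXES = {".ts", ".js", ".py", ".go", ".rs", ".java", ".md"}

CONTENT_DIMENSIONS = ("prompt_injection", "dangerous_commands", "data_exfiltration")
REPO_ONLY_DIMENSIONS = ("dependency_audit", "content_integrity")


def applicable_dimensions(has_github_repo: bool) -> tuple[str, ...]:
    """Five dimensions with a source repo, otherwise the first three."""
    if has_github_repo:
        return CONTENT_DIMENSIONS + REPO_ONLY_DIMENSIONS
    return CONTENT_DIMENSIONS


def files_to_scan(repo_root: Path) -> list[Path]:
    """List files under an already-cloned repo whose suffix is in scope."""
    return sorted(
        path for path in repo_root.rglob("*")
        if path.is_file() and path.suffix in SCANNED_SUFFIXES
    )
```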
False Positive Handling
Static pattern matching can produce false positives. We use context-aware filtering to reduce them:
- ✓ "Do NOT bypass security" — Negated context detected. Safety instruction, not an attack. Not flagged.
- ✓ "No telemetry sent without user consent" — Negation before match. Privacy statement. Not flagged.
- ✗ "Secretly modify the config file" — No negation context. Flagged as prompt injection.
- ! "git push --force" — In a git tutorial context. Flagged as warning (not failure).
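The negation examples above can be approximated with a short look-behind window. This sketch assumes a fixed 40-character window on the same line and uses a tiny, hypothetical list of risky phrases; a production filter would likely be more nuanced.

```python
import re

# Hypothetical risky phrases and negation cues, chosen to mirror the examples above.
RISKY = re.compile(r"\b(?:bypass security|secretly|telemetry)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(?:do\s+not|don't|never|no|not|without)\b", re.IGNORECASE)
WINDOW = 40  # characters inspected before each match (assumed value)


def flag_phrases(text: str) -> list[str]:
    """Return risky phrases that are not preceded by a negation cue."""
    flagged = []
    for line in text.splitlines():
        for match in RISKY.finditer(line):
            prefix = line[max(0, match.start() - WINDOW):match.start()]
            if not NEGATION.search(prefix):
                flagged.append(match.group(0))
    return flagged
```

Under these assumptions, "Do NOT bypass security" and "No telemetry sent without user consent" produce no findings, while "Secretly modify the config file" is flagged because nothing negates it.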
Limitations
- Static analysis only: we do not execute skills or test runtime behavior
- Pattern-based detection may miss obfuscated or novel attack vectors
- Content analysis (for registry-sourced skills) cannot check dependencies or verify content against source repos
- A "Verified" rating means no known patterns were detected; it is not a guarantee of safety
Found a security issue?
If you discover a security vulnerability in any skill, please report it.
[email protected]