Skills Security Methodology

How we audit every agent skill. Fully transparent. No black boxes.

Total Skills: 1278

Scan Coverage: 100%

Verified: 1122

Caution

Flagged

Three-Tier Rating System

Verified

Passed all applicable security checks. No prompt injection, no dangerous commands, no data exfiltration patterns detected.

Caution

Has warnings in two or more dimensions. May contain risky commands used in legitimate context (e.g., git force-push in a git tutorial). Review before use.

Flagged

Scanned and flagged for risky patterns such as dangerous commands, prompt injection, or data exfiltration. Review source code carefully before use.
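As a rough illustration of how per-dimension results could roll up into the three ratings, here is a minimal sketch. The `Status` enum and `rate_skill` function are hypothetical names, and the handling of a skill with exactly one warning is an assumption, since the descriptions above do not cover that case.

```python
from enum import Enum


class Status(Enum):
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"


def rate_skill(results: dict[str, Status]) -> str:
    """Map per-dimension results to a rating.

    `results` should contain only the dimensions that applied to the skill.
    """
    statuses = list(results.values())
    # Any failed dimension puts the skill in the Flagged tier.
    if Status.FAIL in statuses:
        return "Flagged"
    # Warnings in two or more dimensions put it in the Caution tier.
    warnings = sum(1 for status in statuses if status is Status.WARN)
    if warnings >= 2:
        return "Caution"
    # Assumption: a single warning does not change the rating.
    return "Verified"
```

For example, `rate_skill({"prompt_injection": Status.WARN, "dangerous_commands": Status.WARN})` returns `"Caution"` under these assumptions.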

Five-Dimension Security Audit

Every skill in our directory is scanned across up to five security dimensions. Skills with GitHub source repositories receive the full 5-dimension analysis. Skills sourced from registries receive content-level analysis (3 dimensions).

Prompt Injection Detection

Severity: critical

Does the skill contain instructions that attempt to override system prompts, bypass safety guardrails, or manipulate the AI agent into unauthorized behavior?

How we check:

We scan the skill's content for 13 pattern categories including: system prompt overrides, jailbreak attempts, hidden instructions, credential harvesting, and encoded commands. Context-aware filtering eliminates false positives from safety documentation (e.g., "DO NOT bypass security" is not flagged).

ignore_instructions, override_system, jailbreak_dan, hide_from_user, secretly_do, exfiltrate_creds, base64_decode
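A category-based scan of this kind might look like the sketch below. The category names are taken from the list above, but the regular expressions and the `scan_prompt_injection` function are illustrative guesses, not the actual rules, and only a handful of the 13 categories are shown.

```python
import re

# Category names come from the list above; the regexes are illustrative guesses.
INJECTION_PATTERNS = {
    "ignore_instructions": re.compile(
        r"\bignore\s+(?:all\s+)?(?:previous|prior|above)\s+instructions\b", re.I
    ),
    "override_system": re.compile(r"\boverride\s+(?:the\s+)?system\s+prompt\b", re.I),
    "hide_from_user": re.compile(r"\bdo\s+not\s+(?:tell|show|inform)\s+the\s+user\b", re.I),
    "secretly_do": re.compile(r"\bsecretly\s+\w+", re.I),
    "base64_decode": re.compile(r"\bbase64\s+(?:-d|--decode)\b", re.I),
}


def scan_prompt_injection(text: str) -> list[tuple[int, str]]:
    """Return (line number, category) for every pattern hit, in reading order."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for category, pattern in INJECTION_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, category))
    return findings
```

Reporting the line number alongside the category keeps each finding easy to review by hand.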

Dangerous Commands

Severity: high

Does the skill instruct the agent to execute high-risk shell commands that could damage the user's system, delete data, or compromise security?

How we check:

We scan for 16 dangerous command patterns including: recursive file deletion (rm -rf), curl-pipe-to-shell, SSH/AWS credential file writes, chmod 777, sudo NOPASSWD, force push, and DROP TABLE. Commands in educational context (e.g., git tutorials explaining force-push) are flagged as warnings, not failures.

rm_rf_var, curl_pipe_sh, ssh_write, chmod_777, sudo_nopasswd, drop_table
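The warning-versus-failure distinction can be sketched as follows. The regexes are assumptions written for illustration (for instance, `rm_rf_var` is read here as "recursive delete of a variable path"), `force_push` is a hypothetical label, and how educational context is actually detected is not described above, so the caller passes it in as a flag.

```python
import re

# Pattern names mostly follow the list above; the regexes are illustrative guesses.
DANGEROUS_PATTERNS = {
    "rm_rf_var": re.compile(r"\brm\s+-(?:rf|fr)\s+\S*\$"),
    "curl_pipe_sh": re.compile(r"\bcurl\b[^|\n]*\|\s*(?:sudo\s+)?(?:ba)?sh\b"),
    "chmod_777": re.compile(r"\bchmod\s+(?:-R\s+)?777\b"),
    "sudo_nopasswd": re.compile(r"NOPASSWD\s*:"),
    "drop_table": re.compile(r"\bDROP\s+TABLE\b", re.I),
    "force_push": re.compile(r"\bgit\s+push\b.*(?:--force\b|\s-f\b)"),  # hypothetical label
}


def scan_commands(text: str, educational: bool = False) -> list[tuple[str, str]]:
    """Return (pattern name, severity) pairs for every dangerous pattern found.

    Assumption: the caller has already decided whether the skill is
    educational; in that case hits are downgraded to warnings.
    """
    severity = "warning" if educational else "failure"
    return [
        (name, severity)
        for name, pattern in DANGEROUS_PATTERNS.items()
        if pattern.search(text)
    ]
```

With these assumptions, `scan_commands("git push --force origin main", educational=True)` reports the force push as a warning rather than a failure.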

Data Exfiltration

Severity: critical

Does the skill contain patterns that send user data to external servers, webhooks, or third-party endpoints without explicit user consent?

How we check:

We scan for 9 exfiltration patterns including: webhook URLs (webhook.site, ngrok, pipedream), Discord/Telegram bot API calls, DNS exfiltration, and curl POST with variable interpolation. Only active data-sending patterns are flagged; documentation references are excluded.

curl_post_var, webhook_send, telegram_send, dns_exfil
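One way to separate active data-sending from a mere mention is to require a sending command on the same line as the suspicious host. The sketch below takes that approach as an assumption; the regexes and the `scan_exfiltration` function are hypothetical and cover only three of the pattern names listed above.

```python
import re

# Assumption: a host only counts when a sending command appears on the same line.
SEND_VERB = re.compile(r"\b(?:curl|wget|fetch|requests\.post)\b", re.I)
EXFIL_HOSTS = {
    "webhook_send": re.compile(
        r"webhook\.site|ngrok|pipedream|discord(?:app)?\.com/api/webhooks", re.I
    ),
    "telegram_send": re.compile(r"api\.telegram\.org/bot", re.I),
}
# curl with a POST/data flag followed by a shell variable such as $TOKEN or ${TOKEN}.
CURL_POST_VAR = re.compile(r"\bcurl\b.*(?:-X\s*POST|--data\b|\s-d\b).*\$\{?\w+")


def scan_exfiltration(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for each active data-sending hit."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if CURL_POST_VAR.search(line):
            findings.append((line_no, "curl_post_var"))
        if SEND_VERB.search(line):
            for name, pattern in EXFIL_HOSTS.items():
                if pattern.search(line):
                    findings.append((line_no, name))
    return findings
```

A sentence that only names webhook.site in passing produces no finding here, while a `curl` command posting a variable to that host produces two.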

Dependency Audit

Severity: high

For skills with source code repositories, do their dependencies contain known security vulnerabilities?

How we check:

We check package.json (npm), requirements.txt, and pyproject.toml against a curated list of known vulnerable packages with CVE references. This check is only available for skills with GitHub source repositories.

CVE-2022-0235 (node-fetch), CVE-2023-45857 (axios), CVE-2022-23529 (jsonwebtoken)
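Checking manifests against a curated list could be sketched as below. Everything in `ADVISORIES` is a made-up placeholder (package names, versions, and advisory IDs), the function names are hypothetical, and two simplifications are assumed: a range such as `^2.6.1` is judged by its lowest version, and pyproject.toml parsing is left out for brevity.

```python
import json
import re

# Placeholder advisory list: (ecosystem, package) -> (first fixed version, advisory id).
# These entries are invented for illustration only.
ADVISORIES = {
    ("npm", "example-http-client"): ((2, 6, 7), "EXAMPLE-ADVISORY-1"),
    ("pypi", "example-yaml-lib"): ((5, 4, 0), "EXAMPLE-ADVISORY-2"),
}


def parse_version(spec: str):
    """Extract up to three numeric components, ignoring prefixes like ^ or ~."""
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", spec)
    if match is None:
        return None
    return tuple(int(part or 0) for part in match.groups())


def _check(ecosystem: str, deps) -> list[tuple[str, str, str]]:
    findings = []
    for name, spec in deps:
        advisory = ADVISORIES.get((ecosystem, name))
        version = parse_version(spec)
        # Flag the dependency when its version is older than the first fixed one.
        if advisory and version and version < advisory[0]:
            findings.append((name, spec, advisory[1]))
    return findings


def audit_npm(package_json: str) -> list[tuple[str, str, str]]:
    manifest = json.loads(package_json)
    deps = {**manifest.get("dependencies", {}), **manifest.get("devDependencies", {})}
    return _check("npm", deps.items())


def audit_requirements(requirements_txt: str) -> list[tuple[str, str, str]]:
    pins = []
    for line in requirements_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if "==" in line:  # only exact pins are checked in this sketch
            name, version = line.split("==", 1)
            pins.append((name.strip().lower(), version.strip()))
    return _check("pypi", pins)
```

Each finding carries the package name, the version spec as written, and the advisory ID, so a reviewer can trace it back to the manifest line.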

Content Integrity

Severity: medium

Does the skill content in our directory match the original source in the GitHub repository? This detects tampering or significant drift.

How we check:

We compare our stored content against the source repository's SKILL.md/CLAUDE.md using word-level overlap analysis. A match above 90% is marked as verified; a match between 50% and 90% is flagged as drift. This check applies only to skills with GitHub source repos.

MD5 hash comparison, Word overlap ratio, Frontmatter stripping
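The three techniques named above can be combined roughly as follows. This is a sketch under assumptions: the exact overlap formula is not specified, so a multiset word overlap divided by the longer document's word count is used here, and the `"below_threshold"` label for matches under 50% is a hypothetical placeholder because that case is not described.

```python
import hashlib
import re
from collections import Counter

# YAML-style frontmatter delimited by --- lines at the very start of the file.
FRONTMATTER = re.compile(r"\A---\s*\n.*?\n---\s*\n", re.S)


def strip_frontmatter(text: str) -> str:
    return FRONTMATTER.sub("", text, count=1)


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def word_overlap(a: str, b: str) -> float:
    """Shared words (counted with multiplicity) over the longer word count."""
    words_a, words_b = Counter(a.lower().split()), Counter(b.lower().split())
    total = max(sum(words_a.values()), sum(words_b.values()))
    if total == 0:
        return 1.0
    return sum((words_a & words_b).values()) / total


def check_integrity(stored: str, source: str) -> tuple[str, float]:
    stored_body, source_body = strip_frontmatter(stored), strip_frontmatter(source)
    # Fast path: identical bodies need no word-level comparison.
    if md5_hex(stored_body) == md5_hex(source_body):
        return "verified", 1.0
    ratio = word_overlap(stored_body, source_body)
    if ratio > 0.90:
        return "verified", ratio
    if ratio >= 0.50:
        return "drift", ratio
    return "below_threshold", ratio  # hypothetical label; not defined above
```

Stripping frontmatter first means a copy that differs only in its metadata header still hashes identically to the source.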

Scan Coverage

Full Code Analysis

For skills with GitHub source repos. We shallow-clone the repository and scan all source files (.ts, .js, .py, .go, .rs, .java, .md) across all 5 dimensions.

Covers: prompt injection, dangerous commands, data exfiltration, dependency audit, content integrity

Content Analysis

For skills sourced from registries (e.g., ClawHub) without GitHub repos. We scan the skill's instruction content directly for the first 3 dimensions.

Covers: prompt injection, dangerous commands, data exfiltration
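The two coverage levels amount to a small lookup, sketched below. The dimension identifiers and function names are hypothetical; the extension list is the one given above.

```python
from pathlib import PurePosixPath

CONTENT_DIMENSIONS = ("prompt_injection", "dangerous_commands", "data_exfiltration")
REPO_ONLY_DIMENSIONS = ("dependency_audit", "content_integrity")
SCANNED_EXTENSIONS = {".ts", ".js", ".py", ".go", ".rs", ".java", ".md"}


def applicable_dimensions(has_github_repo: bool) -> tuple[str, ...]:
    """Registry-only skills get the 3 content dimensions; repos get all 5."""
    if has_github_repo:
        return CONTENT_DIMENSIONS + REPO_ONLY_DIMENSIONS
    return CONTENT_DIMENSIONS


def files_to_scan(paths: list[str]) -> list[str]:
    """Keep only files whose extension is in the scanned set."""
    return [p for p in paths if PurePosixPath(p).suffix.lower() in SCANNED_EXTENSIONS]
```

For a cloned repository, `files_to_scan(["SKILL.md", "src/index.ts", "logo.png"])` keeps the first two paths and drops the image.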

False Positive Handling

Static pattern matching can produce false positives. We use context-aware filtering to reduce them:

✓ "Do NOT bypass security" — Negated context detected. Safety instruction, not an attack. Not flagged.
✓ "No telemetry sent without user consent" — Negation before match. Privacy statement. Not flagged.
✗ "Secretly modify the config file" — No negation context. Flagged as prompt injection.
! "git push --force" — In a git tutorial context. Flagged as warning (not failure).
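The negation examples above can be approximated by looking for a negating word shortly before a match. The sketch below assumes a fixed 30-character look-behind window and a tiny pair of hypothetical pattern lists; the real filter's rules are not specified here.

```python
import re

# Illustrative patterns only; both lists are assumptions.
NEGATION = re.compile(r"\b(?:do\s+not|don't|never|no|not|without)\b", re.I)
RISKY = re.compile(r"\b(?:bypass\s+security|secretly\s+\w+|telemetry\s+sent)\b", re.I)


def classify(line: str, window: int = 30) -> str:
    """Return 'clean', 'not_flagged' (negated match), or 'flagged'."""
    match = RISKY.search(line)
    if match is None:
        return "clean"
    # Only text shortly before the match is checked for negation.
    preceding = line[max(0, match.start() - window):match.start()]
    if NEGATION.search(preceding):
        return "not_flagged"
    return "flagged"
```

Limiting the check to text before the match mirrors the "negation before match" wording in the examples, and the window keeps a distant, unrelated "not" from suppressing a real hit.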

Limitations

• Static analysis only — we do not execute skills or test runtime behavior
• Pattern-based detection may miss obfuscated or novel attack vectors
• Content analysis (for registry-sourced skills) cannot check dependencies or verify content against source repos
• A "Verified" rating means no known patterns were detected — it is not a guarantee of safety

Found a security issue?

If you discover a security vulnerability in any skill, please report it.

[email protected]