# LLM Evaluator ⚖️
LLM-as-a-Judge evaluation system powered by Langfuse. Uses GPT-5-nano to score AI outputs.
## When to Use
- Evaluating quality of search results or AI responses
- Scoring traces for relevance, accuracy, hallucination detection
- Batch scoring recent unscored traces
- Quality assurance on agent outputs
## Usage
```bash
# Test with sample cases
python3 {baseDir}/scripts/evaluator.py test

# Score a specific Langfuse trace
python3 {baseDir}/scripts/evaluator.py score <trace_id>

# Score with a specific evaluator only
python3 {baseDir}/scripts/evaluator.py score <trace_id> --evaluators relevance

# Backfill scores on recent unscored traces
python3 {baseDir}/scripts/evaluator.py backfill --limit 20
```
## Evaluators
| Evaluator | Measures | Scale |
|---------------|-------------------------------------|-------|
| relevance | Response relevance to the query | 0–1 |
| accuracy | Factual correctness | 0–1 |
| hallucination | Detection of fabricated information | 0–1 |
| helpfulness | Overall usefulness | 0–1 |
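Since every evaluator reports on a 0–1 scale, the judge model's free-form reply has to be normalized before it can be recorded as a score. A minimal sketch of that step, assuming the judge is prompted to reply with JSON like `{"score": 0.8, "reasoning": "..."}` (this reply shape and the function name are illustrative assumptions, not the skill's actual API):

```python
import json

def parse_judge_score(raw_reply: str) -> float:
    """Turn a judge model's JSON reply into a score clamped to the 0-1 scale.

    Unparseable replies score 0.0 rather than crashing a batch run.
    """
    try:
        score = float(json.loads(raw_reply).get("score", 0.0))
    except (json.JSONDecodeError, TypeError, ValueError):
        return 0.0
    # Clamp so a misbehaving judge can never push a score outside 0-1.
    return max(0.0, min(1.0, score))
```

Clamping (rather than rejecting) out-of-range values keeps a long backfill run moving even when the judge occasionally returns something like `1.7`.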
## Credits
Built by M. Abidi (agxntsix.ai | YouTube | GitHub). Part of the AgxntSix Skill Suite for OpenClaw agents.
## Use Cases
- Evaluate LLM outputs using LLM-as-a-Judge methodology via Langfuse
- Score AI traces on relevance, accuracy, hallucination, and helpfulness
- Run batch evaluations across multiple traces for systematic quality assessment
- Build automated LLM quality monitoring pipelines with configurable criteria
- Compare model performance across different prompts and configurations
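For batch evaluation and model comparison, the per-trace scores need to be rolled up into a summary per evaluator. A small sketch of that aggregation, assuming each result is a dict with a `trace_id` and a `scores` mapping (these field names are assumptions for illustration):

```python
from collections import defaultdict
from statistics import mean

def summarize_batch(results: list[dict]) -> dict[str, float]:
    """Aggregate per-trace scores into a mean per evaluator.

    `results` is a list of dicts like
    {"trace_id": "abc123", "scores": {"relevance": 0.9, "accuracy": 0.7}}.
    """
    buckets = defaultdict(list)
    for result in results:
        for evaluator, score in result["scores"].items():
            buckets[evaluator].append(score)
    # Round for readable report output; raw lists stay available in `buckets`.
    return {name: round(mean(scores), 3) for name, scores in buckets.items()}
```

A summary like this makes it easy to compare two prompt variants: run a backfill over each variant's traces, then diff the per-evaluator means.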
## Pros & Cons

### Pros
- Compatible with multiple platforms, including claude-code and openclaw
- Well documented, with detailed usage instructions and examples
- Open source with permissive licensing

### Cons
- No built-in analytics or usage-metrics dashboard
- Configuration may require familiarity with AI and machine learning concepts