
# MetriLLM — Find the Best LLM for Your Hardware

Test any local model and get a clear verdict: is it worth running on your machine?

## Prerequisites

1. Node.js 20+ — check with `node -v`
2. Ollama or LM Studio installed and running
   - Ollama: install from ollama.com, then run `ollama serve`
   - LM Studio: install from lmstudio.ai, load a model, and start the local server
3. MetriLLM CLI — install globally:

```bash
npm install -g metrillm
```

## Usage

### List available models

```bash
ollama list
```

### Run a full benchmark

```bash
metrillm bench --model $ARGUMENTS --json
```

This measures:

- Performance: tokens/second, time to first token, memory usage
- Quality: reasoning, math, coding, instruction following, structured output, multilingual
- Fitness verdict: EXCELLENT / GOOD / MARGINAL / NOT RECOMMENDED

### Performance-only benchmark (faster)

```bash
metrillm bench --model $ARGUMENTS --perf-only --json
```

Skips quality evaluation — measures speed and memory only.

### View previous results

```bash
ls ~/.metrillm/results/
```

Read any JSON file to see full benchmark details.
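Saved results can also be inspected programmatically. The snippet below is a sketch that assumes a result file exposing the fields mentioned on this page (`tokensPerSecond`, `ttft`, `memoryUsedGB`, a `verdict`); the actual schema may differ, and the sample payload is hypothetical.

```python
import json

# Hypothetical example of a saved result file; the real schema may differ.
sample = """
{
  "model": "llama3:8b",
  "verdict": "GOOD",
  "performance": {"tokensPerSecond": 42.5, "ttft": 310, "memoryUsedGB": 5.8}
}
"""

result = json.loads(sample)
perf = result["performance"]

# Summarize the run in one line.
print(f"{result['model']}: {result['verdict']} "
      f"({perf['tokensPerSecond']} tok/s, {perf['ttft']} ms TTFT, "
      f"{perf['memoryUsedGB']} GB)")
```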

### Share to the public leaderboard

```bash
metrillm bench --model $ARGUMENTS --share
```

Uploads your result to the public MetriLLM leaderboard, an open, community-driven ranking of local LLM performance across real hardware. Comparing results helps the community find the best models for every setup. Shared data includes the model name, scores, and hardware specs (CPU, RAM, GPU); no personal data is sent.

## Interpreting Results

| Verdict | Score | Meaning |
|---|---|---|
| EXCELLENT | >= 80 | Fast and accurate — great fit |
| GOOD | >= 60 | Solid — suitable for most tasks |
| MARGINAL | >= 40 | Usable, but with tradeoffs |
| NOT RECOMMENDED | < 40 | Too slow or inaccurate |
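The score bands above amount to a simple threshold function. A minimal sketch in Python (`verdict` is an illustrative helper, not part of the MetriLLM CLI):

```python
def verdict(score: float) -> str:
    """Map a 0-100 benchmark score to MetriLLM's verdict bands."""
    if score >= 80:
        return "EXCELLENT"
    if score >= 60:
        return "GOOD"
    if score >= 40:
        return "MARGINAL"
    return "NOT RECOMMENDED"

print(verdict(72))  # a score of 72 falls in the GOOD band
```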

Key metrics to highlight:

- `tokensPerSecond` > 30: good for interactive use
- `ttft` < 500 ms: responsive
- `memoryUsedGB` vs. available RAM: will the model fit?
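The three rules of thumb above can be bundled into a quick pass/fail check. This is a sketch: `interactive_fit` is a hypothetical helper, and `available_ram_gb` must come from your own system info, since the result file only reports what the model used.

```python
def interactive_fit(tokens_per_second: float, ttft_ms: float,
                    memory_used_gb: float, available_ram_gb: float) -> dict:
    """Apply the rule-of-thumb thresholds: >30 tok/s, <500 ms TTFT,
    and memory use below available RAM. Returns pass/fail flags."""
    return {
        "fast_enough": tokens_per_second > 30,
        "responsive": ttft_ms < 500,
        "fits_in_ram": memory_used_gb < available_ram_gb,
    }

# Example: a model doing 42.5 tok/s with 310 ms TTFT in 5.8 GB on a 16 GB machine.
checks = interactive_fit(42.5, 310, 5.8, 16)
print(checks)  # all three checks pass for these numbers
```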

## Tips

- Use `--perf-only` for quick tests
- Close GPU-intensive apps before benchmarking
- Benchmark duration varies with model speed and response length

## Open Source

MetriLLM is free and open source (Apache 2.0). Contributions, issues, and feedback are welcome: github.com/MetriLLM/metrillm

## Use Cases

- Test and benchmark local LLMs for speed, quality, and RAM usage
- Determine if a specific model is worth running on your hardware
- Compare local model performance across different quantization levels
- Evaluate inference quality and latency for local AI model selection
- Build automated model testing pipelines for local LLM deployment decisions

## Pros & Cons

### Pros

- Compatible with multiple platforms, including Claude Code and OpenClaw
- Well-documented, with detailed usage instructions and examples
- Open source with a permissive license

### Cons

- No built-in analytics or usage-metrics dashboard
- Configuration may require familiarity with AI and machine learning concepts

## FAQ

**What does MetriLLM do?**
It tests a local model's speed, quality, and RAM fit, then tells you whether the model is worth running on your hardware.

**What platforms support MetriLLM?**
MetriLLM is available on Claude Code and OpenClaw.

**What are the use cases for MetriLLM?**
Testing and benchmarking local LLMs for speed, quality, and RAM usage; determining whether a specific model is worth running on your hardware; and comparing local model performance across quantization levels.
