
Local-First LLM


Routes LLM requests to a local model (Ollama, LM Studio, llamafile) before falling back to cloud APIs. Tracks token savings and cost avoidance in a persistent savings log.


About This Skill

# Local-First LLM

Route requests to a local LLM first; fall back to cloud only when necessary. Track every decision to show real token and cost savings.

## Quick Start

1. Check if a local LLM is running

```bash
python3 skills/local-first-llm/scripts/check_local.py
```

Returns JSON: `{ "any_available": true, "best": { "provider": "ollama", "models": [...] } }`
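
To gate on that result from a shell, a minimal sketch (assuming `jq` is installed; field names as in the JSON above):

```bash
# Exit status of jq -e follows the boolean, so this works as an if-condition
if python3 skills/local-first-llm/scripts/check_local.py | jq -e '.any_available' > /dev/null; then
  echo "a local provider is running"
fi
```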

2. Route a request

```bash
python3 skills/local-first-llm/scripts/route_request.py \
  --prompt "Summarize this meeting transcript" \
  --tokens 800 \
  --local-available \
  --local-provider ollama
```

Returns: `{ "decision": "local", "reason": "...", "complexity_score": -1 }`
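
To branch on that decision in a script, a sketch along these lines should work (again assuming `jq`):

```bash
# Capture only the "decision" field from the routing JSON
decision=$(python3 skills/local-first-llm/scripts/route_request.py \
  --prompt "Summarize this meeting transcript" \
  --tokens 800 \
  --local-available \
  --local-provider ollama | jq -r '.decision')

if [ "$decision" = "local" ]; then
  echo "execute against the local provider"
else
  echo "fall back to the cloud API"
fi
```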

3. Log the outcome

After executing the request, record it:

```bash
python3 skills/local-first-llm/scripts/track_savings.py log \
  --tokens 800 \
  --model gpt-4o \
  --routed-to local
```
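
Assuming the flag accepts `cloud` as the counterpart value (only `local` is shown in this README), a cloud-routed request would be logged the same way:

```bash
# Assumed counterpart to --routed-to local for cloud-executed requests
python3 skills/local-first-llm/scripts/track_savings.py log \
  --tokens 800 \
  --model gpt-4o \
  --routed-to cloud
```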

4. Show the dashboard

```bash
python3 skills/local-first-llm/scripts/dashboard.py
```

---

## Full Routing Workflow

```
┌─────────────────────────────────────────────────────┐
│ 1. check_local.py → is a local provider running?    │
│                                                     │
│ 2. route_request.py → local or cloud?               │
│    - sensitivity check (private data → local)       │
│    - complexity score (high score → cloud)          │
│    - availability gate (no local → cloud)           │
│                                                     │
│ 3. Execute with the chosen provider                 │
│                                                     │
│ 4. track_savings.py log → record the outcome        │
│                                                     │
│ 5. dashboard.py → show cumulative savings           │
└─────────────────────────────────────────────────────┘
```
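
A minimal shell sketch of that loop, assuming `jq` is installed (and the assumed `cloud` value noted in the Quick Start); flag and field names follow the examples in this README:

```bash
#!/usr/bin/env bash
set -euo pipefail

PROMPT="Summarize this meeting transcript"
TOKENS=800

# 1. Is a local provider running?
provider=$(python3 skills/local-first-llm/scripts/check_local.py \
  | jq -r '.best.provider // empty')

# 2. Route: local or cloud?
if [ -n "$provider" ]; then
  decision=$(python3 skills/local-first-llm/scripts/route_request.py \
    --prompt "$PROMPT" --tokens "$TOKENS" \
    --local-available --local-provider "$provider" | jq -r '.decision')
else
  decision=cloud   # availability gate: no local provider
fi

# 3. Execute with the chosen provider (see the curl examples below).

# 4. Record the outcome.
python3 skills/local-first-llm/scripts/track_savings.py log \
  --tokens "$TOKENS" --model gpt-4o --routed-to "$decision"

# 5. Show cumulative savings.
python3 skills/local-first-llm/scripts/dashboard.py
```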

---

## Routing Rules (Summary)

| Condition                                                                     | Route    |
| ----------------------------------------------------------------------------- | -------- |
| No local provider available                                                   | ☁️ Cloud |
| Prompt contains sensitive data (`password`, `secret`, `api key`, `ssn`, etc.) | 🏠 Local |
| Complexity score ≥ 3                                                          | ☁️ Cloud |
| Complexity score < 3                                                          | 🏠 Local |
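
As an illustrative check of the sensitivity rule (the expected output is an assumption based on the table above, not a captured run), a prompt containing a keyword like `api key` should route local:

```bash
python3 skills/local-first-llm/scripts/route_request.py \
  --prompt "Rotate the api key in this deployment config" \
  --tokens 200 \
  --local-available \
  --local-provider ollama
# expected: {"decision": "local", "reason": "...", ...}
```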

For full scoring details, see references/routing-logic.md.

---

## Executing with a Local Provider

Once `route_request.py` returns `"decision": "local"`, send the request:

### Ollama

```bash
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "YOUR_PROMPT", "stream": false}'
```
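
Ollama's non-streaming response carries the completion in a `response` field, so the text can be pulled out with `jq`:

```bash
# -s silences the progress meter; jq -r prints the raw string
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "YOUR_PROMPT", "stream": false}' | jq -r '.response'
```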

### LM Studio / llamafile (OpenAI-compatible)

```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local-model", "messages": [{"role": "user", "content": "YOUR_PROMPT"}]}'
```
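
For OpenAI-compatible endpoints, the completion lives at `choices[0].message.content`:

```bash
# Extract just the assistant message from the chat-completions response
curl -s http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local-model", "messages": [{"role": "user", "content": "YOUR_PROMPT"}]}' \
  | jq -r '.choices[0].message.content'
```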

---

## Dashboard

The dashboard reads from `~/.openclaw/local-first-llm/savings.json` (auto-created).

```
┌─────────────────────────────────────────┐
│ 🧠 Local-First LLM — Dashboard          │
├─────────────────────────────────────────┤
│ Local LLM: ✅ ollama (llama3.2...)      │
├─────────────────────────────────────────┤
│ Total requests: 42                      │
│ Routed locally: 31 (73.8%)              │
│ Routed to cloud: 11                     │
├─────────────────────────────────────────┤
│ Tokens saved: 84,200                    │
│ Cost saved: $0.4210                     │
└─────────────────────────────────────────┘
```
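
To inspect the raw data behind the dashboard (assuming `jq`; the file is auto-created on the first logged request):

```bash
jq . ~/.openclaw/local-first-llm/savings.json
```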

Reset savings data:

```bash
python3 skills/local-first-llm/scripts/track_savings.py reset
```

---

## Additional References

  • Routing scoring details: references/routing-logic.md
  • Local provider setup (Ollama, LM Studio, llamafile): references/local-providers.md
  • Token estimation & cloud cost table: references/token-estimation.md

Use Cases

  • Route LLM requests to local models (Ollama, LM Studio) before cloud fallback
  • Track token savings and cost avoidance from local model usage
  • Reduce AI API costs by prioritizing local inference when possible
  • Build privacy-first AI workflows that prefer local model processing
  • Automatically fall back to cloud APIs when local models are unavailable

Pros & Cons

Pros

  • Compatible with multiple platforms, including Claude Code and OpenClaw
  • Well documented, with detailed usage instructions and examples
  • Purpose-built for local/cloud LLM routing with focused functionality

Cons

  • Requires a local LLM runtime (Ollama, LM Studio, or llamafile) to be installed and running before any savings accrue
  • Cloud fallback still depends on the cloud provider's API keys and authentication

FAQ

What does Local-First LLM do?
Routes LLM requests to a local model (Ollama, LM Studio, llamafile) before falling back to cloud APIs, and tracks token savings and cost avoidance in a persistent savings log.
What platforms support Local-First LLM?
Local-First LLM is available on Claude Code and OpenClaw.
What are the use cases for Local-First LLM?
Route LLM requests to local models (Ollama, LM Studio) before cloud fallback. Track token savings and cost avoidance from local model usage. Reduce AI API costs by prioritizing local inference when possible.
