
Willow Inference Server

Verified

Local ASR and TTS inference server. Use when the user wants to transcribe audio to text (ASR) or convert text to speech (TTS). Requires a running Willow Infe...

83 downloads

About This Skill

# Willow Inference Server Skill

Local ASR (speech-to-text) and TTS (text-to-speech) inference server.

Setup

1. Start Willow Inference Server

```bash
git clone https://github.com/toverainc/willow-inference-server.git
cd willow-inference-server
./utils.sh install
./utils.sh gen-cert your-hostname
./utils.sh run
```

The server runs at `https://your-hostname:19000`.

2. Configure Environment

Set the server URL:

```bash
export WILLOW_BASE_URL="https://your-hostname:19000"
```

Or set the URL per request by writing it directly into each curl command in place of `${WILLOW_BASE_URL}`.

ASR (Speech-to-Text)

Transcribe Audio File

```bash
curl -X POST "${WILLOW_BASE_URL}/api/asr" \
  -F "audio_file=@/path/to/audio.m4a" \
  -F "language=auto"
```

Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| audio_file | Audio file to transcribe | required |
| language | Language code (en, zh, etc.) or "auto" | auto |
| model | Whisper model (tiny, base, medium, large-v2) | server config |
| task | transcribe or translate | transcribe |
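The parameter table above can be turned into multipart form fields with a small helper. This is a minimal sketch: `asr_form_fields` is a hypothetical function name, not part of Willow Inference Server, and it only assumes the field names and defaults listed in the table.

```bash
# asr_form_fields FILE [LANGUAGE] [MODEL] [TASK]  (hypothetical helper)
# Prints one multipart field per line; each line can be passed to curl as a -F value.
asr_form_fields() {
  file=$1 language=${2:-auto} model=${3:-} task=${4:-transcribe}
  # The table lists only two valid tasks.
  case $task in
    transcribe|translate) ;;
    *) echo "task must be transcribe or translate: $task" >&2; return 1 ;;
  esac
  printf 'audio_file=@%s\n' "$file"
  printf 'language=%s\n' "$language"
  # Leave model out when unset so the server uses its configured default.
  if [ -n "$model" ]; then printf 'model=%s\n' "$model"; fi
  printf 'task=%s\n' "$task"
}
```

Omitting `model` rather than sending an empty value keeps the "server config" default from the table in effect.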

Supported Formats

- MP3, WAV, M4A, OGG, FLAC, WebM
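Checking a file's extension against the list above before uploading can save a round trip. The sketch below is an illustration only: `is_supported_audio` is a hypothetical helper, and it assumes the extension reflects the actual container format, which the server may or may not rely on.

```bash
# is_supported_audio FILE  (hypothetical helper)
# Succeeds if FILE has one of the extensions listed under Supported Formats.
is_supported_audio() {
  # A name without a dot has no extension to check.
  case $1 in
    *.*) ;;
    *) return 1 ;;
  esac
  # Take the text after the last dot and lowercase it.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case $ext in
    mp3|wav|m4a|ogg|flac|webm) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_audio "Talk.M4A" && echo ok` prints `ok`, while a `.txt` file is rejected.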

Example: Transcribe with curl

```bash
# Basic transcription
curl -X POST "${WILLOW_BASE_URL}/asr" \
  -F "audio_file=@/path/to/audio.m4a" \
  -F "language=zh"

# With specific model
curl -X POST "${WILLOW_BASE_URL}/asr" \
  -F "audio_file=@/path/to/audio.m4a" \
  -F "language=en" \
  -F "model=base"
```

TTS (Text-to-Speech)

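Convert Text to Speech

```bash
# Save the returned audio to a file instead of printing binary data to the terminal
curl -X POST "${WILLOW_BASE_URL}/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "af_sarah"}' \
  -o output.wav
```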

Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| text | Text to convert to speech | required |
| voice | Voice ID (see below) | default voice |
| speed | Speech speed (0.5-2.0) | 1.0 |
| volume | Volume (0.0-1.0) | 1.0 |
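Hand-written JSON breaks as soon as the text contains a double quote, so it can help to build the request body with a helper. The sketch below is a minimal illustration: `json_escape` and `tts_payload` are hypothetical names, only the `text`, `voice`, and `speed` fields from the table are covered, and the text is assumed to contain no newlines or other control characters.

```bash
# Escape backslashes and double quotes so a value is safe inside a JSON string.
# Assumption: the value contains no newlines or other control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# tts_payload TEXT [VOICE] [SPEED]  (hypothetical helper)
# Prints a JSON body for the TTS request, or fails if SPEED is out of range.
tts_payload() {
  text=$1 voice=${2:-} speed=${3:-1.0}
  # Reject speeds outside the documented 0.5-2.0 range.
  if ! awk -v s="$speed" 'BEGIN { exit !(s ~ /^[0-9]+(\.[0-9]+)?$/ && s >= 0.5 && s <= 2.0) }'; then
    echo "speed must be between 0.5 and 2.0: $speed" >&2
    return 1
  fi
  payload="{\"text\": \"$(json_escape "$text")\""
  # Leave voice out when unset so the server picks its default voice.
  if [ -n "$voice" ]; then
    payload="$payload, \"voice\": \"$(json_escape "$voice")\""
  fi
  printf '%s, "speed": %s}\n' "$payload" "$speed"
}
```

The result can be passed to curl with `-d "$(tts_payload "Hello!" am_michael 1.2)"`.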

Available Voices

Common voices (format: `gender_voicename`):

- `af_sarah` - Sarah (Female)
- `af_bella` - Bella (Female)
- `am_michael` - Michael (Male)
- `am_alex` - Alex (Male)

Check server docs for full list: `${WILLOW_BASE_URL}/api/docs`

Example: TTS with curl

```bash
# Basic TTS
curl -X POST "${WILLOW_BASE_URL}/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "你好,这是测试"}' \
  -o output.wav

# With custom voice
curl -X POST "${WILLOW_BASE_URL}/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello!", "voice": "am_michael", "speed": 1.2}' \
  -o hello.mp3
```

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| WILLOW_BASE_URL | Server URL | https://localhost:19000 |
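The default in the table can be applied in scripts with shell parameter expansion. A minimal sketch, assuming a hypothetical helper named `willow_url`; the endpoint paths passed to it are the ones used elsewhere in this document and should be confirmed against the server's API docs.

```bash
# willow_url PATH  (hypothetical helper)
# Joins the configured base URL and PATH, falling back to the documented default.
willow_url() {
  base=${WILLOW_BASE_URL:-https://localhost:19000}
  # Strip one trailing slash so the joined URL has no double slash.
  base=${base%/}
  printf '%s/%s\n' "$base" "${1#/}"
}
```

A call such as `curl -X POST "$(willow_url /tts)" ...` then works whether or not `WILLOW_BASE_URL` is exported.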

Workflow Examples

1. Record and Transcribe

```bash
# Record audio with SoX's rec command (e.g. on macOS); press Ctrl+C to stop
rec test.wav

# Transcribe
curl -X POST "${WILLOW_BASE_URL}/asr" \
  -F "audio_file=@test.wav" \
  -F "language=auto"
```

2. Text to Speech

```bash
# Convert text to speech
curl -X POST "${WILLOW_BASE_URL}/tts" \
  -H "Content-Type: application/json" \
  -d '{"text": "今天的任务是学习新技能"}' \
  -o speech.wav
```

3. Batch Transcription

```bash
for f in *.m4a; do
  curl -X POST "${WILLOW_BASE_URL}/asr" \
    -F "audio_file=@$f" \
    -F "language=auto" \
    -o "${f%.m4a}.txt"
done
```
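The loop above can be made a little more defensive. The sketch below is one possible variant, not the skill's own method: `transcript_name` and `transcribe_all` are hypothetical helpers, and the `/asr` path and form fields are copied from the example above. It handles any extension, skips an unmatched glob, and reports failed uploads.

```bash
# transcript_name FILE: replace the final extension with .txt  (hypothetical helper)
transcript_name() {
  printf '%s.txt\n' "${1%.*}"
}

# transcribe_all FILE...: upload each existing file, skip anything else  (hypothetical helper)
transcribe_all() {
  for f in "$@"; do
    # An unmatched glob such as *.m4a is passed through literally; skip it.
    [ -f "$f" ] || { echo "skipping $f: not a file" >&2; continue; }
    # --fail makes curl return a non-zero status on HTTP errors.
    curl --fail -sS -X POST "${WILLOW_BASE_URL}/asr" \
      -F "audio_file=@$f" \
      -F "language=auto" \
      -o "$(transcript_name "$f")" || echo "failed: $f" >&2
  done
}
```

Calling `transcribe_all *.m4a *.wav` writes one `.txt` file next to each input.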

API Documentation

Full API docs are available at: `${WILLOW_BASE_URL}/api/docs`

Notes

- Endpoints are served over HTTPS by default (plain HTTP only if the server is configured for it)
- Audio files are processed locally on the server
- ASR latency depends on model size and hardware
- TTS voices can be customized with custom voice recordings

Use Cases

  • Transcribe audio to text locally using the Willow ASR inference server
  • Convert text to speech for voice interfaces and accessibility
  • Run speech recognition without sending audio data to external APIs
  • Build voice-enabled applications with local inference for privacy
  • Process audio files in batch for transcription and analysis

Pros & Cons

Pros

  • Local inference keeps sensitive audio data on your machine
  • Supports both ASR (speech-to-text) and TTS (text-to-speech)
  • No API costs: runs entirely on local hardware

Cons

  • Requires a local GPU or sufficient CPU resources for real-time inference
  • More complex to set up than cloud-based speech APIs

FAQ

What platforms support Willow Inference Server?
Willow Inference Server is available on Claude Code and OpenClaw.
