Hugging Face Model Trainer
TRL-based model training workflows on Hugging Face.
Install
Claude Code: add this skill to `.claude/skills/`

About This Skill
Hugging Face Model Trainer is a skill that guides AI coding agents through TRL (Transformer Reinforcement Learning) based model training workflows. It provides structured instructions for fine-tuning large language models using the Hugging Face ecosystem, covering the full pipeline from dataset preparation through training to model publishing.
The skill encodes best practices for several training paradigms. Supervised Fine-Tuning (SFT) allows you to adapt a pretrained model to specific tasks using labeled examples. Reinforcement Learning from Human Feedback (RLHF) trains a reward model from preference data and then optimizes the base model against it using PPO. Direct Preference Optimization (DPO) offers a simpler alternative to RLHF by directly optimizing the model on preference pairs without a separate reward model. The skill also covers parameter-efficient methods like LoRA and QLoRA, which dramatically reduce GPU memory requirements by training only a small number of adapter parameters.
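As a sketch of the parameter-efficient path described above, the snippet below pairs TRL's `SFTTrainer` with a `peft` `LoraConfig` so only the low-rank adapter weights are trained. The model checkpoint, dataset, and hyperparameters are illustrative placeholders, not defaults the skill prescribes:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder dataset and checkpoint; substitute your own.
dataset = load_dataset("trl-lib/Capybara", split="train")

peft_config = LoraConfig(
    r=16,                                # adapter rank: only these small matrices train
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], # attach adapters to attention projections
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="sft-lora-out",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,       # effective batch size of 16
    learning_rate=2e-4,
    num_train_epochs=1,
    bf16=True,                           # mixed precision on supported GPUs
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",           # placeholder base model
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```

Because `peft_config` is passed to the trainer, only the adapter parameters receive gradients; the frozen base weights can stay in half precision.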
Key features include automatic configuration of TrainingArguments with sensible defaults for learning rate scheduling, gradient accumulation, and mixed-precision training. The skill handles dataset formatting for different training objectives, sets up proper tokenization with padding and truncation strategies, and configures evaluation metrics. It understands Hugging Face Hub integration for pushing trained models and creating model cards.
When working with the skill, the agent reads your existing codebase to understand your model architecture, dataset format, and hardware constraints, then generates or modifies training scripts accordingly. It can troubleshoot common issues like OOM errors by suggesting batch size adjustments, gradient checkpointing, or switching to quantized training.
Use this skill when you need to fine-tune foundation models for domain-specific tasks, align models with human preferences, or set up reproducible training pipelines on Hugging Face infrastructure.
Use Cases
- Fine-tuning a pretrained LLM on domain-specific instruction datasets using SFT with LoRA adapters
- Setting up a DPO training pipeline to align a chatbot with human preference data
- Configuring QLoRA training to fine-tune a 70B parameter model on a single GPU
- Building a reproducible training pipeline with Weights & Biases logging and Hub model publishing
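For the QLoRA use case, a minimal loading sketch quantizes the base weights to 4-bit NF4 and attaches trainable LoRA adapters; the model ID and adapter settings are placeholders, not values the skill fixes:

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",                 # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,     # dequantized matmul dtype
    bnb_4bit_use_double_quant=True,            # quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B",                # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # e.g. cast norms, enable input grads
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, target_modules="all-linear",
               task_type="CAUSAL_LM"),
)
```

The frozen base weights occupy roughly a quarter of their fp16 footprint, which is what makes single-GPU fine-tuning of 70B-class models feasible.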
Pros & Cons
Pros
- Covers SFT, RLHF, DPO, and parameter-efficient methods in one skill
- Generates proper TrainingArguments with hardware-aware defaults
- Integrates with Hugging Face Hub for model sharing and versioning
Cons
- Requires familiarity with the Hugging Face ecosystem and transformer architectures
- Training scripts still need GPU hardware to actually execute
- Limited guidance for non-TRL training frameworks like Axolotl or LitGPT