AI Engineer
AI/ML engineering specialist for building intelligent features, RAG systems, LLM integrations, data pipelines, vector search, and AI-powered applications. Us...
# AI Engineer
Build practical AI systems that work in production. Data-driven, systematic, performance-focused.
Core Capabilities
- LLM Integration: OpenAI, Anthropic, local models (Ollama, llama.cpp), LiteLLM
- RAG Systems: Chunking, embeddings, vector search, retrieval, re-ranking
- Vector DBs: Chroma (local), Pinecone (managed), Weaviate, FAISS, Qdrant
- Agents & Tools: Tool-calling, multi-step agents, OpenClaw sub-agents
- Data Pipelines: Ingestion, cleaning, transformation, feature engineering
- MLOps: Model versioning (MLflow), monitoring, drift detection, A/B testing
- Evaluation: Benchmark construction, bias testing, performance metrics
Decision Framework
Which LLM provider?
- **Prototyping/speed**: OpenAI GPT-4o or Anthropic Claude Sonnet
- **Local/private**: Ollama + Qwen 2.5 32B or Llama 3.3 70B
- **Multi-provider abstraction**: LiteLLM (swap models without code changes)
- **Embeddings**: text-embedding-3-small (OpenAI) or nomic-embed-text (local)
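One way to keep the provider choice above out of application code is to resolve the model name from a deployment profile. The sketch below is only an illustration: `MODEL_BY_PROFILE`, `pick_model`, and the `LLM_PROFILE` environment variable are hypothetical names, and the model strings are placeholders that should be checked against whatever client library is actually used.

```python
import os
from typing import Optional

# Hypothetical profile -> model-name table. The strings are placeholders,
# not verified identifiers for any particular client library.
MODEL_BY_PROFILE = {
    "prototype": "gpt-4o",
    "local": "ollama/qwen2.5:32b",
}


def pick_model(profile: Optional[str] = None) -> str:
    """Return the model name for a deployment profile.

    Falls back to the LLM_PROFILE environment variable (an assumed name),
    then to "prototype".
    """
    profile = profile or os.environ.get("LLM_PROFILE", "prototype")
    if profile not in MODEL_BY_PROFILE:
        known = ", ".join(sorted(MODEL_BY_PROFILE))
        raise ValueError(f"unknown profile {profile!r}; expected one of: {known}")
    return MODEL_BY_PROFILE[profile]
```

Call sites then pass `pick_model()` wherever a model name is expected, so switching between a hosted and a local model becomes a configuration change.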
Which vector DB?
- **Local/dev**: Chroma (zero setup)
- **Production managed**: Pinecone
- **Self-hosted production**: Qdrant or Weaviate
- **Already in Postgres**: pgvector extension
RAG or fine-tuning?
- **RAG first**: always try RAG before fine-tuning; it is enough in roughly 90% of cases.
- Fine-tune only when: a style or tone change is needed, the domain vocabulary is highly specialized, or latency must be minimal.
RAG Workflow
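1. Ingest
```python
# Chunk documents (rule of thumb: 512 tokens, 50 overlap).
# Note: chunk_size is measured in characters by default; use
# RecursiveCharacterTextSplitter.from_tiktoken_encoder(...) to count tokens.
# Newer LangChain releases expose this class from `langchain_text_splitters`.
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=50)
chunks = splitter.split_documents(docs)  # docs: list of LangChain Document objects
```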
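To make the meaning of chunk size and overlap concrete, here is a minimal sliding-window chunker in plain Python. It is a sketch, not what the splitter above does internally: `chunk_text` is a hypothetical helper, and it counts whitespace-separated words as a rough stand-in for tokens.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into windows of `chunk_size` words.

    Each window repeats the last `overlap` words of the previous one,
    so a sentence cut at a boundary still appears whole in one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

With `chunk_size=4` and `overlap=1`, ten words produce three chunks, and each chunk starts with the last word of the one before it.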
2. Embed + store
```python
import os

import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

# Persist the index on disk so it survives restarts
client = chromadb.PersistentClient(path="./chroma_db")
ef = OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",
)
collection = client.get_or_create_collection("docs", embedding_function=ef)
collection.add(
    documents=[c.page_content for c in chunks],
    ids=[str(i) for i in range(len(chunks))],
)
```
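Conceptually, the store compares the query embedding against every stored embedding and returns the closest ones. The sketch below shows that idea with brute-force cosine similarity. It is an assumption-laden illustration: `cosine_similarity` and `top_k` are hypothetical helpers, real vector databases use approximate indexes, and their default distance metric may not be cosine.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 5):
    """Return the k (doc_id, score) pairs most similar to query_vec, best first.

    `index` maps document ids to their embedding vectors.
    """
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Brute force is fine for a few thousand vectors; beyond that, the indexing a dedicated store provides is what keeps query latency flat.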
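3. Retrieve + generate
```python
from openai import OpenAI

# Separate LLM client; `client` in the previous step is the Chroma client
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

results = collection.query(query_texts=[user_query], n_results=5)
context = "\n\n".join(results["documents"][0])

response = llm.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"Answer based on this context:\n{context}"},
        {"role": "user", "content": user_query},
    ],
)
answer = response.choices[0].message.content
```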
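Retrieved passages can exceed what is sensible to put in a prompt, so it helps to cap the context and number the passages so answers can cite them. The following is a minimal sketch of that step; `build_messages` and its character budget are hypothetical, and a production version would likely budget in tokens instead.

```python
def build_messages(question: str, passages: list[str], max_context_chars: int = 4000) -> list[dict]:
    """Pack passages (assumed best-first) into a system prompt until the budget is used."""
    selected = []
    used = 0
    for number, passage in enumerate(passages, start=1):
        block = f"[{number}] {passage.strip()}"
        if used + len(block) > max_context_chars:
            break  # keep the highest-ranked passages, drop the rest
        selected.append(block)
        used += len(block)

    system = (
        "Answer using only the numbered context below. "
        "If the context does not contain the answer, say so.\n\n"
        + "\n\n".join(selected)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

The returned list has the same shape as the `messages` argument used in the generation call, so it can be passed in directly.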
See `references/rag-patterns.md` for advanced patterns: re-ranking, hybrid search, HyDE, eval.
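As a taste of hybrid search, one common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). This sketch is not taken from the referenced file, whose contents are not shown here; `reciprocal_rank_fusion` is a hypothetical helper, and `k=60` is the conventional smoothing constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by id for a deterministic order
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

RRF only needs ranks, not raw scores, which is why it works across retrievers whose scores are not comparable.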
LLM Tool Calling (Agents)
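```python
import openai  # the module-level client reads OPENAI_API_KEY from the environment

# JSON Schema description of each function the model may call
tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search internal documentation",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

# messages: the conversation so far, a list of {"role": ..., "content": ...} dicts
response = openai.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
```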
See `references/agent-patterns.md` for multi-step agent loops, error handling, tool schemas.
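When the model answers with a tool call, the application has to run the function and send the result back as a `tool` message. The sketch below shows that dispatch step on plain dicts that mirror the tool-call shape (`id`, `function.name`, `function.arguments` as a JSON string). The SDK returns objects with attribute access instead of dicts, and `search_docs`, `TOOL_REGISTRY`, and `run_tool_call` are hypothetical names used only for illustration.

```python
import json


def search_docs(query: str) -> str:
    """Hypothetical stand-in for a real documentation search."""
    return f"No results for {query!r} (stub)"


# Maps the tool names declared in the schema to Python callables
TOOL_REGISTRY = {"search_docs": search_docs}


def run_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and return the message to append to the conversation.

    Errors are returned as text rather than raised, so the model can see
    what went wrong and retry with different arguments.
    """
    name = tool_call["function"]["name"]
    if name not in TOOL_REGISTRY:
        content = f"Error: unknown tool {name!r}"
    else:
        try:
            args = json.loads(tool_call["function"]["arguments"])
            content = str(TOOL_REGISTRY[name](**args))
        except (json.JSONDecodeError, TypeError) as exc:
            content = f"Error: bad arguments for {name!r}: {exc}"
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": content}
```

In a multi-step loop, each returned message is appended to `messages` and the model is called again until it replies without requesting a tool.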
Critical Rules
- Evaluate early — build an eval set before you build the system
- RAG before fine-tuning — always
- Log everything — prompts, completions, latency, token usage from day one
- Test for bias — especially for user-facing classification or scoring systems
- Never hardcode API keys — use env vars or secret managers
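The logging rule above can be enforced by routing every model call through one wrapper. This is a minimal sketch under stated assumptions: `logged_call` is a hypothetical helper, and `call_model` is assumed to be any callable that takes a prompt and returns a dict with a `text` key plus optional `prompt_tokens` and `completion_tokens`.

```python
import json
import time
from typing import Callable


def logged_call(call_model: Callable[[str], dict], prompt: str,
                write: Callable[[str], object] = print) -> str:
    """Call the model and emit one JSON line with prompt, completion, latency, and tokens.

    `write` receives the JSON line; pass a file's write method or a
    logger function to send records somewhere durable.
    """
    start = time.perf_counter()
    result = call_model(prompt)
    latency_ms = (time.perf_counter() - start) * 1000

    record = {
        "ts": time.time(),
        "prompt": prompt,
        "completion": result["text"],
        "latency_ms": round(latency_ms, 2),
        "prompt_tokens": result.get("prompt_tokens"),
        "completion_tokens": result.get("completion_tokens"),
    }
    write(json.dumps(record))
    return result["text"]
```

One record per line keeps the log easy to grep and to load later for cost or latency analysis.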
References
- `references/rag-patterns.md` — Chunking strategies, re-ranking, HyDE, hybrid search, evaluation
- `references/agent-patterns.md` — Tool calling, multi-step loops, memory, error handling
Use Cases
- Integrate LLM APIs from OpenAI, Anthropic, or local models into production applications
- Build RAG systems with chunking, embedding, vector search, and re-ranking pipelines
- Design and implement AI agent architectures with tool use, memory, and planning capabilities
- Optimize model inference for latency and cost using quantization, caching, and batching
- Set up evaluation frameworks to measure AI system accuracy, safety, and performance
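An evaluation framework can start very small. The sketch below scores a system by checking that each answer mentions required keywords. It is a crude proxy for accuracy, offered only as a starting point; `evaluate`, the `must_include` field, and the case format are hypothetical.

```python
from typing import Callable


def evaluate(answer_fn: Callable[[str], str], eval_set: list[dict]) -> dict:
    """Run answer_fn over the eval set and report the pass rate and failures.

    Each case is {"question": str, "must_include": [keywords]}; a case
    passes when every keyword appears in the answer (case-insensitive).
    """
    failures = []
    for case in eval_set:
        answer = answer_fn(case["question"]).lower()
        missing = [kw for kw in case["must_include"] if kw.lower() not in answer]
        if missing:
            failures.append({"question": case["question"], "missing": missing})

    total = len(eval_set)
    pass_rate = (total - len(failures)) / total if total else 0.0
    return {"pass_rate": pass_rate, "failures": failures}
```

Running this on every change gives a regression signal long before more sophisticated metrics are in place.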
Pros & Cons
Pros
- Comprehensive coverage of the full AI engineering stack: LLMs, RAG, agents, fine-tuning, and eval
- Production-focused mindset with emphasis on performance optimization and systematic testing
- Multi-model support including cloud APIs and local models via Ollama and llama.cpp
Cons
- Broad scope means shallow coverage on any single topic; more of a generalist than a specialist
- No runnable code or scripts included; provides guidance but not executable implementations
- Requires significant existing ML/AI knowledge to fully leverage the recommendations