
A/B Testing Framework

Design and implement A/B tests with proper statistical methodology, sample size calculation, feature flags, and significance testing for conversion optimization.

By community · 3,900 · v1.1.0 · Updated 2026-03-08

Install

Claude Code

Copy the SKILL.md file to `.claude/skills/a-b-testing.md`

About This Skill

A/B Testing Framework generates statistically rigorous experimentation infrastructure, helping you avoid the common mistakes that invalidate many A/B tests.

Pre-Experiment Design

Sample size calculator with inputs: baseline conversion rate, minimum detectable effect (MDE), statistical power (80% default), and significance threshold (α=0.05). Runtime estimator based on current traffic volume. Multiple comparison correction (Bonferroni) for multi-variant tests.
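A minimal sketch of the sample-size calculation for a two-proportion z-test, using only the standard library (the function name and defaults are illustrative, not the skill's actual API):

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, mde, alpha=0.05, power=0.80):
    """Approximate sample size per variant for a two-proportion z-test.

    `baseline` is the control conversion rate; `mde` is the absolute
    minimum detectable effect (e.g. 0.01 for a +1 percentage-point lift).
    """
    p1, p2 = baseline, baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # power quantile
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)
```

For example, detecting a lift from 5% to 6% at α=0.05 and 80% power requires roughly 8,200 users per variant; halving the MDE roughly quadruples the requirement.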

Assignment

Deterministic user bucketing via MurmurHash3 on `user_id + experiment_id`. Ensures users see the same variant on every visit. Traffic allocation by percentage. Holdout groups for long-term effect measurement. Exclusion rules to prevent experiment interference.
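The bucketing scheme can be sketched as follows. MurmurHash3 requires the third-party `mmh3` package, so this sketch substitutes SHA-256 from the standard library; the deterministic-assignment property is the same:

```python
import hashlib

def assign_variant(user_id, experiment_id, allocation=None):
    """Deterministically bucket a user into a variant.

    Hashing `user_id + experiment_id` means the same user always gets the
    same variant for a given experiment, with no database lookup needed.
    SHA-256 stands in here for MurmurHash3 (`mmh3` is third-party).
    """
    allocation = allocation or {"control": 0.5, "treatment": 0.5}
    digest = hashlib.sha256(f"{user_id}:{experiment_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    cumulative = 0.0
    for variant, share in allocation.items():
        cumulative += share
        if bucket < cumulative:
            return variant
    return variant  # guard against float rounding at the boundary
```

Because assignment depends only on the hash, re-running the function on any server yields identical bucketing, and different experiment IDs re-shuffle users independently.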

Feature Flags

Integrates with LaunchDarkly, Unleash, or a self-hosted flag service. Server-side flag evaluation prevents flickering. SDK wrappers for React (useFlag hook), Python, and Go.
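A self-hosted flag service with server-side percentage rollout might look like this minimal sketch (class and method names are illustrative; real integrations would go through the LaunchDarkly or Unleash SDKs):

```python
import hashlib

class FlagService:
    """Minimal in-memory feature-flag service with percentage rollouts.

    Evaluating flags on the server, before rendering, avoids the UI
    flicker that client-side flag loading can cause.
    """
    def __init__(self):
        self._flags = {}  # flag name -> rollout percentage in [0, 100]

    def set_rollout(self, name, percent):
        self._flags[name] = percent

    def is_enabled(self, name, user_id):
        percent = self._flags.get(name, 0)  # unknown flags default to off
        digest = hashlib.sha256(f"{name}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") % 100
        return bucket < percent
```

Hashing on flag name plus user ID keeps each user's exposure stable as the rollout percentage increases: raising the rollout only adds users, never removes them.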

Analysis

Frequentist — Z-test for proportions, t-test for continuous metrics, chi-square for multi-category. Confidence intervals. p-value with multiple testing correction.
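The z-test for proportions is short enough to sketch in full (function name is illustrative):

```python
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    `conv_*` are conversion counts, `n_*` are sample sizes.
    Returns (z statistic, two-sided p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided
    return z, p_value
```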

Bayesian — Beta-Binomial conjugate model for conversion rates. Probability to be best, expected loss, credible intervals. Thompson Sampling for multi-armed bandit scenarios.
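With the Beta-Binomial conjugate model, "probability to be best" for two variants reduces to comparing posterior draws. A Monte Carlo sketch under flat Beta(1, 1) priors (a stated assumption; the skill lets you choose priors):

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, samples=100_000, seed=42):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors.

    With a Beta prior and Binomial data, the posterior is
    Beta(1 + conversions, 1 + non-conversions), so we can sample
    directly from each posterior and count how often B wins.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(samples):
        theta_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        theta_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += theta_b > theta_a
    return wins / samples
```

The same posterior draws also yield expected loss and credible intervals, and drawing one sample per variant and picking the max is exactly Thompson Sampling.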

Common Pitfalls Detection

Sample Ratio Mismatch (SRM) detection, novelty-effect warnings when early lift fades over a long-running test, and network-effect warnings for social products.
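SRM detection is a chi-square goodness-of-fit test of observed assignment counts against the planned split. For a two-variant test (1 degree of freedom) this needs only the standard library; the 0.001 threshold is a common convention, not a mandate:

```python
import math

def srm_check(n_control, n_treatment, expected_ratio=0.5, threshold=0.001):
    """Sample Ratio Mismatch check via chi-square goodness-of-fit (1 df).

    A tiny p-value means the observed split is very unlikely under the
    planned ratio, indicating a bug in assignment or logging.
    Returns (p_value, flagged).
    """
    total = n_control + n_treatment
    exp_c = total * expected_ratio
    exp_t = total * (1 - expected_ratio)
    chi2 = ((n_control - exp_c) ** 2 / exp_c
            + (n_treatment - exp_t) ** 2 / exp_t)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_value, p_value < threshold
```

For instance, a 5,300 / 4,700 split on a planned 50/50 test is flagged immediately, even though a naive eyeball check might shrug it off.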

Use Cases

  • Running landing page copy tests with proper power analysis and minimum detectable effect
  • Implementing feature flag-based A/B tests with consistent user bucketing
  • Analyzing experiment results with frequentist and Bayesian methods
  • Designing multi-variate tests with proper traffic allocation across variants

Pros & Cons

Pros

  • + Pre-experiment sample size calculator prevents underpowered tests
  • + SRM detection catches assignment bugs that would otherwise invalidate results
  • + Bayesian analysis provides probability-based decisions, not just p-value cutoffs
  • + MurmurHash bucketing ensures consistent assignment without database storage

Cons

  • - Minimum sample sizes mean small sites cannot reach significance on rare conversions
  • - Bayesian analysis requires choosing priors which introduces subjective decisions
