Even perfect statistics cannot fix broken experiment data. In this lesson, we focus on high-impact data quality failures.
Lucky Day Trap
A result can appear significant mainly because of one anomalous day.
Warning signs:
- Most lift comes from a single date window
- Removing one day changes the conclusion entirely
- Traffic behavior looks structurally different on that day
Action:
- Investigate incident logs, campaigns, and outages
- Re-run the analysis without the anomalous day as a sensitivity check
- Extend test if needed
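The sensitivity check above can be sketched as a leave-one-day-out loop: recompute the lift once per date with that date removed, and see whether any single exclusion changes the conclusion. This is a minimal sketch that assumes daily aggregates of the form (date, variant, users, conversions) and the variant labels "control" and "treatment"; both are hypothetical. It recomputes only the point estimate of relative lift, so a fuller check would also re-run your significance test on each reduced dataset.

```python
from collections import defaultdict

# Each row is a hypothetical daily aggregate: (date, variant, users, conversions).
# The variant labels "control" and "treatment" are assumptions for this sketch.


def relative_lift(rows, exclude_date=None):
    """Relative lift of treatment over control, optionally dropping one date."""
    users = defaultdict(int)
    conversions = defaultdict(int)
    for date, variant, n_users, n_conv in rows:
        if date == exclude_date:
            continue  # sensitivity check: leave this day out
        users[variant] += n_users
        conversions[variant] += n_conv
    rate_control = conversions["control"] / users["control"]
    rate_treatment = conversions["treatment"] / users["treatment"]
    return (rate_treatment - rate_control) / rate_control


def leave_one_day_out(rows):
    """Return {date: lift recomputed without that date} for every date in rows."""
    dates = sorted({row[0] for row in rows})
    return {d: relative_lift(rows, exclude_date=d) for d in dates}


if __name__ == "__main__":
    # Illustrative numbers only: the third day carries almost all of the lift.
    rows = [
        ("2024-05-01", "control", 1000, 50), ("2024-05-01", "treatment", 1000, 51),
        ("2024-05-02", "control", 1000, 50), ("2024-05-02", "treatment", 1000, 49),
        ("2024-05-03", "control", 1000, 50), ("2024-05-03", "treatment", 1000, 90),
    ]
    print(f"all days: {relative_lift(rows):+.1%}")
    for date, lift in leave_one_day_out(rows).items():
        print(f"without {date}: {lift:+.1%}")
```

With these illustrative numbers, the overall lift is about +26.7%, but it drops to zero once 2024-05-03 is removed, which is exactly the "removing one day changes the conclusion" warning sign.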
Sample Ratio Mismatch (SRM)
If you planned a 50/50 assignment but observe a statistically significant imbalance in assigned users (commonly tested with a chi-square goodness-of-fit test), randomization may be broken.
Potential causes:
- Assignment bugs
- Event drop in one variant
- Redirect or rendering failures
- Bot filtering asymmetry
Action:
- Stop trusting outcome metrics
- Diagnose the assignment and tracking paths
- Fix and rerun from scratch
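An SRM check can be sketched as a chi-square goodness-of-fit test on the number of users assigned to each variant. This minimal sketch uses only the standard library: for two variants the statistic has 1 degree of freedom, so its p-value equals `erfc(sqrt(chi2 / 2))`. The function names are hypothetical, and the alert threshold of 0.001 is an assumption (strict thresholds like this are a common convention for SRM alerts, but choose your own).

```python
import math


def srm_p_value(n_control, n_treatment, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value for a two-variant split (1 degree of freedom)."""
    total = n_control + n_treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # With 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def has_srm(n_control, n_treatment, expected_control_share=0.5, alpha=0.001):
    """Flag a sample ratio mismatch when the observed split is too unlikely under the plan."""
    return srm_p_value(n_control, n_treatment, expected_control_share) < alpha


if __name__ == "__main__":
    print(has_srm(50_000, 50_100))  # → False (small deviation, consistent with 50/50)
    print(has_srm(50_000, 51_500))  # → True (deviation too large for chance)
```

The `expected_control_share` parameter covers planned splits other than 50/50, for example 0.1 for a 10/90 ramp.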
Base-Rate Mismatch
If the control baseline differs substantially from the expected historical rate, your original power assumptions may be invalid, because the required sample size depends on the baseline rate.
Action:
- Compare baseline with recent stable periods
- Recompute expected duration and minimum detectable effect (MDE) if necessary
- Document context shifts (seasonality, campaigns, product changes)
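To see why a shifted baseline matters, the required sample size can be recomputed with the observed control rate in place of the planned one. This sketch uses the common normal-approximation formula for a two-sided test of two independent proportions with unpooled variance; your original power calculation may have used a different method. The MDE is expressed as a relative lift, and the traffic figure and function names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Users per variant needed to detect a relative lift in a conversion rate.

    Normal approximation for a two-sided test of two independent proportions,
    using unpooled variance.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def expected_days(n_per_variant, daily_users, n_variants=2):
    """Days needed if `daily_users` eligible users are split across all variants."""
    return math.ceil(n_per_variant * n_variants / daily_users)


if __name__ == "__main__":
    daily_users = 10_000  # hypothetical eligible traffic per day
    for label, baseline in [("planned", 0.050), ("observed", 0.035)]:
        n = sample_size_per_variant(baseline, relative_mde=0.10)
        print(f"{label}: baseline={baseline:.1%}, n/variant={n}, "
              f"days={expected_days(n, daily_users)}")
```

Under these assumptions, a drop in baseline from 5% to 3.5% raises the requirement from roughly 31,000 to roughly 45,000 users per variant for the same 10% relative MDE, so a test sized for the planned baseline would end up underpowered.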
Data Loss and Differential Tracking
Missing data is dangerous when loss is asymmetric between variants.
Example risk:
- A heavier treatment page causes more tracking failures, artificially making treatment look worse
Action:
- Cross-check client-side and server-side counts
- Monitor funnel step consistency by variant
- Exclude affected windows or rerun experiment
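The client-side versus server-side cross-check can be sketched by counting the same event from both sources and comparing the loss rate per variant. This minimal sketch assumes server-side counts are the more complete source (for example, logged page requests) and treats the shortfall in client-side counts as tracking loss; the counts and the 1-point tolerance are hypothetical. What matters is the gap between variants, not the absolute loss.

```python
def tracking_loss(client_counts, server_counts):
    """Per-variant share of server-side events that never reached client-side tracking."""
    return {v: 1 - client_counts[v] / server_counts[v] for v in server_counts}


def differential_loss(client_counts, server_counts):
    """Spread between the worst and best per-variant loss rates."""
    loss = tracking_loss(client_counts, server_counts)
    return max(loss.values()) - min(loss.values())


if __name__ == "__main__":
    # Hypothetical counts of the same event (e.g. page views) from two sources.
    server = {"control": 10_000, "treatment": 10_000}
    client = {"control": 9_800, "treatment": 9_200}
    for variant, rate in tracking_loss(client, server).items():
        print(f"{variant}: {rate:.1%} of events lost")
    gap = differential_loss(client, server)
    if gap > 0.01:  # hypothetical tolerance of 1 percentage point
        print(f"asymmetric tracking loss ({gap:.1%}): treatment comparison is suspect")
```

Here control loses 2% of events and treatment loses 8%, so any metric built on client-side events would penalize treatment for reasons unrelated to user behavior.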
Data Quality Gate Before Analysis
Only interpret effect size after these checks pass:
- Assignment integrity (including SRM)
- Tracking completeness
- Baseline sanity
- Day-level stability review
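One way to enforce this gate is to make effect-size reporting conditional on an explicit list of failed checks. The sketch below is hypothetical: it takes the outputs of the four checks listed above as plain values, and every threshold is an assumed placeholder to tune for your own setup.

```python
SRM_ALPHA = 0.001           # hypothetical thresholds; tune to your own setup
MAX_LOSS_GAP = 0.01         # max difference in tracking loss between variants
MAX_BASELINE_DRIFT = 0.20   # max relative difference from the historical rate


def quality_gate(*, srm_p_value, loss_gap, baseline_observed, baseline_expected,
                 conclusion_flips_without_one_day):
    """Return the list of failed checks; an empty list means the gate passes."""
    failures = []
    if srm_p_value < SRM_ALPHA:
        failures.append("assignment integrity: possible sample ratio mismatch")
    if loss_gap > MAX_LOSS_GAP:
        failures.append("tracking completeness: asymmetric data loss between variants")
    drift = abs(baseline_observed - baseline_expected) / baseline_expected
    if drift > MAX_BASELINE_DRIFT:
        failures.append("baseline sanity: control rate far from historical rate")
    if conclusion_flips_without_one_day:
        failures.append("day-level stability: conclusion depends on a single day")
    return failures


if __name__ == "__main__":
    failures = quality_gate(srm_p_value=0.42, loss_gap=0.004,
                            baseline_observed=0.048, baseline_expected=0.050,
                            conclusion_flips_without_one_day=False)
    if failures:
        print("Do not interpret the effect size:", *failures, sep="\n- ")
    else:
        print("Gate passed: effect size can be interpreted.")
```

Returning the full list of failures, rather than stopping at the first one, makes it easier to see when several problems share a single root cause, such as a redirect bug that produces both SRM and asymmetric loss.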
Key Takeaways
- Trustworthiness comes before significance
- SRM and differential loss can invalidate the entire test
- Sensitivity analysis helps detect fragile conclusions
In Part 5, we continue with advanced pitfalls: peeking, underpowered inference, and “too good to be true” results.