Power and Sample Size

Many teams run experiments that cannot detect meaningful effects. This is a planning failure, not an analysis failure.

What Is Statistical Power?

Power is the probability of detecting a real effect of a specific size:

\[\text{Power} = 1 - \beta\]

Here \(\beta\) is the false-negative (Type II error) rate. Typical targets are 80% or 90% power.

Low power means you can easily miss wins and wrongly conclude “no effect”.
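To make the definition concrete, here is a minimal sketch of how power could be approximated for a conversion-rate test. It assumes a two-sided two-proportion z-test with a normal approximation and equal group sizes. The function name `power_two_proportions` and the example rates (10% versus 11%) are illustrative, not taken from the text.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_variant, n_per_variant, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Assumes equal group sizes and a normal approximation (illustrative only).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error of the difference in observed rates.
    se = sqrt(
        p_control * (1 - p_control) / n_per_variant
        + p_variant * (1 - p_variant) / n_per_variant
    )
    # True effect expressed in standard-error units.
    shift = abs(p_variant - p_control) / se
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    for n in (2_000, 10_000, 40_000):
        print(n, round(power_two_proportions(0.10, 0.11, n), 2))
```

With the same true effect, power rises as the per-variant sample grows. When the two rates are equal, the function returns the significance level, which is the false-positive rate rather than power in any useful sense.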

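To calculate the required sample size per variant, specify the baseline rate, the minimum detectable effect (MDE), the significance level, and the target power.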

A smaller MDE requires a much larger sample: the required sample size grows roughly with 1/MDE², so halving the MDE roughly quadruples it.
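The relationship between MDE and sample size can be sketched with the standard normal-approximation formula for comparing two proportions. This is one common approximation, not necessarily the calculator your team uses. The function name, the 10% baseline, and the absolute MDE values are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided test.

    baseline: control conversion rate (e.g. 0.10)
    mde_abs:  minimum detectable effect as an absolute difference (e.g. 0.01)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # quantile for the target power
    p1, p2 = baseline, baseline + mde_abs
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


if __name__ == "__main__":
    for mde in (0.02, 0.01, 0.005):
        print(mde, sample_size_per_variant(0.10, mde))
```

Each halving of the MDE in this sketch multiplies the required sample by roughly four, which is why the MDE is usually the most expensive input to get wrong.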

MDE is the smallest effect worth detecting.

Set MDE based on business value, not wishful thinking.

A good MDE balances decision value and experiment cost.

If the required sample is 20,000 users in total and traffic is 1,000 users/day, run the test for at least 20 days.
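The duration arithmetic above can be written as a small helper. The function name and parameters are hypothetical; it assumes every daily visitor enters the experiment and that traffic is steady.

```python
from math import ceil


def min_run_days(n_per_variant, n_variants, users_per_day):
    """Minimum whole days needed to collect the full sample.

    Assumes all daily traffic is enrolled and split across the variants.
    """
    total_sample = n_per_variant * n_variants
    return ceil(total_sample / users_per_day)


if __name__ == "__main__":
    # 10,000 users per variant, two variants, 1,000 users/day -> 20 days
    print(min_run_days(10_000, 2, 1_000))
```

If only a fraction of traffic is eligible for the test, `users_per_day` should be the eligible traffic, not the site total.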

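Do not stop early just because interim results look promising. Each extra look at the data is another chance for noise to cross the significance threshold, so stopping at the first "significant" result inflates the false-positive rate.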
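A rough simulation can show why early stopping is risky. The sketch below models A/A tests (no true effect) and treats the test statistic as a running sum of standard normal daily increments, which is a simplifying assumption rather than a model of real traffic. The function name, the 20 daily looks, and the seed are illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def false_positive_rates(n_experiments=2000, n_looks=20, alpha=0.05, seed=0):
    """Compare false-positive rates with and without peeking, under no true effect.

    Returns (rate when stopping at the first significant look,
             rate when testing only at the final look).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_experiments):
        cumulative = 0.0
        crossed = False
        z_stat = 0.0
        for look in range(1, n_looks + 1):
            # Under the null, each day's standardized increment is N(0, 1).
            cumulative += rng.gauss(0.0, 1.0)
            z_stat = cumulative / sqrt(look)
            if abs(z_stat) > z_crit:
                crossed = True  # a peeker would have stopped and declared a win
        peeking_hits += crossed
        # Fixed horizon: only the statistic at the final look counts.
        fixed_hits += abs(z_stat) > z_crit
    return peeking_hits / n_experiments, fixed_hits / n_experiments


if __name__ == "__main__":
    peeking, fixed = false_positive_rates()
    print(f"stop at first significant look: {peeking:.2f}")
    print(f"test once at the end:           {fixed:.2f}")
```

In this setup the fixed-horizon rate stays near the nominal significance level, while checking every day and stopping at the first significant result produces a false-positive rate several times higher.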

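Underpowered tests increase the rate of false negatives and the chance that any effect that does reach significance is overestimated.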

They often produce inconclusive outcomes that still consume engineering time and carry an opportunity cost.

In Part 4, we move from theory to common failure modes that invalidate otherwise well-planned tests.