Free tool

How many visitors does your A/B test actually need?

Enter your baseline conversion rate and the improvement you want to detect. Get the exact sample size per variant and estimated test duration.

Test parameters

Enter your baseline rate and the lift you want to detect.


Why pre-calculating sample size matters

Most A/B test errors happen before the test even starts.

Too few visitors - false conclusions

With insufficient sample size, random variation can look like a real winner. You might ship a change that performs no better - or actively worse - than what you had before.

Too many visitors - wasted time

Running a test longer than necessary delays decisions and ties up engineering resources. With a proper pre-calculation, you know exactly when to stop.

No pre-calculation - peeking bias

Stopping a test the moment it hits 95% confidence - without a pre-set sample size - inflates your false positive rate to well above 5%. The math only works if you commit to the sample size in advance.
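To see why, here is a minimal simulation of peeking bias (illustrative code, not this tool's implementation; the function name simulatePeeking and all numbers are assumptions). Both variants share the same true 5% conversion rate, so every "significant" result is a false positive:

```ts
// Both arms have the same true conversion rate, so any declared winner
// is pure noise. Checking after every batch and stopping at the first
// z >= 1.96 inflates the false positive rate far above the nominal 5%.
function simulatePeeking(runs: number): number {
  const p = 0.05, batch = 500, maxPerArm = 10_000; // 20 peeks per run
  let falsePositives = 0;
  for (let r = 0; r < runs; r++) {
    let convA = 0, convB = 0, n = 0;
    while (n < maxPerArm) {
      for (let i = 0; i < batch; i++) {
        if (Math.random() < p) convA++;
        if (Math.random() < p) convB++;
      }
      n += batch;
      const pA = convA / n, pB = convB / n;
      const pooled = (convA + convB) / (2 * n);
      const se = Math.sqrt((2 * pooled * (1 - pooled)) / n);
      if (se > 0 && Math.abs(pA - pB) / se >= 1.96) {
        falsePositives++; // "winner" found by stopping early on noise
        break;
      }
    }
  }
  return falsePositives / runs;
}

console.log(simulatePeeking(2_000)); // typically around 0.2 or more, not 0.05
```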

How to choose your minimum detectable effect

The MDE is the most misunderstood input. Here's a practical guide.

5% MDE (traffic needed: very high)

You need a lot of traffic. Only appropriate for high-volume pages where even small improvements have major revenue impact - like a checkout page doing millions in GMV.

10% MDE (traffic needed: high)

The most common choice for SaaS and e-commerce. Detects meaningful improvements without requiring enormous traffic. A good default if you're unsure.

20% MDE (traffic needed: moderate)

Appropriate when you're testing big changes - a full landing page redesign, a radically different headline. These should move the needle by 20%+ or they're not worth testing.

50%+ MDE (traffic needed: low)

For early-stage products with low traffic. You're only catching large effects, but at least you're not shipping broken ideas. Validate qualitatively with surveys to compensate.

A/B testing best practices

Getting the sample size right is step one. Here's the rest.

Set sample size before you start

Calculate the required visitors per variant before launching the test. Write it down. Don't look at results until you've hit that number - this is the single most important discipline in A/B testing.

Choose MDE based on traffic, not wishful thinking

If your page gets 1,000 monthly visitors, you can't detect a 5% relative improvement in a reasonable time. Be realistic: set an MDE you can actually detect in 2-4 weeks.
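A back-of-envelope calculation makes the point (illustrative numbers - a 2% baseline is assumed; the formula itself is sketched in the FAQ below):

```ts
// 2% baseline, 5% relative MDE, 95% confidence, 80% power.
const p1 = 0.02;
const p2 = 0.021; // 2% plus a 5% relative lift
const pBar = (p1 + p2) / 2;
const a = 1.96 * Math.sqrt(2 * pBar * (1 - pBar));           // confidence term
const b = 0.8416 * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)); // power term
const nPerVariant = Math.ceil((a + b) ** 2 / (p2 - p1) ** 2);
console.log(nPerVariant);                          // ~315,000 per variant
console.log(Math.ceil((2 * nPerVariant) / 1_000)); // ~630 months at 1,000 visitors/month
```

At 1,000 monthly visitors, a 5% MDE test would take over 50 years. Raising the MDE is not a compromise - it's the only way to get an answer at all.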

Run tests for at least a full week

Visitor behavior varies by day of week. A test that runs Thursday-Sunday will capture a different audience mix than Monday-Wednesday. Always run for at least 7 days, ideally 14.

Test one thing at a time

Each additional change you make to the variant adds noise. If the headline, image, and CTA all change, you can't know what caused the result. Test the biggest hypothesized lever first.

Use surveys to form better hypotheses

The tests most likely to produce big results are the ones rooted in specific visitor feedback. Ask visitors why they're not converting - then test the most common answer.

Frequently asked questions

Common questions about A/B test sample sizes, MDE, statistical power, and test duration.

How is the sample size calculated?

This calculator uses the Evans, Peacock & Hastings formula for two-proportion z-tests. It takes your baseline conversion rate, the variant rate implied by your MDE, and the z-scores corresponding to your chosen confidence level and statistical power. The result is the minimum number of visitors per variant needed to reliably detect an effect of that size.
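As a rough illustration, here is that calculation in TypeScript - a sketch of the standard two-proportion sample size formula, not the calculator's actual source code. The function name sampleSizePerVariant is hypothetical, and the default z-scores assume 95% confidence (two-sided) and 80% power:

```ts
// Minimum visitors per variant to detect a relative lift (MDE) over a
// baseline conversion rate, using the standard two-proportion formula.
function sampleSizePerVariant(
  baseline: number,    // e.g. 0.02 for a 2% conversion rate
  relativeMde: number, // e.g. 0.10 for a 10% relative lift
  zAlpha = 1.96,       // 95% confidence, two-sided
  zBeta = 0.8416,      // 80% power
): number {
  const p1 = baseline;
  const p2 = baseline * (1 + relativeMde); // variant rate implied by the MDE
  const pBar = (p1 + p2) / 2;
  const a = zAlpha * Math.sqrt(2 * pBar * (1 - pBar));
  const b = zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2));
  return Math.ceil((a + b) ** 2 / (p2 - p1) ** 2);
}
```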

What is the minimum detectable effect (MDE)?

The MDE is the smallest relative improvement in conversion rate you want your test to be able to detect. It's expressed as a percentage of the baseline. If your baseline is 2% and you set an MDE of 10%, you're asking the test to detect a change from 2% to 2.2%. A smaller MDE means you can detect subtler effects - but requires significantly more data. Choose an MDE based on the traffic you have, not the improvement you hope to see.
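The worked example above, in code (illustrative numbers):

```ts
// A 10% relative MDE on a 2% baseline targets a 2.2% variant rate -
// an absolute difference of just 0.2 percentage points.
const baseline = 0.02;
const mde = 0.10;
const variantRate = baseline * (1 + mde);
const absoluteDelta = variantRate - baseline;
console.log(variantRate.toFixed(3), absoluteDelta.toFixed(3)); // 0.022 0.002
// Plugging these into the formula sketched above gives roughly
// 80,700 visitors per variant at 95% confidence and 80% power.
```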

What's a good MDE to use?

10-20% relative is the most common range. Use 10% if you have high traffic and are optimizing a mature, high-volume page. Use 20% if traffic is moderate and you're testing significant changes. Use 50%+ only when traffic is very low - but at that point, supplement with qualitative research (surveys, user interviews) since A/B tests alone will miss most real effects. Avoid setting MDE below 5% unless you have massive scale - the sample sizes become impractical.

What statistical power should I use?

80% is the standard and the right choice for most tests: it means your test will detect a real effect 80% of the time when one exists (and miss it 20% of the time). Use 90% when missing a real improvement would be costly (for example, if you're testing a major site redesign and a false negative would delay an important rollout). Higher power requires a larger sample size, so there's always a tradeoff.
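To quantify that tradeoff roughly (a sketch, assuming the two variance terms in the two-proportion formula are approximately equal, which holds when the MDE is small):

```ts
// Sample size scales with (z_alpha + z_beta)^2 under this approximation.
const zAlpha = 1.96;    // 95% confidence, two-sided
const zBeta80 = 0.8416; // 80% power
const zBeta90 = 1.2816; // 90% power
const ratio = ((zAlpha + zBeta90) / (zAlpha + zBeta80)) ** 2;
console.log(ratio.toFixed(2)); // ~1.34: 90% power needs ~34% more visitors
```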

How long should an A/B test run?

Long enough to collect the pre-calculated sample size, and at least 7-14 days regardless of when you hit the sample size. The minimum duration matters because visitor behavior has weekly patterns - people visiting on Monday convert differently than people visiting on Saturday in many industries. A test that completes in 2 days will over-represent a specific day's audience. Aim for 2-4 weeks as a practical guideline, and never exceed 6-8 weeks (at which point you're accumulating seasonal noise).
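A simple way to turn a pre-calculated sample size into a duration estimate (a sketch; the helper name estimateDurationDays and the example numbers are assumptions):

```ts
// Days needed to collect the full sample, never stopping before a week.
function estimateDurationDays(
  nPerVariant: number,
  variants: number,
  dailyVisitors: number,
  minDays = 7,
): number {
  const raw = Math.ceil((nPerVariant * variants) / dailyVisitors);
  return Math.max(raw, minDays);
}

// e.g. 80,700 per variant, 2 variants, 5,000 visitors/day -> 33 days
console.log(estimateDurationDays(80_700, 2, 5_000));
```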

What if I don't have enough traffic to run a valid A/B test?

Three options: (1) Increase your MDE - you can only detect larger effects, but you can still run valid tests. (2) Test on a higher-traffic page even if the conversion goal is less direct. (3) Switch to qualitative methods. On-site surveys, user interviews, and session recordings don't require statistical significance and can surface the same insights faster on low-traffic sites. For most startups under 10,000 monthly visitors, qualitative research produces better ROI than A/B testing.

Can I use this for multivariate tests (MVT)?

This calculator is for standard two-variant A/B tests. For multivariate tests, you need to account for the number of combinations being tested - the sample size requirement grows substantially. As a rough guide: multiply this calculator's per-variant number by the number of variant combinations in your MVT. For most teams, true MVT is only practical on very high-traffic pages.
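In code, the rough guide above looks like this (illustrative numbers only - real MVT planning should account for how the traffic is split and which comparisons you care about):

```ts
// Scale the two-variant per-variant sample size by the number of
// combinations. A 2x2 MVT (2 headlines x 2 images) has 4 combinations.
const perVariantAB = 80_700; // from a standard two-variant calculation
const combinations = 2 * 2;
console.log(perVariantAB * combinations); // ~323,000 - impractical for most teams
```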

Why does a lower baseline conversion rate require a larger sample?

Because rare events are noisier relative to their size. If your baseline is 0.5%, the statistical noise around that estimate is proportionally larger than if your baseline is 5%. To reliably distinguish signal from noise at a low baseline rate, you need many more observations. This is why A/B testing is difficult for low-conversion goals like enterprise demo requests - and why qualitative research methods are often more appropriate.
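You can see the effect directly by comparing the relative standard error of a conversion rate estimate at the two baselines (a sketch; the function name relativeSE and the traffic figure are assumptions):

```ts
// Standard error of an observed rate p from n visitors, as a fraction
// of p itself: sqrt(p * (1 - p) / n) / p.
function relativeSE(p: number, n: number): number {
  return Math.sqrt((p * (1 - p)) / n) / p;
}

const n = 10_000; // same traffic in both scenarios
console.log(relativeSE(0.005, n).toFixed(3)); // ~0.141 at a 0.5% baseline
console.log(relativeSE(0.05, n).toFixed(3));  // ~0.044 at a 5% baseline
// Over 3x noisier in relative terms - matching the same relative
// precision at the low baseline takes roughly 10x the observations.
```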

Not enough traffic to A/B test? Ask instead.

On-site surveys get you actionable data from hundreds of visitors - no statistical significance required.

No credit card required
Free to build and preview
14 expert templates included

Free to build - pay only when you go live