Is your A/B test result actually significant?
Enter your control and variant data. Get statistical significance, confidence level, p-value, and a clear verdict on whether to ship.
Control (A)
Your current version - the baseline you're testing against.
Variant (B)
The new version you want to validate.
Enter data for both variants
Statistical significance and uplift will appear here.
How statistical significance is calculated
This calculator uses a two-proportion z-test - the same method used by major A/B testing platforms.
Enter your data
Add visitor and conversion counts for both the control (A) and variant (B) versions.
We run the z-test
The calculator uses a two-proportion z-test to determine whether the difference between A and B is statistically significant or within normal variation.
Read the verdict
You get the confidence level, p-value, relative uplift, and a clear recommendation on whether the result is conclusive.
The formula
p1 = convA / visA (control rate)
p2 = convB / visB (variant rate)
p_pooled = (convA + convB) / (visA + visB)
se = sqrt(p_pooled * (1 - p_pooled) * (1/visA + 1/visB))
z = (p2 - p1) / se
p-value = 2 * (1 - normalCDF(|z|))
confidence = (1 - p-value) * 100
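The formula above translates almost line-for-line into code. This is a minimal Python sketch, not the calculator's actual source; the function name `significance` and the example inputs are illustrative, and the normal CDF is computed from the standard library's error function.

```python
import math

def significance(conv_a: int, vis_a: int, conv_b: int, vis_b: int):
    """Two-proportion z-test, mirroring the formula above."""
    p1 = conv_a / vis_a                                # control rate
    p2 = conv_b / vis_b                                # variant rate
    p_pooled = (conv_a + conv_b) / (vis_a + vis_b)     # pooled conversion rate
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / vis_a + 1 / vis_b))
    z = (p2 - p1) / se
    # standard normal CDF via the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    confidence = (1 - p_value) * 100
    return z, p_value, confidence

# Example: 200/10,000 control conversions vs 250/10,000 variant conversions
z, p, conf = significance(200, 10_000, 250, 10_000)  # z ≈ 2.38, p ≈ 0.017
```

With these inputs the z-score clears the 1.96 threshold, so the 2% → 2.5% difference is significant at 95% confidence.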
How to read your results
What each metric means and when it matters.
Confidence level
The complement of the p-value, expressed as a percentage - a measure of how unlikely a difference this large would be under pure chance. 95% is the industry standard threshold for shipping a winning variant.
P-value
The probability of seeing a difference this large by random chance alone. A p-value below 0.05 means there's less than a 5% chance of seeing this result if there were no real difference - significant at 95% confidence.
Relative uplift
How much the variant's conversion rate differs from the control as a percentage. A variant converting at 2.5% vs a control at 2% is a 25% relative uplift.
Z-score
How many standard deviations the result is from zero. A z-score above 1.96 (in absolute value) means significance at 95% confidence.
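The z-score thresholds above can be collapsed into a simple lookup. This is a hypothetical helper, not part of the calculator; the `verdict` function and its cutoffs (2.576 for 99%, 1.96 for 95%, 1.645 for 90%) are the standard two-sided critical values of the normal distribution.

```python
def verdict(z: float) -> str:
    """Hypothetical helper: turn a z-score into a plain-language reading."""
    if abs(z) >= 2.576:
        return "significant at 99%"
    if abs(z) >= 1.96:
        return "significant at 95%"
    if abs(z) >= 1.645:
        return "significant at 90%"
    return "not significant - keep collecting data"

print(verdict(2.38))  # prints "significant at 95%"
```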
The most expensive A/B testing mistakes
Statistical significance is only valid if the test was run correctly from the start.
Stopping the test too early
The most expensive A/B testing mistake. Random variation early in a test can look like a winner. The significance threshold only means something once you've collected the pre-calculated sample size - not when you hit 80% confidence after 3 days.
Running multiple tests at once on the same page
If you're testing the headline and the CTA button simultaneously, you can't know which change drove the result. Run one test at a time per conversion goal, or use a proper multivariate setup.
Testing without a hypothesis
Testing random ideas produces random results. A strong A/B test starts with a specific hypothesis rooted in user research: "Visitors are confused by the headline" leads to a testable change. Surveying visitors before testing produces better hypotheses.
Frequently asked questions
Common questions about A/B testing, statistical significance, and interpreting results.
What statistical test does this calculator use?
This calculator uses a two-proportion z-test, which is the standard method for A/B testing with binary outcomes (converted / not converted). It calculates the pooled standard error of both proportions, then computes the z-score for the difference. The p-value is derived from the standard normal distribution. This is the same method used by most major A/B testing platforms.
What confidence level should I use for A/B testing?
95% is the industry standard and appropriate for most tests. It means you're accepting a 5% false positive rate - 1 in 20 tests will show a winner that isn't real. For high-stakes, hard-to-reverse changes (redesigning checkout, changing pricing), hold out for 99%. For low-stakes, easily reversible changes (button color, microcopy), 90% can be acceptable if you're time-constrained and the effect size is large. The key: set your threshold before looking at the data, not after.
How do I know when to stop an A/B test?
The correct answer: stop when you've reached the pre-calculated sample size, not when you hit significance. Stopping early because you see 95% confidence after 3 days is called "peeking" and inflates false positive rates significantly. Calculate the required sample size before the test using a sample size calculator, and don't look at significance until you've hit that number. The only valid early stopping reasons: clear harm to one group, or a pre-agreed sequential testing protocol.
What does p-value mean in A/B testing?
The p-value is the probability of observing this large a difference between variants if there were actually no real difference (i.e., if the null hypothesis were true). A p-value of 0.05 means a difference this large would occur only 5% of the time if the variants performed identically. It does NOT mean there's a 95% probability the variant is better - that's a common misinterpretation. It also doesn't tell you anything about the size of the effect, only whether it's distinguishable from noise.
What's the difference between relative and absolute uplift?
Absolute uplift is the raw difference: if control converts at 2% and variant at 2.5%, absolute uplift is 0.5 percentage points. Relative uplift is the percentage improvement: that same change is a 25% relative uplift (0.5 / 2.0 = 25%). Be careful with how results are reported - "a 25% improvement" sounds much larger than "0.5 percentage points". Both are accurate descriptions of the same result. Use absolute uplift when communicating revenue impact, relative uplift when comparing tests across pages with different baseline rates.
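The 2% vs 2.5% example above works out like this in code (illustrative values only):

```python
control_rate = 0.02    # control converts at 2%
variant_rate = 0.025   # variant converts at 2.5%

# Absolute uplift: raw difference, in percentage points
absolute_uplift_pp = (variant_rate - control_rate) * 100

# Relative uplift: improvement as a fraction of the control rate
relative_uplift_pct = (variant_rate - control_rate) / control_rate * 100

print(absolute_uplift_pp, relative_uplift_pct)  # 0.5 pp absolute, 25% relative
```

Same data, two very different-sounding numbers - which is exactly why reports should state which one they're quoting.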
How many visitors do I need for an A/B test?
It depends on three things: your baseline conversion rate, the minimum effect size you want to detect, and your desired confidence level and power. A page converting at 5% needs roughly 31,000 visitors per variant to detect a 10% relative improvement at 95% confidence with 80% power - several months of traffic at 10,000 monthly visitors. A page converting at 0.5% needs substantially more. Use a sample size calculator before starting - it'll tell you exactly how many visitors per variant you need.
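The standard two-proportion sample size approximation can be sketched in a few lines. This is not the calculator's code: the function name is made up, and the defaults (z = 1.96 for 95% two-sided confidence, z = 0.84 for 80% power) are conventional choices you'd adjust for stricter tests.

```python
import math

def sample_size_per_variant(baseline: float, relative_lift: float,
                            z_alpha: float = 1.96,  # 95% confidence, two-sided
                            z_beta: float = 0.84    # 80% power
                            ) -> int:
    """Approximate visitors needed per variant (hypothetical helper,
    standard two-proportion approximation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)      # rate you hope to detect
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# 5% baseline, detect a 10% relative lift
n = sample_size_per_variant(0.05, 0.10)  # ≈ 31,200 per variant
```

Halve the baseline rate or the detectable lift and the required sample size roughly quadruples - which is why low-conversion pages need so much more traffic.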
Can I run multiple A/B tests at once?
You can, but they should be on different pages or different conversion goals - not on the same page with the same conversion event. Running two tests on the same page simultaneously creates interaction effects: if both win or both lose, you don't know which change drove it. The exception is multivariate testing (MVT), which specifically accounts for interactions - but MVT requires much larger sample sizes. For most teams: one test per page, per conversion goal, at a time.
What if my A/B test shows no significant difference?
A null result is still a valid result - it means the change you made probably doesn't affect conversion rate in a meaningful way. This is useful information. It rules out that hypothesis and tells you to look elsewhere. The typical mistake is running the same test again hoping for a different result, or calling the test too early when confidence is at 60%. If you've hit your target sample size and there's no significance, the variant probably doesn't matter. Use survey data to find a more impactful hypothesis to test next.