Make Sense of Your A/B Test Results
A/B testing isn't just about launching two versions; it's about knowing with confidence which one is truly better. Statistical tests are the tools that help you separate random chance from a real improvement. This guide will help you choose the right test for your data.
Which Test Should I Use?
Two questions determine the right test for your experiment:
1. What kind of metric are you measuring? A proportion (like a conversion rate), a continuous value (like revenue per user), or a categorical choice?
2. How many variations are you comparing? Two, or three or more?
For two proportions, use the Z-Test for Proportions. For the means of two groups on a continuous metric, use Student's t-Test. For categorical distributions across two or more groups, use the Chi-Squared Test. For the means of three or more groups, use ANOVA.
Statistical Test Details
Z-Test for Proportions
When to use: When you're comparing two proportions, like the conversion rates of two different landing pages. This is one of the most common tests in A/B testing.
Assumptions: Samples are independent, and each group is large enough for the normal approximation to hold; a common rule of thumb is at least 10 conversions and 10 non-conversions in each group.
Example: Conversion Rate
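Here's a minimal sketch of this test in Python with scipy; the visitor and conversion counts are hypothetical, chosen only to illustrate the calculation.

```python
# Two-proportion z-test on hypothetical landing-page data.
import math
from scipy.stats import norm

conv_a, n_a = 200, 5000   # Version A: 200 conversions / 5,000 visitors (4.0%)
conv_b, n_b = 250, 5000   # Version B: 250 conversions / 5,000 visitors (5.0%)

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)              # pooled rate under H0
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

z = (p_b - p_a) / se
p_value = 2 * norm.sf(abs(z))                         # two-sided p-value

print(f"z = {z:.3f}, p = {p_value:.4f}")              # z ≈ 2.412, p ≈ 0.0159
```

With p below 0.05, you would reject the null hypothesis that the two pages convert at the same rate.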
Student's t-Test
When to use: To compare the means of two groups for a continuous metric, like average revenue per user or average session duration. Useful for smaller sample sizes where the population variance is unknown.
Assumptions: Data is normally distributed, samples are independent, and variances between the two groups are approximately equal.
Example: Average Session Time (minutes)
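A minimal sketch using scipy's ttest_ind; the per-user session times below are invented for illustration.

```python
# Two-sample t-test on hypothetical session times (minutes).
from scipy import stats

group_a = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 3.9, 4.4, 5.6, 4.9]
group_b = [5.2, 6.1, 4.8, 6.5, 5.7, 5.9, 4.6, 5.4, 6.3, 5.8]

# equal_var=True matches the equal-variance assumption above; pass
# equal_var=False to get Welch's t-test when the variances clearly differ.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```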
Chi-Squared Test
When to use: To compare the distribution of categorical data between two or more groups. For example, testing if the proportion of users who choose option A, B, or C on a form is different between two website versions.
Assumptions: Data is categorical, observations are independent, and the expected frequency in each cell of the contingency table is at least 5.
Example: Feature Preference
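A minimal sketch using scipy's chi2_contingency; the 2x3 table of choice counts is invented.

```python
# Chi-squared test of independence on a hypothetical contingency table.
from scipy.stats import chi2_contingency

# Rows: website versions; columns: users who chose option A, B, or C.
observed = [
    [120,  90, 40],   # Version 1
    [100, 110, 50],   # Version 2
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.4f}")
# Sanity-check the assumption above: every cell of 'expected' should be >= 5.
```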
ANOVA (Analysis of Variance)
When to use: When comparing the means of three or more groups for a continuous metric. It's like a t-Test for more than two variations (A/B/n testing). ANOVA tells you if there's a significant difference *somewhere* among the groups, but not which specific group is different; a follow-up post-hoc test (such as Tukey's HSD) pinpoints the differing pairs.
Assumptions: Similar to a t-Test: data is normally distributed, samples are independent, and variances are equal across all groups.
Example: Average Purchase Value
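A minimal sketch using scipy's f_oneway; the purchase values for the three variants are invented.

```python
# One-way ANOVA on hypothetical average purchase values for an A/B/C test.
from scipy.stats import f_oneway

variant_a = [23.5, 25.1, 22.8, 24.6, 23.9, 25.4]
variant_b = [26.0, 27.3, 25.2, 26.8, 27.1, 25.9]
variant_c = [24.1, 23.7, 25.0, 24.4, 23.2, 24.8]

f_stat, p_value = f_oneway(variant_a, variant_b, variant_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
# A small p-value only says that some group differs; run a post-hoc test
# (e.g., Tukey's HSD) to find which specific pairs differ.
```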
Key Concepts Glossary
P-value
The p-value represents the probability of observing your data (or something more extreme) if the null hypothesis were true. A small p-value (typically < 0.05) suggests that your observed result is unlikely to be due to random chance alone, allowing you to reject the null hypothesis.
Null Hypothesis (H₀)
The null hypothesis is the default assumption that there is no difference between the groups you are testing. For example, "Version A's conversion rate is the same as Version B's." The goal of a statistical test is to see if you have enough evidence to reject this assumption.
Alternative Hypothesis (H₁)
The alternative hypothesis is what you are trying to prove. It states that there *is* a difference between the groups. For example, "Version B's conversion rate is different from Version A's."
Statistical Significance
A result is statistically significant if the p-value is below a predetermined threshold (the significance level, alpha, usually 0.05). It means the observed difference is unlikely to be a fluke of random sampling. It does NOT automatically mean the difference is large or practically important.