Experiments: Measure the impact of A/B testing
The Experiment Report is a separately priced product. It is currently only offered to those on the Enterprise Plan. See our pricing page for more details.
Overview
The Experiment report analyzes how one variant impacts your metrics versus other variant(s), helping you decide which variant should be rolled out more broadly. To access Experiments, click on the Experiments tab in the navigation panel, or Create New > Experiment.
Building an Experiment Report
Step 1: Select an Experiment
Click ‘New Experiment’ from the Experiment report menu and select your experiment. Any experiment started in the last 30 days is automatically detected and populated in the dropdown. To analyze an experiment that started more than 30 days ago, hard-code the experiment name.
NOTE: Only experiments tracked via exposure events, i.e., $experiment_started, can be analyzed in the Experiment report. Read more on how to track experiments here.
Step 2: Choose the ‘Control’ Variant
Select the ‘Variant’ that represents your control. All other variants are compared to the control, i.e., how much better they perform vs. the control variant.
Step 3: Choose Success Metrics
Choose the primary metrics of success for the experiment. You can either choose from saved Mixpanel metrics or create a new metric using the query panel. You can also add secondary and guardrail metrics as required.
Step 4: Select the Test Duration
Enter either the sample size (the number of users to be exposed to the experiment) or the minimum number of days you want the experiment to run. This determines the test duration. Once the sample size is reached or the days have elapsed, you can conclusively read the experiment results and make a decision.
Step 5: Confirm other Default Configurations
Mixpanel sets the default configurations below. Modify them as needed for the experiment:
- Experiment Model type: Sequential
- Confidence Threshold: 95%
- Experiment Start Date: Date of the first user exposed to the experiment
Reading Experiment Report Results
The Experiments report identifies significant differences between the Control and Variant groups. Every metric has two key attributes:
- p-value: shows whether the variant’s delta impact vs. the control is statistically significant
- lift: the variant’s delta impact on the metric vs. the control
Metric rows in the table are highlighted when a difference is calculated with high confidence, specifically when the result clears the confidence threshold you set during experiment configuration:
- Positive differences, where the variant value is higher than control, are highlighted in green
- Negative differences, where the variant value is lower than control, are highlighted in red
- Statistically insignificant results remain gray
How do you read statistical significance?
The main reason to look at statistical significance (p-value) is to gain confidence in what the results mean for a broader rollout.
Max significance level (p-value) = (1 − CI) / 2, where CI = confidence threshold
For example, with a 95% confidence threshold, max p = (1 − 0.95) / 2 = 0.025.
So, if an experiment’s results show
- p ≤ 0.025: results are statistically significant for this metric, i.e., you can be 95% confident in the lift seen if the change is rolled out to all users.
- p > 0.025: results are not statistically significant for this metric, i.e., you cannot be very confident in the results if the change is rolled out broadly.
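As a minimal sketch of this threshold check (not Mixpanel’s internal code; the function name and defaults here are illustrative):

```javascript
// Minimal sketch: decide significance from a p-value, given the
// confidence threshold configured in Step 5 (default 95%).
function isSignificant(pValue, confidenceThreshold = 0.95) {
  const maxPValue = (1 - confidenceThreshold) / 2; // e.g. 0.025 for 95%
  return pValue <= maxPValue;
}

console.log(isSignificant(0.01)); // true  -> statistically significant
console.log(isSignificant(0.04)); // false -> not significant
```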
How do you read lift?
Lift is the percentage difference between the control and variant(s) metrics.
Lift, mean, and variance are calculated differently based on the type of metric being analyzed. We categorize metrics into 3 types:
- Numeric - any metrics that involve numeric property math (sum, average, etc)
- Binomial - any metric that has a true or false outcome (unique users, funnel conversions, retention)
- Rate - any metric that can be conceptualized as a rate (funnel conversion rate, total events/experiment, etc)
The ‘group rate’ is calculated differently depending on the type of metric.
- For numeric & binomial metrics: the group rate is the metric value normalized by the number of users exposed in that group.
NOTE: Normalizing by the number of users exposed helps you understand the possible impact on every user exposed to the experiment.
- For rate metrics: the group rate is the same as the metric for the users in the group. Example: if calculating a funnel conversion rate, the group rate is the overall conversion rate of the funnel for users in the group.
NOTE: Conversion rates are already normalized, so no further normalization is done.
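To make the arithmetic concrete, here is an illustrative sketch of the group-rate normalization and lift calculation described above (Mixpanel computes these internally; the function names and numbers are hypothetical):

```javascript
// Group rate for numeric & binomial metrics: normalize the metric
// value by the number of users exposed in the group.
function groupRate(metricValue, usersExposed) {
  return metricValue / usersExposed;
}

// Lift: the percentage difference between the variant and control rates.
function lift(variantRate, controlRate) {
  return ((variantRate - controlRate) / controlRate) * 100;
}

// Example: 1,200 conversions from 10,000 exposed control users vs.
// 1,380 conversions from 10,000 exposed variant users.
const control = groupRate(1200, 10000); // 0.12
const variant = groupRate(1380, 10000); // 0.138
console.log(lift(variant, control));    // ~15 -> a ~15% lift over control
```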
When do we say the Experiment is ready to review?
Once the ‘Test Duration’ set up during configuration is complete, we show a banner that says “Experiment is ready to review”.
Test Duration can be either of two options:
- Sample size to be exposed
- Number of days you’d like to run the experiment
NOTE: If you are using the ‘sequential’ testing experiment model type, you can always peek at the results sooner. Learn more about sequential testing here.
Diagnosing experiments further in regular Mixpanel reports
Click ‘Analyze’ on a metric to dive deeper into the results. This will open a normal Mixpanel insights report for the time range being analyzed with the experiment breakdown applied. This allows you to view users, view replays, or apply additional breakdowns to further analyze the results.
You can also add the experiment breakdowns and filters directly in a report via the Experiments tab in the query builder. This lets you do on-the-fly analysis with the experiment groups. Under the hood, the experiment breakdown and filter works the same as the Experiment report.
Looking under the hood - How does the analysis engine work?
The Experiment report’s behavior is powered by borrowed properties.
For every user event, we identify whether the event was performed after the user was exposed to an experiment. If it was, we borrow the variant details from the tracked $experiment_started event to attribute the event to the proper variant.
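Conceptually, the attribution works like the sketch below (a simplified illustration only; the real engine runs inside Mixpanel’s query layer, and all names here are hypothetical):

```javascript
// Attribute an event to a variant by "borrowing" the variant details
// from the user's most recent prior $experiment_started event.
function attributeEvent(event, exposureEvents) {
  const prior = exposureEvents.filter(
    (e) => e.userId === event.userId && e.time <= event.time
  );
  if (prior.length === 0) return null; // event happened before any exposure
  const latest = prior.sort((a, b) => b.time - a.time)[0];
  return { experiment: latest.experiment, variant: latest.variant };
}
```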
FAQs
If a user switches variants mid-experiment, how do we calculate the impact on metrics?
We break the user and their associated behavior into fractional parts for analysis. The initial behavior counts towards the first variant; once the variant changes, the rest of the behavior counts towards the new variant.
If a user is part of multiple experiments, how do we calculate the impact of a single experiment?
We consider the user’s complete behavior for every experiment they are a part of.
We believe this still gives accurate results for a particular experiment because users are randomly allocated, so there should be enough similar users (i.e., users who are part of multiple experiments) across both the control and variants of a particular experiment.
For what time duration do we associate an exposed user’s behavior with an experiment’s metrics?
After experiment exposure, we consider a user’s behavior ‘exposed’ to the experiment for a maximum of 90 days.
Adding Experiments to an Implementation
Mixpanel experiment analysis works based on exposure events. To use the Experiment report, you must send your exposure events in the following format:
Event Name: “$experiment_started”
Event Properties:
- “Experiment name” - the name of the experiment to which the user has been exposed
- “Variant name” - the name of the variant into which the user was bucketed, for that experiment
An example track call would look like this:
```javascript
mixpanel.track('$experiment_started', {'Experiment name': 'Test', 'Variant name': 'v1'})
```
You can specify the event and properties that should be used as the exposure event, experiment name, and variant in your project settings, in the Overview tab under ‘Experiment Event Settings’. This lets you use an experiment event you’re already tracking, for example via a 3rd-party feature-flagging tool. Note: only string properties should be used for the ‘Name’ and ‘Variant’.
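For example, a custom exposure event from a 3rd-party feature-flagging tool might look like the sketch below (the event and property names are hypothetical; you would map them under ‘Experiment Event Settings’):

```javascript
// A custom exposure event; map 'experiment_key' as the experiment 'Name'
// and 'variant_key' as the 'Variant' in Experiment Event Settings.
mixpanel.track('Experiment Viewed', {
  experiment_key: 'new-onboarding', // must be a string property
  variant_key: 'treatment',         // must be a string property
});
```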
When to track an exposure event?
- An exposure event only needs to be sent the first time a user is exposed to an experiment, as long as the user stays in their initially bucketed variant. Exposure events don’t have to be re-sent in subsequent sessions.
- If a user is part of multiple experiments, send a corresponding exposure event for each experiment.
- Send the exposure event only when a user is actually exposed, not at the start of a session. For example, if you want to run an experiment on the payment page of a ride-sharing app, you only care about users who open the app, book a ride, and reach the payment page. Users who only open the app and do other activities shouldn’t be counted in the sample size, so the exposure event should ideally be tracked only once the payment page is reached (see the sketch after this list).
- Send exposure details, not the assignment. For example, you begin an experiment on 1st Aug, and 1M users are ‘assigned’ to the control and variants. You do not want to send an ‘exposure’ event for all these users right away, as they have only been assigned to the experiment. Some users may be exposed on 4th Aug and others on 8th Aug, so track $experiment_started at the moment of exposure for accurate analysis.
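As an illustration of the ride-sharing example above, here is a sketch of tracking exposure at the payment page rather than at assignment time (the experiment name, variant values, and getAssignedVariant helper are hypothetical):

```javascript
// Track the exposure only when the user actually reaches the payment
// page, not when they are assigned to the experiment.
function onPaymentPageViewed(userId) {
  // getAssignedVariant stands in for your feature-flagging tool's lookup.
  const variant = getAssignedVariant(userId, 'Payment Page Test');
  mixpanel.track('$experiment_started', {
    'Experiment name': 'Payment Page Test',
    'Variant name': variant, // e.g. 'control' or 'v1'
  });
}
```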
Experiment Pricing
The Experiment Report is a separately priced product offered to organizations on the Enterprise Plan. Please contact us for more details.
Pricing Unit
Experimentation is priced based on MEUs - Monthly Experiment Users. Only users exposed to an experiment in a month are counted towards this tally.
FAQ
How are MEUs different than MTUs (Monthly Tracked Users)?
MTUs count any user who tracked an event to the project in the calendar month. MEUs are a subset of MTUs: only users who tracked an exposure event (i.e., $experiment_started) in the calendar month.
How can I estimate MEUs?
If you actively run experiments, you can look at the number of monthly users exposed to an experiment. Note that the MEU calculation differs if users are, on average, exposed to 30 or more experiments in a month.
If you are not running experiments yet, below are some rough estimates of MEUs based on the number of MTUs tracked to the project.
MTU bucket | Estimated MEU (% of MTU)
---|---
Small (< 100k) | 50-100%
Medium (100k - 1M) | 40-75%
Large (1M - 10M) | 25-60%
Very large (10M - 100M) | 20-50%
100M+ | 10-25%
Does it matter how many experiments a user is exposed to within the month?
We’ve accounted for an MEU to be exposed to up to 30 experiments per month. If the average number of experiment exposure events per MEU is over 30, MEUs are calculated as the total number of exposure events divided by 30.
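For example, if 100,000 users each track an average of 60 exposure events in a month (6,000,000 exposure events in total), MEUs = 6,000,000 / 30 = 200,000 rather than 100,000.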
What happens if I go over my purchased MEU bucket?
You can continue using the Mixpanel Experiment report, but you will be charged a higher rate for the overage.
Can I analyze experiments prior to the purchase date?
No. You can only analyze experiments starting from your experimentation purchase date, i.e., your experiment’s start date cannot be earlier than the purchase date.
But I am already paying for exposure events in my regular plan. Am I getting double-charged?
If you buy the Experimentation offering, we waive the charge for exposure events in your regular Mixpanel plan. You only get charged for the exposure events via the MEU calculation.
How can I monitor my account’s MEU consumption?
You can see your experiment MEU usage by going to Organization settings > Plan Details & Billing.
References
Experiment Model Types
- Sequential
Allows you to detect lift and conclude experiments quickly, but may fail to reach significance for very small lifts.
When to use? If you’re looking for large-impact changes, like 10%+ lifts.
- Frequentist
Capable of detecting smaller lifts, but requires you to run the experiment for its full duration. You’re discouraged from preemptively making decisions before the test duration is complete.
When to use? If you’re looking for impact as tiny as 1%, i.e., super low lifts.
Experiment metric types
- Primary Metrics: Main goals you’re trying to improve. These are metrics used to determine if the experiment succeeded. Examples: revenue, conversion rates, ARPU.
- Guardrail Metrics: These are other important metrics that you want to ensure haven’t been negatively affected while focusing on the primary metrics. Examples: CSAT, churn rate.
- Secondary Metrics: These provide a deeper understanding of how users interact with your changes, i.e., they help explain the “why” behind changes in the primary metric. Examples: time spent, number of pages visited, or specific user actions.
Post Experiment Analysis Decision
Once the experiment is ready to review, you can choose to ‘End Analysis’. You can then log a decision, visible to all users, based on the experiment outcome:
- Ship Variant (any of the variants): You had a statistically significant result and decided to ship a variant to all users. NOTE: Shipping a variant here only logs the decision; it does not actually roll out the feature flag unless you are using Mixpanel feature flags (in beta today).
- Ship None: You may not have had any statistically significant results, or, even with statistically significant results, the lift is not sufficient to warrant a change in user experience. You decide not to ship the change.
- Defer Decision: You may have a direction you want to go, but need to sync with other stakeholders before confirming. Defer the decision and come back at a later date to log the final decision.
Experiment Management
You can manage all your experiments via the Experiments Home tab. You can customize which columns you’d like to see.