What does it mean to have statistically significant results?  Are insignificant results irrelevant?
To answer these questions, we have to look at what statistical significance actually measures.  Suppose you have two sets of results, A and B.
A: 1624 visitors; 72 conversions; 4.43% conversion rate.  
B: 1609 visitors; 96 conversions; 5.97% conversion rate.
B clearly has a higher conversion rate in this sample.  Statistical significance helps tell you how likely it is that B's advantage is real and will hold up over time, rather than being a product of chance.
Statistical significance estimates what the best possible and the worst possible conversion rates will look like.  If the worst day for B is still better than the average day for A, then you have a statistically significant result.  However, if the worst day for B is lower than the average day for A, then there is a real chance that A will actually have a better conversion rate than B on any given day.
In the example above, B's worst day is still better than A's average day, so the results are significant.  In this case we have a 98% confidence level.
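For readers who want to check the arithmetic, here is a minimal sketch of one common way to get a figure like this. It assumes a one-sided two-proportion z-test with a pooled standard error (a normal approximation), which may not be the exact method behind the 98% above. It also compares an approximate 95% lower bound for B against A's observed rate. The function names are my own and purely illustrative.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_sided_confidence(visitors_a, conv_a, visitors_b, conv_b):
    """Return (z, confidence) that B's true rate exceeds A's.

    Assumption: pooled two-proportion z-test, normal approximation.
    """
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    return z, normal_cdf(z)

def lower_bound(visitors, conversions, z_crit=1.96):
    """Approximate lower end of a 95% interval for a single conversion rate."""
    rate = conversions / visitors
    se = math.sqrt(rate * (1 - rate) / visitors)
    return rate - z_crit * se

# Numbers from the A/B example in this post.
z, confidence = one_sided_confidence(1624, 72, 1609, 96)
b_low = lower_bound(1609, 96)
a_rate = 72 / 1624

print(f"z = {z:.2f}, one-sided confidence = {confidence:.1%}")
print(f"B lower bound = {b_low:.2%}, A rate = {a_rate:.2%}")
```

Under these assumptions z comes out at about 1.96 and the one-sided confidence at roughly 97.5%, which is in line with the 98% quoted. B's lower bound also stays above A's observed rate.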
But what if the confidence level is low?
Statistical significance is only one tool to help you determine whether a change is good or not.  It is important that you also look at volume, time, revenue, and trends.
If the difference in the volume of conversions is significant to your business, then the results are significant.  In our example above, the difference is 24 conversions.  If 24 extra conversions during the time frame of the test is a big number for the company, then the results of the changes should not be ignored.
Time is another important factor.  Too little time means you don't have enough natural variation and randomness to demonstrate confidence in your numbers.  If you reach statistical significance in 1 hour, it's hard to imagine it will hold at the end of the day.  Results like this often mean you should segment your traffic rather than make broad, sweeping changes.
Revenue can also be a game changer.  If A above generated more revenue because it increased cart size, the conversion rate suddenly becomes less important.  I would suggest testing a hybrid of A and B to both increase conversions and cart size.  
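To make the revenue point concrete, here is a small sketch that works out how much larger A's average order would need to be for A to match B on revenue per visitor. The visitor and conversion counts come from the example above. The order values of 70 and 50 are hypothetical, chosen only to illustrate the idea, and the function names are my own.

```python
def revenue_per_visitor(visitors, conversions, avg_order_value):
    """Total revenue divided by the number of visitors."""
    return conversions * avg_order_value / visitors

def break_even_order_ratio(visitors_a, conv_a, visitors_b, conv_b):
    """How many times larger A's average order must be than B's
    for A to match B on revenue per visitor."""
    return (conv_b / visitors_b) / (conv_a / visitors_a)

# Counts from the A/B example in this post.
ratio = break_even_order_ratio(1624, 72, 1609, 96)
print(f"A breaks even when its average order is about {ratio:.2f}x B's")

# Hypothetical average order values, for illustration only (not test data).
rpv_a = revenue_per_visitor(1624, 72, avg_order_value=70.0)
rpv_b = revenue_per_visitor(1609, 96, avg_order_value=50.0)
print(f"Revenue per visitor: A = {rpv_a:.2f}, B = {rpv_b:.2f}")
```

With these counts, A needs an average order roughly 35% larger than B's before it wins on revenue per visitor. In the hypothetical 70 versus 50 case it does, despite the lower conversion rate.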
The last thing you want to pay attention to is the trend.  If B is consistently above A on a daily basis, then you have nothing to worry about.  However, if B has high variability, meaning some days it is very low and some days it is very high, you may want to reconsider.  It can be hard on a company to have an ebb and flow of orders.
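A quick way to look at the trend is to break the test down by day and count how often B beats A, along with how much each rate moves around. The sketch below assumes you can export daily visitor and conversion counts. The daily figures are made up for illustration: they are built to add up to the totals in the example, but they are not real test data. The function name and the dictionary keys are my own.

```python
from statistics import mean, stdev

def daily_trend(daily_a, daily_b):
    """Summarize day-by-day conversion rates for two variants.

    Each argument is a list of (visitors, conversions) tuples, one per day.
    """
    rates_a = [conv / visits for visits, conv in daily_a]
    rates_b = [conv / visits for visits, conv in daily_b]
    days_b_ahead = sum(rb > ra for ra, rb in zip(rates_a, rates_b))
    return {
        "days": len(rates_a),
        "days_b_ahead": days_b_ahead,
        "mean_a": mean(rates_a),
        "mean_b": mean(rates_b),
        "stdev_a": stdev(rates_a),  # day-to-day spread of A's rate
        "stdev_b": stdev(rates_b),  # day-to-day spread of B's rate
    }

# Hypothetical daily (visitors, conversions) pairs, for illustration only.
daily_a = [(230, 10), (240, 11), (225, 10), (235, 10), (232, 11), (228, 10), (234, 10)]
daily_b = [(231, 14), (228, 13), (233, 15), (229, 12), (230, 14), (227, 14), (231, 14)]

summary = daily_trend(daily_a, daily_b)
print(f"B ahead on {summary['days_b_ahead']} of {summary['days']} days")
print(f"Daily spread: A = {summary['stdev_a']:.4f}, B = {summary['stdev_b']:.4f}")
```

In this made-up week B is ahead every day, though its daily rate swings more than A's. A real export could easily look different, which is exactly why the daily view is worth checking.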
Statistical significance is just one part of the story.  It is important to take an overall view of your changes to make sure you are making the right choice.

 
Hi Jared,
I've decided to continue our talk here and not spam that LinkedIn topic :).
I have a question about the revenue indicator. Since it's not normally distributed, how do you evaluate it? Do you just look at the numbers, or do you have some special way to treat it?
P.S. How can I subscribe here? Only through registering?
Hi Ilya,
I agree that revenue can be a little tricky to evaluate. I generally compare the total revenue and the value per visitor generated by each test competitor during the testing period. If there seems to be a significant difference, I take a deeper look into the story. With revenue, I find that the longer the test runs, the more accurate your projections can be.
I'm currently working on a statistical model to take into account many of these other indicators and return a practical significance score.
I have added an email and rss feed link to the right side bar. The rss url is: http://contourthis.blogspot.com/feeds/posts/default?alt=rss
Sincerely,
Jared Smith
Hi Jared,
I found my own way around the revenue problem for upsell tests by switching to a binomial distribution for conversions. It's time-consuming, though, because you need to analyze the conversions and assign a 0/1 value depending on whether it was a better-priced product or not. But as far as I understand, in this case we avoid the distribution issue. What do you think?
As for the time, when I need to be precise, I analyze the data by day. In this case I can switch from the binomial to the normal distribution and not worry about the approximation error of the binomial model. But again, I have to do it by hand. So if, as you wrote, the trend is 'positive' with more or less equal variances, I can be more certain.
And I'm currently one step 'below'. Statistical modeling will be my focus, but only once I finish the algorithm/framework of a CRO program for a new customer. So any posts on this topic are highly appreciated! :)
P.S. Thanks for the RSS, now I'm your reader.