Tuesday, May 14, 2013

Confidence Graphing

The job of an analyst isn't to find stories in the data.  It's a given that they exist.  The real job of an analyst is to communicate those stories in a way that can be understood without elaborate explanations.

One of the biggest problems when presenting data on a graph is eliminating statistical outliers without modifying the data.  If you plot them the same way that you plot all the other points, they will invariably raise eyebrows and require explanation.

I was playing around with a graph this morning and I came up with a way to graph the confidence of my results.



The graph above has two lines.  The blue line is the conversion rate, and the red line is the number of visitors who saw that variation (the variation is page load time in seconds).

The blue line makes it look like 26 is a clear winner, and 18-20 convert twice as well as all other options.  Unfortunately, the red line tells a completely different story.  Effectively, the higher the red line, the more confident we are that the blue line is correct.
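To put a rough number on why visitor count matters, here is a minimal sketch using the standard binomial standard error of a proportion. This is not necessarily how the confidence in this post was measured, and the function name and the sample numbers are hypothetical.

```python
import math


def conversion_standard_error(conversions, visitors):
    """Standard error of an observed conversion rate (binomial approximation)."""
    rate = conversions / visitors
    return math.sqrt(rate * (1 - rate) / visitors)


# Hypothetical numbers: the same 10% conversion rate at two sample sizes.
small_sample = conversion_standard_error(2, 20)
large_sample = conversion_standard_error(200, 2000)

print(f"20 visitors:   10% +/- {small_sample:.1%}")
print(f"2000 visitors: 10% +/- {large_sample:.1%}")
```

With 100 times the visitors, the uncertainty around the same rate shrinks by a factor of 10, which is why a spike on the blue line backed by a handful of visitors deserves little trust.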

It takes a well-trained analyst to ignore the outliers in a graph like this.



My final result is a bar graph where each bar is given an opacity that corresponds to our confidence level. The darker the bar, the more confident we are that it is correct.  The outliers are still there (and going off the chart), but at 1% opacity, they don't warrant any attention.
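A minimal sketch of the idea, using only the Python standard library to write the bars as SVG. The post doesn't say how confidence was turned into opacity, so this assumes opacity is simply each variation's visitor count divided by the largest visitor count, with a 1% floor. The function names and the sample data are hypothetical.

```python
def bar_opacities(visitors, floor=0.01):
    """Map visitor counts to opacities in [floor, 1.0].

    Assumption: confidence is proportional to sample size, so the
    variation with the most visitors is fully opaque.
    """
    peak = max(visitors)
    return [max(floor, n / peak) for n in visitors]


def confidence_bars_svg(rates, visitors, bar_width=20, chart_height=100):
    """Return an SVG bar chart where each bar's opacity reflects confidence."""
    opacities = bar_opacities(visitors)
    tallest = max(rates)  # assumes at least one non-zero rate
    rects = []
    for i, (rate, alpha) in enumerate(zip(rates, opacities)):
        bar_height = chart_height * rate / tallest
        rects.append(
            f'<rect x="{i * bar_width}" y="{chart_height - bar_height:.1f}" '
            f'width="{bar_width - 2}" height="{bar_height:.1f}" '
            f'fill="blue" fill-opacity="{alpha:.2f}"/>'
        )
    total_width = bar_width * len(rates)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{total_width}" height="{chart_height}">'
        + "".join(rects)
        + "</svg>"
    )


# Hypothetical data: the last two variations are outliers with few visitors.
rates = [0.020, 0.021, 0.019, 0.040, 0.080]
visitors = [5000, 4200, 3100, 40, 3]
svg = confidence_bars_svg(rates, visitors)
```

Saving `svg` to a file and opening it in a browser shows the tall outlier bars almost invisible, while the well-sampled bars stay dark.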