May 14th, 2012
Last week, I attended a conference presentation where a team presented findings from their A/B testing efforts. It was a cute presentation: they showed the control and test variants, then asked the audience to pick which one “won” the A/B test. They compared the audience’s answer to the variant that showed the greatest increase in conversion rate (sometimes as little as 0.9%, which the presenters declared a “huge increase”). For dramatic effect, the winning variant often broke many commonly accepted design principles, supporting their case that A/B testing trumps our traditional rules of good design.
Maybe so. Yet, that wasn’t the part that interested me the most. What interested me was something completely different.
A few months ago, I had conducted usability tests on the online service produced by the presenters’ company — the very one they showed in their presentation. Our study looked at several sites, asking shoppers to buy the products and services offered, looking for what each site did well and where its weaknesses were. It was really insightful on many different levels.
In our study, we watched more than a dozen of the presenters’ company’s own customers attempt to buy products. While many were successful, a surprising number weren’t, even though this company is the biggest in its industry (and hailed by many as the most successful). Their site looks slick, but when folks sat down to use it for its primary goal, its design put up a ton of frustrating obstacles.
In many cases, users thought they had ordered the product they wanted, only to discover upon receipt that it wasn’t at all what they wanted. As we watched those shoppers place their orders, we could see that they would not get what they wanted. Yet the design of the site was so convoluted and confusing that the shoppers never detected the problem until it was too late.
That’s why, when I was sitting in the conference presentation, I was quite surprised by what the presenters showed. None of their examples were things that made the shopping experience difficult. They were not things that were giving their customers problems. They were little things that, even if dramatically improved, wouldn’t, in my opinion, affect the overall experience of the site or the satisfaction of customers buying their products.
I get that the presenters probably didn’t want to reveal their secret sauce to an audience of a hundred or so user experience professionals who might (like me) be working for their competitors. Maybe they were also working on the problems I saw in our study, but didn’t want to talk about that publicly.
However, the A/B tests they presented showed they were putting a ton of effort into optimizing things that weren’t close to what we saw preventing sales on their site. If the message was that A/B testing helps, I didn’t get it, because I saw them futzing around with insignificant button text while huge deficiencies in the design remained unresolved.
Maybe the presenters’ team is working on the things that are wrecking their shopping experience? Maybe they aren’t using A/B tests for those hard problems? I don’t know.
Yet I’ve seen this before: A/B tests are fun and they easily become part of the dog-and-pony show. They give numbers and a clear, easy-to-see winner. They get execs and stakeholders excited, because “improvement” is easy to spot. But are we doing damage to our mission when we give so much attention to these tests of trivial design changes?
Part of this might be because the presenters were determining their winners by using the wrong measure. Conversion rate has lots of problems as a measure of success, but its big crime is that it focuses purely on the pressing of the purchase button. It doesn’t measure whether the users are happy with that purchase or whether they are delighted with the product they finally received and the way they received it. It’s easy to optimize for conversion while sacrificing a great experience. Conversion ≠ Delight.
We should be careful about how we show off tools like A/B tests. Someone might think we don’t actually care about the users’ experiences, as we optimize for overly simplistic measurements.