Do A/B Tests Focus Us On The Wrong Problems?

Jared Spool

May 14th, 2012

Last week, I attended a conference presentation where a team shared findings from their A/B testing efforts. It was a cute presentation: they showed the control and test variants, then asked the audience to pick which one had “won” the A/B test. They compared the audience’s answer to the variant that delivered the best increase in conversion rate (sometimes as little as 0.9%, which the presenters declared a “huge increase”). For dramatic effect, the winning variant often broke many commonly accepted design principles, supporting their case that A/B testing trumps our traditional rules of good design.
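
A quick aside on that 0.9% figure: whether a lift that small is even distinguishable from statistical noise depends heavily on traffic. Below is a minimal two-proportion z-test sketch. The 10% baseline conversion rate, the absolute 0.9-point lift, and the per-arm sample sizes are hypothetical numbers of my own, not anything the presenters reported.

    # Back-of-the-envelope check: is a 0.9-point conversion lift signal or noise?
    # All figures here are hypothetical and only illustrate the arithmetic.
    import math

    def two_proportion_z(conversions_a, n_a, conversions_b, n_b):
        """Return (z, two-sided p-value) comparing two conversion rates."""
        p_a, p_b = conversions_a / n_a, conversions_b / n_b
        pooled = (conversions_a + conversions_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
        return z, p_value

    # Hypothetical: 10.0% baseline conversion vs. 10.9% in the test variant.
    for n in (5_000, 50_000, 500_000):  # visitors per arm
        z, p = two_proportion_z(int(0.100 * n), n, int(0.109 * n), n)
        print(f"n per arm = {n:>7,}: z = {z:5.2f}, p = {p:.4f}")

With a few thousand visitors per arm, a difference that size doesn’t clear conventional significance; it takes tens of thousands per arm before it does. That doesn’t invalidate the presenters’ results, but it’s worth keeping in mind whenever a 0.9% change gets called “huge.”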

Maybe A/B testing really does trump those rules. Yet that wasn’t the part that interested me the most. What interested me was something completely different.

A few months ago, I had conducted usability tests on the online service produced by the presenters’ company — the very one they showed in their presentation. Our study looked at several sites, asking shoppers to buy the products and services offered, looking for what each site did well and where its weaknesses were. It was really insightful on many different levels.

In our study, we watched more than a dozen of the presenters’ company’s own customers attempt to buy products. While many were successful, a surprising number weren’t, even though this company is the biggest in its industry (and hailed by many as the most successful). Their site looks slick, but when folks sat down to use it for its primary goal, its design put up a ton of frustrating obstacles.

In many cases, users thought they had ordered the product they wanted, only to discover upon receipt that it wasn’t at all what they wanted. As we watched those shoppers place their orders, we could see that they would not get what they wanted. Yet the design of the site was so convoluted and confusing that the shoppers never detected the problem until it was too late.

That’s why, when I was sitting in the conference presentation, I was quite surprised by what the presenters showed. None of the examples were things that made the shopping experience difficult. None were things that were giving their customers problems. They were little things that, even if dramatically improved, wouldn’t, in my opinion, affect the overall experience of the site or the satisfaction of customers buying their products.

I get that the presenters probably didn’t want to reveal their secret sauce to an audience of a hundred or so user experience professionals who might (like me) be working for their competitors. Maybe they were also working on the problems I saw in our study, but didn’t want to talk about that publicly.

However, the A/B tests they presented showed they were applying a ton of effort to optimize things that weren’t anywhere close to what we saw preventing sales on their site. If the message was that A/B testing helps, I didn’t get it, because I saw them futzing around with insignificant button text while huge deficiencies in the design remained unresolved.

Maybe the presenters’ team is working on the things that are wrecking their shopping experience? Maybe they aren’t using A/B tests for those hard problems? I don’t know.

Yet I’ve seen this before: A/B tests are fun and they easily become part of the dog-and-pony show. They give numbers and a clear, easy-to-see winner. They get execs and stakeholders excited, because “improvement” is easy to spot. But are we doing damage to our mission when we give so much attention to these tests of trivial design changes?

Part of this might be because the presenters were determining their winners by using the wrong measure. Conversion rate has lots of problems as a measure of success, but its big crime is that it focuses purely on the pressing of the purchase button. It doesn’t measure whether the users are happy with that purchase or whether they are delighted with the product they finally received and the way they received it. It’s easy to optimize for conversion while sacrificing a great experience. Conversion ≠ Delight.
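
To make that concrete, here is a minimal sketch of the “Conversion ≠ Delight” gap. The metric names and the figures are hypothetical, invented purely for illustration; the point is simply that a variant can win on raw conversion while losing once you net out the purchases customers regret, like the orders we watched shoppers place for products they didn’t actually want.

    # Hypothetical illustration: raw conversion vs. a satisfaction-adjusted rate.
    # "Regretted" orders stand in for purchases that come back as returns or
    # complaints because the shopper didn't get what they thought they ordered.
    def conversion_rate(orders, visitors):
        return orders / visitors

    def satisfied_conversion_rate(orders, regretted_orders, visitors):
        """Conversion net of purchases the customer wishes they hadn't made."""
        return (orders - regretted_orders) / visitors

    visitors = 100_000
    arms = {
        "control": {"orders": 10_000, "regretted_orders": 800},
        "variant": {"orders": 10_900, "regretted_orders": 2_200},  # wins the A/B test
    }

    for name, arm in arms.items():
        raw = conversion_rate(arm["orders"], visitors)
        net = satisfied_conversion_rate(arm["orders"], arm["regretted_orders"], visitors)
        print(f"{name}: conversion = {raw:.1%}, satisfied conversion = {net:.1%}")

    # The variant converts better (10.9% vs. 10.0%) yet leaves fewer happy
    # customers (8.7% vs. 9.2%) once regretted purchases are subtracted.

Whether you can measure “regretted orders” at all is, of course, the hard part; return rates, support contacts, and post-purchase surveys are the usual proxies.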

We should be careful about how we show off tools like A/B tests. Someone might think we don’t actually care about the users’ experiences, as we optimize for overly simplistic measurements.

7 Responses to “Do A/B Tests Focus Us On The Wrong Problems?”

  1. Petar Subotic Says:

    Interviewer: What did you do at your previous job?
    Interviewee: Conversion rate optimization, we had an entire department.
    Interviewer: Thank you, we’ll call! Cough cough.

  2. Andrew Anderson Says:

    You are assuming that there is a correlation between happiness and revenue. Or more importantly, what people complain about and what actually influences their behavior. Psychology shows that there is none.

    I can’t speak for the tests you saw, and would argue that a .9% lift is noise, but any time you assume what matters, or believe that qualitative measures directly correlate to quantitative outcomes, you are showing a massive bias. Being able to take opinion out, from both the tester and the user, allows us to get a much better unbiased view of influence and of truly changing performance.

  3. Alex Genov Says:

    You are making very good points, Jared!

    On the other hand, A/B (quantitative) testing and usability (qualitative) testing are two very different and, in my opinion, complementary tools. It is not about which one to use, but which one to use at what specific time of the dev lifecycle to answer what research questions.

    I agree that mindless A/B testing can be misused, but then again the results of a small-sample usability test can be misused as well. If the two methods are used wisely and used together, then the magic of well-done research happens.

    One last point – a 1% lift for a high-traffic business can translate into tens of millions of dollars and should not be dismissed lightly.

  4. Nathanael Boehm Says:

    Great article, thanks Jared. Having recently gotten back into private-sector work, I’ve found the debate over “Conversion is everything” to be a bit of a battle.

  5. Todd Shelton Says:

    Hi Jared,

    Thoughtful as always, thanks. A/B testing by definition can’t see anything beyond itself, right? Are you suggesting that there is some set of methods which, if applied in the right order, will create better results? What might that look like if one could do whatever they thought was best?

    Thanks,
    Todd

  6. Roger Belveal Says:

    As much as I am a proponent of usability testing in general, including A/B, I’d like to challenge the notion that any kind of testing alone will produce a great product. Somewhere in the mix, somebody has to have a great idea. And it probably involves a leaping hypothesis. Orville and Wilbur didn’t test their way from bicycles to airplanes.

    - roger
    http://www.belveal.net

  7. Craig Sullivan Says:

    The answer is to do both – A/B tests can be driven hugely by traditional Usability Research techniques. The customers provide excellent insight and help drive hypothesis design for split testing.

    Sadly some companies do just tweak with testing but those that have a mixture of UX, agile and testing approaches (in my view) do far better than those reliant on ego or opinion *driving* the testing (oh, the irony).

    So A/B tests and multivariate tests are excellent tools for getting quant data on qual problems – and, if carefully managed, they yield excellent results. Our approach at Belron is to use a wide range of techniques but not to let them be the master – rather, to use them as our servant to strive toward solutions to experience problems.
