August 16th, 2010
Picking the right assumptions to measure against can improve our results, but basing our measures on the wrong assumptions can send us in the wrong direction.
Former UIEer Joshua Porter wrote a blog post recently, Why A/B Testing Isn’t Just About Small Changes, that captured the attention of the twitterati. His point, which is a good one, is that if you just tweak small things, you run into the problem of a local maximum: you think you’ve got the best result when, in fact, there are better ones you can’t see. Josh argues that radical design changes can help you avoid getting stuck at a local maximum unintentionally.
To support his point, he draws on the A/B test results of Luke Stevens’s home page. Luke is writing a book and decided to change his home page, which previously contained his portfolio, to instead let folks sign up for an email notification when the new book is published. Luke tested two designs and was shocked to see a 131% improvement with his alternative design.
Luke Stevens’s first design alternative, with lots of details about the book.
The experiment is simple. He started with a page that had lots of details about the book. Over the test period 655 folks visited the page and, from that, there were 33 “conversions” (people signing up for the book)—a 5.0% conversion rate.
Luke’s second design alternative, with virtually no details about the book.
At the same time, other visitors saw a different page. This one just asked the question “Are you a designer?” and mentioned there would be a new book, but didn’t explain what it was about. You’d think that having fewer details would get fewer conversions, but of the 661 visitors who saw this page, 77 converted—an 11.6% conversion rate. The less-detailed design got Luke more than twice as many conversions.
Luke wrote that his big takeaway was “Engaging visitors through appropriate copy improved sign ups by 100%+”. For Luke, getting more email addresses was an improvement and the second, less detailed design did that quite handsomely.
That got me wondering.
Luke has seemingly made an assumption: Any email address is a good email address. This is a basic approach to marketing—we need a mailing list, let’s collect email addresses. If we get enough email addresses, we’ll get some people who buy from us, and we’ll make money. (This is the entire premise behind the spam industry. Just keep mailing and someone is bound to buy.)
Yet, for small marketers, list quality plays a huge role. So, what if Luke had made a different assumption to measure against? What if he only went after email addresses of people who bought the book?
Now, the book isn’t out yet, so we don’t know the quality of Luke’s list. But let’s make up some numbers and see what happens.
The first design alternative said a little about what the book will contain: how to kick ass with web analytics, A/B testing, usability testing, and advanced CSS & CSS3. Sounds pretty good.
What if fewer people signed up for the email because it didn’t sound that interesting to most of the visitors? From that, we could assume that the people who did sign up were genuinely interested in the book. Let’s say 50% of those folks would likely buy the book when it’s published. From the trial, 50% of the 33 who signed up for the list is roughly 17 people whom we predict might buy the book. If Luke used sales as his conversion (instead of sign ups), that would make it a 2.6% conversion rate.
The second design alternative didn’t say anything about the book, other than the vague statements of “it’s what comes after web standards” and “you’re going to love it.” People are less likely to know what they are signing up for. While more people signed up, I think we can safely assume that more of them won’t be interested in the book once they hear what it’s about. Let’s say that only 10% of those folks would likely buy the book. From that page’s trial, 10% of the 77 who signed up is roughly 8 people whom we predict might buy the book—a 1.2% conversion rate.
So, if our wild-ass predictions are right, the new assumptions suggest that the first design alternative is the clear winner, twice as good as the second—the exact opposite of what Luke’s initial testing showed.
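The back-of-the-envelope arithmetic above can be sketched in a few lines. The visitor and sign-up counts come from Luke’s test; the 50% and 10% purchase rates are, as stated, invented purely for the sake of argument:

```python
import math

def predicted_sales_rate(visitors, signups, purchase_rate):
    """Re-measure a design by predicted sales instead of raw sign-ups."""
    buyers = math.ceil(signups * purchase_rate)  # round fractional people up
    return buyers, buyers / visitors

# First design: detailed page; assume 50% of sign-ups would buy (made up).
buyers_a, rate_a = predicted_sales_rate(655, 33, 0.50)
# Second design: vague page; assume only 10% of sign-ups would buy (made up).
buyers_b, rate_b = predicted_sales_rate(661, 77, 0.10)

print(f"Detailed page: {buyers_a} predicted buyers, {rate_a:.1%} conversion")
print(f"Vague page:    {buyers_b} predicted buyers, {rate_b:.1%} conversion")
```

Under these invented purchase rates, the detailed page wins on predicted sales even though it lost badly on sign-ups—exactly the reversal described above.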
Of course, there’s a lot of fabrication here. We don’t know the quality of either list. We’re only guessing that the quality of one list is better than the other.
(Luke could measure the quality of the list today. He could, for example, send an email to each of the 110 emails he’s collected. A simple email with a choice: “I’m not interested. Please stop emailing me.” or “Send me the coupon.” would do the trick. That would tell us more about the list quality.)
The assumption “any email address is a win” is different from “anyone who buys the book is a win.” Luke chose the former to measure his design success. Had he chosen the latter, it could’ve taken his design in a different direction.
We need to make sure we’re measuring against the right assumptions.