Originally published: May 19, 2008
Last week, a client called looking for advice on their first usability study. The client is a large consumer information site with millions of visitors each month. (A similar site might be a large financial information site, with details of individual stocks, investment strategies, and "celebrity" investor/analysts that people like to follow.)
They are about to redesign their home page and navigation. They have three home page design alternatives and five navigation alternatives, created by an outside firm who didn't do any evaluations of the designs.
To help figure out which design to pick, the team has (finally!) received approval for their first usability testing study. While their site has been around for years, they've never watched visitors use it before now.
Up until now, management has perceived usability testing as a nice-to-have luxury they couldn't afford, primarily because of the time it takes. The team called us because they want to make sure everyone views their very first test as an overwhelming success.
They fought long and hard to get this project approved. If it's a success, it will be easier to approve future studies. If anyone thinks that it didn't help pick the right design, it will be a huge political challenge to convince management to conduct a second project.
When we started our conversation, the first thing the team members asked was how to compare the design alternatives. Ideally, they thought, we'd have each participant try each of the home page designs and each of the navigation designs, then somehow render a decision on which one is "best." After two days of testing, we'd tally up the scores and declare a winner.
Comparing designs is tricky even under the best circumstances. First, you have to assume the alternatives are truly different from each other. If they aren't, all the alternatives may share a core assumption that makes each of them a poor choice.
Assuming the team has done a good job creating the alternatives, the next problem is evaluating them with users. To do this, you'd need to run each alternative through a series of realistic tasks.
Choosing tasks is difficult in any study, but it's more complicated when the team has never really studied their users in the past. They've collected some data from market research and site analytics, but, as we talked to them, it was clear they weren't confident they understood why people came to the site.
If we think the team could come up with realistic tasks, there's still one more big challenge: evaluating all the alternatives. Since they wanted to test new designs, the best thing is to test against a benchmark, which here means the current design.
A minimum study design would have each alternative (along with the current design) going first, to correct for "learning effects." (Learning effects happen in studies where the tasks and design alternatives are similar. How do you know if the second design succeeds because it's better or because the user learned something from the first design?)
For ratings, we wouldn't recommend fewer than four people evaluating each alternative in the first slot. That means, for six alternatives, we're talking a minimum of 24 users.
This presented the problem -- there's no effective way to test all these alternatives with 24 users in their allotted two days, within their budget. We needed to think creatively.
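The participant arithmetic above can be sketched in a few lines of Python. This is only an illustration, not part of the study plan: the design labels and the function names `first_slot_rotations` and `minimum_participants` are hypothetical, and a simple rotation is assumed as the way to give every design a turn in the first slot.

```python
from collections import deque


def first_slot_rotations(designs):
    """Return one presentation order per design, so each design goes first once.

    Assumption: a simple rotation is used. It puts every design in the
    first slot once, which is the only property the article asks for.
    """
    order = deque(designs)
    rotations = []
    for _ in range(len(designs)):
        rotations.append(list(order))
        order.rotate(-1)  # move the current first design to the end
    return rotations


def minimum_participants(num_designs, per_first_slot=4):
    """Minimum participants if each design needs `per_first_slot` people seeing it first."""
    return num_designs * per_first_slot


# Hypothetical labels for the six alternatives mentioned in the article.
designs = ["A", "B", "C", "D", "E", "F"]
rotations = first_slot_rotations(designs)
total = minimum_participants(len(designs))

for order in rotations:
    print(" -> ".join(order))
print("Minimum participants:", total)
```

With six designs and four people per first slot, `total` comes out to 24, the figure that made the two-day schedule unworkable.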
What would happen if we didn't ask users to pick the best alternatives? The team would need to decide on the alternative themselves. Instead of testing all the design alternatives, we suggested focusing on the current design and then using the insights gleaned to inform the decision process. Instead of a study that compared designs, we recommended the following steps:
First, we suggested the team use some of the planning time to build a matrix of the differences between the alternatives. Each line of the matrix would reflect something different between the original design and the alternatives.
The group would then assign a weight -- between one and five -- that would represent how important each difference is to the user's success. They could put in a similar one-to-five number under each alternative, showing how well that variant meets that need. Finally, the team could add up the scores for each alternative to see the "best" one.
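A minimal sketch of such a matrix, in Python, might look like the following. Everything in it is made up for illustration: the difference names, weights, ratings, and alternative labels are hypothetical, not the client's data. It also assumes that "adding up the scores" means summing weight times rating for each alternative, which is one plausible reading rather than a prescribed formula.

```python
# Each row: (difference from the current design, weight 1-5, ratings 1-5 per alternative).
# All values below are invented placeholders for illustration only.
matrix = [
    ("Search box prominence", 5, {"Home A": 4, "Home B": 2, "Home C": 3}),
    ("Number of top-level categories", 3, {"Home A": 2, "Home B": 5, "Home C": 3}),
    ("Featured analyst placement", 2, {"Home A": 3, "Home B": 3, "Home C": 5}),
]


def score_alternatives(rows):
    """Sum weight * rating for every alternative across all rows (assumed scoring rule)."""
    totals = {}
    for _difference, weight, ratings in rows:
        for alternative, rating in ratings.items():
            totals[alternative] = totals.get(alternative, 0) + weight * rating
    return totals


totals = score_alternatives(matrix)
best = max(totals, key=totals.get)

for alternative, score in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{alternative}: {score}")
print("Highest weighted score:", best)
```

Keeping the rows as plain data makes it easy to add lines or adjust weights later, once the team has watched real users.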
We recommended the team recruit both loyal and new users as study participants. The first day of testing should be loyal users of the site and the second day should be new users to the site. The loyal users would help figure out what the important tasks are. The new users will help determine what's important for people new to the site, such as how they figure out the basics.
The Inherent Value Test (http://tinyurl.com/688bhu) finds out what is valuable about a current design from loyal users. Then it helps identify if the design communicates those values effectively to the new users.
For each participant from the loyal user group, the moderator would inquire about their current usage. The moderator will learn why the user comes to the site, what they last tried to use it for, and how that worked out. The moderator would then ask the participant to repeat a recent activity, demonstrating to the team the values that keep them loyal to the site. The team will learn what makes the site great for the loyal users who repeatedly visit.
For each participant from the new user group, the moderator would interview the participant to learn which of the loyal users' tasks they'd most likely do themselves. The moderator would then ask the participant to perform the chosen task, discovering the value of the site as they go. This would help the team learn how well the current design is communicating the site's value.
After each participant uses the current design to perform their tasks, have them spend time with the best of the new alternatives, according to the Weighted Differences Matrix. This would be more of a critique than usage, since the alternative design isn't functional yet.
However, because the user had just used the existing site, they'd be ready to share how they'd do the same tasks with the alternative. The team would learn the user's perspective on the differences between the current design and the best alternative.
If there's time in each session, spend a few minutes performing the same tasks on a competitor's site. This will help the team see where they stand competitively and provide some insights into design directions they hadn't considered. If the participant regularly uses a competitor, we recommended using that site, yielding the extra benefit of discussing the competitor's advantages.
When analyzing the study results, we recommended the team revisit their Weighted Differences Matrix. We suggested they add the participants' tasks and values to the lines of the matrix, incorporate new ratings, and adjust their original weightings accordingly. When done, the matrix will help them decide which design alternatives they wish to pursue.
Teams have to make decisions. The most successful teams make informed decisions.
While it may be counter-intuitive, focusing the study on the current design may be the best approach for this client. Asking each participant to somehow rank each design alternative will take more time and produce confusing results. We felt a study that looks primarily at the current design can give the team the most insight into what alternatives, if any, to choose.
Want to hear more insights from Jared Spool? Come see Jared present New Perspectives in User Experience Design, as well as the UI13 Conference Keynote, at this year's User Interface 13 Conference. You can see more details on the Conference and register now at the UI13 Conference Site.
©1997-2008, User Interface Engineering.