Originally published: Aug 01, 2001
Every once in a while it's a good idea to step back from our own day-to-day work and watch other professionals operate. One instinctively feels that a lot can be learned.
There are a lot of usability handbooks and guidelines out there with seemingly good advice, but should we adopt methods we've never seen in action? How do we learn from usability tests if the details of screening, test protocols, and analysis are not presented along with the test results? Can anyone accurately reproduce a usability test series from the limited descriptions in a typical conference paper? Most importantly, how can we learn from others without dogging their steps from start to finish?
We were excited when we learned that Rolf Molich had completed two studies that truly facilitate this kind of learning. In each study, several usability teams independently tested the same interface. He compares the work of the teams step by step from user screening through task design to testing and reporting. The papers on both studies, called CUE-1 and CUE-2, are on Rolf's web site at www.dialogdesign.dk.
In the CUE-2 study, nine teams set out to usability test the same web site, following what they believed to be the established usability best practices. Rolf describes, reviews, and compares the processes and reports from each of the nine teams in exhaustive detail. His findings are so remarkable that they have changed the way we think about our own work.
The way the study was set up, all of the teams were given the same test scenario and objectives for the same interface. Each team then conducted a study using their organization's standard procedures and techniques. They then compiled reports, which they sent back to Rolf.
Rolf looked at all problems found by each team, a combined total of more than 300 problems. He rigorously evaluated each of the identified problems, finding most of them to be "reasonable and in accordance with generally accepted advice on usable design." So, the good news is that conducting all of these usability tests identified a wealth of usability problems with the interface.
The bad news comes when you compare the findings of each team. Although the teams' definitions of what constituted a usability problem were effectively identical, there wasn't a single problem that every team reported. Even more surprising to us was that eight of the nine teams missed at least 75% of the usability problems! When you look at the total number of unique problems identified, only one team reported more than 25% of these problems.
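To make this kind of comparison concrete, here is a minimal sketch of how one might tally overlap between teams. It is not Rolf's method, and the team names and problem IDs are invented purely for illustration; they are not data from CUE-2. The sketch assumes each team's report has already been reduced to a set of problem identifiers, which is itself a judgment-heavy step.

```python
from typing import Dict, Set


def coverage_by_team(findings: Dict[str, Set[str]]) -> Dict[str, float]:
    """Share of all unique problems (across teams) that each team reported."""
    all_problems: Set[str] = set().union(*findings.values())
    if not all_problems:
        # No problems reported by anyone: every team's coverage is zero.
        return {team: 0.0 for team in findings}
    return {
        team: len(problems) / len(all_problems)
        for team, problems in findings.items()
    }


def reported_by_all(findings: Dict[str, Set[str]]) -> Set[str]:
    """Problems that every single team reported."""
    if not findings:
        return set()
    return set.intersection(*findings.values())


# Hypothetical example data (NOT from the CUE-2 study).
example_findings: Dict[str, Set[str]] = {
    "Team A": {"P1", "P2", "P3"},
    "Team B": {"P3", "P4"},
    "Team C": {"P5", "P6", "P7", "P8"},
}

if __name__ == "__main__":
    for team, share in coverage_by_team(example_findings).items():
        print(f"{team}: {share:.0%} of all unique problems")
    print("Reported by every team:", reported_by_all(example_findings))
```

With real reports, the hard part is deciding when two teams' write-ups describe the same problem; the arithmetic above only becomes meaningful once that matching is done.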
This is alarming. It's the parable of the blind men studying the elephant all over again. Each team grabbed onto a different part and came to different conclusions. Each usability report read like a test of a completely different interface. This is what makes the CUE-2 study so exciting! We can study the differences among the teams' methods and practices and then look at how to improve our own.
The study also raises some central questions for future research on usability testing techniques. How can we construct tests that find the important usability problems as quickly as possible? And how can we improve our practices so different teams will consistently find the same problems? We can find the beginnings of answers to these questions in Rolf's studies. Let's take two examples:
Nine teams created 51 different tasks for the same UI. Rolf found each task to be well designed and valid, but there was scant agreement on which tasks were critical. If each team used the same best practices, then they should have all derived the same tasks from the test scenario. But that isn't what happened.
Instead, there was virtually no overlap. It was surprisingly rare for more than one team to use similar tasks. It was as if each team thought the interface was for a completely different purpose. Comparing the tasks developed by the nine teams offers a valuable lesson in effective task design.
Rolf found that the quality of the reports varied dramatically. The size of the reports varied from 5 pages to 52 pages, a tenfold difference! Some reports lacked positive findings, executive summaries, and screen shots. Others were complete with detailed descriptions of the team's methods and definitions of terminology. By looking through the different reports, we can quickly pick out the attributes that would make our reports more helpful to our clients.
The practices of all of the teams in this study needed review, formalization, and a general tightening up. Since these teams were professional or professionally led, in all probability everyone in the field can benefit from reviewing their own practices. We can use this analysis to hold a mirror up to our own work. This long overdue experiment provides extremely valuable material for sharpening individual usability practices. Rolf has done a great job of opening our eyes to the possibilities for improvement.