Task Success Rate – Is that the right way to judge a usability test?

Jared Spool

July 22nd, 2011

Over at the Boxes and Arrows LinkedIn discussion group, Carrie asked:

What is a good success rate for a usability test task?
We just conducted user testing on a site map. So we have success rate percentages for each task. They range from 9%–51% success (in up to 3 tries). Obviously there are problems. (And no, we didn’t create the site map, which makes me feel good.) But what would be considered a “good” success rate? I want to say over 70% for this test. It is only a site map, no content, which will limit success anyway. Maybe I’m aiming too high?

Thinking in terms of % of completion may not be the right approach. (In fact, I’m hard pressed to come up with a time when it is the right approach.)

You haven’t said anything about who the users are or what the site map information contains. But let’s pretend the users are doctors and nurses and the site map contains the necessary information for them to administer drugs safely. If one of those doctors or nurses doesn’t find the information they need, they could improperly administer a treatment which could kill their patient. What would be an acceptable failure rate under these conditions? I’d say 0% — the system needs to ensure success of every user.

Why is your system any less important? Why would you be willing to tolerate any failures?

The real question isn’t “What is an acceptable level of failure?” The question you really want to ask is “What’s preventing people from succeeding?”

Instead of looking at how many people succeed versus how many fail, what if you were to analyze the failures themselves? Can you rank and categorize all the things that prevent your users from succeeding? Can you assign a classification that helps you understand whether each problem is life-and-death (as in the doctors-and-nurses example above), a problem that will lose customers, a problem that will cost support money, or an annoyance without painful side effects?
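To make that concrete, here is a minimal sketch (in Python, with invented failure categories and session data, not anything from Carrie’s study) of how you might tally and rank the observed failures so the most severe, most frequent problems rise to the top:

```python
from collections import Counter

# Hypothetical severity classes, worst first, mirroring the categories
# above: life-and-death, loses customers, costs support money, annoying.
SEVERITY_ORDER = ["life_and_death", "lost_customer", "support_cost", "annoyance"]

# Invented observations: one record per failure seen in the sessions,
# noting what blocked the participant and how severe the consequence is.
observed_failures = [
    {"task": "find dosage guidelines", "cause": "ambiguous category label",    "severity": "life_and_death"},
    {"task": "locate pricing page",    "cause": "item buried two levels deep", "severity": "lost_customer"},
    {"task": "locate pricing page",    "cause": "ambiguous category label",    "severity": "lost_customer"},
    {"task": "find contact form",      "cause": "unexpected section name",     "severity": "support_cost"},
]

# Count how often each (cause, severity) pair blocked someone.
counts = Counter((f["cause"], f["severity"]) for f in observed_failures)

# Rank by severity first, then by frequency, producing a prioritized
# to-fix list rather than a single completion percentage.
ranked = sorted(counts.items(),
                key=lambda item: (SEVERITY_ORDER.index(item[0][1]), -item[1]))

for (cause, severity), n in ranked:
    print(f"{severity:>15}  x{n}  {cause}")
```

The output is a ranked list of problems and their consequences, which is far more actionable than a single success percentage.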

This will also help you look at the participants you’re recruiting for your study. How similar are they to real users? How realistic are the tasks you’re asking them to complete? If they make a mistake in the site map, how well does the system help them still succeed, for example with guidance for common errors on the content pages themselves (lateral navigation such as “If you’re looking for x, click here”)?

In the end, you really want to understand the problems real users will encounter. That’s the purpose of the studies. Then you want to explore solutions that resolve those problems. In an ideal world, it’s not that you get 100% task completion; it’s that you have addressed and solved all the problems.

The closer your studies map to true in-the-wild user behavior, the more you’ll understand about the problems you’re uncovering and the solutions that will help. Focus on the problems and their resolution, and you’ll get the design to where you’d like it to be.

2 Responses to “Task Success Rate – Is that the right way to judge a usability test?”

  1. Kris Says:

    I think it is. Users are there to do task X; how many of them complete it? That is the measure of its usability. Yes, there are variables, such as participants who are not the target market, but you filter collected data everywhere else, so why wouldn’t you do it here? It leads to more statistically valid outcomes.

    And, as you said, if it’s not 100% task completion, you look at what is causing the failures and eliminate it. Avinash mentions this for his 4Q survey: he says task completion rate is the most important metric, because it is the reason the users are there, to do task X. If it’s not 100%, you ask, figure out what’s stopping them, and eliminate it.

    I agree with you that you should not tolerate failures. However, I also think you should be using the task completion rate as a metric to determine whether or not to tolerate them.

  2. Jacob Says:

    As Kris says, it’s a useful metric, just not the only metric you should consider.

    My experience also shows that even if you had a blank white page with only one button, people would still somehow manage to click in the wrong location, so 100% success over a large sample is more or less impossible to achieve, even for more critical applications.

    However, if your testing found that label A achieved a 90% success rate (and a fast average completion time) and label B only 70% (and a slow completion time), those metrics are clearly valuable. (A rough numeric check of a comparison like this is sketched after the comments.)

    Of course, I’ve also seen cases with a higher success rate and a slower click time, but even then you can still use the information to help you look at what isn’t working correctly. Either way, having metrics is useful; it’s another tool in the toolbox, not the only tool.

    That’s my 2 cents on it anyway. Thanks for the post!
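For readers who want to check a comparison like Jacob’s 90% vs. 70% example numerically, here is a minimal sketch (Python standard library only; the sample sizes are invented, not from the comment) of a two-sided two-proportion z-test. With typical usability-test sample sizes, even a 20-point difference in success rate may not be statistically distinguishable, which is one more reason to treat the rate as one tool among several:

```python
from math import sqrt, erf

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-sided two-proportion z-test; returns (z statistic, p-value)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Normal CDF via erf, doubled for a two-sided test.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Invented numbers: label A succeeds for 18 of 20 participants (90%),
# label B for 14 of 20 (70%).
z, p = two_proportion_z(18, 20, 14, 20)
print(f"z = {z:.2f}, p = {p:.3f}")  # roughly z = 1.58, p = 0.11: not significant at 0.05
```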
