Adam Churchill: Welcome, everyone, to another episode of the SpoolCast. Last month, Noah Iliinsky joined us for a virtual seminar on information design. It was called, "Information Visualization: Letting Data Tell the Story." In the seminar, Noah discussed the types of visualizations in common use today, talked about why and when they're useful, what types to use in different situations, obviously how to think about them, and yeah, there were even some great bad examples, and he explained why they failed.

He's offered to come back and tackle some of the questions we didn't get to address in the seminar. Hey, Noah. Welcome back.
Noah Iliinsky: Hey, Adam. Thanks for having me back.
Adam Churchill: Folks who didn't listen to this particular seminar can get at it in UIE's User Experience Training Library. We've got just shy of 60 recorded seminars from experts just like Noah Iliinsky. Noah, for those who weren't with us last week that couldn't join us for your presentation, can you give us an overview?
Noah Iliinsky: Sure. So, the first thing that we started out talking about was why would you use visualizations, what are they good for? And the short answer to that is we are so good at detecting patterns and seeing things like correlations and trends and outliers to trends, and there's a lot of software deep in our brain that deals with the visual system and we can bring more information into our brain more quickly through the eyes than through any other sense that visualization is a really powerful way to teach, and to learn, and to see what's going on.

And this becomes particularly important when you start talking about data, and we've all got more data than we had before, and now we've got even more, and now we've got even more. And one of the best ways to get a handle on your data, and to learn from it, and to understand it, and to take actionable insight away from it is to visualize it. There's a strong justification all the way from the physiological level up to the business level of why visualizations are powerful and useful, and they can also be fun, as we saw.

I talked a little bit about different styles of visualization, and this is both aesthetic styles and functional styles. So on the functional style side you have visualization for analysis where you don't have a particular story to tell but there's a lot of data available in that visualization and sometimes there's some interaction where people can sift through it on their own and kind of see what correlations and what stories they can find on their own.

The other side of that coin is visualizations designed for presentation where there's a specific message, there's a specific story that that visualization has been crafted to highlight. In that second case when you've got a visualization for presentation, usually a lot more of the noise has been removed. A lot of the other interesting, maybe correlated stuff, maybe has been stripped away so that the key notion, the key correlation or the key story is really highlighted and there's not a lot of distraction going on there.

We talked about visualization for persuasion versus visualization for education. And the difference there is when you're talking about visualization for education you might have, again, knowledge available but not necessarily a political agenda, not necessarily a specific kind of a message you want people to take home.

Visualization for persuasion, this is at the propaganda end of things where there's absolutely a political point to be made or there's absolutely a point of view that is being expressed. This can be different from a straight up visualization for education, which maybe has a story but doesn't have a particular point of view or a particular agenda to be advanced.

And we touched briefly on the difference between infographics and data visualization. Infographics tend to be handcrafted, and when I say that I do mean done on a computer, probably in something like Adobe Illustrator, but where there's not a lot of depth of data, where there tends to be a lot of work done where the artist or the designer is drawing shapes, usually a lot of icons.

These are usually sort of prettier things that look more like a pretty picture, like an illustration, versus data visualization or information visualization where the pixels are put on the screen algorithmically. So the designer might write the software, write the rules about the data, but then you can feed the data to the system in a spreadsheet format, out of a database. And it's very easy to revise, it's very easy to make changes because you can just feed in a fresh batch of data and this gets redrawn.

Both of these can be done very well, both of these can be done very poorly. Both of these can be done very aesthetically, very functionally, or not very functionally and aesthetically, but that's a fundamental divide between stuff where somebody sat down and actually drew each line and one where software is drawing the lines based on instructions that the designer put in place.

So that's the beginning. That was sort of the overview of the state of the world of visualizations and what the different styles available are.

The second half we talked about how to do it, and a lot of the stuff came as no surprise to people who are familiar with the world of user experience.

We talked about how you have to know your own goals and what you want to achieve to be successful. We talked about how once you've decided what you need to achieve, you need to really know who your customers are, and what their goals are, and what their biases are, and what their vocabulary is, and what their technology is, and all these other things so that they can be successful in the consumption of this information product, and how if they are not successful, if you are not able to make your customer successful then you, too, have failed.

Moving into the actual beginning of the practical design process, I talked about how you only want to include the necessary data, that if you include lots more data simply because you can, or because it's fun, or because it's available or somebody might think that was interesting, that extraneous data that's not part of the message, that becomes noise.

And the more targets you have on the screen, the more different icons, the more different pixels, the more different lines, the more stuff that there is on the screen, in the visualization, the longer it's going to take someone to actually find the pieces that are relevant to them.

And this is something that you can put people in the lab and you can study. You can say, "Find me the yellow triangle," and if there's four triangles on the screen, the yellow one is not hard to find. But if there's a bunch of triangles, and a bunch of squares, and a bunch of circles, and a bunch of squiggly lines, and some are purple, and some are yellow, and some are green, and some are red and some are blue, it takes them a lot longer to find the one that they want.

So this translates directly into the success of your visualization. Less noise, easier access to the data.

And then finally, we spent a good portion of that second half talking about formats that are appropriate for the data. And the example that I really like to use for this is the periodic table. The periodic table is called the periodic table because the elements are periodic, and that's why the table is periodic, and that's why the table is called periodic.

And when you're talking about a periodic table of chili peppers, or the periodic table of mixed drinks, or the table of condiments that periodically expire, these are all funny, they're all interesting uses of the format, but they're not informationally valid because these things that I've just mentioned do not recur periodically.

The periodic table is periodic because the elements have these recurring periodic properties and that's a really useful aspect of the periodic table. And most of the time things that are sort of shoehorned into a shape that looks like the periodic table don't belong there because the shape of the data that they have has a different structure, and a much more interesting, and useful and valid way to represent that data is one that's probably not intended to look periodic but is probably more authentic to whatever the underlying shape of that data is.

So we showed some examples of things that were not periodic and were better represented, in one case, for example, as a family tree, as an evolutionary history rather than a chronological history. And then I showed a bunch of other examples of things that either looked good or didn't look good but had chosen the wrong structure, and did a very quick, lightweight redraw of a few of them to show how in the right structure the information was more easily accessible.

So that's the overview, that's what we talked about last week.
Adam Churchill: All right, great. Well, let's get to some of those questions that were left over. Ryan wants to know, "Do you think accuracy should always prevail over maintaining engagement?"
Noah Iliinsky: This is a really interesting question, and for me this goes back to one of the differences we talked about earlier which is the difference of is your visualization intended to educate or is it intended to persuade.

And like it or not, there's always going to be people, presumably even some of the listeners here, presumably even me some of the time, there are always going to be people who want to use their visualization to persuade. And much like any other informational message, if you're willing to bend the truth just a little bit, you can make it much more persuasive.

Now in general, I'm a stickler for the truth and I'm a stickler for accuracy. Part of that's because I was educated as a scientist as an undergraduate and so I believe a lot in the accuracy of the data. So that's one facet of that. I generally would like to show the truth and let the truth speak on its own rather than biasing it for the sake of the message.

The other part, and this is more specifically what Ryan was asking about, is, his question fundamentally is asking, "Is it better to be truthful and boring or is it OK if it's harder to understand and interesting?" And in that question, again, I think the answer is, "If your goal is to convey actual information, you want to make it truthful and maybe a little more boring because that boring is going to be more than made up for by the actual utility that your audience finds in it."

If it's exciting to look at but hard to extract information from, people are going to glance at it and say, "Well that's exciting, but this isn't very useful because it's very difficult for me to extract any knowledge from. And if there's no way for me to extract that knowledge, there's no actual benefit to me beyond just aesthetics," and then they're going to move on with their life.

So, I believe fundamentally that being able to get the message across to people is the most important goal, and I think it's entirely possible to craft that within a visual context that is interesting, and appealing and useful. But if you start out with a purely aesthetic agenda, making the information accessible as an afterthought is usually going to fail.
Adam Churchill: Noah, the National Academy of Sciences wants to know what advice you have for how to build that capacity to generate ideas or how to visualize things.
Noah Iliinsky: As with any other design discipline, the same fundamental recommendations apply, which is you want to expose yourself to a lot of other solutions that are out there in the world, you want to think critically about them and think about how they work well or don't work well, and you want to practice on your own a lot and be intentional with the designs that you make.

I was very thrilled to see that after the seminar last week somebody on Twitter wrote me and said, "It's working already. The first visualization I saw after your seminar I looked at and said, 'That's wrong and I know why.'" And I was so pleased by that because it meant that this person was now engaging in a critical process of evaluating visualizations that she saw after listening to the seminar. So I thought that was fantastic.

So in terms of building the capacity to generate ideas on your own, I think as you see other examples of the visualizations, you should really consider these things that we talked about briefly before, is this well suited to its audience and to its goal and does the visualization suit the shape of the data, and then practice a lot on your own. Draw different things, make different graphs, make different diagrams, show them to people and ask if they understand them.

And then going back to the ideation part, the more you look at other people's solutions, the more you look at solutions in the wild, and there's links to blogs that have a couple of these a day, I mean, I look at probably a dozen new visualizations every day just from reading blogs, the more you see different blogs and other people's ideas, the more you can be inspired. You'll have more ideas for ways that you might be able to apply some of those patterns or some of those notions to your own designs.

And finally, there are some books, not just books on rules of how to do visualization well but there's a fantastic example source book called "Information Graphics" by Robert Harris, and we'll put that online. It's 400 and change pages of just different examples of different graph styles, and it's a good book to just flip through and be reminded of all the different possibilities, all the different ways that information can be visualized.

And then, again, when you've gotten more of these ideas in your brain of how other people have done it, it makes it easier to come up with some new interesting ideas when you've got your own data in front of you and you have to make some answers out of it.
Adam Churchill: Kingsley wants to know your opinion on heat mapping as a technique.
Noah Iliinsky: Heat mapping, I think, has potential to be really valuable because you can look at thousands of data points at a glance, where you have an ocean of normal and a few hot spots that say, "This is out of range," or "There is something changing more quickly here." I really like it.

The other nice thing about heat mapping is that you can include a couple of axes of data pretty easily all at once. So you might have an X and a Y axis, and then values in the grid. You might have a Time axis if you have, for example, you're monitoring computers over time and you have 100 different machines, 100 different channels and their activity level is ticking by on the Time axis, and when a machine gets really busy it turns color or something.

So, heat mapping is great in terms of availability of a lot of knowledge at a glance and the ability to see trends when things are sort of changing over time, if you want to do that. So that's sort of the general heat mapping technique.

I will warn you that a common problem with heat mapping is that color is often used to represent subtle differentiations in value. The great example of this is you look at the weather map and where it's really warm it's red, and where it's really cold it's blue, and there's all different shades in between. And we have a lot of strong cultural conventions about what color means, and that one maps particularly well to temperature and weather.

But a lot of the time there is no strong correlation between what a color is and how people are going to interpret it. You can't assume that people are going to know that green is new and red is old, for example, or that blue is a high value and orange is a low value. We don't have cultural conventions and there's not a built-into-our-brain convention about that.

So I will say that when you're doing heat mapping, the more successful way to do that and the more valid way to do that is instead of varying the color is to vary either the brightness or the saturation, so go from light to dark in one color, or go from highly saturated to sort of washed out in one color.

So you might have, for example, an elevation map of the mountains you'll see this, or depths of the ocean where sea level might be white and then the taller the mountain gets, it gets from pale tan to darker brown, and then it goes down from, again, white at sea level to a very light blue on the beach, and then a very, very deep, dark blue at the depths of the ocean.

And so what you have there is you have one color being varied along one channel for elevation above sea level and another color, again, being varied on that same saturation or darkness channel down into the ocean rather than going through a whole spectrum of red, green, blue, orange, yellow, and people don't have a strong correlation for those color changes as they do for simply a saturation change.
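Editor's note: the single-hue idea Noah describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not anything shown in the seminar: the function names, the hue and saturation values, and the 4,000 m and 8,000 m limits are all made up for the example. It uses only the standard-library `colorsys` module, holds the hue fixed, and varies lightness alone.

```python
import colorsys


def single_hue_ramp(value, hue=0.6, saturation=0.8, lightest=1.0, darkest=0.25):
    """Map a value in [0, 1] to an (r, g, b) tuple of 0-255 integers.

    The hue never changes; only lightness moves from `lightest` to `darkest`.
    """
    v = min(1.0, max(0.0, value))  # clamp out-of-range input
    lightness = lightest + (darkest - lightest) * v
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
    return tuple(round(channel * 255) for channel in (r, g, b))


def elevation_color(meters, max_height=4000.0, max_depth=8000.0):
    """White at sea level, darkening brown above it, darkening blue below it.

    max_height and max_depth are hypothetical limits chosen for this sketch.
    """
    if meters >= 0:
        return single_hue_ramp(meters / max_height, hue=0.08)  # tan to brown
    return single_hue_ramp(-meters / max_depth, hue=0.6)  # light to dark blue


if __name__ == "__main__":
    for m in (3000, 1000, 0, -500, -6000):
        print(m, elevation_color(m))
```

Sea level comes out white, and values get darker within one hue as they move away from it in either direction, so the reader never has to learn what a change of hue means.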
Adam Churchill: Joey wants to know what recommendations you have on pricing for creating these visualizations.
Noah Iliinsky: This is an interesting question to me because this is something that I've been working with lately as I've gone to freelance work.

It's a tricky one because at some level, sometimes the answer can be very simple. The answer can look not very sophisticated and that simplicity is a victory because that simplicity allows for efficiency and easy access to the knowledge that that visualization represents.

The downside of that simplicity is occasionally people will look at it and say "Well, that's not very tricky. Why did that take you hours? Why am I paying hundreds of dollars for that?"

And in response to the question I would say that it is entirely valid to charge for the expertise that it took to come up with the right answer. Now the right answer might be very simple and it may not even have taken very many hours to come up with the right answer or to come up with a strong representation of that right answer.

But if it is the right answer, it's the right answer. And that's worth more money than a fancy, complicated, harder to understand solution.

So I would say stand by your guns, hold the moral ground, charge for the expertise that comes up with the right answer regardless of how long it necessarily took, whether that was a long or a short or a simple or a complicated solution.
Adam Churchill: Striker wants to know what you think about using two Y axes, one on the right, the other on the left.
Noah Iliinsky: This is also a really fascinating question. And there's a lot of situations where I feel like this is a particularly useful kind of a graph because you really want to see these two things correlated strongly.

The problem you run into sometimes is that the Y axes will be in really different scales of numbers, so you might want to plot the federal deficit in trillions of dollars versus unemployment numbers in percentage points over the last 10 presidential administrations or something going back to the 1960s.

And so you get a graph where you're talking about billions and trillions of dollars and you get a graph where you're talking about percentages. But you want to see those correlations, you want to see how the numbers move up and down in relation to each other.

Showing them at the same time is really great because it allows you to have that correlation. You can put the axes on the right and the left and have that be a useful thing. Usually those are differentiated with color and that's not a bad way to do it as long as you've got colors that someone, for example, who is colorblind can differentiate, so they can tell which line goes with which axis because it can be confusing.

The other thing that sometimes you run into is people won't see that there's a right-hand axis. They'll assume that all the lines belong to the left-hand axis and they won't necessarily notice the right-hand axis.

One way around this is that instead of overlaying these two lines together in the same graph, you can use a common X axis and you can have, your first one, let's call it the unemployment one, you can have that with its well-defined, normal Y axis with its labels and its scale on the left column.

And then rather than overlaying the next line of the federal debt on top of that same graph, you build another graph that shares the same X axis, and you just place it a little bit lower on the page, and you have it start at the same place, you know, 1960 is along that left margin, just like the one above it is.

But you have the freedom now where you've got a brand new graph to define a brand new left-hand Y axis that people are going to see as a different axis and read separately.

And the lines are still, you're going to be able to compare them if one is right above the other on the page. You're going to be able to see where the ups and downs correlate to each other. But there is a little bit less potential for confusion where people are going to maybe not notice that right axis or have a little confusion about which data line goes against which axis.

That's something you can do if you have the luxury of the space to put two on a page. If you've only got room to put one on a page, yeah, you can put that second Y value on there. You just want to make sure that that data line really is strongly connected to the axis so that there's less chance of confusion and people will be able to tell what the line they're looking at is and which scales it lines up with, which scales are describing it.
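Editor's note: the stacked-graph alternative comes down to three scales, one shared X scale and one independent Y scale per panel. The sketch below shows that arithmetic in plain Python. The function names, the pixel dimensions, and the numbers are placeholders invented for illustration; they are not real unemployment or deficit statistics and not anything from the seminar.

```python
def linear_scale(domain_min, domain_max, range_min, range_max):
    """Return a function that maps data values linearly onto pixel positions."""
    span = domain_max - domain_min

    def scale(value):
        return range_min + (value - domain_min) / span * (range_max - range_min)

    return scale


def stacked_panels(years, upper, lower, width=600, panel_height=150, gap=30):
    """Lay out two line graphs, one above the other, on a shared X scale.

    Each panel gets its own Y scale, so each can carry its own left-hand axis.
    Pixel Y grows downward, so the Y ranges are flipped.
    """
    x = linear_scale(min(years), max(years), 0, width)
    y_upper = linear_scale(min(upper), max(upper), panel_height, 0)
    top_of_lower = panel_height + gap
    y_lower = linear_scale(min(lower), max(lower),
                           top_of_lower + panel_height, top_of_lower)
    upper_points = [(x(t), y_upper(v)) for t, v in zip(years, upper)]
    lower_points = [(x(t), y_lower(v)) for t, v in zip(years, lower)]
    return upper_points, lower_points


if __name__ == "__main__":
    years = [1960, 1970, 1980, 1990, 2000, 2010]
    percent_series = [5.0, 6.0, 7.0, 5.5, 4.0, 9.0]     # placeholder values
    trillions_series = [0.1, 0.2, 0.4, 0.9, 0.6, 1.3]   # placeholder values
    upper, lower = stacked_panels(years, percent_series, trillions_series)
    for (x1, _), (x2, _) in zip(upper, lower):
        assert x1 == x2  # same year, same horizontal position in both panels
    print(upper)
    print(lower)
```

Because both panels use the same X scale, a peak in one sits directly above the matching year in the other, while the very different units never have to share an axis.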
Adam Churchill: Noah, in the seminar you spoke a bit about and showed some great examples of 3-D pie charts. Related to that, the gang at Platform Computing wants to know what you think about the use of 3-D bar charts.
Noah Iliinsky: Yeah, we talked about both of these just a little bit in the seminar and I'm going to expand on them both because I wouldn't say this is a scourge sweeping the nation, but this is a pervasive problem.

In general, the problem with 3-D is that when you tilt the thing out of the two dimensional plane to give it that three dimensional look, where, we're talking about bar graphs first, you're effectively adding visual surface area to each bar, so that that depth makes the bar look wider effectively.

Now, we're smart and we can read numbers and we know what the numbers mean. But what we're going to walk away with is that picture in our mind of the bar graph that we saw. And what happens when you turn that, is you get the depth and maybe you're looking now across the top of the bar and not just at the front of it. And even if you have bar graphs of accurate height, you've now distorted their representational area. And that distortion makes them less accurate.

The three dimensional can be pretty, but it is removing accuracy from the image. I think a well-crafted two dimensional bar graph always wins over sort of an arbitrary, doing it for dramatic effect, three dimensional bar graph.

There are some situations where that third dimension can be relevant. Those are hard to execute well in ways that you can get the data from them. But there are ways that you can make that third dimension, that depth dimension have some meaning. But normally, I would absolutely advise against it.

I'm going to repeat a comment here that I mentioned in the seminar and was very well received. If you're graphing things in Microsoft Excel, all of the defaults are wrong, all of them. The graph style, three dimensional versus two dimensional, usually the axes, definitely the colors, they're all wrong.

If you want a graphing tool that is using much better defaults, well thought out defaults, go get yourself a copy of Tableau. You can download the free trial or you can use Tableau Public. And there's better defaults there.

Now on the pie graph, you have two problems, and the problem is compounded even more when you take it to three dimensions. To get a pie graph into three dimensions you've got to tip it, so instead of looking straight on at the disk that is the graph, it's now tilted so that some part of it is more towards you and some part is further away from you.

That edge that is coming more towards you is now showing the depth. And for whatever pie wedges happen to be on the portion that is closer to you, that thickness of the depth is going to add a considerable amount of visual surface area to the wedge, simply because the color that's wrapped around the side of that pie graph is going to visually scan like it's part of that wedge, and it's going to make that wedge look larger.

The other effect that you get and somebody wrote in with this comment and they were spot on and you'll see this done as a manipulation of the information. The other thing that happens is the wedge that's closer to you, commonly "our product", you get this visual foreshortening effect.

And you can see this if you pick up a Frisbee or you pick up something that's round, a dinner plate with food on it. And if you tip one edge closer to you, the stuff that's closer looks bigger and the stuff on the far side of that Frisbee or the far side of that plate looks smaller. So you get this visual foreshortening.

And if you cut wedges onto it, the wedge that is closest to you looks huge. It is looming towards you and the wedge on the far side is sort of receding into the distance. And when you take a market share pie graph and you put your product front and center and you tilt it towards you, and all of the other products are tipping away from you back, even if you have accurately represented that data in terms of the graph angles initially, that tilt thing and that foreshortening effect makes whatever is close to you look enormous relative to the stuff on the other side.

And so you can get, again, this visual distortion where you may have started with a really accurate data representation and then you go to enormously un-accurify it. I'm going to make up that word just now. You distort it for the intent to make your thing look better and the thing on the other side of the graph look worse.

And so it's a huge distortion. It's an abuse of the data and I absolutely recommend against that. I can't think of a situation where a three dimensional pie graph is the right answer. Usually pie graphs are the wrong answer, anyway.

And I'm going to jump ahead, we had one more question we were coming up to. The question was, "Instead of using a pie graph, what are the best ways to represent fractions of a 100 percent whole?" And the right answer there usually is a stacked bar graph, where the full height of the bar represents 100 percent.

And you can have the 60 percent portion of it, and then stacked on top of that you have the 36 percent portion of it and then stacked on top of that you have the four percent portion of that. And instead of doing a pie graph you can get that same 100 percent look and representation in a way where it's much easier to compare sizes.
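Editor's note: the 60/36/4 example can be worked through as a short Python sketch of the arithmetic behind a 100 percent stacked bar. The function names and the labels "A", "B", and "C" are hypothetical, and the text rendering is only a stand-in for a real chart.

```python
def stack_segments(shares):
    """Turn (label, value) pairs into (label, start, end) percentages of the whole.

    Each segment starts where the previous one ended, so the stack always
    runs from 0 to 100 regardless of the units of the input values.
    """
    total = sum(value for _, value in shares)
    segments = []
    start = 0.0
    for label, value in shares:
        end = start + 100.0 * value / total
        segments.append((label, start, end))
        start = end
    return segments


def render_bar(segments, width=50):
    """Draw the stack as one line of text, using each label's first character."""
    pieces = []
    for label, start, end in segments:
        # Round the boundaries, not the lengths, so the bar is exactly `width` wide.
        cells = round(end * width / 100) - round(start * width / 100)
        pieces.append(label[0] * cells)
    return "".join(pieces)


if __name__ == "__main__":
    segments = stack_segments([("A", 60), ("B", 36), ("C", 4)])
    for label, start, end in segments:
        print(f"{label}: {start:.0f}% to {end:.0f}%")
    print(render_bar(segments))
```

Every segment is a length along one common axis, which is the comparison Noah says is easier to make than comparing pie-wedge angles.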
Adam Churchill: John wanted to know about your thoughts on redundancy, speaking of color, size, texture. Taking it to the next step, do you always recommend redundancy for all your data visualizations?
Noah Iliinsky: I love redundancy. What John is asking about is when you have, for example, a collection of data points. Let's say you have a line graph and you've got five different lines. And to differentiate these lines you've made them all different colors.

That's great, the colors help differentiate them so people can understand what the different lines are. The redundant encoding is when you use a second visual method that doesn't have its own unique meaning, it just serves to further differentiate the data that's already there.

So those five lines that are five colors you could also give five line styles. So you could have a solid style, a dotted style, a dashed style, one that's got sort of a dot-dash thing like Morse code and then one that's got really far spaced apart dashes or something.

We call that redundant encoding. We say that we are now differentiating those by color and by line style. And you could remove either of those encodings, you could remove the color or you could remove that line style, and leave the other and you could still differentiate those lines.

So this is useful on a variety of levels. It's useful for people who are colorblind if color is one of your encodings. It's useful for things like if you have a lower resolution reproduction. So if someone is going to print this report out and it was high res and beautiful on screen and they're going to print it out on a black and white printer, now you've lost the color but the line styles are still there in black and white. Or if someone is going to shrink it down and those line styles are lost but you can still see the color differences because they've shrunk the image down 25 percent and embedded it in a report.

So there's all those practical reasons. The other reason it is useful is it literally is using more different channels to get into your brain and it makes it that much easier for your brain to differentiate these different lines.

Fundamentally, people have a given amount of brain power that they use to understand a thing that's in front of them. And some fraction of that brain power is going to be dedicated towards the decoding, just understanding what the different symbols mean. And then whatever is left they can use to understand it.

And your job as a designer is to give them as little difficulty in the decoding to allow as much left over brain power to do the understanding with. And so this redundant encoding allows you to make it easier for the brain to understand, people can spend less brain power trying to decode and then have more to understand your message and more to take away and be successful.

So, I do like redundancy. I think it's useful to use it when you can. Now situations when you cannot use redundancy are situations when you need to have those properties reserved to represent entirely different meanings, entirely different vectors of the data.

So you might use your different line styles to represent the different data points. And then you need color for something else. You need color because you're going to show these different lines represent different product lines and then different colors represent your different market regions.

Now the color means something entirely different and you don't have the luxury of reusing it. That's a situation where you're probably fine anyway, but when you have the luxury of left over visual encodings that you're not using, absolutely use redundancy when you can.
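Editor's note: redundant encoding can be expressed as a simple check: give each series two visual properties, then confirm the series are still distinguishable when either one is taken away. The sketch below is an assumption-laden illustration, not a tool mentioned in the seminar. The function names and the style names are made up, and the colors are five entries from the Okabe-Ito palette, which was designed with color-vision deficiencies in mind.

```python
# Five colors from the Okabe-Ito palette.
COLORS = ["#E69F00", "#56B4E9", "#009E73", "#D55E00", "#CC79A7"]
# Style names are descriptive labels for this sketch, not any library's API.
LINE_STYLES = ["solid", "dotted", "dashed", "dash-dot", "long-dash"]


def encode_series(names):
    """Give every series both a unique color and a unique line style."""
    if len(names) > min(len(COLORS), len(LINE_STYLES)):
        raise ValueError("not enough distinct colors and line styles")
    return {
        name: {"color": color, "line_style": style}
        for name, color, style in zip(names, COLORS, LINE_STYLES)
    }


def distinguishable_without(encodings, dropped_channel):
    """True if the series can still be told apart after losing one channel."""
    remaining = [
        tuple(value for key, value in sorted(enc.items()) if key != dropped_channel)
        for enc in encodings.values()
    ]
    return len(set(remaining)) == len(remaining)


if __name__ == "__main__":
    lines = encode_series(["one", "two", "three", "four", "five"])
    print(distinguishable_without(lines, "color"))       # black and white printout
    print(distinguishable_without(lines, "line_style"))  # shrunken image
```

With both channels assigned, the check passes whichever one is dropped. An encoding that relies on color alone fails it as soon as color is removed, which is the black and white printer case.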
Adam Churchill: All right, we've got a couple of quick hitters left. Noah, in your seminar, you just gave our audience a boatload of resources. But there was one we wanted to circle back on. Matt wants to know if you could speak a little bit more about the tool that you use to generate charts from data in HTML and JavaScript.
Noah Iliinsky: Yes, that tool is a tool out of Stanford called Protovis, spelled with an "s" on the end. You can go to protovis.org or just Google for it. It's a JavaScript and HTML5 tool.

It's a programming framework that is basically a lot of scaffolding for drawing graphs and diagrams and all sorts of visualizations. And they've done a really good job of making a tool that is structured enough to get you off the ground quickly and structured enough to make it fairly easy to draw these. But absolutely flexible enough to represent whatever sort of graph or diagram or visualization you want.

And I really love the gallery section on their website because what they've done is they have recreated every single visual graph style that you've ever seen, they have an example of done in their tool.

And of course, it's all JavaScript, so you can just look at the source code, they've got it right there. And you can take their examples.

But some really interesting styles. They've got Minard's map of Napoleon's march to Russia and back. They've got the interesting sort of timeline diagonals of the Tokyo subway system. They've got Florence Nightingale's rose of sort of pie-wedged bar graphs, which is a terrible format, but they've been able to represent it accurately in their tool.

So it's a good tool, it's well-supported, anyone using a modern browser, which is to say, anything that's not Internet Explorer, will be able to consume these. I really like that tool.

The other tool that's very popular these days for doing really interesting data visualizations and data graphics is a programming language called Processing. It's also very well-supported. There's lots of examples out there, there's good books about it, there's a strong community of it.

There's a number of examples in my book that were done in Processing, in my book "Beautiful Visualization." That has a number of contributors. Twenty different chapters by 20 different authors and a number of them are using Processing to do really, really exciting things with their data.

So that's another great resource.
Adam Churchill: The folks at Oracle want to know how to handle a specific situation, that occasional situation when they've got a data driven pie chart that has only one 100 percent slice.
Noah Iliinsky: If you're going to use a pie chart and you want to represent 100 percent, it's that solid disk. I would make sure when you're doing that that you have a numeric label. In fact, in most cases when you use a pie chart, I would make sure that you have numeric legends that show how big each slice is.

But in that case, you plaster a big 100 percent in text right there on the graph and people will know that it is in fact meant to be 100 percent and not empty or malfunctioning or something. It may still be malfunctioning, but that's not your fault at that point.
Adam Churchill: All right, very good. Noah, thanks for circling back with us. And to our audience, thanks for listening in.