Episode 70
Data Visualization with Scott Murray
May 6, 2014
We are collecting more data now than ever, and freely sharing some datasets on the open web. Web technology provides the power for us to present complex data interactively. How can you go about a data visualization project? What tools are available? Scott Murray joins Jen Simmons to explore the possibilities of data visualization.
The good news is there are a ton of tools available, and the bad news is there are a ton of tools available.
Transcript
- Jen
-
This is The Web Ahead, a weekly conversation about changing technologies and the future of the web. I'm your host Jen Simmons, and this is episode number 70. I want to first say thank you to the sponsors of today's show, Media Temple and Squarespace.
So here we are at episode 70. If you're a fan of the show and have been listening to every episode, you'll remember that back in episode 67 I made you a promise that I would bring you Scott Murray today. Hi, Scott.
- Scott
-
Hey, Jen.
- Jen
-
So, in episode 67, in case you missed it, we had Doug Schepers on the show to talk about SVG, a topic that is quite popular. People have been writing in, tweeting, and talking about it quite a lot, so if you have not listened to the SVG show, maybe you want to go listen to it. One of the things you can do with SVG graphics is make amazing data visualizations.
So Scott is on the show today to talk about data visualization, maybe using SVG for it, maybe not; there are other ways you can do it. This is something we've not yet talked about on The Web Ahead. Scott, by the way, is an assistant professor of design at the University of San Francisco and has written a book called Interactive Data Visualization for the Web, which you can read online. And you have been doing quite a lot of this for quite a while. I'm looking at your website, and you have all kinds of amazing data visualizations here.
- Scott
-
Thank you. [Laughs]
- Jen
-
Maps, and flying lines, and dots and—
- Scott
-
Gotta have lines, gotta have dots, gotta make them move. That's really the secret to any visualization: just lots of moving stuff. [Laughs] It's definitely gonna communicate clearly; the more it moves, the better. [Laughs]
- Jen
-
Well great! Thank you for coming on the show today—that's it for—[Laughs]
- Scott
-
Yeah, I guess that's it—that's all people need to know, yep. [Laughs]
- Jen
-
So how did you get into this? What's at the core of your interest in data and visualizations like these?
- Scott
-
I'm a curious person. I think a lot of people who do this kind of work are by nature just really curious. There's the tech side, learning to use a new tool or some new technology, but there's also being curious about the world: how do things work, and why do they work that way? I think that's the fundamental piece of it for me. It's a little weird, or hopefully it's not weird, but I'm interested in the whole spectrum of data visualization, which I define very broadly.
So on one hand, you have strict data visualizations: straight, simple charts and graphs that exist only to communicate data. That's one end of the spectrum. I see the other end of the spectrum as being more on the fine art side, where you now have a lot of generative artists or creative technologists.
There are a bunch of different terms out there, but essentially these are artists using data as source material for creating images, videos, and installations. So you can glance at my projects and see this whole weird spectrum of stuff. [Laughs] Bringing this onto the web, you can't do live installations, but we certainly have lots of opportunities for interactivity and for doing visualization work, whether you're treating it more on the purely aesthetic end of things or on the "more practical or functional" end of things.
- Jen
-
So you've got a bunch of maps here that you've worked on, presenting data about development and population. When I think of data visualization in general, I end up thinking about elections: whenever there's a big election coming, we start to see more presentations of election data, predictions, votes, and which states are which color. But there's something more to it.
To me, this feels like part of the web we've not quite used to its full extent yet, or even apart from the web, the devices themselves. If you hold a modern tablet, an iPad or something of similar quality, you're using your hand to pinch, zoom, and manipulate, and it feels very direct. You could have a globe, a three-dimensional object of some kind, and actually turn it with your hand and manipulate what you're looking at. There's a richness that's possible there. Take some incredibly complicated set of data. Rather than a presentation of conclusions only (where a data scientist or a sociologist or somebody has decided what they want to prove from the data, made a conclusion, and is showing you one flat graph that says, "This is my conclusion; look, here's the proof"), this could instead be, as you said, more about curiosity. There's data here, and clearly there are some conclusions you can draw very quickly, but there's also complexity. And rather than over-simplifying it for you, we're going to give you the complexity and let you play with it: think about disease and populations, or economic growth and jobs, and see how changing one variable in this complicated set of data has certain kinds of impacts, or what you see if you filter it by this or that. Is that part of what you like to get at sometimes?
- Scott
-
I think yes to all of the above.
I think there are 2 main things going on in what you just mentioned. On one hand, there's the intent, or the communication design goal, of any particular visualization. You usually hear about the distinction between exploratory graphics and explanatory graphics, which is an easy way to divide everything into 2 camps.
So on one hand, maybe you're a scientist or a researcher or a journalist. You've collected all this data somehow, and you don't know what the key story is. You haven't decided what your findings are yet. You're just trying to explore it and get a sense of whether there are any patterns or trends you can pull out of that data. That's one particular design intent, and unless you're the one doing the exploring, you don't usually see those kinds of graphics.
In the other camp, you have so-called explanatory graphics. These are the ones where somebody has already done the research and is trying either to share their findings with you or to persuade you of a certain point of view, a particular way of looking at the data. This is more typically what we think of as the role of visualization: you're trying to communicate information. A lot of design is about communication, and part of that is annotation, showing different views, and introducing different ways of sorting and filtering and all that stuff.
To me, that's where the second issue comes in: interactivity. You have your design goal, and then you have whatever interaction opportunities are available to you. On a tablet, obviously, you have touch-based interactions and different gestures you can use. With the mouse-and-keyboard model, the set of gestures you can use is more limited. And now we're getting all kinds of crazy devices and sensors, like depth-sensing devices such as Microsoft Kinect or Leap Motion (a lot of which aren't hooked up to regular computers yet, or not everybody has them).
But I think it's really exciting to think about what new communication opportunities these different interface options give us for visualization. What does it mean if I can wiggle my fingers around in front of the screen and the computer responds? Does that mean I can spin a 3D globe around? And just because I can do that, is it necessarily useful? [Laughs] There's a lot of stuff we can do that looks really cool but isn't actually useful for communication.
- Jen
-
There's a certain complexity here. For people who have access to some set of data and want to do something with it, maybe you can talk them through your design process, or how you teach students or other people to design with a data set. How do you make decisions? What I mean is, it used to be really simple. I remember when it was, "You should make a graph! Which kind of graph do you want to make? Do you want to make a pie chart or a bar graph?" These are the 2...
- Scott
-
Is that Clippy talking? It sounded like Clippy's voice there for a second.
- Jen
-
[Laughs] Yeah, the options were so simple. Now there's not only a whole lot of prebuilt options you can use, there's also the possibility of coming up with something brand new, designing something that hasn't been seen yet. So as a designer with a project, how in the world do you go about figuring out what you're going to do?
- Scott
-
Yeah, that's the magic question, and you mentioned the p-word—process—one of my favorite words. [Laughs]
So, that's the big question to me. It comes down to intent. So what are you designing for, who are you designing for, what are you trying to accomplish? But I think that is the exciting thing about getting into this area right now.
So for anybody who is new to or interested in data visualization, it's more accessible than it's ever been. There are more public datasets and more content, so there's more raw material to work with. There are also more tools than ever to enable this kind of thing. Growing up, the only visualization a regular person would do on the computer was using Excel and the chart wizard to plot out some numbers. And even doing that, I never considered it visualization. I wasn't thinking of it in those terms; I was just making a chart. Now we have Excel, Numbers, OpenOffice, and LibreOffice, and all these tools help you very quickly take a spreadsheet and generate a chart, and that's great. But, as you point out, they don't allow for a ton of customization, or for experimenting with new ways of presenting information beyond what the program can already handle.
So what I think is really exciting about all these tools that let us do visualization on the Web now, is that they're not confined by this template model of "You need to make a bar chart, you need to make a line chart, you need to make this kind of design or this other kind of design." Instead, a lot of these tools are really open, so you can create whatever you want—which is exciting and enabling and powerful.
But at the same time, you need to be super careful and really considerate with your design decisions, because it's very easy to... [Laughs] make stuff that's completely useless. You might understand it, but nobody else does.
There's a hashtag going around, I think it's called #D3brokeandmadeart. People will post things like, "I was trying to make this really complex thing, I had this grand vision, and somewhere along the way I screwed up the code and it made this crazy geometric pattern. Now it's art, because you can't understand what it really means, so I'm just going to call it art." [Laughs] That's what you have to be careful of. If you're trying to make art, that's great, make art. But if you're not trying to make art, then be careful. [Laughs]
- Jen
-
It feels like journalism is eroding into just a bunch of mush. One of the things I've seen going around, sometimes from a mainstream publication (what, you guys have no standards anymore?), is, "Hey, we want to show that there's a trend here, so we're going to make a bar graph of this trend, and look, this thing went way up, very steep." But what they didn't show you is that they chopped off the bottom 80 percent of the bars. It looks like there's a steep change, but they're only showing you the range from 80 to 100. Between 80 and 100 it went up, but if you look at the whole thing from 0 to 100, you realize the slope is pretty shallow. Or people manipulate the data, or the presentation of the data, in a way that's very disingenuous. I mean, they're lying. They're saying, "Hey, there's this big trend," when there is no trend, or "This went up," when it really went down. Not only can you make it really confusing, you can make it into a lie.
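A minimal Python sketch of the truncated-axis effect Jen describes. It assumes two hypothetical values, 85 and 95, drawn as bars on a 0 to 100 scale, and a simple linear mapping from value to bar height; the helper name `bar_height` is illustrative and not taken from any charting library.

```python
def bar_height(value, baseline, top, pixels=100.0):
    """Map a value to a bar height in pixels, linearly between baseline and top."""
    return (value - baseline) / (top - baseline) * pixels

before, after = 85, 95  # hypothetical values on a 0-100 scale

# Full axis: bars start at zero.
full = bar_height(after, 0, 100) / bar_height(before, 0, 100)

# Truncated axis: bars start at 80, so the bottom 80 percent is cut off.
truncated = bar_height(after, 80, 100) / bar_height(before, 80, 100)

print(f"full axis:      second bar is {full:.2f}x the first")
print(f"truncated axis: second bar is {truncated:.2f}x the first")
```

On the full axis the second bar is only about 12 percent taller; with the bottom 80 cut off it is three times as tall, even though the data is identical.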
- Scott
-
Yeah, sometimes it's intentional, and maybe most of the time it's unintentional, and that's one of the core issues around honesty in graphics. You mentioned journalism: if you're trying to tell a specific story or report on something ("This value went up a lot," or "This value went down a lot"), you almost need some value judgement to make the case that it's an important story. The value judgement there is, "What does a lot mean?" This number went up or went down; hopefully nobody disputes that, because we can see the number. But did it go up a lot? [Laughs] Did it go up a lot visually on the page, or did it go up a lot relative to the whole figure?
People have written about all kinds of considerations here, mostly, "What scales are you using for your axes?" And even in the example you threw out, let's say you showed the full vertical range, all the values from 0 up to your highest value: the X axis is super important too. Did you show just the last month's worth of data, or the last 100 years' worth? Depending on how you set that axis, you change the context for understanding how severe the problem is, or how significant the success is.
So if I see a little blip heading upward in my chart, but in the context of 100 years, I'm going to see a lot of blips up and down (that's the technical term, by the way: blips). [Laughs] But if I'm looking at just the last month, and the first half of the month was really flat, and then in the second half the line shoots way up, it seems hugely significant. So scale matters too.
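The time-window point can be sketched the same way. Assuming a chart whose y-axis auto-fits to the minimum and maximum of the visible data (a common default, though not universal), this hypothetical Python example measures how much of the chart's height the same 10-unit jump occupies when only the last 30 points are shown versus the entire history. The series and the function name `share_of_chart_height` are made up for illustration.

```python
# Hypothetical data: a long history of large swings between 0 and 100,
# followed by a recent stretch that sits flat at 50 and then steps up to 60.
history = [0, 100] * 500
recent = [50] * 15 + [60] * 15
series = history + recent

# The jump we care about: the step at the end of the recent stretch.
jump = recent[-1] - recent[0]  # 10 units

def share_of_chart_height(series, window, jump):
    """Fraction of an auto-fitted y-axis that `jump` spans when only the
    last `window` points are visible. Assumes the visible data is not flat."""
    visible = series[-window:]
    return jump / (max(visible) - min(visible))

month = share_of_chart_height(series, 30, jump)
everything = share_of_chart_height(series, len(series), jump)
print(f"last 30 points: jump fills {month:.0%} of the chart height")
print(f"full history:   jump fills {everything:.0%} of the chart height")
```

The same step fills the whole chart in the short window and a tenth of it against the long history.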
The New York Times just today published a new piece that I think is really great. You just said journalism is turning to mush, but I think this is a really good one. [Laughs] It's a piece by Neil Irwin and Kevin Quealy called "How Not to Be Misled by the Jobs Report." I'm not going to be able to describe it super well, but we can share the link with everyone. The visual is really great. Every month the jobs report comes out from the Labor Department, and they say, "We added this many jobs," or "We lost this many jobs." But these numbers aren't precise. There's some statistical noise involved. There's no way to count the actual number of jobs, so it's an estimate. The graphic they show, which I think is really brilliant, is a chart of what the last year would look like if job growth were constant, and next to it a bar chart of what that could look like given the margin of error. When we see these numbers reported, we usually don't see the error bars. It's reported as one number: "We added 100,000 jobs." But really, that 100,000 could represent a number as small as 0 or as high as 200,000, for example. So depending on where you fall within those error bars, you're going to get a really different kind of chart. What I think is fun about this graphic is that on the left you see a bunch of bars that are all exactly the same height. It looks completely flat, so you think, "Job growth was flat." Right next to it is a chart that constantly changes, using random values within the error bars, between the lowest and highest possible values. You look at that, and it looks like a crazy roller coaster. And those are the kinds of numbers we're actually seeing reported. It's interesting.
Whether or not you actually care about jobs or this particular data set, it's an interesting perceptual experiment because you can look at it, and you can say, "These 2 charts essentially mean the same thing, but visually they're painting a completely different picture."
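The comparison Scott describes can be approximated in a few lines. This is a hypothetical Python sketch in the spirit of that graphic, not the Times' actual method: it assumes constant true growth of 100,000 jobs a month and, using the range from Scott's example, treats each reported figure as anywhere from 0 to 200,000, drawn uniformly at random. Real survey error is closer to a bell curve, so uniform sampling is a simplification. The constants and helper names are illustrative.

```python
import random

TRUE_GROWTH = 100_000  # hypothetical constant monthly job growth
MARGIN = 100_000       # hypothetical error: 100,000 could really be 0 to 200,000
MONTHS = 12

def constant_series(months=MONTHS):
    """What the chart looks like if every month is exactly the true value."""
    return [TRUE_GROWTH] * months

def noisy_series(months=MONTHS, seed=None):
    """One plausible set of reported values: each month drawn uniformly
    from within the error bars around the true value."""
    rng = random.Random(seed)
    return [rng.uniform(TRUE_GROWTH - MARGIN, TRUE_GROWTH + MARGIN)
            for _ in range(months)]

def text_bars(values, per_char=10_000):
    """Render each value as a row of '#', one character per 10,000 jobs."""
    return ["#" * round(v / per_char) for v in values]

flat = text_bars(constant_series())
noisy = text_bars(noisy_series(seed=1))
for left, right in zip(flat, noisy):
    print(f"{left:<20} | {right}")
```

Both columns describe the same underlying growth; only the second admits the uncertainty, and it looks like a roller coaster.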
- Jen
-
There's definitely a literacy question here around data: being able to understand data, knowing what a margin of error is. I was a sociology major in undergrad, so one of the classes I had to take was a statistics and surveys course, just 3 or 4 months. We studied how you write a good survey, what the issues are, and how you analyze the information you get back: how do you find statistically significant results, and how do you tell a little blip from something that's actually significant? And it's funny, because at the time I thought, "Why do I need to know this? This is completely unimportant to my life. I'm not going to become a sociologist."
But our world has changed in ways where understanding some really basic stuff around everything I just rattled off ends up being important, because we can be manipulated if we don't ask, "Where did this study come from?" You're telling me that 85 percent of so-and-so's such-and-such, but was that an internet poll where people opted themselves in? How did they learn about the poll? That data is completely biased by the fact that it wasn't an actual study; it was just a random internet poll. These things seem so scientific. It's data! It's foolproof! It's clean! It's unbiased. It's objective. But in reality, [Laughs] both in the way it was gathered and in the way it's being presented, there's a lot of interpretation in there.
- Scott
-
So part of that is the data literacy side. Ideally we would all understand statistics, what a standard deviation is, margins of error. And separate from the analytic side, we'd be critical of the data sources themselves, because these numbers don't come out of nowhere. Like you said, maybe it's from an internet poll, or maybe it's from a legitimate, recognized polling organization. Or maybe it's from a Super PAC, or from my website analytics software, and do I trust my software to work the way it's supposed to? There are all these questions about sourcing the information, and that's just about the data. Then you get into visualizing it, looking at charts or maps of these things, and that's a whole other level.
I like to compare it to photography, from its early days until relatively recently. People trusted photographs. If you had a photograph, it felt like an honest record of something that was actually there. People who don't have a ton of experience with photography may still see it that way, but now we're used to the fact that photographs are manipulated. In fact, any photograph, even the most "honest" photograph you could take, is an impression of reality. It's manipulated, at the very least, by the lens you're using, the angle you chose, and which pixels and what information are included in the composition and what's excluded from it.
I think we're entering an exciting time where visualization is blowing up and becoming more mainstream, because we have these really complex problems and lots of data, and we have to figure out how to understand them, so we need visual ways to understand them. But I think we're still in the early-photography phase, where people trust images far too much and aren't yet always asking, "I see the data you've shown me, but what's outside the composition? What didn't you show me, and why did you choose not to show me that?" Or "Why are you presenting it to me at this particular scale, with these colors or this labeling? What's your motivation, and what are you trying to convince me of?"
So I think visualizations, or any graphics, can be manipulated just like any photograph can. It's just usually more obvious when you manipulate a photograph. [Laughs] But if I tweak some values on a chart or change the scale of how it's presented, it's not obvious at all, yet visually it could make a really significant difference in how you read the story.
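For readers who want the margin-of-error idea in concrete terms, here is a standard textbook approximation sketched in Python: the 95 percent margin of error for a proportion p from a simple random sample of n respondents is about 1.96 × sqrt(p(1 - p)/n). The function name is illustrative. Note the assumption of random sampling: the formula says nothing about the self-selection bias of an opt-in internet poll, which is exactly the sourcing problem raised here.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p with n respondents.
    z = 1.96 corresponds to roughly 95% confidence. Assumes a simple random
    sample; it does not correct for self-selected (opt-in) respondents."""
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(0.5, 1000)
print(f"50% of 1,000 respondents: plus or minus {moe * 100:.1f} points")

# Quadrupling the sample only halves the margin of error.
print(f"50% of 4,000 respondents: plus or minus {margin_of_error(0.5, 4000) * 100:.1f} points")
```

A poll of 1,000 people reporting 50 percent is only good to about plus or minus 3 points, and getting that down requires a much larger sample.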
- Jen
-
It's also interesting that because of the devices people are using these days, we can hand some of these things, like scale, over to the user to manipulate. It doesn't have to be fixed; we could let people play around. Or, not quite what you were talking about before, but like that New York Times graph: with these super-high-fidelity graphics and options to turn things on and off, it's much easier to offer people the margin of error, or the kinds of things that in the past you had to leave out because there wasn't enough fidelity to include them. Now you can include them. Let me jump in here with one of our sponsors.
- Jen
-
So talk to us about some of the tools that are out there for making data visualizations, both designing—going through the process of designing a data visualization and figuring out what you want to do—and putting it on the web and having it be on a website or inside of some kind of web project.
- Scott
-
There's good news and bad news. The good news is there are a ton of tools available, and the bad news is there are a ton of tools available. It's really overwhelming how many things there are out there. So I can just mention a couple that I think are really interesting.
You have a range. On one end are tools that are more like Excel's chart wizard, in the sense that the visual output is more constrained, but they're easier to use and much, much faster if you're trying to create a chart and not trying to invent some new custom app or interactive model. One I really like is called Datawrapper. It was designed originally for journalists, but it's open and accessible to anybody. Think of it almost like YouTube for charts. You go to the website and drop your own data in, and it generates charts, with a certain amount of customization; you can tweak the colors and so on. Datawrapper gives you an image you can drop into your own site, or you can embed the chart on your site or wherever you want. It's really super fast, and the interface is beautifully designed to be very quick, because it's for journalists working on a deadline. It also requires no coding knowledge whatsoever. You bring your own data, and that's it. It's really nice.
On the other end of the spectrum there are tools like D3, which is what my book is about. It's a JavaScript library where you write code from scratch to load your data in and express it visually however you want. D3 is super powerful because it doesn't dictate any particular visual style or design pattern. You can use it to make a simple bar chart, crazy interactive maps, essentially whatever you want. It's the opposite of Datawrapper. You're not constrained to certain visual representations of your data, which is exciting and powerful, but maybe scary if you're just starting out, because you have to go into the process knowing what you want to create.
Those are 2 ends of the spectrum on the web, and there are also a ton of tools you can run locally on your own computer. Tableau is probably the best known: a stand-alone application that lets you drop in data and quickly generate a whole bunch of different views or representations of it, so it's a great exploratory tool. Tableau now also has add-on options (I think it's called Tableau Public) that let you publish those views to the web. But its initial design purpose was to let you explore data locally.
- Jen
-
And then around D3, is there a whole community of people who have written plugins who are sharing them with each other?
- Scott
-
One thing I think is really great about D3 is that there's a huge and growing community of people working with it. There's a D3 Google group that's super active. There's also an official Stack Overflow tag, so if you tag your question d3.js on Stack Overflow, somebody will usually answer it pretty quickly.
One issue, or maybe not an issue but a challenge, is that D3 is written in JavaScript, and people usually use it with SVG. So I'm going to throw out a bunch of letters really quick: it's JS using SVG with HTML and CSS and whatever other acronyms you can throw in there, so debugging a D3 visualization can be a little tricky sometimes.
It's just as tricky as debugging any regular webpage, because it is a webpage. I mention that because we get questions on the D3 list a lot of the time that say things like, "Ummm, y'know, I'm new, I have some data, how do I make this kind of chart?" and that's all we have to work with. People on the list are super helpful and friendly, and I think really supportive and welcoming of new people coming in. But a question like that involves answering 200 other questions first. [Laughs]
- Jen
-
It's like someone showing up and saying, "Hey, I have an idea for a website, how do I make a website?" [Laughs]
- Scott
-
It's like, "I want to make a website—it has to be really cool though—how do I make sure it's cool?"
- Jen
-
It's just too big. [Laughs]
- Scott
-
I know what I think is cool. Or I know what I would do, but it depends on so many things.
Mike Bostock, the primary author of D3, created a service called bl.ocks.org, and it's really nice. It lets you post stand-alone snippets of code and run them as a bl.ocks page. What it's actually doing, if you're familiar with GitHub and git, is using GitHub's service called gists. A gist is just a collection of code files.
So let's say I'm working on a D3 project. Here's an example of a really helpful question to ask the list: "I'm trying to make this bar chart, I'm using this particular data, and I'm getting this error. Can people take a look and tell me what this error is about?" You would take your index.html page, your CSS file, and your data file, and post them all into one gist on GitHub. Any gist automatically shows up on bl.ocks.
I'm using a lot of weird terminology; if you just look at bl.ocks.org, it will make a lot more sense. But it's really nice, because people can post their examples and works in progress on bl.ocks and then post that link to the list. That really helps facilitate the exchange of ideas and communication, because you can quickly see, "This is what this person's working on, and I can debug it in my browser in the normal way, figure out what's going on, and send them a response."
- Jen
-
And I'm looking at some cool stuff on here; there are just so many. It's funny to try to talk about visualizations in an audio podcast. [Laughs]
- Scott
-
Yeah, it's really fun. [Laughs]
- Jen
-
But I'm just clicking, clicking, clicking as you're talking, looking at all kinds of crazy different things that people are doing. It's like many of the subjects I talk about on the show: it feels like there's something really awesome here, but we haven't quite seen it yet. It's not mainstream yet, because we haven't quite figured out what it is (here's a whole bunch of different colored circles). [Laughs]
- Scott
-
I'm telling you, that's all you need for any visualization: different colored circles moving around super fast. [Laughs] No, it's really great, and Mike Bostock is a huge supporter of examples. He posts example after example after example, and he's not the only one posting these things. They're not just pieces of broken code, "Here's something I couldn't get working." A lot of the blocks are really helpful, illustrative examples for "I'm trying to accomplish this specific technique; how do I do that?" Somebody will write up the example, post it, and share it with everyone else.
So it's amazing that there are so many out there. The tradeoff is that it can be a little overwhelming for somebody new coming in: "This looks really cool, but what does it mean?" That's actually why I wrote the book. We were seeing so many of these kinds of questions on the list that I thought we needed something for beginners that explains D3 from the ground up. First, you have to understand the core concepts of how to use this tool. Only then can you get into, "How do I make this kind of interactive map? How do I make crazy colors fluctuate around the screen?" or whatever it is you're trying to do. So hopefully the book answers those first 100 or 200 questions, and then you can ask your question.
- Jen
-
So this book is a great place for people to start if they want to learn how to use D3 to build something awesome, and it's written for beginners. Do people need to know JavaScript to get going?
- Scott
-
No. That's another interesting thing about D3. I first got into JavaScript through jQuery, back in the day. I was trying to build a new portfolio page for myself some years ago, and a friend of mine said, "Have you heard about this thing, jQuery?" and I was like, "What?" It made a lot of my visual design tasks and interactive elements much, much easier.
So I definitely learned jQuery before I understood anything about JavaScript proper, and I feel like D3 is similarly bringing a lot of people into JavaScript for the first time. The book assumes, hopefully, a little bit of prior programming experience, some knowledge of what a variable is and what a function is, but it doesn't assume very much. It's really intended for somebody who's completely new to JavaScript. Chapter 3, called "Technology Fundamentals," walks through everything that you need to understand before we even get started with D3: "How does a website work? What's HTML? What's CSS? What's JavaScript? What's SVG?"
- Jen
-
Nice. I will put a link to your book in the show notes, along with all these other links I've been collecting. You can find the notes for this episode at 5by5.tv/webahead/70. (I can't believe I'm up to 70. That's kinda cool.) Check it out. The book is published by O'Reilly, and people can buy the paper version and all the different ebook versions.
- Scott
-
I don't want to sound like I'm just trying to sell the book—although if you buy a copy, that's great. But there's the paper version, the ebook version (which is, if you buy it from O'Reilly, free of DRM [Digital Rights Management] and you can get it in any format for any kind of device). And we also have an online version for free, so the whole book is available. It's published through—O'Reilly has this new publishing platform called Atlas. So it's hosted on Atlas, so what's fun about that is obviously a lot of people read it for free, and don't buy the book. And that's fine, but the reason we did an online version is so we could have all the interactive examples embedded in the page.
Normally, if you're working on the paper book or even the ebook—typically, the way these technology books work, is you have the book and you download the code examples separately, and you work through them side-by-side. What we've done with the online, web-based version of the book is use Remy Sharp's JS Bin to embed all the code examples, and the book comes with more than 100 examples. Those are embedded inline in the page. It's like you're reading the book on paper, except when you get to the examples, you can start tweaking the code in the page, and see how it changes things.
- Jen
-
I think interactive examples are such a great, smart way to learn code. CodePen is the same way. You can go to CodePen and start looking for—I'm sure people are putting a lot of different data visualizations up there as well—you can tweak them, and change them, and fork them, and cut and paste the code, put it into your own project, see if you can find an error that you've left in the book—but probably you don't have errors in the book, and if you did, you would have caught them right away yourself because—
- Scott
-
I catch everything, and make sure to—no, not really. [Laughs] We did try hard to get everything lining up, and one thing I am really excited about is that so far, I got all the code examples that you download to exactly match what is in the book. That's one of my pet peeves with books, where they'll say, "Look at example 5," and you look at it, and it doesn't match what you're reading in the book.
- Jen
-
I remember back in the day painstakingly, very carefully writing every character of an example into my code editor—retyping it—making sure it was exactly perfect, because there was a bug in it, and trying to find the bug, because I was sure that I had mistyped something, and finally being like, "Wait, I know I typed it exactly right."
- Scott
-
Then 2 years later you get the new edition, and it says, "That one on page 58 was wrong, sorry." No, there shouldn't be any of those errors. If you do find one, though, we have a process on the O'Reilly site. You can report any errors you find in the book, and for the next revision, we'll get those taken care of.
- Jen
-
I don't even know why I brought it up. [Laughs] Playing with code examples in the book gets rid of all that hassle. Instead, you're focused on learning instead of focused on something silly. These are all fascinating. I'm going over to bl.ocks.org and just clicking through. What are some of the things that you've seen that get you very excited when it comes to what's possible using web technology to create a data visualization?
- Scott
-
I haven't mentioned SVG yet, but as a little bit of background, D3 is rendering-agnostic. It doesn't care what you use, it just works well with SVG. And I know you talked with Doug on show 67 about SVG, and the future of SVG. It works well—well, can I give a 2-second overview of D3 before—
- Jen
-
Feel free to get into the nerdy details. I know there are people listening going, "Stop with the over—tell me how to—give them the nerd."
- Scott
-
We'll start with the nerd. Why is D3 called D3? Well, it's for data-driven documents. It doesn't seem like a very visual name—documents, really exciting. [Laughs] The reason it's called that is, D3 lets you create visual stuff, but really what it's all about is loading data into a page and manipulating the page based on the data, so it's a document. In this case, it's your web document—HTML—or really, it's the DOM (document object model—the browser's model of the page in memory), and by loading this data in, you can have the data drive the page, so most pages on the web now are data-driven in the sense that they're being fed by a database on the backend.
D3 is optimized for visualization, but the reason SVG comes into it—and the reason it's great to use with SVG—is that SVG, as Doug explained, is an image format that exists in the DOM. Bitmap images like JPEGs or GIFs are just binary data; they exist as a magic rectangle in your page. You can't touch what's inside of the image with JavaScript. You can resize it, move it around, but that's it. If you look at SVG code, it looks a lot like HTML code. It's XML-based. It uses the same structure—like the angle brackets—and you have attribute names and values. Because it exists in the DOM with your headings and your paragraphs and your <div>s, you can manipulate it with JavaScript the same way you can manipulate anything else in your page. That explanation sounds a little tedious and boring, but it's powerful to think about an image format that you can manipulate dynamically. So you could have an image that you've already made and use code to manipulate it. You can animate elements. You can remove elements. You can add elements. I think it's powerful to think about it as, "This is an image. This is like a JPG, except in my JPG I'm going to take the tree and move it over to the left, or I'm gonna take the circles and move them up and down, or I'm going to make the rectangles taller or shorter based on whatever data values I'm loading in."
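The following TypeScript sketch shows the "data drives the document" idea in concrete form. It does not use D3. A hypothetical `barChartSvg` helper turns an array of numbers into SVG markup, producing one `<rect>` per value with each bar's height taken directly from that value. It assumes the values are already in pixel units and no larger than the chart height. D3 itself does this join with selections (`selection.data()` followed by `enter()` and `append()`) on live DOM nodes. The sketch builds a string instead, only so that it runs outside a browser.

```typescript
// Hypothetical helper, not part of D3: maps each data value to one SVG <rect>.
// Assumes values are already in pixels and no larger than chartHeight.
function barChartSvg(
  data: number[],
  barWidth = 20,
  gap = 5,
  chartHeight = 100,
): string {
  const rects = data.map((value, i) => {
    const x = i * (barWidth + gap); // bars laid out left to right
    const y = chartHeight - value; // SVG's y axis points down, so bars grow up from the bottom
    return `<rect x="${x}" y="${y}" width="${barWidth}" height="${value}" />`;
  });
  // Total width includes one trailing gap; good enough for a sketch.
  const width = data.length * (barWidth + gap);
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${chartHeight}">${rects.join("")}</svg>`;
}

// Change the numbers and the picture changes with them.
const svg = barChartSvg([30, 80, 45]);
console.log(svg);
```

Because the markup is generated from the array, feeding in updated data and regenerating the output is the whole update step. In D3 the same effect comes from rebinding data to existing elements rather than rebuilding them.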
- Jen
-
It means—the word interactive gets used without it necessarily meaning anything. It's been overused, but what you're describing means that if you make an interactive data visualization, that the person who is acting—interacting with the data visualization—is not—you can do more than simply offer them buttons to maybe turn on and off certain layers of the image or to zoom in and out, but they can reach in and manipulate the data directly—that the visualization itself can be changed and morphed by the actions of the user, or the interactions of many users, or the data as the data's changing in real time, and all those together—the data can be changing in real-time and individual people could be manipulating it, and the aggregate of a whole bunch of people manipulating it simultaneously could be captured, and affect the data itself all at once.
- Scott
-
Whether or not it's a good idea to enable that is a separate question, like all other design decisions, but you absolutely could do that. And it's up to the designer how interactive you want it to be or not. D3 graphics don't have to be interactive. They can be snapshot images that don't change. But there is the interactivity with the individual user. So can I zoom in, or zoom out, or change the scale or filter and focus. Then there's the interactivity on the data side. If this data source is not static, but it's changing over time like most data, then every time I visit the graphic, it's going to be automatically updated. But it's all up to the designer which of those pieces you want to include or not.
- Jen
-
It feels like there's a little bit of—and plus, time—I mean that's the other thing. The resources to code something up. It feels like there's a bit of a gap. Thinking about imagine, plan, and then have the time to build something out is—
- Scott
-
[Laughs] That was why it took me—I think D3 came out—the initial release was I want to say just 2011. Is that right? That sounds really recent. I want to check, because I don't remember. I can find it later. It doesn't matter. So it's pretty recent. It's a new tool. It hasn't been around that long, but when it came out, I remember hearing about it and thinking, "This is gonna be the next thing. It's really powerful." But it also, at the time, wasn't thoroughly documented, and there weren't as many examples as there are now, so it was a lot harder to figure out how to use it. And since you mentioned time, that was one of my first challenges: I was doing freelance work at the time, and I just couldn't devote my own time to learning the tool until I was able to get a client who had a project for which D3 was really appropriate. Because it takes time to learn something new, and it takes time to wrap your head around how to use it.
- Jen
-
Especially when the whole internet doesn't know how to use it, and it's not well-documented already.
- Scott
-
I hate that when the whole internet doesn't know how to use it. [Laughs] Depends on which one you're on of course.
- Jen
-
It looks like 2011 is when it came out.
- Scott
-
One thing that is great now—that wasn't the case in 2011—is now there are a bunch of these frameworks and other tools built on top of D3. Some of these are easier ways to create charts, ways that you can get that visualization up and running faster without having to write so much plain D3 code by hand. And a couple of these are one called Vega; I think one of the first ones, one that has been around the longest, is NVD3. There's a new one called Fusion or Contour. I really need to update my list. I have a whole list of these in the book which I'm going to be updating, because it seems every 2 weeks someone releases a new one.
But these are really fantastic tools, because you can—it's not as user-friendly as Datawrapper—but it's a way you can get your data loaded into the page quickly, get a chart up and running quickly, and get all of those benefits of running a live, interactive graphic in the browser without having to spend months learning how to write code from scratch, if you're new to that.
- Jen
-
The thing I love about some of that—going to that kind of option—whether you're forking somebody else's open source code that they are putting out in the world, or whether you're using something like Vega—where Vega or NVD3 where you—quick onramp, but then if you decide, "yes, we do like this, let's keep going, let's iterate this, let's brand this, more our brand, or let's change it this way, or that way, oh, I almost love it, except I wish we had a button right here"—but when it's built on something like D3, then all the power—then you can do that, you can just reach in and customize it. Instead of having something that's a little more cookie-cutter—in which case, you're kind of stuck with, "that would be awesome to have a button there, but we can't have that button, because they didn't make us the tool to do that."
- Scott
-
That's the other thing. It's all open-source and free, [Laughs] so you can download it. And it's all based on web standards, so anything you already know about making web pages and interactivity, you can use with D3. It's not like you have to wait for D3 to support a button feature. You know how to make a button already, so just make the button. Let me jump in here with our other sponsor, Media Temple.
- Jen
-
So what else are you going to tell us? Juicy stuff about D3, the nerdy stuff, for the people who are still like, "More details!"
- Scott
-
So even nerdier than that?
- Jen
-
Get more nerdy.
- Scott
-
We can talk about mapping for a second.
- Jen
-
Let's talk about mapping.
- Scott
-
SVG. Say you're doing a world map. Each country is just a shape, and in SVG, that's a path element. The path element lets you draw any arbitrary shape you want. D3 has ridiculous geographic mapping features built into it. I think it supports—so the key thing with any mapping is what projection you're going to use. I think of projections as 2-dimensional scales, so say you have a simple linear scale—data values go from 0 to 1,000—but I need to scale that, so that it works in my on-screen graphic. And my on-screen graphic is only 500 pixels wide, so I have to take an input domain from 0 to 1,000 and scale it down to 0 to 500 for pixel values.
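The scale Scott describes is a straight-line mapping from an input domain to an output range. The TypeScript sketch below uses a hypothetical `linearScale` helper written for illustration. D3 provides the real thing as `d3.scale.linear()` in version 3 and as `d3.scaleLinear()` from version 4 on, and those versions have many more features. The helper assumes the two domain endpoints differ.

```typescript
type Scale = (value: number) => number;

// Hypothetical helper: builds a linear scale from domain [d0, d1] to range [r0, r1].
// A value's relative position within the domain becomes its relative position in the range.
// Assumes d0 !== d1 (otherwise the division below is by zero).
function linearScale(domain: [number, number], range: [number, number]): Scale {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return (value: number) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Data values from 0 to 1,000 drawn in a graphic that is 500 pixels wide.
const x = linearScale([0, 1000], [0, 500]);
console.log(x(0), x(250), x(1000)); // 0 125 500
```

The range may also run backwards, for example `[200, 0]`. That flips the axis so that larger values sit higher on screen, which is the usual adjustment for SVG's downward-pointing y axis.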
Projections are just the same thing. It's just that we live on this 3-dimensional planet. It's not flat, so we have to—there's some fancy math involved and you're converting longitude and latitude into x values and y values. And the way that you choose to do that—which projection you use—and each projection is just a different algorithm. It's going to be very political, and traditional cartographers are used to studying these things. And tiny countries that want to appear bigger on maps so that they feel more important—they're very sensitive to these issues, and that has come up with some of the globe maps that I've worked on.
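A projection extends that idea to two dimensions. The simplest one, the equirectangular (plate carrée) projection, treats longitude and latitude as if they were plain x and y values and scales each one linearly. The hypothetical `equirectangular` helper below was written for illustration and is not D3's implementation. D3 offers this projection as `d3.geo.equirectangular()` in version 3 and as `d3.geoEquirectangular()` later. The helper assumes the whole world is fit into a box of `width` by `height` pixels, with north at the top.

```typescript
type Point = [number, number];

// Hypothetical helper: equirectangular projection of [longitude, latitude] in degrees
// onto a width-by-height pixel box. Longitude -180..180 maps to x = 0..width;
// latitude 90..-90 maps to y = 0..height (north on top, since SVG's y grows downward).
function equirectangular(width: number, height: number) {
  return ([lon, lat]: Point): Point => [
    ((lon + 180) / 360) * width,
    ((90 - lat) / 180) * height,
  ];
}

const project = equirectangular(960, 480);
console.log(project([0, 0])); // [ 480, 240 ]: where the equator meets the prime meridian
console.log(project([-122.4, 37.8])); // roughly San Francisco
```

Every other projection swaps in different formulas for those two lines, and that is where the distortions discussed here come from. This one, for example, stretches shapes east to west more and more toward the poles.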
In any case, the great thing about D3 is it supports—I think at this point, it supports over 70 different projections and Mike Bostock and Jason Davies are the 2— Jason Davies has been primarily contributing most of the projections, and at this point, 95% of these you should never use. They're just there, because it can be done, and it's an awesome demo, and it shows off the power of the tool, and it's great. I think if you go to the D3 documentation, it'll highlight the 4 or 5 projections that normal people should only ever consider using, and the rest of them are experimental. But they do all work, and they're beautiful, and they're really interesting, because they illustrate these different ways of looking at the world.
And we can choose these different ways of looking at the world depending on what story you're trying to tell, what data you're trying to map, or whatever you're trying to communicate—the classic simple example is choosing the world maps that are upside down—where they look upside down—where Australia and Antarctica are on top—there's no reason that the northern hemisphere has to go up (it's just that is how we have gotten used to thinking about it) and that's the feeling I get when I'm experimenting with different projections. None of these are honest—just like we were saying, no photograph is honest either—some of them are better approximations than others, but fundamentally, these are all trying to take a 3D thing that exists in real life—the planet—and flatten it out so it fits in a web window. [Laughs]
So every map is going to be a gross distortion. It's important to pay attention to how we choose to distort these things, and how these different tools make it easier or harder to distort it in certain ways to tell the stories we're trying to tell.
- Jen
-
[00:59:57] It's another example of something that could be flexible. That you could have a button where users get to switch from one map, to a different map, to another map—totally depending on what you're doing. That might be simple and a distraction, or that might be really profound and very informative depending on what you're doing. That's an option now instead of it being a permanent decision, because you're sending it to a printer and getting the maps printed out.
- Scott
-
Even before this explosion of web-based mapping tools in the last 5 to 10 years, I know there were tools out there, and still are, that you could buy, like Excel plug-ins where you would have your data in an Excel spreadsheet, and it would generate a map for you, but it would be a specific map of the US with a certain projection, and there are all of these biases encoded in the tool itself, because it can only make one type of map.
And I think what's—even though now with D3, most of the mapping we see is US-based and it's still world maps, and whatever standard projections. The fact that the tool is available for us to do other kinds of work—if you wanted to do hyperlocal mapping—states and counties often have their own custom projections that are optimized just for their jurisdictions, for their areas. I forget what (I'm in San Francisco) the San Francisco one is—they all have these crazy-long code names—but the fact is that if you wanted to make a hyper-local data visualization with a projection and a representation that is optimized for an honest representation of your area, you can do that now. And you don't have to wait for—you don't have to purchase a hugely-expensive tool. You can just do this with the computer you already have in front of you.
I think we'll see more and more of this—especially as we have more international adoption of D3. We get lots of people in other countries. One thing—I love San Francisco, but Stamen Design is based here, and a ton of visualization is going on here. So a lot of times, you see these visualization examples that are just San Francisco or California or the US, and that's what we're concerned about, because the people who do this are concerned about their own area. So I think San Francisco is massively over-represented. [Laughs]
- Jen
-
It's been overmapped. [Laughs]
- Scott
-
It's been hypermapped. [Laughs]
- Jen
-
Even when there was a debate about Google Maps and Apple making their own Apple Maps for iOS, and everybody was like, "This one is better than that one," it was like, "In San Francisco it's better, but I don't live in San Francisco."
- Scott
-
The little icon on Apple Maps still shows [highway] 101 or 280. I'm excited to see more people in other parts of the world—whole other continents. [Laughs]
- Jen
-
People listening to the show all around the world.
- Scott
-
And people in other cities doing really small-scale local mapping with the same tools. I think it's really powerful.
- Jen
-
I was thinking before the show today about how my grandmother was really into "office stuff" when I was a kid. To me, it was just called "office stuff." She had a career—I think in some ways she had Joan Harris's job from Mad Men—running the office—she ran the office—and she kept meticulous records of their expenses and budgets and service manuals and whatever. Whatever data was going on in the life of her family, she kept a record of that life, and what that meant in that era was paper. Paper ledger books. She had these beautiful ledger books where she would keep all these records, and she'd write, "We spent this much money on having somebody come repair the washing machine," "We spent this much money on groceries," "We spent this much money on gas," or for a business—and you would total these up—this much travel—and everything was manual, so you had to manually look at the ledger and then manually type up a column of numbers, and manually—usually I guess in the mid-20th century you'd have an adding machine of one kind or another—but errors were easy. You had to double-check everything to make sure you didn't have errors, and it was just so much work. It was so laborious to crunch any numbers together that those simple charts—a pie chart or a bar chart—were powerful, because they were a way to show this information, which was not very deep. There wasn't a lot of data, and the data was pretty rudimentary. But then computers themselves—the word computer comes from the job description of the people who did a lot of calculations—and then that was a machine that did those calculations instead, and Excel or Lotus 1-2-3 spreadsheets were the biggest reason—the reason computers went mainstream in businesses around the world—
- Scott
-
And VisiCalc—
- Jen
-
So in some ways it's like having this data, and figuring out how to crunch it, and figuring out how to visualize it is the reason that computers exist. And we haven't quite—we've been using the web for other things—but 50 years in, 100 years in, well 80 years in now—we're—the data we're collecting. I went to the grocery store to buy lunch, and I collected data on myself going to the grocery store. [Laughs]
- Scott
-
What did you collect? [Laughs]
- Jen
-
Steps. [Laughs] And how quickly I took them. I mean I didn't collect only that data, but as I was walking to the grocery store thinking about the show today—data… I'm collecting data right now! [Laughs] So we have so much data, and then how do I look at that data? Well, I have an app and there's a website, and it's a Fitbit, so I have a particular set of graphs that are being offered to me. But it's not satisfactory enough. I want more graphs. I want more data visualizations. I want to be able to mash this data together with this other data. These tools are becoming so powerful, and the demand is growing, growing, and growing, and it will get more and more sophisticated.
- Scott
-
I love that reminder too: that the original computers were people, and they—essentially their jobs got outsourced by these robotic computers—which are not people. [Laughs] So now, computer means something else, but the computer—it is a weird thing.
And I love your point about the original computers being for computing, data processing, making sense of data, and that's all they do now—which is something that I try to introduce in my classes with students, is we experience them as these interfaces on these beautiful screens, and we can touch the screens, and move pictures around, and watch movies, and send emails, text messages—but a computer doesn't really know what a text message is. It doesn't know what a blog is. It doesn't know what a map is. It just knows, "I have some numbers over here, and I'm going to shuttle these numbers around these different parts."
So all it's doing is computing at a low level, but it's a weird thing to point out—to think that this grey box on my desk I call a "computer"—I mean shouldn't it be called a map-making, emailing, document-writing—it should have some more complicated name. [Laughs] It does so much stuff.
- Jen
-
It's the clock radio, newspaper, television set—
- Scott
-
Movie theater—
- Jen
-
Phone smashed together—
- Scott
-
It's crazy, because all that stuff can be encoded as data, so that's another problem too, when we're talking about data and data visualization—how broadly do you want to define that? Does that mean—that's where the more interesting artistic uses of data come in—when you're thinking about transcoding and taking data like an audio file—so I have this sound coming in, but I'm treating that sound as though it were position or some physical motion—and I'm going to represent it accordingly. That's a whole interesting field of practice: taking data that was intended for one purpose and using it for a different purpose.
- Jen
-
We assumed the whole show that we're talking about data sets—and data about the economy, or about populations, or about the weather, or about physical activity—but you're right in that now that we've digitized more, and more, and more of our lives—most of our communications, most of our entertainment, most of our news. It's all coming through a digital format that is data-driven, and there's data back there, and you can do weird things with that data to visualize that activity somehow.
- Scott
-
I would encourage people to do weird things with that data. [Laughs] See what comes out of it.
- Jen
-
What does this podcast look like if you turn it into some sort of data visualization?
- Scott
-
Turn the podcast into a TV show, or—and then take that TV show, and turn it into a poster, and then take that poster and turn it into a dance performance. I don't know. It's ridiculous. You can do anything now. [Laughs]
- Jen
-
Thank you for coming on the show, and talking to us about data visualizations and D3.
- Scott
-
Of course. Thank you so much for having me. It's been fun.
- Jen
-
Thanks to our sponsors today, Media Temple and Squarespace. You can again check out notes from the show at 5by5.tv/webahead/70, where there are also coupon codes for the deals that we have from our sponsors. You can follow Scott on Twitter—what is your—I'm assuming you have a Twitter account.
- Scott
-
alignedleft, like aligned left, right, or centered—and that's my website too.
- Jen
-
Aligned Left dot com. Where you have a whole portfolio of projects you've worked on with lots of lines and circles and dots and maps.
- Scott
-
I'll give you—there's something I did on there that's not on the website. I'll give you the link that you can share with people. It's an example of a map represented in a completely useless way that would be fun for people. It includes circles—bouncing circles and actual countries. [Laughs]
- Jen
-
People can follow me on Twitter at jensimmons or the show itself @thewebahead. And that's it until next week! Thanks for listening!
Show Notes
- Scott Murray — alignedleft
- D3.js - Data-Driven Documents
- Gallery · mbostock/d3 Wiki
- Interactive Data Visualization for the Web - O'Reilly Media
- Interactive Data Visualization for the Web
- How Not to Be Misled by the Jobs Report - NYTimes.com
- Datawrapper
- d3/d3-plugins
- Data visualization tools
- Tableau Public | Tableau Software
- d3-js - Google Groups
- Newest 'd3.js' Questions - Stack Overflow
- API Reference · mbostock/d3 Wiki
- bl.ocks.org
- bl.ocks.org - mbostock
- Vega: A Visualization Grammar
- NVD3
- xkcd: Map Projections
- The United States of Bouncy Balls
- Webcast: Data Visualization - The Value of Process