Episode 1

HTML5 with Peter Lubbers

September 12, 2011

HTML5 expert Peter Lubbers joins Jen Simmons on the inaugural episode of The Web Ahead to talk about what the heck HTML5, web apps, local storage, offline caching, and web databases are.

There are all sorts of things that we haven't actually been able to do — that we haven't really thought about because we have never been able to do them — that are now going to be possible. The browser itself is radically changing.

Transcript

Thanks to Katherine Senzee for transcribing this episode.

Jen

Welcome to the first episode of The Web Ahead! It's a new podcast on 5by5, a weekly show about the web: about what's happening on the web right now, and all the things that are changing. There are tons of changes happening on the web, almost too quickly for us to keep up with them. So I want to use this show to talk about those changes, and what's happening, and what we can expect in the future of the web. And by web, I mean the whole web. I mean the desktop web, the mobile web, the web on the refrigerator, the web on tablets, websites, web apps, web APIs, really anything that's hooking up with the internet and the web in some form or fashion.

This week, the first guest that's going to be on the show is Peter Lubbers. Hi, Peter.

Peter
Hi, Jen. Nice to be here. Thanks a lot.
Jen
So Peter works at a place called Kaazing. He's a trainer and travels around and does a lot of HTML5 training, especially some of the geeky details around HTML5 APIs. He runs the HTML5 user group in San Francisco, which has — I think it's 2,000 members now?
Peter
It's growing so quickly it's almost 2,400.
Jen
Wow.
Peter
That just happened in the last few weeks.
Jen
That's crazy. Yeah.
Peter
It is pretty crazy.
Jen
Giant meetups in San Francisco to talk about HTML5 and other sorts of technologies. And Peter also wrote a book called Pro HTML5 Programming, which he wrote with Brian Albers and Frank Salim. Really great book, one of my favorite books actually on HTML5. It was one of the first ones that came out.
Peter
Thank you. Yeah, it's, I think, the second. Well, there was Introducing HTML5 from Bruce Lawson and Remy Sharp, which came out I think a month or two earlier, and of course there was Jeremy Keith's book, HTML5 for Web Designers, but yeah, this was one of the original three.
Jen
Yeah, and it's funny, because I'm a designer, and as a designer, this is the kind of book that I probably would not have bought, because it's very much about the APIs. But — and this is how I met you — I was in China, stranded, just lonely, lonely, stranded in China. And on Twitter, occasionally, when I could poke through the Great Firewall of China, you had tweeted that the HTML5 user group in San Francisco was about to hit — 1,000 members?
Peter
Right.
Jen
And whoever was the one thousandth member would win a book. And that was me. So you sent me your book so generously, and then when I read it, I was like "This book is awesome!" And then it went on from there. Then we were in touch because you're just so accessible and easy to learn from and get to know.
Peter
Thank you. It's actually funny, because not too long ago we had the 2,000th member, and to keep the tradition alive, we also said, "Oh, we'll give a book to the two thousandth member," and sure enough, I think people were really timing it. I think one person used a separate account to boost the number and then removed the account, so two people actually became the two thousandth member within three seconds of each other. So it was 2,000, then one down, then 2,000 again, and I ended up sending out two books.
Jen
That's funny. Yeah, I feel like I got the 1,000 because I was in China and I was in the opposite timezone, and everyone else was asleep.
Peter
That's right.
Jen
So on this show, over the course of however many episodes we have in the weeks and months ahead, we're going to talk about all kinds of things, not just HTML5. But one of the first big things that I think we think of these days, when we're thinking about the web and all the changes that are happening on the web, is HTML5. I think we've all already heard some about it, about the spec, maybe a bit about the history of the spec. But there just seems to be a lot of confusion these days about what HTML5 is, and why we should care. I keep hearing all kinds of crazy things, like "Oh, it's just a bunch of hype, it doesn't really matter." Or "It's required for mobile. HTML5 is the same thing as mobile. If you're doing a mobile site, you do it in HTML5. If you're doing a desktop site," which already is a problem right there, right? "If you're doing a desktop site, then you should do the site in XHTML 1.0. And that's what HTML5 is for." There just seems to be a lot of confusion. What do you think? What the heck is going on with this stuff?
Peter
It's true, there's a ton of hype around HTML5 and its feature set, and I'm not a big fan of the hype. But there are different camps. On the one side you have the people who try to put HTML5 in the right spot in relation to other specs, like CSS3 and SVG; in some versions of "HTML5," all of those things would be included. Other people like to keep it to the point and say, okay, those are clearly handled in different working groups, or in different specifications altogether; you shouldn't be confusing things. And I agree with that, right? But there are different perspectives on this. From a web developer or web designer point of view, we all know: you go to the CSS specification to look something up; you go to the HTML5 spec; and then there's a whole bunch of what Bruce Lawson likes to call "new and exciting web technologies," with the acronym NEWT. [Both laugh] He actually proposed a logo for it, a little picture of a newt, and it was kind of funny, but it didn't go anywhere. The other way to look at HTML5 is that it includes all of those new and exciting web technologies. You can argue about that all day, and it's not really meaningful. How I describe it is: for developers and designers, it's kind of important to know where all the things are and what they're part of. But think about what HTML5 has become for people who really aren't web designers, maybe high-level architects, or ordinary web users, who are starting to hear about HTML5. For them, it's been a good thing to use HTML5 as a way to describe all of those new and exciting web technologies: if all of that is HTML5, people can sort of grasp that. If we start compartmentalizing it and saying, "Okay, that's CSS3, and that's technically that part of the spec, and SVG is a separate thing," well, then you've pretty much lost the broader audience.
A lot of specifications have been split out. For example, web sockets used to be part of the original specification and were moved out into their own spec, which you could argue also keeps the discussions focused and really drives things to completion. One of the interesting things starting to happen right now is experimental features in HTML. We've had this for CSS for a long time. You've had -moz-border-radius, which would give you a rounded corner on an element in Firefox. You would have these browser-specific prefixes until the specification is fully implemented. Now we're actually seeing that for APIs: there's MozWebSocket for Firefox until the spec is final. So it's kind of an interesting model, because the one thing we clearly don't want is for HTML5 to finally be done and implemented in all the browsers (and there are many dates flying around that some people are pretty upset about, the 2022 date, for example, and we can come back to that) and then have it take another ten years to get the one feature that was pushed out of HTML5 into HTML6. So the working group has already started, but they've moved to the living standard; it's really not a versioned spec. Features that browsers want to implement, they can pull in and fully implement, either under an experimental flag or regularly, and that's how the web will continue to evolve. Because the last thing we want is another point in time that we, ten years from now, look back on and say, "Oh yeah, remember that? That was so cool back then, but look, nothing has changed for another ten years."
Jen
The way I've been thinking about it, especially in terms that sort of non-nerds can understand, is that the web is driven by HTML. The whole web runs on this thing called HTML. And for whatever reason, there was a lot of innovation and experimentation in trying different kinds of stuff out, from when the web first started around 1991-1992, until like 1997, 1998, 1999, maybe 2000 (and there are a lot of details about the history that you can go look up if you're into that kind of stuff), but just in broad strokes, there were a lot of different things that were tried out, a lot of things that were added, and then things sort of stopped for ten years. And instead, it feels like all the innovation happened, not really in the browsers and in the HTML, but in CMSes and ideas of what users do with a website. There was a big explosion that got labeled Web 2.0, with this idea that people can log into a website, that people can add content to a website, a website like Flickr, where people from all over the place add the content that is the website. Or that you could buy stuff, with eBay and Amazon, things that now we take for granted, and things that now are just sort of assumed, "Oh, that's what the web is, you can do that on the web." But there was this time when we invented all those things. And it feels like we went from changing HTML and the underlying technology of the web, to keeping that the same for a while, and figuring out new things to do with it. And we did switch from doing layouts and visual design, from kind of the old school, the tables and the horrible inline font styling and stuff, to CSS, and there was a lot of sort of experimentation and a lot of new ideas about how to do stuff with CSS around 2002, 2004, stuff like that. But then it felt like that kind of stuff got really just stuck, for political reasons that we could go into, but yawn, whatever.
Peter
Mm-hmm.
Jen
But to me, when people say HTML5, what they're talking about is, the dam broke, and the sort of stuckness that we had has gone away. And now a lot of the ideas that we had during that CMS implementation era are being put back into the browser. The HTML and, like you said, the other kinds of things that have to do with the browser (CSS3, HTML5, and all the other things) are catching up to where those ideas were. Instead of having to sort of force on top of the web browser something that the web browser doesn't want to do, now you can have the web browser do the thing that you want it to do, because the web browser knows what that is. There are all sorts of things that we haven't actually been able to do, that we haven't really thought about because we have never been able to do them, that are now going to be possible because the browser itself is changing radically at this point. And it's almost coincidence, although probably not coincidence, that mobile sort of hit at the same time. So we're going from laptops and tower computers to phones and tablets and refrigerators and all these other things: cars and stuff. [Peter laughs]

Peter
Yeah, and a couple of points about that: the original work really started off in 2004, when a small group of people split off from the W3C to start a working group, the WHATWG. Actually, the original name of HTML5 was "Web Applications 1.0." I think that's kind of important to keep in mind. Yes, you can use all the features in a website, and from one angle there's not that much difference from a web application in terms of the files you use; the underlying architecture is exactly the same. But given the kind of things you can now do with HTML5, it's fitting that it started out as "Web Applications." There were features they wanted to add to HTML specifically for applications, making the web just that much more powerful, and it was started mostly by people who worked for the different browser vendors. It's nice to have ideas, but if you are creating a spec, that's theoretical, and it's ultimately up to the browser vendors to actually put these features in. And that's what made this really so powerful.
Jen
What do you think an application is, anyway? I think that's such a funny word.
Peter
It is, right?
Jen
What is an application?
Peter
A web application, in my opinion, is something that has a specific focus. I like to think of it as something where we use a verb: to do something. One of my favorite apps — and I travel quite a bit, as you pointed out, for HTML5 training. I use this app; it's a Chrome Web Store app, but you can use it in any browser, really. It's called Hipmunk. And Hipmunk is a very fun way to search for travel. They have a pretty genius idea to rate things based on — what do they call it again? I always forget —
Jen
The nuts? The number of nuts they have? They have a five, what, scale of one to five nuts? [both laugh] I don't know, I'm just making it up.
Peter
No, it's an "agony rating." You may have a flight that is the cheapest, but it's not the most direct one. You may have a seven-hour layover somewhere, and although it's cheaper, it has a much higher agony rating. So they list it that way, though they could list it any other way too. And okay, it's a Chrome Web Store app, and I'll come back to what that really means under the covers, but if I go to the app in Firefox, it'll have exactly the same experience. And it's very focused on doing one thing. I just plunk in the from, to, and departure dates, and I'm up and running. I'm not going there for information; I'm not going there to browse Hipmunk and related things; I'm going there to really do something. That, to me, defines what an app is. The one thing, of course, that would be different between the Chrome Web Store app installed in Chrome and the same app in any other browser, just over the web, is some additional power that you grant to the application. At a high level, a Chrome Web Store app is nothing more than a link to that website, packaged into a manifest file. And the only additional things it has are permissions you can give it: unlimited local storage, for example, and the same for things in the application cache, which is the offline web application feature in HTML5, and that is really exciting. It can give you more storage space, which will improve the way the app works, but that doesn't prevent it from working on any other platform.
Jen
It seems to me that this sort of idea of a web app is more of a marketing "how do people find out about your thing/whatever/thingamathing" than a different — you know, it's a website. It's a website with a bookmark-to-my-homepage kind of feature. And you go to the "App Store" to find the "app". But it's funny, because if you think about computers — especially computers before we hooked them up to the web all day every day — an application was this thing you got on a floppy disk that you installed onto your machine so that you could do some things on your machine. Of course that's not what it means now. This term "app" means a lot of things. But it feels like this deep philosophical question to be like, "What is a web app anyway?" Where's the line between a website and a web app? But I think that's a good — if the person who's coming to your web URL is on a mission to do something that's a verb. That's a good, different, it's a verb rather than a —
Peter
Mm-hmm.
Jen
Because I do think that a lot of people right now kind of assume that websites are giant fancy brochures. Still. Even now in 2011, that's still a legacy from the web coming from print, and the web coming from HyperCard and that index card idea: "Hey, this'd be awesome, let's do this, let's make a whole bunch of things that are just like index cards, and we'll link them to each other. And then when you click on the links, you go from one card to another! Isn't that awesome?" That was revolutionary 20 years ago. And I feel like we're still a little bit stuck on that idea that every web page is a page, and what you're doing is reading ads and bits of content embedded between the ads.

Peter
The other thing is, with web apps, you go there with a specific purpose in mind. Like you said, underneath it's exactly the same HTML, CSS, and JavaScript, but this is where HTML5 comes in. I mentioned a couple of these things, like local storage and the application cache (or offline web applications), different functionality that we started out talking about a little bit. When we think of applications, we typically think of desktop applications and the kind of things they can do, and you go to those with a specific purpose too, to produce something or to do something. The web is now very rapidly becoming on par with desktop applications, or possibly even better. You can get full-blown client-server applications into the browser now.
Jen
That's the idea with a Chromebook, is that you don't need to run applications on your computer; all you need is Chrome.
Peter
Yeah, that takes it even a step further, where the web browser is basically your OS. To a certain extent, that's really possible now. That was just something you couldn't even dream of before, and with HTML5, all of those additional things, it's a plugin-free paradigm. Before, you had a browser, and it wasn't capable, or it wasn't very capable. So to really do the things you wanted to do, you would install some kind of plugin: Flash, Silverlight, some proprietary plugin. And there's a plethora of problems with those plugins. First, there's integrating them with your page. You know how hard that can be: clipping issues, the Flash plugin sitting in front of all your content, your not being able to drag menus around. It's just not integrated very well in your environment; it's running in a separate execution context. But also, if there are any problems with them, it's back to you to deal with them, and you don't have much control over when they're going to be fixed. So you may be running with security vulnerabilities. You just don't really know. A good example of that was web sockets. The original protocol went through quite a few iterations. In Flash, and with the Java plugin, you can have a socket, and now web browsers offer what's called web sockets: a socket connection in the browser. Some research was done, based on known security vulnerabilities in the Flash and Java plugins, to see if web sockets would have the same problem. And sure enough, in this (I would say fairly theoretical) test, it was shown that yes, web sockets had the same problem. But now, not even six months later, we have a completely upgraded protocol version that browsers are implementing. Yet those plugins? They continue to have the same security vulnerabilities they always had. They just haven't been fixed. So that's really kind of important.

Jen
Yeah. A good example of an application, web apps I think a lot of people are familiar with, that have been around for a long time, is Google Apps. So there's Google Docs, there's Gmail, there's a Google calendar; those are all applications that run inside your browser. But Google needed a giant team of engineers to pull that stuff off in the old HTML days, the HTML4/XHTML 1.0/1.1. I mean, that's part of what Google is. Google's giant. They have amazing giant smart engineers, and they could sit around all day and figure out how to get Google Docs to run. And it seems to me now like part of what HTML5 is all about is putting some of those same tools into the real web browser, so anybody can use them. If you want to store information in the browser on the — if you want to make a page where people go and they write a bunch of stuff and it gets saved, as they're working, and it's saved locally on their computer, you can just use the new HTML5 APIs to do that. You don't have to write all sorts of crazy code on top of the browser, like what you're saying, with Flash, or Silverlight, or some proprietary secret thing. And it seems to me like it's also a matter of scale and budget. Instead of having 75 engineers working on this problem, you can have three working on this problem, because they can use these tools that are already baked into the browser.
Peter
That's a great, great point you're making, because one of the things I really enjoy about HTML5 is its simplicity. You can see it in many places in HTML5, starting with the doctype, or the new character set meta tag. Look at the old way you specified a doctype: I mean, who could ever remember that? You could maybe cut and paste it and hope it was right. But now, it's just simple. The same goes for all of these new APIs, right? So, the additional power: of course, many people, when you talk about HTML5, compare it straight with HTML4, and it's like, okay, what else is new? Because, okay, you've got some new elements, but ultimately they think of it as markup for pages. But HTML5 is a lot more, it means a lot more...
Jen
A lot of stuff that's not markup —
Peter
Yeah. Exactly. And a lot of the things we talked about, like local storage, web workers, and web sockets, are fairly invisible. Even features like geolocation, and to some extent canvas: yeah, it's a new element, but you can't really do all that much with it if you don't use JavaScript. So, this simplification. Take Gmail. Like you said, Google Docs and so on, but Gmail is a good example, because Google actually open sourced a lot of the libraries. It's called the Google Closure project. One of its flagship products is the Closure Compiler, which is just a phenomenal piece of work. But if you look at the kind of code that's required to do the kind of things they're doing, yeah, then you do need this incredibly huge team. And it's not to say that in HTML5 it's just the click of a button and it's done. Of course not. But the kind of things people were doing in the past, which HTML really wasn't designed for, are now designed right into HTML5. So you have storage APIs that give you access to a lot of storage, where before you had to do clever things with cookies and server-side interaction; now you just store things locally. Or open a socket connection and have bidirectional web traffic. Application cache. It's all very easily done, and it gives you this incredible new power to do things that just weren't possible before.
Jen
So let's talk about some of those. I especially am always thinking about web designers, because people aren't going to ever build this stuff if it's not planned into the project. If the client doesn't ask for it or want it, if the designers don't know it's possible, then the developers aren't going to get a chance to do it. So in sort of non-geeky — well, less geeky, or whatever; it's all partly geeky — but storage. Storage is one thing that really excites me.
Peter
Yeah.
Jen
Is it three or four flavors? And let's list them and explain what that stuff is.
Peter
Actually, you could make a first distinction between web storage and web database storage. The first category is simple key-value pair storage, which would be web storage, and there are two flavors of web storage. One is called session storage, and the other one is called local storage. And then there's the database storage, and let's come back to that one.
Jen
And there's offline caching, too. So let's explain what these are. Especially for someone who doesn't even know what a key-value pair is.
Peter
Yeah, perfect. So web storage - well, okay, so the first thing is — let's come back to the application cache, or the offline web apps. That's really for caching whole files. Complete CSS, HTML and so on. Web storage you could really think of more as data storage. You have an application; say you wrote a game, and it needs to store incremental data about your score, or something that it needs to just track while you're playing this game, and maybe later on some of the key details of that need to be synced up to the server. And you can store — and this is the pure text way of storing — typically browsers will give you up to about 5 MB of storage, locally, to store data on the client. And what that means is that, in HTML4, the alternative was using a cookie. [Laughs]
Jen
Right.
Peter
And with a cookie, you get about 4 KB of storage, very little, and it travels with every HTTP request to the server. That's why it's so small: of course, you can't put 5 MB on top of your HTTP headers and still expect your website to function. So if you want to store some data while you're running, you can put it in this local storage area. Now, there are two flavors. The first one is called session storage. That only sticks around as long as you have the page open; when you close the page, it's removed. But what's really interesting is local storage. Local storage gives you the power to store things locally, and it persists past a browser restart. A good demo of this (there's one online we could maybe link to later on) was a sticky note app. When you logged in, it would find the previously created sticky notes. Maybe the key-value pair had something built into it that said the location, the x/y coordinates, and then the data of the sticky note. And it would just display your sticky notes from the past. So that was a pretty good example of what you can do with it.
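To make the sticky note idea concrete, here is a minimal TypeScript sketch, assuming a note is just an id, x/y coordinates, and some text. Because web storage only holds strings, each note is serialized to JSON under a key such as `note:1`. `KeyValueStore`, `MemoryStore`, `saveNote`, and `loadNotes` are hypothetical names for this illustration; the interface mirrors the `length`, `key()`, `getItem()`, and `setItem()` members of the browser's `Storage` object, so in a page you would pass `window.localStorage` (persistent) or `window.sessionStorage` (gone when the tab closes) instead of the in-memory stand-in.

```typescript
// Minimal stand-in for the browser's Storage interface, which both
// localStorage and sessionStorage implement. Declared here so the sketch
// runs outside a browser; window.localStorage fits this shape.
interface KeyValueStore {
  readonly length: number;
  key(index: number): string | null;
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory store used only to run the sketch outside a browser.
class MemoryStore implements KeyValueStore {
  private data: { [key: string]: string } = {};
  get length(): number {
    return Object.keys(this.data).length;
  }
  key(index: number): string | null {
    const keys = Object.keys(this.data);
    return index >= 0 && index < keys.length ? keys[index] : null;
  }
  getItem(key: string): string | null {
    return Object.prototype.hasOwnProperty.call(this.data, key)
      ? this.data[key]
      : null;
  }
  setItem(key: string, value: string): void {
    this.data[key] = value;
  }
}

// Hypothetical sticky note: where it sits on screen, plus its text.
interface StickyNote {
  id: string;
  x: number;
  y: number;
  text: string;
}

const NOTE_PREFIX = "note:"; // hypothetical key naming scheme

// Values are text only, so each note is serialized to JSON.
function saveNote(store: KeyValueStore, note: StickyNote): void {
  store.setItem(NOTE_PREFIX + note.id, JSON.stringify(note));
}

// On page load, walk every key and rebuild the notes saved earlier.
function loadNotes(store: KeyValueStore): StickyNote[] {
  const notes: StickyNote[] = [];
  for (let i = 0; i < store.length; i++) {
    const key = store.key(i);
    if (key !== null && key.indexOf(NOTE_PREFIX) === 0) {
      const raw = store.getItem(key);
      if (raw !== null) notes.push(JSON.parse(raw) as StickyNote);
    }
  }
  return notes;
}

const store = new MemoryStore(); // in a browser: window.localStorage
saveNote(store, { id: "1", x: 120, y: 40, text: "Book flights" });
saveNote(store, { id: "2", x: 300, y: 80, text: "Pack charger" });
console.log(loadNotes(store).length); // 2
```

JSON is just one convenient way to squeeze structured data into a text-only value; the prefix lets the page skip unrelated keys that other scripts on the same site may have stored.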
Jen
Yeah. I think this stuff is really quite revolutionary. And I think most people don't even really — you know, most people don't know what a cookie is. And I hear people, even developers, talking about, "Hey, I have this idea, how are we going to do it?" "Well, I guess we can do that with a cookie." And then they struggle, struggle, struggle trying to figure it out, without realizing that there's a bunch of new options now.
Peter
Right.
Jen
Because cookies have been used typically to, what? If you go back to a website, it remembers that you've already logged in, or, little tiny bits of information that's on your computer, so that when you reload the website new, all over again, from the server, your computer remembers a little tiny bit of your relationship to that website. But there's always this question about "Does it really last or not? Does it disappear? Is there a security problem?" It seems like it's always been very, kind of... janky. It's really great for certain things, but when you try to stretch it too far, it just doesn't —
Peter
I think that's the main thing. Used correctly, there's nothing wrong with cookies.
Jen
Yeah.
Peter
Except, actually, one thing: a problem known as data leaking. Data leaking occurs when, to keep in line with our earlier example, Hipmunk, you go to a travel site and search for, let's say, a ticket leaving New York on Friday night. Then, as most people do, you open another tab just to compare, on the same site, maybe a Saturday flight. Cookies aren't scoped to a tab; every tab and window on that site shares the same ones. So if one tab sets your cookie to Friday as a preferred day, and the other tab sets it to Saturday, you could actually run into some problems. And in fact, most travel sites spend quite a bit of time working around those issues.
Jen
Because you want to be able to go back to your Friday tab and still have it say Friday.
Peter
Exactly.

Jen
And go back to your Saturday tab and still have it say Saturday. And sometimes when you click back and forth, you're like, "Wait, why did that one turn into the Saturday? I told it to be Friday."
Peter
Yeah. Exactly. And on travel sites, actually, you probably won't find it so much, mainly because —
Jen
They spent a lot of money figuring this out.
Peter
Right. One of the nice things about web storage is you have the two flavors, session and local, and session storage is scoped by tab. So it's data that's stored in that one tab. If you go to the same site in another tab, it's not shared. So that's one of the benefits.
Jen
Huh. And if you close the tab, it goes away.
Peter
Yeah, that's right. So if you close the window, it goes away. And a clever developer might say, "Well, in that case, why don't I just store that little bit of data in a JavaScript variable?" Because at that point, you're doing something very temporary. The big difference is that session storage survives a page refresh, which would reset all your variables, though not a browser restart. So if you refresh the page on the same origin, the data actually survives. That's probably one of the benefits, apart from solving the data leaking problem. I think for most people, though, local storage is really where the interesting things can be done. That's a persistent area with quite a bit of space. Think of key-value pairs as just text-based values, right? You could have a key (maybe a number, or some ID) and a value, and the value can be just some text.
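The cookie problem and the per-tab fix can be modeled in a few lines. This is a toy simulation, not browser code: the hypothetical `Browser` class holds one cookie jar shared by all of its tabs, while each hypothetical `Tab` keeps its own private area standing in for `sessionStorage`. In a real page, the per-tab version would simply be `sessionStorage.setItem("departureDay", "Friday")`.

```typescript
// Toy model, not browser code: every tab on a site shares one cookie jar,
// but each tab gets its own sessionStorage area.
class Browser {
  readonly cookies: { [name: string]: string } = {};
  openTab(): Tab {
    return new Tab(this);
  }
}

class Tab {
  // Stands in for window.sessionStorage, which is scoped to a single tab.
  private readonly session: { [key: string]: string } = {};
  constructor(private readonly owner: Browser) {}

  // Cookie-style state: written into the jar that all tabs share.
  setCookie(name: string, value: string): void {
    this.owner.cookies[name] = value;
  }
  getCookie(name: string): string | undefined {
    return this.owner.cookies[name];
  }

  // Session-storage-style state: visible only to this tab.
  setSession(key: string, value: string): void {
    this.session[key] = value;
  }
  getSession(key: string): string | undefined {
    return this.session[key];
  }
}

const demoBrowser = new Browser();
const fridayTab = demoBrowser.openTab();
const saturdayTab = demoBrowser.openTab();

// The cookie written by the second tab "leaks" into the first one.
fridayTab.setCookie("departureDay", "Friday");
saturdayTab.setCookie("departureDay", "Saturday");
console.log(fridayTab.getCookie("departureDay")); // Saturday

// Per-tab session storage keeps the two searches apart.
fridayTab.setSession("departureDay", "Friday");
saturdayTab.setSession("departureDay", "Saturday");
console.log(fridayTab.getSession("departureDay")); // Friday
```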
Jen
Yeah. So it's like "key" is name. To me, it's like keys are like labels, and the pair means the slot that you stuck some information in. It's like "date" and the number for today's date; "name", this is the person's name. Or whatever. Right? We're on the travel site: what day did you say you wanted to leave? What day did you say you wanted to come back? Did you want to be in first class or not in first class? That really quick, simple — it's like a chart of data with two columns.
Peter
Exactly. It's like a spreadsheet, or like just a list of key-value pairs. What's nice about that is you don't have to go from — in your application or your website — you no longer have to contact the server for that data.
Jen
Right. That's what to me is so revolutionary, and that I want to go around and tell all the web designers. Because to me, it makes such a big difference that this website that's come from a server, and it's downloaded from the server, is now interacting on the computer with things that are just on the computer. And there's — you can go offline, and we'll talk later about offline caching, where actually the entire website can be stored on the person's computer, and the Internet connection can go down and you can be — you can, you know, use your iPhone, go to a website, be doing something, hop into a subway tunnel, and you're no longer online, and continue to do the thing that you were doing. Because the whole website can be saved, and all the stuff that you're entering into the website can be saved. And then when you get back online, it can — I mean, I'm overgeneralizing, but just big picture, it's a very different thing than "I'm going to the web and I'm going to browse this brochure. Let me read this page on the brochure, now let me read this page on the brochure, now let me..." I think that over the last ten years we've been pushing towards this application world. We go to the site to do something, or we go to the site to interact with stuff, but we've hit these limitations, where maybe you have these great ideas, but they're really hard to implement, and you need eight people in a startup; you need like a couple million to get started. And now maybe you don't need eight people. Maybe the technology itself is so much easier, that you can do it more easily, and do new stuff in the same budget.

Peter
People have put so much effort into this feature and turned it into a way to do even more things. We're talking about simple key-value pairs. We're talking about Jen with her score, and Peter with his score, and those are easy things to put in key-value pair storage. It persists, and like you said, if you then have the offline feature turned on in your app, you can use it offline, and not just to browse the files. It's actually interacting: running JavaScript, running a game, storing things locally, and so on.
Jen
Because the database is loaded — we can talk later on, there are other kinds of databases too — but the storage is local, as well as the actual files that are running.
Peter
On your filesystem. It's literally stored on your disk. A couple of things that people have done, sort of beyond what we're talking about key-value pairs, even that are still using the key-value pairs, is one of the things — you can only store text in there. But if you wanted to store images, you can actually base64 encode images. Kind of like —
Jen
[Laughs] So explain that. Explain what that means to the 98% of the people who have no idea.
Peter
[Laughs] Right. So you can turn a regular image file into a —
Jen
String of numbers, or string of characters —
Peter
String of text. Right. String of characters, that represents that —
Jen
Letters and numbers together.
Peter
Exactly. There are some other great things related to that that maybe we can come back to — the data URI stuff we talked about in the past —
Jen
You can convert an image, which is actually a bunch of data, into a long string of letters and numbers.
Peter
And you can store that.
Jen
And then you can store that, when it's letters and numbers.
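Turning an image into "letters and numbers" is base64 encoding. This is a hedged TypeScript sketch of the idea. The encoder is written out by hand so it runs anywhere; in a browser you would more likely read the file with `FileReader.readAsDataURL` or use `btoa`. The function names `toBase64` and `toDataUri` are hypothetical.

```typescript
// Standard base64 alphabet (RFC 4648). Written out by hand so the sketch
// does not depend on browser-only helpers such as btoa or FileReader.
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode raw bytes (for example, the contents of a PNG file) as base64 text.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    // Pack up to three bytes into one 24-bit number.
    const triple =
      (bytes[i] << 16) | ((has1 ? bytes[i + 1] : 0) << 8) | (has2 ? bytes[i + 2] : 0);
    // Emit four 6-bit characters, padding with "=" when the input runs out.
    out += BASE64_ALPHABET.charAt((triple >> 18) & 63);
    out += BASE64_ALPHABET.charAt((triple >> 12) & 63);
    out += has1 ? BASE64_ALPHABET.charAt((triple >> 6) & 63) : "=";
    out += has2 ? BASE64_ALPHABET.charAt(triple & 63) : "=";
  }
  return out;
}

// Wrap the text in a data URI, which an <img> src attribute can use directly.
function toDataUri(bytes: Uint8Array, mimeType: string): string {
  return "data:" + mimeType + ";base64," + toBase64(bytes);
}
```

The resulting string can then be saved like any other value, for example `localStorage.setItem("logo", toDataUri(pngBytes, "image/png"))`, where `"logo"` and `pngBytes` are placeholders. Base64 produces four characters for every three bytes, so an encoded image takes roughly a third more room than the original file, which counts against the storage quota.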
Peter
So that's one thing. Another site that has used — the key would be some article number, and the value would actually be a complete news article. So you can very quickly pull the text from storage —
Jen
Like the entire body of a news article?
Peter
Yeah. Exactly.
Jen
So you can store — is there a limit to how long the text can be?
Peter
Yes. So most browsers will only give you 5 MB per what is known as the origin. So think of it as a domain, right? It's the scheme://host:port combination that you're running on. So —
Jen
So 5 MB per domain name. Sorta. Oversimplified.
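To make "per origin" concrete, here is a small TypeScript sketch using the standard `URL` class, whose `origin` property serializes the scheme, host, and port and omits default ports. The helper names `originOf` and `sameOrigin` are hypothetical. It also shows why a subdomain such as `www.` counts as a different origin.

```typescript
// The origin is the (scheme, host, port) triple. The URL parser serializes it
// and drops default ports (80 for http, 443 for https).
function originOf(url: string): string {
  return new URL(url).origin;
}

// Two pages share one Web Storage area only if their origins match exactly.
function sameOrigin(a: string, b: string): boolean {
  return originOf(a) === originOf(b);
}
```

For example, `http://jensimmons.com/a` and `http://jensimmons.com:80/b` share storage, while switching to `https`, adding `www.`, or using port `8080` gives a separate storage area with its own quota.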
Peter
So that's quite a bit. Yeah, and it's not consistently implemented, but one browser has done just a fantastic job with what happens when you go over that quota, because the biggest question is, all right, so I reached 4.99 MB, and I want to store one more news article, and I may have a function in my page that says, store this newspaper item for later. That would be a great way to do that; just write that key-value pair to local storage. But what happens when you go over quota? Right now most browsers don't handle that very gracefully. There is work, by the way, happening on a quota API. If you think of what a native iPhone or Android application has access to, they can call a quota API before storing something, to see how much space is left and how much they're going to write, and trap that early on. Whereas that isn't really there in HTML5 yet. So lacking that feature, some browsers just error out. But that's just a matter of time. I think Opera has paved the way with just a wonderful way of doing it, because as soon as you go over 5 MB — think of it like this. The browsers will give you 5 MB for free. But a malicious site could start, you know —
Jen
Jamming stuff —
Peter

Writing a couple of gigs of data, or just fill up your disk as long as you're on this site. To avoid that, there's a quota. Opera will then prompt you, in a really nice little dialog that says, "Hey, this site wants to store more data." Of course if you're not aware it's not actually storing anything, it might be a surprise. But if you want the full experience, you can allow it to get more storage. And that's really, I think, where all the browsers need to be heading, just to give a nice way to go allow more storage. You're starting to see that, like the Financial Times HTML5 web app, the first time you install that, it asks for a little bit more storage. You need to give these apps a little bit of room to play with, so that they can speed things up.
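Without a quota API, the practical option is to catch the error that `setItem` throws when the store is full. This TypeScript sketch assumes the error names browsers have used: a `DOMException` named `"QuotaExceededError"` in current browsers, and `"NS_ERROR_DOM_QUOTA_REACHED"` in older Firefox versions. The helper names `isQuotaError` and `trySetItem` are hypothetical.

```typescript
// The one Web Storage method this sketch needs; window.localStorage has it.
interface KeyValueStore {
  setItem(key: string, value: string): void;
}

// Browsers signal a full store by throwing from setItem. Treat the known
// quota error names as "out of room" and nothing else.
function isQuotaError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const name = (err as { name?: unknown }).name;
  return name === "QuotaExceededError" || name === "NS_ERROR_DOM_QUOTA_REACHED";
}

// Try to save; report false instead of crashing when the quota is exhausted.
function trySetItem(store: KeyValueStore, key: string, value: string): boolean {
  try {
    store.setItem(key, value);
    return true;
  } catch (err) {
    if (isQuotaError(err)) return false;
    throw err; // Any other failure is a real bug, so let it propagate.
  }
}
```

A page can use the `false` result to tell the user the article could not be saved, or to evict older entries and retry, instead of failing silently.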

And the other thing I wanted to mention about local storage — actually, two things. One is, this is one of the very few features that is supported in IE8. So you don't have an excuse anymore for not looking into this. Because many times, of course, that's another whole discussion — "Oh, but IE..."

Jen
Whoa, it doesn't work in this browser...
Peter
Right, and so there's all kinds of polyfills and emulation. But this one, I mean, it's in IE8. That's pretty good. And there's not too many features in HTML5 that are supported that far back. In fact, there's a project that was developed by a few Microsoft engineers, I believe, that's called Silo. And Silo is a very interesting project that uses local storage, effectively to cache fragments of pages. The browser, instead of waiting for a complete page to either be cached or not, it actually caches at a much more granular snippet level, and checks to see if all the snippets are up to date, and only pulls in the snippets that need to be changed, just to optimize page loading.
Jen
Interesting.
Peter
Some people have really gone nuts with it. But even if you just need to store your latest score, or the latest preferences like you mentioned, the first-class or economy — the little things that you quickly want to store and have available as soon as the page loads. Whenever you're not going all the way back to the web server, it's going to be a huge performance gain.

Then to round out the storage discussion, we talked about web storage. Then there's web database storage. That actually also has two flavors, but it's a little different. First of all, it's not nearly as well supported, and there's actually one flavor that's on the way out. So I would probably warn people to focus more on the other flavor. There's two kinds: Web SQL Database, and IndexedDB. Web SQL Database is implemented in Chrome, Safari, and Opera, but not in Firefox and IE, and it's based on SQLite. The interesting [thing] is that SQLite is a lightweight, zero-config database implementation, and Firefox is actually using that internally pretty heavily. But I think they made a really good argument against tying a specification to a specific flavor of database. And so they said, well, we're not going to do that. We're not going to support that. And you see that a lot with other HTML5 features as well, that the browsers can't always agree on everything. So we have video codec support that's all over the map, we have IndexedDB vs. Web SQL Database, but Web SQL Database is a relational database on the client side. Many people think of a database as something you talk to over on the server.

Jen
Right.
Peter
But this is a database on your system, so you can have transactional database storage in your application. So if, for example, you think of a multipart commit, if you think of a business application that should only store things when certain conditions are met, then local storage, the simple key-value pair storage, may not be the right answer. You need something that you start a transaction, you build it up, and then you either commit the whole thing or roll back. Nothing in between.
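Local storage has no transactions, which is exactly Peter's point. The commit-or-roll-back behavior he describes can still be illustrated with a hypothetical helper, `writeAllOrNothing`, that applies a batch of writes to a key-value store and restores the previous values if any write fails. This is only a sketch of the semantics, not the Web SQL Database or IndexedDB API, and unlike a real database it gives no isolation from another tab writing at the same time.

```typescript
// The Web Storage methods this sketch uses; window.localStorage has all three.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Apply every write or none of them. Returns true on commit, false on rollback.
function writeAllOrNothing(store: KeyValueStore, writes: Array<[string, string]>): boolean {
  // Remember what each key held before we touched it (null means absent).
  const previous: Array<[string, string | null]> = [];
  try {
    for (const [key, value] of writes) {
      previous.push([key, store.getItem(key)]);
      store.setItem(key, value);
    }
    return true; // "commit": every write succeeded
  } catch (_err) {
    // "roll back": undo in reverse order so a repeated key ends at its oldest value.
    for (let i = previous.length - 1; i >= 0; i--) {
      const [key, old] = previous[i];
      if (old === null) store.removeItem(key);
      else store.setItem(key, old);
    }
    return false;
  }
}
```

In a business application, "debit the account and record the payment" would be one such batch: either both keys change or neither does.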
Jen
Yeah. So to translate some of this for anybody that's... [both laugh] Originally the web was HTML files. These text files, you know? They're just text files, where you'd FTP to the server, you'd make a bunch of text files, and you'd use this connection to a server to stick those files on the server. And then when somebody went to the website, they'd basically just download those files into their computer. I think one of the things that absolutely revolutionized what the web is, is databases, where instead of having some papers — a bunch of pieces of paper stored on the server — you have a program running on the server, and the program makes web pages, and the program has a bunch of data that's in a database. So you go to Amazon.com, it's got all the books in this giant database, and when you say, "I'm looking for books on HTML5," it goes, "Oh, let me make you a list." It reaches in the database, it grabs all that information, it makes a big list. I remember when I went to Amazon like the first time — and this was of course a really long time ago — but it said, "We think you might like these books," and it displayed books that were for me. And I think it even had my name on the page. I was like, "Robots made this page just now for me! This is not a piece of paper!" Right? I mean, how did they make this? They just made that webpage on the fly for me. Now we expect that behavior. But I don't know that everyone understands that the reason it's possible, or what's going on, is that there's a database, and there's a program running that makes webpages. But yeah, all that's on the server. So you have to be online, you have to do something in the web browser, it connects to the server, the server connects to the database, the database brings information back and gives it to the server, the server makes the web pages, and then they get delivered back through the connection to you. And you're waiting. You're waiting. That's why sometimes things are slow. 
You're walking down the street and you have a horrible 3G connection and you're really annoyed and you're waiting and you're waiting. This takes that database and moves it from the server to your local computer, to the computer that's in your hand.
Peter
Yeah. It doesn't necessarily —
Jen
It's different, because —
Peter
It doesn't remove it, but it gives you an additional database to write to.
Jen
That's a better way to say it. Because the entire Amazon inventory of books is not going to be on your computer. [Both laugh] But there might be something that you do want to have, that is a database that you can use, and the developers, when it makes sense, can send information to that database, or store information in that database. And those databases are as — I mean, the Web SQL Database was — that was the original plan, that was the original spec — and it's a SQL database. It's a pretty powerful type of database. You can build all sorts of crazy data models, and relational databases, and multi-column, and multiple tables, and those tables connecting to each other, the same way that you would on a server.
Peter
Not only that, it supports what local storage doesn't support, which is binary data. So you can actually, again, speed things up. Maybe you have a Javascript object that you would have to first create a string from, like we talked about with these images. Now you could store those directly in these databases, using, like you said, transactions and so on. Just to add to that, with many features in HTML5, you look at the browser support matrix, and you say, there's still a couple of gaps in it, usually IE. IE9 has added a lot of support, which is great, but there are still gaps. One of the downsides of Web SQL Database is that the specification has kind of stalled. And a new specification has emerged, which is called IndexedDB. It's not a relational database, it's more like a data store, which supports binary storage, and you could actually conceive of possibly writing the Web SQL Database API on top of that. I haven't seen that in action yet, but I've heard that's possible. So that is sort of the direction where the database storage is going. And the support for that is also not in all the browsers yet. Local storage is everywhere.
Jen
What I know about IndexedDB is that it seems much more complicated than Web DB, and it seems like it's not as done. It's not supported in browsers. And it seems like, because it's complicated, there will be some other tools built in that world that will make it easier to use in the future, and those tools don't really exist yet.
Peter
Yeah, absolutely.
Jen
But that is the future. I mean, it seems pretty clear. That is the future. That will be the thing. And if you want to have cross-browser support on desktop/laptop computers, that's something to look into. The thing that seems sexy about Web DB is that, because it's supported on WebKit mobile, you can use it.
Peter
It works today.
Jen
I don't know if deprecated is the right word, but it's dead. Like it's not going to be the future. But it totally does work already right now, and it's not going to stop working. They're not going to take it away. And because it works on mobile, you can use it if you're building an iOS application: if you want to build an application that's basically HTML5 wrapped up in a wrapper like PhoneGap, or you want to make a website that runs and works on people's devices that they can save to their homescreen if they want to.
Peter
Yeah. If that's your primary audience —
Jen
And Android too.
Peter
Yeah, then you can safely use that, and it'll be supported for a while. I think the real future is, like you said — similar to canvas, right? The canvas APIs, they're low-level APIs. If you really want to do something in canvas, you need to be pretty well versed in Javascript, and even then, it would be extremely boring to create a simple animation.
Jen
You'd have to type a bunch of numbers to make your circles go around in circles.
Peter
Yeah, it's kind of like saying, "I'm drawing a line from this coordinate to that one."
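To show how low-level that is, here is a hedged TypeScript sketch of a single animation frame: a dot moving around in a circle, with every coordinate worked out by hand. `Ctx2D` is a hypothetical interface naming only the `CanvasRenderingContext2D` methods used, so the sketch can run without a browser; in a page you would pass `canvas.getContext("2d")`.

```typescript
// The handful of CanvasRenderingContext2D methods this sketch calls.
// In a browser, canvas.getContext("2d") provides all of them.
interface Ctx2D {
  clearRect(x: number, y: number, w: number, h: number): void;
  beginPath(): void;
  arc(x: number, y: number, radius: number, startAngle: number, endAngle: number): void;
  fill(): void;
}

// Draw one frame: a small dot travelling around a circle centred on the canvas.
// t in [0, 1) is how far through one full revolution we are.
function drawFrame(ctx: Ctx2D, t: number, width: number, height: number): void {
  const cx = width / 2;
  const cy = height / 2;
  const orbit = Math.min(width, height) / 3; // radius of the path, in pixels
  const angle = t * 2 * Math.PI;
  const x = cx + orbit * Math.cos(angle);
  const y = cy + orbit * Math.sin(angle);
  ctx.clearRect(0, 0, width, height); // wipe the previous frame
  ctx.beginPath();
  ctx.arc(x, y, 10, 0, 2 * Math.PI); // the dot itself, radius 10
  ctx.fill();
}
```

To animate it, a page would call `drawFrame` repeatedly, for example from a `requestAnimationFrame` callback with `t` derived from the elapsed time. A timeline tool like Flash's hides exactly this bookkeeping.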
Jen
As they would say on other 5by5 podcasts, it's very close to the metal. Not super close to the metal, like, say, Fortran, but it's closer to the metal than Flash.
Peter
It's closer to the metal than you want to be.
Jen
That's the thing. In Flash, one of the nice things you have is higher-level abstraction tools that you can use to very quickly build an animation. The GUI, the timeline in Flash.
Peter
Make no mistake about it. Those are going to come. They'll be there in no time. So for the database storage, yeah, the low level is a bit hard —
Jen
Obtuse.
Peter
Yeah, exactly. But I'm sure there will be extra libraries built on top of it. One of the bigger things for the database storage, and especially for local storage, that prevents people from really taking advantage of it right now, is let's say you've used local storage to store some sensitive data. Right now it's on your disk. You've just written it to disk. And let's say you're on jensimmons.com, and I go to your site, and you use local storage, so I have some data, and it's neatly compartmentalized under jensimmons.com. If I go to yahoo.com, I can't access that data from your site.
Jen
Yeah. Yahoo.com can't grab the data that you, as the person who went to jensimmons.com — if you go to Amazon.com, and you buy something, and they local store some of what you're doing, and then you go over to Yahoo.com, Yahoo.com can't get to the data that was stored from Amazon.com.
Peter
That's a central security mechanism that's there on purpose in HTML5. You can think of it like, it's all stored by domain.
Jen
Yeah. They're looking to make sure, oh, these are separate domains, or separate subdomains.
Peter
But at the same time, that data is still sitting on your machine.
Jen
Right. So if your computer got stolen, or somebody hacked into your computer...
Peter
So I talk to a lot of banks, and they are still very scared to use the feature, even though they would love to use the feature. I think one of the things that will come into play there is some sort of encryption libraries, that maybe you can store things in an encrypted format, and then maybe only if you were logged in and authenticated and authorized. A lot of things are there, and as you said, there will be an explosion of additional handy utilities and other stuff that's going to build on top of that, which will make it even easier.
Jen
I assume there's tools for the developer, where you can say "please dump the data now." "Stop storing it."
Peter
Sure, and it's similar to how you can clear your cookies, you can clear browser cache. It's similar in that respect. The browsers will slowly build fine-grained tools for that.
Jen
I know the browser makers are also very aware of these things, and of how people actually use browsers. A great example is that at colleges, people are constantly using shared machines. They can't afford to buy their own computer. They go to the library, they use the computer at the library. They log into Facebook, they don't log out. But in Safari, there's a setting. You can set it so that it's public browsing and logs people out more often, or dumps information. So I'm hoping — I'm assuming that browser makers are figuring out ways to help do that, so that if people don't hit the log out button, or hit the clear cache preference, that there's some kind of tool that smartly...
Peter
Any of those private browsing modes, like incognito modes and all of those, you're not allowed to write to local storage. You're simply locked out, and that's good. That's how it should be. So those bring in some interesting edge cases for detecting features. Many times you can detect a feature, but you also need to know what browser mode you're running in, to see if you can actually use the feature. Modernizr does a pretty good job of that. We can talk about that later.
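Because storage can exist but refuse writes, a reliable check actually writes something and removes it again; Modernizr's localStorage test uses the same write-then-remove probe. This TypeScript sketch is hedged: `storageUsable` and the probe key are hypothetical names, and the store is passed through a getter because in some configurations, such as blocked cookies, even reading `window.localStorage` throws.

```typescript
// The two Web Storage methods the probe needs.
interface KeyValueStore {
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Detect usable storage by actually writing to it. Checking that the object
// exists is not enough: in some private-browsing modes setItem throws, and
// in some configurations reading window.localStorage itself throws.
function storageUsable(getStore: () => KeyValueStore | null | undefined): boolean {
  try {
    const store = getStore();
    if (!store) return false;
    const probe = "__storage_probe__"; // hypothetical throwaway key
    store.setItem(probe, probe);
    store.removeItem(probe);
    return true;
  } catch (_err) {
    return false;
  }
}
```

In a page this would be called as `storageUsable(() => window.localStorage)`, falling back to server-side storage when it returns `false`.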

I wanted to come back for a moment to the other part of storage, which is the application cache, commonly known as offline web applications. That's a really, really exciting feature. Actually, there are two parts to that: the offline capability itself, and a side effect it has on performance, which wasn't the original intent. So before, we talked about web storage and database storage. These are stores for data in your application. But if you wanted to store an entire website or web application, meaning the HTML, the corresponding CSS, the images, so that you could literally go to a site while you're on an airplane, bring up that site, and browse it just as you last left it, that's now possible. Before, of course we had ways to — well, there's browser caching, right? Good luck going on an airplane.

Jen
I think anyone who has built websites for people who are not web professionals knows this. I used to do this: I would build the site, and I'd be putting the files on the server, and the client would tell me, "Hey, can you change this? I don't like the way this looks." I'd open up the CSS and I'd change it. Then I would send the new CSS up to the server, and I'd be on the phone with them, and I'd be like, "It's changed. What do you think?" and they'd be like, "No, it looks the same." I'm like, "No, it's changed, you gotta refresh your browser." They're like, "No, it's the same." "No, it's changed. Hit Refresh twice in a row without doing anything else in between." Or "Quit your browser and reopen it." The way that a browser caches things from a website is such a mysterious thing. I know I've done this: you open up seven tabs, you put all the articles, all the blog posts that you want to read later, in the tabs very carefully, you close your laptop and hop on the plane, and you open it up, and you're just hoping that they'll still be there. And they usually are there. But sometimes you want the pages to cache, and sometimes you don't want the pages to cache, and you're never quite sure how long the cache works. Drupal has a tool in it where, if you launch a Drupal website, it concatenates all the CSS together into one file and puts a number on the end. When you make a change and it squishes everything back into one file again, that funny number on the end changes, just to get the users' web browsers to stop using the cached copy, because there was a change.
Again, it's a little bit like cookies in the way that it kind of was this very basic kindergarten attempt at something, that kind of was good and kind of wasn't, and now there's this full-fledged college-graduate-level tool for us to use, to make it cache when we want it to, and not just cache until you accidentally hit Apple+Q, but really — it's there. It's stored on the computer.
Peter
Yeah, it's like proactive caching. Normally you go to a site, and there are ways to get around it, but normally you browse a site — let's say everything was even cached. You click on the index page, and you click on a link, and you go to another page, and that page is then in your browser cache. And like you said, you don't know how long they're going to stay there, but let's assume that they were even there for a long time, and that the caching settings on the server and the client were configured just perfectly. Then you're caching as you go. So the best case you can hope for is that pages you have already visited are still there. But you would never expect that pages you haven't yet visited, that are part of that site, are going to be there, and that all their resources — CSS, images, Javascript — all of that is going to be there. That's what offline web apps — so there's a new cache built into the modern browsers, which is called the application cache. The application cache allows you to prefetch the entire site. So you go to the site, it loads as fast as it can, but then in the background it starts loading all the other files that you've listed in the manifest file. As a developer you've pulled your whole site together in a manifest text file. It's going to cache all of those. So as soon as it's done, you could go offline, and you could literally go back to that site and view pages you had never visited before, which is just incredible.

Jen
Or you're still online, and you're clicking on all these pages you haven't gone to yet, but they've already preloaded, so it's super fast.
Peter
They're coming from your own machine; they don't require a server fetch.
Jen
Because it doesn't go to the server.
Peter
Under the covers you send all kinds of headers to the server to request a file, and the server responds, and all of that stuff just slows you down. Now it's just coming, like you said, boom, right out of the cache.
Jen
So explain: Let's say I set up a website. Let's say I take jensimmons.com, and I set up the entire thing to cache. When someone goes to my website for the first time ever, does it make it super slow when they first get there?
Peter
No. The main part of the design is that it always favors loading quickly. The loading of that homepage, jensimmons.com, will take precedence. It will first finish that job, and at that point the browser would normally sit idle until you click on something, or do anything else on the page.
Jen
So it downloads what it needs, it renders the page, it shows it to you, and then at that moment, when it wasn't going to do anything, it starts loading everything else.
Peter
Exactly. It never takes away from the initial load at all. It's not like your first load would now be ten times slower because you're downloading 5 MB of other crap.
Jen
That would have been a dumb way to write the spec. [laughs]
Peter
That's actually the best part of it. You're doing all this in the background, you're using all the additional power you have, you can download — browsers have implemented different size limits. Some of them give you access immediately to about 5 MB, others will prompt you, "Hey, do you want this site to store things locally?" and then you can set the total number of megabytes. Again, that's currently a bit different in different browsers, but the idea is the same: you can cache the entire site. One of the nice things as a side effect, as I mentioned, is that subsequent page loads are coming from the cache, and they will be super fast. And then there's a whole mechanism to make it refresh, and all that; we won't get into all the details there. But at the high level, it's very simple: as a developer, you simply write this one — sort of like a map of all the files that are a part of your site. And this is not just the HTML; it's CSS, Javascript, and all of those things. Then you add that to your site, and you point the browser at it in a manifest attribute on your html element. And that's it. Basically you flag your site. You could have this done on your site in less than ten minutes. Now, one resource that I'm really excited about, and they actually just went to version 2.0, and you're probably aware of — HTML5 Boilerplate?
Jen
Yeah.
Peter
We can put a link in the show notes. HTML5 Boilerplate recently came out with version 2, and if any of the listeners haven't used it, it's probably the best starter kit you can get for an HTML5 project. Everything is in place. But one of the nice things it has now is a build script. Well, the build script is not new. But the build script is a single command you run that will package up your entire site that you've developed, minify the Javascript, optimize the images, and do everything possible to make your site perform better, and then, last but not least, it will actually build the application cache manifest file for you. [Laughs] I mean, how much better can it get? The one downside of application cache is you need to be pretty careful when you change files. If you have a fairly large website, anything where you have a lot of changes, you need to make sure that all of those changes are listed. Not the changes, but the new files, and renamed files, all of those; you need to constantly keep that manifest up to date. That begs for some automation, and that's now also being done. I was actually thinking of writing a Perl script or an Ant build script or something to do that for our own site, and there, it's already done. [Laughs]
Jen
It would be nice for it to be up on your server, so that it can run periodically on the server and refresh that file. So it's a text file, basically, right? It's a bit like robots.txt, if you know what that file's for, where you're just listing a bunch of files in it. Do you have to — I don't remember — you don't have to list each...?
Peter
Yeah, you do actually. Well, there are multiple sections in this text file. There's the default section. So it starts with the words CACHE MANIFEST, in uppercase, on the first line. The default extension right now is .appcache. You create an .appcache file, which is a simple text file. Under that first line, CACHE MANIFEST, the common thing to do is write a comment, starting with a pound sign, with the version number. To the browser that doesn't really mean anything, but that's how you control when the site reloads: the browser only re-downloads all the files when it finds a change in the manifest file itself. And then you simply list all the different files that are part of your site. So in the caching section, you can't really use wildcards; you can't just say *.html. But a simple directory listing, on the Mac or in a command prompt or some other tool, gets you all of the files, and then you list them all out. There's a couple of other sections that can control fallback resources and resources you only want to pull from the network, but that gets a little too deep.
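Since every file has to be listed and the version comment has to change whenever the site does, generating the manifest is an obvious job for a script, which is what Peter says the HTML5 Boilerplate build script now does. The following TypeScript sketch is not that script; `buildManifest` and the file names in the usage line are hypothetical. It writes the CACHE MANIFEST line, a version comment, an explicit CACHE section, and an optional NETWORK section for resources that must always come from the server. Note that the application cache has since been removed from browsers in favor of service workers, so treat this as a description of the format discussed here.

```typescript
// Build the text of an application cache manifest: "CACHE MANIFEST" first,
// a version comment whose only job is to change the file when the site
// changes, then every resource listed explicitly (no wildcards allowed).
function buildManifest(version: string, files: string[], networkOnly: string[] = []): string {
  const lines = ["CACHE MANIFEST", "# version " + version, "", "CACHE:"];
  for (const f of files) lines.push(f);
  if (networkOnly.length > 0) {
    // Resources that must always come from the server, never from the cache.
    lines.push("", "NETWORK:");
    for (const f of networkOnly) lines.push(f);
  }
  return lines.join("\n") + "\n";
}
```

A build step might call `buildManifest("2011-09-12", ["index.html", "css/style.css", "js/app.js"])`, write the result to a file such as `site.appcache`, and the page would reference it with `<html manifest="site.appcache">`. The server also has to send the file with the `text/cache-manifest` MIME type.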
Jen
So you can't just say, "Go download everything in themes/images," and give it a directory. You have to actually list all the images that are in that directory?
Peter
Correct.
Jen
And so what about when you don't have HTML files, because you're running a CMS?
Peter
You're not limited to HTML files. You can have PHP.
Jen
Do you list the URLs where the files will end up living later?
Peter
Yeah. You can list URLs, or you can list the file locations, relative to the manifest file. So you can have index.html, or index.php, or Javascript files, CSS, images — any real filetype, any whole resource. One thing you can't cache is a fragment of a page; for that you're looking more at web storage. If people put #about-us or some other bookmark (a fragment identifier) on a URL, the browser strips it out and stores only the page, because it can only store whole files. So you can even have dynamic pages, except they'll be cached at that moment in time. So it doesn't make sense for everything.
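As a rough illustration of "it stores only the page", this TypeScript sketch uses the standard `URL` class to drop the fragment, which approximates the key a whole-resource cache works with. The helper name `withoutFragment` is hypothetical, and this is not the browser's internal code.

```typescript
// A whole-resource cache never sees "#about-us": the fragment only tells the
// browser where to scroll. Removing it yields the resource that gets cached.
function withoutFragment(url: string): string {
  const u = new URL(url);
  u.hash = ""; // setting an empty hash removes the fragment entirely
  return u.href;
}
```

So `about.html#about-us` and `about.html#team` both map to the single cached copy of `about.html`, while a query string such as `?id=7` is kept, because it identifies a different resource.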
Jen
What do you think about a site like nytimes.com? You don't want to download the entire website.
Peter
Yeah, but you can actually use multiple manifest files. So you can use, for example, a manifest for the top —
Jen
The homepage —
Peter
Yeah. If you look at the website as a tree, you can put in manifest files at different levels in the tree that only get — it's basically like it kicks in if you navigate down that far. It doesn't work — it's not for everything. It's not for, for example, a highly dynamic site, or sites that absolutely must use a server connection.
Jen
It seems like the original intent was to download an application. A website that happens to be an application. So maybe it's a game. You've got this crazy cool game that lives in one webpage, but it's all Javascript and a bunch of images and some sounds. You just want to be able to let people download that to their computer, both so they can play it when they're not online, and also because it's just way faster then, right?
Peter
Yeah, and that's the thing, right? The combination of web storage and application cache is killer, because now you can run the entire app offline. And that's another thing: there's a whole bunch of new events that you can listen for in pages. I don't know how many. Things like storage events. There's also two events, online and offline. You could actually detect that you're going offline, so instantly you know, okay, I'm offline. At this point, let's go ahead and store things in web storage, because I can't reach the server. As soon as you find that you're online again, you sync up with the server. There's all kinds of clever things you could build on top of this that are really cool.
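Peter's pattern (store locally while offline, sync when the `online` event fires) could look roughly like this TypeScript sketch. The queue key `PENDING_KEY` and the functions `queueChange` and `flushQueue` are hypothetical, and `send` is kept synchronous for brevity; a real upload would be asynchronous and would need to keep items that failed to send.

```typescript
// The Web Storage methods this sketch uses; window.localStorage has all three.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical key under which unsent changes wait while we are offline.
const PENDING_KEY = "pending-sync";

// Persist each change locally first, so it survives a lost connection
// or even a closed tab.
function queueChange(store: KeyValueStore, change: string): void {
  const pending: string[] = JSON.parse(store.getItem(PENDING_KEY) || "[]");
  pending.push(change);
  store.setItem(PENDING_KEY, JSON.stringify(pending));
}

// When the connection returns, hand every queued change to `send`
// (a hypothetical upload function), then clear the queue.
function flushQueue(store: KeyValueStore, send: (change: string) => void): number {
  const pending: string[] = JSON.parse(store.getItem(PENDING_KEY) || "[]");
  pending.forEach(send);
  store.removeItem(PENDING_KEY);
  return pending.length;
}
```

In a page, the wiring might be `window.addEventListener("online", () => flushQueue(localStorage, upload))`, with `navigator.onLine` checked before deciding whether to send immediately or call `queueChange`; `upload` is a placeholder for the app's own server call.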
Jen
And I'm always coming back to the question of how does this change design? How should people rethink what a website even is in the first place, and think about, "Wow, if this is how it works now, how do we want to plan? What do we want to make for our website?"
Peter
We have customers that may be in a very locked-down environment. For example, in a documentation website, one of the biggest problems has always been, you can't assume people have an Internet connection. Well, you don't really need to anymore. Because once they've visited that site, they've got everything they need for future offline use cases. Sure, the one thing to keep in mind is, while you're offline, that site may be continuing to push out changes, so you're not looking at the changes until you sync up again. The changes don't magically appear in your browser. [Both laugh] And there's a mechanism for that.
Jen
Application cache doesn't create unicorns that magically go through the space where there's no Internet connection and bring you back the...
Peter
I think that's in the spec for next year.
Jen
The unicorn protocol. [Both laugh]

Well, we're way over time, because we got really excited talking about this storage stuff.

Peter
There's just so many cool APIs to talk about, and there's so many possibilities.
Jen
You have this book out, that people could go learn more by reading your book. And we have — you brought us a gift for the listeners of the first ever episode of the show.
Peter
A coupon code.
Jen
So people can go to apress.com (Apress is the publisher that put out this book) and they can use a coupon code 5BY5APRESS, and get half off the ebook. And the ebook's just under $32, so that means they can get the ebook for $16.
Peter
That's right. It's a steal. We also have a website for the book that may be easier to remember; it's prohtml5.com. There's links to all of that. That book focuses a little bit more on the programming side of things, but we do have a good introduction to HTML5, and there's many other books that complement it on the design side. It really talks about how to use all the new Javascript APIs that are quite exciting and open up a lot of potential for web development. It's a great time to be in web development today.
Jen
It is. It's crazy. Well, thank you so much for being on the show. And I'm sure we're going to have you back, so we can talk about more super geeky things.
Peter
It's my pleasure. An honor to be asked, especially for the first time. You can call me back anytime. [Laughs]
Jen
Cool! And people, you know there'll be a contact form up at the 5by5.tv website, where you can go to the contact form and put your comments up about the show, ask questions, make requests, talk about whatever.
Peter
Vote it up on iTunes.
Jen
Vote it up on iTunes! Right, I should say that. Vote it up on iTunes so people can find out more about it. And come back! We're going to have a wide variety of different people, designers and developers and other people to talk about all kinds of different things about the web.
Peter
Well, it's really exciting to have you back in podcast land. A lot of us were missing the daily show.
Jen
It was so fun to do the daily edition every day, for 32 shows or whatever, and I missed it like crazy. So I'm really thankful to Dan. Thank you so much, Dan, for inviting me to do another show, and believing in me enough to just say, "Yeah, do whatever crazy thing you want." Hopefully the show will be great and people will like it and it will be useful and helpful. I just feel like there's so much that's changing so quickly, people are really overwhelmed about understanding what it is, and figuring out what it is. So whatever we can do to help each other learn it is great.
Peter
Yeah, that's right. Thanks a lot, and I'll send you some of the links for including in the show notes.
Jen
Yeah. Show notes at 5by5.tv slash whatever the URL for this web show is going to be. Cool. Thank you, Peter.
Peter
Thanks, Jen. Talk to you soon.

Show Notes