Episode 44
The Web Behind with Tom Bruce
November 28, 2012
Tom Bruce joins Eric Meyer and Jen Simmons to talk about the very earliest days of the web, writing the first Windows web browser, inventing 'marquee', and taking a road trip to NCSA with Tim Berners-Lee.
It was very, very hard at the time to conceive of a mass audience for the web.
Transcript
- Jen
- 
This is The Web Ahead, a weekly conversation about changing technologies and the future of the web. I'm your host Jen Simmons, and this is episode 44. I first want to say thank you to today's sponsors, Shutterstock and In Control Conference. We'll talk more about them later in the show. And I want to say hello again to Eric Meyer. Hi Eric.
 
- Eric
- 
Hey Jen.
 
- Jen
- 
Eric's joining us to do another episode in The Web Behind series, a series about the history of the web, where we find out where the web came from.
 
- Eric
- 
Yes, well, we're on what would be for this series, episode six, am I right? This is episode six?
 
- Jen
- 
Ummmmm… I think it's something like that.
 
- Eric
- 
Yeah, starting to get some really good stories and I'm really excited that we're joined today by Thomas R. Bruce, Tom Bruce, who many, many people listening to this podcast probably will not have heard of. And I couldn't be more pleased to correct that now. Tom has a very interesting background, which we'll get into, but he founded the Legal Information Institute at Cornell in 1992, and has been the director since 2004. He's written a lot of software, including, and this is one of the main reasons why I want to have him on, Cello, which was the first web browser for Microsoft Windows. He's done a lot of things. He has degrees that have absolutely nothing to do with either law or computers, so of course it makes sense that he'd be the director of the Legal Information Institute and have written a web browser. So we're just going to start talking about that. So welcome Tom, thanks for being here.
 
- Tom
- 
Ah, it's great to be with you Eric.
 
- Eric
- 
So, you founded the Legal Information Institute in 1992. Had you encountered the web by then, or did that come along just a little bit later?
 
- Tom
- 
It came along just a little bit later.
 
- Eric
- 
Ok, so what was it that led you to found the Legal Information Institute? How did you get to that point?
 
- Tom
- 
Well, a couple of things. It may help to review a little history. And because I work in a law school, I want to start in 1581. In that year, there was a printer named Richard Tottel in London, who started the practice of using consistent page numbers across different editions of a law book. Fast forward a few hundred years to 1873 and a guy named Frank Shepard invents the forward citator, which we would now know as something like CiteSeer. In 1973, the legal profession got access to all of its primary materials online through a mainframe computer system called Lexis, which has now evolved into LexisNexis. And by the late 80s, that's where you really wanted to start, a lot of legal material was becoming available on CD-ROM in hypertext form. The principal reason for that was that cross references and precedents are extremely important to lawyers and law people. So something that you could click and get to that reference, where it said, "See this other case" or, "See section 52 of the tax code," was something that lawyers really, really wanted and that they had started to do on local platforms. I had gotten fascinated with the internet in a kind of 8-year-old-with-a-hammer idiot way. We began thinking, Peter Martin and I, that there was some room there to begin distributing legal materials via the web, but we were thinking at that point primarily in terms of teaching materials. As an early experiment, we began putting things like the copyright law online in Gopher. And when we discovered the web in late '91 or early '92, when Tim had put up the first CERN server, we realized that we had there something that was at least as capable as a local CD-ROM platform but with much, much wider reach, and much, much better suited to the kind of work that we wanted to do in putting law online. And that was the initial impetus. Like everybody else online at that point, let's throw some stuff up there and see what happens.
- Eric
- 
Yeah, it's interesting the way that law, that legal codes and legal texts, were kind of hypertexted before there was a web. I was working at a firm you've probably actually heard of, Tom, called Banks Baldwin.
 
- Tom
- 
Uh huh.
 
- Eric
- 
In Cleveland in 1991 and they had the Ohio legal code, in that case in a DOS thing on CD-ROM, but it was hypertext because there was, as you say, it would refer to Ohio Code Section 721 and then you could use your arrow keys to move between these references and hit the spacebar and it would jump to that thing. And that's part of what I was doing at Banks Baldwin, a couple years before I had seen the web. This whole forward and backward referencing thing just seems kind of natural. So you came across the web in '92, and you had material that you wanted up there.
 
- Tom
- 
Yeah, late '91 or early '92. I think at the time we put stuff up, we were somewhere around the 30th webserver in the world, and the first one actually with a professional orientation that was outside of high-energy physics.
 
- Eric
- 
Wow.
 
- Tom
- 
Because at that point, everybody was a physicist.
 
- Eric
- 
Yeah...that's....woooooo. We're not worthy! [Both laugh] Thirtieth webserver in the world.
 
- Tom
- 
You're just not old. There's a difference there.
 
- Eric
- 
[00:05:41] Well, erm...anyway. So you started to put this information you had on webservers, but I gather that there was some problem in actually getting it to the people who you wanted to get it to. Like, all the physicists could find it, but...
 
- Tom
- 
Yeah, well at that point, I want to say the number that we had in our heads was that somewhere around 95% of all the machines that were on lawyer desktops, and that found its way into law schools as well, were running Microsoft Windows, and in a way, the web itself was kind of ghettoized at that point, because it was running on Unix and Macs, and except in those places where people were developing web browsers, they really weren't that popular a platform. It's not what the consumer market was using at all. And so we thought, "Well, if the purpose is to distribute law to anybody who actually has a use for it," as we then thought, "we should probably put it on some kind of a machine that they can actually get it on." And honestly, I don't really know that my intention was as much to write a web browser as it was to shame somebody else, who was actually competent to do it, into writing a web browser, but that was how we started out.
 
- Eric
- 
Ok, so that's why you started to write a web browser, was so that lawyers who were running Windows could actually see the stuff that you were putting up there.
 
- Tom
- 
Yeah!
 
- Eric
- 
You were both the publisher and the distributor at that point.
 
- Tom
- 
Oh yeah, it was what I think we would now call a content-driven strategy. [Both laugh]
 
- Eric
- 
[00:07:23] I'm curious, where did the name Cello come from?
 
- Tom
- 
Oh, that's a very long story involving a failed romance.
 
- Eric
- 
Awesome!
 
- Tom
- 
Not suitable for podcasting.
 
- Eric
- 
Oh, alright. Did it have anything to do with Viola, or did Viola pick because you...?
 
- Tom
- 
Well, actually it did. I had seen Viola, and I had actually been quite impressed with what Eolas had done with it, and thought, "Ok, well we can use the play on words here."
 
- Eric
- 
So what did it take to write a web browser at that point?
 
- Tom
- 
Well, more than you might think. One of the problems was that the code libraries that were being used by all of the Unix developers, certainly, and I think also by the people at NCSA who were working on what was then the only Mac browser that was being written, were just...they would not run under Windows. There was no memory management there, and so in Windows memory space it wouldn't work at all. And while I had an awful lot of people telling me that I was off the res for not using the same libraries, my answer was, "Well, look folks: I can't. They don't run." So basically, it was done from the ground up. It was done in C++ using initially Borland's libraries and with a later move into the Microsoft Foundation Classes. It was old guard C++ from the get go.
 
- Eric
- 
Wow and you basically had to implement everything.
 
- Tom
- 
Yeah. Pretty much. I think when we started doing images, I think we used packaged image code, but that was it. I wasn't going to sit there and write a GIF renderer.
 
- Eric
- 
Right, so you had to implement HTTP. You had to do document parsing.
 
- Tom
- 
Yeah.
 
- Eric
- 
Was there a DOM tree?
 
- Tom
- 
Oh no. No, no. No, not at all. In fact, that notion was barely thought of at that point, and in fact, ultimately, I suppose you'll probably end up asking me why we dropped development on it, and I look at the need to start dealing with the DOM tree, and the need to start dealing with JavaScript as the thing that permanently put one-man development of web browsers totally out of reach. At that point, you'd reach the scale of stuff, at least not for me. It was just not possible to do with a single developer anymore. It was just too much.
 
- Eric
- 
When did that happen?
 
- Tom
- 
Oh, I'd want to say early '94 maybe. So the whole thing had a run of about 18 months.
 
- Eric
- 
So just imagine if you'd stuck it out till CSS came along.
 
- Tom
- 
Well yeah, I'd have been much happier. I mean, people had all sorts of crazy ideas at that point. And I have to say that it may be that Tim's genius was not so much in inventing the web as it was keeping web standards light enough that people could develop to them. There was a crowd running around that wanted to require that everybody do a full SGML rendering in browsers, and I sort of sat there and started thinking, "Well that adds five years to the development process." And there was a crowd that wanted to use a system that I'm pretty sure you're probably familiar with, Eric, called DSSSL, which was a rendering thing for SGML, as the basis for stylesheets.
You look at these things and it was worse than that class you regretted taking in Common Lisp. All kinds of ideas for doing these sort of technically perfect things that just could not be put into software in less than geologic time. Fortunately, we resisted that. Otherwise, we would never have had the popularity that the thing ultimately attained.
 
- Eric
- 
[00:11:30] So at the time, '93, '94, when you're writing your own Windows web browser, which as you say, the concept of one person writing a web browser would be an astonishing thing today. But at the time, it could be done. Eolas did it with Viola. Tim Berners-Lee obviously did it with the first web browser, although he had the advantage of whatever he came up with, that was the way that things worked, and you did it with Cello. What went into the rendering engine, such as it was? I mean, there was no CSS, there wasn't JavaScript that you had to worry about. You basically had to parse the document and figure out how to display it. What were some of the things that you did to make that happen?
 
- Tom
- 
Well, it was really all about mapping. In a funny way, we almost ended up thinking of the tags as being presentational in nature, right? Which of course they weren't, but you had to think that way if you were going to render something. And it was mostly a matter of mapping the very strange Windows font engine to something configurable that could be made to correspond to the then very limited tag vocabulary. Recall, there was no need to parse stylesheets or anything of that sort. Those styles did not exist. It was all rendering h1, h2, h3, h4, h5. If you could build a map that essentially allowed the user to map those tags to something in the Windows font set, you were pretty much done as far as presentation was concerned. The rest was just printing it to the screen as you printed to anything else, really. We didn't have any kind of complicated layout. Everything else was really just indentation. It's hard to think now exactly how primitive that was, but it really was kind of a teletype font in some ways.
 
- Eric
- 
That's an interesting way to put it.
 
- Tom
- 
The biggest sticking point was of course Microsoft's font engine in those early versions of Windows, which I used to liken to one of those Pacific island cargo cults. You would pray and hope that something good washed ashore.
 
- Eric
- 
Well, in the end we can always blame everything on Microsoft, I guess.
 
- Tom
- 
It's true!
 
- Jen
- 
What version of Windows was this? Which era?
 
- Tom
- 
I think 3.1 was advanced stuff, Jen.
 
- Eric
- 
Woooo!
 
- Tom
- 
There was in fact, at the time we did that, there was no Windows TCP/IP stack.
 
- Eric
- 
Ok, so did you have to write your own then?
 
- Tom
- 
No, we actually licensed one. There was a company that came and very rapidly went, called Distinct Corporation, I think they were in Baltimore, that had done a TCP/IP stack for Windows. I have no idea why. We arranged to license it from them on a very favorable basis because they saw all of this opening up a lot of possibilities for their business, and we used that for about nine months. And during that time, that was the time that Winsock was being developed. And I recall having these crazy conversations at very strange hours with Peter Tattam, who was in Australia, who wrote the original Winsock stack. Because I was one of the apps he was testing with. Our clocks would intersect at these bizarre hours of the day and night because of the time difference. And we'd go back and forth about the minutiae of the spec and whether or not the API was working and that sort of stuff.
 
- Eric
- 
So you had to create everything, not just the rendering engine, but also the browser chrome. How much of the UI of Cello did you base on what little already existed, and how much of it did you come up with on your own?
 
- Tom
- 
A little bit of each. An awful lot of it, it was meant to look like a Windows application, and it did. There were some choices that we made based on what we had seen in other hypertext engines that were running locally, most particularly something I'm sure nobody remembers called OWL Guide, which was a very, very early Windows hypertext engine that was used for local publishing, CD-ROM publishing. For example, in the OWL Guide world, hyperlinks were set off by lightly dotted boxes around the link. And originally, that was how Cello displayed them. We later changed it to using underlines under links just like everybody does now. So an awful lot of that stuff we were begging, borrowing, and stealing from wherever we could, but on the other hand there were things that no one had really thought of at all. To give you some idea of how small the community was at that point: for legal text, I very badly needed the funny looking backwards 'P' that's P with two stems, paragraph symbol, and the funny looking twisted 'S' section symbol. And I would just shoot an email to Dave Raggett and say, 'You know, Dave, I need these funny symbols,' and the next thing I knew, they would be in the character entity list.
 
- Eric
- 
Wow. Um, ok.
 
- Tom
- 
It was that small.
 
- Eric
- 
Well, yeah, I mean, there weren't that many of you. When you switched to underlining, was that because of Mosaic, or...?
 
- Tom
- 
Yeah, they pretty much put that forward. And there was a lot of...The dotted box thing didn't really work well at high link density, you just couldn't read the display anymore.
 
- Eric
- 
Mmm.
 
- Tom
- 
And even I had to acknowledge that this looked like a document that had broken out with some sort of skin rash, and that the underlining was much, much cleaner.
 
- Eric
- 
Yeah, it's interesting because now there's been some, not a ton, but there's been some criticism of underlining of words because it destroys the descenders basically when you visually scan over, it makes it harder to read, but we forget how much harder the alternatives were and how much harder they were to read.
 
- Tom
- 
Well yeah. Your choices are limited. I'm not sure anybody's eyes really track changes in hovering mice very much, especially when they're trying to scan text.
 
- Eric
- 
There aren't a ton of screenshots of Cello, unfortunately, these days, but there are a few and when you look at the UI, it's kind of the same. There's a forward button and then a back button and a home button, and there's a place to enter the URL or the URI depending on which side of that terminology debate you fall on. That sort of stuff, how much was borrowed?
 
- Tom
- 
Oh, well that was all being developed simultaneously. The concept of the home button and the home page was pretty much endemic to everything that Tim had done from the beginning and everybody had that. I remember a certain amount of raging debate on www-talk about exactly how the back button ought to behave, because people were even then encountering the same sort of problems with, 'Well, is that doing what's expected?' that I think every Ajax developer deals with now. I remember some discussions about that. We were still working out what the proper treatment of whitespace was under certain circumstances, just a ton of stuff like that. Behind it all was this real question about, at least in my mind, about what your responsibilities were as a browser developer. Because at that point, it was still possible to point fingers back and forth. Some people who were developing software were very, very tolerant of problems in HTML, and I tended not to be. My feeling was, 'Well, if the document's broken, then why would I shoot novocaine into that injured knee by fixing it on behalf of the guy who has a broken document?' And there were some of the same sort of negotiations going back and forth about 'What do you do with errors that are really emanating from the server? Do you try to repair them? Do you try to fail gracefully? What do you do?' None of that stuff had been worked out, I'm not sure it has been yet, frankly.
 
- Eric
- 
[00:20:12] Yeah, well, that's a whole other debate. So, the guys at the NCSA and you and Pei Wei and Tim were all hashing this out as a group basically?
 
- Tom
- 
Yeah, there was a mailing list, and an awful lot of stuff was flying around. We had all met via email; very few of us had met in person before. In the late summer of '93, O'Reilly sponsored a get together for all the web developers in the world, and I think they were all there. 35 of them [Laughs]. At O'Reilly's office in Cambridge, and it was the first time I'd met most of these guys face to face, so it was me and the NCSA guys, and the SLAC people, and the guys from O'Reilly, the University of Kansas team that had done Lynx, I remember there were some odd bods from here and there who were very active in web development at the time. But it was a very, very, very small group. If you had a mailing list for all the developers in the world, you had 50 people.
 
- Eric
- 
And when you say web developers, you don't just mean browser developers, you mean...
 
- Tom
- 
I mean server developers and everybody else. I remember jumping in a car with Tim at one point, I was in Chicago for something else altogether. He was in Chicago to visit people at Fermilab, and we jumped in a car and went down to NCSA, and that afternoon basically hashed out the beginnings of what would turn into CGI.
 
- Eric
- 
Oh, ok. These things that....So I have to ask--
 
- Tom
- 
Andreessen was sitting there saying, 'Well, you know we need some way to hook up the database.' [Laughs]
 
- Eric
- 
Uh huh.
 
- Jen
- 
[00:22:11] Let's talk about our first sponsor today, the In Control Conference 2013 in Orlando. This conference is brought to you by the folks from Environments for Humans, who've done all the summits that you've heard about all year: the Accessibility Summit, Responsive Web Design Summit, CSS Summit, all those summits are online. This is an in-person conference that they're having in Orlando in February, coming up soon, February, and it's a three-day conference. The first day is full-day workshops. You could do a full-day CSS workshop, or responsive web design workshop, or mobile design, thinking mobile workshop. And then that's followed by two days of sessions, although their sessions are a bit different than a lot of conferences. Rather than having six or seven hour-long sessions, there's an opening keynote each morning, and then after that there's four sessions. Each of those is an hour and forty five minutes long, they're calling them mini-workshops. So there'll be a session on accessibility for an hour and forty five minutes with Glenda Sims getting in depth into accessibility and the things that you might want to know about accessibility. An hour and forty five minutes on jQuery, a session on mobile design with Brad Frost. All really great speakers, go check them out at 2013.incontrolconference.com, and you can see all about what's happening there in February, and you can also use the coupon code ORLANDOAHEAD for $100 off either the workshops, the conference, or both. So thanks so much to Environments for Humans for supporting the show.
 
- Eric
- 
[00:24:08] So I have to ask, how was Sir Tim as a road trip companion?
 
- Tom
- 
Oh, he's great.
 
- Eric
- 
Really?
 
- Tom
- 
Yeah! We got horribly lost. Actually, he wasn't driving, it was a friend of his from Fermilab. We got horribly lost [laughs]. Eventually we got to NCSA, and eventually we got back.
 
- Eric
- 
Ok. So he's not one of these guys who sings the same song 20 times in a row?
 
- Tom
- 
No, no, no, no, no. No, not at all. We had a lot of fun, actually.
 
- Eric
- 
Did you introduce him to the American custom of slug bug? I just have to ask.
 
- Tom
- 
I don't think I know it myself, Eric.
 
- Eric
- 
Oh. You didn't have a thing growing up--?
 
- Tom
- 
I did introduce him to the American habit of rolling down your windows while going across the Loop in Chicago and screaming "out of state" every time you make a turn the wrong way onto a one way street.
 
- Eric
- 
That's excellent. Ok, so right. You and Tim got in a car, went down, found Marc Andreessen eventually, and figured out the beginnings of CGI, so that's where we got cgi-bin from.
 
- Tom
- 
Well, yeah. The NCSA guys were well on their way to doing it. They were working on some other project at the time, its name escapes me at this point[?], but it was basically a data visualizer that they were doing for mathematical and physics experiments, and that kind of thing. They were doing it under some NSF grant. It had some of the same needs for that stuff, so they had gotten part of the way there, but there was this talk about, 'Well, how do we do it through the server? How do we communicate with it through the server?' And at that point the whole CGI notion started up. And it wasn't like we walked in there at one o'clock in the afternoon and by two thirty everybody had solved this problem. It was the beginning of something that probably came to fruition between three and six months later.
 
- Eric
- 
Ok.
 
- Tom
- 
None of this stuff happened instantly. I don't recall anybody being struck by lightning and going to the keyboard and starting furiously typing.
 
- Eric
- 
Yeah, well, I guess all the stories can't be awesome, but....
 
- Tom
- 
No.
 
- Eric
- 
So CGI, for those who may be listening who have never come across CGI except in the context of Jurassic Park and that sort of thing for "Computer Generated Imagery," was actually the Common Gateway Interface, which became an RFC eventually. CGI 1.1 became an RFC. Previous to that, your webserver couldn't talk to databases.
 
- Tom
- 
No, not really at all. Not in any way that you could present stuff dynamically. You were just looking at HTML pages and that was really it. I'm sure that various people had come up with clever ways of dynamically generating HTML pages, but we hadn't seen many of them. I'm a little unclear, there was a project running at the time at Xerox PARC that was a mapping project that I suspect made very intensive use of database rendering, but I'm not really sure how he was doing it.
 
- Eric
- 
Yeah, no...it's....
 
- Tom
- 
I guess the main point, Eric, is that at that point, all the stuff that we now take as being fundamental, was at a state where the need was apparent, but none of the implementation had been done. So a lot of stuff ended up happening very quickly because it wasn't like you had to sit there thinking, "Well, what should I do next?" There was always a list of problems that needed to be solved. Because everybody could see that they were there and they knew what they wanted to do with this thing.
 
- Eric
- 
Right. So, with the three of you in the room talking about CGI. Was it basically, someone would come up with an idea and then...?
 
- Tom
- 
I think there were about nine of us there, and it was more like, "Well, what about this?" So I think that's a good way of putting it. I don't recall there being a lot of religious warfare in those days, so it wasn't like these things turned into fist pounding sessions. People were willing to give a fair hearing to anything that anybody was willing to implement, and it really was very much in that environment that everybody romanticizes now of rough consensus and running code, I think is the stock phrase.
 
- Eric
- 
Yeah.
 
- Tom
- 
It very much operated that way. If you wanted to put the work into it, god bless you.
 
- Eric
- 
Right. Jeff Eaton has done a talk where one of his examples is how the img element came to be. Everyone knew that you wanted images embedded in pages. It hadn't been done, and the NCSA guys said, "We're gonna do img," so that's why we have the image tag today.
 
- Tom
- 
Oh yeah. And the blink tag. [Laugh]
 
- Eric
- 
[00:29:09] And the marquee tag. I heard a rumor that you're to blame for that one.
 
- Tom
- 
Yeah, I actually am. I don't know if this is an appropriate story for a family podcast, but sometime in early '94, I was contacted by some guys at, well, actually, I have to go back a step. When I started working on the second version of Cello, which never came to light except as a sub-licensed part of other people's products, where it had a lifespan that actually went on for a number of years, we knew that we needed better image rendering, and my answer to that was basically to buy it off the shelf. There was an image rendering package that was a library for C++ and it wasn't expensive to license, and we thought, "Ok, that'll save a bunch of work." So we went out and licensed that, and one of the things that it would do, if appropriately provoked, was cause whatever was in your window title in Windows to rotate around in this sort of marquee-like, ticker tape-like fashion. And I put it in there as a kind of joke for the guys who were alpha testing for me, to see if anybody would notice. So if you used the tag marquee in anything that this version of Cello read, it would cause the title to spin around like a marquee.
 
- Eric
- 
Ok.
 
- Tom
- 
I got a call from this guy at Microsoft who was looking around for some browser code. They didn't have one at that point. And would I sign a bunch of NDAs and send in our stuff. And I said, "Oh, sure." And off it went. And it came back and ultimately they did a deal with somebody else. But curiously enough, IE is the only place where the marquee tag has ever worked. I believe it works there to this day.
 
- Eric
- 
Hmmm. Yeah. Ok. So you burying an easter egg in Cello led to the marquee tag?
 
- Tom
- 
Yes.
 
- Eric
- 
Well, I hope you've learned your lesson, sir.
 
- Tom
- 
I have. If you want to slap me, you can.
 
- Eric
- 
Well, I don't think TCP/IP has come quite that far.
 
- Tom
- 
Ah, ok.
 
- Eric
- 
Maybe the next time I'm in your town.
 
- Tom
- 
Well, I'm gaily waiting for the boxing glove to come out of the screen.
 
- Eric
- 
Boink.
 
- Tom
- 
Exactly.
 
- Eric
- 
[00:31:30] What were some other things that you added that persisted far past your expectations that they ever would have, or that died an early death and you're sad about?
 
- Tom
- 
We had nothing that died an early death, including the browser itself, I might add, or anything that I'm very sad about. I don't know, I remember doing a lot of initial work on how mailto: was going to work. I was the one who thought we needed it, and I did an initial implementation of it that got...one word for it would be 'refined'. Another word for it would be 'fixed'. That ended up where it is now. I remember an awful lot of back and forth about white space that I had a lot to say about, but I don't think any opinion I had on the subject ever particularly prevailed. But this was all cranking along in multiple locations at once, and in those days, there wasn't that much to disagree about. There were a few conventions that we put in that I think have started to come back. I was very, very eager, even in the first version of Cello, to have not only direct access to some kind of search engine off of the browser bar, but to have user control over what search engine that would be. I was putting a fair amount of development time into solving what I saw as problems that people would have if they were looking for information from multiple sources. I really thought of this thing as a researchers' tool, in many ways. Researcher not in the sense of chemicals and test tubes, but researcher in the sense of people who look stuff up in books. So we supported Z39.50 directly for some stuff, and in one version we had multiple search engine selection in the way that you now do in some browsers. And I always thought that it was sad that a lot of the rest of the development community didn't pick up on that. What was fascinating was that at the point where Netscape began introducing heavily visual elements into its HTML, some of them like blink, but some I thought at the time grossly inappropriate for rendering logical markup.
Pretty much everybody veered off to the "let's make pretty pictures" place, because that's what many people who were either doing graphic design or eCommerce wanted. From there, it always seemed to me it took about ten years to get back to the idea that people wanted to find stuff to read.
 
- Eric
- 
What kind of fascinates me at the moment is that you're talking about in 1994 or so...
 
- Tom
- 
Yeah, thereabouts.
 
- Eric
- 
Giving people the ability to choose multiple search engines. I was around then, but I don't really remember multiple search engines.
 
- Tom
- 
Well, by that time what did we have? We had Alta Vista, we had...I can't even remember who the players were. There were three or four general search engines. But there were also specialized collections that were really library offerings based on the kind of stuff that Brewster Kahle was doing then with Z39.50. So the library access protocols that people were actually quite interested in. Mostly academics.
 
- Eric
- 
Actually, can you explain Z39.50, because I don't thi--
 
- Tom
- 
No, I probably can't! [Laughs]
 
- Eric
- 
Oh! Can you explain what it was for?
 
- Tom
- 
It was basically meant for bibliographic records retrieval between large online public access catalogs. So you could go into, oh I don't know, I'll make something up, this probably isn't accurate, but you could go into something like the HOLLIS system at Harvard and pull out a bunch of bibliographic records. MIT was running it. There were numbers of others, and it was used, and may still be for all I know, as a kind of back end protocol for exchange among basically library catalog machines.
 
- Eric
- 
Mmmkay.
 
- Tom
- 
It was a generalized information retrieval protocol.
 
- Eric
- 
Yeah, and of course library catalog machines at the time were also very new.
 
- Tom
- 
Oh, exactly. The little bit I knew about it has unfortunately faded from my memory. Brewster is probably out there listening somewhere and wishing he hadn't listened to this before dinner, but it was an interesting thing to try to deal with. It was not an easy protocol to implement as I recall.
 
- Eric
- 
[00:36:17] So it wasn't just general search engines that you were thinking to let someone say, "If I'm searching, I always want to be searching the Library of Congress catalog."
 
- Tom
- 
Yeah, exactly.
 
- Eric
- 
Ok. Well that makes some sense.
 
- Tom
- 
I think we were a little more attuned than most people were at that point that there really were people who were professional searchers.
 
- Eric
- 
Mmmmm.
 
- Tom
- 
So in that sense we were doing something....This is a very very loose analogy, to say the least, it sort of comes from another universe, but we were doing the same kind of thing in a way that...we had the same set of concerns that the people who were doing something like Zotero would have now. We saw ourselves as being tied into a community of professional academic researchers, or legal researchers, that nobody thinks in terms of anymore now that search engines have broadly generalized.
 
- Eric
- 
Right.
 
- Jen
- 
It's always--
 
- Tom
- 
I think one of the things that's getting lost a bit in our discussion here is that it was very very hard at the time to conceive of a mass audience for the web.
 
- Eric
- 
Interesting.
 
- Tom
- 
At least it was for me. I think it was always in Tim's head, it was always in the head of...oh, I'm thinking of Dale Dougherty from O'Reilly, who really did GNN initially. There were people who got that. I'm not sure I did. I remember actually, not long after I first met you, because I think I met you at some crazy thing at Case Western in '95 or '96, and I remember going to an American Association of Law Libraries meeting in Pittsburgh right from there, and seeing my first URL going by on the side of a bus advertising a bus and thinking, "Huh! [laughs] I never thought that would happen."
 
- Eric
- 
Yeah, I think the first URL I might have ever seen was www.honda.com.
 
- Tom
- 
Yeah.
 
- Eric
- 
Right across the bottom of a Honda commercial on TV. Back when it was http://www.honda.com/ because if you didn't have the slash, the server might not have responded.
 
- Tom
- 
Right, it would break. Yeah.
 
- Eric
- 
Stuff like that. Sort of that reaction of, "Wait, the muggles can see that now, too?"
 
- Tom
- 
Yeah, pretty much. Yeah, that was it. I remember the one that got my attention, we were way more concerned with the stuff we were publishing than we were with the browser by the time '95 had rolled around. By then, we had a full version of the United States Code up and all that. We'd been doing Supreme Court decisions for a while. I came in one morning very early, it was about four or five in the morning to switch out a hard disk, when I thought I could take the server down without complaint. And it got to be about 6:30, I was having trouble with this. The phone rang, and it was the Washington Bureau of the New York Times saying, "Excuse me, when are the Supreme Court decisions going to be back?" [Laughs].
 
- Eric
- 
Wow.
 
- Tom
- 
And I thought, "Well..." and these were the days when we were sitting there all agog at having 200 hits a day or something, and here's the Times calling up to say, "When are you guys going to be back online?" At that point I hadn't yet developed my well-honed response that was, "Well when are you sending a donation?" but...
 
- Eric
- 
Nice. Ok. Yeah, it's interesting. Do you think you'd have approached what you were doing differently if you had thought that there were going to be a billion people on the web within a decade?
 
- Tom
- 
Not really, no. I mean, I don't see really how we could have. That's really much more of a content question.
 
- Eric
- 
Interesting.
 
- Tom
- 
And there, I think we just got lucky, frankly. Because in a funny way we did anticipate that. We saw it happening and we managed to keep up with it. I'm not sure it would have made one iota of a difference to how we approached the technology although again, I don't know. We did have the initial insight that the then mass market computing platform should have a browser.
 
- Eric
- 
Right.
 
- Tom
- 
Which really nobody was thinking about at all.
 
- Eric
- 
Jen, did you have a question?
 
- Jen
- 
[00:40:58] Oh, I was just going to comment because it always fascinates me each time to listen again about how the web was started by people who had lots of content and long form content, like law cases and physics papers and documents that are at the Library of Congress. And somewhere along the way it feels like the web became this brochure rack where nothing should be longer than 500 words and no one wants to read anything more than two paragraphs. And if you need more space than that then you need to put a pager at the bottom and click to go to the next page because people hate scrolling...but meanwhile go ahead and clutter up the whole page with lots of ads. It feels like we're just barely getting back in 2012 to the idea that like, "Let's get rid of all the clutter. And let's give people a reading experience where you can sit down and read something long and meaty and understand on a deeper level some kind of concept or something."
 
- Tom
- 
Oh yeah, the most used application on my iPad for a long time now has been Readability for exactly that reason. It sort of reminds me of some things that used to happen. I did a certain amount of work with corporate communication stuff at one point and it sort of reminds me of some of the fads that were coming and going with things like documentary video. For years and years and years it was considered the kiss of death to use static talking heads on screen. Or static images. And then Ken Burns comes along and does Eyes on the Prize and then everything else he did. And all of a sudden, all of that convention was out the window. Because you've got a guy who's creative enough to break all of those alleged rules and have it work. And I sort of think the Jakob Nielsen tyranny of 'everything above the fold' is breaking down because people are realizing, "That's silly, you can't do that if you want to read the United States Constitution." It's all very well to say, "We have to take the text and somehow jam it into the usability model," but that's not a very realistic usability.
 
- Eric
- 
[00:43:21] Yeah, it's interesting. Whitespace seems to come up again and again. Even from the beginning you said there was a lot of argumentation about, or maybe discussion, maybe it wasn't argument, about what the white space should be. Do you remember any details about what the positions were, or just an example of what was being discussed in terms of what the whitespace should be in a certain place?
 
- Tom
- 
Well, I think it was the usual tension. It was the tension between the people who believed, I think for very valid document parsing reasons, that all whitespace should be collapsed to a single space. Or alternately, that things should be set off in pre tags that were in effect designed to preserve the original intention of whoever had done it. I don't know that anyone's putting that much ASCII art on the web, but you see my point.
 
- Eric
- 
Mhmm.
 
- Tom
- 
And that was always it. It was always, "Well, when we render these things..." this must be something that you're intimately familiar with, Eric, right, there's always this question hovering around, "Well, as we introduce these stylistic controls," and whitespace is the most fundamental one, "to what extent do we need to respect the wishes of the original document publisher?"
 
- Eric
- 
So the question there is really, when we're rendering the content, how much of the visual whitespace should be based on the source whitespace?
 
- Tom
- 
Yeah, exactly.
 
- Eric
- 
Ok. So it wasn't just visual, the discussions weren't necessarily how much space should there be before and after an h1, though I'm sure there was at least some of that, but more, "Hey, if there are six returns between the last paragraph and this heading, does that mean we should have more visual whitespace when we lay out the document?"
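 
For readers following the whitespace question, the two positions can be sketched in a few lines of Python. This is a simplified illustration, not how Cello or any other browser was actually implemented. The function names `collapse_whitespace` and `render` are hypothetical, and the rule shown (every run of spaces, tabs, and line breaks becomes one space unless the text is marked preformatted) only approximates what HTML renderers do.

```python
import re

# Runs of ASCII whitespace: space, tab, newline, carriage return, form feed.
_WHITESPACE_RUN = re.compile(r"[ \t\n\r\f]+")


def collapse_whitespace(text: str) -> str:
    """Collapse each run of whitespace to a single space (the 'parser' position)."""
    return _WHITESPACE_RUN.sub(" ", text).strip()


def render(source: str, preformatted: bool = False) -> str:
    """Return the text a renderer would lay out.

    preformatted=True stands in for content wrapped in pre tags, where the
    author's original spacing is preserved exactly.
    """
    return source if preformatted else collapse_whitespace(source)


# Six returns between a paragraph and a heading, as in the example above.
source = "Last paragraph.\n\n\n\n\n\nA   Heading"

collapsed = render(source)                     # source spacing is ignored
preserved = render(source, preformatted=True)  # source spacing is kept
```

Under the collapsing rule the six returns carry no visual meaning at all, which is why a table laid out with the spacebar falls apart unless it is marked as preformatted.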
 
- Tom
- 
Yeah, exactly. You have to remember that there were an awful lot more naïve documents out there in those days. I would like to think, this is probably grossly untrue, I would like to think that we're seeing very few tables out there anymore that have been laid out with a spacebar.
 
- Eric
- 
Ok.
 
- Tom
- 
I would like to think that. I sleep better at night believing that. But I have a feeling I'm wrong about that, but you see my point. Nobody really knew at that point, people were just throwing stuff up there. And the other thing you have to remember was that there was this general desire on the part of hypertext guys to demonstrate that HTML was not hard. Because prior systems, I'm particularly thinking of Gopher, were just kinda dump it in a directory and go, whereas with HTML you had to make links and you had to put tags in, and all this stuff. There was a certain community that saw that as being very, very burdensome.
 
- Eric
- 
Mmmm, as opposed to the other community, like the SGML community, who would have seen that as grossly inadequate.
 
- Tom
- 
Yeah, exactly. There's a certain amount of tension there between a bunch of guys who just have a bunch of documents to put up, and a bunch of guys who are prepared to be purists about how it's done. Frankly without much regard for the expense of people who have a lot of retro conversion to do. We see a bunch of that around the document authentication issue that exists around legal documents. Some people are cost sensitive and some people aren't. The people who are cost sensitive tend to be the people who have huge amounts of retrofitted content that they have to put up there.
 
- Eric
- 
So, by retrofitted content, you mean what?
 
- Tom
- 
Legacy stuff of whatever kind. You really can't put up some of the decisions of the United States Supreme Court. You really can't put up some of the Statutes at Large, I mean we have, but you really shouldn't. There are places where historical completeness is important, and there are people who for that reason have to do very, very, very large document archives. And they're expensive to convert.
 
- Eric
- 
Back in the mid '90s when you were transitioning away from this whole thing of, 'Hey, let's write a web browser,' and you're putting up Supreme Court decisions and stuff like that, to the point that the New York Times is calling you and asking you when it was going to be back, what process did you go through to put those up? Because Supreme Court decisions are not necessarily short.
 
- Tom
- 
No. Interestingly enough, in your happy home town there, there was a guy at Cleveland Free-Net who had been putting up Supreme Court decisions for some time, and I think the place he grabbed onto that process had to do with the fact that the Court was then using an Atex layout system to do all of their printed stuff. Atex was a very very popular formatting engine for things like newspapers. And a lot of people in print publishing were using it. You might know Atex as what gave rise to XyWrite, which was a popular word processing program in the mid '90s. Which was largely based on Atex. Anyway, this guy at Cleveland Free-Net had figured out a way to do XyWrite to, I think he was just putting them up in ASCII, he was stripping stuff out in an ASCII conversion, and we ultimately figured out how to do a not-so-good Atex to HTML conversion, did that for a long time. We also wrote converters for WordPerfect and all manner of stuff. But our basic criterion was that we get them in some electronic form, we never did scanning here to any great degree, or if we did we'd send it out. It was mostly a question of converting legacy electronic formats. And in some ways it's still a problem that hasn't been solved. Because it's just hard to do.
 
- Eric
- 
Well, yeah.
 
- Tom
- 
As you know yourself from cut and paste operations, the basic problem with non-structured word processors is that they just have too many degrees of freedom and it makes conversion very, very hard to write.
 
- Eric
- 
Yeah, and from my own experiences at Case with converting legacy content, and in some cases not legacy content. We had a project briefly with The Plain Dealer, the local daily newspaper, where every Friday they would dump, I think it was an .rtf file of, "Here's the entertainment news for the weekend," and we would slice it up and it would turn into a site, and that ran for a while, I don't know what happened with that exactly, but yeah, with these things, in a lot of cases you would have to write a new converter every time. It's interesting that you had an electronic source for at least current decisions and current law, it was going backwards that was really hard.
 
- Tom
- 
It was, although to some extent again we were able to get that off of CD-ROM, and in fact to this day, the US Code that we have up there, which is 200,000 pages, something like that, updated as it's updated by Congress, is done from what amounts to typesetting code. It looks very much like old line printer escape code; that's a system called Microcomp that the Government Printing Office has used, well, essentially since the mid 1980s. And we've gotten quite adept at converting that, and have done so for many years. They're going to start originating in XML soon, but they have not yet.
 
- Eric
- 
Do you have a timeframe on 'soon'? I'm just curious.
 
- Tom
- 
Uhhh, they have a contract out right now that's a little open ended, but I'd expect to see it within the year.
 
- Eric
- 
Oh wow! So legal changes will actually start coming out in XML?
 
- Tom
- 
Oh yeah, a lot of stuff has been for a long time. State legislatures have been doing it for a long time. They are publishing XML versions, and have been for a while, but the official source stuff has been Microcomp for a while. But that's changing. There's been internal XML at the House of Representatives for a while, can't say about the Senate, I don't know as much about that.
 
- Jen
- 
[00:51:42] So our second sponsor today is Shutterstock, where you'll find over 20 million stock photos, vectors, illustrations, and video clips. You go to shutterstock.com and you start searching around using a whole bunch of complex tools for finding the kind of artwork that you need for your project. They've got lightboxes where you can grab the ones you think, "maybe this one, maybe this one, oh this one might be good." Grab those up, shove them into a lightbox, and carry them around, then you can share that lightbox with other people, people on your team perhaps, your clients, people you work with who are going to help make decisions, pick out the ones you want, and then when you buy them, at some places when you buy art like this, you're like, "Well, I think we could use the 500px wide image is going to be just fine on our website because of the design we have with the sidebars. 500px, that'll be fine." And then you redesign and you need a bigger image, or you realize, "Ahhh, Retina displays, yeah, 500's not enough and we're doing a responsive design site, and really we need art that's 2000px wide." On some services, you have to go back and buy it again, buy it a second time. Not on shutterstock.com. On shutterstock.com, whatever price you pay the first time to get it, you get all the sizes you need. You can just go back over, log in, and download the high resolution image. That's how they roll. They want to just have one price no matter what kind of resolution you need. You can get one image, or a piece of media, or you can buy a package of images, or you can get a subscription and pay per month and download whatever amount you need on your subscription. And you can try them out with a free account and browse and try out their lightboxes, see what kind of quality of images they have. They have a whole global, they're working with people around the globe to get images and media from all sorts of people all over the place. Go check them out.
And then when you want to buy something, if you see something that you want to purchase, you can use the offer code WEBAHEAD11 and get 30% off. A lot of percent off. 30% off. WEBAHEAD11. Check them out. Thanks so much to Shutterstock.
 
- Eric
- 
[00:54:20] As you've said, a lot of the challenges that you faced then, you still face today, how do you feel about how things have changed, or how they're similar, whether in the browser space or just in general?
 
- Tom
- 
Well, you know I mentioned a few minutes ago that there was an initial perception that hypertext is hard. And I think we're starting to see some of the same stuff with semantic web concepts and linked data. One of the more clever things that happened was the shift of work in the semantic web from‒excuse me my telephone is ringing here and I think I'm just going to continue to let it do so because it will go three rings and stop. There was this initial thrust of semantic web toward AI reasoners and other kinds of highly sophisticated agent technology, and all of that became the darling of the AI crowd to some extent. And one of the better things that happened was that around 2005 we shifted to the notion of linked data and this idea that it really should be easier to mash up data than it is. That was a tremendous step, but I think that working in that world, working in the world of stuff like schema.org is still seen as being enormously burdensome. And frankly to some degree it is, because it brings with it all of the issues that people have always had with policy and for lack of a better word, craftsmanship around issues of metadata. It's a new level of thinking about how we do stuff. I think it's a singularly important level of thinking about how we do stuff, but it feels to me as if the world of linked data is where the web was circa 1993, 1994: we're sort of just short of critical mass to get some really good applications going. There's been a lot of good thinking on it, there's also been some stuff that's remarkably weak because it was done as initial experiments that are now being tested in ways that they can't particularly withstand.
I'm thinking there particularly of, I don't know if I'm wandering off too far into the semantic web deep weeds here, but I'm thinking particularly of something like FOAF, the friend of a friend thing, which was just originally a system for describing relationships among people, but which has no temporal dimension, and so it makes it very very hard to model groups. You can say that "Fred is a member of X", but you can't say in FOAF, "Fred was a member of X from 1976 to 1979."
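 
The limitation described here can be illustrated with a small sketch. It does not use FOAF's actual vocabulary or any RDF library: the `memberOf` predicate string and the `Membership` class with its `start_year` and `end_year` fields are hypothetical names chosen only to contrast a timeless statement with a dated one.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# A bare subject-predicate-object statement: it can say that Fred is a
# member of X, but it has nowhere to record when that was true.
Triple = Tuple[str, str, str]
timeless: Triple = ("Fred", "memberOf", "X")


@dataclass(frozen=True)
class Membership:
    """Hypothetical record that turns the relationship itself into a thing
    that can carry dates, which a single bare triple cannot do."""

    person: str
    group: str
    start_year: Optional[int] = None  # None means "no known start"
    end_year: Optional[int] = None    # None means "no known end"

    def active_in(self, year: int) -> bool:
        """True if the membership covers the given year (bounds inclusive)."""
        after_start = self.start_year is None or year >= self.start_year
        before_end = self.end_year is None or year <= self.end_year
        return after_start and before_end


# "Fred was a member of X from 1976 to 1979."
dated = Membership("Fred", "X", start_year=1976, end_year=1979)
```

With the dated record, a question such as "who was in group X in 1977?" becomes answerable, whereas the bare triple can only say that the relationship holds with no indication of when.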
 
- Eric
- 
Yeah, ok.
 
- Tom
- 
There's a bunch of stuff like that that as we move it out into the semantic web, and this is particularly true of identifiers. As we take things like identifiers‒call them URIs, call them what you want‒and move them out of the scope for which they were originally intended, and bring them out so that they can be viewed at web scale, suddenly we realize we've got redundancies, and duplications and stuff that doesn't parse, and things that just don't make sense when they're lifted out of the context that they were originally intended for. There's a lot, a lot, a lot of spackling and patching to do around that set of issues.
 
- Eric
- 
Mmmm, interesting.
 
- Tom
- 
I have a good friend, a metadata librarian, who worked for a long time in a photo archive where she was cataloging stuff in an online catalog, and at one point it was her job to place descriptions of this one photo archive online. The archive was entirely of pictures of Ulysses S. Grant. She said the fascinating thing about the descriptions was that they couldn't just lift them directly because they all said things like, 'On a horse'. And of course, because it was in the Ulysses S. Grant collection, everybody knew that it was Ulysses S. Grant on a horse.
 
- Eric
- 
Right.
 
- Tom
- 
But once you lifted it into a larger cataloging regime, you had to start thinking in terms of those things, of supplying that context along with it. And we've got an awful lot of data out there that is context dependent in those kinds of ways. And it's going to be a big job.
 
- Eric
- 
It seems like an interesting content strategy challenge in a lot of ways. Just spitting out the caption "on a horse" it's like, "Great, who's on a horse?".
 
- Tom
- 
Yeah, exactly. You can look around and find some people who are dealing with it very successfully, of course the projects everybody points to are Freebase and DBpedia and stuff like GeoNames. But I actually think the New York Times is doing fantastic work with that based on stuff that they've been doing essentially since the 1860s. Really just a very, very, very solid transformation of this kind of stuff they've done historically to answer the question every reporter has, which is, "What has the New York Times said about this subject previously?"
 
- Eric
- 
Right.
 
- Tom
- 
They've had that problem for 150 years. And they've solved it one way, and now they're essentially solving it in linked data space.
 
- Eric
- 
[00:59:26] Huh, very interesting. So, are there any other comparisons that strike you between now and then, maybe when it comes to browsers, or just in general?
 
- Tom
- 
I don't know, I'm sort of with where Jen was a few minutes ago, I have some sense that browser technology is kind of, I guess I've thought this for a long time, and maybe it's just the lament of somebody who on some level wishes that it were within the grasp of one individual to write a browser, I don't know. But it does seem to me that browsers have gotten so feature intensive that they are sort of becoming Microsoft Word, in that there are many, many, many features there that people may or may not use. And they're not very fast. And they're sort of ponderous. If I'm allowed to express preferences, I like Chrome because it is simple and clean in many ways. I don't think that's true of everything. We've had this constant tension, and this is something that has not changed, between people who want to capture the browser space because they see that as being able to put their frame around the content that everyone is viewing, and of course there's commercial value in that, versus people who simply want to build an information access tool.
 
- Eric
- 
Yeah. That's...yeah, I can hardly agree more, actually.
 
- Tom
- 
And I think that Jen's right, I think we are coming back around to an era in which people are starting to realize, "Hey, wait a minute, this really is an information access tool and we probably shouldn't bitch it up with a lot of stuff that's extraneous to that process."
 
- Eric
- 
Yeah.
 
- Tom
- 
There's more of that thinking. I suspect that the tablet space is going to have a little bit of an effect that way, because tablets are a much more personal device in many ways. And they're also real estate limited. One of the things I used to say when images were first coming in was, "A lot of people think a picture is worth 1,000 words, and given the bandwidth that it takes, it'd better be."
 
- Eric
- 
Nice.
 
- Tom
- 
I think we may come back around to that way of thinking just a little bit more than we have.
 
- Eric
- 
[01:02:06] Huh. Well, I think that's a great place to wrap this up, although I would love to spend another hour. We never did quite find a way to work in the anecdote about the elephant and the leopard, so that the live people get a little treat.
 
- Tom
- 
We can do that later, that's a show biz story though.
 
- Eric
- 
Oh, ok. Fantastic. Well, Tom, I'd like to thank you very much for joining us. It's been a pleasure.
 
- Tom
- 
Oh, it's been delightful.
 
- Eric
- 
And I've learned a ton.
 
- Tom
- 
Well, always happy to help out. Thanks.
 
- Jen
- 
And if people want to see your work, Tom, and follow you. Are you on Twitter? What websites should they go to?
 
- Tom
- 
I am on Twitter as @trbruce, and you can find us at www.law.cornell.edu where we have been since 1992.
 
- Jen
- 
Nice. And I want to say thanks to Shutterstock and the In Control Conference for sponsoring the show today and making it happen. Making it possible. Eric as always, thank you so much, and people should check the schedule at 5by5.tv/schedule to see when it is when we'll be live, if you want to listen to us live, or of course subscribe in iTunes blah-di-blah if you want to download the show in the future. Thanks everybody!