Improving the world's most visited webpage
This post is about the world's most popular homepage. It's a really simple page, but in terms of frontend web development it's done remarkably poorly. It's beyond me how a multi-billion-dollar corporation manages to make such a poor effort on the most visited webpage on the planet. I'm of course talking about Google. I suppose it's safe to mention their name. Just about anything else seems to be off limits when it comes to Google's branding, which is why I decided to work my way around the guidelines for this post in order to prevent a very unwelcome C&D.

The world's most visited webpage is a very simple page. There's not all that much on it: a menu bar, a form with one text field, two buttons (one for the normal people and one for the lucky few) and a handful of links. Not a huge web development challenge, you would think. Apparently it is. In this article I'll go through the source code of the page, highlight the most remarkable atrocities against the noble art of web development I found in it, discuss some possible reasons for some of the approaches followed, and finally present my own attempt at improving the page.
Highlighted flaws on the search homepage
- No doctype
- No doctype has been defined in the page. Even if this was an intentional way to force the browser into quirks mode, there's still no valid excuse not to have one. There are other ways to force a browser into quirks mode, though it's beyond me why you'd want to.
- No language has been defined in the HEAD section
- Too much hassle, I guess.
- <body bgcolor=#ffffff text=#000000 link=#0000cc vlink=#551a8b alink=#ff0000 onload="sf();if(document.images){new Image().src='/images/nav_logo3.png'}" topmargin=3 marginheight=3>
- Right.... CSS has been around for a while but ancient attributes are used. And without single or double quotes enclosing them. onload? meh. Just like with the quirks mode issue there may be a deliberate reason for this. Still I think this should be handled differently. More about that later.
- span span span eggs and span span span bacon and span with some span
- Long series of links have been marked up with <span> tags enclosing them. I think unordered lists would have been somewhat nicer?
- Tons of links of secondary importance at the beginning of the page without a way to skip to the important part: the search form.
- Very nice for the disabled indeed!
- Other aspects of accessibility
- Epic FAIL. Try it yourself
- All attributes are without quotes enclosing them
- Only numeric values can go without quotes but even for those I'd recommend using them. Nasty!
- <center>
- Valid in HTML 4.01 but ... seriously...
- A table to markup the search form
- No tabular data to be seen on this page guys!
- <br><br>
- Should only come alone. And only in poetry too.
- <font> tags
- And only used to set sizes too.
- Sloppy code and lots of dumb validation errors
- Have a look. Being 100% anal about validation may or may not be to your taste but using non-existent attributes, a runaway </a> tag, no quotes anywhere and unbelievably dumb stuff such as
<div align=right id=guser style="font-size:84%;padding:0 0 4px" width=100%>? Looks like a disgrace to me. We're talking about the most visited webpage on the planet here!
I have no words for how incredibly poorly this simple page has been coded up. Some of these things may have a somewhat plausible explanation, but even with that in mind I'm pretty sure you'll all agree it's... well... a piece of trash that's not worthy of the status it currently has on the internet.
A possible explanation for some of this mess
Even though this page isn't quite a textbook example of how to properly mark up a webpage, there may be an explanation for parts of this mess. If you view the page with all CSS disabled, something interesting surfaces: the page looks almost the same!
Besides the ugly browser default font and a small flaw in the menu bar, the completely unstyled page looks remarkably close to the styled version. This is where a reason for the crappy approach may lie. Maybe the creators wanted the page to look as close as possible to the intended design in every browser on the planet, including ancient ones such as Netscape 2.0?
Even though hardly anyone uses these browsers, a missed visitor means missed business. Of course this explanation still doesn't justify half of the flaws found on the page but it definitely is an interesting observation.
Trying to support all browsers on the planet is an admirable thing, but there is a better way of doing this: serving a different (simplified) version of the page to ancient (non-CSS) browsers. Even if that's too much of a hassle, a single CSS-based page can be made to be perfectly usable in these browsers. It's one of the things I tried to accomplish in my own take on the most popular page on Earth.
The remarkable resemblance to the styled page when looking at the unstyled one does NOT address the flaws in accessibility and the horribly sloppy coding style. At least it doesn't in my book!
The i-marco search page
First things first: my take on the world's most visited webpage. I took the original dotcom page as my example. Individual country versions of the original may look different and have different code. For this exercise I did not take these into account.
Obviously I made the page validate. I chose HTML 4.01 Strict as my doctype as I saw no immediate benefits in using XHTML. Making the page validate wasn't a particularly tough challenge.
Secondly, I used strictly CSS only for styling. As any modern web developer should. Since there's no tabular data to be found anywhere on this page there's absolutely no reason to party like it's 1996 on it.
Third, I took as much care as I could to make the page accessible. One of the first things you'll notice when looking at the source code is that I moved the menu bar all the way to the bottom of the page. These links are of secondary importance. This page is all about the search form, which is why the form should come first. The page furthermore complies with all accessibility guidelines I deemed relevant. In case you feel I forgot something, feel free to let me know about it in the comments. I love learning more about accessibility! The page passes all tests on cynthiasays.com, very much unlike the original it was based upon.
Fourth, I chose not to put CSS and Javascript in separate files. The reason is that a page as popular as this particular search page needs to be tweaked for performance. CSS and Javascript in separate files would triple the number of requests hitting the servers that host it, which is obviously very undesirable. My sample page is uncompressed, but if mine were to get as many hits as the original I would have compressed it as well, just like the original. For this article I chose to keep the page readable.
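To put rough numbers on the request argument, here is a minimal sketch. The function name and the figure of one million page views are made up for illustration, and it assumes every view arrives with a cold cache, which is the worst case for external files.

```javascript
// Hypothetical helper: count HTTP requests for a batch of page views,
// assuming nothing is cached (the worst case for external files).
function requestsFor(pageViews, externalFiles) {
  // One request for the HTML document plus one per external file.
  return pageViews * (1 + externalFiles);
}

const views = 1000000; // made-up number, not a real traffic figure

const inlined = requestsFor(views, 0);  // CSS and Javascript inside the page
const separate = requestsFor(views, 2); // one CSS file and one Javascript file

// Log both totals and the ratio between them.
console.log(inlined, separate, separate / inlined);
```

Repeat visitors with a warm cache may skip some or all of the extra requests, so in practice the factor sits somewhere between one and three.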
The Javascript doesn't use the most modern way of attaching an event, but I wanted to keep this part as compact as humanly possible while keeping things completely unobtrusive. The code in this page is the most compact example I could come up with that works in all major browsers I could get my hands on.
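A minimal sketch of that kind of compact, unobtrusive hookup might look like this. It is not the script from the page itself: the function name, the field id `q`, and the idea that the handler only focuses the search field are all assumptions made for the example, as is the reading that "not the most modern way" means assigning `window.onload` directly instead of registering a listener.

```javascript
// Hypothetical sketch: focus the search field after the page has loaded,
// without any onload attribute in the markup.
function focusSearchField(doc) {
  var field = doc.getElementById('q'); // 'q' is an assumed id
  if (field && typeof field.focus === 'function') {
    field.focus();
    return true;
  }
  return false; // field missing: do nothing
}

// Old-style assignment: compact, but only one handler can be attached this way.
// The guard keeps the snippet harmless outside a browser.
if (typeof window !== 'undefined') {
  window.onload = function () {
    focusSearchField(document);
  };
}
```

The trade-off is that a direct `window.onload` assignment replaces any handler set earlier, which is acceptable on a page that has exactly one script.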
Even though the unstyled page doesn't look as close to the styled version as the original does, I still think it's very usable. In fact it's arguably MORE usable than the original despite the visual difference. In case this would be deemed absolutely unacceptable, my stance on it is: serve legacy browsers a different page! The styled page may not be 100% pixel perfect compared to the original, but I thought the minor differences it still has with the original and among different browsers would definitely not be showstoppers of any kind if this page were to be deployed for real. I do realise some people may disagree with my practical attitude towards 'pixel perfectness'. To each their own!
Finally: the page has strictly separated style, markup and behaviour except for everything being in one document, which has a valid reason as I pointed out before. I have changed all link destinations and the company logo in order to not upset the creators of the original by violating their branding guidelines. Under the more menu you'll find some interesting destinations by some of my extremely talented co-workers. The search form can be used to search my blog.
Closing notes
By no means would I dare claim that my version of the page is the best humanly possible. I'm sure there's a lot of room for improvement and I welcome any suggestion you may have to make the page better. I'm sure I could make it better myself if I kept tinkering with it for a lot longer, but I thought it was definitely good enough to make this post and get my points across. I had fun doing this little exercise and, who knows, it may make some managers wonder whether it's an idea to invest more in good frontend web developers.
Comments and suggestions for improvement are more than welcome! Web development is and will always remain a continuous learning experience.
Small update: In this post I didn't want to change any of the design on the original page because I wanted to focus solely on the web development side of things. Therefore I took the design 'as is'. Andy Rutledge did an excellent writeup on improving Google strictly from a design point of view. A highly recommended read!
Happy searching (and web developing)!
Filed under: web development

At 10 December '07 - 22:54 Stefan wrote:
At 10 December '07 - 23:53 Jcl wrote:
Same for not having a doctype or anything that… albeit not “valid”, does not (in reality) change the usability within a “normal” web browser, yet it saves a lot of bandwidth when multiplied by the number of visits.
Think of it like the typical American Airlines example: taking one olive out of the salad supposedly saved a million dollars a year. Change olive for bytes :)
At 11 December '07 - 00:55 Marco wrote:
@Jcl this could definitely be true if the page were smaller than my version when compressed. Surprisingly enough, it isn’t. Mine comes out slightly smaller
At 11 December '07 - 02:45 Jcl wrote:
Does not make up for having plainly shitty code, of course… although there are probably some obscure reasons to have it like that (I can’t understand or think of them… but hey, I don’t work at Google).
At 11 December '07 - 03:27 bramn wrote:
By the way an interesting addition to your post might be this old post of Andy Rutledge I remembered. He redesigned the Google homepage: http://www.andyrutledge.com/google-redux..
At 11 December '07 - 05:02 Marco wrote:
I find it interesting, by the way, how people seem to think that there may / must be obscure reasons behind why this page is this shitty. I have to resist this tendency myself as well.
It’s almost like we’re all assuming that there HAS to be reason behind it… because it’s Google! And because it’s the most visited page on the planet!
It may just as well be a crap page… with that being the end of it.
At 11 December '07 - 06:05 Kilian Valkhof wrote:
Another interesting point is the massive amount of bit-wasting Google causes. Like you said, your version comes out a tad smaller. Add to that the fact that your CSS and Javascript will be cached, and given the immense number of page requests Google gets, it’ll make the server a lot happier.
I remember an article from some time ago that attempted something like you did, but also calculated the amount of bits google would save each day, month, year. The numbers were immense.
At 11 December '07 - 06:09 Premasagar wrote:
At 11 December '07 - 09:13 David Dorward wrote:
A quick web search (I don’t have time for exhaustive research right now) suggests that it might be used by a few non-English search engines (but not so much now), but I don’t know why they would need that when HTML already has the lang attribute, and HTTP has the Content-Language header.
Also, since the only mentions I found were for search engine use, I don’t know why an accessibility checker would highlight it.
Does anyone have any information that suggests it might actually be useful on the WWW of today?
At 11 December '07 - 09:23 Marco wrote:
Rule: 4.3.1 – Documents are required to use the META element with the ‘name’ attribute value ‘language’ in the Head section.
It felt a bit redundant to me since the html element already has a lang attribute but the validator insists on having the META element as well.
At 11 December '07 - 10:13 Teddy Zetterlund wrote:
I noticed one thing of importance though.
Your use of legend is not something I’d recommend. Having a legend text that long will likely make users of screen readers ears bleed. I’m not an expert in the area but I do know that at least JAWS (which I’m not quite sure exactly how popular it is but it should still be widely used) repeats the legend text before each “input” element in the fieldset.
That’d be rather annoying with a legend text that long, I think :)
At 11 December '07 - 10:17 Marco wrote:
I modified the page and put ‘Search’ in the legend and ‘Terms’ in the label. That would make JAWS say ‘Search Terms’ which makes more sense I think!
Thanks!
At 11 December '07 - 10:20 oscar wrote:
At 11 December '07 - 15:36 romacafe wrote:
Think of the time and money wasted if Google employees agonized over the markup of the homepage the way other posters in this thread are… No wonder Google is ahead of the curve, they’ve learned not to care about the B.S…
At 11 December '07 - 16:01 Marco wrote:
Allow me to explain some things to you:
First of all: HTML 4.01 isn’t ‘latest-and-greatest’. It’s been around for ages. Nothing new and exciting about it. The W3C recommendation was published in 1999. Not exactly new I’d say.
Secondly: Accessibility matters. If you don’t understand why this matters then I seriously hope you’ll never go blind or otherwise become disabled in a way that limits the ways in which you can use a computer.
Third: A web developer who knows his/her job and does a webpage RIGHT takes no more time and costs no more money than a clueless person doing it all wrong. There’s no potential waste of money at stake here.
In short, for a multi billion dollar company I really don’t see any compelling reason to produce bad frontend code. Google’s technologies are usually pretty amazing. I found it strange that they’re seriously lacking in this one department. In my opinion it kind of puts a bad smell on otherwise really great things.
I’m sure it has some reason but I’m also quite sure the reason is NOT that the above is all B.S.
But of course everyone is entitled to their own opinion!
At 12 December '07 - 06:45 Max wrote:
I am kinda curious as to what you let Cynthia check. Section 508 is fine according to her (it?), it’s (at least) WCAG 2 that cause it to fail. In fact, when I check Yahoo! and Live against 508, the only one to fail is Yahoo!...
Care to do a take on Live and Yahoo! as well? Possibly other sites as well; I think it’d be just as great a series as the Redux articles by Andy.
At 12 December '07 - 07:55 Marco wrote:
Both http://www.google.com/ and http://search.yahoo.com/ pass Section 508.
On WCAG both Google and Yahoo have issues:
Yahoo only on some small things (meta tags mainly).
Google fails on things I consider a bit more important:
10.5 Until user agents (including assistive technologies) render adjacent links distinctly, include non-link, printable characters (surrounded by spaces) between adjacent links.
Failure – Anchor Element found at Line: 2, Column: 1231 is directly adjacent to the Anchor element that precedes it.
Failure – Anchor Element found at Line: 2, Column: 1308 is directly adjacent to the Anchor element that precedes it.
...
...
...
and
12.4 Associate labels explicitly with their controls.
Rule: 12.4.1 – Identify all non-hidden INPUT elements that do not have an explicit LABEL association.
Failure – INPUT Element, of Type TEXT, at Line: 2, Column: 3362 in FORM Element at Line: 2, Column: 3151
About taking a look at other pages than this one from Google:
I took on Google simply because it’s by far the most used one on the planet. I suppose I could take on any page from any major player and find issues. The primary point of this post isn’t ‘bashing Google’. The point is that I wonder why large corporations in general seem to pay so little attention to proper frontend web development.
As an article like this is quite a lot of work I’m not sure if I’m gonna take on another major page in the near future. It’s probably better to go for an entirely different subject. But maybe. Never say never.
At 12 December '07 - 12:09 your name wrote:
google.com only has 13.
At 12 December '07 - 12:31 Marco wrote:
I knew this was coming! I’m aware of the issues these over 3 year old templates have. Over time, some issues slip in. It’s a very complex page and I’m not getting paid to maintain them. It does serve as a reminder to do a bit of springcleaning on them though so thank you very much for pointing this out to me.
Other than that, I suppose you’re trying to tell me I have no right to say the things I said in my post because there’s some validation errors on this page. I honestly don’t think these pages have any relation to the contents of this post. If you want to compare something then compare my version of the page to the original.
But again, thanks for pointing out that some minor problems slipped into my templates over the years!
At 12 December '07 - 12:43 Kamil Szot wrote:
All CSS layouts must rely on hacks, and hacks tend to break down in new versions.
And it maintains layout, CSS or no CSS, as noted in the article, so I can go there with the links browser and still see a nice familiar webpage, not some unordered lists, headings and paragraphs.
At 12 December '07 - 13:17 Max wrote:
Really, Kamil… Anyone with a bit of CSS knowledge can take 1 look at google.com and formulate some simple, future-proof, non-hack-using CSS rules for it. Lots — if not most — CSS does not require any hack to be completely cross-browser, even in IE.
All this “Google Is Doing It This Way Because It Works!” — that line’s been used in a couple of comments, now — is utterly silly. If everyone thought like that we wouldn’t have had Google in the first place, because back then, Altavista and Yahoo worked, too.
Have fun holding on to web 0.3, please excuse us while we move on.
At 13 December '07 - 03:18 Anne Helmond wrote:
I loved the little touches in your take, such as the ‘more’ ;)
At 13 December '07 - 04:04 Tobias wrote:
“Doing stuff (or actually not doing stuff) because it works” really kills innovation. If we all refused to change things because they ‘simply’ work, we’d probably still be walking around with huge ghetto blasters on our shoulders instead of iPods in our pockets (and I doubt we’d have ghetto blasters, but my point is clear). And changing the page in this case should make sense for reasons other than complying with standards. On my file system Marco’s page is a byte smaller than Google’s. And a byte multiplied by the number of hits (a lot, take that for granted) equals a lot of bytes, equals a lot of bandwidth, equals … (ah well, you get the point).
There was something else in the post that grabbed my attention. It was ‘the lucky few’ that made me wonder. It seems to me this is a very under-developed piece of functionality. In fact, all the lucky button does is direct you to the highest-ranked page, not taking into account context or whatever.
I think the goal of a user typing a search term into the Google search box is not to get a list of where he might find information on his subject, but to actually go to the page where he can find the information needed. This is exactly what the lucky button does: take a user to a page (albeit a best guess determined by rank) where he can find what he’s looking for. If Google can develop a best guess that takes into account the context of the search term, ranking, search history and such, the lucky button becomes a genuine find-and-go button. Google has taken a first step by letting this man (http://video.google.com/videoplay?docid=8493378861634507068) do a presentation on his ideas about user interfaces. The full 25 minutes are worthwhile.
At 13 December '07 - 04:57 Kamil Szot wrote:
Actually Yahoo and Altavista didn’t work. I still remember horror of searching for information back then. I remind myself why Google is the most visited page on the web by visiting from time to time other search engines. They still don’t work.
Maybe you (probably not anyone) could do, no hack, future proof layout of most visited page but will the layout display correctly in links (or lynx) browser ? Will the layout display correctly on some exotic platforms?
There are innovations and there are fads. Innovations help people achieve things more easily. Fads make people pay a lot of money and jump through hoops just to stay trendy. I’m not claiming that CSS is just a fad. CSS IS extremely useful, but the tons of elusive experience and knowledge of CSS hacks you need for doing simple things in pure CSS layouts.... I don’t know. I don’t see it making anyone’s life easier.
W3C guys just haven’t invented any proper way of doing grid layout. That’s too bad because grid is what people need. HTML tables and css floats are just things that are most easily abused to achieve desired goal.
Preaching standards is good. But violently preaching adhering to not complete, not practical, poorly implemented, and inconvenient standards does not seem so good to me.
You said: “Have fun holding on to web 0.3, please excuse us while we move on.”
Technology is not a railroad where you go only forward. Technology is evolution, where you try new things but also keep things that work in their niche. Do you happen to know how long ago nature invented sharks? Did they step aside when the rest moved onward? No. They are still alive and eat lots of other things that are so many millions of years more advanced than them.
Sorry for my english. Hope my message can be understood.
At 13 December '07 - 09:14 Max wrote:
The whole point of CSS in combination with sensible markup — and that combo is key, here — allows the page to degrade gracefully. Whether the layout will display correctly in lynx is moot, since lynx doesn’t do layout in the first place. Lynx is text-mode, so using sensible markup makes even more sense there than it does on “regular” browsers. The same goes for search engines (granted, Google’s front page doesn’t really need the SEO, but other pages do) and visually disabled people.
You have Microsoft to thank for that, mostly. Doing simple things in layouts in CSS is usually a pretty straightforward business. Getting it to work in IE may be a bitch, but you can’t blame CSS or the W3C guys for that. The specs are mostly clear enough — it’s the browsers that should follow them.
I don’t really understand this, which might be the language barrier. My definition of a grid layout (which happens to follow the one on Wikipedia) is perfectly feasible in today’s HTML/CSS. No abuse, just marking up your page correctly and adding proper CSS. You can even get by without using “hacks” to get IE to display it as it should.
If you’re talking about multi-column layouts, yes, that’s harder to do (even though it will see great improvements in HTML5), but really, is that something you want? Computer screens and pieces of paper make for completely different reading environments.
Sharks may still be alive and eating, but their numbers ARE declining, mostly thanks to one species that is, in fact, more advanced: humans. The sharks may not like it, but there’s your fact. If humans want to eat shark fins, they WILL eat shark fins, and the fact that sharks have been around for millions of years does nothing to prevent that from happening.
This railroad you mention? Actually it does only go forward. Evolution works towards more efficient / higher survival rate. As browsers evolve into beasts that interpret and display markup according to the rules, sites that keep relying on old code will slowly go extinct, and the web will be a better place because of it.
At 13 December '07 - 12:28 Pam wrote:
At 14 December '07 - 06:41 Marco wrote:
I’m happy to announce that this page validates again after some minor spring cleaning.
Glad to get that out of the way!
At 14 December '07 - 06:57 Max wrote:
The problem is, Google is everywhere, and that kind of brand recognition tends to go hand in hand with fanboy-ism. If Marco would’ve picked apart Apple’s front page, there’d be hordes of Mac-fanboys that couldn’t look beyond the URL of the page discussed, and concentrate on the meat of the matter. Same goes for microsoft.com.
I typed that without checking it, but now I did, and (no surprise): Cynthia does not approve of apple.com either. Not at WCAG 3 anyway, 508 passes. Heck, it doesn’t even get through simple HTML validation! Boo, Apple! (and that, before I get flamed to a crisp, is coming from a RABID Mac-fanboy).
So yeah, let’s stick with the REAL topic: a site that gets as many eyeballs as that, made by a company that is such a large player on the interwebs, should be technically sound. No excuses.
At 14 December '07 - 12:05 Max wrote:
At 16 December '07 - 02:30 Kamil Szot wrote:
> Whether the layout will display correctly in lynx is moot, since lynx doesn’t do layout in the first place.
I overshot myself claiming that most visited webpage displays in “lynx” correctly. Actually
$ lynx http://www.google.com/
gives me “400 Bad request error”. I almost never use “lynx” (instead I use “links”) so I definitely should check this before claiming anything. ... but
$ links http://www.google.com/
shows google page with proper layout without any degradation.
> You have Microsoft to thank for that, mostly. Doing simple things in layouts in CSS is usually a pretty straightforward business. Getting it to work in IE may be a bitch, but you can’t blame CSS or the W3C guys for that. The specs are mostly clear enough — it’s the browsers that should follow them.
I am thanking Microsoft every hour that I waste to make things look good. Doing simple layouts in CSS is easy when you decide how the page should look. But if you get a design from an artist in .psd format and you have to reproduce every detail pixel perfect, and the designer has specific ideas about how the page should behave when the content is too short or when the browser window is being resized, things start to get messy. I don’t like to tell people something can’t be done unless I am pretty sure it can’t be. So this messy scenario is a recurring pattern in my life.
Unfortunately in some areas specs are not clear at all. Especially when it comes to height: 100%
And even if specs are perfectly clear they still can be bad if they don’t cover some desired scenarios. And such bad specs lead to implementation differences and implementation-specific extensions. So I can’t agree with you that W3C is without blame.
> My definition of a grid layout (which happens to follow the one on wikipedia), is perfectly feasable in today’s HTML/CSS.
Sorry for using imprecise term without explaining what I meant. I come from programming background so when I hear “grid layout” I think about something like this http://java.sun.com/j2se/1.4.2/docs/api/..
Full height, full width, split into rows and columns. With ability to place items in cells. With ability to nest new grid layouts inside cells to any necessary depth. Plus ability to specify what should be done when size of contents of cell exceed cell size.
With such tool a lot of things can be done easily. Closest thing to this in html is a table with width and height set at 100%. And IMHO that’s the reason why tables were so abused. Abusing floats to do such thing is very hard (if not impossible). I haven’t seen any solution to this. If you know some please point me to it, I’ll be constantly thanking you in my head
> Sharks may still be alive and eating, but their numbers ARE declining, mostly thanks to one species that is, in fact, more advanced: humans. The sharks may not like it, but there’s your fact.
And I thought I could just scare you away with sharks!
Humans are exceptions to lots of nature rules. But surely there are thousands species more evolutionary advanced than sharks that are still no match for them. I haven’t heard about chimps eating shark fins.
I heard funny thing lately. Some DNA research recently showed that actually chimps are more evolutionary advanced than humans. It looks that It’s not about how far you went, but whether you took the right path on the last few intersections.
And even humans are still being killed by such simple and ancient natures inventions as viruses, bacteria, and parasites.
I’m not Google fanboy, and I don’t hold anything google does as sacred. I guess I just took an opportunity to blow off some steam on subjects such as CSS vs tables, idealistic vs pragmatic, telling other people what to do vs giving them tools and letting them decide for themselves.
At 29 December '07 - 07:53 Rimantas wrote:
One thing I would like to point out: there IS a reason for this page not to have any doctype. Very valid reason – this page does not follow any standard! So, what doctype do you put on the page, which does not conform with any markup standard? Correct, none. Which Google did.
And which is the right thing to do. By the right thing to do I mean “not putting any doctype for pages you know won’t validate”.
Trying to validate such pages makes no sense: first, they don’t claim to conform to anything, second, how do you choose what to validate for?
Now, let’s pretend Google meant to use HTML4 and just forgot to put the doctype on. Still, where did you get the idea that attributes can be unquoted only if they contain numbers? It is recommended to always quote attributes, but it is allowed not to do so if they contain only alphanumeric characters, hyphens, underscores, periods and colons (section 3.2.2 of the spec).
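The rule from section 3.2.2 is easy to turn into a quick check. This is a minimal sketch with a made-up function name; the sample values are attribute values quoted earlier in the post.

```javascript
// HTML 4.01, section 3.2.2: an unquoted attribute value may contain only
// letters, digits, hyphens, periods, underscores and colons.
const UNQUOTED_OK = /^[A-Za-z0-9._:-]+$/;

// Hypothetical helper: true if the value may be written without quotes.
function mayOmitQuotes(value) {
  return UNQUOTED_OK.test(value);
}

// Attribute values taken from the markup quoted in the post.
const samples = ['right', 'guser', '3', '#ffffff', '100%'];
samples.forEach(function (value) {
  console.log(value, mayOmitQuotes(value));
});
```

By that rule, align=right and topmargin=3 may go unquoted (setting aside whether the attribute itself exists), while bgcolor=#ffffff and width=100% may not, since # and % are outside the allowed set.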
Regarding accessibility, moving links to the bottom of the source is probably the best you did here. Accesskeys and legend are most likely overkill with doubtful benefits, if any. And, different from markup, do not rely too much on automated tools for checking accessibility stuff.
CSS and markup-wise there is still room to squeeze more. META “keywords” on THIS page? Come on… And what’s with the case of classitis: LI class=“pipe”?
If one wants to get rid of every unnecessary bit, then do not forget that HTML4 allows omitting some end tags, and some tags altogether.
Unquoted attributes (which ARE valid) with single-letter IDs, class names, etc. will allow saving even more…
At 03 December '09 - 22:25 Geomancer wrote: