by Mark Daoust
Posted on March 22, 2006
Valid HTML is important, right? Every day, the website owner community tells us that valid HTML is the 'right' way to build a website. Valid HTML allows for greater accessibility and cross-browser compatibility, and it may even help your search engine rankings.
But then again...
I decided to test whether valid HTML can actually help your rankings in Google. Many website owners say that their non-compliant websites do well in Google while their compliant sites may not be doing as well. The implied suggestion is that Google either does not care about errors in HTML or, more extreme, that Google prefers non-compliant websites, a charge that would certainly be puzzling if it were true.
A Sneak Peek at the Results - Does Google Prefer Invalid HTML?
The results of my test surprised me. Not only did I find that Google apparently gives no preference to sites with valid HTML, Google actually seemed to prefer the sites with blatant errors in their code.
Think about this: if Google does give preference to websites with errors in their HTML, then, as far as SEO is concerned, it would actually benefit you to build errors into your website on purpose. I am not ready to accept that conclusion, but the results are what they are. With these results staring back at us, I thought it necessary to publish the methodology and results and open the topic up for discussion.
Setting Up the Study
As we all know, Google does not rely on any one factor to rank a website. As a result, a website can be poorly optimized in one respect but still reach the top of the rankings because it is well optimized elsewhere. This makes testing individual aspects of SEO tricky.
To determine whether valid HTML actually contributes to a site's ranking, even in the smallest way, I had to control for every other aspect that could influence how a website ranks. To do this, I set the following requirements:
- The keyword density had to be identical
- The page size should be identical (in case it affected crawlability)
- The competing websites should be newly registered domains, all registered on the same day (removing age of site from the equation)
- The competing websites should be hosted on the same server
- Inbound links should be identical and from the same site so as to avoid different link weightings
- The links should have identical anchor text
- Since all the links would come from the same page, link order might affect rankings; this would need to be neutralized
- The content should be identical so as not to influence the rankings in any other way (possible poison pill - I'll explain later)
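The first two requirements above (identical keyword density and identical page size) can be checked mechanically before publishing. Below is a minimal Python sketch, assuming keyword density means occurrences of the phrase per 100 words of text with tags crudely stripped; SEO tools define density in different ways, and the function names and sample snippets are hypothetical, not the tooling used in this study.

```python
import re


def page_size(html: str) -> int:
    """Size of the page in bytes, as it would be served in UTF-8."""
    return len(html.encode("utf-8"))


def keyword_density(html: str, phrase: str) -> float:
    """Occurrences of `phrase` per 100 words of text (assumed definition)."""
    # Crude tag stripping: fine for a sketch, not a real HTML parser.
    text = re.sub(r"<[^>]*>", " ", html)
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return 100.0 * hits / len(words) if words else 0.0


def controls_match(valid_html: str, invalid_html: str, phrase: str) -> bool:
    """True when both page variants have the same byte size and keyword density."""
    return (page_size(valid_html) == page_size(invalid_html)
            and keyword_density(valid_html, phrase) == keyword_density(invalid_html, phrase))


if __name__ == "__main__":
    # Hypothetical snippets: swapping "class" for the invalid attribute "klass"
    # introduces an error without changing the byte count or the visible text.
    valid = '<p class="intro">relpepiblus lost is a test</p>'
    invalid = '<p klass="intro">relpepiblus lost is a test</p>'
    print(page_size(valid) == page_size(invalid))              # True
    print(keyword_density(valid, "relpepiblus lost"))          # 20.0
    print(controls_match(valid, invalid, "relpepiblus lost"))  # True
```

Equal-length substitutions like this are one way to break validity while holding both controls fixed; anything that adds or removes characters would need a compensating change elsewhere on the page.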
The only difference between the two sites would be that one was built with valid HTML and the other with obvious, significant errors in its HTML.
My initial thought was to have just two competing websites, both trying to rank for the same nonsense keyword (one that returned no results in Google at the time). The idea was to get a snapshot of how Google initially ranked the websites. But there was a problem.
On-Page Links Are Not Necessarily Equal
In going over the requirements for the study, I realized I could not guarantee that Google would give equal weight to every link coming from my link partner. The links were all on the same page, and they had to use exactly the same anchor text. If Google saw two links with identical anchor text, it seemed reasonable to surmise that it might give more weight to the first one it discovered.
The answer to this small dilemma was to create two sets of competing sites. Each set would have two websites competing for the same keyword, one with valid HTML and the other with invalid HTML. When linking to these four sites, I would alternate the link order. For example:
Keyword 1:
Link to Site with Invalid HTML
Link to Site with Valid HTML
Keyword 2:
Link to Site with Valid HTML
Link to Site with Invalid HTML
Because the invalid site is linked first in one set and the valid site is linked first in the other, any effect of link position within the page cancels out across the two sets.
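The alternating link order above is a simple form of counterbalancing. The following minimal Python sketch shows how such an order can be generated and checked; the site labels are hypothetical placeholders (the article does not say which domain was paired with which), and the function names are illustrative only.

```python
from collections import Counter


def alternate_link_order(groups):
    """groups: list of (keyword, invalid_site, valid_site) tuples.

    Even-numbered groups link the invalid site first and odd-numbered groups
    link the valid site first, so neither variant always gets the top slot.
    """
    plan = []
    for index, (keyword, invalid_site, valid_site) in enumerate(groups):
        if index % 2 == 0:
            links = [("invalid", invalid_site), ("valid", valid_site)]
        else:
            links = [("valid", valid_site), ("invalid", invalid_site)]
        plan.append((keyword, links))
    return plan


def first_slot_counts(plan):
    """Count how often each variant ('valid' / 'invalid') occupies the first link slot."""
    return Counter(links[0][0] for _, links in plan)


if __name__ == "__main__":
    groups = [
        ("relpepiblus lost", "invalid-site-a.example", "valid-site-a.example"),
        ("startnipol pin", "invalid-site-b.example", "valid-site-b.example"),
    ]
    plan = alternate_link_order(groups)
    for keyword, links in plan:
        print(keyword, [site for _, site in links])
    # With two groups, each variant holds the first slot exactly once.
    print(first_slot_counts(plan))
```

With an even number of groups the first-slot counts come out equal, which is the property the alternating scheme relies on.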
The Domains and Keywords
I chose four domains for the project, each with a nonsense name: Iggelomy, Pucroins, Gohthone.com, and Hontihes.com. These were split into two groups, the first targeting the keyword "Relpepiblus lost" and the second targeting "startnipol pin".
Next I needed to create content for the sites and introduce errors on one site in each set to invalidate its code. For the content, I made my way over to Gordon Mac and downloaded one of the free CSS templates he offers. I then modified the template to fit my needs and wrote text discussing the project, making sure to use the targeted keyphrase throughout.
Once the content had been created, I began introducing errors into the HTML. Rather than create errors at random, I had to make sure the page sizes stayed exactly the same and the keyword density did not change at all. I go over some of the changes in more detail on the test sites themselves. In the end, I was able to introduce legitimate HTML errors (invalid attributes, unclosed tags, an incorrect doctype declaration, and a few others) without changing the page size or keyword density.
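Unclosed ("open ended") tags are the easiest of these errors to detect mechanically. The sketch below uses Python's standard-library `html.parser` to report unclosed and stray end tags. It is a rough stand-in, not the W3C Markup Validator: it assumes every non-void element needs an explicit close tag (XHTML-style strictness, even though HTML allows omitting some, such as `</p>`), and it does not check attributes or the doctype. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Collects unclosed and stray tags; a rough check, not a validator."""

    def __init__(self):
        super().__init__()
        self.stack = []     # (tag, line) for every currently open non-void element
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing syntax such as <br/> opens nothing

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if tag not in (open_tag for open_tag, _ in self.stack):
            self.problems.append(f"stray </{tag}> at line {self.getpos()[0]}")
            return
        # Close the matching element; anything opened after it was never closed.
        while self.stack:
            open_tag, line = self.stack.pop()
            if open_tag == tag:
                break
            self.problems.append(f"unclosed <{open_tag}> opened at line {line}")

    def report(self):
        leftovers = [f"unclosed <{tag}> opened at line {line}" for tag, line in self.stack]
        return self.problems + leftovers


def unbalanced_tags(html: str) -> list:
    """Return a list of human-readable tag-balance problems found in `html`."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.report()


if __name__ == "__main__":
    print(unbalanced_tags("<div><p>relpepiblus lost</p></div>"))  # []
    print(unbalanced_tags("<div><p>relpepiblus lost</div>"))
    # ['unclosed <p> opened at line 1']
```

Running a check like this on both variants of each page is a quick way to confirm that only the intended version carries the errors.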
The Final Step - Linking Up the Sites
I now needed a link partner. Fortunately I did not have to look far - I simply plundered my wife's blog (The Lazy Wife) for a couple of links. I don't think she even noticed - so don't tell her.
When linking up the sites, I was careful not to give preference to the sites with invalid markup, or vice versa. The final setup of the experiment looked something like this:

Finally - The Results
It took only a few days for Google to crawl the pages and add them to its index. Once there were results for the keywords "Relpepiblus lost" and "startnipol pin", I eagerly looked to see what Google had ranked first and what it had placed lower.
I was a little bewildered. I was honestly expecting to find that Google would put the site with valid HTML above the site with invalid HTML in at least one of the examples. What I saw was different.
Not only did Google rank both sites with obviously broken HTML higher, it even refused to include one of the sites with valid HTML (pucroins.com) at all!
I figured there had to be something wrong - so I waited a little while...
After Waiting
Here are the screenshots after waiting a few days:
Screenshot of Relpepiblus lost

Screenshot of Startnipol Pin

There was definitely a change, but nothing too significant. My wife's blog, The Lazy Wife, was now taking over the top rankings (which was expected, since it is more established). Google had still not given any preference to either site with valid HTML, and it was still ignoring pucroins (a site with valid HTML) altogether.
To recap: Google actively ranked the websites with invalid HTML higher than the websites with valid HTML. Google even refused to rank one of the valid HTML websites altogether.
This seems to go against Google's own Webmaster Guidelines, which instruct webmasters to check their pages for HTML errors.
Some Definite Conclusions
We can draw a few interesting conclusions from this study. The first, and most important, is that Google apparently gives no weight to valid HTML. Beyond that, Google apparently does not penalize invalid HTML at all. The study would almost lead us to believe that Google actually rewards invalid HTML with a higher ranking.
As a secondary finding, it seems that on-page optimization is no match for an established website. After just two days in the rankings, all of our test websites lost their top positions to The Lazy Wife. This happened even though our test websites had far more on-page optimization for the keywords in question than The Lazy Wife did. Although The Lazy Wife is itself fairly new, it was far more established than the test sites, and so it won in the rankings.
Some Not So Definite Conclusions
I am not ready to admit that Google actually gives preference to invalid HTML, even though the results seem to point in that direction. The idea that Google actively rewards websites for putting errors into their code simply does not make sense.
It is possible, however, that some other factor we are not seeing tends to accompany invalid HTML. In other words, it may not be the improper HTML that causes the sites to rank higher, but some other factor that we cannot see.
Another possibility is that invalid HTML simply 'fits in' better with the profile of most reliable websites. The fact is, very few high-profile sites can pass muster with a validator. Could Google be discounting sites with valid HTML as 'too good to be true'? Is valid HTML a form of over-optimization?
I would lean towards disagreeing with this, but it is a possibility which should be discussed.
A Parting Shot at Google - and Compliments to MSN Search
Although Google does not seem to reward site owners for building a site with valid HTML (a goal of many well-respected webmasters), MSN seems to be flawless in this respect. Out of curiosity, I checked the results for "relpepiblus lost" and "startnipol pin" on MSN Search and found that MSN not only ranked the sites with valid HTML higher but also kicked the sites with invalid HTML out of its results.
Screenshot of MSN's Results

This would be consistent with the fact that MSN's search result pages validate, while Google's do not. MSN has a long way to go, but they seem to have gotten this part of their engine right.
Looking For Explanations
The results of the study say one thing, but common sense says another. Is it possible that Google is somehow biased toward sites with errors in their HTML? It does not have to be a philosophical bias; could there be a technical one?
Or, was there a problem with the study itself? Were there too few examples to draw any conclusions at all?
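One rough way to frame the "too few examples" question is to ask how often the observed outcome would happen by chance alone. Assuming, purely for illustration, that each keyword pair is an independent coin flip over which site ranks first, the chance that the invalid-HTML site wins at least k of n pairs is a binomial tail probability. This Python sketch computes it; the function name is hypothetical, and the model ignores everything else about how real rankings work.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more 'wins' in n independent pairs."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Two keyword pairs, with the invalid-HTML site ranked above the valid one in both.
    print(prob_at_least(2, 2))  # 0.25

    # How many pairs would a clean sweep need before chance drops below 5%?
    n = 1
    while prob_at_least(n, n) >= 0.05:
        n += 1
    print(n)  # 5
```

Under this coin-flip assumption, a two-for-two result shows up a quarter of the time by chance; a clean sweep would need at least five pairs (1/32, about 3%) before chance alone became an unlikely explanation.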
I am looking for your ideas on this study. I started a thread on the Site Reference forums to discuss this. I would be interested to hear your feedback.
Mark Daoust is the owner of Site Reference
WizeOneNot writes: If Google gave preference of valid HTML over the opposite, they'd have less control over their marketing. Web Developers/Designers will have to create valid code regardless as they were taught to do so while learning the craft.
Whether HTML is valid or not is a small deal. You, at least, know something many do not. Use it to your advantage.
17:05:06 Wed Jun 17 2009 CDT
Mike writes: Thank you for running this test. I have been curious about this for quite some time but didn't have the time and resources available to run a full-scale test like this.
I'm hoping that Google will come around and actually care about AND use valid (x)html...
16:10:49 Wed Jun 10 2009 CDT