Wondering Where The Missing URLs Are From Google’s Index?

There's been a lot of talk in search circles over the past few days about the size of Google's index. Like those capsules that promise to extend a certain part of the male body, apparently it's all about the size of the index, not how you use it.

Dan Lewis over at the new Wikia Search has completed an investigation into where the "missing" URLs are relative to the Google index. Google says there are 1 trillion unique URLs out there, but I wonder how many of them are duplicate content and the like. By searching for blog hosting services on Google Blog Search and then on the main Google Search index and comparing the result counts, Dan was able to locate 7 billion URLs that are not in the main Google index. Dan notes that while this certainly doesn't account for all of the missing URLs, it's a start.

From my perspective, having more indexed URLs doesn't necessarily mean better-quality results.

Here's a search for WordPress.com, Blogspot and LiveJournal on Google Blog Search:

And here are the same domains on the main Google Search index:

