by David Harry
Posted on February 28, 2007
Never let it be said that we here at Site-Reference are behind the times. On the contrary, last fall ‘Phrase Based Optimization’ was published here to some warm, if somewhat confused, reviews. In recent weeks there has been increasing interest in these technologies, so let’s revisit the topic, and I shall leave some further resources for those who are interested in learning more.
What’s important to understand is that ‘Phrase Based Indexing and Retrieval’ (PaIR) is a method of indexing on which Google has filed numerous patents over the past year or so. It should be mentioned that simply because someone applies for a patent doesn’t necessarily mean they use it. Still, I would surmise that the sheer number of filings indicates more than a passing fancy on Google’s part.
The Big Picture
What seems to confuse people the most is getting their heads around a ‘new’ or ‘separate’ methodology. Most of the existing methodologies were keyword based and powered by link love. The PaIR model not only adds depth to the ability to deliver higher quality ‘predictive’ results, but also appears to have better properties for identifying and dealing with Web Spam.
We can also think of this in terms of a reverse onion analogy: PaIR would not necessarily replace existing methods so much as be layered onto the existing model. Certainly, over time, the dials could be slowly adjusted to give more prominence to the PaIR aspects, but there ‘could’ be life on other planets as well, right? Let’s play it by ear for the moment.
What is ‘Phrase Based Indexing and Retrieval’?
Hopefully the name gives it away. It is a methodology based on the relational value of phrases on a given page and website. From the horse’s mouth:
"An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases."
“To identify phrases that have sufficiently frequent and/or distinguished usage in the document collection to indicate that they are ‘valid’ or ‘good’ phrases.”
It seeks to identify valid (actual/real) phrases in a given document collection (or web pages, in our case). The goal is to classify each potential phrase as either “a good phrase or a bad phrase” depending on its usage and frequency, and then to use those ‘good’ phrases to predict the usage of other ‘good’ phrases in the collection of web pages.
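To make the idea of ‘good’ phrases predicting other phrases a little more concrete, here is a toy sketch in Python. It is not Google’s method, just a minimal illustration under assumptions of my own: candidate phrases are simply runs of one or two words, a phrase counts as ‘good’ if it appears in at least `min_docs` documents (the ‘distinguished usage’ signal is ignored), and one good phrase ‘predicts’ another when the two turn up together in documents more often than chance would suggest, measured with a simple lift ratio. The function names and the `min_docs`, `max_len` and `min_lift` thresholds are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def candidate_phrases(tokens, max_len=2):
    """Return every run of 1..max_len consecutive words in one document."""
    phrases = set()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrases.add(" ".join(tokens[i:i + n]))
    return phrases


def good_phrases(docs, min_docs=2, max_len=2):
    """Keep phrases found in at least min_docs documents (hypothetical rule).

    Only document frequency is used here; 'distinguished usage' is ignored.
    """
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(candidate_phrases(tokens, max_len))
    return {phrase for phrase, df in doc_freq.items() if df >= min_docs}


def related_phrases(docs, good, min_lift=1.5, max_len=2):
    """Return pairs of good phrases that co-occur more often than chance.

    lift = observed co-occurrence rate / rate expected if independent
         = together * n_docs / (df(a) * df(b)); min_lift is hypothetical.
    """
    n_docs = len(docs)
    single = Counter()  # number of documents containing each good phrase
    pair = Counter()    # number of documents containing both phrases of a pair
    for tokens in docs:
        present = sorted(candidate_phrases(tokens, max_len) & good)
        single.update(present)
        pair.update(combinations(present, 2))
    return {
        (a, b)
        for (a, b), together in pair.items()
        if together * n_docs / (single[a] * single[b]) >= min_lift
    }


docs = [
    ["president", "white", "house"],
    ["president", "visits", "white", "house"],
    ["white", "paint"],
    ["house", "paint"],
]
good = good_phrases(docs)
print(related_phrases(docs, good))  # {('president', 'white house')}
```

In this tiny collection ‘president’ ends up tied to ‘white house’, while ‘house’ on its own does not, because it also turns up next to ‘paint’. Raising `min_lift` makes the relationships stricter; a real system would need far more careful phrase extraction and a much larger collection.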
Those are the basics. To continue the journey, here are some resources:
SR Resources
Original SR Article
Site Reference Forum thread
Related Patents
Phrase-based searching in an information retrieval system
Multiple index based information retrieval system
Phrase-based generation of document descriptions
Phrase identification in an information retrieval system
Detecting spam documents in a phrase based information retrieval system
PaIR Articles on Reliable-SEO
Phrase Based Optimization
Phrase Based Indexing and Retrieval II
Spam Detection in a PaIR system
Phrase Based Personalization of Search
Other Resources and Coverage
Dave's GoogleBomb conspiracy theory
Article by Bill Slawski
SEO Black Hat
Brother Mad Hat
Early Search Engine Watch
Article on Search Engine Land
So What Does it Mean to You?
To be honest, if you're an average everyday webmaster creating unique content, probably not much. You see, Google has a fascination with high quality RELEVANT results. Much of Phrase Based Indexing and Retrieval would be an additional layer on top of the existing ranking/weighting we know and love in the world of SEO. Some more thought can go into all facets of one's organic search marketing, from link building to page naming conventions, with 'PaIR' in mind, but no major overhaul would be required.
In the end I would hope to see a difference in the actual quality of the SERPs. An added layer of spam detection won't hurt either. If anything, try to think 'less' about phrase density and similar strategies. Should Google be utilizing a 'PaIR' model, they are trying to adapt to legitimate quality content; they are coming to you, in a sense.
That's a Wrap
So now you know all that we here at Site Reference know so far on the topic. There certainly seems to be a lot of 'smoke' around it. And you know what they say: 'Where there is smoke'...
About the Author: David Harry provides affordable SEO services for Reliable SEO, also authored the SEO Handbook, and rants on his SEO blog, the Trail of The Fire Horse.