Sorry to geek out on everyone here, but I'm sitting in a really cool talk with Greg Linden, who is guest lecturing at a data-mining class here at Stanford. Greg is the guy who initially created Amazon's recommendation engine, and he is talking about the general field of discovery and recommendation engines. Quick summary: search engines help you find what you already know you want. Discovery engines show you what you might want. Big examples of companies using discovery are TiVo, Netflix, StumbleUpon, Pandora, and Amazon. As some of you may know, I'm big into recommendation and discovery engines now, and I'm working on a startup in the space.
Apparently, Amazon has tried several recommendation schemes: item-to-item, content-based, and clustering. Of these, item-to-item (people who bought this also bought that) works best for Amazon. (Of course it's not literally item-to-item, Greg explained, because then you run into the "Harry Potter Problem": everybody who bought x probably also bought Harry Potter.)
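For the fellow geeks, here is a rough sketch of the idea in Python. To be clear, this is not Amazon's code and Greg didn't show any; the function name `similar_items`, the toy `baskets` data, and the choice of cosine-style normalization are all my own assumptions. It just illustrates why raw "bought this, bought that" counts let a bestseller win everything, and how dividing by each item's overall popularity is one common way to push it back down.

```python
from collections import defaultdict
from math import sqrt


def similar_items(baskets, item, normalize=True, top_n=3):
    """Rank other items by how often they were bought together with `item`.

    baskets:   dict mapping customer -> set of purchased items (toy data).
    normalize: if True, divide the co-purchase count by
               sqrt(buyers(item) * buyers(other)), a cosine-style score
               that penalizes items that nearly everyone buys.
    """
    # Invert the data: item -> set of customers who bought it.
    buyers = defaultdict(set)
    for customer, items in baskets.items():
        for it in items:
            buyers[it].add(customer)

    target = buyers.get(item, set())
    scores = {}
    for other, other_buyers in buyers.items():
        if other == item:
            continue
        overlap = len(target & other_buyers)
        if overlap == 0:
            continue
        if normalize:
            scores[other] = overlap / sqrt(len(target) * len(other_buyers))
        else:
            scores[other] = overlap  # raw co-purchase count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical purchase histories: everybody owns Harry Potter.
baskets = {
    "ann": {"Harry Potter", "Data Mining", "Machine Learning"},
    "bob": {"Harry Potter", "Data Mining", "Machine Learning"},
    "gus": {"Harry Potter", "Data Mining"},
    "cat": {"Harry Potter", "Cookbook"},
    "dan": {"Harry Potter", "Gardening"},
    "eve": {"Harry Potter", "Travel Guide"},
    "fay": {"Harry Potter"},
}

raw_top = similar_items(baskets, "Data Mining", normalize=False)[0]
normalized_top = similar_items(baskets, "Data Mining", normalize=True)[0]
print(raw_top, "|", normalized_top)
```

With raw counts, Harry Potter comes out as the top match for the data-mining book simply because everyone bought it. With the normalized score, the machine-learning book wins, which is the recommendation you actually wanted.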
One interesting point Greg brought up was a quote from Peter Norvig:
"Don't worry about the algorithm, worry about the data."
- Peter Norvig
The graph below, from a paper by Banko and Brill, is an interesting representation of the power of data. It suggests that the leaders in the recommendation space are most likely to be those who control the most data.
OK, I'd better post something non-geeky quick, lest I lose all of my readers.