My news story on recommender systems (subscription required) is in the August issue of Communications of the Association for Computing Machinery.
Even if you don't recognize the phrase, if you do much of anything on the web you've dealt with "recommender systems": they're the programs that let Amazon, L.L. Bean, YouTube, and pretty much everyone else offer "suggestions" about other items that might interest you.
The timing of this story turned out to be very good. A major driver for this field in the past couple of years has been the Netflix Prize, which offered a million-dollar reward to a team that could beat, by 10%, Netflix's algorithm for predicting movie preferences. To lure researchers, the company offered access to its enormous database of customer preferences, but stipulated that the winners must make their techniques publicly available. The openness of the competition has attracted thousands of competitors, as well as stories in Wired, the New York Times Magazine, and IEEE Spectrum. It's taken a while, and some researchers were even speculating that Netflix had some secret knowledge that 10% was unreachable, but in the last month a couple of different teams have finally inched past the goal. (As of this writing, though, the official winner hasn't yet been announced.)
What makes my story gratifying, though, is that it goes beyond the prize to put these systems in a larger context. The 10% goal is based on the typical (root-mean-square) discrepancy between the predictions and the actual preferences reported by customers in a secret test batch. But predicting things that people will like is only a beginning. What people really need is pleasant surprises--items that they wouldn't have found on their own. In many cases, this means that the most useful predictions must make mistakes. This is a different goal from that of traditional "classifiers," which trade off false positives against false negatives. (In October in New York, the ACM is sponsoring a conference devoted entirely to recommender systems.)
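For readers who want to see what a root-mean-square discrepancy and a "10% better" score look like in practice, here is a minimal Python sketch. The ratings are made up for illustration, and the function names `rmse` and `relative_improvement` are my own; this is not Netflix's scoring code, just the standard formula applied to a toy example.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def relative_improvement(baseline_rmse, candidate_rmse):
    """Fraction by which the candidate's RMSE is lower than the baseline's."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse

# Made-up star ratings (assumption: a 1-5 scale, four customer/movie pairs).
actual = [4, 3, 5, 2]
baseline = [3, 3, 4, 3]          # hypothetical existing predictions
candidate = [3.5, 3, 4.5, 2.5]   # hypothetical challenger predictions

baseline_score = rmse(baseline, actual)
candidate_score = rmse(candidate, actual)
gain = relative_improvement(baseline_score, candidate_score)

print(f"baseline RMSE:  {baseline_score:.4f}")
print(f"candidate RMSE: {candidate_score:.4f}")
print(f"improvement:    {gain:.1%}  (the prize threshold was 10%)")
```

Note that a score like this rewards only accuracy on known preferences. It says nothing about whether a recommendation was a pleasant surprise, which is why a lower error is not the whole story.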
Another key issue is the user interface, including how data is gathered and how recommendations are presented. If Amazon tells you that customers who bought the glass tumbler you ordered "frequently bought" a pet nail grooming rotary tool (as described recently in Consumer Reports), it makes a funny story. If they told you that the pet tool was especially selected for you by their highly tuned software, you'd likely conclude they were delusional.
One of the fun applications is in music. As part of my "research" I tried out the internet radio station Pandora. I seeded the station with some of my quirkier art-rock music from the early 70s, like Gentle Giant, and was really blown away when it played other music from the distant corners of my collection that I didn't think anyone else knew about. Interestingly, Pandora uses a large team of musicologists to classify tunes, and does not rely exclusively on users' preferences.
The million dollars that Netflix ponied up is a hint of how commercially important these systems are. Even when we're not aware of it, they will shape more and more of our technological experience, both on the web and with mobile devices. Stay tuned.