Personalization is Fool’s Gold?

Over at John Battelle’s blog a debate was started by Raul Valdes-Perez of Vivisimo on the actual “returns” of investing in personalization for a search engine.

Raul’s arguments fall into two main groups . . .

Problem #2 - The surfing data used for personalizing search is weak. The data that online booksellers like Amazon use is strong: I’m paying $20 for a book and committing ten hours of my life to reading it (let’s ignore the problems with gift purchases). Surfing data involves the minimal commitments of a mouse click and a few seconds to look at a page before leaving.

Problem #3 – If the data used for inferring user profiles is the whole web page that the user visited, then it’s misleading because the user’s decision to visit the page is based on the title and brief excerpt (snippet) that are shown in the search results, not the whole page.

Problem #4 – Home computers are often shared among family members, whose surfing interests obviously diverge.

Problem #5 – Queries tend to be short. My own spouse couldn’t figure out my interests from a one or two word utterance, so how is a computer going to?

Arguments 2-5 are idiosyncratic to search engines rather than to data mining . . . i.e., the way search engines are architected, and the use cases around them, do not provide enough data points to mine meaningful data to use as a predictive variable for future behavior. These problems really only apply to small search engines like Vivisimo. MSN and Yahoo have a treasure trove of data from their portal properties and registration information that they can combine with prior search behavior to predict future search behavior (e.g., purchases on Yahoo Shopping). Google recognized this pretty early on and is trying to build out its own suite of products and a single sign-on solution. Furthermore, with its Wi-Fi initiative, Google will be on equal footing with Yahoo and MSN in owning access businesses, which will allow all of them to mine clickstream data for a user’s ENTIRE time on the web. (BTW, I’m not sure whether Yahoo or SBC have access to that data, so this is just conjecture.) With such data, Google knows not only when a person clicks through a search result, but also whether he/she added an item to a shopping cart and checked out at Amazon! Personalization for a standalone search engine is really HARD, but not if you know additional information about your searchers, as these gorilla companies do. In short, economies of scale and scope exist for internet companies in the form of data accumulation and knowledge discovery.

Raul’s first point is well taken . . .

Problem #1 - People are not static; they have many fleeting and seasonal interests. A student might intensely research Abraham Lincoln for a school project but may care nothing at all about it later. I’ll read about spectacular tragedies such as a recent fire in Paraguay that led to hundreds of deaths, but am not generally concerned with death, fire, Paraguay, or supermarkets. Seasonal phenomena like elections, the Olympics, sports leagues, etc. also lead to variable interests: I’ll follow the Olympics for the next month or so, but will pay no attention for another four years.

For many data miners, or people in KDD circles, it’s well known that personalization (personal demographic and behavioral data) is useful and indispensable for predicting user behavior. However, in the last few years another method called “occasionalization” has shown that in many instances, user behavior correlates much more strongly with an “occasion” than with historical data such as demographics or past purchases. In layman’s terms, it’s easier to interpret what you want based on what you are doing at that exact moment or a few moments prior . . . pretty obvious, eh? The problem with past data is exactly what Raul mentions . . . that user behavior is often a “response” to an external stimulus, which can’t be modeled unless we know what that stimulus is . . .

There is actually a very good article called “Seize the Occasion” (download here), published by a bunch of Booz Allen consultants in 2001 . . . the paper is very good, the data not so much, since the models they built are pretty simplistic . . . BUT the point is well written and well taken . . .

As for what search engines could do with personalization? They should take a different approach, the one suggested by “occasionalization”: stop treating each search query as an “independent” event unrelated to the others, and instead treat a set of queries within the same visit as a series of attempts to find ONE particular piece of information, with each query acting as a refinement . . . (confused?) Simple and obvious example . . .

Person A: “car” - result set A

Person B: “honda” - result set B

Person C: “car” - result set A

Person C (same visit, next query): “honda” - result set C

Result set B ≠ result set C, even though both Person B and Person C typed in “honda” . . . that is because we know that result set C should skew towards cars rather than Honda lawn mowers . . .
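To make the car/honda example concrete, here is a rough Python sketch of ranking that looks at the earlier queries in the same visit. Everything in it is an assumption made up for illustration: the `Doc` and `Session` types, the `rank` function, the `boost` weight, and the toy documents. It is not how any real search engine does this, just the simplest thing that shows result set B and result set C coming out differently.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    title: str
    base_score: float  # hypothetical score with no session context


@dataclass
class Session:
    # Queries already issued during this visit, oldest first.
    queries: list = field(default_factory=list)


def tokens(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())


def rank(query, docs, session, boost=0.5):
    """Rank docs for a query, nudging up docs that share words with
    earlier queries from the same visit. `boost` is an arbitrary weight."""
    context = set()
    for earlier in session.queries:
        context |= tokens(earlier)

    def score(doc):
        overlap = len(tokens(doc.title) & context)
        return doc.base_score + boost * overlap

    ranked = sorted(docs, key=score, reverse=True)
    session.queries.append(query)  # remember this query for the next one
    return [doc.title for doc in ranked]


# Toy candidate documents that already match the query "honda".
HONDA_DOCS = [
    Doc("Honda lawn mower deals", 1.0),
    Doc("Honda Civic car review", 0.8),
]

# Person B types "honda" with no history in this visit.
person_b = Session()
result_b = rank("honda", HONDA_DOCS, person_b)

# Person C types "car" first, then "honda" in the same visit.
person_c = Session()
rank("car", [Doc("Used car listings", 1.0)], person_c)
result_c = rank("honda", HONDA_DOCS, person_c)

print(result_b)
print(result_c)
```

In this toy setup Person B gets the lawn mower page on top, while Person C gets the car page on top, because the earlier “car” query lifts anything car-related.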

The applications are much, much wider and the algorithms more sophisticated than this . . . we can infer things about that visit/occasion from the query words and the person’s clickthroughs to help decide what the next result set should be if the user refines the query (again, this needs algorithms that define the adjacency of words/queries, to identify whether the next query is a refinement or really a completely different intention) . . .
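As a very crude stand-in for that kind of adjacency test, the Python sketch below guesses whether a new query is a refinement by checking how many of its words already showed up in the previous query or in the titles the person clicked during the visit. The function name `is_refinement`, the `threshold` value, and the word-overlap measure are all assumptions for illustration; a real system would need something far more sophisticated (co-occurrence statistics, query logs, and so on).

```python
def tokens(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())


def is_refinement(new_query, previous_query, clicked_titles, threshold=0.5):
    """Guess whether new_query continues the intent of previous_query.

    The "occasion" context is the previous query plus the titles of the
    results the person clicked. The new query counts as a refinement when
    at least `threshold` of its words appear in that context.
    """
    context = tokens(previous_query)
    for title in clicked_titles:
        context |= tokens(title)

    new_terms = tokens(new_query)
    if not new_terms or not context:
        return False

    shared = len(new_terms & context)
    return shared / len(new_terms) >= threshold


# Searched "car", clicked a Honda review, then typed "honda".
same_intent = is_refinement("honda", "car", ["Honda Civic car review"])

# Same history, but the next query is about something else entirely.
new_intent = is_refinement("paraguay fire", "car", ["Honda Civic car review"])

# No clicks at all: "car" and "honda" share no words, so plain word
# overlap cannot connect them without the clickthrough signal.
no_clicks = is_refinement("honda", "car", [])

print(same_intent, new_intent, no_clicks)
```

The last case is the interesting one: without the clickthrough, this naive measure cannot tell that “honda” follows from “car”, which is exactly why the adjacency problem needs better algorithms than word overlap.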

Lastly . . .

If not search personalization, then what? Many companies, including my own, are placing bets on a display of search results that goes beyond simple ranked lists. The idea is to analyze the search results, show users the variety of themes therein, and let them explore their interests at that moment, which only they know. The best personalization is done by persons themselves.

God . . . I wish it were so simple . . . good luck, Raul . . . the unfortunate reality is that searchers don’t “talk” back to search engines. I wrote an entire post on this topic (and check out the comment section) . . . Google has created an entire generation of lazy surfers.