Product Recommendations at Gilt
A while back, I posted an article on the Gilt Groupe tech blog. It is about the recommendation engine I developed. I am pretty proud of it as it is based on some of my PhD work. Below is a re-blog. The original article with images can be found here.
Product recommendations at Gilt work a little differently than they do at other companies. Amazon, for example, has a large and relatively static inventory, so it can use techniques like collaborative filtering, which recommends a product based on what other people have bought or looked at. Gilt’s inventory, by contrast, is in constant flux. The products we have one week are gone the next, and there is a chance we won’t have them again.
At Gilt, we’ve employed a technique called contextual retrieval. Contextual retrieval is a search method where we take elements from the user’s context to help them conduct a search. When someone sees a product that is sold out and decides to waitlist it, we can infer a lot about that user. One thing is that they really want that product. Another is that we know everything about the product they are interested in. In fact, we use that product as a search query to find related products in our inventory, whether they are currently on sale or not.
So the steps involved in setting up a contextual retrieval based recommendation service are as follows:
· First, you are going to have to create a search index of your products.
Apache Lucene is a great piece of open source technology for doing just that. You are going to want to create an index that contains all the fields that you have describing your products. Those fields are basically the metadata that you need to find similar products.
· The second step is search query manufacturing.
This is where the recommendation magic really happens. When you construct your search query from a product, not all fields are going to be equally important. In fact, it is very likely that the most valuable fields will differ from one genre of product to another. To that end, you need to devise a weighting scheme where you boost the value of some fields over others.
· The third step is caching those recommendations.
You can quite easily get away with caching all your recommendations because, unlike a search engine, you already know all the search queries that you are going to encounter: they are the products you have for sale. At Gilt, since we have new products every day, we generate and cache our recommendations once a day.
· The fourth (and most rewarding) step is using those recommendations in ways to help your users.
One of our customer pain points is that products at Gilt sell out quickly. When a user has decided that they want an item so much that they are willing to sign up for it on a waitlist, we want to do everything in our power to find them another product that they will be happy with.
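The first three steps above can be sketched in a few lines of Python. This is only a toy illustration, not the code that runs at Gilt: a small in-memory inverted index stands in for Lucene, the score is a plain sum of field boosts rather than Lucene's relevance scoring, and the field names, boost values, and sample products are all made up for the example.

```python
from collections import defaultdict

# Hypothetical per-field boosts; the real weights are not given in this article.
FIELD_BOOSTS = {"brand": 1.5, "category": 2.0, "color": 1.0}


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def build_index(products):
    """Step 1: inverted index mapping (field, term) -> set of product ids."""
    index = defaultdict(set)
    for pid, fields in products.items():
        for field, text in fields.items():
            for term in tokenize(text):
                index[(field, term)].add(pid)
    return index


def recommend(pid, products, index, boosts=FIELD_BOOSTS, k=2):
    """Step 2: turn the product itself into a boosted, fielded query."""
    scores = defaultdict(float)
    for field, text in products[pid].items():
        weight = boosts.get(field, 1.0)
        for term in set(tokenize(text)):
            for other in index.get((field, term), ()):
                if other != pid:  # never recommend the query product itself
                    scores[other] += weight
    # Highest score first; ties broken by product id for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


def build_cache(products, index):
    """Step 3: every product is a known query, so precompute them all."""
    return {pid: recommend(pid, products, index) for pid in products}


# Made-up sample catalog.
products = {
    "p1": {"brand": "Acme", "category": "leather boots", "color": "black"},
    "p2": {"brand": "Acme", "category": "leather sandals", "color": "brown"},
    "p3": {"brand": "Zenith", "category": "leather boots", "color": "black"},
    "p4": {"brand": "Zenith", "category": "silk scarf", "color": "red"},
}

index = build_index(products)
cache = build_cache(products, index)
print(cache["p1"])
```

With these boosts, a waitlist on p1 ranks p3 (same category and color) ahead of p2 (same brand, partly matching category). Raising the brand boost would flip that order, which is the kind of per-genre tuning the second step describes. A daily job would simply rebuild the index and the cache from the current catalog.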
That’s pretty much what we aim to achieve at Gilt – Simple, Fast and Fun!
Retrieval of Single Wikipedia Articles While Reading Abstracts
Well, it finally happened. I published a paper on my PhD thesis work. It got accepted at HICSS-42 in the Digital Media: Content and Communication track under the Information Access and Retrieval: The Web, Users, and HCI mini-track. The abstract is below.
When reading online, users sometimes need auxiliary information to complement or fill in their own background knowledge in order to better understand a document that they are reading. We believe that delivering this information in the least intrusive fashion possible will improve their understanding. We have prototyped a system that selects a single Wikipedia article for users when they highlight text in an abstract. This prototype employs a contextual retrieval algorithm developed for high precision retrieval of Wikipedia articles that uses the terms in the abstract, currently being read, as a context for the search. The results from our evaluation reveal that the top-performing algorithm is able to respond with a single relevant article 77% of the time. The user study that we conducted indicates that participants have a strong preference for this approach to searching while reading.
In other news, Softpedia apparently reviewed my Firefox extension, LiteraryMark. That is the prototype that I developed and used in the above-mentioned HICSS paper. As you can guess, it is 100% free of adware/spyware.
Task Effects on Interactive Search: The Query Factor
I recently published a paper with Dr. Toms’ CMI lab about the search research that we did for INEX 2007. It is entitled “Task Effects on Interactive Search: The Query Factor”. Coincidentally, it is in the proceedings for INEX 2007. Essentially, it is about the experimental Wikipedia search system that we have been developing over the past year and about how users behave while they search when doing different tasks. Not surprisingly, user behavior differs depending on the task. Consequently, the results from this work provide motivation for developing specialized support tools for different tasks that assist users when they need information, perhaps as a software agent that mediates communication between them and the search engine.
Our preliminary results are in this paper. I was mainly responsible for developing the search system and doing some light user log mining. We are currently doing further analysis on the user logs and hope to submit a more complete paper later this year. Anyhow, here is the abstract for the paper if you are interested:
The purpose of this research is to examine how search differs according to selected task variables. Three types of task information goals and two types of task structures were explored. This mixed within- and between-subjects designed study had 96 participants complete three of 12 tasks in a laboratory setting using a specialized search system based on Lucene. Using a combination of metrics (user perception collected by questionnaires, transaction log data, and characteristics of relevant documents), we assessed the effect of goals and structure on search as demonstrated through queries and their use in interactive searching.