Finding the best cookie


    Customers who can’t find products won’t buy them. The faster they find what they are looking for, the happier they are. Searching always has two elements: selecting the things to show and ordering them in the right manner. This blog post sheds some light on how we deal with the latter part. It is essential: after all, how often have you looked at the second page of a Google search result?

    In this article you will read about:

    1. What is Learning To Rank and why do we need it?
    2. How can we build a Training Set from our traffic data?

    The Search for a Learning To Rank Target

    Some context: Mercateo offers two different kinds of search. Either the customer uses the autocomplete feature to search for a specific keyword, such as “Ice Cream Chocolate Praline”, “Rum Balls”, or “Marzipan”, or they enter a more specific search term like “brownies with marshmallow topping and caramel syrup”. The first returns a precomputed list of articles while the second uses a separate fuzzy search system. For the moment, we in the Data Science team are only dealing with the first kind of search. The actual content of the list of articles is defined by a large set of hand-curated rules maintained by the content team, a system which contains loads of domain-specific knowledge and works very well. That ink cartridge in the screenshot above? It’s a bundle with original Danish cookies!

    [Image: Cookie 1]

    So the problem we are looking at is this:

    Find the “best” ordering in which to display the articles for each specific keyword.

    Why do we want to do this? Our estimate is that by improving the search order, we can permanently increase revenue in the shop by at least 1%.

    How should we do this? At first glance, it turns out there are many good methods which, given a training set created by experts, can learn to predict the ranking using features for each article. The problem with these, however, is the simple fact that we do not have a training set created by experts. In the case of the 1,512 different variants of cookies offered by Mercateo we can obviously order all of them, including that handy ink & cookie bundle from above, try them all in our team and decide which ones are the best. Yay! But what about the other ~40,000 keywords at Mercateo? Who can decide which of the 2,512 “Combination spanner” products is the best, and what is the best ranking of the 5,760 “Fixed castor” products? Creating a good training set like this is expensive at the very least and possibly simply impossible. We therefore need to find a way to generate our training set from the data available to us. This leads us to the question:

    What characterizes a “best” ranking?

    Is a good ranking one which maximizes profit? Or revenue? After discussing within the team and with business stakeholders, we have decided on the rather abstract target of customer satisfaction. We want the customer to find what he or she is looking for as quickly as possible and to be happy with the list displayed. However, customer satisfaction cannot be measured in any way that we know of. We therefore need to decide on a proxy. Search engines tend to choose something like clickthrough rate; news sites or social media can try to measure attention using the time a user spends interacting with an article. In e-commerce, the choice is not completely clear. Is it a good sign or a bad sign if a customer clicks on an article but does not buy it? Maybe he is gathering information and will buy tomorrow, or send the link to a colleague. The decision we have arrived at in the newest variant of our Learning to Rank algorithm is to rank by purchase probability as a proxy for customer satisfaction. We assume that, Mercateo being a B2B marketplace, most customers come to get the job done, i.e. to buy the thing they want as quickly as possible. We think that, in contrast to a B2C marketplace, most customers don’t visit Mercateo to gather information about the market and the products in general.

    So far so good, but how do we calculate purchase probability? The simplest way to measure this is to calculate the order rate, i.e. #orders/#views, for each product. Say the “Tea Time” cookies from above have been viewed 500 times and purchased 200 times; that gives a 40% purchase probability. Simple! But wrong. After all, we all know that the best place to hide a dead body is on page 2 of the Google results. The problem is that there is a presentation and position bias. Customers are much more likely to buy products from the first page of results than from any other page, and on the first page they are much more likely to buy the product on position 1 than any product further down the list. If we simply ordered by #orders/#views, we would be creating a self-fulfilling prophecy of the current ranking.

    However, we can measure these biases. The current product ranking is a linear combination of a number of features such as past orders, stock change information, delivery time, etc. Since some of these features are volatile, products have been moving up and down in the ranking. By aggregating the difference in order rates when the same article moves from e.g. position 1 to position 5, we can estimate the bias corresponding to the position of an article. Globally, we arrive at the following measured biases:

    [Image: Cookie 2 - measured position biases, global average]
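
    To give a rough idea of what such an estimate can look like, here is a minimal sketch in Python (the DataFrame layout and all numbers are invented for illustration): for every article we relate its order rate at each position to its order rate at the best position it was shown on, and then average those ratios per position.

        import pandas as pd

        # Hypothetical aggregated traffic data: views and orders per article
        # and position (all numbers are invented).
        log = pd.DataFrame({
            "article":  ["A1", "A1", "A2", "A2", "A3", "A3"],
            "position": [1, 5, 1, 5, 2, 15],
            "views":    [900, 400, 1200, 300, 800, 500],
            "orders":   [90, 12, 96, 9, 56, 4],
        })
        log["order_rate"] = log["orders"] / log["views"]

        # Order rate of each article at the best (lowest) position it was shown on.
        ref = (log.sort_values("position")
                  .groupby("article").first()["order_rate"]
                  .rename("ref_rate"))
        log = log.join(ref, on="article")

        # Average, per position, how much the order rate drops relative to
        # the best position of the same article.
        position_bias = (log["order_rate"] / log["ref_rate"]).groupby(log["position"]).mean()
        print(position_bias)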

    The figure shows that, as a global average, the purchase probability of an article placed on position 15 is only 12% of what it would be if the same article were placed on position 1. This behavior differs a lot between keywords. While I don’t care which copy paper I buy, I’m certainly very particular about the cookie I want! As an example, here is the comparison between “copy paper” and “boltless shelving unit”:

    [Image: Cookie 3 - position bias for “copy paper” vs. “boltless shelving unit”]

    So we can see that moving the same shelving unit from position 1 to position 15 reduces the purchase probability much less than it does for copy paper! I suspect it is no use buying a shelf which doesn’t fit into your store room, even if it is on position 1. Since we can assume that, in general, the purchase probability will not actually increase when moving from, say, position 9 to position 10, we enforce this restriction by fitting a monotonically decreasing function to the measured data, here for scissors:

    [Image: Cookie 4 - monotonically decreasing fit to the measured position bias for “scissors”]
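
    One standard way to obtain such a monotonically decreasing fit is isotonic regression. Here is a minimal sketch using scikit-learn; the measured values are invented and only roughly follow the global curve shown above, and the actual fitting procedure we use may differ.

        import numpy as np
        from sklearn.isotonic import IsotonicRegression

        # Relative purchase probability measured per position (invented values).
        positions = np.arange(1, 16)
        measured_bias = np.array([1.00, 0.62, 0.48, 0.41, 0.35, 0.33, 0.36, 0.28,
                                  0.24, 0.26, 0.21, 0.18, 0.16, 0.14, 0.12])

        # Fit a monotonically decreasing function to the noisy measurements.
        iso = IsotonicRegression(increasing=False)
        smoothed_bias = iso.fit_transform(positions, measured_bias)

        # The fitted curve never increases from one position to the next.
        assert all(np.diff(smoothed_bias) <= 0)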

    We can now use this to reweight each purchase and view, creating “unbiased orders” and “unbiased views”: viewing a copy paper product on position 15 without buying would only count as “0.12 unbiased views”, but buying a copy paper product on position 15 would count as “8.3 unbiased orders”!
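
    As a sketch, the reweighting could look roughly like this (the function name and the event format are invented for illustration; every view is downweighted by the position bias of its slot, every order is upweighted by the inverse of that bias):

        def unbiased_counts(events, position_bias):
            """Reweight raw views and orders by the position bias.

            events: iterable of (position, was_ordered) tuples, one per view.
            position_bias: dict mapping position -> relative purchase
                           probability, e.g. {1: 1.0, 15: 0.12}.
            """
            unbiased_views = 0.0
            unbiased_orders = 0.0
            for position, was_ordered in events:
                bias = position_bias[position]
                # A view far down the list carries little information ...
                unbiased_views += bias
                # ... while an order placed despite the bad position counts extra.
                if was_ordered:
                    unbiased_orders += 1.0 / bias
            return unbiased_views, unbiased_orders

        # The copy paper example from the text: one view on position 15, one order.
        print(unbiased_counts([(15, True)], {1: 1.0, 15: 0.12}))  # (0.12, ~8.3)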

    Sounds great! But have you noticed a problem? Assume that, for some reason, a product is viewed only once, on position 15, and immediately bought: we get the wonderful purchase probability of (#unbiased_orders / #unbiased_views) = (8.3 / 0.12) ≈ 6,944%! We don’t actually care that this is not a reasonable probability value, since we are only looking for a value to rank by. However, the problem remains that products with little data can have overly excellent scores. We therefore calculate a prior number of views and orders typical for each keyword and change the calculation to p = (#unbiased_orders + order_prior) / (#unbiased_views + view_prior). So with a prior of, say, 5 orders and 200 views, the probability of our example is only (5 + 8.3) / (200 + 0.12) ≈ 6.6%.
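
    The same smoothed score, written as a small Python function with the numbers from the example (in practice the priors would be derived per keyword):

        def ranking_score(unbiased_orders, unbiased_views, order_prior, view_prior):
            """Prior-smoothed purchase probability used as the value to rank by."""
            return (unbiased_orders + order_prior) / (unbiased_views + view_prior)

        # One debiased order (~8.3) and view (0.12) on position 15, smoothed
        # with a keyword prior of 5 orders and 200 views.
        print(ranking_score(8.3, 0.12, order_prior=5, view_prior=200))  # ~0.066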

    So now we could start training our models, looking for good features. But wait! We are agile! The best feature is our target itself, so for keywords with enough data we can simply compute the target and sort by it. This lets us test our assumptions about the target in a reasonable way.

    This new approach has been live for a few months in all countries on 202 selected keywords (including the cookies, of course!) for 80% of our customers. Its success or failure can be tracked using our A/B test. The current results look positive, but experience tells us that before we have any results we can actually believe, we need to be patient and wait for enough data to decide whether this approach is promising or not. While we are waiting, at least we have enough cookies to eat.


    (Photo by Mockaroon on Unsplash)

    Who writes here?

    Alan Schelten

    I am a data scientist at Mercateo Unite, working in the Machine Learning team. It’s our job to tackle any data mining or machine learning use cases that occur at Mercateo Unite, from massive clustering problems to pricing and ranking optimization. I enjoy working at Mercateo Unite because I get to wrap my head around a wide variety of challenging and fascinating data science projects with a highly talented team of machine learning experts.
