
What is AB Testing?

Suppose that you have a problem (say you want to be the best CoD player) and you have many possible solutions to it: different play styles, from the most aggressive to the most defensive, different tools, and other alternatives. You have some ideas as to which tactics are best, but you’re not sure which one would fit you best. In this case “best” means having the highest win rate.

What AB Testing proposes is that you try out the different options for a length of time. After that time you will have played each option $n_i$ times and have won $r_i$ times. As such $\frac{r_i}{n_i}$ will give you an approximation of the win ratio for each alternative, and you can make an informed decision on which play style to use, etc.
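As a minimal sketch of that bookkeeping (the play styles and the numbers below are made up for illustration):

```python
# Hypothetical results after a fixed testing period: (matches played, matches won) per style.
results = {"aggressive": (40, 22), "defensive": (40, 17), "sniper": (40, 19)}

# Estimated win ratio for each alternative.
win_ratios = {style: wins / played for style, (played, wins) in results.items()}
print(win_ratios)  # {'aggressive': 0.55, 'defensive': 0.425, 'sniper': 0.475}
```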

Wouldn’t it take ages to test all the different options?

Yes! That is why the multi-armed bandits come into play. Say that you have a certain play style that you think is the best. You define it as your champion and play with it most of the time. However, whenever there’s an alternative you think might outplay your current option, you bring it into the alternatives pool and play with it occasionally. Then you can start ruling out alternatives that have too few wins and betting more on the most promising alternatives, instead of waiting for a fixed amount of time / trials (and possibly sacrificing your long term win rates). You need to be careful not to discard alternatives too soon (just because you lost 3 matches in a row doesn’t mean it’s a bad alternative, maybe it was just bad luck; some people call this statistical significance), and similarly not to bet exclusively on the currently best-looking alternatives.
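To make the champion/challenger idea concrete, here is a minimal sketch (the 10% exploration rate, the `stats` layout and the function names are my own assumptions, and it skips the ruling-out step described above):

```python
import random

def choose_style(stats, explore_prob=0.1):
    """Pick the champion most of the time, a challenger occasionally.
    stats maps each play style to a (matches_played, matches_won) pair."""
    # Champion: the style with the best observed win ratio so far.
    champion = max(stats, key=lambda s: stats[s][1] / max(stats[s][0], 1))
    alternatives = [s for s in stats if s != champion]
    if alternatives and random.random() < explore_prob:
        return random.choice(alternatives)  # occasionally give a challenger a go
    return champion

def record_result(stats, style, won):
    """Update the tallies after a match (won is True or False)."""
    played, wins = stats[style]
    stats[style] = (played + 1, wins + int(won))
```

A real version would also drop challengers whose win ratio stays low after enough matches, as described above.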

So how do you know which ones to use?

There are different suggestions for solving this problem. I would be interested in seeing AB Tests on AB Testing techniques. However, in this post I am going to present the one alternative that I think is best. Do comment below with better alternatives / points I’ve missed.

That’s all good, but I haven’t heard your solution yet.

Ok, say you have $k$ alternatives of play and you are about to play a match. In the past you have played $n_i$ matches with option $i$, of which you won $r_i$ matches. My current goal is to maximize the percentage of wins over a long time period. So I’ll make this formal and I’ll say I want to maximize my expected wins over the next $t$ matches.

Warning: I’m going to get more technical from here onwards.

We can then define our problem as one of finding the following function:

$$C(n, r, t)$$

which maps the vectors $n$ and $r$ of observations and wins and the integer $t$ to the integer $i \in \{1, \dots, k\}$ which is the best choice of alternative to play in order to maximise the long term win rate. In order to calculate this we need another function $V(n, r, t)$ which maps the same knowledge to the expected number of wins in the next $t$ matches assuming optimal choices. Now we do some maths:

How likely am I to win if I choose to play with alternative $i$?

So you’ve had $a$ observations so far and $b$ wins. Let $p$ be the underlying likelihood of winning. Since we know $a$ and $b$ we can estimate the pdf¹ of $p$ by using Bayes’ Theorem. Letting $X$ be the number of wins, we first compute:

$$P(X = b \mid p) = \binom{a}{b} p^b (1-p)^{a-b}$$

where $X$ is the sum of $a$ Bernoulli variables with probability $p$ and $\binom{a}{b}$ is the number of ways of choosing $b$ elements out of $a$ elements. From this we can get the pdf of the underlying $p$ by normalizing:

$$f(p \mid X = b) = \frac{P(X = b \mid p)}{\int_0^1 P(X = b \mid q)\, dq}$$

Since $a$, $b$ are non-negative integers, we can use the fact that

$$\int_0^1 q^b (1-q)^{a-b}\, dq = \frac{b!\,(a-b)!}{(a+1)!}$$

to simplify to

$$f(p \mid X = b) = \frac{(a+1)!}{b!\,(a-b)!}\, p^b (1-p)^{a-b}$$

Finally we can get the probability of winning a single match by integrating the chances of success over the pdf:

$$\int_0^1 p\, f(p \mid X = b)\, dp = \frac{(a+1)!}{b!\,(a-b)!} \int_0^1 p^{b+1} (1-p)^{a-b}\, dp = \frac{(a+1)!}{b!\,(a-b)!} \times \frac{(b+1)!\,(a-b)!}{(a+2)!} = \frac{(a+1)!\,(b+1)!\,(a-b)!}{b!\,(a-b)!\,(a+2)!} = \frac{b+1}{a+2}$$

Awesomeness, all of the above simplifies to a really neat expression. If we played $a$ matches with $b$ wins then the chance of winning the next match is $\frac{b+1}{a+2}$. As a bonus that expression works even if $a = b = 0$, since in that case we are assuming the pdf of $p$ to be the entropy maximizing (and hence knowledge minimizing) uniform distribution, in which case there’s a $\frac{1}{2}$ chance of winning. On second thought, that is to be expected since the pdf and integral above only require $b$ and $a - b$ not to be negative, so there’s that.
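As a quick sanity check of the $\frac{b+1}{a+2}$ expression, here is a Monte Carlo sketch (the helper name and the trial count are mine): draw the true win probability uniformly, simulate $a$ matches, and average the chance of winning the next match over the runs that produced exactly $b$ wins.

```python
import random

def next_win_probability(a, b, trials=200_000):
    """Estimate P(win next match | b wins in a matches) under a uniform prior on p."""
    matching, next_wins = 0, 0
    for _ in range(trials):
        p = random.random()                              # uniform prior over the win probability
        wins = sum(random.random() < p for _ in range(a))
        if wins == b:                                     # keep only runs consistent with our data
            matching += 1
            next_wins += random.random() < p              # simulate the next match
    return next_wins / matching

print(next_win_probability(10, 7))  # should be close to (7 + 1) / (10 + 2) ≈ 0.667
```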

Yay, now we know the chances of success of choosing option $i$. How does this help us in making long term choices?

We have found the solutions of $C$ and $V$ for $t = 1$, namely:

$$C(n, r, 1) = \underset{i}{\operatorname{argmax}} \left\{ \frac{r_i + 1}{n_i + 2} \right\}$$

$$V(n, r, 1) = \underset{i}{\max} \left\{ \frac{r_i + 1}{n_i + 2} \right\}$$

Now we need to solve this same problem for $t > 1$. If, for a given $t$, we choose option $i$, then our expected number of wins over the next $t$ matches is given by

$$\frac{r_i + 1}{n_i + 2} \Big( 1 + V(n + e_i,\, r + e_i,\, t - 1) \Big) + \left( 1 - \frac{r_i + 1}{n_i + 2} \right) V(n + e_i,\, r,\, t - 1)$$

where $e_i$ is the unit vector along the $i$th dimension². As such we only need to find the $\underset{i}{\operatorname{argmax}}$ of the expression above to find $C(n, r, t)$, and use that to find $V(n, r, t)$. Hence we have found a recursive formula for this, which can be solved somewhat efficiently using dynamic programming³ since there will be many values computed repeatedly.
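Here is a minimal memoized sketch of that recursion, just to pin down the shape of the computation (the tuple-based bookkeeping and the example numbers are my own, and plain brute force like this only scales to small $t$ and few alternatives):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def V(n, r, t):
    """Expected wins over the next t matches, assuming optimal choices.
    n and r are tuples of observations and wins per alternative."""
    if t == 0:
        return 0.0
    best = 0.0
    for i in range(len(n)):
        p = (r[i] + 1) / (n[i] + 2)                  # chance of winning with alternative i
        n_next = n[:i] + (n[i] + 1,) + n[i + 1:]     # n + e_i
        r_win = r[:i] + (r[i] + 1,) + r[i + 1:]      # r + e_i
        expected = p * (1 + V(n_next, r_win, t - 1)) + (1 - p) * V(n_next, r, t - 1)
        best = max(best, expected)
    return best

def C(n, r, t):
    """Index of the alternative to play now, maximizing expected wins over t matches."""
    def value(i):
        p = (r[i] + 1) / (n[i] + 2)
        n_next = n[:i] + (n[i] + 1,) + n[i + 1:]
        r_win = r[:i] + (r[i] + 1,) + r[i + 1:]
        return p * (1 + V(n_next, r_win, t - 1)) + (1 - p) * V(n_next, r, t - 1)
    return max(range(len(n)), key=value)

# Two alternatives: 3 wins out of 10 matches vs. 1 win out of 2 matches, 5 matches ahead.
print(C((10, 2), (3, 1), 5), V((10, 2), (3, 1), 5))
```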

Great, now what?

Well, I still need to implement this, probably in a Jupyter Notebook that I’ll leave available somewhere (only then can I check how feasible it is to compute this result). If it turns out to be computationally infeasible due to huge $t$’s then I’ll try to tackle it in a slightly different way:

Instead of choosing one alternative at a time, I can choose them in bulks of a fixed size, and then decide how many matches of each bulk should go to each alternative (effectively deciding a percentage to be used on each alternative). By having large bulk sizes I could avoid having absurd recursion depths of millions of function calls just to compute this value, instead going for recursion depths of hundreds with bulk sizes of tens of thousands (we can play with these numbers after implementation).
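As a rough illustration of the bulk idea only (the proportional allocation rule below is an assumption of mine, not a worked-out scheme):

```python
def allocate_bulk(n, r, bulk_size):
    """Split a bulk of matches across the alternatives, proportionally to their
    estimated win probabilities (r_i + 1) / (n_i + 2). The proportional rule is
    only assumed here to show the shape of the idea."""
    probs = [(r_i + 1) / (n_i + 2) for n_i, r_i in zip(n, r)]
    total = sum(probs)
    shares = [int(bulk_size * p / total) for p in probs]
    shares[probs.index(max(probs))] += bulk_size - sum(shares)  # leftover goes to the leader
    return shares

print(allocate_bulk((10, 2, 5), (3, 1, 4), 10_000))  # [2153, 3230, 4617]
```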

EDIT: I’ve found this [paper]({{site.baseurl}}{% link /assets/documents/Top Arm Identification in Multi-Armed Bandits with Batch Arm Pulls.pdf %}) online about Batch Size implementations of this problem. I might write a short summary of it after having read it properly.

Footnotes

1 pdf: Probability Density Function and its Wikipedia page

2 $e_i$: Unit vector along the $i$th dimension, i.e. a vector with a 1 in position $i$ and 0 everywhere else

3 I never understood why they call it dynamic programming. There’s nothing dynamic about this technique.