In this research paper, we compare four strategies for choosing the best arm (option) in a sequence of experiments, with the goal of minimizing simple regret: the gap between the expected reward of the optimal arm and the expected reward of the arm finally chosen. The four strategies are Linear Programming 2-Step (LP2S), UCB1, UCB2, and Exponential-Weighted (EW).
We analyze the performance of these methods in two scenarios: when the total sampling cost is equal for all methods, and when the other methods reach a simple regret close to that of LP2S. With equal sampling cost, LP2S has the lowest simple regret; with similar simple regret, LP2S has the lowest sampling cost.
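To make simple regret and sampling cost concrete, here is a minimal sketch of UCB1 on simulated arms. It is an illustration under stated assumptions, not the paper's experimental setup: rewards are assumed to be Bernoulli, every pull is assumed to cost one unit (so `budget` stands in for total sampling cost), the final recommendation is assumed to be the arm with the highest empirical mean, and the names `ucb1` and `simple_regret` are hypothetical.

```python
import math
import random


def ucb1(means, budget, seed=0):
    """Run UCB1 for `budget` pulls on Bernoulli arms with the given true means.

    Returns (recommended_arm, pull_counts). Assumptions of this sketch:
    unit cost per pull, and recommending the arm with the highest
    empirical mean once the budget is spent.
    """
    k = len(means)
    if budget < k:
        raise ValueError("budget must allow at least one pull per arm")
    rng = random.Random(seed)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # total observed reward per arm

    def index(i, t):
        # Empirical mean plus the UCB1 exploration bonus.
        return sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])

    for t in range(1, budget + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull every arm once
        else:
            arm = max(range(k), key=lambda i: index(i, t))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    recommended = max(range(k), key=lambda i: sums[i] / counts[i])
    return recommended, counts


def simple_regret(means, arm):
    """Gap between the best arm's expected reward and the chosen arm's."""
    return max(means) - means[arm]


if __name__ == "__main__":
    true_means = [0.2, 0.5, 0.8]  # made-up values for illustration only
    arm, counts = ucb1(true_means, budget=300)
    print("recommended arm:", arm)
    print("pulls per arm:", counts)
    print("simple regret:", simple_regret(true_means, arm))
```

Under these assumptions, the first scenario above corresponds to running each strategy with the same `budget` and comparing the resulting simple regret, and the second to finding the smallest `budget` at which a strategy's simple regret approaches a target value.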
To help explain these concepts, consider a coffee shop owner who wants to determine the best drink to serve to customers. Each drink is an arm, and each trial offering of a drink is a sample that comes at a cost. The owner wants to minimize the regret, that is, the gap between the expected revenue from the optimal drink and the expected revenue from the drink finally chosen. The four strategies can be thought of as different ways of making this decision. LP2S is like a carefully planned menu that takes into account both the expected revenue and the sampling cost. UCB1 and UCB2 are like optimistic bets: they favor drinks that have sold well so far while still giving rarely tried drinks the benefit of the doubt. EW is like a more cautious approach that shifts weight toward drinks according to their observed successes.
In summary, our research shows that, of the four methods compared, LP2S is the most effective at minimizing simple regret for a given sampling cost, and the cheapest in sampling cost for a given level of simple regret. By carefully planning the menu and taking into account both expected revenue and sampling cost, the coffee shop owner can make the best decision to maximize their profits.