In this paper, we explore a new algorithm for solving stochastic games with perturbed payoffs and unknown transitions. Our approach is based on smooth fictitious play: in each iteration, the policy is updated by taking a convex combination of the previous policy and an estimated smoothed best response. This allows the algorithm to aggregate information across iterations rather than reacting only to the most recent estimate, leading to improved performance compared to traditional gradient methods.
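The update rule described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes a single state with a known vector of action values, uses a logit (softmax) map as the smoothed best response, and the names `q_values`, `alpha` (step size), and `tau` (smoothing temperature) are illustrative choices.

```python
import math

def smoothed_best_response(q_values, tau=0.1):
    # Logit (softmax) smoothing of the best response; tau controls how
    # sharply the response concentrates on the highest-value action.
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / tau) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def fictitious_play_step(policy, q_values, alpha=0.05, tau=0.1):
    # Convex combination of the previous policy and the smoothed best
    # response -- the core smooth fictitious play update.
    br = smoothed_best_response(q_values, tau)
    return [(1 - alpha) * p + alpha * b for p, b in zip(policy, br)]

# Hypothetical single-state example with three actions.
policy = [1 / 3, 1 / 3, 1 / 3]
q = [1.0, 0.5, 0.2]
for _ in range(200):
    policy = fictitious_play_step(policy, q)
```

Because each step is a convex combination of two probability distributions, the iterate remains a valid policy throughout, and with a fixed `q` it drifts toward the smoothed best response at a rate set by `alpha`.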
To build intuition, think of the game as a race track with many twists and turns. In each iteration, we update our position on the track based on the current state of the game, while also drawing on what we learned in previous laps. Combining these two sources of information lets the algorithm make progressively better decisions over time.
A key advantage of our approach is its robustness to perturbed payoffs and unknown transitions. Because smooth fictitious play averages over past responses, it can adapt to changing conditions while damping the noise introduced by payoff perturbations. As a result, the algorithm continues to make sensible decisions even when the game environment shifts unexpectedly.
Overall, the proposed algorithm represents a step forward for stochastic games, with potential applications in artificial intelligence and machine learning. By combining the averaging behavior of smooth fictitious play with the incremental, gradient-style nature of its updates, we obtain a more effective and efficient method for these complex problems, opening up new possibilities for a wide range of applications.