Reinforcement learning is a powerful tool for training artificial intelligence agents to make decisions in complex environments. However, these agents can be vulnerable to attacks that manipulate their decision-making processes. This survey examines the problem of adversarial reinforcement learning, in which an attacker seeks to undermine the learning process of a victim agent by manipulating its environment. It discusses three modes of attack: reward poisoning, observation poisoning, and environment poisoning.
Reward Poisoning
In reward poisoning, the attacker modifies the rewards received by the victim agent to steer its learning towards undesirable outcomes. For example, a gambling website might manipulate the odds to encourage players to keep betting. The attacker must have some knowledge of the victim’s learning algorithm and policy function to effectively poison the rewards.
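As a rough illustration, the sketch below shows reward poisoning against a simple tabular learner on a three-armed bandit: the attacker adds a small, bounded perturbation to each reward so that the victim's greedy choice drifts toward the attacker's target action. The toy environment, reward values, and per-step budget are illustrative assumptions, not details taken from the survey.

```python
import numpy as np

# Minimal sketch of reward poisoning against a tabular epsilon-greedy learner.
# All numbers (true rewards, budget, learning rate) are illustrative.
rng = np.random.default_rng(0)
n_actions = 3
true_rewards = np.array([1.0, 0.2, 0.1])   # action 0 is genuinely best for the victim
target_action = 2                          # the action the attacker wants the victim to prefer
Q = np.zeros(n_actions)
alpha, epsilon, budget = 0.1, 0.1, 0.5     # budget caps the per-step reward perturbation

for t in range(5000):
    # victim: epsilon-greedy action selection
    a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q))
    r = true_rewards[a] + rng.normal(scale=0.1)

    # attacker: boost the target action and penalize the others, within the budget
    delta = budget if a == target_action else -budget
    r_poisoned = r + delta

    # the victim's ordinary update sees only the poisoned reward
    Q[a] += alpha * (r_poisoned - Q[a])

print("learned greedy action:", int(np.argmax(Q)))  # typically the attacker's target
```

Even with the perturbation bounded well below the gap in true rewards, the victim's value estimates converge to the poisoned means, so its greedy action ends up being the one the attacker chose.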
Observation Poisoning
In observation poisoning, the attacker modifies the observations available to the victim agent to distort its understanding of the environment. For instance, a social media platform might manipulate the news feed to influence the user’s political views. The attacker requires information about the victim’s sensory circuitry and preferences to effectively poison the observations.
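A minimal sketch of this idea is a wrapper that sits between the environment and the victim and perturbs only what the victim sees; the toy environment, the constant-shift perturbation, and the naive victim policy below are illustrative assumptions rather than anything prescribed by the survey.

```python
# Minimal sketch: an observation-poisoning wrapper around a toy environment.
# The victim acts on the perturbed observation, so its picture of the state
# (and hence its behavior) is distorted.
class ToyLineEnv:
    """Agent moves left or right on a line; reward is granted at position +5."""
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):                     # action: 0 = left, 1 = right
        self.pos += 1 if action == 1 else -1
        reward = 1.0 if self.pos >= 5 else 0.0
        done = abs(self.pos) >= 5
        return self.pos, reward, done

class ObservationPoisoner:
    """Reports a shifted position, so the victim mislocates itself."""
    def __init__(self, env, shift=3):
        self.env, self.shift = env, shift
    def reset(self):
        return self.env.reset() + self.shift
    def step(self, action):
        obs, reward, done = self.env.step(action)
        return obs + self.shift, reward, done   # only the observation is altered

env = ObservationPoisoner(ToyLineEnv(), shift=3)
obs = env.reset()
while obs < 5:                                  # victim: walk right until it "sees" the goal
    obs, reward, done = env.step(1)
print("victim stops at observed position", obs,
      "- true position is", env.env.pos, "and reward collected is", reward)
```

The rewards and dynamics are untouched; simply shifting the observation makes the victim stop short of the real goal and never collect the reward.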
Environment Poisoning
Environment poisoning is the most practical mode of attack, as it eliminates the need for intricate knowledge of the victim's learning algorithms or preferences. The attacker manipulates the dynamics of the environment to disrupt the victim's learning process. For example, a malicious actor might tamper with a city's traffic lights to prevent self-driving cars from learning optimal routes.
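The sketch below illustrates this on a tiny three-state chain MDP: the attacker leaves rewards and observations alone and only perturbs the transition probabilities so that "advancing" sometimes slips back to the start. The MDP, the slip probability, and the use of value iteration as a proxy for what the victim eventually learns are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch: environment (dynamics) poisoning on a 3-state chain MDP.
n_states, n_actions, gamma = 3, 2, 0.9
R = np.zeros((n_states, n_actions))
R[2, :] = 1.0                                  # reward 1 per step in the goal state 2

# clean dynamics: action 0 stays put, action 1 advances 0 -> 1 -> 2 (state 2 absorbs)
P_clean = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    P_clean[s, 0, s] = 1.0
P_clean[0, 1, 1] = P_clean[1, 1, 2] = P_clean[2, 1, 2] = 1.0

# attacker: with probability 0.6, "advance" now slips back to the start state
slip = 0.6
P_poisoned = P_clean.copy()
P_poisoned[:, 1, :] *= (1 - slip)
P_poisoned[:, 1, 0] += slip                    # rows still sum to 1

def optimal_values(P, R, iters=500):
    """Value iteration; a stand-in for the policy the victim would converge to."""
    V = np.zeros(n_states)
    for _ in range(iters):
        V = np.max(R + gamma * np.einsum("san,n->sa", P, V), axis=1)
    return V

print("V(start) under clean dynamics:   ", round(optimal_values(P_clean, R)[0], 2))
print("V(start) under poisoned dynamics:", round(optimal_values(P_poisoned, R)[0], 2))
```

The point is only that a modest change to the dynamics, with rewards and observations untouched, is enough to make the best achievable value from the start state substantially worse.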
Conclusion
Adversarial reinforcement learning is a significant concern in the development of artificial intelligence agents. The three modes of attack discussed in this survey demonstrate the various ways an attacker can undermine the decision-making processes of a victim agent. Understanding these attacks is essential for developing robust and secure AI systems that can withstand manipulation by malicious actors.