
Adaptive Attacks on Deep Reinforcement Learning

Reinforcement learning is a powerful tool for training artificial intelligence agents to make decisions in complex environments. However, these agents can be vulnerable to attacks that manipulate their decision-making processes. The survey examines the problem of adversarial reinforcement learning, in which an attacker seeks to undermine the learning process of a victim agent by manipulating its environment. It distinguishes three modes of attack: reward poisoning, observation poisoning, and environment poisoning.

Reward Poisoning

In reward poisoning, the attacker modifies the rewards received by the victim agent to steer its learning towards undesirable outcomes. For example, a gambling website might manipulate the odds to encourage players to keep betting. The attacker must have some knowledge of the victim’s learning algorithm and policy function to effectively poison the rewards.
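
To make this concrete, here is a minimal sketch of what reward poisoning can look like against a simple tabular Q-learning victim. The toy environment, the target action, and the perturbation budget are illustrative assumptions on our part, not details taken from the survey.

```python
import numpy as np

# Minimal sketch of reward poisoning against a tabular Q-learning victim.
# The environment, target_action, and perturbation budget are illustrative
# assumptions, not details from the survey.

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1
target_action = 2          # action the attacker wants the victim to prefer
budget = 0.7               # maximum reward perturbation per step

def poison_reward(r, a):
    """Attacker nudges the reward: bonus for the target action, penalty otherwise."""
    return r + (budget if a == target_action else -budget)

def step(s, a):
    """Toy environment: action 0 is genuinely the best, next state is random."""
    r = 1.0 if a == 0 else 0.0
    return rng.integers(n_states), r

s = 0
for _ in range(5000):
    a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
    s_next, r = step(s, a)
    r_poisoned = poison_reward(r, a)           # attacker intercepts the reward signal
    Q[s, a] += alpha * (r_poisoned + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

# Under the poisoned rewards, the greedy policy drifts toward target_action
# even though action 0 is the truly rewarding one.
print("Greedy action per state:", Q.argmax(axis=1))
```

The key point is that the attacker never touches the victim's code; it only edits the scalar reward the victim receives, which is enough to redirect what the victim learns.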

Observation Poisoning

In observation poisoning, the attacker modifies the observations available to the victim agent to distort its understanding of the environment. For instance, a social media platform might manipulate the news feed to influence the user’s political views. The attacker requires information about the victim’s sensory circuitry and preferences to effectively poison the observations.
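
The sketch below illustrates the idea with a toy learning-time example: the victim updates its value estimates based on what it observes, and the attacker occasionally swaps one state for a decoy, so credit for rewards ends up attributed to the wrong state. The environment, the decoy mapping, and the swap probability are all assumptions made for illustration.

```python
import numpy as np

# Minimal sketch of observation poisoning during learning. The attacker
# corrupts what the victim perceives, not the true state of the world.

rng = np.random.default_rng(1)
n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def poison_observation(true_state):
    """Attacker shows state 0 as state 4 (and vice versa) 30% of the time."""
    if true_state in (0, 4) and rng.random() < 0.3:
        return 4 - true_state
    return true_state

def step(s, a):
    """Toy environment: reward 1 only for action 0 in state 0, random next state."""
    r = 1.0 if (s == 0 and a == 0) else 0.0
    return rng.integers(n_states), r

true_state = 0
for _ in range(5000):
    observed = poison_observation(true_state)        # victim never sees the true state
    a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[observed].argmax())
    s_next, r = step(true_state, a)                  # the world evolves from the true state
    obs_next = poison_observation(s_next)
    # The update is attributed to the (possibly false) observed states.
    Q[observed, a] += alpha * (r + gamma * Q[obs_next].max() - Q[observed, a])
    true_state = s_next

# Credit for state 0's reward leaks into the row for state 4.
print(Q.round(2))
```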

Environment Poisoning

Environment poisoning is the most practical mode of attack, as it eliminates the need for intricate knowledge of the victim’s learning algorithms or preferences. The attacker manipulates the dynamics of the environment to disrupt the victim’s learning process. For example, a malicious actor might manipulate the traffic lights in a city to hinder self-driving cars from learning optimal routes.
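
The following sketch shows the mechanism on a small chain environment: the attacker silently redirects some of the victim's actions, which changes the dynamics the victim experiences and slows its progress toward the goal. Again, the environment and the redirection rule are illustrative assumptions, not details taken from the survey.

```python
import numpy as np

# Minimal sketch of environment poisoning: the attacker tampers with the
# transition dynamics themselves, rather than rewards or observations.

rng = np.random.default_rng(2)
n_states, n_actions = 5, 2          # action 0 = "left", action 1 = "right"; goal = state 4

def clean_step(s, a):
    """Unpoisoned dynamics: 'right' reliably moves toward the goal."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == n_states - 1 else 0.0)

def poisoned_step(s, a):
    """Attacker-controlled dynamics: 'right' is silently redirected 40% of the time."""
    if a == 1 and rng.random() < 0.4:
        a = 0
    return clean_step(s, a)

def train(step_fn, steps=20000):
    """Ordinary tabular Q-learning victim; returns how often it reached the goal."""
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, epsilon = 0.1, 0.9, 0.1
    s, goals = 0, 0
    for _ in range(steps):
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next, r = step_fn(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        if s_next == n_states - 1:
            goals += 1
            s_next = 0                      # reset the episode at the goal
        s = s_next
    return goals

# The victim reaches the goal noticeably less often under poisoned dynamics.
print("goal visits, clean dynamics:   ", train(clean_step))
print("goal visits, poisoned dynamics:", train(poisoned_step))
```

Notice that the attacker here needs no access to the victim's rewards, observations, or learning algorithm; it only has to influence how the environment responds, which is what makes this mode of attack comparatively practical.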

Conclusion

Adversarial reinforcement learning is a significant concern in the development of artificial intelligence agents. The three modes of attack discussed in this survey demonstrate the various ways an attacker can undermine the decision-making processes of a victim agent. Understanding these attacks is essential for developing robust and secure AI systems that can withstand manipulation by malicious actors.