This article examines poison attacks on machine learning models of code, that is, models designed to analyze or generate code. In such an attack, the adversary tampers with the training data so that the trained model produces incorrect outputs, typically when an input contains a hidden trigger, which undermines the model’s reliability and security. The article surveys several approaches for detecting and mitigating these attacks, including spectral signatures, neuron activations, and keyword analysis. However, these methods are typically white-box: they require access to the model’s parameters or internal representations, which is impractical for models with restricted access.
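To make one of these white-box defenses concrete, the following is a minimal sketch of spectral-signature scoring, not the implementation evaluated in the article. It assumes each training example already has a representation vector taken from a hidden layer of the model; the toy two-dimensional vectors, the function names, and the pure-Python power iteration (used here in place of a library SVD) are all illustrative assumptions. Each example is scored by its squared projection onto the top singular direction of the centered representations, and the highest-scoring examples are treated as suspected poison.

```python
import math

def top_direction(rows, iterations=100):
    """Approximate the top right singular vector of `rows` by power iteration."""
    dim = len(rows[0])
    # Non-uniform start vector (an arbitrary choice); power iteration only
    # fails if the start is exactly orthogonal to the top direction.
    v = [float(j + 1) for j in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        proj = [sum(r[j] * v[j] for j in range(dim)) for r in rows]          # X v
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(dim)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case: no variance left to find a direction in
        v = [x / norm for x in w]
    return v

def spectral_scores(representations):
    """Return one outlier score per example (higher means more suspicious)."""
    n = len(representations)
    dim = len(representations[0])
    mean = [sum(r[j] for r in representations) / n for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in representations]
    v = top_direction(centered)
    return [sum(c[j] * v[j] for j in range(dim)) ** 2 for c in centered]

# Hypothetical hidden-layer representations: eight clean examples near the
# origin and two poisoned examples (indices 8 and 9) shifted far away.
representations = [
    [0.1, 0.0], [-0.1, 0.1], [0.0, -0.1], [0.1, 0.1],
    [-0.1, -0.1], [0.0, 0.1], [0.05, -0.05], [-0.05, 0.0],
    [3.0, 3.1], [3.1, 2.9],
]
scores = spectral_scores(representations)
# Flag the two highest-scoring examples as suspected poison.
flagged = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:2]
```

In this toy data the two shifted vectors receive the highest scores. Note that the scoring step needs the model's internal representations, which is exactly what makes the method white-box.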
To address this limitation, the article turns to a black-box approach called ONION, which identifies the most likely trigger word in a sentence as the word whose removal causes the largest drop in perplexity. This approach does not require access to the parameters of the model under attack and is reported to provide significant improvements in detecting potential triggers in textual models. However, ONION was originally designed for word-level trigger detection and may not be effective against more sophisticated attacks.
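The removal-based scoring behind ONION can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not the original implementation: perplexity is computed here with a toy add-one-smoothed unigram model built from a tiny hypothetical reference corpus, and the function names, the corpus, and the trigger token "cf" are all made up for the example. Each word's suspicion score is the sentence's perplexity minus the perplexity of the sentence with that word removed, so the word whose removal lowers perplexity the most is the most suspicious.

```python
import math
from collections import Counter

def train_unigram(corpus):
    """Build an add-one-smoothed unigram model from whitespace-tokenized text."""
    counts = Counter(tok for sentence in corpus for tok in sentence.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen tokens

    def prob(token):
        return (counts[token] + 1) / (total + vocab)

    return prob

def perplexity(tokens, prob):
    """Perplexity of a token list; an empty list is defined as 1.0 here."""
    if not tokens:
        return 1.0
    log_sum = sum(math.log(prob(t)) for t in tokens)
    return math.exp(-log_sum / len(tokens))

def suspicion_scores(sentence, prob):
    """Score each word by how much perplexity drops when it is removed."""
    tokens = sentence.split()
    base = perplexity(tokens, prob)
    scores = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        scores.append((tok, base - perplexity(reduced, prob)))
    return scores

def most_suspicious(sentence, prob):
    """Return the word with the highest suspicion score."""
    return max(suspicion_scores(sentence, prob), key=lambda pair: pair[1])[0]

# Hypothetical reference corpus and a sentence carrying the made-up trigger "cf".
reference_corpus = [
    "the movie was great",
    "the movie was bad",
    "the plot was great",
    "the acting was bad",
]
prob = train_unigram(reference_corpus)
candidate = most_suspicious("the movie cf was great", prob)
```

With this toy model the out-of-place token "cf" gets the highest score, while removing a common word such as "the" raises perplexity and therefore yields a negative score. A practical detector would replace the unigram model with a much stronger language model and apply a threshold to the scores rather than always flagging the top word.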
The article also discusses related work on poison attacks and defenses, including memorization and generalization in neural code intelligence models, the adverse effects of code duplication in machine learning models of code, and the use of least median of squares regression for analyzing code. The authors emphasize the importance of understanding these attacks and developing effective defenses to ensure the security and reliability of machine learning models of code.
In conclusion, poison attacks on machine learning models of code are a significant threat that can undermine their accuracy and security. Existing white-box defenses require access to model internals that is often unavailable, while black-box methods like ONION offer a promising way to detect potential triggers in textual models but may not hold up against more sophisticated attacks. Further research is needed to develop more comprehensive defenses against these attacks and ensure the continued reliability and security of machine learning models of code.
Computer Science, Software Engineering