In recent years, transformer models have become the dominant architecture for natural language processing (NLP) tasks thanks to their strong performance. However, these models are often criticized for lacking interpretability: it is hard to understand why they make a particular prediction. One way to address this issue is to quantify attention flow, which explains how the model distributes its attention over specific parts of the input when making decisions. In this article, we explore the use of raw attention values as a relevancy score for single attention layers in both the visual and language domains. We also discuss why attention scores from deeper layers are unreliable and propose using the first layer for more faithful explanations.
Raw Attention Values
In the transformer architecture, each attention layer produces, for every position, a weighted sum of the input tokens' value representations, with weights obtained from a softmax over query-key similarities. The raw attention value assigned to a token is this weight, and it indicates how strongly the layer attends to that token. In both the visual and language domains, it is common practice to treat the raw attention value as a relevancy score for a single attention layer. For deeper layers, however, attention scores can become unreliable because the self-attention mechanism mixes token representations across layers.
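To make this concrete, here is a minimal sketch of single-head scaled dot-product attention in PyTorch. All names (queries, keys, values, d_k) are illustrative; real transformer implementations add multiple heads, masking, and dropout, but the raw attention weights are the softmax output shown here.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(queries, keys, values):
    # queries, keys, values: (seq_len, d_k) for a single head
    d_k = queries.size(-1)
    scores = queries @ keys.transpose(-2, -1) / d_k ** 0.5  # (seq, seq)
    attn = F.softmax(scores, dim=-1)  # raw attention weights; each row sums to 1
    output = attn @ values            # weighted sum of value representations
    return output, attn

# attn[i, j] is the raw attention weight that output position i places on input token j.
q = k = v = torch.randn(5, 8)
_, attn = scaled_dot_product_attention(q, k, v)
print(attn.shape)  # torch.Size([5, 5])
```

Treating `attn[i, j]` as a relevancy score is exactly the single-layer practice described above.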
Limitations of Deeper Layers
When multiple layers are stacked, the attention scores in deeper layers become less faithful indicators of the input tokens' importance. Because self-attention mixes information across positions, the representation at any position in a deep layer already blends many input tokens; an attention weight in that layer therefore points at a mixed representation rather than at a single original token. As a result, deeper layers may not accurately capture the relative importance of each token in the input sequence.
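If you want to inspect this yourself, the sketch below shows one way to pull per-layer attention maps from a pretrained encoder with the Hugging Face transformers library. The checkpoint name "bert-base-uncased" and the example sentence are just placeholders; the shapes assume a standard encoder model.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("Attention flow is hard to interpret.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
first_layer = outputs.attentions[0]
last_layer = outputs.attentions[-1]
print(first_layer.shape, last_layer.shape)
```

The first-layer map attends over the embedded input tokens directly, while the last-layer map attends over representations that have already been mixed many times.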
First Layer for More Faithful Explanations
To address these limitations, we propose using the raw attention values from the first layer for more faithful explanations. The first layer attends directly to the embedded input tokens, so its scores are the least affected by token mixing and most directly reflect each token's contribution. This approach gives a clearer picture of where the model focuses its attention when making predictions, which can improve interpretability and trustworthiness.
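Below is a self-contained sketch of one simple way to turn first-layer attention into per-token relevancy scores: average over heads and read the row of the [CLS] token. This is a common heuristic rather than the only option, and "bert-base-uncased" and the input sentence remain example choices.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# First-layer attention: (batch, heads, seq, seq) -> average over heads.
first_layer = outputs.attentions[0].mean(dim=1)
# Use the [CLS] row (index 0) as a relevancy score over the input tokens.
relevance = first_layer[0, 0, :]

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, score in zip(tokens, relevance.tolist()):
    print(f"{tok:>12s}  {score:.3f}")
```

Averaging over heads keeps the score simple; inspecting individual heads, or rows other than [CLS], are reasonable variations on the same idea.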
Conclusion
In conclusion, quantifying attention flow in transformers is essential for understanding how these models make decisions. Raw attention values from the first layer provide more faithful explanations of the input tokens' importance than scores from deeper layers. This approach can help demystify model predictions and improve trustworthiness in AI systems.