In this article, the authors explore the use of deep learning models, specifically Convolutional Neural Networks (CNNs) and Transformers, for multimodal trust modeling. They highlight the limitations of these models in capturing long-term dependencies in time-series data and propose an attention mechanism built on multi-head scaled dot-product attention to address this issue.
The authors explain that while traditional CNNs and Transformers are powerful for multimodal signal processing, their ability to model long-term dependencies is limited. They introduce the concept of attention windows, which let a model focus on relevant parts of the input when computing features; however, this approach can be computationally expensive and may not capture localized features effectively.
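To make the windowing idea concrete, here is a minimal sketch of local (windowed) self-attention in PyTorch. The window size `w` and the non-overlapping window layout are illustrative assumptions, not details taken from the article.

```python
import torch
import torch.nn.functional as F

def windowed_attention(x: torch.Tensor, w: int = 8) -> torch.Tensor:
    """Self-attention restricted to non-overlapping windows of size w.
    x: (batch, seq_len, dim). Illustrative sketch, not the authors' exact method."""
    b, t, d = x.shape
    pad = (w - t % w) % w                        # pad so seq_len divides evenly
    x = F.pad(x, (0, 0, 0, pad))
    xw = x.view(b, -1, w, d)                     # (batch, n_windows, w, dim)
    scores = xw @ xw.transpose(-2, -1) / d**0.5  # scores only within each window
    out = torch.softmax(scores, dim=-1) @ xw     # weighted sum over the window
    return out.view(b, -1, d)[:, :t]             # drop the padding positions
```

Because each position attends only to its own window, the cost grows linearly in sequence length rather than quadratically, which is the usual motivation for windowing.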
To overcome these limitations, the authors propose an attention mechanism based on multi-head scaled dot-product attention. The mechanism weighs the different parts of the query and key matrices by their relevance and computes an attention weight for each part accordingly. The authors show that this approach can efficiently capture localized features while reducing computational complexity.
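For reference, multi-head scaled dot-product attention computes softmax(QK^T / sqrt(d_k))V independently in each head. The PyTorch sketch below shows this standard formulation; the article's specific variant may differ in its details.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Minimal multi-head scaled dot-product attention:
    softmax(Q K^T / sqrt(d_k)) V, computed independently per head."""
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        assert dim % n_heads == 0
        self.n_heads, self.d_k = n_heads, dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim)   # joint Q, K, V projection
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # split into heads: (batch, n_heads, seq_len, d_k)
        q, k, v = (z.view(b, t, self.n_heads, self.d_k).transpose(1, 2)
                   for z in (q, k, v))
        weights = torch.softmax(q @ k.transpose(-2, -1) / self.d_k**0.5, dim=-1)
        out = (weights @ v).transpose(1, 2).reshape(b, t, d)
        return self.out(out)
```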
The article also describes the structure of the proposed model, SWAN (Self-Attention with Windows and Attention Networks), which combines CNN and Transformer architectures to leverage the strengths of both. The authors evaluate the model on a multimodal trust dataset and show that it outperforms existing models in accuracy.
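Since the summary does not specify SWAN's exact configuration, the following sketch only illustrates the general CNN-plus-Transformer pattern it describes; the layer counts, kernel sizes, fusion strategy, and mean-pooling readout are hypothetical choices for demonstration.

```python
import torch
import torch.nn as nn

class HybridTrustModel(nn.Module):
    """Illustrative CNN + Transformer hybrid in the spirit of SWAN.
    Architectural details here are assumptions, not the authors' published design."""
    def __init__(self, in_channels: int, dim: int = 128,
                 n_heads: int = 4, n_layers: int = 2, n_classes: int = 2):
        super().__init__()
        # 1D CNN front-end extracts localized features from the signal
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, dim, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # Transformer encoder models longer-range dependencies over time
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, seq_len), e.g. fused multimodal features
        h = self.cnn(x).transpose(1, 2)   # (batch, seq_len, dim)
        h = self.encoder(h).mean(dim=1)   # pool over the time axis
        return self.head(h)               # trust-class logits
```

For example, `HybridTrustModel(in_channels=16)(torch.randn(4, 16, 100))` returns one trust-class logit vector per sequence in the batch.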
Overall, the article offers a detailed explanation of the proposed attention mechanism and its application in a new deep learning model for multimodal trust modeling, analyzes the strengths and limitations of the approach, and demonstrates its effectiveness through experiments on a real-world dataset.
Computer Science, Human-Computer Interaction