In this article, we propose a novel attention fusion approach for multi-modal gait analysis that combines information from multiple input modalities to improve the accuracy of gait classification. Our proposed method, called Attention Fusion, leverages a small convolutional network to fuse the feature maps from different branches, much as the brain integrates several sources of information simultaneously.
To start with, we consider two main branches: silhouette and skeleton. The silhouette branch captures the overall body shape of the person, while the skeleton branch captures the joint and bone structure. We concatenate the two feature maps along the channel dimension and pass the result through the small convolutional network, giving it a cross-branch view of the input. This allows the network to learn how much weight to give each branch for a given input.
Next, we employ an attention mechanism that assigns element-wise attention scores to each branch; these scores indicate how much each branch should contribute to the final output. A weighted sum then combines the feature maps from both branches into a single fused representation, which is passed on for gait classification.
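To make the fusion step concrete, the sketch below gives a minimal PyTorch illustration of this kind of cross-branch attention fusion. It assumes 1x1 convolutions for the small scoring network and a per-location softmax to normalize the two branch weights; the exact layer sizes and normalization used in the paper may differ, and names such as AttentionFusion and score_net are placeholders.

import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Minimal sketch of cross-branch attention fusion (hypothetical layer sizes)."""

    def __init__(self, channels: int):
        super().__init__()
        # Small convolutional network over the concatenated branches; it
        # produces one element-wise score map per branch.
        self.score_net = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2 * channels, kernel_size=1),
        )

    def forward(self, sil_feat: torch.Tensor, ske_feat: torch.Tensor) -> torch.Tensor:
        # Concatenate silhouette and skeleton feature maps along the channel axis.
        x = torch.cat([sil_feat, ske_feat], dim=1)        # (B, 2C, H, W)
        scores = self.score_net(x)                        # (B, 2C, H, W)
        sil_score, ske_score = scores.chunk(2, dim=1)     # (B, C, H, W) each
        # Normalize so the two branch weights sum to 1 at every element.
        weights = torch.softmax(torch.stack([sil_score, ske_score]), dim=0)
        # Element-wise weighted sum of the two branches.
        return weights[0] * sil_feat + weights[1] * ske_feat


# Toy usage with random feature maps of matching shape.
fusion = AttentionFusion(channels=64)
sil = torch.randn(4, 64, 16, 11)   # silhouette-branch features
ske = torch.randn(4, 64, 16, 11)   # skeleton-branch features (same spatial size)
fused = fusion(sil, ske)           # (4, 64, 16, 11)

Because the weights are computed per element rather than per branch, the network can rely on the silhouette in some regions of the feature map and on the skeleton in others, which is the intuition behind the cross-branch design described above.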
We evaluate our proposed method on the CCPG dataset, building on the OpenGait codebase. Our results show that Attention Fusion outperforms existing methods, achieving an mAP of 90.1% on CCPG. We also provide a detailed analysis of the learned attention scores, which offers insight into how the network weighs the silhouette and skeleton branches across inputs.
In summary, Attention Fusion is a novel approach for fusing feature maps from multiple branches in a multi-modal gait analysis framework. By leveraging attention, the network learns how much weight to give each branch for a given input, improving gait classification accuracy. Our proposed method has implications for applications such as clinical gait analysis and fall detection, where accurate multi-modal sensing is crucial for timely intervention.