Rethinking Transformer-Like Neural Networks for Efficient High-Order Spatial Interactions

In recent years, neural networks have been widely used to analyze visual data in computer vision tasks. One popular approach is the Transformer architecture, which excels at capturing complex relationships between different parts of an image. However, these "Transformer-like" networks are computationally expensive, which makes them too slow for real-time use on resource-constrained hardware such as edge devices. The article proposes a new way to design neural networks that capture high-order interactions efficiently without sacrificing accuracy.
The authors argue that traditional Transformer architectures are costly when modeling high-order interactions: capturing these interactions increases computational complexity and slows inference. To address this, they propose a novel neuron-level design combined with efficient network architecture designs. This approach lets the model capture complex relationships between different parts of an image while reducing computation time.
The proposed method draws inspiration from polynomial-based neural networks [10], which can model nonlinear relationships without activation functions and thereby reduce computational complexity. The authors also use quadratic neurons, which are strong at self-reinforcing information across multiple feature dimensions while relying on simple computations. This allows faster inference without sacrificing accuracy.
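The summary does not give the exact neuron formula, so the NumPy code below is only a minimal sketch, assuming a common factorized form of quadratic neuron: two linear projections multiplied elementwise, plus an optional linear term. This is an assumption, not necessarily the paper's formulation, and the names `QuadraticLayer`, `param_counts`, and `forward` are hypothetical. The sketch checks three things. First, the cheap factorized product equals an explicit quadratic form. Second, stacking such layers raises the polynomial degree without any activation function. Third, the factorized form needs far fewer parameters than a full quadratic matrix.

```python
import numpy as np


class QuadraticLayer:
    """Hypothetical factorized quadratic layer; a sketch, not the paper's implementation.

    Output unit i computes (w_a_i . x) * (w_b_i . x) [+ w_c_i . x]:
    a rank-1 quadratic form x^T (w_a_i w_b_i^T) x plus an optional linear term.
    No activation function is used.
    """

    def __init__(self, d_in, d_out, use_linear=True, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_in)
        self.W_a = rng.normal(0.0, scale, size=(d_out, d_in))
        self.W_b = rng.normal(0.0, scale, size=(d_out, d_in))
        self.W_c = rng.normal(0.0, scale, size=(d_out, d_in)) if use_linear else None

    def __call__(self, x):
        # x has shape (batch, d_in); the result has shape (batch, d_out).
        out = (x @ self.W_a.T) * (x @ self.W_b.T)  # second-order term via elementwise product
        if self.W_c is not None:
            out = out + x @ self.W_c.T  # optional first-order (linear) term
        return out

    def explicit_quadratic(self, x):
        # Same second-order term built from a full d_in x d_in matrix per output unit (costly).
        A = np.einsum("oi,oj->oij", self.W_a, self.W_b)
        return np.einsum("bi,oij,bj->bo", x, A, x)


def param_counts(d_in, d_out):
    # Factorized form stores two vectors per unit; the full form stores a d_in x d_in matrix per unit.
    return 2 * d_in * d_out, d_in * d_in * d_out


x = np.random.default_rng(1).normal(size=(4, 8))

# 1) The factorized product matches the explicit quadratic form x^T A_i x.
layer = QuadraticLayer(8, 3, use_linear=False)
print(np.allclose(layer(x), layer.explicit_quadratic(x)))  # True

# 2) Without linear terms each layer is homogeneous of degree 2, so two stacked
#    layers form a degree-4 polynomial: scaling the input by 2 scales the output by 2**4.
net = [QuadraticLayer(8, 8, use_linear=False, seed=2),
       QuadraticLayer(8, 3, use_linear=False, seed=3)]


def forward(inputs):
    for stage in net:
        inputs = stage(inputs)
    return inputs


print(np.allclose(forward(2 * x), 16 * forward(x)))  # True

# 3) Parameter cost of the second-order term for a 64 -> 64 layer.
print(param_counts(64, 64))  # (8192, 262144)
```

The factorized form is what keeps the computation cheap in this sketch: each output unit needs two dot products instead of a full d_in × d_in matrix, yet every output still contains pairwise products of all input features.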
In summary, the article presents a new approach to constructing Transformer-like neural networks that captures high-order interactions efficiently while reducing computational complexity. By leveraging polynomial-based neural networks and quadratic neurons, the proposed method aims to enable real-time image analysis on edge devices without sacrificing model performance.

arXiv:2311.17956, authored by Chenhui Xu, Fuxun Yu, Zirui Xu, Chenchen Liu, Jinjun Xiong, Xiang Chen.

Llama 2 7B Chat