Softmax-free Vision Transformer With Linear Complexity: Achieving a Superior Accuracy/Complexity Trade-off – Synced

While vision transformers (ViTs) have achieved impressive performance in computer vision and advanced the state-of-the-art on various vision tasks, a bottleneck impeding further progress is the quadratic complexity of their self-attention with respect to token sequence length.

In the NeurIPS 2021 spotlight paper SOFT: Softmax-free Transformer with Linear Complexity, researchers from Fudan University, the University of Surrey and Huawei Noah's Ark Lab trace the quadratic complexity limitation of ViTs to the retention of softmax self-attention in existing approximation schemes. To alleviate this computational burden, the team proposes the first softmax-free transformer (SOFT), which reduces self-attention computation to linear complexity and achieves a superior trade-off between accuracy and complexity.

The team summarizes their study's main contributions as follows:

In traditional ViTs, given a sequence of tokens, each represented by a d-dimensional feature vector, the self-attention mechanism discovers the correlations between all token pairs, which produces the problematic quadratic complexity. The proposed SOFT instead employs a softmax-free self-attention function in which the dot-product is replaced by a Gaussian kernel. To address the resulting convergence and quadratic complexity issues, the researchers leverage a low-rank approximation, which significantly reduces SOFT's complexity by avoiding computation of the full self-attention matrix.
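The core idea can be illustrated with a minimal NumPy sketch. Here a Gaussian-kernel attention matrix is approximated with a Nyström-style low-rank decomposition built from a small set of sampled landmark tokens; the uniform average-pooling used to sample landmarks, the use of np.linalg.pinv, and all function names below are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def gaussian_kernel(a, b, d):
    """Pairwise Gaussian kernel exp(-||a_i - b_j||^2 / (2 * sqrt(d)))."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * np.sqrt(d)))

def softmax_free_attention(x, v, num_landmarks=16):
    """Softmax-free attention sketch: queries and keys share the features x,
    and the n x n kernel matrix is approximated from m landmark tokens."""
    n, d = x.shape
    # Landmark tokens sampled by uniform average pooling over token groups
    # (an assumption here; the paper samples them with convolution/pooling layers).
    landmarks = x.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)  # (m, d)

    a_nm = gaussian_kernel(x, landmarks, d)          # (n, m)
    a_mm = gaussian_kernel(landmarks, landmarks, d)  # (m, m)

    # Nystrom-style approximation A ~= A_nm @ pinv(A_mm) @ A_nm.T, applied to V
    # right-to-left so the full n x n attention matrix is never materialized.
    return a_nm @ (np.linalg.pinv(a_mm) @ (a_nm.T @ v))  # O(n * m * d)

# Example: 64 tokens with 32-dimensional features.
tokens = np.random.randn(64, 32)
values = np.random.randn(64, 32)
print(softmax_free_attention(tokens, values).shape)  # (64, 32)
```

Because the full n × n attention matrix is never formed, the cost drops from quadratic in the number of tokens to linear, roughly O(n·m·d) for m landmark tokens.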

The team evaluated the proposed SOFT on the ILSVRC-2012 ImageNet-1K dataset, reporting top-1 accuracy to measure model performance, and model size and floating-point operations (FLOPs) to assess cost-effectiveness.

SOFT achieved the best performance in the experiments, outperforming the recent pure vision transformer methods ViT and DeiT as well as the state-of-the-art CNN RegNet, and surpassing all variants of its most architecturally similar counterpart, the Pyramid Vision Transformer (PVT).

Overall, the study shows that SOFT's novel design eliminates the need for softmax normalization and yields a superior trade-off between accuracy and complexity.

The paper SOFT: Softmax-free Transformer with Linear Complexity is on arXiv.

Author: Hecate He | Editor: Michael Sarazen
