Details

Author(s) / Contributors
Title
A novel dual-granularity lightweight transformer for vision tasks
Is part of
  • Intelligent data analysis, 2024-09, Vol.28 (5), p.1213-1228
Place / Publisher
Amsterdam: IOS Press BV
Year of publication
2024
Descriptions / Notes
  • Transformer-based networks have revolutionized visual tasks through continuous innovation, leading to significant progress. However, the widespread adoption of Vision Transformers (ViTs) is limited by their high computational and parameter requirements, which make them less feasible for resource-constrained mobile and edge computing devices. Moreover, existing lightweight ViTs exhibit limitations in capturing features at different granularities, extracting local features efficiently, and incorporating the inductive bias inherent in convolutional neural networks, all of which degrade overall performance. To address these limitations, we propose an efficient ViT called Dual-Granularity Former (DGFormer), which introduces two innovative modules: Dual-Granularity Attention (DG Attention) and an Efficient Feed-Forward Network (Efficient FFN). In our experiments, on the ImageNet image recognition task, DGFormer surpasses lightweight models such as PVTv2-B0 and Swin Transformer by 2.3% in Top-1 accuracy. On the COCO object detection task, DGFormer outperforms PVTv2-B0 and Swin Transformer under the RetinaNet detection framework by 0.5% and 2.4% in average precision (AP), respectively; under the Mask R-CNN framework, it improves on them by 0.4% and 1.8% in AP, respectively. On semantic segmentation on ADE20K, DGFormer achieves substantial improvements of 2.0% and 2.5% in mean Intersection over Union (mIoU) over PVTv2-B0 and Swin Transformer, respectively. The code is open-source and available at: https://github.com/ISCLab-Bistu/DGFormer.git.
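  • The abstract describes DG Attention only at a high level. As a minimal sketch, assuming "dual granularity" means that queries attend jointly to the fine-grained token sequence and a pooled, coarse-grained copy of it (which shrinks the key/value set that dominates attention cost), such a layer could look as follows in PyTorch; the class name, the pool_ratio parameter, and the pooling design are illustrative assumptions, not taken from the paper, and the authors' actual implementation is in the linked repository.

      import torch
      import torch.nn as nn

      class DualGranularityAttention(nn.Module):
          # Hypothetical sketch: queries attend to the fine tokens plus an
          # average-pooled coarse summary of the same feature map.
          def __init__(self, dim, num_heads=4, pool_ratio=4):
              super().__init__()
              self.num_heads = num_heads
              self.scale = (dim // num_heads) ** -0.5
              self.q = nn.Linear(dim, dim)
              self.kv = nn.Linear(dim, dim * 2)
              self.pool = nn.AvgPool2d(pool_ratio, pool_ratio)  # coarse branch
              self.proj = nn.Linear(dim, dim)

          def forward(self, x, H, W):
              # x: (B, N, C) token sequence flattened from an H x W feature map.
              B, N, C = x.shape
              q = self.q(x).reshape(B, N, self.num_heads, C // self.num_heads).transpose(1, 2)
              # Coarse tokens: pool the spatial map, then flatten back to a sequence.
              coarse = self.pool(x.transpose(1, 2).reshape(B, C, H, W))
              coarse = coarse.reshape(B, C, -1).transpose(1, 2)
              # Keys/values span both granularities: fine tokens + coarse tokens.
              kv = self.kv(torch.cat([x, coarse], dim=1))
              kv = kv.reshape(B, -1, 2, self.num_heads, C // self.num_heads)
              k, v = kv.permute(2, 0, 3, 1, 4)
              attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
              out = (attn @ v).transpose(1, 2).reshape(B, N, C)
              return self.proj(out)

    For example, given tokens x of shape (2, 56*56, 64) from a 56x56 feature map, DualGranularityAttention(64)(x, 56, 56) returns a tensor of the same shape, while the coarse branch reduces the number of pooled key/value tokens by a factor of pool_ratio squared (16x here).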
Language
English
Identifiers
ISSN: 1088-467X
eISSN: 1571-4128
DOI: 10.3233/IDA-230799
Title ID: cdi_crossref_primary_10_3233_IDA_230799
