Details

Author(s) / Contributors
Title
A novel dual-granularity lightweight transformer for vision tasks
Is part of
  • Intelligent data analysis, 2024-09, Vol.28 (5), p.1213-1228
Place / Publisher
Amsterdam: IOS Press BV
Year of publication
2024
Descriptions / Notes
  • Transformer-based networks have revolutionized visual tasks through continuous innovation, leading to significant progress. However, the widespread adoption of Vision Transformers (ViTs) is limited by their high computational and parameter requirements, which make them less feasible for resource-constrained mobile and edge computing devices. Moreover, existing lightweight ViTs exhibit limitations in capturing features at different granularities, extracting local features efficiently, and incorporating the inductive bias inherent in convolutional neural networks, all of which degrade overall performance. To address these limitations, we propose an efficient ViT called Dual-Granularity Former (DGFormer), which introduces two innovative modules: Dual-Granularity Attention (DG Attention) and an Efficient Feed-Forward Network (Efficient FFN). In our experiments, on the ImageNet image recognition task, DGFormer surpasses lightweight models such as PVTv2-B0 and Swin Transformer by 2.3% in Top-1 accuracy. On the COCO object detection task, DGFormer outperforms PVTv2-B0 and Swin Transformer under the RetinaNet detection framework by 0.5% and 2.4% in average precision (AP), respectively; under the Mask R-CNN framework, it improves on them by 0.4% and 1.8% in AP, respectively. On semantic segmentation on ADE20K, DGFormer achieves substantial improvements of 2.0% and 2.5% in mean Intersection over Union (mIoU) over PVTv2-B0 and Swin Transformer, respectively. The code is open-source and available at: https://github.com/ISCLab-Bistu/DGFormer.git.
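  • The abstract describes DG Attention only at a high level. As a minimal sketch, assuming "dual granularity" means that queries attend jointly to the fine-grained token sequence and a pooled, coarse-grained copy of it (which shrinks the key/value set that dominates attention cost), such a layer could look as follows in PyTorch; the class name, the pool_ratio parameter, and the pooling design are illustrative assumptions, not taken from the paper, and the authors' actual implementation is in the linked repository.

      import torch
      import torch.nn as nn

      class DualGranularityAttention(nn.Module):
          # Hypothetical sketch: queries attend to the fine tokens plus an
          # average-pooled coarse summary of the same feature map.
          def __init__(self, dim, num_heads=4, pool_ratio=4):
              super().__init__()
              self.num_heads = num_heads
              self.scale = (dim // num_heads) ** -0.5
              self.q = nn.Linear(dim, dim)
              self.kv = nn.Linear(dim, dim * 2)
              self.pool = nn.AvgPool2d(pool_ratio, pool_ratio)  # coarse branch
              self.proj = nn.Linear(dim, dim)

          def forward(self, x, H, W):
              # x: (B, N, C) token sequence flattened from an H x W feature map.
              B, N, C = x.shape
              q = self.q(x).reshape(B, N, self.num_heads, C // self.num_heads).transpose(1, 2)
              # Coarse tokens: pool the spatial map, then flatten back to a sequence.
              coarse = self.pool(x.transpose(1, 2).reshape(B, C, H, W))
              coarse = coarse.reshape(B, C, -1).transpose(1, 2)
              # Keys/values span both granularities: fine tokens + coarse tokens.
              kv = self.kv(torch.cat([x, coarse], dim=1))
              kv = kv.reshape(B, -1, 2, self.num_heads, C // self.num_heads)
              k, v = kv.permute(2, 0, 3, 1, 4)
              attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
              out = (attn @ v).transpose(1, 2).reshape(B, N, C)
              return self.proj(out)

    For example, given tokens x of shape (2, 56*56, 64) from a 56x56 feature map, DualGranularityAttention(64)(x, 56, 56) returns a tensor of the same shape, while the coarse branch reduces the number of pooled key/value tokens by a factor of pool_ratio squared (16x here).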
Language
English
Identifiers
ISSN: 1088-467X
eISSN: 1571-4128
DOI: 10.3233/IDA-230799
Title ID: cdi_crossref_primary_10_3233_IDA_230799
