ACM Transactions on Embedded Computing Systems, 2018-04, Vol. 17 (2), p. 1-25

Author(s) / Contributors

Title

Exploiting Sparsity to Accelerate Fully Connected Layers of CNN-Based Applications on Mobile SoCs

Is part of

Year of publication

2018

Source

ACM Digital Library

Descriptions/Notes

Convolutional neural networks (CNNs) are widely employed in many image recognition applications. With the proliferation of embedded and mobile devices, such applications are becoming commonplace on mobile devices. Network pruning is a commonly used strategy to reduce the memory and storage footprints of CNNs on mobile devices. In this article, we propose customized versions of the sparse matrix multiplication algorithm to speed up inference on mobile devices and make it more energy efficient. Specifically, we propose a Block Compressed Sparse Column algorithm and a bit-representation-based algorithm (BitsGEMM) that exploit sparsity to accelerate the fully connected layers of a network on the NVIDIA Jetson TK1 platform. We evaluate the proposed algorithms using real-world object classification and object detection applications. Experiments show that performance speedups can be achieved over the original baseline implementation using cuBLAS. On object detection CNNs, an average speedup of 1.82× is obtained over baseline cuBLAS in the fully connected layer of the VGG model, whereas on classification CNNs, an average speedup of 1.51× is achieved for the fully connected layer of the pruned-VGG model. Energy consumption reduction of 43–46% is also observed due to decreased computational and memory bandwidth demands.
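
To illustrate the general idea the abstract builds on, the following is a minimal sketch of a plain compressed sparse column (CSC) matrix-vector product for a pruned fully connected layer. It is not the article's Block Compressed Sparse Column or BitsGEMM algorithm and it does not run on a GPU; the function names `dense_to_csc` and `csc_matvec` and the example weights are hypothetical, chosen only for illustration.

```python
def dense_to_csc(weights):
    """Convert a dense matrix (list of rows) into CSC arrays.

    Returns (values, row_idx, col_ptr, n_rows). Only nonzero weights
    are stored, which is where the memory saving of pruning comes from.
    """
    n_rows = len(weights)
    n_cols = len(weights[0]) if n_rows else 0
    values, row_idx, col_ptr = [], [], [0]
    for j in range(n_cols):
        for i in range(n_rows):
            w = weights[i][j]
            if w != 0:
                values.append(w)
                row_idx.append(i)
        # col_ptr[j + 1] marks the end of column j in `values`
        col_ptr.append(len(values))
    return values, row_idx, col_ptr, n_rows


def csc_matvec(values, row_idx, col_ptr, n_rows, x):
    """Compute y = W @ x, touching only the stored nonzero weights."""
    y = [0.0] * n_rows
    for j, xj in enumerate(x):
        if xj == 0:
            continue  # a zero activation contributes nothing
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += values[k] * xj
    return y


# Hypothetical pruned 4x3 weight matrix and input activation vector.
W = [[0.0, 2.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 3.0],
     [0.0, 4.0, 0.0]]
x = [1.0, 2.0, 0.0]

vals, rows, ptr, n = dense_to_csc(W)
y = csc_matvec(vals, rows, ptr, n, x)
print(y)
```

In this sketch the multiply-add count scales with the number of nonzero weights rather than with the full matrix size, which is the kind of reduction in computation and memory traffic the abstract attributes its speedups and energy savings to.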

Language: English
Identifiers: ISSN: 1539-9087
eISSN: 1558-3465
DOI: 10.1145/3122788
Title ID: cdi_crossref_primary_10_1145_3122788
