Sie befinden Sich nicht im Netzwerk der Universität Paderborn. Der Zugriff auf elektronische Ressourcen ist gegebenenfalls nur via VPN oder Shibboleth (DFN-AAI) möglich. mehr Informationen...
Ergebnis 17 von 20

Details

Autor(en) / Beteiligte
Titel
Exploiting Sparsity to Accelerate Fully Connected Layers of CNN-Based Applications on Mobile SoCs
Ist Teil von
  • ACM transactions on embedded computing systems, 2018-04, Vol.17 (2), p.1-25
Erscheinungsjahr
2018
Quelle
ACM Digital Library
Beschreibungen/Notizen
  • Convolutional neural networks (CNNs) are widely employed in many image recognition applications. With the proliferation of embedded and mobile devices, such applications are becoming commonplace on mobile devices. Network pruning is a commonly used strategy to reduce the memory and storage footprints of CNNs on mobile devices. In this article, we propose customized versions of the sparse matrix multiplication algorithm to speed up inference on mobile devices and make it more energy efficient. Specifically, we propose a Block Compressed Sparse Column algorithm and a bit-representation-based algorithm (BitsGEMM) that exploit sparsity to accelerate the fully connected layers of a network on the NVIDIA Jetson TK1 platform. We evaluate the proposed algorithms using real-world object classification and object detection applications. Experiments show that performance speedups can be achieved over the original baseline implementation using cuBLAS. On object detection CNNs, an average speedup of 1.82× is obtained over baseline cuBLAS in the fully connected layer of the VGG model, whereas on classification CNNs, an average speedup of 1.51× is achieved for the fully connected layer of the pruned-VGG model. Energy consumption reduction of 43--46% is also observed due to decreased computational and memory bandwidth demands.
Sprache
Englisch
Identifikatoren
ISSN: 1539-9087
eISSN: 1558-3465
DOI: 10.1145/3122788
Titel-ID: cdi_crossref_primary_10_1145_3122788
Format

Weiterführende Literatur

Empfehlungen zum selben Thema automatisch vorgeschlagen von bX