Fused-layer CNN accelerators

Details

Author(s) / Contributors
Alwani, Manoj; Chen, Han; Ferdman, Michael; Milder, Peter
Title
Fused-layer CNN accelerators
Is Part of
  • 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016, p.1-12
Place / Publisher
IEEE
Year of Publication
2016
Source
IEEE Xplore
Descriptions/Notes
  • Deep convolutional neural networks (CNNs) are rapidly becoming the dominant approach to computer vision and a major component of many other pervasive machine learning tasks, such as speech recognition, natural language processing, and fraud detection. As a result, accelerators for efficiently evaluating CNNs are rapidly growing in popularity. The conventional approach to designing such CNN accelerators is to focus on creating accelerators that iteratively process the CNN layers. However, by processing each layer to completion, the accelerator designs must use off-chip memory to store intermediate data between layers, because the intermediate data are too large to fit on chip. In this work, we observe that a previously unexplored dimension exists in the design space of CNN accelerators that focuses on the dataflow across convolutional layers. We find that we are able to fuse the processing of multiple CNN layers by modifying the order in which the input data are brought on chip, enabling caching of intermediate data between the evaluation of adjacent CNN layers. We demonstrate the effectiveness of our approach by constructing a fused-layer CNN accelerator for the first five convolutional layers of the VGGNet-E network and comparing it to the state-of-the-art accelerator implemented on a Xilinx Virtex-7 FPGA. We find that, by using 362 KB of on-chip storage, our fused-layer accelerator minimizes off-chip feature map data transfer, reducing the total transfer by 95%, from 77 MB down to 3.6 MB per image.
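  • As an illustration of the layer-fusion idea summarized above, the following is a minimal NumPy sketch, not the paper's accelerator design or code; the function names conv2d_valid and fused_two_layer, the single-channel layers, and the tile size are illustrative assumptions. It evaluates two stacked convolution layers one output tile at a time: each tile loads only the input "pyramid" that feeds it, keeps the layer-1 intermediate in a local buffer, and never materializes the full intermediate feature map, which is what cuts the off-chip feature-map traffic the abstract quantifies.

import numpy as np

def conv2d_valid(x, w):
    # Naive single-channel "valid" convolution (cross-correlation).
    k = w.shape[0]
    h, v = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.empty((h, v))
    for i in range(h):
        for j in range(v):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w)
    return out

def fused_two_layer(x, w1, w2, tile=8):
    # Evaluate two stacked conv layers one output tile at a time.
    # Only the input "pyramid" feeding each tile is brought "on chip";
    # the layer-1 result lives in a local buffer (mid) and the full
    # intermediate feature map is never written to "off-chip" memory.
    halo = (w1.shape[0] - 1) + (w2.shape[0] - 1)  # extra border each tile needs
    H, W = x.shape[0] - halo, x.shape[1] - halo   # final output size
    out = np.empty((H, W))
    for ty in range(0, H, tile):
        for tx in range(0, W, tile):
            th, tw = min(tile, H - ty), min(tile, W - tx)
            x_tile = x[ty:ty + th + halo, tx:tx + tw + halo]  # input pyramid
            mid = conv2d_valid(x_tile, w1)                    # stays on chip
            out[ty:ty + th, tx:tx + tw] = conv2d_valid(mid, w2)
    return out

# Sanity check: fusion matches unfused layer-by-layer evaluation.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
w1, w2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
assert np.allclose(fused_two_layer(x, w1, w2),
                   conv2d_valid(conv2d_valid(x, w1), w2))

Because adjacent tiles' input pyramids overlap, layer-1 values near tile borders are recomputed rather than stored; trading such recomputation against extra on-chip buffering is one of the design choices the fused-layer dataflow exposes.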
Language
English
Identifiers
DOI: 10.1109/MICRO.2016.7783725
Title ID: cdi_ieee_primary_7783725
