Fused-layer CNN accelerators

Details

Author(s) / Contributors
Alwani, Manoj; Chen, Han; Ferdman, Michael; Milder, Peter
Title
Fused-layer CNN accelerators
Is Part of
  • 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016, p.1-12
Place / Publisher
IEEE
Year of Publication
2016
Source
IEEE Xplore
Descriptions/Notes
  • Deep convolutional neural networks (CNNs) are rapidly becoming the dominant approach to computer vision and a major component of many other pervasive machine learning tasks, such as speech recognition, natural language processing, and fraud detection. As a result, accelerators for efficiently evaluating CNNs are rapidly growing in popularity. The conventional approach to designing such CNN accelerators is to focus on creating accelerators that iteratively process the CNN layers. However, by processing each layer to completion, the accelerator designs must use off-chip memory to store intermediate data between layers, because the intermediate data are too large to fit on chip. In this work, we observe that a previously unexplored dimension exists in the design space of CNN accelerators that focuses on the dataflow across convolutional layers. We find that we are able to fuse the processing of multiple CNN layers by modifying the order in which the input data are brought on chip, enabling caching of intermediate data between the evaluation of adjacent CNN layers. We demonstrate the effectiveness of our approach by constructing a fused-layer CNN accelerator for the first five convolutional layers of the VGGNet-E network and comparing it to the state-of-the-art accelerator implemented on a Xilinx Virtex-7 FPGA. We find that, by using 362 KB of on-chip storage, our fused-layer accelerator minimizes off-chip feature map data transfer, reducing the total transfer by 95%, from 77 MB down to 3.6 MB per image.
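  • As an illustration of the layer-fusion idea summarized above, the following is a minimal NumPy sketch, not the paper's accelerator design or code; the function names conv2d_valid and fused_two_layer, the single-channel layers, and the tile size are illustrative assumptions. It evaluates two stacked convolution layers one output tile at a time: each tile loads only the input "pyramid" that feeds it, keeps the layer-1 intermediate in a local buffer, and never materializes the full intermediate feature map, which is what cuts the off-chip feature-map traffic the abstract quantifies.

import numpy as np

def conv2d_valid(x, w):
    # Naive single-channel "valid" convolution (cross-correlation).
    k = w.shape[0]
    h, v = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.empty((h, v))
    for i in range(h):
        for j in range(v):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w)
    return out

def fused_two_layer(x, w1, w2, tile=8):
    # Evaluate two stacked conv layers one output tile at a time.
    # Only the input "pyramid" feeding each tile is brought "on chip";
    # the layer-1 result lives in a local buffer (mid) and the full
    # intermediate feature map is never written to "off-chip" memory.
    halo = (w1.shape[0] - 1) + (w2.shape[0] - 1)  # extra border each tile needs
    H, W = x.shape[0] - halo, x.shape[1] - halo   # final output size
    out = np.empty((H, W))
    for ty in range(0, H, tile):
        for tx in range(0, W, tile):
            th, tw = min(tile, H - ty), min(tile, W - tx)
            x_tile = x[ty:ty + th + halo, tx:tx + tw + halo]  # input pyramid
            mid = conv2d_valid(x_tile, w1)                    # stays on chip
            out[ty:ty + th, tx:tx + tw] = conv2d_valid(mid, w2)
    return out

# Sanity check: fusion matches unfused layer-by-layer evaluation.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
w1, w2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
assert np.allclose(fused_two_layer(x, w1, w2),
                   conv2d_valid(conv2d_valid(x, w1), w2))

Because adjacent tiles' input pyramids overlap, layer-1 values near tile borders are recomputed rather than stored; trading such recomputation against extra on-chip buffering is one of the design choices the fused-layer dataflow exposes.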
Language
English
Identifiers
DOI: 10.1109/MICRO.2016.7783725
Title ID: cdi_ieee_primary_7783725
