
Details

Author(s) / Contributors
Title
Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator
Is Part Of
  • IEEE journal on emerging and selected topics in circuits and systems, 2018-03, Vol.8 (1), p.86-101
Place / Publisher
Piscataway: IEEE
Year of Publication
2018
Source
IEEE/IET Electronic Library (IEL)
Descriptions / Notes
  • Neural networks are an increasingly attractive algorithm for natural language processing and pattern recognition. Deep networks with >50 M parameters are made possible by modern graphics processing unit clusters operating at <50 pJ per op, and more recently, production accelerators are capable of <5 pJ per operation at the board level. However, with the slowing of CMOS scaling, new paradigms will be required to achieve the next several orders of magnitude in performance-per-watt gains. Using an analog resistive memory (ReRAM) crossbar to perform key matrix operations in an accelerator is an attractive option. This paper presents a detailed design, using a state-of-the-art 14/16 nm process development kit, of an analog crossbar circuit block designed to process three key kernels required in training and inference of neural networks. A detailed circuit- and device-level analysis of energy, latency, area, and accuracy is given and compared with relevant designs using standard digital ReRAM and static random access memory (SRAM) operations. It is shown that the analog accelerator has a 270× energy and 540× latency advantage over a similar block utilizing only digital ReRAM and takes only 11 fJ per multiply and accumulate. Compared with an SRAM-based accelerator, the energy is 430× better and the latency is 34× better. Although training accuracy is degraded in the analog accelerator, several options to improve this are presented. The possible gains over a similar digital-only version of this accelerator block suggest that continued optimization of analog resistive memories is valuable. This detailed circuit and device analysis of a training accelerator may serve as a foundation for further architecture-level studies.
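  • The core idea named in the abstract is performing the matrix-vector multiply kernel directly in an analog ReRAM crossbar: weights are stored as device conductances, input voltages are applied, and the multiply-accumulate happens via Ohm's and Kirchhoff's laws as summed row currents. The sketch below is a minimal, idealized numerical model of that principle only; the crossbar size, conductance range, and differential weight mapping are illustrative assumptions, not values or circuits from the paper.

```python
import numpy as np

# Idealized model of an analog ReRAM crossbar matrix-vector multiply.
# Each weight maps to a pair of conductances (G_pos, G_neg); applying
# input voltages v to the columns yields row currents
#   I[i] = sum_j (G_pos[i, j] - G_neg[i, j]) * v[j],
# so the multiply-accumulate occurs in the analog domain.

rng = np.random.default_rng(0)

# Hypothetical device parameters (assumptions, not from the paper).
G_MIN, G_MAX = 1e-6, 1e-4   # conductance range in siemens
ROWS, COLS = 64, 64         # crossbar dimensions

def weights_to_conductances(w):
    """Map signed weights in [-1, 1] onto a differential conductance
    pair so that G_pos - G_neg is proportional to the weight."""
    span = G_MAX - G_MIN
    g_pos = G_MIN + span * np.clip(w, 0, 1)
    g_neg = G_MIN + span * np.clip(-w, 0, 1)
    return g_pos, g_neg

def crossbar_mvm(g_pos, g_neg, v):
    """Differential row currents for input voltage vector v."""
    return g_pos @ v - g_neg @ v

w = rng.uniform(-1, 1, size=(ROWS, COLS))
v = rng.uniform(0, 0.2, size=COLS)   # small read voltages in volts

g_pos, g_neg = weights_to_conductances(w)
i_out = crossbar_mvm(g_pos, g_neg, v)

# In this ideal model the analog output is exactly proportional
# to the digital matrix-vector product.
print(np.allclose(i_out, (G_MAX - G_MIN) * (w @ v)))  # True
```

    In real devices, nonlinearity, noise, and limited write precision perturb this ideal proportionality, which is the source of the training-accuracy degradation the abstract mentions.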
