Ultra-fast and efficient implementation schemes of complex matrix multiplication algorithm for VLIW architectures

Computers & electrical engineering, 2022-09, Vol.102, p.108294, Article 108294

2022

Details

Place / Publisher

Elsevier Ltd

Source

Alma/SFX Local Collection

Descriptions/Notes

• Design a fast-parallel low-level kernel of the Complex Matrix Multiplication algorithm based on modulo-scheduling, software pipelining and loop unrolling techniques.
• Suggest a novel approach to implementing the Complex Matrix Multiplication algorithm based on the fast-parallel kernel and the miss-pipelining technique.
• Introduce an ultra-optimized parallel implementation approach based on the fast-parallel kernel and the internal direct memory access data transfer technique.
• Accelerate the beamforming and Doppler Filter Bank algorithms to meet tight real-time constraints of radar applications.

The Complex Matrix Multiplication (CMM) algorithm is known to require high computing performance and to present exceptional challenges in real-life applications. Recent advances in Very Long Instruction Word (VLIW) based Digital Signal Processors (DSP) have demonstrated high computing capabilities with very low power consumption. In this work, we propose three ultra-fast, parallel and efficient VLIW implementation approaches of the CMM algorithm which could be used to meet tighter real-time constraints of several signal and image processing applications such as radars. A novel parallel kernel, a task mapping strategy and low-level optimization techniques are suggested to fit a set of modern VLIW architectures. Additionally, an original memory access management technique was adopted to accelerate the algorithm by avoiding cache misses and bank conflicts. The experimental results showed the effectiveness of the proposed approaches: a peak performance of 15.89 GFLOPS was achieved on one C66x DSP core with a core utilization of 99%, with speedups of about 1.61, 3 and 10 over the state-of-the-art, the most optimized vendor and the conventional approaches, respectively.
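
For orientation only, the following is a minimal Python sketch of what a complex matrix multiplication computes and of how unrolling by two restructures the inner loop. It is not the kernel from the article, which targets VLIW DSP hardware; the function names `cmm_reference`, `cmm_unrolled2` and `cmm_flops` are hypothetical, and the count of 8 real floating-point operations per complex multiply-accumulate assumes the plain four-multiply formulation used here.

```python
def cmm_reference(a, b):
    """Multiply complex matrices a (m x k) and b (k x n), given as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0j] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            re = im = 0.0
            for p in range(k):
                x, y = a[i][p], b[p][j]
                # One complex multiply-accumulate: 4 real multiplies, 4 real adds/subtracts.
                re += x.real * y.real - x.imag * y.imag
                im += x.real * y.imag + x.imag * y.real
            c[i][j] = complex(re, im)
    return c


def cmm_unrolled2(a, b):
    """Same product with the inner loop unrolled by 2 (illustrative assumption).

    Two independent accumulator pairs remove the dependency between
    consecutive iterations, which is what lets a scheduler overlap them.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0j] * n for _ in range(m)]
    for i in range(m):
        row = a[i]
        for j in range(n):
            re0 = im0 = re1 = im1 = 0.0
            p = 0
            while p + 1 < k:
                x0, y0 = row[p], b[p][j]
                x1, y1 = row[p + 1], b[p + 1][j]
                re0 += x0.real * y0.real - x0.imag * y0.imag
                im0 += x0.real * y0.imag + x0.imag * y0.real
                re1 += x1.real * y1.real - x1.imag * y1.imag
                im1 += x1.real * y1.imag + x1.imag * y1.real
                p += 2
            if p < k:  # remainder iteration when k is odd
                x0, y0 = row[p], b[p][j]
                re0 += x0.real * y0.real - x0.imag * y0.imag
                im0 += x0.real * y0.imag + x0.imag * y0.real
            c[i][j] = complex(re0 + re1, im0 + im1)
    return c


def cmm_flops(m, k, n):
    """Real floating-point operations for an (m x k) by (k x n) complex product."""
    return 8 * m * k * n
```

Under these assumptions, dividing `cmm_flops(m, k, n)` by a measured run time gives a throughput figure in the same unit (FLOPS) as the result quoted in the abstract. Note that the unrolled variant sums in a different order, so with general floating-point inputs its result may differ from the reference in the last bits.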

Language: English
Identifiers: ISSN: 0045-7906
eISSN: 1879-0755
DOI: 10.1016/j.compeleceng.2022.108294
Title ID: cdi_crossref_primary_10_1016_j_compeleceng_2022_108294

Keywords: Complex Matrix Multiplication, Parallel implementation, Radars, Signal and Image processing, VLIW, DSP
