2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017, p.488-491

2017

Author(s) / Contributors

Title

LSTM-based iterative mask estimation and post-processing for multi-channel speech enhancement

Is part of

2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017, p.488-491

Place / Publisher

IEEE

Year of publication

2017

Source

IEEE Electronic Library Online

Descriptions/Notes

We recently proposed an iterative mask estimation (IME) approach that improves conventional complex Gaussian mixture model (CGMM) based beamforming and yielded the best multi-channel speech recognition accuracy in the CHiME-4 challenge. In this study, we focus on multi-channel speech enhancement and present a novel approach that uses long short-term memory (LSTM) based IME and post-processing. First, an LSTM is used to estimate the ideal ratio mask (IRM) in order to improve the mask estimated by a CGMM. Then, the improved mask is used to derive a beamformer. Finally, the IME-based beamformed speech is processed by an LSTM-based regression model. Experiments on the CHiME-4 simulation data show that the LSTM-based IME approach improves PESQ compared with unprocessed signals, with relative PESQ improvements of 17.33% and 13.89%, and that the LSTM-based post-processing yields further gains on top of the IME approach, with relative PESQ improvements of 11.42% and 10.00%, for the 6-channel and 2-channel cases, respectively.
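
The mask-to-beamformer step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the record does not say which beamformer is derived from the mask, so a common reference-channel MVDR formulation is assumed here, and all function names (ideal_ratio_mask, masked_covariance, mvdr_weights, relative_improvement) are hypothetical. The LSTM and CGMM stages are not modeled; the sketch only shows how a time-frequency mask, once available, might be turned into beamformer weights for one frequency bin, and how a relative improvement percentage is computed.

```python
import numpy as np


def ideal_ratio_mask(speech_power, noise_power, eps=1e-12):
    """IRM per time-frequency bin: speech power / (speech + noise power)."""
    return speech_power / (speech_power + noise_power + eps)


def masked_covariance(Y, mask, eps=1e-12):
    """Mask-weighted spatial covariance matrix for one frequency bin.

    Y:    (channels, frames) complex STFT coefficients (assumed layout)
    mask: (frames,) real values in [0, 1]
    """
    weighted = Y * mask[np.newaxis, :]
    return weighted @ Y.conj().T / (mask.sum() + eps)


def mvdr_weights(phi_speech, phi_noise, ref_channel=0):
    """Reference-channel MVDR weights (assumed beamformer type).

    w = (Phi_n^{-1} Phi_s / trace(Phi_n^{-1} Phi_s)) u_ref,
    where u_ref selects the reference microphone.
    """
    numerator = np.linalg.solve(phi_noise, phi_speech)
    return numerator[:, ref_channel] / np.trace(numerator)


def relative_improvement(baseline, improved):
    """Relative improvement in percent, e.g. for a PESQ score."""
    return 100.0 * (improved - baseline) / baseline
```

Given multi-channel STFT frames Y for one frequency bin and a speech mask m, one would form phi_speech = masked_covariance(Y, m) and phi_noise = masked_covariance(Y, 1 - m), compute w = mvdr_weights(phi_speech, phi_noise), and obtain the enhanced bin as w.conj() @ Y.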

Language: English
Identifiers: DOI: 10.1109/APSIPA.2017.8282081
Title ID: cdi_ieee_primary_8282081

Format: –
Keywords: Array signal processing, Estimation, Signal to noise ratio, Speech, Speech enhancement, Time-frequency analysis
