LSTM-based iterative mask estimation and post-processing for multi-channel speech enhancement
Is part of
2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017, p.488-491
Place / Publisher
IEEE
Year of publication
2017
Source
IEEE Electronic Library Online
Descriptions/Notes
We recently proposed an iterative mask estimation (IME) approach that improves conventional complex Gaussian mixture model (CGMM) based beamforming and yielded the best multi-channel speech recognition accuracy in the CHiME-4 challenge. In this study, we focus on multi-channel speech enhancement and present a novel approach based on long short-term memory (LSTM) IME and post-processing. First, an LSTM is used to estimate the ideal ratio mask (IRM), which improves on the mask estimated by a CGMM. Then, the improved mask is used to derive a beamformer. Finally, the IME-based beamformed speech is processed by an LSTM-based regression model. Experiments on the CHiME-4 simulation data show that the LSTM-based IME approach improves PESQ compared to the unprocessed signals, with relative PESQ improvements of 17.33% and 13.89%, and that the LSTM-based post-processing yields further gains on top of the IME approach, with relative PESQ improvements of 11.42% and 10.00% for the 6-channel and 2-channel cases, respectively.
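The step "the improved mask is used to derive a beamformer" can be illustrated with a minimal NumPy sketch. The abstract does not say which beamformer is derived, so the MVDR formulation below (reference-channel form) is an assumption, as are the function names `masked_scm`, `mvdr_weights`, and `beamform`, the array layout (frequency, time, channel), and the diagonal-loading constant. The masks are taken as given; the LSTM and CGMM estimators themselves are not modelled here.

```python
import numpy as np

def masked_scm(Y, mask):
    """Mask-weighted spatial covariance matrix per frequency bin.

    Y:    complex STFT of the array signal, shape (F, T, C)
    mask: real time-frequency mask in [0, 1], shape (F, T)
    Returns an array of shape (F, C, C).
    """
    num = np.einsum("ft,ftc,ftd->fcd", mask, Y, Y.conj())
    den = np.maximum(mask.sum(axis=1), 1e-10)  # avoid division by zero
    return num / den[:, None, None]

def mvdr_weights(phi_s, phi_n, ref=0, eps=1e-6):
    """MVDR weights w = (Phi_n^-1 Phi_s) u / trace(Phi_n^-1 Phi_s).

    u selects the reference channel `ref`. The choice of MVDR and the
    diagonal loading `eps` are assumptions of this sketch.
    """
    n_ch = phi_n.shape[-1]
    # Diagonal loading keeps the noise covariance well conditioned.
    load = eps * np.trace(phi_n, axis1=1, axis2=2).real / n_ch
    phi_n = phi_n + load[:, None, None] * np.eye(n_ch)
    ratio = np.linalg.solve(phi_n, phi_s)        # (F, C, C)
    tr = np.trace(ratio, axis1=1, axis2=2)       # (F,)
    return ratio[:, :, ref] / tr[:, None]        # (F, C)

def beamform(Y, w):
    """Apply w^H y in every time-frequency bin; returns shape (F, T)."""
    return np.einsum("fc,ftc->ft", w.conj(), Y)
```

Under these assumptions, a speech mask `m` and its complement would be used as `phi_s = masked_scm(Y, m)` and `phi_n = masked_scm(Y, 1 - m)`, followed by `beamform(Y, mvdr_weights(phi_s, phi_n))`. The single-channel output is what a post-processing model, such as the LSTM-based regression model in the abstract, would then receive.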