End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks

IEEE/ACM transactions on audio, speech, and language processing, 2018-09, Vol.26 (9), p.1570-1584

2018

Details

Author(s) / Contributors

Title

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks

Is part of

IEEE/ACM transactions on audio, speech, and language processing, 2018-09, Vol.26 (9), p.1570-1584

Place / Publisher

IEEE

Year of publication

2018

Source

IEEE Xplore

Descriptions/Notes

A speech enhancement model maps noisy speech to clean speech. In the training stage, an objective function is adopted to optimize the model parameters. However, in the existing literature, the model optimization criterion is often inconsistent with the criterion used to evaluate the enhanced speech. For example, speech intelligibility is mostly evaluated with the short-time objective intelligibility (STOI) measure, while the frame-based mean square error (MSE) between estimated and clean speech is widely used to optimize the model. Because of this inconsistency, there is no guarantee that the trained model provides optimal performance in applications. In this study, we propose an end-to-end, utterance-based speech enhancement framework using fully convolutional neural networks (FCN) to reduce the gap between the model optimization criterion and the evaluation criterion. Because optimization is utterance-based, temporal correlation information over long speech segments, or even over the entire utterance, can be taken into account to directly optimize perception-based objective functions. As an example, we implemented the proposed FCN enhancement framework to optimize the STOI measure. Experimental results show that test speech processed by the proposed approach achieves a higher STOI than conventional MSE-optimized speech, owing to the consistency between the training and evaluation targets. Moreover, by integrating STOI into model optimization, the intelligibility of the enhanced speech, as measured with human subjects and an automatic speech recognition system, is also substantially improved compared with speech generated under the minimum MSE criterion.

Language: English
Identifiers: ISSN: 2329-9290
eISSN: 2329-9304
DOI: 10.1109/TASLP.2018.2821903
Title ID: cdi_ieee_primary_8331910

Format: –
Keywords: Automatic speech recognition, end-to-end speech enhancement, fully convolutional neural network, Linear programming, Noise measurement, Optimization, raw waveform, Speech, Speech enhancement, speech intelligibility, Training
