IEEE/ACM transactions on audio, speech, and language processing, 2018-09, Vol.26 (9), p.1570-1584

Author(s) / Contributors
Title
End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks
Is part of
  • IEEE/ACM transactions on audio, speech, and language processing, 2018-09, Vol.26 (9), p.1570-1584
Place / Publisher
IEEE
Year of publication
2018
Source
IEEE Xplore
Descriptions/Notes
  • A speech enhancement model maps noisy speech to clean speech. In the training stage, an objective function is often adopted to optimize the model parameters. In the existing literature, however, the model optimization criterion is inconsistent with the criterion used to evaluate the enhanced speech. For example, speech intelligibility is mostly evaluated with the short-time objective intelligibility (STOI) measure, while the frame-based mean square error (MSE) between estimated and clean speech is widely used to optimize the model. Because of this inconsistency, there is no guarantee that the trained model provides optimal performance in applications. In this study, we propose an end-to-end utterance-based speech enhancement framework using fully convolutional neural networks (FCN) to reduce the gap between the model optimization and the evaluation criterion. Because the optimization is utterance-based, temporal correlation information over long speech segments, or even over the entire utterance, can be taken into account to directly optimize perception-based objective functions. As an example, we implemented the proposed FCN enhancement framework to optimize the STOI measure. Experimental results show that test speech processed by the proposed approach achieves a higher STOI than conventional MSE-optimized speech, owing to the consistency between the training and evaluation targets. Moreover, integrating STOI into model optimization also substantially improves the intelligibility of the enhanced speech for human listeners and its recognition by automatic speech recognition systems, compared with speech generated under the minimum-MSE criterion.
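
The mismatch between a frame-based MSE criterion and a correlation-based intelligibility measure can be illustrated with a small NumPy sketch. This is not the STOI measure and not the method of the article: `envelope_correlation` is a hypothetical, heavily simplified stand-in that only correlates frame energy envelopes over a whole utterance, and the test signal, frame length, and noise level are assumptions chosen for illustration.

```python
import numpy as np


def _frames(signal, frame_len):
    """Cut a 1-D signal into non-overlapping frames, dropping the remainder."""
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)


def frame_mse(clean, estimate, frame_len=256):
    """Frame-based criterion: average of the per-frame mean squared errors."""
    diff = _frames(clean, frame_len) - _frames(estimate, frame_len)
    return float(np.mean(np.mean(diff ** 2, axis=1)))


def envelope_correlation(clean, estimate, frame_len=256):
    """Toy utterance-level score (NOT STOI): correlation of RMS envelopes."""
    env_c = np.sqrt(np.mean(_frames(clean, frame_len) ** 2, axis=1))
    env_e = np.sqrt(np.mean(_frames(estimate, frame_len) ** 2, axis=1))
    env_c = env_c - env_c.mean()
    env_e = env_e - env_e.mean()
    denom = np.linalg.norm(env_c) * np.linalg.norm(env_e) + 1e-12
    return float(np.dot(env_c, env_e) / denom)


# Synthetic "clean" signal: a 220 Hz tone with a slow amplitude modulation.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 220 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))

# Estimate A: correct shape but wrong gain. Estimate B: correct gain plus noise.
rng = np.random.default_rng(0)
scaled = 0.5 * clean
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

mse_scaled, mse_noisy = frame_mse(clean, scaled), frame_mse(clean, noisy)
corr_scaled, corr_noisy = (
    envelope_correlation(clean, scaled),
    envelope_correlation(clean, noisy),
)

print(f"frame MSE:            scaled={mse_scaled:.4f}  noisy={mse_noisy:.4f}")
print(f"envelope correlation: scaled={corr_scaled:.4f}  noisy={corr_noisy:.4f}")
```

In this toy setting the MSE criterion prefers the noisy estimate, while the correlation-based score prefers the scaled one, so a model trained on one criterion need not be optimal under the other.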
Language
English
Identifiers
ISSN: 2329-9290
eISSN: 2329-9304
DOI: 10.1109/TASLP.2018.2821903
Title ID: cdi_ieee_primary_8331910
