IEEE/ACM transactions on audio, speech, and language processing, 2018-08, Vol.26 (8), p.1406-1419
2018

Details

Author(s) / Contributors
Title
Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis
Is part of
  • IEEE/ACM transactions on audio, speech, and language processing, 2018-08, Vol.26 (8), p.1406-1419
Place / Publisher
IEEE
Year of publication
2018
Source
ACM Digital Library
Descriptions/Notes
  • Recurrent neural networks (RNNs) have been successfully used as fundamental frequency (F0) models for text-to-speech synthesis. However, this paper showed that a normal RNN may not take into account the statistical dependency of the F0 data across frames and consequently only generate noisy F0 contours when F0 values are sampled from the model. A better model may take into account the causal dependency of the current F0 datum on the previous frames' F0 data. One such model is the shallow autoregressive (AR) recurrent mixture density network (SAR) that we recently proposed. However, as this study showed, an SAR is equivalent to the combination of trainable linear filters and a conventional RNN. It is still weak for F0 modeling. To better model the temporal dependency in F0 contours, we propose a deep AR model (DAR). On the basis of an RNN, this DAR propagates the previous frame's F0 value through the RNN, which allows nonlinear AR dependency to be achieved. We also propose F0 quantization and data dropout strategies for the DAR. Experiments on a Japanese corpus demonstrated that this DAR can generate appropriate F0 contours by using the random-sampling-based generation method, which is impossible for the baseline RNN and SAR. When a conventional mean-based generation method was used in the proposed DAR and other experimental models, the DAR generated accurate and less oversmoothed F0 contours and achieved a better mean-opinion-score in a subjective evaluation test.
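
The autoregressive feedback described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the weights are random and untrained, and the bin count, hidden size, log-scale quantization range, and every function and class name (`quantize_f0`, `ToyDeepARModel`, `apply_data_dropout`, `generate`) are assumptions made for illustration. It shows only the general idea of passing the previous frame's quantized F0 back through a recurrent layer and then drawing the next value by random sampling.

```python
import math
import random

NUM_BINS = 8   # hypothetical number of quantized F0 classes
HIDDEN = 4     # hypothetical hidden-state size


def quantize_f0(f0_hz, f0_min=80.0, f0_max=400.0, num_bins=NUM_BINS):
    """Map an F0 value in Hz to a bin index on a log scale (assumed scheme)."""
    f0_hz = min(max(f0_hz, f0_min), f0_max)
    span = math.log(f0_max) - math.log(f0_min)
    pos = (math.log(f0_hz) - math.log(f0_min)) / span
    return min(int(pos * num_bins), num_bins - 1)


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_weights(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def apply_data_dropout(prev_bin, rate, rng):
    """Hide the previous F0 value with probability `rate` (assumed reading
    of the abstract's 'data dropout'; intended for training-time inputs)."""
    return None if rng.random() < rate else prev_bin


class ToyDeepARModel:
    """Untrained toy recurrent cell; weights are random placeholders."""

    def __init__(self, input_dim, rng):
        self.w_in = make_weights(HIDDEN, input_dim, rng)   # linguistic features
        self.w_fb = make_weights(HIDDEN, NUM_BINS, rng)    # previous-F0 feedback
        self.w_rec = make_weights(HIDDEN, HIDDEN, rng)     # recurrent connection
        self.w_out = make_weights(NUM_BINS, HIDDEN, rng)   # output layer

    def step(self, features, prev_bin, hidden):
        # One-hot encode the previous frame's F0 bin; all zeros if unavailable.
        feedback = [0.0] * NUM_BINS
        if prev_bin is not None:
            feedback[prev_bin] = 1.0
        pre = [a + b + c for a, b, c in zip(matvec(self.w_in, features),
                                            matvec(self.w_fb, feedback),
                                            matvec(self.w_rec, hidden))]
        # The tanh makes the dependency on the previous F0 nonlinear.
        hidden = [math.tanh(p) for p in pre]
        return softmax(matvec(self.w_out, hidden)), hidden


def generate(model, feature_seq, rng, sample=True):
    """Generate one F0 bin per frame, feeding each output back as input."""
    hidden = [0.0] * HIDDEN
    prev_bin = None
    bins = []
    for features in feature_seq:
        probs, hidden = model.step(features, prev_bin, hidden)
        if sample:
            # Random-sampling-based generation.
            prev_bin = rng.choices(range(NUM_BINS), weights=probs)[0]
        else:
            # Most probable bin: a simple deterministic stand-in,
            # not the mean-based method named in the abstract.
            prev_bin = max(range(NUM_BINS), key=probs.__getitem__)
        bins.append(prev_bin)
    return bins


rng = random.Random(0)
model = ToyDeepARModel(input_dim=3, rng=rng)
frames = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(10)]
contour = generate(model, frames, rng)
```

Because each sampled bin is fed back before the next frame is computed, consecutive outputs are statistically dependent, which is the property the abstract reports a plain RNN lacks when F0 values are sampled from it.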
Language
English
Identifiers
ISSN: 2329-9290
eISSN: 2329-9304
DOI: 10.1109/TASLP.2018.2828650
Title ID: cdi_ieee_primary_8341752
