Directly modeling voiced and unvoiced components in speech waveforms by neural networks
Is part of
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 5640-5644
Place / Publisher
IEEE
Year of publication
2016
Source
IEEE Electronic Library Online
Descriptions/Notes
This paper proposes a novel acoustic model based on neural networks for statistical parametric speech synthesis. The neural network outputs parameters of a non-zero mean Gaussian process, which defines a probability density function of a speech waveform given linguistic features. The mean and covariance functions of the Gaussian process represent deterministic (voiced) and stochastic (unvoiced) components of a speech waveform, whereas the previous approach considered the unvoiced component only. Experimental results show that the proposed approach can generate speech waveforms approximating natural speech waveforms.
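The mean/covariance split described in the abstract can be written out for a finite waveform segment. The following is only a sketch in assumed notation; the symbols x (a segment of T waveform samples), l (linguistic features), μ, Σ, and e are illustrative and are not taken from the paper. Restricted to T samples, a Gaussian process is a multivariate Gaussian, so the segment decomposes into a deterministic mean plus a zero-mean random term:

```latex
% Assumed notation, not the paper's: x in R^T is a waveform segment,
% l the linguistic features, mu(l) and Sigma(l) the network outputs.
\begin{align}
  p(\mathbf{x} \mid \mathbf{l})
    &= \mathcal{N}\bigl(\mathbf{x};\, \boldsymbol{\mu}(\mathbf{l}),\, \boldsymbol{\Sigma}(\mathbf{l})\bigr), \\
  \mathbf{x}
    &= \underbrace{\boldsymbol{\mu}(\mathbf{l})}_{\text{deterministic (voiced)}}
     + \underbrace{\mathbf{e}}_{\text{stochastic (unvoiced)}},
  \qquad \mathbf{e} \sim \mathcal{N}\bigl(\mathbf{0},\, \boldsymbol{\Sigma}(\mathbf{l})\bigr).
\end{align}
```

Under this reading, setting μ(l) = 0 leaves only the stochastic term, which corresponds to the earlier approach that the abstract says modeled the unvoiced component alone.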