Details

Author(s) / Contributors
Title
Investigating different representations for modeling and controlling multiple emotions in DNN-based speech synthesis
Is part of
  • Speech Communication, 2018-05, Vol. 99, p. 135-143
Place / Publisher
Amsterdam: Elsevier B.V.
Year of publication
2018
Source
Alma/SFX Local Collection
Descriptions/Notes
  • We study the impact of incorporating large-scale listeners' perceptual annotations into the emotional speech modeling process.
  • We consider a number of different emotional representations that allow us to exploit this perceptual information. These representations also offer ways of manipulating the modeled emotion at synthesis time.
  • Two large-scale perceptual evaluations were carried out: one to evaluate modeling accuracy and another to evaluate control capabilities at synthesis time.
  • We show that adding perceptual information based on listeners' annotations significantly improves emotional speech modeling accuracy.
  • We also show how the proposed representations provide notable emotional control capabilities.
  • They allow us to control both emotion recognition rates and perceived emotional strength without degrading the quality of the produced speech.

In this paper, we investigate the simultaneous modeling of multiple emotions in DNN-based expressive speech synthesis, and how to represent the emotional labels, such as emotional class and strength, for this task. Our goal is to answer two questions: First, what is the best way to annotate speech data with multiple emotions – should we use the labels that the speaker intended to express, or labels based on listener perception of the resulting speech signals? Second, how should the emotional information be represented as labels for supervised DNN training, e.g., should emotional class and emotional strength be factorized into separate inputs or not? We evaluate on a large-scale corpus of emotional speech from a professional voice actress, additionally annotated with perceived emotional labels from crowdsourced listeners. By comparing DNN-based speech synthesizers that utilize different emotional representations, we assess the impact of these representations and design decisions on human emotion recognition rates, perceived emotional strength, and subjective speech quality. Simultaneously, we also study which representations are most appropriate for controlling the emotional strength of synthetic speech.
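As a rough illustration of the second question raised in the abstract, the sketch below contrasts two ways of encoding an emotional label as DNN input features: factorizing class and strength into separate inputs versus fusing them into a single joint code. The four-emotion inventory, function names, and exact encodings are illustrative assumptions, not the authors' implementation.

# A minimal sketch (not the paper's code) of two label representations
# for conditioning a DNN speech synthesizer on emotion.
import numpy as np

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # assumed inventory

def factorized_code(emotion: str, strength: float) -> np.ndarray:
    """Emotional class and strength as separate inputs:
    a one-hot class vector concatenated with a scalar strength."""
    one_hot = np.zeros(len(EMOTIONS))
    one_hot[EMOTIONS.index(emotion)] = 1.0
    return np.concatenate([one_hot, [strength]])  # shape: (5,)

def joint_code(emotion: str, strength: float) -> np.ndarray:
    """Class and strength fused into one code:
    the strength value scales the one-hot entry directly."""
    code = np.zeros(len(EMOTIONS))
    code[EMOTIONS.index(emotion)] = strength
    return code  # shape: (4,)

if __name__ == "__main__":
    # The factorized form keeps strength as its own input, so it can be
    # varied at synthesis time independently of the class; the joint form
    # entangles the two in a single vector.
    print(factorized_code("happy", 0.7))  # -> [0. 1. 0. 0. 0.7]
    print(joint_code("happy", 0.7))       # -> [0. 0.7 0. 0.]

Under this reading, comparing synthesizers trained on such alternative encodings is what allows the paper to assess both modeling accuracy and strength control at synthesis time.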
