Multimodal Speaker Diarization
IEEE transactions on pattern analysis and machine intelligence, 2012-01, Vol.34 (1), p.79-93
2012

Details

Author(s) / Contributors
Title
Multimodal Speaker Diarization
Is Part of
  • IEEE transactions on pattern analysis and machine intelligence, 2012-01, Vol.34 (1), p.79-93
Place / Publisher
Los Alamitos, CA: IEEE
Year of Publication
2012
Source
IEEE Xplore
Descriptions/Notes
  • We present a novel probabilistic framework that fuses information coming from the audio and video modality to perform speaker diarization. The proposed framework is a Dynamic Bayesian Network (DBN) that is an extension of a factorial Hidden Markov Model (fHMM) and models the people appearing in an audiovisual recording as multimodal entities that generate observations in the audio stream, the video stream, and the joint audiovisual space. The framework is very robust to different contexts, makes no assumptions about the location of the recording equipment, and does not require labeled training data, as it acquires the model parameters using the Expectation Maximization (EM) algorithm. We apply the proposed model to two meeting videos and a news broadcast video, all of which come from publicly available data sets. The speaker diarization results favor the proposed multimodal framework, which outperforms the single-modality analyses and improves over state-of-the-art audio-based speaker diarization.
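  • Illustration: the abstract describes an unsupervised, HMM-style model whose hidden states correspond to speakers and whose parameters are learned with EM from fused audio and video observations. The sketch below is not the paper's factorial-HMM DBN; it is a simplified single-chain approximation that concatenates the two modalities and fits a Gaussian HMM with EM. All feature dimensions, the number of speakers, and the synthetic data are assumptions made purely for illustration.

```python
# Simplified sketch, NOT the authors' model: a single-chain Gaussian HMM
# fitted with EM over early-fused (concatenated) audio and video features.
# Speaker count, feature dimensions, and data below are illustrative assumptions.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)

n_speakers = 3                 # assumed number of people in the recording
n_frames = 600                 # assumed number of analysis frames
audio_dim, video_dim = 19, 4   # e.g. MFCC-like audio features + face/motion cues

# Synthetic stand-in data: each frame carries audio and video observations
# whose statistics depend on the hidden active speaker.
true_states = rng.integers(0, n_speakers, size=n_frames)
audio = rng.normal(loc=true_states[:, None] * 2.0, scale=1.0, size=(n_frames, audio_dim))
video = rng.normal(loc=true_states[:, None] * 1.5, scale=1.0, size=(n_frames, video_dim))

# Early fusion: concatenate both modalities into one observation vector per frame.
features = np.hstack([audio, video])

# One hidden state per speaker; fit() runs EM (Baum-Welch), so no labeled
# training data is needed, mirroring the unsupervised setting of the paper.
model = hmm.GaussianHMM(n_components=n_speakers, covariance_type="diag", n_iter=50)
model.fit(features)

# Decoding assigns each frame to a speaker cluster, i.e. "who spoke when".
diarization = model.predict(features)
print(diarization[:20])
```

  The design choice to concatenate features (early fusion) is the simplest way to combine modalities in a single HMM; the paper instead keeps separate audio and video chains in a factorial structure and couples them through the joint audiovisual space, which this sketch does not reproduce.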
