
Unified Audio-visual Saliency Model for Omnidirectional Videos with Spatial Audio

IEEE transactions on multimedia, 2024-01, Vol.26, p.1-13

2024

Details

Author(s) / Contributors

Title

Unified Audio-visual Saliency Model for Omnidirectional Videos with Spatial Audio

Is part of

IEEE transactions on multimedia, 2024-01, Vol.26, p.1-13

Place / Publisher

IEEE

Year of publication

2024

Source

IEEE/IET Electronic Library (IEL)

Descriptions/Notes

Spatial audio is a crucial component of omnidirectional videos (ODVs): it provides an immersive experience by enabling viewers to perceive sound sources in all directions. However, most visual attention modeling work for ODVs focuses only on visual cues, and the audio modality is rarely considered. Moreover, existing audio-visual saliency models for ODVs lack spatial audio location-awareness (i.e., they are agnostic to sound source location) and cannot discriminate between audio content attributes (i.e., they are agnostic to audio content attributes). To address this, we propose a novel audio-visual perception saliency (AVPS) model that is aware of spatial audio location and adaptive to audio content attributes, in order to efficiently address fixation prediction in ODVs. Specifically, we first use an improved group equivariant convolutional neural network (G-CNN) with eidetic 3D LSTM (E3D-LSTM) to extract spatial-temporal visual features. We then perceive sound source locations by computing the audio energy map (AEM) of the audio in ODVs. Subsequently, we introduce SoundNet to extract audio features with multiple attributes. Finally, we develop an audio-visual feature fusion module that adaptively integrates the spatial-temporal visual features and the spatial auditory information to generate the final audio-visual saliency map. Extensive experiments in three audio modalities validate the effectiveness of the proposed model, which also outperforms 10 other state-of-the-art saliency models.
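
The abstract does not say how the audio energy map is computed. As a rough illustration only, the sketch below shows one common way to obtain a per-direction energy map from spatial audio: point a virtual cardioid microphone at each cell of an equirectangular grid and average the squared output. It assumes first-order ambisonics with channels in ACN order (W, Y, Z, X) and SN3D normalization; the function name `audio_energy_map`, the grid size, and the cardioid beam are all hypothetical choices for this example and are not taken from the paper.

```python
import math


def audio_energy_map(foa, height=9, width=16):
    """Return a normalized height x width map of audio energy per direction.

    foa: four equal-length sample sequences (W, Y, Z, X), i.e. first-order
    ambisonics in ACN channel order with SN3D normalization (an assumption
    of this sketch, not something stated in the abstract).
    """
    w, y, z, x = foa
    n = len(w)
    energy = []
    for r in range(height):
        # Elevation runs from +90 degrees (top row) to -90 degrees (bottom row).
        el = math.pi / 2 - math.pi * r / (height - 1)
        row = []
        for c in range(width):
            # Azimuth runs from +180 degrees (left column) towards -180 degrees.
            az = math.pi - 2 * math.pi * c / width
            # Unit vector of the look direction.
            dx = math.cos(el) * math.cos(az)
            dy = math.cos(el) * math.sin(az)
            dz = math.sin(el)
            # Virtual cardioid microphone pointed at (az, el); mean squared output.
            total = 0.0
            for i in range(n):
                beam = 0.5 * (w[i] + dx * x[i] + dy * y[i] + dz * z[i])
                total += beam * beam
            row.append(total / n)
        energy.append(row)
    # Normalize so the map sums to 1 (left unchanged for silent input).
    norm = sum(sum(row) for row in energy)
    if norm > 0:
        energy = [[v / norm for v in row] for row in energy]
    return energy
```

For a single source straight ahead (W and X carry the same signal, Y and Z are silent), the map peaks at the center cell of the grid, which corresponds to azimuth 0 and elevation 0 in this layout. A map like this could then be resized to the video frame and combined with visual features, although how the proposed model does so is not described here.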

Language: English
Identifiers: ISSN: 1520-9210
eISSN: 1941-0077
DOI: 10.1109/TMM.2023.3271022
Title ID: cdi_ieee_primary_10109890

Format: –
Keywords: Adaptation models, audio content attributes-adaptive, audio energy map, Audio-visual saliency, Deep learning, Feature extraction, omnidirectional videos, Predictive models, sound source location-awareness, Spatial audio, Videos, Visualization
