
Details

Author(s) / Contributors
Title
Seeing and hearing too: Audio representation for video captioning
Is part of
  • 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2017, p.381-388
Place / Publisher
IEEE
Year of publication
2017
Source
IEEE Electronic Library Online
Descriptions/Notes
  • Video captioning has been widely researched. Most related work takes into account only visual content in generating descriptions. However, auditory content such as human speech or environmental sounds contains rich information for describing scenes, but has yet to be widely explored for video captions. Here, we experiment with different ways to use this auditory content in videos, and demonstrate improved caption generation in terms of popular evaluation methods such as BLEU, CIDEr, and METEOR. We also measure the semantic similarities between generated captions and human-provided ground truth using sentence embeddings, and find that good use of multi-modal contents helps the machine to generate captions that are more semantically related to the ground truth. When analyzing the generated sentences, we find some ambiguous situations for which visual-only models yield incorrect results but that are resolved by approaches that take into account auditory cues.
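The abstract reports caption quality in terms of BLEU, among other metrics. As a rough illustration of what such a score measures, here is a minimal sketch of BLEU-1 (clipped unigram precision with a brevity penalty) for a single candidate caption against a single reference; this is a simplified assumption for illustration, not the paper's evaluation setup, which would use multi-reference, multi-n-gram BLEU along with CIDEr and METEOR. The example captions are hypothetical.

```python
# Minimal BLEU-1 sketch: clipped unigram precision times a brevity
# penalty. Simplified single-reference form, assumed for illustration.
from collections import Counter
import math

def bleu1(candidate: str, reference: str) -> float:
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_counts = Counter(cand)
    ref_counts = Counter(ref)
    # Clipped matches: each candidate word counts at most as often
    # as it appears in the reference.
    clipped = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = clipped / len(cand) if cand else 0.0
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * precision

# Hypothetical caption pair (not from the paper):
score = bleu1("a man is speaking on stage", "a man speaks on a stage")
```

Here 4 of the 6 candidate unigrams ("a", "man", "on", "stage") match the reference after clipping, and the lengths are equal, so the brevity penalty is 1 and the score is 4/6.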
Language
English
Identifiers
DOI: 10.1109/ASRU.2017.8268961
Title ID: cdi_ieee_primary_8268961
