Grounding of Textual Phrases in Images by Reconstruction

Computer Vision – ECCV 2016, p.817-834

Details

Author(s) / Contributors

Title

Grounding of Textual Phrases in Images by Reconstruction

Is part of

Computer Vision – ECCV 2016, p.817-834

Place / Publisher

Cham: Springer International Publishing

Source

Alma/SFX Local Collection

Descriptions/Notes

Grounding (i.e., localizing) arbitrary, free-form textual phrases in visual content is a challenging problem with many applications for human-computer interaction and image-text reference resolution. Few datasets provide the ground-truth spatial localization of phrases; it is therefore desirable to learn from data with little or no grounding supervision. We propose a novel approach which learns grounding by reconstructing a given phrase using an attention mechanism, which can be either latent or optimized directly. During training, our approach encodes the phrase using a recurrent network language model and then learns to attend to the relevant image region in order to reconstruct the input phrase. At test time, the correct attention, i.e., the grounding, is evaluated. If grounding supervision is available, it can be directly applied via a loss over the attention mechanism. We demonstrate the effectiveness of our approach on the Flickr30k Entities and ReferItGame datasets with different levels of supervision, ranging from no supervision through partial supervision to full supervision. Our supervised variant improves by a large margin over the state-of-the-art on both datasets.
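
The attention step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the dot-product scoring, the function names (`attend`, `ground`, `iou`), and the toy feature vectors are all assumptions made for illustration, and the recurrent phrase encoder, the reconstruction decoder, and the training losses are left out entirely. It only shows how a soft attention over box proposals could yield a grounding at test time, and how a predicted box could be compared to a ground-truth box by intersection over union.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(phrase_vec, box_feats):
    """Soft attention over box proposals.

    Scores each proposal by a dot product with the phrase encoding
    (an assumed scoring function, not taken from the paper) and returns
    the attention weights and the attention-weighted visual feature.
    """
    scores = [sum(p * f for p, f in zip(phrase_vec, feat)) for feat in box_feats]
    weights = softmax(scores)
    dim = len(box_feats[0])
    attended = [
        sum(w * feat[d] for w, feat in zip(weights, box_feats)) for d in range(dim)
    ]
    return weights, attended


def ground(weights):
    """At test time, take the proposal with the highest attention weight."""
    return max(range(len(weights)), key=weights.__getitem__)


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Toy data (hypothetical): one phrase encoding and three box proposals.
phrase = [1.0, 0.0]
features = [[0.0, 1.0], [2.0, 0.0], [0.5, 0.5]]
boxes = [(0, 0, 2, 2), (1, 1, 3, 3), (4, 4, 5, 5)]

weights, attended = attend(phrase, features)
predicted = boxes[ground(weights)]
overlap = iou(predicted, (1, 1, 3, 3))
```

In the unsupervised setting the abstract describes, the attention weights would be latent and trained only through the phrase reconstruction objective; with grounding supervision, a loss could be placed on `weights` directly.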

Language: English
Identifiers: ISBN: 3319464477, 9783319464473
ISSN: 0302-9743
eISSN: 1611-3349
DOI: 10.1007/978-3-319-46448-0_49
Title ID: cdi_springer_books_10_1007_978_3_319_46448_0_49

Format: –
Keywords: Box Proposals, Ground Supervision, Ground Truth Box, Intersection Over Union (IOU), Text Phrases
