
Details

Author(s) / Contributors
Title
Video Moment Retrieval With Noisy Labels
Is part of
  • IEEE Transactions on Neural Networks and Learning Systems, 2024-05, Vol.35 (5), p.6779-6791
Place / Publisher
United States: IEEE
Year of publication
2024
Source
IEEE/IET Electronic Library (IEL)
Descriptions/Notes
  • Video moment retrieval (VMR) aims to localize the target moment in an untrimmed video according to a given natural language query. Existing algorithms typically rely on clean annotations to train their models; however, annotations produced by human labelers can introduce considerable noise, so video moment retrieval models are often not well trained in practice. In this article, we present a simple yet effective video moment retrieval framework with a bottom-up schema that is trained end to end and is robust to noisy labels. Specifically, we extract multimodal features with syntactic graph convolutional networks and multihead attention layers and fuse them via cross gates and a bilinear approach. Feature pyramid networks are then constructed to encode rich scene relationships and capture high-level semantics. Furthermore, to mitigate the effects of noisy annotations, we devise multilevel losses at two levels: a frame-level loss that improves noise tolerance and an instance-level loss that reduces the adverse effects of negative instances. At the frame level, we apply Gaussian smoothing to treat noisy labels as soft labels through partial fitting. At the instance level, we use a pair of structurally identical models that teach each other across training iterations. The resulting robust video moment retrieval model significantly outperforms state-of-the-art approaches on the standard public datasets ActivityCaption and textually annotated cooking scene (TACoS). We also evaluate the proposed approach under different levels of manual annotation noise to further demonstrate the effectiveness of our model.
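
The two noise-handling ideas named in the abstract, Gaussian-smoothed soft labels at the frame level and mutual teaching between two structurally identical models at the instance level, can be illustrated with a short PyTorch-style sketch. This is only an illustration under assumed names: gaussian_soft_labels, frame_level_loss, per_instance_loss, sigma, and keep_ratio are hypothetical, and the small-loss selection rule is a common co-teaching heuristic, not necessarily the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def gaussian_soft_labels(start_idx, end_idx, num_frames, sigma=2.0):
        # Frame-level idea: replace hard, possibly noisy boundary annotations
        # with Gaussian-smoothed soft labels so the model only partially fits
        # the annotated boundary frames.
        t = torch.arange(num_frames, dtype=torch.float32)
        start_soft = torch.exp(-(t - start_idx) ** 2 / (2 * sigma ** 2))
        end_soft = torch.exp(-(t - end_idx) ** 2 / (2 * sigma ** 2))
        return start_soft / start_soft.sum(), end_soft / end_soft.sum()

    def frame_level_loss(start_logits, end_logits, start_soft, end_soft):
        # Cross-entropy against the soft targets instead of one-hot noisy labels.
        loss_start = -(start_soft * F.log_softmax(start_logits, dim=-1)).sum()
        loss_end = -(end_soft * F.log_softmax(end_logits, dim=-1)).sum()
        return loss_start + loss_end

    def co_teaching_step(model_a, model_b, opt_a, opt_b, batch, keep_ratio=0.8):
        # Instance-level idea: two structurally identical models teach each other.
        # Here each model trains only on the small-loss instances selected by its
        # peer (an assumed co-teaching heuristic, not the paper's exact rule).
        losses_a = per_instance_loss(model_a, batch)  # hypothetical helper: one loss per instance
        losses_b = per_instance_loss(model_b, batch)
        k = max(1, int(keep_ratio * losses_a.numel()))
        idx_for_b = torch.argsort(losses_a.detach())[:k]  # A selects instances for B
        idx_for_a = torch.argsort(losses_b.detach())[:k]  # B selects instances for A

        opt_a.zero_grad()
        losses_a[idx_for_a].mean().backward()
        opt_a.step()

        opt_b.zero_grad()
        losses_b[idx_for_b].mean().backward()
        opt_b.step()

In this sketch, sigma controls how much probability mass spreads to frames near the annotated boundary, which is what makes the fit to a possibly misplaced label only partial; keep_ratio controls how many instances each model accepts from its peer per iteration.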
