Sie befinden Sich nicht im Netzwerk der Universität Paderborn. Der Zugriff auf elektronische Ressourcen ist gegebenenfalls nur via VPN oder Shibboleth (DFN-AAI) möglich. mehr Informationen...
Retina enhanced SIFT descriptors for video indexing
Ist Teil von
2013 11th International Workshop on Content-Based Multimedia Indexing (CBMI), 2013, p.201-206
Ort / Verlag
IEEE
Erscheinungsjahr
2013
Quelle
IEEE Xplore
Beschreibungen/Notizen
This paper investigates how the detection of diverse high-level semantic concepts (objects, actions, scene types, persons etc.) in videos can be improved by applying a model of the human retina. A large part of the current approaches for Content-Based Image/Video Retrieval (CBIR/CBVR) relies on the Bag-of-Words (BoW) model, which has shown to perform well especially for object recognition in static images. Nevertheless, the current state-of-the-art framework shows its limits when applied to videos because of the added temporal information. In this paper, we enhance a BoW model based on the classical SIFT local spatial descriptor, by preprocessing videos with a model of the human retina. This retinal preprocessing allows the SIFT descriptor to become aware of temporal information. Our proposed descriptors extend the SIFT genericity to spatio-temporal content, making them interesting for generic video indexing. They also benefit from the retinal spatio-temporal "robustness" to various disturbances such as noise, compression artifacts, luminance variations or shadows. The proposed approaches are evaluated on the TRECVID 2012 Semantic Indexing task dataset.