Image Caption Enhancement with GRIT, Portable ResNet and BART Context-Tuning
Is part of
2022 6th International Conference on Universal Village (UV), 2022, pp. 1-6
Place / Publisher
IEEE
Year of publication
2022
Source
IEEE Xplore
Descriptions/Notes
This paper proposes a novel image captioning architecture that combines the Grid- and Region-based Image captioning Transformer (GRIT), ResNet, and the BART language model to produce more detail-oriented captions. Conventional state-of-the-art image captioning models mainly focus on region-based features. They rely on object detector architectures such as Faster R-CNN to extract the object-level information used to describe an image's content. Nevertheless, these models lack contextual information, incur high computational costs, and cannot introduce in-depth external details about the objects present in the images. Replacing the conventional CNN-based detector results in faster computation. In the experiments, the proposed model generates image captions comparatively fast, with higher accuracy and more detail, including contextual information.