Assamese news image caption generation using attention mechanism
Is part of
Multimedia tools and applications, 2022-03, Vol.81 (7), p.10051-10069
Place / Publisher
New York: Springer US
Year of publication
2022
Source
SpringerLink
Descriptions/Notes
In recent times, neural networks and deep learning have made significant contributions in various research domains. In the present work, we report automatic caption generation for images using these techniques. Automatic image caption generation is an artificial intelligence problem that receives attention from both computer vision and natural language processing researchers. Most work on caption generation targets the English language, and to the best of our knowledge no work has yet been reported for Assamese. Assamese is an Indo-European language spoken by 14 million speakers in the North-East region of India. This paper reports image caption generation in the Assamese news domain. A quality image captioning system requires an annotated training corpus. However, no such standard dataset is available for this resource-constrained language. Therefore, we built a dataset of 13,000 images collected from various local Assamese online e-newspapers. We employ two different architectures for generating news image captions. The first model is based on CNN-LSTM and the second on the attention mechanism. These models are evaluated both qualitatively and quantitatively. Qualitative analysis of the generated captions is carried out in terms of fluency and adequacy scores based on a standard rating scale. Quantitative results are evaluated using the BLEU and CIDEr metrics. We observe that the attention-based model outperforms the CNN-LSTM-based model for our task.
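As a rough illustration of the quantitative evaluation mentioned in the abstract, the sketch below computes the clipped (modified) n-gram precision that forms the core of BLEU. It is not the authors' evaluation code and it is not full BLEU: the brevity penalty and the geometric mean over n-gram orders are omitted. The function names `ngrams` and `modified_precision`, the whitespace tokenization, and the English toy sentences are all assumptions made for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate caption against references.

    Each candidate n-gram count is clipped to the maximum number of times
    that n-gram occurs in any single reference caption.
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0  # candidate is shorter than n tokens

    # Highest count of each n-gram across the individual references.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical toy data (English, whitespace-tokenized) for illustration only.
candidate = "the the the the the the the".split()
references = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
unigram_precision = modified_precision(candidate, references, 1)
bigram_precision = modified_precision(candidate, references, 2)
```

Clipping is what keeps a degenerate caption that repeats one frequent word from scoring well: in the toy data, only two of the seven candidate unigrams are credited, because "the" occurs at most twice in any single reference.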