Author(s) / Contributors
Title
Assamese news image caption generation using attention mechanism
Part of
  • Multimedia tools and applications, 2022-03, Vol.81 (7), p.10051-10069
Place / Publisher
New York: Springer US
Year of publication
2022
Source
SpringerLink
Descriptions/Notes
  • In recent times, neural networks and deep learning have made significant contributions across many research domains. In the present work, we report automatic caption generation for images using these techniques. Automatic image caption generation is an artificial intelligence problem that receives attention from both computer vision and natural language processing researchers. Most caption generation work exists for English, and to the best of our knowledge none has been reported for Assamese. Assamese is an Indo-European language spoken by 14 million speakers in the North-East region of India. This paper reports image caption generation in the Assamese news domain. A quality image captioning system requires an annotated training corpus; however, no such standard dataset is available for this resource-constrained language. Therefore, we built a dataset of 13,000 images collected from various local Assamese online e-newspapers. We employ two different architectures for generating news image captions: the first model is based on CNN-LSTM, and the second on an attention mechanism. These models are evaluated both qualitatively and quantitatively. Qualitative analysis of the generated captions is carried out in terms of fluency and adequacy scores on a standard rating scale; the quantitative results are evaluated using the BLEU and CIDEr metrics. We observe that the attention-based model outperforms the CNN-LSTM-based model on our task.
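The abstract does not specify which attention variant the second model uses, so as a minimal sketch, here is additive (Bahdanau-style) attention over CNN image-region features — one common choice for image captioning decoders. All parameter shapes and the random initialization are illustrative assumptions, standing in for learned weights:

```python
import numpy as np

def additive_attention(features, hidden, W_f, W_h, v):
    """Additive (Bahdanau-style) attention over image region features.

    features: (num_regions, feat_dim) CNN grid features of the image
    hidden:   (hid_dim,) current decoder (LSTM) hidden state
    W_f, W_h, v: projection parameters (random here, learned in practice)
    Returns attention weights over regions and the context vector.
    """
    # Score each region i: v . tanh(W_f f_i + W_h h)
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v   # (num_regions,)
    # Softmax over regions -> weights sum to 1
    exp = np.exp(scores - scores.max())
    weights = exp / exp.sum()
    # Context vector: attention-weighted sum of region features
    context = weights @ features                           # (feat_dim,)
    return weights, context

# Illustrative dimensions (e.g. a 7x7 CNN feature grid -> 49 regions)
rng = np.random.default_rng(0)
num_regions, feat_dim, hid_dim, att_dim = 49, 512, 256, 128
features = rng.standard_normal((num_regions, feat_dim))
hidden = rng.standard_normal(hid_dim)
W_f = rng.standard_normal((feat_dim, att_dim)) * 0.01
W_h = rng.standard_normal((hid_dim, att_dim)) * 0.01
v = rng.standard_normal(att_dim) * 0.01

weights, context = additive_attention(features, hidden, W_f, W_h, v)
print(weights.shape, context.shape)
```

At each decoding step the context vector is concatenated with the word embedding and fed to the LSTM, letting the decoder focus on different image regions per generated word.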
