ThEconSum: an Economics-domained Dataset for Thai Text Summarization and Baseline Models
Is part of
2022 17th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), 2022, p.1-6
Place / Publisher
IEEE
Year of publication
2022
Source
IEEE/IET Electronic Library
Descriptions/Notes
Language resources such as datasets are an essential component in developing an effective automatic text summarization (ATS) system. Public datasets for some languages are relatively scarce compared with those for widely studied languages, because complex language preprocessing makes annotation by linguists labor-intensive. ATS techniques condense text into a shorter output and reduce the time needed to find information in large volumes of textual data. This paper presents the construction of a Thai ATS dataset from economics-domain data, called ThEconSum, which exhibits some linguistic challenges for Thai summarization. Existing public datasets were used to develop the ATS system for Thai economic news articles.