Arabic Document Classification: Performance Investigation of Preprocessing and Representation Techniques

Mathematical problems in engineering, 2022-04, Vol.2022, p.1-16

2022

Details

Place / Publisher

New York: Hindawi

Year of publication

2022

Source

Free E-Journal (publisher's openly accessible content only)

Descriptions/Notes

With the increasing number of online social posts, review comments, and digital documents, Arabic text classification (ATC) has become essential for many natural language processing (NLP) applications, especially during the coronavirus pandemic. Variations in the meaning of the same Arabic word can directly affect the performance of any AI-based framework. This work investigates how preprocessing and representation techniques affect the effectiveness of machine learning (ML) algorithms, with effectiveness measured across different AI-based classification techniques. The ATC process is influenced by several factors, such as stemming during preprocessing, the feature extraction and selection methods, the nature of the datasets, and the classification algorithm. To improve overall classification performance, preprocessing techniques are mainly used to reduce each Arabic word to its root and to lower the dimensionality of the dataset representation. Feature extraction and selection play a crucial role in representing Arabic text meaningfully and improving classification accuracy. The selected classifiers are evaluated with various feature selection algorithms. Classification results are compared across several classifiers: multinomial Naive Bayes (MNB), Bernoulli Naive Bayes (BNB), Stochastic Gradient Descent (SGD), Support Vector Classifier (SVC), Logistic Regression (LR), and Linear SVC. All of these classifiers are evaluated on five balanced and unbalanced benchmark datasets: BBC Arabic corpus, CNN Arabic corpus, Open-Source Arabic corpus (OSAc), ArCovidVac, and AlKhaleej. The evaluation results show that classification performance strongly depends on the preprocessing technique, the representation method, the classification technique, and the nature of the dataset used.
On the benchmark datasets considered, Linear SVC outperformed the other classifiers overall when prominent features were selected.
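A minimal, standard-library Python sketch of the kind of pipeline the abstract describes: light stemming during preprocessing, a bag-of-words representation, and one of the compared classifiers, multinomial Naive Bayes (MNB). The affix lists, the toy corpus, and the helper names `light_stem`, `tokenize`, and `MultinomialNB` are hypothetical illustrations, not the authors' implementation. Light stemming (affix stripping) is used here in place of full root extraction, and the paper's feature selection algorithms and Linear SVC are not reproduced.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, deliberately short affix lists for illustration only;
# practical Arabic light stemmers use longer, carefully ordered lists.
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال")
SUFFIXES = ("ات", "ون", "ين", "ها", "ة", "ه")


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, keeping at least 2 letters."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 2:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 2:
            word = word[: -len(s)]
            break
    return word


def tokenize(text: str) -> list:
    """Whitespace tokenization followed by light stemming of every token."""
    return [light_stem(w) for w in text.split()]


class MultinomialNB:
    """Bag-of-words multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> term frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            # log prior plus smoothed log likelihood of each known term
            score = math.log(count / n_docs)
            for t in tokenize(doc):
                if t in self.vocab:  # terms unseen in training are ignored
                    num = self.word_counts[c][t] + self.alpha
                    den = self.total_words[c] + self.alpha * v
                    score += math.log(num / den)
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical toy corpus (not taken from any of the paper's datasets).
train_docs = [
    "الفريق فاز بالمباراة",
    "اللاعبون في الملعب",
    "الحكومة أعلنت القرار",
    "الوزير في البرلمان",
]
train_labels = ["sports", "sports", "politics", "politics"]
model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict("اللاعب في المباراة"))  # → sports
```

Stemming is what lets "اللاعبون" in training and "اللاعب" in the query map to the same feature, which is the dimensionality reduction the abstract attributes to preprocessing. Replacing `MultinomialNB` with a linear SVM and inserting a feature selection step (for example, chi-square scoring) between tokenization and training would move the sketch closer to the best-performing configuration the abstract reports; neither is implemented here.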

Language: English
Identifiers: ISSN: 1024-123X
eISSN: 1563-5147
DOI: 10.1155/2022/3720358
Title ID: cdi_proquest_journals_2660744727

Format: –
Subjects: Accuracy, Algorithms, Arabic language, Artificial intelligence, Benchmarks, Classification, Classifiers, Datasets, Discriminant analysis, Feature extraction, Investigations, Light, Machine learning, Mathematical problems, Natural language processing, Preprocessing, Representations, Sentiment analysis, Text categorization
