Benchmarking topic models on scientific articles using BERTeley
Is part of
Natural Language Processing Journal, 2024-03, Vol.6, p.100044, Article 100044
Place / Publisher
Elsevier B.V.
Year of publication
2024
Source
Alma/SFX Local Collection
Descriptions/Notes
The introduction of BERTopic marked a crucial advancement in topic modeling and presented a topic model that outperformed both traditional and modern topic models in terms of topic modeling metrics on a variety of corpora. However, unique issues arise when topic modeling is performed on scientific articles. This paper introduces BERTeley, an innovative tool built upon BERTopic, designed to alleviate these shortcomings and improve the usability of BERTopic when conducting topic modeling on a corpus consisting of scientific articles. This is accomplished through BERTeley’s three main features: scientific article preprocessing, topic modeling using pre-trained scientific language models, and topic model metric calculation. Furthermore, an experiment was conducted comparing topic models using four different language models in three corpora consisting of scientific articles.
• Provide transformer-based tools to accelerate topic modeling using scientific articles as input.
• Extract text embeddings using bidirectional encoder transformers and large language models to discover underlying themes and patterns within documents.
• Create visual summaries of science topics from different publication databases.