Accelerating the Early Identification of Relevant Studies in Title and Abstract Screening

2021 International Symposium on Computer Science and Intelligent Controls (ISCSIC), 2021, p.132-140

2021

Details

Author(s) / Contributors

Title

Accelerating the Early Identification of Relevant Studies in Title and Abstract Screening

Is part of

2021 International Symposium on Computer Science and Intelligent Controls (ISCSIC), 2021, p.132-140

Place / Publisher

IEEE

Year of publication

2021

Source

IEEE Electronic Library (IEL)

Descriptions/Notes

In a systematic literature review (SLR), a large set of citations retrieved from multiple bibliographic databases must be screened to identify those eligible according to prespecified criteria. Initially, records are screened for potential eligibility by appraising titles and abstracts (TAs), which is a tedious and time-consuming step. In a subsequent step, citations labelled as potentially relevant are thoroughly screened based on examination of the full texts (FT). In this study, we apply Natural Language Processing (NLP) and Machine Learning (ML) techniques to develop an automatic citation prioritization system to assist the screening process and to accelerate further steps of an SLR. First, we represent titles and abstracts using bag-of-words (BoW) and TF-IDF (term frequency-inverse document frequency) with dimensionality reduction. Then, we apply traditional ML algorithms (such as Support Vector Machine, Stochastic Gradient Descent and Logistic Regression) to explore their performance in ranking the relevance of the remaining citations. Furthermore, the research sheds light on the impact of class imbalance in SLR datasets on the performance of each ML algorithm. In this context, oversampling techniques are explored to mitigate this problem. In the evaluation, we use two SLRs carried out in our group (Dataset 1 and Dataset 2) and analyze how our approach accelerates the early identification of relevant citations by means of two figures of merit: the Work Saved over Sampling (WSS) and the Relevant References Found (RRF). For Dataset 1, at only 10% of citations screened (RRF@10), the system had already identified 35% and 55% of all relevant citations in TA and FT screenings, respectively. Also, 95% of relevant citations (WSS@95) were found at 49% (in TA) and 37% (in FT) screened. For Dataset 2, at only 10% screening, the system found 35% (in TA screening) and 47% (in FT screening) of relevant citations.
Considering the initial randomly selected samples for training, the system significantly accelerates the early identification of relevant citations. Despite a certain degree of noise and similar behavior at first, oversampling techniques may improve performance in the final phase of the screening and avoid learning problems caused by a small dataset and/or class imbalance. Finally, our system is compared to a recent evaluation of two popular citation screening systems (Abstrackr and EPPI-Reviewer), achieving a comparable performance.
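
The two figures of merit named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy ranking are hypothetical, and the WSS formula used here (target recall minus the fraction of records screened) is one commonly used definition that the abstract does not spell out. Note that the abstract reports WSS@95 as the percentage of citations screened when 95% of the relevant ones have been found, which corresponds to `screened_fraction_at_recall` below.

```python
import math


def rrf_at(ranked_labels, fraction):
    """Share of all relevant records found in the top `fraction` of a ranking.

    `ranked_labels` holds 1 (relevant) or 0 (irrelevant), ordered from the
    highest-ranked record to the lowest.
    """
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("ranking contains no relevant records")
    n_screened = math.ceil(fraction * len(ranked_labels))
    return sum(ranked_labels[:n_screened]) / total_relevant


def screened_fraction_at_recall(ranked_labels, recall):
    """Fraction of the ranking that must be screened to reach `recall`."""
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("ranking contains no relevant records")
    needed = math.ceil(recall * total_relevant)
    found = 0
    for position, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            return position / len(ranked_labels)
    return 1.0


def wss_at(ranked_labels, recall):
    """Work Saved over Sampling, assumed here as recall minus fraction screened."""
    return recall - screened_fraction_at_recall(ranked_labels, recall)


# Toy ranking (assumed data): 20 records, 5 of them relevant.
ranked = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(f"RRF@10: {rrf_at(ranked, 0.10):.2f}")
print(f"Screened at 95% recall: {screened_fraction_at_recall(ranked, 0.95):.2f}")
print(f"WSS@95: {wss_at(ranked, 0.95):.2f}")
```

In this toy ranking, two of the five relevant records sit in the top 10%, and the last relevant record appears at position 13 of 20, so reaching 95% recall requires screening 65% of the list.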

Language: English
Identifiers: DOI: 10.1109/ISCSIC54682.2021.00034
Title ID: cdi_ieee_primary_9644246

Format: –
Keywords: Machine learning, Machine learning algorithms, Natural Language Processing, Process control, Sensitivity, Support vector machines, Systematic Literature Review, Systematics, Training
