2021 International Symposium on Computer Science and Intelligent Controls (ISCSIC), 2021, p.132-140
2021

Details

Author(s) / Contributors
Title
Accelerating the Early Identification of Relevant Studies in Title and Abstract Screening
Is part of
  • 2021 International Symposium on Computer Science and Intelligent Controls (ISCSIC), 2021, p.132-140
Place / Publisher
IEEE
Year of publication
2021
Source
IEEE Electronic Library (IEL)
Descriptions/Notes
  • In a systematic literature review (SLR), a large set of citations retrieved from multiple bibliographic databases must be screened to identify those eligible according to prespecified criteria. Initially, records are screened for potential eligibility by appraising titles and abstracts (TAs), which is a tedious and time-consuming step. In a subsequent step, citations labelled as potentially relevant are thoroughly screened based on examination of the full texts (FT). In this study, we apply Natural Language Processing (NLP) and Machine Learning (ML) techniques to develop an automatic citation prioritization system to assist the screening process and to accelerate further steps of an SLR. First, we represent titles and abstracts using bag-of-words (BoW) and TF-IDF (term frequency-inverse document frequency) with dimensionality reduction. Then, we apply traditional ML algorithms (such as Support Vector Machine, Stochastic Gradient Descent and Logistic Regression) to explore their performance in ranking the relevance of the remaining citations. Furthermore, the research sheds light on the impact of class imbalance in SLR datasets on the performance of each ML algorithm. In this context, oversampling techniques are explored to alleviate this problem. In the evaluation, we use two SLRs carried out in our group (Dataset 1 and Dataset 2) and analyze how our approach accelerates the early identification of relevant citations by means of two figures of merit: the Work Saved over Sampling (WSS) and the Relevant References Found (RRF). For Dataset 1, at only 10% of citations screened (RRF@10), the system had already identified 35% and 55% of all relevant citations in TA and FT screenings, respectively. Also, 95% of relevant citations (WSS@95) were found at 49% (in TA) and 37% (in FT) of citations screened. For Dataset 2, at only 10% screening, the system found 35% (in TA screening) and 47% (in FT screening) of relevant citations.
Considering the initial randomly selected samples for training, the system significantly accelerates the early identification of relevant citations. Despite a certain degree of noise and similar behavior at first, oversampling techniques may improve performance in the final phase of the screening and avoid learning problems caused by a small dataset and/or class imbalance. Finally, our system is compared to a recent evaluation of two popular citation screening systems (Abstrackr and EPPI-Reviewer), achieving a comparable performance.
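
The two figures of merit named in the abstract can be illustrated with a short sketch. The abstract gives no formulas, so the definitions below are assumptions based on common usage: RRF@k is taken as the share of all relevant citations found after screening the top k% of the ranked list, and WSS@R as the recall level minus the fraction of the list that had to be screened to reach it. The function names and the toy label list are hypothetical and do not come from the paper.

```python
def _ceil_percent(count, percent):
    """Integer ceiling of count * percent / 100 (avoids float rounding)."""
    return -(-count * percent // 100)


def rrf_at(ranked_labels, percent_screened):
    """Assumed RRF@k: share of all relevant citations found in the
    top `percent_screened` percent of the ranked list (1 = relevant)."""
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("no relevant citations in the list")
    n_screened = _ceil_percent(len(ranked_labels), percent_screened)
    return sum(ranked_labels[:n_screened]) / total_relevant


def screened_fraction_at_recall(ranked_labels, recall_percent):
    """Fraction of the ranked list that must be screened to find
    `recall_percent` percent of the relevant citations."""
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("no relevant citations in the list")
    needed = _ceil_percent(total_relevant, recall_percent)
    found = 0
    for position, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            return position / len(ranked_labels)
    return 1.0


def wss_at(ranked_labels, recall_percent):
    """Assumed WSS@R: recall level minus the fraction screened."""
    fraction = screened_fraction_at_recall(ranked_labels, recall_percent)
    return recall_percent / 100 - fraction


# Toy ranking (hypothetical data): 10 citations ordered by predicted
# relevance, 4 of them truly relevant.
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(rrf_at(labels, 50))                       # share found at 50% screened
print(screened_fraction_at_recall(labels, 95))  # fraction screened for 95% recall
print(wss_at(labels, 95))                       # work saved at 95% recall
```

Under these assumed definitions, the percentages the abstract reports next to WSS@95 (49% and 37%) correspond to the fraction of citations screened at 95% recall, which is what `screened_fraction_at_recall` returns.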
Language
English
Identifiers
DOI: 10.1109/ISCSIC54682.2021.00034
Title ID: cdi_ieee_primary_9644246
