Details

Author(s) / Contributors
Title
A search tool based on language modelling developed for The Index of Middle English Prose
Is part of
  • Open research Europe, 2023, Vol.3, p.197
Place / Publisher
Belgium: European Commission, F1000 Research Limited
Year of publication
2023
Link to full text
Source
Free E-Journal (publisher's openly available content only)
Descriptions / Notes
  • Non-standardised early vernaculars pose a problem for search tools because of their high degree of variation. Orthography, syntax, and lexicon all vary between the titles, incipits, and explicits of manuscript copies of the same work, and traditional search methods that rely on exact string matching or regular expressions cannot handle this variation comprehensively. This project presents a web-based search tool designed specifically to handle linguistic and textual variation. The software is made available as part of The Index of Middle English Prose (IMEP). The tool addresses variation by combining a database of incipits and explicits, character-based n-gram language models (LMs) built with the SRI Language Modeling (SRILM) toolkit, and a fuzzy search script (IMEP: FSS) written in Python. It optimizes for recall: it retrieves multiple potential matches for a search string without attempting to identify the 'correct' one. A search looks up exact matches in the database while the fuzzy search script runs in two steps: it first evaluates the incipits and explicits against a model of the search string, then matches the search string against models of the incipits and explicits. Using SRILM to match the search string against every incipit or explicit in the IMEP for precision could be time-consuming, so a first step in which all texts are matched against a single LM built from the search string shortens a processing time that would otherwise be unreasonably long. A web application built with Django and Docker combines the results of the direct database lookup and the fuzzy search script and presents them as a list, with exact matches first, followed by fuzzy matches ordered by increasing model perplexity. The tool is available Open Access and can be adapted to other datasets.
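The two-step search described in the abstract can be illustrated with a minimal Python sketch. It is not the IMEP: FSS script and does not call SRILM: a small add-one-smoothed character trigram model stands in for the SRILM models. All names (`CharNgramLM`, `search`, `RECORDS`, `shortlist_size`), the substring test used for exact matches, the idea of keeping only a shortlist after the first step, and the sample strings are assumptions made for illustration; the strings are invented and are not IMEP data.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Split lower-cased text into character n-grams, padded at both ends."""
    padded = "^" * (n - 1) + text.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramLM:
    """Add-one-smoothed character n-gram model (a stand-in for SRILM)."""

    def __init__(self, texts, n=3, vocab_size=64):
        self.n = n
        self.vocab_size = vocab_size  # assumed alphabet size for smoothing
        self.ngrams = Counter()
        self.contexts = Counter()
        for text in texts:
            for gram in char_ngrams(text, n):
                self.ngrams[gram] += 1
                self.contexts[gram[:-1]] += 1

    def perplexity(self, text):
        """Per-character perplexity of text under this model (lower = closer)."""
        grams = char_ngrams(text, self.n)
        log_prob = 0.0
        for gram in grams:
            p = (self.ngrams[gram] + 1) / (self.contexts[gram[:-1]] + self.vocab_size)
            log_prob += math.log(p)
        return math.exp(-log_prob / len(grams))


def search(query, records, shortlist_size=3, n=3):
    """Return exact matches first, then fuzzy matches by increasing perplexity."""
    # Direct lookup (assumption: a case-insensitive substring test).
    exact = [r for r in records if query.lower() in r.lower()]
    candidates = [r for r in records if r not in exact]

    # Step 1: a single model built from the query scores every record.
    query_lm = CharNgramLM([query], n)
    shortlist = sorted(candidates, key=query_lm.perplexity)[:shortlist_size]

    # Step 2: one model per shortlisted record scores the query.
    fuzzy = sorted(shortlist, key=lambda r: CharNgramLM([r], n).perplexity(query))
    return exact + fuzzy


# Invented sample strings for illustration only.
RECORDS = [
    "here bigynneth the book of the tales",
    "heere begynneth the boke of tales",
    "in the name of god amen",
    "of the seven deadly synnes",
]

if __name__ == "__main__":
    for hit in search("here begynneth the boke", RECORDS):
        print(hit)
```

Step 1 builds one model and scores each record once, so its cost grows with the number of records but needs no per-record training. Step 2 trains a model per record, which is why this sketch restricts it to a shortlist; whether the IMEP tool prunes in the same way is not stated in the abstract.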
Language
English; Norwegian
Identifiers
ISSN: 2732-5121
eISSN: 2732-5121
DOI: 10.12688/openreseurope.16590.1
Title ID: cdi_doaj_primary_oai_doaj_org_article_42624d5584c64703b93d5724f556aa01
