
A search tool based on language modelling developed for The Index of Middle English Prose

Open research Europe, 2023, Vol.3, p.197

2023

Details

Author(s) / Contributors

Title

A search tool based on language modelling developed for The Index of Middle English Prose

Is part of

Open research Europe, 2023, Vol.3, p.197

Place / Publisher

Belgium: European Commission, F1000 Research Limited

Year of publication

2023

Link to full text

Source

Free E-Journal (publisher's open-access portion only)

Descriptions / Notes

Non-standardised early vernaculars pose a problem for search tools because of their high degree of variation. The challenge lies in the orthographic, syntactic, and lexical variation between titles, incipits, and explicits in manuscript copies of the same work. Traditional search methods that rely on exact string matching or regular expressions cannot address this variation comprehensively. This project presents a web-based search tool specifically designed to handle linguistic and textual variation. The software is made available as part of The Index of Middle English Prose (IMEP). The search tool addresses variation by combining a database of incipits and explicits, character-based n-gram language models (LMs) built with the SRI Language Modeling (SRILM) toolkit, and a fuzzy search script (IMEP: FSS) written in Python. The tool optimizes for recall: it retrieves multiple potential matches for a search string without attempting to identify the 'correct' one. A search looks up exact matches in the database while simultaneously running the fuzzy search script, which first evaluates the incipits and explicits against a model of the search string and then matches the search string against models of the incipits and explicits. This two-step process keeps the processing time reasonable: using SRILM to match the search string against a separate model of each incipit or explicit in the IMEP gives precision but could be time-consuming, whereas a first step in which all texts are matched against a single LM built from the search string allows faster processing. A web application, built using Django and Docker, combines the results of the direct database lookup and the fuzzy search script and presents them as a list, with exact matches first, followed by fuzzy matches ordered by increasing model perplexity. The tool is made available Open Access and can be adapted to other datasets.
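
The two-step ranking described in the abstract can be illustrated with a minimal sketch. It is not the IMEP: FSS code and does not call SRILM: a small add-one-smoothed character trigram model in pure Python stands in for the SRILM models. The names `CharNgramModel` and `fuzzy_search`, the `shortlist_size` and `vocab_size` parameters, the case-insensitive substring test used for "exact" matches, and the idea of passing only a shortlist from step one to step two are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return the character n-grams of text, padded with boundary markers."""
    padded = "^" * (n - 1) + text.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Add-one-smoothed character n-gram model (illustrative stand-in for SRILM)."""

    def __init__(self, text, n=3, vocab_size=64):
        self.n = n
        self.vocab_size = vocab_size  # assumed alphabet size used for smoothing
        grams = char_ngrams(text, n)
        self.ngram_counts = Counter(grams)
        self.context_counts = Counter(g[:-1] for g in grams)

    def perplexity(self, text):
        """Perplexity of text under this model; lower means a closer match."""
        grams = char_ngrams(text, self.n)
        log_prob = 0.0
        for g in grams:
            numerator = self.ngram_counts[g] + 1
            denominator = self.context_counts[g[:-1]] + self.vocab_size
            log_prob += math.log(numerator / denominator)
        return math.exp(-log_prob / len(grams))


def fuzzy_search(query, entries, shortlist_size=3, n=3):
    """Return exact matches first, then fuzzy matches by increasing perplexity."""
    # Direct lookup (assumed here to be a case-insensitive substring test).
    exact = [e for e in entries if query.lower() in e.lower()]
    rest = [e for e in entries if e not in exact]

    # Step 1: one model built from the query scores every remaining entry.
    query_model = CharNgramModel(query, n)
    shortlist = sorted(rest, key=query_model.perplexity)[:shortlist_size]

    # Step 2: one model per shortlisted entry scores the query.
    scored = [(CharNgramModel(e, n).perplexity(query), e) for e in shortlist]
    fuzzy = [entry for _, entry in sorted(scored)]
    return exact + fuzzy
```

Calling `fuzzy_search("Here begynneth the boke", incipits)` on a list of incipit strings would return that list's exact hits followed by the closest spelling variants. Step one builds a single model, while step two builds one model per candidate, which is why the sketch restricts step two to a shortlist.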

Language: English; Norwegian
Identifiers: ISSN: 2732-5121
eISSN: 2732-5121
DOI: 10.12688/openreseurope.16590.1
Title ID: cdi_doaj_primary_oai_doaj_org_article_42624d5584c64703b93d5724f556aa01

Format: –
Keywords: bibliography, digital humanities, eng, Language modelling, medieval studies, Middle English, ngrams, Software Tool
