Journal of data mining and digital humanities, 2021-11, Vol.2021 (HistoInformatics)
2021

Details

Author(s) / Contributors
Title
Combining Morphological and Histogram based Text Line Segmentation in the OCR Context
Is part of
  • Journal of data mining and digital humanities, 2021-11, Vol.2021 (HistoInformatics)
Place / Publisher
Nicolas Turenne
Year of publication
2021
Link to full text
Source
EZB Electronic Journals Library
Descriptions/Notes
  • Text line segmentation is one of the preprocessing stages of modern optical character recognition systems. The algorithmic approach proposed in this paper was designed for this exact purpose. Its main characteristic is the combination of two different techniques: morphological image operations and horizontal histogram projections. The method was developed for a historic data collection that commonly features quality issues such as degraded paper, blurred text, or noise. For that reason, the segmenter could be of particular interest to cultural institutions that want robust line bounding boxes for a given historic document. Because of its promising segmentation results and low computational cost, the algorithm was incorporated into the OCR pipeline of the National Library of Luxembourg as part of the initiative to reprocess its historic newspaper collection. The general contribution of this paper is to outline the approach and to evaluate the gains in accuracy and speed compared with the segmentation algorithm bundled with the open source OCR software used.
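
The abstract names the two building blocks but gives no details, so the following is only a generic sketch of how a morphological operation and a horizontal projection histogram can be combined, not the paper's algorithm. It assumes a binarized page given as a list of rows of 0/1 values (1 = ink). All names (`PAGE`, `open_horizontal`, `segment_lines`) and the parameters `radius` and `min_ink` are hypothetical and chosen for illustration.

```python
# Illustrative sketch only: NOT the method from the paper.
# Assumption: the page is already binarized, 1 = ink, 0 = background.

PAGE = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],  # text line 1
    [0, 1, 1, 1, 1, 0, 0, 0],  # text line 1
    [0, 0, 0, 0, 0, 0, 1, 0],  # isolated noise speck
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],  # text line 2
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def _horizontal_filter(image, radius, test):
    """Apply `test` (all = erosion, any = dilation) over a 1 x (2*radius+1)
    window around each pixel. Windows are clipped at the image border."""
    width = len(image[0])
    return [
        [1 if test(row[max(0, x - radius):x + radius + 1]) else 0
         for x in range(width)]
        for row in image
    ]


def open_horizontal(image, radius):
    """Morphological opening (erosion, then dilation) with a horizontal
    structuring element; removes ink runs narrower than the element."""
    eroded = _horizontal_filter(image, radius, all)
    return _horizontal_filter(eroded, radius, any)


def segment_lines(image, radius=1, min_ink=1):
    """Return line bounding boxes as (top, left, bottom, right) tuples."""
    cleaned = open_horizontal(image, radius)

    # Horizontal projection histogram: amount of ink in each row.
    profile = [sum(row) for row in cleaned]

    # Consecutive rows with enough ink form one text-line band.
    bands, start = [], None
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y
        elif count < min_ink and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))

    # Tighten each band to the columns that actually contain ink.
    boxes = []
    for top, bottom in bands:
        cols = [x for y in range(top, bottom + 1)
                for x, v in enumerate(cleaned[y]) if v]
        boxes.append((top, min(cols), bottom, max(cols)))
    return boxes
```

On the toy `PAGE`, calling `segment_lines(PAGE)` removes the speck in row 3 before the projection is taken, so the first line's box ends at row 2. With `radius=0` the opening does nothing, and the speck stretches that box down and to the right, which illustrates why a morphological cleanup step can help a purely histogram-based segmenter on noisy scans.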
Language
English
Identifiers
ISSN: 2416-5999
eISSN: 2416-5999
DOI: 10.46298/jdmdh.7277
Title ID: cdi_doaj_primary_oai_doaj_org_article_a17a5ed1a8304276a9c7e6a7803b9528
