2015 13th International Conference on Document Analysis and Recognition (ICDAR), 2015, p.1096-1100
2015

Details

Author(s) / Contributors
Title
Text and non-text segmentation based on connected component features
Is part of
  • 2015 13th International Conference on Document Analysis and Recognition (ICDAR), 2015, p.1096-1100
Place / Publisher
IEEE
Year of publication
2015
Source
IEEE/IET Electronic Library (IEL)
Descriptions/Notes
  • Document image segmentation is crucial to OCR and other digitization processes. In this paper, we present a learning-based approach for text and non-text separation in document images. The training features are extracted at the level of connected components, a middle ground between the slow, noise-sensitive pixel level and the segmentation-dependent zone level. To handle connected components of all types, shapes and sizes, we extract a set of features based on the size, shape, stroke width and position of each connected component. AdaBoost with decision trees is used to label the connected components. Finally, the text/non-text classification is corrected using the classification probabilities together with a size and stroke width analysis of each connected component's nearest neighbors. The approach has been evaluated on two standard datasets: UW-III and the ICDAR-2009 competition for document layout analysis. Our results demonstrate that the proposed approach achieves competitive performance for segmenting text and non-text in document images of variable content and degradation.
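The feature extraction step described in the abstract can be illustrated with a minimal sketch. It assumes a binarized page given as a list of rows of 0/1 values and covers only the per-component features, not the AdaBoost classification or the neighbor-based correction. The function names, the particular features, and the stroke width estimate (the median length of horizontal pixel runs, a crude stand-in) are illustrative assumptions and not the feature set used in the paper.

```python
from collections import deque
from statistics import median


def connected_components(image):
    """Group 8-connected foreground (1) pixels; return one pixel list per component."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def horizontal_run_lengths(pixels):
    """Lengths of maximal horizontal runs of component pixels, row by row."""
    rows = {}
    for y, x in pixels:
        rows.setdefault(y, []).append(x)
    runs = []
    for xs in rows.values():
        xs.sort()
        length = 1
        for prev, cur in zip(xs, xs[1:]):
            if cur == prev + 1:
                length += 1
            else:
                runs.append(length)
                length = 1
        runs.append(length)
    return runs


def component_features(pixels, image_height, image_width):
    """Size, shape, stroke width and position features (hypothetical selection)."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    box_h = max(ys) - min(ys) + 1
    box_w = max(xs) - min(xs) + 1
    area = len(pixels)
    return {
        "area": area,                          # size
        "height": box_h,
        "width": box_w,
        "aspect_ratio": box_w / box_h,         # shape
        "density": area / (box_h * box_w),     # fraction of the bounding box filled
        "stroke_width": median(horizontal_run_lengths(pixels)),  # crude proxy
        "center_x": (min(xs) + max(xs)) / 2 / image_width,       # position
        "center_y": (min(ys) + max(ys)) / 2 / image_height,
    }


# Toy page: a thin "T"-like stroke on the left, a solid block on the right.
page = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
features = [component_features(p, len(page), len(page[0]))
            for p in connected_components(page)]
for f in features:
    print(f)
```

In a pipeline of the kind the abstract outlines, feature vectors like these would be passed to a boosted decision tree classifier; here the thin stroke and the solid block already differ clearly in density and estimated stroke width.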
Language
English
Identifiers
DOI: 10.1109/ICDAR.2015.7333930
Title ID: cdi_ieee_primary_7333930
