An approach for stemming in symbolically compressed Indian language imaged documents
Is part of
Eighth International Conference on Document Analysis and Recognition (ICDAR'05), 2005, Vol. 2, pp. 1080-1084
Place / Publisher
IEEE
Year of publication
2005
Source
IEEE/IET Electronic Library
Descriptions/Notes
Stemming is used in many information retrieval (IR) systems to reduce variant word forms to common roots, thereby improving overall retrieval efficiency. This paper presents an algorithm for stemming in the context of a document image retrieval system. The algorithm assumes that the documents are symbolically compressed, and stemming is attempted in the compressed domain itself. Experiments have been conducted on Indian language imaged documents, for which efficient OCR still remains a challenging task. Results obtained from a set of 150 document images (in Bangla script, the second most popular script in the Indian subcontinent) comprising about 12K words show promising performance for the proposed approach.
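The abstract does not spell out the algorithm itself. As a rough illustration only, suffix-stripping stemming can be carried out directly on compressed data if we assume that each word in a symbolically compressed image is stored as a sequence of integer symbol (prototype) IDs. Under that assumption, stripping a suffix becomes comparing the tail of an ID sequence. The suffix inventory, the symbol IDs, the `min_stem` threshold, and the function names below are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a word is a tuple of symbol (prototype) IDs
# as they might appear in a symbolically compressed document image.
Word = Tuple[int, ...]


def strip_suffix(word: Word, suffixes: Sequence[Word], min_stem: int = 2) -> Word:
    """Remove the longest matching suffix that leaves at least `min_stem` symbols."""
    # Try longer suffixes first so the most specific ending wins.
    for suf in sorted((s for s in suffixes if s), key=len, reverse=True):
        if len(word) - len(suf) >= min_stem and word[-len(suf):] == suf:
            return word[:-len(suf)]
    return word  # no suffix applies: the word is its own stem


def group_by_stem(words: Sequence[Word], suffixes: Sequence[Word],
                  min_stem: int = 2) -> Dict[Word, List[Word]]:
    """Group variant word forms (as ID sequences) under a shared stem."""
    groups: Dict[Word, List[Word]] = defaultdict(list)
    for w in words:
        groups[strip_suffix(w, suffixes, min_stem)].append(w)
    return dict(groups)


# Hypothetical suffix inventory and words, expressed as symbol IDs.
SUFFIXES: List[Word] = [(4,), (4, 5), (8,)]
WORDS: List[Word] = [(7, 3, 9), (7, 3, 9, 4), (7, 3, 9, 4, 5), (7, 3, 9, 8), (2, 6, 4)]

if __name__ == "__main__":
    print(group_by_stem(WORDS, SUFFIXES))
    # {(7, 3, 9): [(7, 3, 9), (7, 3, 9, 4), (7, 3, 9, 4, 5), (7, 3, 9, 8)], (2, 6): [(2, 6, 4)]}
```

The `min_stem` guard keeps very short words from being reduced to one or zero symbols. This is a common safeguard in suffix-stripping stemmers, and it is included here as an assumption rather than as a detail of the published method.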