Building Parallel Corpora for SMT System: A Case Study of English-Manipuri
Is part of
International journal of computer applications, 2012-01, Vol.52 (14), p.47-51
Place / Publisher
New York: Foundation of Computer Science
Year of publication
2012
Source
EZB Electronic Journals Library
Descriptions/Notes
Statistical Machine Translation (SMT) systems are developed using sentence-aligned parallel corpora. The difficulty is that, for many language pairs, no parallel corpus of the required size exists. Preparing a large-scale parallel corpus takes time and demands linguistic skill. The present work addresses the various issues of a quality parallel corpus and develops a technique that extracts a parallel corpus between Manipuri, a morphologically rich and resource-constrained Indian language, and English from web-based comparable news corpora. We explore the essentials of parallel corpora for improving translation quality through linguistic factors for this language pair.