A Hybrid Sentence Splitting Method by Comma Insertion for Machine Translation with CRF
Is part of
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, p.141-152
Place / Publisher
Cham: Springer International Publishing
Source
Alma/SFX Local Collection
Descriptions/Notes
When writing formal articles, many English writers use long sentences with few punctuation marks. Because long sentences are difficult for machine translation systems, many researchers split them at punctuation marks before translation. However, sentences with few punctuation marks remain difficult to handle. In this paper, we use a log-linear model to insert commas at appropriate positions in long sentences, aiming to shorten the resulting sub-sentences and thereby benefit machine translation. Experimental results show that our method segments long sentences reasonably and improves the quality of machine translation.
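The idea of scoring candidate comma positions with a log-linear model can be illustrated with a small sketch. It is not the paper's method: the abstract gives no features or weights, so the feature names, word lists, weights, and threshold below are all hypothetical, and each gap between tokens is scored independently (a binary log-linear classifier) rather than jointly over the whole sequence as a CRF would.

```python
import math

# Hypothetical, hand-set weights for illustration only.
# A real model would learn these from annotated text.
WEIGHTS = {
    "bias": -2.0,
    "next_is_conjunction": 2.5,
    "prev_is_intro_word": 3.0,
    "long_left_span": 1.0,
}

# Hypothetical word lists, not taken from the paper.
CONJUNCTIONS = {"but", "and", "so", "which", "because"}
INTRO_WORDS = {"however", "therefore", "moreover"}


def features(tokens, i, last_break):
    """Binary features describing the gap after tokens[i]."""
    feats = {"bias": 1.0}
    if i + 1 < len(tokens) and tokens[i + 1].lower() in CONJUNCTIONS:
        feats["next_is_conjunction"] = 1.0
    if tokens[i].lower() in INTRO_WORDS:
        feats["prev_is_intro_word"] = 1.0
    # Number of tokens since the last comma (or sentence start).
    if i - last_break >= 6:
        feats["long_left_span"] = 1.0
    return feats


def comma_probability(feats):
    """Log-linear score for a binary decision: sigmoid of w . f."""
    score = sum(WEIGHTS.get(name, 0.0) * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-score))


def insert_commas(sentence, threshold=0.5):
    """Insert a comma after each token whose gap scores above the threshold."""
    tokens = sentence.split()
    out = []
    last_break = -1  # index of the last token followed by a comma
    for i, tok in enumerate(tokens):
        out.append(tok)
        if tok.endswith(","):
            last_break = i  # respect commas already present
            continue
        if i == len(tokens) - 1:
            continue  # never add a comma after the final token
        if comma_probability(features(tokens, i, last_break)) > threshold:
            out[-1] = tok + ","
            last_break = i
    return " ".join(out)


if __name__ == "__main__":
    print(insert_commas(
        "However the system failed because the input sentence was too long"
    ))
```

With these illustrative weights, the example sentence becomes "However, the system failed, because the input sentence was too long", so a downstream translation step could treat the comma-delimited pieces as shorter sub-sentences.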