Chinese Word Frequency Approximation Based on Multitype Corpora
Is part of
Journal of quantitative linguistics, 2010-05, Vol.17 (2), p.142-166
Place / Publisher
Routledge
Year of publication
2010
Source
Alma/SFX Local Collection
Descriptions/Notes
Due to the nature of Chinese, a perfect word-segmented Chinese corpus that is ideal for the task of word frequency estimation may never exist. Therefore, a reliable estimation for Chinese word frequencies remains a challenge. Currently, three types of corpora can be considered for this purpose: raw corpora, automatically word-segmented corpora, and manually word-segmented corpora. As each type has its own advantages and drawbacks, none of them is sufficient alone. In this article, we propose a hybrid scheme which utilizes existing corpora of different types for word frequency approximation. Experiments have been performed from statistical and application-oriented perspectives. We demonstrate that, compared with other schemes, the proposed scheme is the most effective one and leads to better word frequency approximation results.