Journal of quantitative linguistics, 2010-05, Vol.17 (2), p.142-166
2010

Details

Author(s) / Contributors
Title
Chinese Word Frequency Approximation Based on Multitype Corpora
Is part of
  • Journal of quantitative linguistics, 2010-05, Vol.17 (2), p.142-166
Place / Publisher
Routledge
Year of publication
2010
Source
Alma/SFX Local Collection
Descriptions / Notes
  • Due to the nature of Chinese, a perfect word-segmented Chinese corpus that is ideal for the task of word frequency estimation may never exist. Therefore, a reliable estimation for Chinese word frequencies remains a challenge. Currently, three types of corpora can be considered for this purpose: raw corpora, automatically word-segmented corpora, and manually word-segmented corpora. As each type has its own advantages and drawbacks, none of them is sufficient alone. In this article, we propose a hybrid scheme which utilizes existing corpora of different types for word frequency approximation. Experiments have been performed from statistical and application-oriented perspectives. We demonstrate that, compared with other schemes, the proposed scheme is the most effective one and leads to better word frequency approximation results.
Language
English
Identifiers
ISSN: 0929-6174
eISSN: 1744-5035
DOI: 10.1080/09296171003643213
Title ID: cdi_proquest_miscellaneous_753820999
