

Chinese Word Frequency Approximation Based on Multitype Corpora

Journal of quantitative linguistics, 2010-05, Vol.17 (2), p.142-166

2010

Details

Author(s) / Contributors

Title

Chinese Word Frequency Approximation Based on Multitype Corpora

Is part of

Journal of quantitative linguistics, 2010-05, Vol.17 (2), p.142-166

Place / Publisher

Routledge

Year of publication

2010

Source

Alma/SFX Local Collection

Description/Notes

Because written Chinese does not mark word boundaries, a perfectly word-segmented Chinese corpus that is ideal for word frequency estimation may never exist. Reliable estimation of Chinese word frequencies therefore remains a challenge. Currently, three types of corpora can be considered for this purpose: raw corpora, automatically word-segmented corpora, and manually word-segmented corpora. Each type has its own advantages and drawbacks, so none of them is sufficient on its own. In this article, we propose a hybrid scheme that uses existing corpora of different types for word frequency approximation. Experiments have been performed from both statistical and application-oriented perspectives. We demonstrate that, compared with other schemes, the proposed scheme is the most effective and yields better word frequency approximations.
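The abstract does not describe how the hybrid scheme combines its corpora. As a purely illustrative sketch of what "combining corpora of different types" can mean, the Python below linearly interpolates relative word frequencies taken from several word-segmented corpora. Linear interpolation, the function names `relative_freqs` and `interpolate`, the example counts, and the equal weights are all hypothetical assumptions; this is not the scheme proposed in the article. Raw (unsegmented) corpora are left out because their counts would first have to be mapped onto words.

```python
from typing import Dict, Mapping, Sequence, Tuple


def relative_freqs(counts: Mapping[str, int]) -> Dict[str, float]:
    """Convert raw word counts from one corpus into relative frequencies."""
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()} if total else {}


def interpolate(sources: Sequence[Tuple[Mapping[str, int], float]]) -> Dict[str, float]:
    """Hypothetical combiner: weighted average of per-corpus relative frequencies.

    Each source is (word_counts, weight); weights are normalized to sum to 1.
    This linear interpolation is an illustrative stand-in, not the article's method.
    """
    weight_sum = sum(weight for _, weight in sources)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    combined: Dict[str, float] = {}
    for counts, weight in sources:
        for word, p in relative_freqs(counts).items():
            combined[word] = combined.get(word, 0.0) + (weight / weight_sum) * p
    return combined


# Hypothetical counts from an automatically and a manually segmented corpus.
auto_counts = {"中国": 3, "人民": 1}
manual_counts = {"中国": 1, "人民": 1}
est = interpolate([(auto_counts, 1.0), (manual_counts, 1.0)])
print(est)  # {'中国': 0.625, '人民': 0.375}
```

With equal weights, each word's estimate is simply the mean of its relative frequencies in the two corpora. Unequal weights would let a smaller but more accurate manually segmented corpus count for more than a larger automatically segmented one.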

Language: English
Identifiers: ISSN: 0929-6174
eISSN: 1744-5035
DOI: 10.1080/09296171003643213
Title ID: cdi_proquest_miscellaneous_753820999

Format: –
Keywords: Chinese languages, lexicology, linguistic corpus, word frequency