Radioengineering, 2011-12, Vol.20 (4), p.1002-1008
2011

Details

Author(s) / Contributors
Title
Performance of Czech Speech Recognition with Language Models Created from Public Resources
Is part of
  • Radioengineering, 2011-12, Vol.20 (4), p.1002-1008
Place / Publisher
Spolecnost pro radioelektronicke inzenyrstvi
Year of publication
2011
Source
EZB Electronic Journals Library
Descriptions/Notes
  • In this paper, we investigate the usability of publicly available n-gram corpora for creating language models (LMs) applicable to Czech speech recognition systems. N-gram LMs with various parameters and settings were created from two publicly available sets: the Czech Web 1T 5-gram corpus provided by Google and a 5-gram corpus obtained from the Czech National Corpus Institute. For comparison, we also tested an LM built from a large private resource of newspaper and broadcast texts collected by a Czech media mining company. The LMs were analyzed and compared from a statistical point of view (mainly via their perplexity rates) and from a performance point of view when employed in large vocabulary continuous speech recognition systems. Our study shows that the Web1T-based LMs, even after intensive cleaning and normalization procedures, cannot compete with those made of smaller but more consistent corpora. The experiments done on large test data also illustrate the impact of Czech as a highly inflective language on the perplexity, out-of-vocabulary (OOV), and recognition accuracy rates.
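The two statistics named in the abstract, perplexity and OOV rate, can be illustrated with a minimal sketch. It is not the paper's method: it swaps in a toy unigram model with add-one smoothing in place of the 5-gram LMs the paper describes. The function names `oov_rate` and `unigram_perplexity` and the sample sentences are hypothetical, chosen only to show how an inflected form (`psi` versus `pes`) counts as out-of-vocabulary.

```python
import math
from collections import Counter


def oov_rate(vocab, test_tokens):
    """Fraction of test tokens that do not appear in the vocabulary."""
    if not test_tokens:
        return 0.0
    misses = sum(1 for tok in test_tokens if tok not in vocab)
    return misses / len(test_tokens)


def unigram_perplexity(train_tokens, test_tokens):
    """Perplexity of an add-one smoothed unigram model on test_tokens.

    Assumption: a unigram model stands in for the n-gram LMs of the paper,
    purely to keep the example short.
    """
    counts = Counter(train_tokens)
    vocab_size = len(counts) + 1  # one extra slot shared by all unseen words
    total = len(train_tokens)
    log_prob = 0.0
    for tok in test_tokens:
        p = (counts[tok] + 1) / (total + vocab_size)
        log_prob += math.log2(p)
    # Perplexity is 2 to the power of the average negative log2 probability.
    return 2 ** (-log_prob / len(test_tokens))


# Hypothetical toy data: "psi" (plural) is unseen although "pes" was trained on.
train = "pes běží a kočka běží".split()
test = "pes běží a psi běží".split()

print("OOV rate:", oov_rate(set(train), test))
print("Perplexity:", unigram_perplexity(train, test))
```

With a richer morphology, more surface forms of the same lemma fall outside a fixed vocabulary, which raises both numbers; this is the effect the abstract attributes to Czech.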
Language
English
Identifiers
ISSN: 1210-2512
Title ID: cdi_doaj_primary_oai_doaj_org_article_5c6fedb621fc4d7bb650d096bfe5b5e6
