Sie befinden Sich nicht im Netzwerk der Universität Paderborn. Der Zugriff auf elektronische Ressourcen ist gegebenenfalls nur via VPN oder Shibboleth (DFN-AAI) möglich. mehr Informationen...
Ergebnis 9 von 87

Details

Autor(en) / Beteiligte
Titel
Modelling multi-level prosody and spectral features using deep neural network for an automatic tonal and non-tonal pre-classification-based Indian language identification system
Ist Teil von
  • Language resources and evaluation, 2021-09, Vol.55 (3), p.689-730
Ort / Verlag
Dordrecht: Springer Netherlands
Erscheinungsjahr
2021
Quelle
Alma/SFX Local Collection
Beschreibungen/Notizen
  • In this paper an attempt has been made to prepare an automatic tonal and non-tonal pre-classification-based Indian language identification (LID) system using multi-level prosody and spectral features. Languages are first categorized into tonal and non-tonal groups, and then, from among the languages of the respective groups, individual languages are identified. The system uses syllable, word (tri-syllable) and phrase level (multi-word) prosody (collectively called multi-level prosody) along with spectral features, namely Mel-frequency cepstral coefficients (MFCCs), Mean Hilbert envelope coefficients (MHEC), and shifted delta cepstral coefficients of MFCCs and MHECs for the pre-classification task. Multi-level analysis of spectral features has also been proposed and the complementarity of the syllable, word and phrase level (spectral + prosody) has been examined for pre-classification-based LID task. Four different models, particularly, Gaussian Mixture Model (GMM)-Universal Background Model (UBM), Artificial Neural Network (ANN), i-vector based support vector machine (SVM) and Deep Neural Network (DNN) have been developed to identify the languages. Experiments have been carried out on National Institute of Technology Silchar language database (NITS-LD) and OGI Multi-language Telephone Speech corpus (OGI-MLTS). The experiments confirm that both prosody and (spectral + prosody) obtained from syllable-, word- and phrase-level carry complementary information for pre-classification-based LID task. At the pre-classification stage, DNN models based on multi-level (prosody + MFCC) features, coupled with score combination technique results in the lowest EER value of 9.6% for NITS-LD. For OGI-MLTS database, the lowest EER value of 10.2% is observed for multi-level (prosody + MHEC). The pre-classification module helps to improve the performance of baseline single-stage LID system by 3.2% and 4.2% for NITS-LD and OGI-MLTS database respectively.

Weiterführende Literatur

Empfehlungen zum selben Thema automatisch vorgeschlagen von bX