Circuits, systems, and signal processing, 2020-10, Vol.39 (10), p.5169-5197
2020
Details

Author(s) / Contributors
Title
Unsupervised Speech Signal-to-Symbol Transformation for Language Identification
Is part of
  • Circuits, systems, and signal processing, 2020-10, Vol.39 (10), p.5169-5197
Place / Publisher
New York: Springer US
Year of publication
2020
Source
Alma/SFX Local Collection
Descriptions/Notes
  • This paper presents a new approach for unsupervised segmentation and labeling of acoustically homogeneous segments in speech signals. The virtual labels thus obtained are used to build unsupervised acoustic models in the absence of manual transcriptions. We refer to this approach as unsupervised speech signal-to-symbol transformation. It involves three main steps: (i) segmenting the speech signal into acoustically homogeneous regions, (ii) assigning consistent labels to acoustic segments with similar characteristics, and (iii) iteratively modeling the acoustic segments that share the same label. This work focuses on improving the initial segmentation and the acoustic segment labeling. A new kernel-Gram matrix-based approach is proposed for segmentation. It determines the number of segments automatically and achieves performance comparable to state-of-the-art algorithms. Segment labeling is formulated in a graph clustering framework. Because graph clustering methods require extensive computational resources for large datasets, a new graph growing-based strategy is proposed to make the algorithm scalable. A two-stage iterative modeling procedure refines the segment boundaries and segment labels alternately. The proposed method achieves the highest normalized mutual information and purity on the TIMIT dataset. The quality of the virtual labels is assessed by building a language identification (LID) system for Indian languages, with a bigram language model built from these virtual phones. The LID system built using the virtual labels and the corresponding language model performs very close to a system trained using manual labels and to an i-vector-based LID system. The fusion of the unsupervised LID system scores from our approach and the i-vector approach outperforms the LID system built under the supervision of manual labels by a relative margin of 31.19%, demonstrating that unsupervised LID systems using virtual labels can be on par with supervised systems.
Language
English
Identifiers
ISSN: 0278-081X
eISSN: 1531-5878
DOI: 10.1007/s00034-020-01408-8
Titel-ID: cdi_proquest_journals_2435008457
