Unsupervised Speech Signal-to-Symbol Transformation for Language Identification

Circuits, systems, and signal processing, 2020-10, Vol.39 (10), p.5169-5197

2020

Details

Author(s) / Contributors

Title

Unsupervised Speech Signal-to-Symbol Transformation for Language Identification

Is part of

Circuits, systems, and signal processing, 2020-10, Vol.39 (10), p.5169-5197

Place / Publisher

New York: Springer US

Year of publication

2020

Source

Alma/SFX Local Collection

Descriptions/Notes

This paper presents a new approach for unsupervised segmentation and labeling of acoustically homogeneous segments in speech signals. The virtual labels thus obtained are used to build unsupervised acoustic models in the absence of manual transcriptions. We refer to this approach as unsupervised speech signal-to-symbol transformation. This approach mainly involves three steps: (i) segmenting the speech signal into acoustically homogeneous regions, (ii) assigning consistent labels to the acoustic segments with similar characteristics, and (iii) iterative modeling of the acoustic segments sharing the same label. This work focuses on improving initial segmentation and acoustic segment labeling. A new kernel-Gram matrix-based approach is proposed for segmentation. The number of segments is automatically determined using this approach, and performance comparable to the state-of-the-art algorithms is achieved. The segment labeling is formulated in a graph clustering framework. Graph clustering methods require extensive computational resources for large datasets. A new graph growing-based strategy is proposed to make the algorithm scalable. Two-stage iterative modeling is used to refine the segment boundaries and segment labels alternately. The proposed method achieves the highest normalized mutual information and purity on the TIMIT dataset. Quality assessment of the virtual labels is performed by building a language identification (LID) system for Indian languages. A bigram language model is built using these virtual phones. The LID system built using these virtual labels and the corresponding language model performs very close to the system trained using manual labels and to an i-vector-based LID system. The fusion of unsupervised LID system scores from our approach and the i-vector approach outperforms the LID system built under the supervision of manual labels by a relative margin of 31.19%, demonstrating that unsupervised LID systems using virtual labels can be on par with supervised systems.
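
The abstract reports clustering quality in terms of purity and normalized mutual information (NMI). As a rough illustration of what these two measures compute, here is a minimal sketch in plain Python. It is not the paper's evaluation code: the function names and the toy labels are hypothetical, and the NMI normalization used here (arithmetic mean of the two entropies) is only one common convention, since the record does not state which one the authors used.

```python
import math
from collections import Counter


def purity(true_labels, cluster_labels):
    """Fraction of items that belong to the majority true class of their cluster."""
    joint = Counter(zip(cluster_labels, true_labels))
    majority = {}
    for (cluster, _true), count in joint.items():
        majority[cluster] = max(majority.get(cluster, 0), count)
    return sum(majority.values()) / len(true_labels)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((k / n) * math.log(k / n) for k in Counter(labels).values())


def mutual_information(true_labels, cluster_labels):
    """Mutual information between the true labels and the cluster labels."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_labels))
    true_counts = Counter(true_labels)
    cluster_counts = Counter(cluster_labels)
    mi = 0.0
    for (t, c), k in joint.items():
        mi += (k / n) * math.log(k * n / (true_counts[t] * cluster_counts[c]))
    return mi


def nmi(true_labels, cluster_labels):
    """NMI normalized by the mean of the two entropies (an assumed convention)."""
    h_sum = entropy(true_labels) + entropy(cluster_labels)
    if h_sum == 0:
        return 1.0  # both labelings are constant, hence trivially identical
    return 2.0 * mutual_information(true_labels, cluster_labels) / h_sum


if __name__ == "__main__":
    # Hypothetical toy data: manual phone labels vs. virtual (cluster) labels.
    manual = ["a", "a", "b", "b", "c", "c"]
    virtual = [0, 0, 1, 1, 1, 2]
    print("purity:", purity(manual, virtual))
    print("NMI:", nmi(manual, virtual))
```

Both measures equal 1 when the virtual labels reproduce the manual labels up to renaming. Purity alone can be inflated by using many small clusters, which is one reason it is usually reported together with NMI.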

Language: English
Identifiers: ISSN: 0278-081X
eISSN: 1531-5878
DOI: 10.1007/s00034-020-01408-8
Title ID: cdi_proquest_journals_2435008457

Format: –
Keywords: Acoustics, Algorithms, Circuits and Systems, Clustering, Construction, Datasets, Electrical Engineering, Electronics and Microelectronics, Engineering, Instrumentation, Iterative methods, Labeling, Labels, Quality assessment, Segmentation, Segments, Signal, Image and Speech Processing, Speech, System effectiveness