ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, p.1-5
2023

Details

Author(s) / Contributors
Title
E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model
Place / Publisher
IEEE
Year of publication
2023
Source
IEEE Xplore
Description / Notes
  • We explore unifying a neural segmenter with two-pass cascaded encoder ASR into a single model. A key challenge is allowing the segmenter (which runs in real time, synchronously with the decoder) to finalize the non-causal 2nd pass (which runs 900 ms behind real time) without introducing user-perceived latency or deletion errors during inference. We propose a design where the neural segmenter is integrated with the causal 1st-pass decoder to emit an end-of-segment (EOS) signal in real time. The EOS signal is then used to finalize the non-causal 2nd pass. We experiment with different ways to finalize the 2nd pass, and find that a dummy frame injection strategy yields high-quality 2nd-pass results and low finalization latency at the same time. On a real-world long-form captioning task (YouTube), we achieve a 2.4% relative WER improvement and a 140 ms reduction in EOS latency compared with a baseline VAD-based segmenter using the same cascaded encoder.
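The abstract does not describe how dummy frame injection is implemented, so the following is only a toy sketch of the general idea, not the authors' method. It assumes a non-causal 2nd pass that needs a fixed right context (taken here as the quoted 900 ms), a hypothetical 30 ms frame hop, and all-zero dummy frames. When the 1st-pass segmenter emits EOS, the sketch pads the buffer with dummy frames so every real frame has its full right context and can be finalized immediately, instead of waiting 900 ms for real future audio. `ToySecondPass`, `toy_encode`, `push` and `on_eos` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List

Frame = List[float]

FRAME_MS = 30        # hypothetical frame hop, chosen for illustration only
LOOKAHEAD_MS = 900   # 2nd-pass delay quoted in the abstract
LOOKAHEAD = LOOKAHEAD_MS // FRAME_MS  # right-context frames needed per output (30)


def toy_encode(window: List[Frame]) -> float:
    """Hypothetical stand-in for the non-causal encoder: mean of feature 0 over the window."""
    return sum(f[0] for f in window) / len(window)


@dataclass
class ToySecondPass:
    """Output for frame i can only be computed once frames i..i+lookahead are buffered."""
    lookahead: int = LOOKAHEAD
    buffer: List[Frame] = field(default_factory=list)
    finalized: int = 0  # number of frames whose 2nd-pass output has been produced

    def _encode_ready(self, limit: int) -> List[float]:
        # Encode every frame below `limit` whose full right context is in the buffer.
        out = []
        while self.finalized < limit and self.finalized + self.lookahead < len(self.buffer):
            i = self.finalized
            out.append(toy_encode(self.buffer[i : i + self.lookahead + 1]))
            self.finalized += 1
        return out

    def push(self, frame: Frame) -> List[float]:
        """Streaming path: lags `lookahead` frames (900 ms) behind the newest frame."""
        self.buffer.append(frame)
        return self._encode_ready(limit=len(self.buffer))

    def on_eos(self) -> List[float]:
        """Called when the 1st-pass segmenter emits EOS: inject dummy frames and flush."""
        real_len = len(self.buffer)
        dim = len(self.buffer[0]) if self.buffer else 1
        # Assumed all-zero dummy frames stand in for the future audio the encoder would wait for.
        self.buffer.extend([[0.0] * dim for _ in range(self.lookahead)])
        out = self._encode_ready(limit=real_len)  # outputs only for real frames
        # Start a fresh segment; dummy frames never leak into the next one.
        self.buffer.clear()
        self.finalized = 0
        return out


if __name__ == "__main__":
    sp = ToySecondPass()
    streamed: List[float] = []
    for _ in range(100):
        streamed += sp.push([1.0, 1.0])
    flushed = sp.on_eos()
    print(len(streamed), len(flushed))  # 70 frames while streaming, 30 flushed at EOS
```

In this toy, 70 of 100 frames are finalized while streaming and the last 30 are flushed as soon as EOS arrives. The trade-off is that outputs near the segment end see dummy rather than real right context; the abstract reports that the injection strategy nonetheless preserved 2nd-pass quality.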
Language
English
Identifiers
eISSN: 2379-190X
DOI: 10.1109/ICASSP49357.2023.10095355
Title ID: cdi_ieee_primary_10095355
