
Details

Author(s) / Contributors
Title
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Part of
  • IEEE transactions on multimedia, 2024, Vol.26, p.6462-6474
Place / Publisher
Piscataway: IEEE
Year of publication
2024
Source
IEEE Electronic Library (IEL)
Descriptions/Notes
  • Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements. VSR is regarded as challenging because lip movements carry insufficient speech information. In this article, we propose an Audio Knowledge empowered Visual Speech Recognition framework (AKVSR) that uses the audio modality to complement the insufficient speech information of the visual modality. Unlike previous methods, the proposed AKVSR 1) utilizes rich audio knowledge encoded by a large-scale pretrained audio model, 2) stores the linguistic information of this audio knowledge in a compact audio memory by discarding non-linguistic information from the audio through quantization, and 3) includes an Audio Bridging Module that finds the best-matched audio features in the compact audio memory, which makes training possible without audio inputs once the compact audio memory has been composed. We validate the effectiveness of the proposed method through extensive experiments and achieve new state-of-the-art performance on the widely used LRS3 dataset.
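The abstract names two mechanisms (quantizing audio features into a compact memory, and retrieving best-matched entries from that memory for a visual input) without specifying how either is implemented. As a rough illustration only, the following Python sketch assumes plain k-means as the quantizer and scaled dot-product attention as the matching step. The functions `build_compact_memory` and `bridge` are hypothetical names, and the toy 2-D vectors stand in for real audio and visual features; this is not the authors' code or architecture.

```python
import math
from typing import List, Tuple

Vector = List[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_compact_memory(features: List[Vector], k: int, iters: int = 10) -> List[Vector]:
    """Hypothetical quantizer: compress audio features into k centroids via k-means.

    Many similar feature vectors collapse onto one centroid, which is one
    simple way to discard fine-grained variation and keep a compact memory.
    """
    centroids = [list(f) for f in features[:k]]  # deterministic init: first k features
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for f in features:
            # Assign each feature to its nearest centroid.
            idx = min(range(k), key=lambda j: _sq_dist(f, centroids[j]))
            clusters[idx].append(f)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bridge(query: Vector, memory: List[Vector]) -> Tuple[Vector, int]:
    """Hypothetical bridging step: attend from a visual query over the memory.

    Returns the attention-weighted mix of memory entries and the index of
    the single best-matching entry.
    """
    d = len(query)
    scores = [sum(q * m for q, m in zip(query, mem)) / math.sqrt(d) for mem in memory]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]  # numerically stable softmax
    total = sum(weights)
    weights = [w / total for w in weights]
    retrieved = [sum(w * mem[i] for w, mem in zip(weights, memory)) for i in range(d)]
    best = max(range(len(memory)), key=lambda j: scores[j])
    return retrieved, best
```

Usage: with toy "audio" features `[[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]]` and `k=2`, the memory converges to two centroids near `[0.05, 0.95]` and `[0.95, 0.05]`; a query such as `[0.0, 2.0]` then matches the first entry. Once the memory is built, the bridging step needs only the memory and the visual query, which mirrors the abstract's point that training can proceed without audio inputs.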
Language
English
Identifiers
ISSN: 1520-9210
eISSN: 1941-0077
DOI: 10.1109/TMM.2024.3352388
Title ID: cdi_crossref_primary_10_1109_TMM_2024_3352388
