Examining Speaker and Keyword Uniqueness: Partitioning Keyword Spotting Datasets for Federated Learning with the Largest Differencing Method
Is part of
Artificial Intelligence and Machine Learning, 2023, Vol.1805, p.167-177
Place / Publisher
Switzerland: Springer International Publishing AG
Year of publication
2023
Source
Alma/SFX Local Collection
Descriptions / Notes
Federated learning is a powerful training strategy for neural networks in which several independent clients train a model without having to share potentially sensitive data. However, real-world client-local data is usually biased: a single client might have access to only a few lighting conditions in computer vision, a few patient groups in a hospital, or a few speakers and keywords in a smart device performing keyword spotting. We help researchers better understand and estimate the expected performance impact by introducing a new method that partitions a given dataset into an arbitrary number of clients, each with unique properties, to simulate such conditions.
We apply the method to partition the Google Speech Commands dataset into clients with non-overlapping speakers and, additionally, unique keywords, and we share the script that creates the novel GSC-FL dataset. The results, obtained with convolutional neural networks, show that the performance of the final model is stable up to at least 16 clients and that federated learning clearly outperforms models trained only on local data. However, unique speakers per client reduce performance, and unique keywords reduce it further. With only minor adjustments, our script can be applied to partition any other dataset for federated learning investigations.
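The abstract does not reproduce the partitioning script, so the following is only a minimal sketch of the generic largest differencing method (Karmarkar-Karp heuristic) that the title refers to, not the authors' implementation. It assumes each speaker has a known number of recordings and assigns every speaker to exactly one of k clients while keeping the clients' sample counts roughly balanced. The function name `ldm_partition`, the speaker IDs, and the counts are hypothetical.

```python
import heapq
from itertools import count


def ldm_partition(weights, k):
    """Split items into k groups with similar weight sums (largest differencing method).

    weights: dict mapping an item (e.g. a speaker ID) to a non-negative weight
             (e.g. that speaker's number of recordings).
    Returns a list of k lists of items; each item appears in exactly one list.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    tie = count()  # unique tie-breaker so the heap never compares the bin lists
    heap = []
    for item, w in weights.items():
        # Every item starts as its own partial solution: one bin holding the
        # item and k - 1 empty bins, ordered by descending sum.
        bins = [(w, [item])] + [(0, []) for _ in range(k - 1)]
        spread = bins[0][0] - bins[-1][0]
        heapq.heappush(heap, (-spread, next(tie), bins))
    if not heap:
        return [[] for _ in range(k)]
    while len(heap) > 1:
        # Take the two partial solutions with the largest spread (max - min sum).
        _, _, a = heapq.heappop(heap)
        _, _, b = heapq.heappop(heap)
        # Pair the heaviest bin of one with the lightest bin of the other so
        # that their differences cancel as far as possible.
        merged = [(sa + sb, ia + ib) for (sa, ia), (sb, ib) in zip(a, reversed(b))]
        merged.sort(key=lambda t: t[0], reverse=True)
        spread = merged[0][0] - merged[-1][0]
        heapq.heappush(heap, (-spread, next(tie), merged))
    return [items for _, items in heap[0][2]]


# Hypothetical recordings per speaker, split across two clients.
recordings = {"spk_a": 8, "spk_b": 7, "spk_c": 6, "spk_d": 5, "spk_e": 4}
clients = ldm_partition(recordings, k=2)
client_sizes = [sum(recordings[s] for s in group) for group in clients]
```

Because whole speakers are moved between bins, the resulting clients never share a speaker; the same idea could be applied to keywords or to any other property that has to be unique per client. The heuristic is fast but not guaranteed to be optimal: for the five counts above it yields client sizes of 16 and 14, whereas a perfectly even 15/15 split exists.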