Machine Learning Performance at the Edge: When to Offload an Inference Task
Is part of
Proceedings of the 2nd Workshop on Networked Sensing Systems for a Sustainable Society, 2023, pp. 180-186
Place / Publisher
New York, NY, USA: ACM
Year of publication
2023
Source
ACM Digital Library
Descriptions/Notes
Machine Learning (ML) techniques play a crucial role in extracting valuable insights from the large amounts of data collected through networked sensing systems. Given the increased capabilities of user devices and the growing demand for inference in mobile sensing applications, we are witnessing a paradigm shift where inference is executed at the end devices instead of burdening the network and cloud infrastructures. This paper investigates the performance of inference execution at the network edge and at end devices, using both a full and a pruned model. While pruning reduces model size, making the model amenable to execution at an end device and decreasing the communication footprint, trade-offs in time complexity, potential accuracy loss, and energy consumption must be accounted for. We tackle such trade-offs through extensive experiments under various ML models, edge load conditions, and pruning factors. Our results show that executing a pruned model provides time and energy (on the device side) savings of up to 40% and 53%, respectively, relative to the full model. Also, executing inference at the end device may lead to 60% faster decision-making compared to inference execution at a heavily loaded edge.
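The offload-vs-local trade-off described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual decision procedure; all function and parameter names (`should_offload`, `edge_load_factor`, and the example latency values) are hypothetical, introduced only to show the kind of latency comparison the abstract implies: local inference with a pruned model wins whenever a loaded edge's compute time plus network round-trip exceeds on-device execution time.

```python
# Illustrative sketch only (not the paper's method): decide whether an
# inference task should be offloaded to an edge server or run locally
# on a pruned model. All names and values here are hypothetical.

def should_offload(local_time_ms: float,
                   edge_compute_ms: float,
                   network_rtt_ms: float,
                   edge_load_factor: float) -> bool:
    """Offload only if the (possibly loaded) edge beats on-device execution.

    edge_load_factor >= 1.0 models the slowdown of a busy edge server:
    1.0 means unloaded, 3.0 means compute takes three times as long.
    """
    edge_total_ms = network_rtt_ms + edge_compute_ms * edge_load_factor
    return edge_total_ms < local_time_ms

# Unloaded edge: 15 + 20*1.0 = 35 ms < 50 ms local, so offloading pays off.
print(should_offload(local_time_ms=50.0, edge_compute_ms=20.0,
                     network_rtt_ms=15.0, edge_load_factor=1.0))  # True

# Heavily loaded edge: 15 + 20*3.0 = 75 ms > 50 ms local, so the device
# keeps the task, consistent with the abstract's observation that local
# execution can be markedly faster than a highly loaded edge.
print(should_offload(local_time_ms=50.0, edge_compute_ms=20.0,
                     network_rtt_ms=15.0, edge_load_factor=3.0))  # False
```

The design point is that the break-even threshold shifts with edge load: the same task that is worth offloading to an idle edge is better kept on-device once the edge slowdown outweighs the network and compute savings.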