Sie befinden Sich nicht im Netzwerk der Universität Paderborn. Der Zugriff auf elektronische Ressourcen ist gegebenenfalls nur via VPN oder Shibboleth (DFN-AAI) möglich. mehr Informationen...
Channel-Wise Spatiotemporal Aggregation Technology for Face Video Forensics
Ist Teil von
Security and communication networks, 2021, Vol.2021, p.1-13
Ort / Verlag
London: Hindawi
Erscheinungsjahr
2021
Quelle
EZB Electronic Journals Library
Beschreibungen/Notizen
Recent progress in deep learning, in particular the generative models, makes it easier to synthesize sophisticated forged faces in videos, leading to severe threats on social media about personal privacy and reputation. It is therefore highly necessary to develop forensics approaches to distinguish those forged videos from the authentic. Existing works are absorbed in exploring frame-level cues but insufficient in leveraging affluent temporal information. Although some approaches identify forgeries from the perspective of motion inconsistency, there is so far not a promising spatiotemporal feature fusion strategy. Towards this end, we propose the Channel-Wise Spatiotemporal Aggregation (CWSA) module to fuse deep features of continuous video frames without any recurrent units. Our approach starts by cropping the face region with some background remained, which transforms the learning objective from manipulations to the difference between pristine and manipulated pixels. A deep convolutional neural network (CNN) with skip connections that are conducive to the preservation of detection-helpful low-level features is then utilized to extract frame-level features. The CWSA module finally makes the real or fake decision by aggregating deep features of the frame sequence. Evaluation against a list of large facial video manipulation benchmarks has illustrated its effectiveness. On all three datasets, FaceForensics++, Celeb-DF, and DeepFake Detection Challenge Preview, the proposed approach outperforms the state-of-the-art methods with significant advantages.