Where Do Deep Fakes Look? Synthetic Face Detection via Gaze Tracking

Recent advances in AI have brought not only useful applications but also detrimental ones, such as deep fakes. Various deep fake detection algorithms have been proposed. A novel approach focuses on the consistency of eyes and gaze.

According to its authors, it is the first study to base detection on holistic eye and gaze features rather than on a few hand-picked ones.

Face recognition – artistic interpretation in Hollywood CA. Image credit: YO! What Happened To Peace? via Flickr, CC BY-SA 2.0

Alongside single artifacts already in use, such as blinks and reflections, it includes signatures such as inconsistent gaze directions or eye symmetries. Features are analyzed in five domains: geometric (vergence points and gaze directions), visual (color and shape), temporal (the consistency of all signals), spectral (signal and noise correlation), and metric (spatial coherence).
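To make the geometric domain more concrete, here is a minimal sketch of one way a vergence point could be estimated: as the midpoint of the shortest segment between the two gaze rays. This is an illustration under stated assumptions, not the authors' implementation. The function name `vergence_point`, the coordinate convention, and the sample eye positions are all hypothetical, and the eye centers and gaze directions are assumed to come from some upstream gaze estimator.

```python
import math


def vergence_point(left_eye, left_gaze, right_eye, right_gaze, eps=1e-9):
    """Estimate where two gaze rays converge (illustrative helper, not from the paper).

    Each eye is a 3D point and each gaze is a 3D direction vector.
    Returns (midpoint, gap): the midpoint of the shortest segment between
    the two gaze lines and the length of that segment.
    Returns None when the gaze directions are (nearly) parallel.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w = [l - r for l, r in zip(left_eye, right_eye)]
    a = dot(left_gaze, left_gaze)
    b = dot(left_gaze, right_gaze)
    c = dot(right_gaze, right_gaze)
    d = dot(left_gaze, w)
    e = dot(right_gaze, w)

    denom = a * c - b * b  # zero when the two directions are parallel
    if denom <= eps * a * c:
        return None

    # Parameters of the closest points along each gaze line.
    t_left = (b * e - c * d) / denom
    t_right = (a * e - b * d) / denom

    p_left = [p + t_left * g for p, g in zip(left_eye, left_gaze)]
    p_right = [p + t_right * g for p, g in zip(right_eye, right_gaze)]

    midpoint = [(x + y) / 2 for x, y in zip(p_left, p_right)]
    gap = math.dist(p_left, p_right)
    return midpoint, gap


# Hypothetical example: eyes 6 units apart, both looking at (0, 0, 60).
result = vergence_point(
    left_eye=(-3.0, 0.0, 0.0), left_gaze=(3.0, 0.0, 60.0),
    right_eye=(3.0, 0.0, 0.0), right_gaze=(-3.0, 0.0, 60.0),
)
print(result)
```

In this sketch, a large `gap` or a vergence point that jumps erratically between frames would be the kind of geometric inconsistency the article describes. How the paper actually turns such measurements into signatures is not detailed here.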

Evaluation on publicly available datasets showed detection accuracy of up to 89.79% using only the proposed eye and gaze features. According to the authors, the system can be integrated into any existing fake detector.

Following the recent initiatives for the democratization of AI, deep fake generators have become increasingly popular and accessible, causing dystopian scenarios towards social erosion of trust. A particular domain, such as biological signals, attracted attention towards detection methods that are capable of exploiting authenticity signatures in real videos that are not yet faked by generative approaches. In this paper, we first propose several prominent eye and gaze features that deep fakes exhibit differently. Second, we compile those features into signatures and analyze and compare those of real and fake videos, formulating geometric, visual, metric, temporal, and spectral variations. Third, we generalize this formulation to the deep fake detection problem by a deep neural network, to classify any video in the wild as fake or real. We evaluate our approach on several deep fake datasets, achieving 89.79% accuracy on FaceForensics++, 80.0% on Deep Fakes (in the wild), and 88.35% on CelebDF datasets. We conduct ablation studies involving different features, architectures, sequence durations, and post-processing artifacts. Our analysis concludes with 6.29% improved accuracy over complex network architectures without the proposed gaze signatures.

Link: https://arxiv.org/abs/2101.01165

