This document describes Movee, a system for verifying the authenticity of videos taken with mobile devices. Movee analyzes both video and accelerometer data to determine if a video was genuinely captured on a mobile device or if it is fraudulent. It works by comparing motion detected in the video to motion detected from the device's accelerometer. The document outlines attacks Movee aims to detect, its methodology involving video motion analysis, inertial sensor analysis and similarity computation, experimental results showing high accuracy, and limitations and opportunities for future work.
Visual Verifications through Liveness Analysis using Mobile Devices
1. Seeing is Not Believing: Visual Verifications Through Liveness Analysis using Mobile Devices
Mahmudur Rahman
Umut Topkara, Bogdan Carbunar
ACSAC 2013
11. Other Attacks
Copy-Paste attack
Copy video from victim and upload
Replay attack
Point camera to video replay (e.g., on TV)
Upload result
Projection Attack
Move camera over a static image (poster, projection)
Upload result
12. Movee: How It Works
[Diagram: the camera feeds Video Motion Analysis and the accelerometer feeds Inertial Sensor Analysis; both feed Similarity Computation, whose features go to the Classifier]
13. Video Motion Analysis (VMA)
Compute the frame-by-frame displacement vector of all consecutive frames of the video
Apply phase correlation (an image processing technique), based on the Fourier shift property
[Figure: camera moves to the left]
15. Inertial Sensor Motion Analysis (IMA)
Use a combination of low-pass and high-pass filters to remove the effects of gravity
Two-step integration, on both X and Y axes:
First to obtain velocity
Second to retrieve position
Apply the trapezoidal rule for approximating the definite integral
[Figure: sensor coordinate system]
17. Similarity Computation (SC)
Compares the two motion sequences computed by the VMA and the IMA modules
Returns a set of features summarizing the nature of the similarity between the two sequences
Has three major steps:
Dynamic Time Warping (DTW)
Stretching
Calibration
18. Dynamic Time Warping (DTW)
Calculates an optimal match between two temporal sequences which may vary in time or speed
Input: Two sequences x1,x2,...,xn and y1,y2,...,ym.
Goal: Align the two sequences along a common time axis
[Figure: two series in different time phases; shifting of the time axis]
19. Stretching
Extend the lower sampling rate sequence (L) to the length of the higher sampling rate sequence (H)
Linear interpolation generates new points for the sequence L
20. Calibration
VMA cannot infer the distance of objects from the camera
Compute the calibration factor (CF), used to multiply all the points of the video stream
Truncated Mean
Polynomial Curve Fitting
21. Features Used by Classification
[Diagram: Video Motion Analysis and Inertial Sensor Analysis feed Similarity Computation (Stretching, Calibration, Penalized DTW), which passes Features to the Classifier]
Motion features (each listed twice in the diagram): Motion X axis, Motion Y axis, Motion direction
Remaining features: Calibration Factor, DTW cost, Normalized penalty cost, Ratio of diagonal points, Ratio of expansion points, Ratio of contraction points, Ratio of overlap points
22. Classification
Runs trained classifiers over all the features
Multi Layer Perceptron (MLP)
Random Forest (RF)
Decision Tree: C4.5 (J48)
Decide whether a video stream is genuine or fake
23. Dataset and Data Collection
Data collected from 10 users, 10 well-defined samples from each user: a total of 100 genuine samples
[Figure: genuine samples A and B; fake samples C and D. Panel captions: directions random (random dataset); directions same (direction sync dataset)]
24. Movee in Action
Client: Android
Server component: C++ and PHP; OpenCV library for VMA
Verification step considers the first 6s (inspired by Vine)
25. Experimental Setup
Client platform: Samsung Admire smartphone running Android OS Gingerbread 2.3 with an 800MHz CPU
Server platform: Dell laptop equipped with a 2.4GHz Intel Core i5 processor and 4GB of RAM
Classifier platform: Weka version 3.7.9
27. Experimental Results
[Figure: ROC curve on random dataset with MLP]
Receiver Operating Characteristic (ROC) curve: illustrates the performance of a binary classifier system
Equal Error Rate (EER): the rate at which accept and reject errors are equal
Lower EER denotes a more accurate solution
28. Attack Detection Analysis
Attack            Detection accuracy (%)   Comment
Copy-Paste        100                      no sensor stream exists
Replay            100                      no sensor stream exists
Projection        0                        human observer can detect immediately
Random movement   92                       MLP
Direction sync    84                       C4.5
30. Impact of SC steps on Movee accuracy
[Figures: random attack scenario; direction sync attack scenario]
Stretching and DTW steps contribute almost 12% accuracy for the random movement attack and 8% for the direction sync attack.
For the direction sync attack, the penalization step adds almost 11% for all three classifiers.
31. Limitations
• Still to be explored
• Very short videos (less than 6s)
• Videos shot in unusual circumstances.
• Doctored video and accelerometer streams
32. Conclusions and Future Work
Novel approach to verify the “liveness” of videos
claimed to have been captured on mobile devices
Our experiments on real user data show that Movee
achieves up to 92% accuracy
Perform user study to evaluate and demonstrate the
usability and efficacy of Movee
A variety of technologies, including not only smartphones but also new media technologies such as the recently emerging social networking and media-sharing sites (Facebook, YouTube, Vine), encourage users to upload their own videos and share them with friends and the rest of the world. But users want to ensure that the videos they captured have not been copied and claimed by others. This raises important questions for both general users and service providers: are the posted videos genuine? Can they be plagiarized?
Now we want to mention two major applications that motivated our work. The first one is citizen journalism. A lot of public events happen every day around the world. Concerned citizens who witness these events often use their smartphones to report breaking news and scenes of major public events more quickly than traditional media reporters, and these videos are often used by major media outlets. For example, CitizenTube is a subnetwork of YouTube dedicated to citizen reporters who report events, mostly political events and breaking news, happening around the world. Therefore the validity of videos taken by smartphones is very important.
The second application we want to talk about is the 311 service. Cities use crowdsourcing (mobile 311 apps) to learn about maintenance issues such as potholes, open manholes, and other hazards.
One issue is that users can easily introduce fake reports into the system. However, if we ask users to take a video of the scene, we can then verify the validity of the video, so the system doesn’t need to wait to receive multiple complaints.
Therefore, our goal is to verify that the visual stream captured on the mobile device has not been tampered with and hence can be trusted.
So what’s going on? The user needs to install an application, Movee, which we denote as the client.
When the user records a video, the Movee app captures data both from the camera and from other mobile device sensors (for instance, the accelerometer).
If the data is consistent, Movee uploads the video to the server.
Now let’s see how Movee works. Movee extracts video frames from the video stream and feeds them to the Video Motion Analysis module, which extracts movement information.
At the same time, Movee extracts accelerometer data from the inertial sensor and feeds it to the Inertial Sensor Analysis module, which also infers movement information.
Both pieces of information are fed into the Similarity Computation module, which measures how similar the two movements are. It also extracts several features.
All these features are fed into the Classification module, which determines whether the video stream is genuine or fake.
We assume that users can be malicious.
Their goal is to fraudulently claim ownership of videos that they did not create. How can they do it?
For this, they can employ a variety of attacks, which we describe in the following…
In the random movement attack, the attacker copies a video. He then moves the mobile device in a random direction to allow Movee to capture the sensor data.
He then uploads the video, claiming it to be his own.
The direction sync attack is more sophisticated than the random movement attack we just described. The attacker watches the copied video and attempts to emulate the camera movement using his mobile device.
He then uploads the video.
There are other, simpler attacks that we also considered. For the Copy-Paste attack, the attacker just copies the video from the victim and uploads it.
For the Replay attack, the attacker points the camera of his mobile device at a replay on a TV or projector screen, captures the video and uploads it.
So now let’s see how each individual module of Movee works.
We start with the Video Motion Analysis module
The Video Motion Analysis module infers the movement between the frames of the video.
Let’s look at the first figure. Suppose we move the camera to the left. How can we obtain the displacement between these two frames?
We apply an image processing technique called phase correlation.
It looks at consecutive frames and measures their displacement (on the x and y axes), that is, their linear shifts.
As a pre-processing step, it applies a Hamming window filter to each frame to eliminate noise (edge effects).
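The phase correlation step can be illustrated with a minimal sketch. This is not the Movee implementation (the server uses C++ and OpenCV); it is a pure-Python toy that assumes small square grayscale frames stored as nested lists, and the function names `dft2`, `apply_hamming` and `phase_correlation` are hypothetical. The demo uses a circular shift so the result is exact, which is why the optional Hamming window is defined but not applied to the synthetic frames.

```python
import cmath
import math
import random


def dft2(grid, inverse=False):
    """Naive 2D DFT of a small n x n grid (demo only, not an FFT)."""
    n = len(grid)
    sign = 1 if inverse else -1
    w = [[cmath.exp(sign * 2j * math.pi * a * b / n) for b in range(n)]
         for a in range(n)]
    # Transform along each row first, then along each column.
    rows = [[sum(grid[r][c] * w[c][k] for c in range(n)) for k in range(n)]
            for r in range(n)]
    out = [[sum(rows[r][k] * w[r][u] for r in range(n)) for k in range(n)]
           for u in range(n)]
    if inverse:
        out = [[v / (n * n) for v in row] for row in out]
    return out


def apply_hamming(frame):
    """Separable 2D Hamming window, used on real frames to damp edge effects."""
    n = len(frame)
    w = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    return [[frame[r][c] * w[r] * w[c] for c in range(n)] for r in range(n)]


def phase_correlation(prev, curr):
    """Return (dx, dy) such that curr is prev shifted by dx columns, dy rows."""
    n = len(prev)
    f1, f2 = dft2(prev), dft2(curr)
    cross = [[0j] * n for _ in range(n)]
    for u in range(n):
        for k in range(n):
            p = f2[u][k] * f1[u][k].conjugate()
            # Keep only the phase: by the Fourier shift property it encodes the shift.
            cross[u][k] = p / abs(p) if abs(p) > 1e-12 else 0j
    corr = dft2(cross, inverse=True)
    _, dy, dx = max((corr[r][c].real, r, c) for r in range(n) for c in range(n))
    # Peaks past the midpoint correspond to negative shifts.
    if dy > n // 2:
        dy -= n
    if dx > n // 2:
        dx -= n
    return dx, dy


# Synthetic demo: frame_b is frame_a moved 3 columns right and 2 rows down.
rng = random.Random(0)
n = 8
frame_a = [[rng.random() for _ in range(n)] for _ in range(n)]
frame_b = [[frame_a[(r - 2) % n][(c - 3) % n] for c in range(n)] for r in range(n)]
shift = phase_correlation(frame_a, frame_b)
```

Running this over every pair of consecutive frames would yield the per-frame displacement sequence on the x and y axes that the later modules consume.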
We now look at the inertial sensor analysis module.
An acceleration sensor measures the acceleration applied to the device, including the force of gravity
So we use ……
Our approach uses double integration: the first integration obtains velocity and the second retrieves position.
We apply the trapezoidal rule to approximate the integral.
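As a rough illustration of these steps, the sketch below removes gravity from one accelerometer axis and then integrates twice with the trapezoidal rule. It rests on assumptions not stated in the talk: gravity is estimated with a simple exponential low-pass filter and subtracted (which acts as a high-pass filter), the smoothing constant `alpha` is arbitrary, and the sampling interval `dt` is fixed. All function names are hypothetical.

```python
def remove_gravity(samples, alpha=0.8):
    """Estimate gravity with a low-pass filter and subtract it from each sample."""
    gravity = samples[0]  # assumption: the device starts roughly at rest
    linear = []
    for a in samples:
        gravity = alpha * gravity + (1 - alpha) * a  # low-pass: slow component
        linear.append(a - gravity)                   # high-pass: what remains
    return linear


def cumulative_trapezoid(values, dt):
    """Running integral of evenly spaced samples using the trapezoidal rule."""
    out = [0.0]
    for prev, curr in zip(values, values[1:]):
        out.append(out[-1] + 0.5 * (prev + curr) * dt)
    return out


def position_from_acceleration(accel, dt):
    """Two-step integration: acceleration -> velocity -> position."""
    velocity = cumulative_trapezoid(accel, dt)
    return cumulative_trapezoid(velocity, dt)


# Demo on one axis: constant 2 m/s^2 for 1 s, sampled every 0.1 s.
accel = [2.0] * 11
position = position_from_acceleration(accel, 0.1)

# A device at rest reports only gravity, which the filter removes.
at_rest = remove_gravity([9.81] * 20)
```

The same two functions would be applied independently to the X and Y axes. Double integration amplifies sensor noise, so in practice the result drifts over long recordings; a short verification window limits that effect.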
Now we have the movement data from both the video and the accelerometer. We now look at the Similarity Computation module and what it does.
……
It has three major steps. We will cover each step in the following slides.
DTW is a dynamic programming solution that calculates an optimal match between two temporal sequences, converting one sequence to the other.
Here is an example: we see two sequences in different time phases, and applying DTW aligns them with each other.
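A minimal sketch of classic DTW follows, to make the dynamic programming concrete. It is the textbook algorithm with absolute difference as the local cost, not the penalty-based variant used in Movee, whose details are not given here. The function name `dtw` and the sample sequences are illustrative only.

```python
def dtw(x, y):
    """Return (total cost, warping path) aligning sequences x and y."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i points of x with the first j of y
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # advance both sequences
                                 cost[i - 1][j],      # advance x only
                                 cost[i][j - 1])      # advance y only
    # Backtrack from the end to recover which points were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# Two series with the same shape; the second one starts one sample late.
series_a = [0, 1, 2, 3, 2, 1, 0]
series_b = [0, 0, 1, 2, 3, 2, 1, 0]
dtw_cost, dtw_path = dtw(series_a, series_b)
```

Because the second series is only a time-shifted copy of the first, the alignment absorbs the delay and the total cost is zero; sequences with genuinely different motion produce a larger cost.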
Stretching is another processing step. The sampling rates of the video and the accelerometer can be different, hence their sequence lengths are not the same.
For example, here are the video and accelerometer sequences after applying DTW alone, which does not show the alignment of the two motions. This is where stretching comes in.
It stretches the shorter sequence to the length of the longer sequence before the two are compared.
Now look: if we first apply stretching and then DTW, the two sequences are better aligned.
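Stretching by linear interpolation can be sketched as below. This is an illustrative version assuming evenly spaced samples in both sequences; the function name `stretch` is hypothetical.

```python
def stretch(seq, target_len):
    """Resample seq to target_len points using linear interpolation."""
    if len(seq) < 2 or target_len < 2:
        raise ValueError("need at least two points in and out")
    scale = (len(seq) - 1) / (target_len - 1)
    out = []
    for k in range(target_len):
        pos = k * scale                      # fractional index into seq
        lo = min(int(pos), len(seq) - 2)     # left neighbour, clamped at the end
        frac = pos - lo
        out.append(seq[lo] + frac * (seq[lo + 1] - seq[lo]))
    return out


# Three low-rate points stretched to the length of a five-point sequence.
stretched = stretch([0, 10, 20], 5)
```

The first and last points are preserved and the new points fall on the straight lines between the original neighbours, so the shape of the motion is kept while the lengths are equalized.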
The same motion pattern can be registered as faster when the objects in view are close to the camera, and slower when the objects are far.
Calibration compensates for this artifact.
It scales the video motion vector by a coefficient so that its speed matches that of the inertial sensor motion vector.
This is the previous plot, where we applied stretching and then DTW. The sequences are even better aligned when we first apply stretching and calibration and then DTW. This is definitely impressive.
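One way to picture calibration with a truncated mean is sketched below. The exact definition used in Movee is not given in the talk, so this is an assumption: the calibration factor is taken as the truncated mean of the point-wise ratios between the sensor and video sequences (already stretched to equal length), with the trim fraction `trim` chosen arbitrarily. All names are hypothetical.

```python
def truncated_mean(values, trim=0.1):
    """Mean after discarding the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def calibration_factor(video, sensor, trim=0.1, eps=1e-9):
    """Assumed definition: truncated mean of sensor/video ratios."""
    ratios = [s / v for v, s in zip(video, sensor) if abs(v) > eps]
    if not ratios:
        raise ValueError("video sequence has no usable (non-zero) points")
    return truncated_mean(ratios, trim)


def calibrate(video, cf):
    """Multiply every point of the video motion sequence by the factor."""
    return [cf * v for v in video]


# The sensor sees twice the video displacement, plus one outlier reading.
video_motion = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
sensor_motion = [2.0 * v for v in video_motion]
sensor_motion[4] = 500.0
cf = calibration_factor(video_motion, sensor_motion)
calibrated_video = calibrate(video_motion, cf)
```

Trimming the extremes keeps a single noisy reading from skewing the factor, which is the usual reason to prefer a truncated mean over a plain mean.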
//Best-performing calibration methods in our experiments:
Here we show all the features extracted by the first three modules of Movee, which are fed into the classification module.
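Several of these features are ratios over the DTW warping path. Their precise definitions are not given in the talk, so the sketch below is only one plausible reading: a step that advances both sequences is counted as diagonal, a step that advances only the second sequence as expansion, and a step that advances only the first as contraction. The function name `path_step_ratios` and the sample path are hypothetical.

```python
def path_step_ratios(path):
    """Fraction of diagonal, expansion and contraction steps in a warping path.

    `path` is a list of (i, j) index pairs, as produced by a DTW alignment.
    The step naming is an assumption, not the definition used by Movee.
    """
    steps = list(zip(path, path[1:]))
    if not steps:
        raise ValueError("path needs at least two points")
    counts = {"diagonal": 0, "expansion": 0, "contraction": 0}
    for (i0, j0), (i1, j1) in steps:
        di, dj = i1 - i0, j1 - j0
        if (di, dj) == (1, 1):
            counts["diagonal"] += 1      # both sequences advance together
        elif (di, dj) == (0, 1):
            counts["expansion"] += 1     # one point of the first is reused
        elif (di, dj) == (1, 0):
            counts["contraction"] += 1   # several points collapse onto one
        else:
            raise ValueError("not a valid warping path step")
    return {name: c / len(steps) for name, c in counts.items()}


sample_path = [(0, 0), (0, 1), (1, 2), (2, 3), (3, 3)]
ratios = path_step_ratios(sample_path)
```

Intuitively, a genuine sample should align mostly along the diagonal, while a mismatched video and sensor stream forces the alignment to stretch or compress one sequence.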
We tried several classifiers. These are the best-performing classifiers in our experiments.
Now we are moving forward. Let’s talk about our dataset. Data was collected…..
We needed to train the classifiers to detect a fake video based on the features extracted using Movee modules.
We tried to emulate the random movement attack and the more powerful direction sync attack. For this, we created a synthetic dataset for each of these attacks.
For the random dataset, each fake sample is created by coupling the video of one genuine sample with the sensor data of another, randomly chosen genuine sample.
For the direction sync dataset, the video and sensor data from two different genuine samples with the same direction are merged to create a fake sample.
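The construction of both synthetic datasets can be sketched as follows. The record layout (dictionaries with `video`, `sensor` and `direction` keys), the four toy samples and the function name `make_fake_samples` are assumptions made for illustration; the talk does not describe how samples are stored.

```python
import random


def make_fake_samples(genuine, same_direction=False, rng=None):
    """Pair each genuine video with the sensor data of a different genuine sample.

    With same_direction=True the donor must share the movement direction,
    which emulates the direction sync attack; otherwise any other sample
    may be chosen, which emulates the random movement attack.
    """
    rng = rng or random.Random()
    fakes = []
    for i, sample in enumerate(genuine):
        donors = [other for j, other in enumerate(genuine)
                  if j != i and (not same_direction
                                 or other["direction"] == sample["direction"])]
        if not donors:
            continue  # no suitable donor for this sample
        donor = rng.choice(donors)
        fakes.append({"video": sample["video"],
                      "sensor": donor["sensor"],
                      "direction": sample["direction"],
                      "label": "fake"})
    return fakes


# Toy genuine set; string ids stand in for the real video and sensor streams.
genuine = [{"video": f"v{i}", "sensor": f"s{i}", "direction": d}
           for i, d in enumerate(["left", "left", "right", "right"])]
random_fakes = make_fake_samples(genuine, rng=random.Random(1))
sync_fakes = make_fake_samples(genuine, same_direction=True, rng=random.Random(1))
```

Each fake sample keeps a real video and a real sensor stream, so the only thing the classifier can exploit is the mismatch between the two motions.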
We have implemented a Movee client using Android and a server component using C++ and PHP.
Once the user starts capturing the scene, he needs to follow the target icon (a bull’s eye). Once the center of the camera view is superimposed on the target, the target moves to a new place and the user needs to follow it again. Movee considers the first 6s for the verification, but the user can continue capturing the scene. We were inspired by Vine to choose a verification interval of 6s. Another advantage is that it keeps the size of the video small (150 KB on the Samsung Admire phone).
This is our experimental setup: the client runs on a Samsung Admire phone, the server on a Dell laptop, and the classifiers in Weka.
We explored the accuracy of Movee in detecting fraudulent samples, on both random and direction sync data sets.
For the random dataset, the multilayer perceptron (MLP) neural network provides the highest accuracy, 92%.
For the direction sync dataset, C4.5 exhibits the best performance with 84% accuracy. C4.5 looks like a good choice: it outperforms the other classifiers on the much smarter attack, while its accuracy on the random attack is still 90%.
We plotted the ROC curve for MLP on the random dataset. The EER value of MLP is as small as 0.08.
A lower EER denotes a more accurate solution.
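For readers unfamiliar with the metric, the sketch below shows one simple way to estimate an EER from classifier scores. It assumes each sample gets a score where higher means "more likely genuine"; the scores in the demo are made up and have nothing to do with the reported results, and the function name `equal_error_rate` is hypothetical.

```python
def equal_error_rate(genuine_scores, fake_scores):
    """Return (EER estimate, threshold) where accept and reject errors are closest."""
    thresholds = sorted(set(genuine_scores) | set(fake_scores))
    best = None
    for t in thresholds:
        # False reject rate: genuine samples scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False accept rate: fake samples scored at or above the threshold.
        far = sum(s >= t for s in fake_scores) / len(fake_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Made-up scores, for illustration only.
demo_genuine = [0.9, 0.8, 0.7, 0.6, 0.3]
demo_fake = [0.1, 0.2, 0.4, 0.5, 0.65]
eer, eer_threshold = equal_error_rate(demo_genuine, demo_fake)
```

On this toy data one genuine sample is rejected and one fake sample is accepted at the chosen threshold, so both error rates are one in five.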
Now we compare all the attacks we have mentioned before. It is straightforward to see that Movee prevents the “Copy-Paste” and “Replay” attacks as no …..
Movee does not detect the “Projection” attack, as the video and sensor streams are captured during the same hand movement by the user. However, a human observer can immediately detect that the video shows a poster.
Lastly, we measured the overhead of the Movee modules on the server.
VMA is the most time-consuming module, slightly exceeding 1s.
The IMA and Classification modules impose the smallest overheads, 110ms together.
The overhead of the SC module is around 150ms, with the smallest cost imposed by the stretching step and the highest by the penalty-based DTW.
Finally, we evaluated the impact of each step of the SC module on the accuracy of Movee, for both test datasets. For each dataset
Contribute the most
Now for the caveats of our approach. We have yet to explore videos shot in unusual circumstances involving very high accelerometer activity, e.g., while running or when the user is in a moving vehicle.
We also did not experiment with doctored video and accelerometer streams. This is a much harder attack: it requires investment from the attacker, who needs to recover the 3D trajectory of the camera movement from the video, create a corresponding accelerometer sample, and feed it to Movee.
In this work, we introduced a new solution……
Movee provides accuracy ranging from 84% to 92%.
We are currently performing user studies…