Guest Lecture: SenSec - Mobile Security through BehavioMetrics
1
Jiang Zhu
jiang.zhu@sv.cmu.edu
September 30th, 2013
Collaborators: Ben Draffin, Sean Wang, Pang Wu, Joy Ying Zhang*
* In alphabetical order
2
[Figure: bar charts for Tablets (y-axis 0-300) and Smartphones (y-axis 0-2500), North America vs. Global, series "2011" and "2011-2016". Tablets: 15.5, 58.3, 33.6, 256.7. Smartphones: 138, 253, 698, 2027. Annotations: Smartphone, Tablets, 3-8, 2-4, 2016.]
•  Cisco Visual Networking Index (VNI) 2012, Cisco Systems Inc., 2012
3
[Figure: Mobile Device Loss or Theft (survey results, y-axis 0%-60%)]
Source: Strategy One survey conducted among a U.S. sample of 3,017 adults aged 18 years or older, September 21-28, 2010, with an oversample in the top 20 cities (based on population).
•  “The 329 organizations polled had collectively lost more than 86,000 devices … with average cost of lost data at $49,246 per device, worth $2.1 billion or $6.4 million per organization.”
"The Billion Dollar Lost-Laptop Study," conducted by Intel Corporation and the Ponemon Institute, analyzed the scope and circumstances of missing laptop PCs.
4
Password: a major source of security vulnerabilities. Passwords are easy to guess, reused, forgotten, and shared.
Application: different applications may have different sensitivities.
Usability: authentication is required too often, or is sometimes too loose.
5
Passwords
Normal passwords are not strong enough: they are usually meaningful words that can be remembered.
Stringent strong-password rules can be annoying.
Most users do not use password-aid tools (Hong et al. 2009).
Fingerprint? Iris recognition? Face recognition? Voice recognition?
Password rules for the DHS E-file:
Contain from 8 to 16 characters
Contain at least 2 of the following 3 character types: uppercase alphabetic, lowercase alphabetic, numeric
Contain at least 1 special character (e.g., @, #, $, %, &, *, +, =)
Begin and end with an alphabetic character
Not contain spaces
Not contain all or part of your UserID
Not use 2 identical characters consecutively
Not be a recently used password
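The DHS E-file rules above are mechanical enough to check in code, which also shows how much a user has to juggle. The Python sketch below is illustrative only. `check_password` is a hypothetical helper. "Part of your UserID" is assumed to mean any UserID substring of at least 3 characters (case-insensitive). Any non-alphanumeric, non-whitespace character is assumed to count as special. The password history is a plain set here, whereas a real system would compare against stored salted hashes.

```python
def check_password(password: str, user_id: str, recent_passwords: set, min_part: int = 3) -> list:
    """Return the DHS E-file rules that `password` violates (empty list = passes).

    Hypothetical helper; `min_part` is an assumed reading of "part of your UserID".
    """
    violations = []
    if not 8 <= len(password) <= 16:
        violations.append("must contain 8 to 16 characters")
    # Count how many of the three character types are present
    types_present = sum([
        any(c.isupper() for c in password),
        any(c.islower() for c in password),
        any(c.isdigit() for c in password),
    ])
    if types_present < 2:
        violations.append("needs 2 of: uppercase, lowercase, numeric")
    # Assumption: any non-alphanumeric, non-whitespace character counts as special
    if not any(not c.isalnum() and not c.isspace() for c in password):
        violations.append("needs at least 1 special character")
    if not (password[:1].isalpha() and password[-1:].isalpha()):
        violations.append("must begin and end with a letter")
    if any(c.isspace() for c in password):
        violations.append("must not contain spaces")
    # Assumption: "part of the UserID" = any UserID substring of min_part chars, case-insensitive
    pw, uid = password.lower(), user_id.lower()
    k = min(min_part, len(uid))
    if uid and any(uid[i:i + k] in pw for i in range(len(uid) - k + 1)):
        violations.append("must not contain all or part of the UserID")
    if any(a == b for a, b in zip(password, password[1:])):
        violations.append("must not repeat a character consecutively")
    # Illustration only: a real system would compare against stored salted hashes
    if password in recent_passwords:
        violations.append("must not be a recently used password")
    return violations
```

Returning every violated rule instead of a single boolean lets a UI explain all failures at once. For example, `check_password("password", "jzhu", set())` reports three violations.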
6
•  Derived from
•  Behavioral: the way a human subject behaves
•  Biometrics: technologies and methods that measure and analyze biological characteristics of the human body
•  Fingerprints, retinal patterns, voice patterns
•  BehavioMetrics: measurable behavior used to recognize or verify
•  the identity of a human subject, or
•  certain behaviors of the subject
[Diagram: Behavioral + Biometrics → BehavioMetrics]
7
•  Mobile devices come with embedded sensors
•  Accelerometer, gyroscope, magnetometer
•  GPS receiver
•  WiFi, Bluetooth, NFC
•  Microphone, camera
•  Temperature and light sensors
•  “Clock” and “Calendar”
•  They can connect with other sensors
•  EEG, EMG, GSR
•  Mobile devices are connected to the Internet
•  Sensor data can be uploaded to the cloud
•  Information can be computed and viewed on the server side
•  Users carry the device almost all the time.
•  My phone “knows” where I am, what I am doing, and my future activities.
8
•  Network Factors
•  Personal Factors
•  Behavioral Factors
•  Application Factors
•  Accelerometer
•  activity, motion, hand trembling, driving style
•  sleeping pattern
•  inferred activity level, steps per day, estimated calories burned
•  Motion sensors, WiFi, Bluetooth
•  accurate indoor position and trace
•  GPS
•  outdoor location, geo-trace, commuting pattern
•  Microphone, camera
•  from background noise: activity, type of location
•  from voice: stress level, emotion
•  video/audio: additional context
•  Keyboard, touches, slides
•  specific tasks, user interactions, …
9
•  Monitor and track user behavior on smartphones using various on-device sensors
•  Convert sensory traces and other context information into Personal Behavior Features
•  Build a continuous n-gram model with these features and use it to calculate Sureness Scores
•  Trigger different Authentication Schemes when certain applications are launched
10
•  Human behavior/activities share some common properties with natural languages
•  Meanings are composed from the meanings of building blocks
•  There exists an underlying structure (grammar)
•  They are expressed as sequences (time series)
•  Apply the rich set of statistical NLP methods to mobile sensory data
[Figure: Zipf's Law, log(freq) (3 to 6) vs. rank of words by frequency (0 to 200)]
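Zipf's law states that a token's frequency is roughly inversely proportional to its rank, so log frequency falls linearly with log rank, with a slope near -1. The sketch below computes rank-frequency pairs for any token stream (words, or behavior labels) and fits that log-log slope by least squares. The function names are hypothetical, and base-10 logs are an assumption because the slide does not state the base.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, log10 frequency) pairs, most frequent token first."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return [(rank, math.log10(f)) for rank, f in enumerate(freqs, start=1)]


def zipf_slope(tokens):
    """Least-squares slope of log10(frequency) vs. log10(rank); about -1 under Zipf's law."""
    points = [(math.log10(rank), logf) for rank, logf in rank_frequency(tokens)]
    if len(points) < 2:
        raise ValueError("need at least two distinct tokens")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var
```

If a label stream derived from sensor data yields a slope close to -1, that is evidence for the language-like statistics the slide appeals to.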
11
[Diagram: Quantization, Clustering]
12
•  Generative language model: P(English sentence | model)
P(“President Obama has signed the Bill of … ” | Politics) >> P(“President Obama has signed the Bill of … ” | Sports)
An LM reflects the n-gram distribution of its training data: domain, genre, topics.
•  With labeled behavior text data, we can train an LM for each activity type (“walking”-LM, “running”-LM, …) and classify an activity as the type whose LM assigns it the highest probability
13
•  User activity at time t depends only on the last n-1 activities
•  Each activity in a sequence can therefore be predicted from the n-1 consecutive activities before it
•  Maximum Likelihood Estimation from training data by counting, where C(·) is the count in the training data:
$$P(l_i \mid l_{i-n+1}^{i-1}) = \frac{C(l_{i-n+1}^{i})}{C(l_{i-n+1}^{i-1})}$$
•  MLE assigns zero probability to unseen n-grams
Incorporate a smoothing function (Katz):
Discount probability for observed grams
Reserve probability for unseen grams
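The counting estimate and the "discount observed, reserve for unseen" idea can be illustrated with a bigram model (n = 2) over behavior labels. Full Katz back-off derives its discounts from Good-Turing estimates. The sketch below deliberately swaps in a simpler technique, interpolated absolute discounting, that follows the same principle. The class name, the fixed `discount`, and the add-one unigram fallback are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


class BigramLM:
    """Bigram model over behavior labels with interpolated absolute discounting.

    A simplified stand-in for Katz back-off: observed bigram counts are
    discounted by `discount`, and the freed mass is spread over a smoothed
    unigram distribution so unseen bigrams get non-zero probability.
    """

    def __init__(self, sequences, discount=0.5):
        if not 0 < discount <= 1:
            raise ValueError("discount must be in (0, 1]")
        self.d = discount
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.history = Counter()            # c(v): how often v precedes another label
        self.followers = defaultdict(set)   # distinct labels seen after v
        for seq in sequences:
            self.unigrams.update(seq)
            for v, w in zip(seq, seq[1:]):
                self.bigrams[(v, w)] += 1
                self.history[v] += 1
                self.followers[v].add(w)
        self.total = sum(self.unigrams.values())
        self.vocab_size = len(self.unigrams) + 1  # +1 slot shared by all unseen labels

    def mle(self, v, w):
        """Unsmoothed MLE c(v, w) / c(v): zero for any unseen bigram."""
        c_v = self.history[v]
        return self.bigrams[(v, w)] / c_v if c_v else 0.0

    def p_unigram(self, w):
        """Add-one smoothed unigram probability (never zero)."""
        return (self.unigrams[w] + 1) / (self.total + self.vocab_size)

    def prob(self, v, w):
        """Smoothed P(w | v); falls back to the unigram when v was never a history."""
        c_v = self.history[v]
        if c_v == 0:
            return self.p_unigram(w)
        discounted = max(self.bigrams[(v, w)] - self.d, 0) / c_v
        reserved = self.d * len(self.followers[v]) / c_v  # mass removed from observed bigrams
        return discounted + reserved * self.p_unigram(w)
```

For a given history `v`, `prob(v, w)` sums to 1 over the known labels plus one unseen slot, while `mle(v, w)` is exactly zero for a transition never seen in training.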
14
•  Long-distance dependency of words in sentences
• Trigrams for “I hit the tennis ball”: “I hit the”, “hit the tennis”, “the tennis ball”
• “I hit ball” is not captured
•  Future activities may depend on activities far in the past, while intermediate behavior has little relevance or influence
• Noise in the data sets: “ping-pong” effects in time series, interference, sampling errors, etc.
• Model size
15
•  Build BehavioMetrics models $P_0, P_1, \ldots, P_{M-1}$ for M classes
•  Genders, age groups, occupations
•  Behaviors, activities, actions
•  Health and mental status
•  For a new behavioral text string L, we calculate the probability that L was generated by model m
•  The classification problem is formulated as
$$P(L, m) = P(l_1, l_2, \ldots, l_N, m) = \prod_{i=1}^{N} P_m(l_i \mid l_{i-n+1}^{i-1})$$
$$\hat{u} = \arg\max_m P(L, m) = \arg\max_m \sum_{i=1}^{N} \log P_m(l_i \mid l_{i-n+1}^{i-1})$$
16
•  Is this play Shakespeare’s work?
•  Compare the play to Shakespeare’s known library of works
•  Track word and phrase patterns in the data
•  Calculate the probability of the unknown play U given all of Shakespeare’s known works {S}
•  Compare it with a threshold θ
•  Authentic work (a=1)
•  Fake, forgery, or plagiarism (a=0)
$$\hat{a} = \mathrm{sign}[P(U \mid \{S\}) > \theta]$$
17
•  A special binary classification problem
•  Given a normal BehavioMetrics model $P_n$, a new behavior text sequence L, and a threshold θ, calculate the likelihood that L was generated by $P_n$ and compare it with θ
•  If the outcome is -1, flag an anomaly alert
•  Variation caused by noise can be smoothed out statistically
•  Feedback is needed to handle false positives, which are usually caused by unseen behaviors or a sub-optimal threshold.
$$\hat{a}(L \mid n, \theta) = \mathrm{sign}[P(L, n) > \theta]$$
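A minimal sketch of this decision, under stated assumptions. The normal model $P_n$ is stood in for by an add-k bigram model. The score is the average log probability per transition, so sequences of different lengths share one threshold θ. That length normalization is an assumption, suggested by the averaged curve on the next slide. The `update` method doubles as the feedback path: after a confirmed false positive, the flagged sequence can be added to the training data. Class and function names are hypothetical.

```python
import math
from collections import Counter


class NormalModel:
    """Add-k smoothed bigram model of the owner's normal behavior text (a stand-in for P_n)."""

    def __init__(self, k=0.1):
        self.k = k
        self.bigrams, self.history, self.vocab = Counter(), Counter(), set()

    def update(self, labels):
        """Add a label sequence to the training data (also used as feedback after a false positive)."""
        self.vocab.update(labels)
        for prev, cur in zip(labels, labels[1:]):
            self.bigrams[(prev, cur)] += 1
            self.history[prev] += 1

    def prob(self, prev, cur):
        size = len(self.vocab) + 1  # one extra slot for unseen labels
        return (self.bigrams[(prev, cur)] + self.k) / (self.history[prev] + self.k * size)

    def score(self, labels):
        """Average log P_n(l_i | l_{i-1}) per transition; needs at least two labels."""
        pairs = list(zip(labels, labels[1:]))
        return sum(math.log(self.prob(p, c)) for p, c in pairs) / len(pairs)


def decide(model, labels, theta):
    """a_hat = sign[score > theta]: +1 means normal, -1 raises an anomaly alert."""
    return 1 if model.score(labels) > theta else -1
```

Feeding a false positive back through `update` raises the score of that behavior next time, which is one simple way to realize the feedback the slide calls for.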
18
[Figure: average log probability (0 to 0.8) vs. sliding window position, showing the log-probability curve with a low threshold and a high threshold; regions labeled A, B, C, D]
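The plot tracks an average log probability over a sliding window and compares it with two thresholds. The slide does not define how the two thresholds are used. The sketch below therefore assumes a three-zone reading: at or above the high threshold is normal, below the low threshold is an anomaly, and the band in between is uncertain. It keeps a running sum so each window costs O(1). The function names and the zone semantics are hypothetical.

```python
def window_averages(log_probs, window):
    """Average of per-label log probabilities over each full sliding window (running sum)."""
    if window <= 0 or window > len(log_probs):
        return []
    total = sum(log_probs[:window])
    averages = [total / window]
    for i in range(window, len(log_probs)):
        total += log_probs[i] - log_probs[i - window]  # slide the window by one label
        averages.append(total / window)
    return averages


def zone(avg, low, high):
    """Assumed three-way reading of the low/high thresholds."""
    if avg >= high:
        return "normal"
    if avg < low:
        return "anomaly"
    return "uncertain"


def label_windows(log_probs, window, low, high):
    """Zone for every window position, e.g. to drive an authentication prompt."""
    return [zone(a, low, high) for a in window_averages(log_probs, window)]
```

A two-threshold band like this lets the system defer a decision instead of prompting on every borderline dip.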
19
•  Convert feature vector series to label streams – dimension reduction
•  Step windows of an assigned length
A1 A2 A1 A4
G2 G5 G2 G2
W2 W1 W2
P1 P3 P6 P1
A2 G2G5 W1 P1P3 A1A4 G2 W1W2 P1
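One way to turn a feature-vector series into a label stream like the ones above is to split it into non-overlapping step windows, summarize each window, and map the summary to the nearest centroid of a codebook, such as one learned by clustering. The sketch assumes several details the slide does not state. The codebook is given, the window summary is the mean vector, and labels are formed as a sensor prefix plus centroid index (e.g. "A1" for accelerometer cluster 1). All names are hypothetical.

```python
import math


def step_windows(series, length):
    """Non-overlapping windows of `length` samples; a partial tail window is dropped."""
    return [series[i:i + length] for i in range(0, len(series) - length + 1, length)]


def mean_vector(window):
    """Element-wise mean of the feature vectors in one window."""
    return [sum(col) / len(window) for col in zip(*window)]


def quantize(series, length, codebook, prefix):
    """Label each window with the nearest codebook centroid, e.g. 'A1', 'A2', ..."""
    labels = []
    for window in step_windows(series, length):
        center = mean_vector(window)
        best = min(codebook, key=lambda idx: math.dist(center, codebook[idx]))
        labels.append(f"{prefix}{best}")
    return labels
```

Each sensor stream (accelerometer, GPS, WiFi, applications) would be quantized with its own codebook, reducing high-dimensional vectors to a short alphabet of behavior "words".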
20
21
[Diagram: SenSec components: Sensor Fusion and Segmentation; Quantization; Clustering; Activity Recognition; Risk Analysis Tree; Certainty of Risk; a comparison (<) against Application Sensitivity; Application Access Control]
22
[Diagram: pipeline stages Sensing, Preprocessing, Modeling, and Inference; components Feature Construction, Behavior Text Generation, N-gram Model, a Classifier for User Classification, and a Binary Classifier with a Threshold for User Authentication]
•  SenSec collects sensor data
• Motion sensors
• GPS and WiFi scanning
• In-use applications and their traffic patterns
•  SenSec modules build user behavior models
• Unsupervised activity segmentation, with the activity sequence modeled by a language model
• A Risk Analysis Tree (decision tree, DT) to detect anomalies
• The above are combined to estimate risk online as a certainty score
•  The Application Access Control module activates authentication based on the score and a customizable threshold.
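The access-control step can be sketched as a comparison between an online certainty score and a per-application threshold that encodes application sensitivity. The slides do not say how the language-model score and the risk-tree output are combined. The weighted blend below is a hypothetical fusion rule, both inputs are assumed to be scaled to [0, 1], and the example thresholds are made up for illustration.

```python
from dataclasses import dataclass


def certainty_score(lm_score, tree_risk, weight=0.5):
    """Hypothetical fusion: blend LM normality (0-1) with 1 - risk-tree anomaly risk (0-1)."""
    return weight * lm_score + (1 - weight) * (1 - tree_risk)


@dataclass
class AccessPolicy:
    """Per-application sensitivity thresholds (hypothetical values; higher = more sensitive)."""
    thresholds: dict
    default_threshold: float = 0.5

    def needs_authentication(self, app, certainty):
        """Trigger explicit authentication when certainty falls below the app's threshold."""
        return certainty < self.thresholds.get(app, self.default_threshold)
```

For example, with `AccessPolicy({"email": 0.6, "banking": 0.9})` and a certainty of 0.8, opening a banking app prompts for authentication while email opens directly.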
23
•  Accelerometer
• Features summarize the acceleration stream
• Calculated separately for each dimension [x, y, z, m]
• Meta features: Total Time, Window Size
•  GPS: location string from the Google Maps API, and mobility path
•  WiFi: SSIDs, RSSIs, and path
•  Applications: bitmap of well-known applications
•  Application traffic pattern: TCP/UDP traffic pattern vectors [remote host, port, rate]
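The accelerometer features can be sketched per window. The slide's exact statistics were in an image, so the choice of mean, population standard deviation, minimum, and maximum is an assumption. So is reading m as the magnitude sqrt(x² + y² + z²). The meta features follow the slide: total time covered by the window and its size in samples. `accel_features` is a hypothetical name.

```python
import math
import statistics


def accel_features(samples, timestamps):
    """Summarize one accelerometer window.

    samples: non-empty list of (x, y, z) tuples; timestamps: seconds, same length.
    """
    dims = {
        "x": [s[0] for s in samples],
        "y": [s[1] for s in samples],
        "z": [s[2] for s in samples],
        # Assumption: m is the magnitude of the acceleration vector
        "m": [math.sqrt(x * x + y * y + z * z) for x, y, z in samples],
    }
    features = {}
    for name, values in dims.items():  # computed separately for each dimension
        features[f"{name}_mean"] = statistics.fmean(values)
        features[f"{name}_std"] = statistics.pstdev(values)
        features[f"{name}_min"] = min(values)
        features[f"{name}_max"] = max(values)
    # Meta features named on the slide
    features["total_time"] = timestamps[-1] - timestamps[0]
    features["window_size"] = len(samples)
    return features
```

The magnitude channel is insensitive to how the phone is oriented in a pocket or hand, which is a common reason to add it alongside the raw axes.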
24
25
•  Offline data collection (for training and testing):
Pick up the device from a desk
Unlock the device using the correct slide pattern
Launch the Email app from the "Home Screen"
Type some text on the soft keyboard
Lock the device by pressing the "Power" button
Put the device back on the desk
26
27
28
• 71.3% true-positive rate with a 13.1% false-positive rate
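To make the reported rates concrete, the sketch below computes them from labeled trials. It assumes the convention implied by the next slide: a positive means the system flagged an anomaly and triggered authentication, so a false positive is the owner being challenged. TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The function name is hypothetical.

```python
def tpr_fpr(y_true, y_pred):
    """1 = anomaly flagged / authentication triggered, 0 = owner's normal use.

    Returns (true-positive rate, false-positive rate).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)
```

Under this convention, a 13.1% FPR means roughly one in eight of the owner's own sessions is challenged, which is why the next slide treats false positives as a usability problem.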
29
•  Alpha test in June 2012; first Google Play Store release in October 2012
•  False positives: a 13% FPR still sometimes annoys users
Possible Solutions
•  Use an adaptive model
•  Add the trace data from shortly before a false positive to the training data and update the model
•  Change passcode validation to a slide pattern
•  After a false positive, grant a “free ride” for a configurable duration
•  Assumption: a user who has just authenticated should remain in control of the device for a given period of time
•  The free-ride period ends immediately if an abrupt context change is detected.
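The free-ride rule above is a small state machine. In this sketch, a successful explicit authentication starts a timer of configurable length. Behavior-based checks are skipped while the timer runs, and an abrupt context change cancels it at once. What counts as "abrupt" is left to the caller, and the class name is hypothetical. The current time is passed in explicitly (for example from `time.monotonic()`) so the logic is easy to test.

```python
class FreeRide:
    """Hypothetical free-ride timer for SenSec-style access control."""

    def __init__(self, duration_s):
        self.duration_s = duration_s  # configurable free-ride length in seconds
        self.expires_at = None        # no free ride until someone authenticates

    def grant(self, now):
        """Start a free ride after the user passes an explicit authentication."""
        self.expires_at = now + self.duration_s

    def on_context_change(self, abrupt):
        """End the free ride immediately on an abrupt context change."""
        if abrupt:
            self.expires_at = None

    def active(self, now):
        """True while behavior-based checks should be skipped."""
        return self.expires_at is not None and now < self.expires_at
```

Passing `now` in rather than reading the clock inside the class keeps the timer deterministic in tests and lets the app choose a monotonic clock.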
30
31
•  Hypothesis: a user's micro-behavior when interacting with the soft keyboard reflects his or her cognitive and physical characteristics.
Cognitive fingerprints: typing rhythm, correction rate, delay between keys, duration at each key, …
Physical characteristics: area of pressure, amount of pressure, position of contact, shift, …
32
33
•  When a key is pressed, the lift-up position drifts away from the touch-down position.
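Several of the keyboard features named above can be derived from raw touch events. This sketch assumes each key press records a touch-down and a lift-up time and position. It computes dwell time (duration at each key), flight time (delay since the previous key was released), and drift (lift-up position minus touch-down position). The `KeyPress` record and field names are hypothetical, not an Android API.

```python
import math
from dataclasses import dataclass


@dataclass
class KeyPress:
    key: str
    down_time: float  # seconds
    up_time: float
    down_pos: tuple   # (x, y) in pixels at touch-down
    up_pos: tuple     # (x, y) in pixels at lift-up


def keystroke_features(presses):
    """Per-press dwell, flight (None for the first press), and drift features."""
    features = []
    for i, p in enumerate(presses):
        dx = p.up_pos[0] - p.down_pos[0]
        dy = p.up_pos[1] - p.down_pos[1]
        features.append({
            "key": p.key,
            "dwell": p.up_time - p.down_time,
            "flight": p.down_time - presses[i - 1].up_time if i > 0 else None,
            "drift_x": dx,
            "drift_y": dy,
            "drift": math.hypot(dx, dy),  # distance between touch-down and lift-up
        })
    return features
```

Keeping the signed drift components, not just the distance, preserves the direction of the shift, which may differ between users' thumbs and hands.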
34
35
•  A discriminative model can identify a user with 99% accuracy from just one keypress:
•  when all users' behavior is known
•  models trained on 4,000 keys from each of 4 users
•  A generative model detects unauthorized use by an unknown user:
•  only the authorized user's behavior is known
•  after 15 key presses, the detection rate is 86%, with a False Acceptance Rate (FAR) of 14% and a False Rejection Rate (FRR) of only 2.2%
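A generative detector needs only the authorized user's data. The sketch below swaps in a deliberately simple model that fits independent Gaussians to each keypress feature of the owner. It scores the last 15 presses by their average log-likelihood and rejects the session when that average falls below a threshold. The Gaussian model, the `min_std` floor, and the threshold value are all illustrative assumptions. This is not the model used in the study.

```python
import math
import statistics


class OwnerModel:
    """Independent-Gaussian model of the owner's per-keypress feature vectors (illustrative)."""

    def __init__(self, vectors, min_std=1e-3):
        columns = list(zip(*vectors))
        self.means = [statistics.fmean(c) for c in columns]
        # Floor the spread so a near-constant feature cannot dominate the score
        self.stds = [max(statistics.pstdev(c), min_std) for c in columns]

    def log_likelihood(self, vector):
        """Sum of per-feature Gaussian log densities."""
        return sum(
            -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))
            for x, mu, sd in zip(vector, self.means, self.stds)
        )


def is_owner(model, recent, threshold, n=15):
    """Average log-likelihood over the last n key presses; False suggests unauthorized use."""
    window = recent[-n:]
    if not window:
        raise ValueError("need at least one key press")
    avg = sum(model.log_likelihood(v) for v in window) / len(window)
    return avg >= threshold
```

Averaging over several presses trades detection latency for fewer false rejections. Raising the threshold catches more intruders (lower FAR) at the cost of challenging the owner more often (higher FRR).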
36
•  Experiments detect anomalous usage with ~80% accuracy using only a few days of training data
[Diagram: SenSec components: Sensor Fusion and Segmentation; Quantization; Clustering; Activity Recognition; Risk Analysis Tree; Certainty of Risk; a comparison (<) against Application Sensitivity; Application Access Control]
37
•  Extended data set for feature construction:
TCP and UDP traffic; sound; ambient lighting; battery status, etc.
•  Data and modeling:
Gain more insight into the data, the features, and the factorized relationships among various sensors
Try other classification methods and compare results: LR, SVM, Random Forest, etc.
•  Enhanced security of SenSec components:
Integration with the Android security framework and other applications
•  Privacy as expectation (Liu et al., 2012):
Users need to know where the data resides and how it is going to be used and shared. Whom should they trust with the data?
•  Energy efficiency
38
•  Participate in the MobiSens and StressSens data collection experiments: http://mlt.sv.cmu.edu:3000/
•  Sign up as a SenSec 2.0 or KeySens 1.0 beta tester
Thank you.

