Behavior-Based Authentication System
Based on Smartphone Life-Logs Data
School of Electronics & Computer Engineering
Chonnam National University
Priagung Khusumanegara
Advisor : Professor Deok Jai Choi
Contents
⌘ Introduction
⌘ Contributions
⌘ General Steps of Our Work
⌘ Data Collection
⌘ Used Data
⌘ Preprocessing
⌘ Data Cleansing
⌘ Data Transformations
⌘ Feature Extraction
⌘ Feature Normalization
⌘ Feature Selection
⌘ User Behavior Model
⌘ Similarity Measure
⌘ Experimental Results
⌘ Conclusions
⌘ Future Work
Introduction
⌘ Smartphones are used not only for common telecommunication, such as
calling and texting, but also for online activities such as sending and
receiving email, internet banking, and social media.
⌘ A smartphone may store sensitive information, such as credit card
numbers, personal passwords, and mobile banking data.
⌘ Therefore, it is important to develop a user authentication system
that protects the smartphone from illegitimate users.
Introduction (Cont’d)
Current authentication methods can be classified into three types:
Figure: Authentication Methods
Introduction (Cont’d)
⌘ Behavior-based authentication is an authentication method that takes
into account the way a user interacts with his or her smartphone.
⌘ With behavior-based authentication:
1. We do not need to remember a password or PIN
2. We do not need to worry about credentials being lost or stolen
3. We can check the legitimacy of the user after login
Introduction (Cont’d)
Related Works:
1. Implicit Authentication for Mobile Devices [2009]
2. Implicit Authentication Through Learning User Behavior [2011]
3. Multi-sensor Authentication to Improve Smartphone Security
[2015]
4. Continuous Touchscreen Mobile Authentication Using Several
Gestures [2016]
Contributions
1. We utilized a rich life-logs dataset of smartphone users collected
from multiple sensors and smartphone databases.
2. We extracted several key features to represent smartphone users’
behavior, covering two aspects: 1) user behavior and 2)
environment.
3. We selected a subset of features that are relevant to identifying
smartphone user behavior.
4. We built a user behavior model that characterizes the smartphone
user’s behavior pattern.
General Steps of Our Work
Data Collection
⌘ 47 students participated in our study; each student was equipped
with an Android smartphone running a life-logs data collection
application for about 2 months.
⌘ Our life-logs data collection application for Android smartphones
was built on the Funf Open Sensing Framework.
⌘ The Funf Open Sensing Framework is an open-source, extensible
sensing framework for Android smartphones.
Used Data
⌘ The collected smartphone life-logs data of the 47 participants are stored
in a 4.25 GB archive file. After extracting the archive, we obtained 47
folders with different names, each containing the smartphone life-logs
data of one participant.
⌘ The data used to identify smartphone user behavior must meet the
following three requirements:
1. Be in good condition
2. Be sufficient for behavior analysis
3. Be correlated with human behavior (based on previous studies)
Used Data
Be in good condition
Based on the condition of the data:
➢ One folder contains no data (empty).
➢ Four folders contain malformed smartphone life-logs data.
Used Data
Sufficient for behavior analysis
⌘ Five participants have less than one month of collected data; they
are discarded because their data are insufficient for smartphone user
behavior analysis.
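The first two requirements can be read as a per-participant filter. The sketch below is illustrative only: `ParticipantFolder`, its fields, and `select_usable` are hypothetical names, and "one month" is assumed to mean 30 days. The third requirement concerns probes rather than participants, so it is not part of this filter.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ParticipantFolder:
    # Hypothetical summary of one extracted participant folder.
    name: str
    n_files: int          # 0 means the folder is empty
    malformed: bool       # True if the life-logs data could not be parsed
    collection_days: int  # length of the data collection period, in days

MIN_DAYS = 30  # assumption: "one month" taken as 30 days

def is_usable(p: ParticipantFolder, min_days: int = MIN_DAYS) -> bool:
    """Apply requirements 1 and 2 to a single participant."""
    in_good_condition = p.n_files > 0 and not p.malformed  # requirement 1
    long_enough = p.collection_days >= min_days           # requirement 2
    return in_good_condition and long_enough

def select_usable(folders: List[ParticipantFolder]) -> List[ParticipantFolder]:
    """Keep only participants whose data pass both checks."""
    return [p for p in folders if is_usable(p)]
```

With one empty folder, four malformed folders, and five short collection periods, the filter keeps 47 − 1 − 4 − 5 = 37 participants, which matches the 37 participants used in the experimental setup.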
Used Data
Correlated with behavior (based on previous studies)
➢ Our life-logs data collection application uses 19 kinds of probes to
collect specific information.
➢ In this study, we use only the probes that previous studies have found
to correlate with user behavior.
Data Preprocessing
⌘ Data Cleansing
⌘ It “cleans” the smartphone life-logs data by removing duplicate and
outlier values.
⌘ Data Transformation
⌘ It transforms the data from *.db files to CSV files.
⌘ Loading all of the life-logs data of the 37 participants (14.22 GB) at
the same time consumes a large share of our computer’s resources, such
as processor time and RAM.
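A minimal sketch of the two preprocessing steps, assuming the *.db files are SQLite databases and that outliers are removed with Tukey's interquartile-range (IQR) fences. The slides name neither the database engine nor the outlier rule, and `export_table_to_csv` and `remove_outliers_iqr` are hypothetical helpers. Fetching rows in batches is one way to avoid pulling an entire table into Python's memory at once.

```python
import csv
import sqlite3
import statistics
from typing import List, Sequence

def export_table_to_csv(db_path: str, table: str, csv_path: str,
                        batch_size: int = 10_000) -> int:
    """Copy one table of a SQLite .db file into a CSV file.

    SELECT DISTINCT lets SQLite drop exact duplicate rows, and rows are
    fetched in batches so the table is not held in Python memory all at once.
    Returns the number of data rows written.
    """
    quoted = '"' + table.replace('"', '""') + '"'  # quote the identifier safely
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(f"SELECT DISTINCT * FROM {quoted}")
        header = [col[0] for col in cur.description]
        written = 0
        with open(csv_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            while True:
                batch = cur.fetchmany(batch_size)
                if not batch:
                    break
                writer.writerows(batch)
                written += len(batch)
        return written
    finally:
        conn.close()

def remove_outliers_iqr(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Keep values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]
```

For example, `remove_outliers_iqr([1, 2, 3, 4, 5, 100])` drops the isolated value 100 and keeps the rest.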
Feature Extraction
⌘ The characteristics of smartphone users are reflected in their behavior
and environment.
E.g.: different numbers of communications, different environments,
different activities, etc.
⌘ Therefore, we classify the extracted features into two categories:
behavior features and environment features.
Feature Extraction (Cont’d)
● Behavior features are features derived from the user’s behavior.
● Environment features are features derived from the user’s
environment.
Feature Normalization
Feature Extraction
Behavior Feature
(27 features)
Environment Feature
(2 features)
Feature Selection
⌘ The random forest algorithm works as a large collection of decision
trees.
⌘ It is based on the bagging technique, which combines multiple learning
models, each trained on a bootstrap sample of the data, to increase
classification accuracy.
⌘ Reasons we chose the random forest technique:
1. Very accurate
2. Rarely overfits
3. Can handle mixed data (continuous / categorical)
4. Naturally multivariate
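The slides score features with a random forest but do not state which importance measure is used. The sketch below assumes scikit-learn's impurity-based `feature_importances_`, one row per user-day in `X`, and the user ID as the label `y`. `rank_features` and `select_top_k` are hypothetical helper names, and the number of features kept (`k`) is a free parameter, not a value from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rank_features(X, y, feature_names, n_estimators=200, random_state=0):
    """Score features with a random forest and sort them, most important first.

    X: array of shape (n_samples, n_features), e.g. one row per user-day.
    y: user label for each row.
    """
    forest = RandomForestClassifier(n_estimators=n_estimators,
                                    random_state=random_state)
    forest.fit(X, y)
    scores = forest.feature_importances_  # impurity-based importances, sum to 1
    order = np.argsort(scores)[::-1]      # indices sorted by descending score
    return [(feature_names[i], float(scores[i])) for i in order]

def select_top_k(ranked, k):
    """Keep the names of the k highest-scoring features."""
    return [name for name, _ in ranked[:k]]
```

Because each tree sees a bootstrap sample and a random subset of features at each split, the averaged importances are less sensitive to any single tree than a lone decision tree's would be.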
Feature Selection
Figure: Results of Random Forest Scoring Technique
User Behavior Model
● A user behavior model is formed by observing the user’s behavior
pattern.
● We focus on building a user behavior model that characterizes the
user’s behavior pattern so that it can be used to find differences
among users.
E.g.: how frequently the smartphone user makes phone calls, sends
messages, charges the phone, etc.
User Behavior Model
X    Frequency    Probability Mass Function
1    1            1/5 = 0.2
2    2            2/5 = 0.4
3    1            1/5 = 0.2
4    1            1/5 = 0.2
Total frequency: 5
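The table above is an empirical probability mass function, P(X = x) = frequency(x) / total frequency. A minimal sketch of that computation; `pmf` is a hypothetical helper, and the observations stand for one day's values of a single feature X.

```python
from collections import Counter
from typing import Dict, Iterable

def pmf(observations: Iterable[float]) -> Dict[float, float]:
    """Empirical PMF: P(X = x) = count(x) / total count, keyed by sorted x."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {x: c / total for x, c in sorted(counts.items())}
```

`pmf([1, 2, 2, 3, 4])` reproduces the table: `{1: 0.2, 2: 0.4, 3: 0.2, 4: 0.2}`.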
User Behavior Model
User Behavior Model
● The problem with our model is that the length of the probability mass
function may differ from day to day.
● To overcome this problem, we use bins so that the probability mass
function has the same length for every day.
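Binning fixes the vector length: every day is mapped onto the same set of bins, so two days with different observed values still yield PMFs of equal length. A minimal sketch, assuming half-open bins [edges[i], edges[i+1]) and clamping of out-of-range values into the first or last bin; the bin edges and the helper name `binned_pmf` are hypothetical, since the slides do not give the bin layout.

```python
import bisect
from typing import List, Sequence

def binned_pmf(observations: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fixed-length PMF over bins [edges[i], edges[i+1]).

    Every day gets a vector of len(edges) - 1 probabilities, whatever values
    occurred that day. Values below edges[0] are counted in the first bin and
    values at or above edges[-1] in the last bin.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for x in observations:
        i = bisect.bisect_right(edges, x) - 1
        i = min(max(i, 0), n_bins - 1)  # clamp out-of-range values
        counts[i] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins  # no events that day
    return [c / total for c in counts]
```

With edges `[0, 2, 4, 6]`, a day with values `[1, 2, 2, 3, 4]` gives `[0.2, 0.6, 0.2]` and a day with the single value `5` gives `[0.0, 0.0, 1.0]`: both have length 3.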
User Behavior Model
User Behavior Model
Figure: User behavior model (User 1 and User 2, Day 1 and Day 2)
Similarity Measure
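The Conclusion reports the best EER when using Mahalanobis distance, but the slides do not spell out the formula. The following is only a minimal sketch of the standard Mahalanobis distance, d(x) = sqrt((x − μ)ᵀ S⁻¹ (x − μ)), between one verification day's feature vector x and a user's enrolment profile. Taking μ and S as the mean and covariance of the enrolment days, and using a pseudo-inverse in case S is singular, are assumptions; `mahalanobis_distance` is a hypothetical helper.

```python
import numpy as np

def mahalanobis_distance(x, enrolment):
    """Distance of one day's feature vector from a user's enrolment profile.

    enrolment: array of shape (n_days, n_features), one row per enrolment day.
    """
    enrolment = np.asarray(enrolment, dtype=float)
    x = np.asarray(x, dtype=float)
    mu = enrolment.mean(axis=0)                           # profile mean
    cov = np.atleast_2d(np.cov(enrolment, rowvar=False))  # feature covariance
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse tolerates singular covariance
    diff = x - mu
    return float(np.sqrt(max(0.0, diff @ cov_inv @ diff)))
```

A smaller distance means the day looks more like the enrolled user; when S is the identity matrix, the measure reduces to Euclidean distance.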
Experimental Setup
We used the smartphone life-logs data of 37 students (men and women)
collected over 42 days (6 weeks).
We divided our dataset into two parts:
Enrolment data: data collected from day 1 to day 21 for each user
Verification data: data collected from day 22 to day 42 for each user
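The enrolment/verification protocol amounts to a per-user split by day number. A minimal sketch, assuming each user's data is held as a hypothetical mapping from day number (1-based) to that day's feature vector; `split_enrolment_verification` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple

def split_enrolment_verification(
    daily_vectors: Dict[int, Sequence[float]],
    enrol_last_day: int = 21,
    last_day: int = 42,
) -> Tuple[List[Sequence[float]], List[Sequence[float]]]:
    """Split one user's per-day vectors into enrolment (days 1-21)
    and verification (days 22-42), each in day order."""
    days = sorted(daily_vectors)
    enrolment = [daily_vectors[d] for d in days if 1 <= d <= enrol_last_day]
    verification = [daily_vectors[d] for d in days
                    if enrol_last_day < d <= last_day]
    return enrolment, verification
```

Splitting by time rather than at random keeps every verification day later than the enrolment period, as it would be for a real user after enrolment.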
Experimental Setup
Experimental Results
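Performance is summarized as an Equal Error Rate (EER), the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR). A minimal sketch, assuming lower distance means more similar and that a day is accepted when its distance is at or below a threshold. Sweeping the threshold over the observed scores is one common way to approximate the EER, not necessarily the authors' procedure; `equal_error_rate` is a hypothetical helper.

```python
from typing import Sequence, Tuple

def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) for distance scores (lower = more similar).

    A sample is accepted when its distance is <= threshold, so
    FAR = share of impostor distances accepted and
    FRR = share of genuine distances rejected.
    The threshold where |FAR - FRR| is smallest is chosen, and the EER is
    reported as the mean of FAR and FRR at that threshold.
    """
    best_gap, best_eer, best_t = float("inf"), 1.0, float("nan")
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(d <= t for d in impostor) / len(impostor)
        frr = sum(d > t for d in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer, best_t = gap, (far + frr) / 2, t
    return best_eer, best_t
```

For genuine distances `[0.1, 0.2, 0.3, 0.9]` and impostor distances `[0.25, 0.8, 0.85, 0.95]`, the sweep finds FAR = FRR = 0.25 at threshold 0.3, so the EER is 25%.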
Conclusion
⌘ We collected smartphone life-logs data from 47 students over a continuous period of
about two months.
⌘ We extracted several key features to represent smartphone users’ behavior, covering
two aspects: 1) user behavior and 2) environment, and then selected a subset of
these features relevant to smartphone user identification.
⌘ We built a user behavior model that characterizes the user’s behavior patterns to
create a user profile.
⌘ Our approach achieves its best performance, an Equal Error Rate (EER) of 7.05%,
using Mahalanobis distance.
⌘ The low EER indicates that our behavior-based authentication provides good
post-login security.
Future Work
● Window size: our user behavior model uses one day as the window
size. In the future, we plan to experiment with other window sizes,
such as two or three days, to analyze the influence of window size
on our user behavior model.
● In our experiment, we compared days within the same week; in the
future, we plan to compare the same day across different weeks.
