HACETTEPE UNIVERSITY 
Department : Computer Engineering 
Course : BIL 656 (Advanced Computer and Network Security)
Supervisor : Asst. Prof. Sevil Sen
Student : Ulvi Ismayilov (ID: N14124839)
Date : 04/11/2014
A Machine-Learning Approach for 
Classifying and Categorizing Android 
Sources and Sinks 
Technical University of Darmstadt 
Authors : 
Steven Arzt (EC SPRIDE) 
Siegfried Rasthofer (EC SPRIDE) 
Eric Bodden (EC SPRIDE)
CONTENTS : 
1. Introduction 
2. Definition of Sources and Sinks 
3. Example of a source-to-sink connection (leak)
4. Classification Approach
5. Evaluation (Tradeoffs)
6. Related Work 
7. Conclusion
1.1 Motivation 
Why do we need a machine-learning approach
for identifying sources and sinks?
• Information-flow tools require specifications of
sources and sinks.
• Analysis approaches often use a small, hand-selected
set of sources and sinks known from the
literature.
• The lists of sources and sinks known from the literature are
incomplete, so many data leaks go undetected.
• Manual identification of sources and sinks is
impractical (over 110,000 public methods in Android 4.2).
1.2 What is SuSi?
An automated, machine-learning-guided approach for
identifying sources and sinks directly from the code of
the Android API.
Features:
• SuSi analyzes not only the framework API methods but also the code of
pre-installed applications.
• Cross-validation shows over 92% precision and recall.
• SuSi does not rely on permission lists to detect sources and sinks.
• SuSi is an open-source project, available at:
https://github.com/secure-software-engineering/SuSi
• SuSi can detect sources and sinks in new, previously unseen
Android versions.
Main Goal:
Fully automated generation of a categorized list of
sources and sinks for Android applications.
(The list can be used directly by existing static and dynamic analysis approaches.)
2.1 Definition of Sources and Sinks
• Two concepts must be introduced before defining
sources and sinks:
1) Data: a value or a reference to a value.
2) Resource method: a method that reads data from or writes data to a shared
resource.
• One restriction applies: if the method's values (return value,
parameter values) are constant, the resource
method is neither a source nor a sink.

                  Example 1                   Example 2
Resource          the phone’s hardware        GSM network
Data              IMEI (as numerical value)   Message (as string value)
Resource method   getDeviceId()               sendTextMessage()
                  in TelephonyManager class   in SmsManager class
Source            getDeviceId()               -
Sink              -                           sendTextMessage()
2.2 Definition of Sources and Sinks
• Android Source:
Sources are calls into resource methods returning
non-constant values into the application code.
Ex: getLac() returns the Location Area Code,
which is not a constant value.
• Android Sink:
Sinks are calls into resource methods accepting at least
one non-constant data value from the application code
as a parameter, if and only if a new value is written to
the resource or an existing one is overwritten.
Ex: sendTextMessage(a, b) receives two non-constant
parameters:
a) the message text  b) the phone number
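The two definitions above can be read as simple predicates. The following is a minimal sketch, assuming each API method has already been described by hand with three boolean flags; the `ResourceMethod` class, its field names, and the `hypotheticalConstantGetter` entry are illustrative assumptions and are not part of SuSi.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceMethod:
    """Hypothetical hand-written description of one resource method."""
    name: str
    returns_non_constant: bool  # returns non-constant data to the app
    accepts_non_constant: bool  # takes at least one non-constant parameter
    writes_resource: bool       # writes or overwrites a value on the resource


def is_source(method: ResourceMethod) -> bool:
    # Source: returns a non-constant value into the application code.
    return method.returns_non_constant


def is_sink(method: ResourceMethod) -> bool:
    # Sink: accepts non-constant data AND writes it to the resource.
    return method.accepts_non_constant and method.writes_resource


def roles(method: ResourceMethod) -> set:
    """Collect every role the definitions assign to the method."""
    found = set()
    if is_source(method):
        found.add("source")
    if is_sink(method):
        found.add("sink")
    return found  # an empty set means "neither"


# Flags follow the two examples on the slide; the third entry is made up.
get_device_id = ResourceMethod("getDeviceId", True, False, False)
send_text_message = ResourceMethod("sendTextMessage", False, True, True)
constant_only = ResourceMethod("hypotheticalConstantGetter", False, False, False)

for m in (get_device_id, send_text_message, constant_only):
    print(m.name, sorted(roles(m)) or ["neither"])
```

In SuSi these flags are not given by hand; deciding them automatically is exactly what the classifier is for.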
3.1 Example of a source-to-sink
connection
• The example creates a publicly
accessible file on the phone’s internal
storage, which arbitrary other
applications can access without
requiring any permissions.
• The code uses a resource method
that SuSi identifies as a “FILE” sink
but which is normally hidden from the
SDK.
3.2 Example of a source-to-sink
connection
Line 12:
The code checks for a specific, well-known
cell-tower ID in Berlin (the check returns
true or false).
Line 14:
Converts the needed data to a string
(this value is the taint).
Line 15:
Creates a shared location from which an
attacker can easily reach the private
data.
Line 17:
The code uses a little-known Android
system function instead of Java’s
normal writing functions.
4.1 Classification Approach
• Android resource methods are classified
in two steps:
• Identification
SuSi decides whether a method is a source, a sink, or
neither.
• Categorization
SuSi assigns the sources and sinks
identified in the first step to specific
categories.
Note: Methods identified as neither
sink nor source are ignored in the second
step.
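The two steps can be sketched as a small pipeline. This is only an illustration of the control flow: the `identify` and `categorize` functions below are hypothetical keyword rules standing in for SuSi's trained classifiers, and the keyword table is made up for the example.

```python
def identify(method_name):
    """Step 1 stand-in: label a method 'source', 'sink', or 'neither'."""
    if method_name.startswith("get"):
        return "source"
    if method_name.startswith(("send", "write")):
        return "sink"
    return "neither"


# Hypothetical keyword hints; sources and sinks have separate category sets.
CATEGORY_HINTS = {
    "source": {"device": "Unique-identifier"},
    "sink": {"textmessage": "SMS/MMS"},
}


def categorize(method_name, kind):
    """Step 2 stand-in: map an identified source or sink to a category."""
    lowered = method_name.lower()
    for keyword, category in CATEGORY_HINTS[kind].items():
        if keyword in lowered:
            return category
    return "No-category"


def classify(method_names):
    results = {}
    for name in method_names:
        kind = identify(name)
        if kind == "neither":
            continue  # methods that are neither are ignored in step 2
        results[name] = (kind, categorize(name, kind))
    return results


result = classify(["getDeviceId", "sendTextMessage", "toString"])
print(result)
```

The point of the sketch is the filtering between the steps: only methods that survive identification reach categorization.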
4.2 Simple machine-learning
explanation
As shown in Table I, there
are three features (inputs):
1) Driving experience:
negatively correlated with
accident rate
2) Blood alcohol level:
positively correlated with
accident rate
3) Driver’s phone number:
completely unrelated
Note:
The impact of a single feature
on the overall estimate is
deduced from its value
distribution over the annotated
training set.
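To make the three relationships concrete, the sketch below computes a Pearson correlation between each feature and the accident label. The six driver records are invented for illustration and are not the contents of Table I, and using a correlation coefficient is an assumption chosen for simplicity; it is not how an SVM weighs its features.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical annotated training set:
# (years of experience, blood alcohol level, last phone digit, accident?)
drivers = [
    (1, 0.9, 3, 1),
    (2, 0.7, 8, 1),
    (3, 0.8, 5, 1),
    (10, 0.0, 8, 0),
    (12, 0.1, 3, 0),
    (15, 0.0, 5, 0),
]

accident = [d[3] for d in drivers]
corr_experience = pearson([d[0] for d in drivers], accident)
corr_alcohol = pearson([d[1] for d in drivers], accident)
corr_phone = pearson([d[2] for d in drivers], accident)

print("experience:", round(corr_experience, 2))
print("alcohol:   ", round(corr_alcohol, 2))
print("phone:     ", round(corr_phone, 2))
```

On this toy data, experience comes out negative, alcohol positive, and the phone digit carries no signal, which mirrors the three cases listed above.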
4.3 Support Vector Machines
Tested approaches:
• A simple rule-based classifier
Problem: In some cases the classifier would effectively pick at random, since
both "accident: yes" and "accident: no" are equally likely.
• A probabilistic classifier (Naive Bayes)
Problem: Gives very imprecise results, because our classification is almost
rule-based and has fixed semantics.
• Pruned C4.5 decision tree
Problem: The rule set lacks flexibility.
• Support Vector Machine (SMO in Weka)
Chosen for the implementation: it usually, though not always, gives the best results;
the classification can be expressed more appropriately by shifting the hyperplane used for
separation.
4.3 Support Vector Machines
• An SVM is a supervised learning model used to train
a classifier.
• The main principle is to represent the datasets of two
classes (in our scenario, “sink” and “source”) as
vectors in a vector space.
• If the data is not linearly separable, the problem can be
transformed into a higher-dimensional space (you
may picture this as adding dimensions to each data point).
• SMO can only separate two classes.
However, SuSi has three classes in the
first step (sink, source, neither).
Solution: the one-against-all technique is applied.
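The one-against-all idea can be sketched as follows: one binary scorer is kept per class ("this class versus everything else"), and the class whose scorer is most confident wins. The linear scorers and their weights below are hand-picked assumptions for illustration; they are not trained, and they are not the models SuSi obtains from Weka's SMO.

```python
def linear_scorer(weights, bias):
    """Build a binary decision function: a higher score means 'more likely mine'."""
    def score(features):
        return sum(w * x for w, x in zip(weights, features)) + bias
    return score


def one_against_all(scorers, features):
    """Pick the class whose 'class vs. rest' scorer gives the highest score."""
    return max(scorers, key=lambda label: scorers[label](features))


# Feature vector layout (hypothetical):
# [name starts with "get", return type is void, has parameters]
scorers = {
    "source": linear_scorer([2.0, -2.0, -0.5], 0.0),
    "sink": linear_scorer([-1.5, 1.5, 1.5], -1.0),
    "neither": linear_scorer([-0.5, 0.0, -0.5], 0.3),
}

getter_like = [1, 0, 0]   # e.g. a parameterless getX() method
writer_like = [0, 1, 1]   # e.g. a void method taking parameters
plain = [0, 0, 0]

print(one_against_all(scorers, getter_like))
print(one_against_all(scorers, writer_like))
print(one_against_all(scorers, plain))
```

With real SVMs, each scorer would be the signed distance to that class's separating hyperplane, but the combination step stays the same.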
4.4 SuSi’s overall architecture 
Training dataset << Test dataset 
Identification 0.7% training and 99.3% test data 
Categorization0.4% training and 99.6% test data 
• No-category concept and adding a new category
4.5 Categories
• 12 different source categories:
1. Account 2. Bluetooth 3. Browser 4. Calendar
5. Contact 6. Database 7. File 8. Network 9. NFC
10. Settings 11. Sync 12. Unique-identifier
• 15 different sink categories:
1. Account 2. Audio 3. Browser 4. Calendar
5. Contact 6. File 7. Log 8. Network 9. NFC 10. Phone-connection
11. Phone-state 12. SMS/MMS 13. Sync
14. System 15. VoIP
4.6 Feature Database
Feature classes and the classifier output each one hints at
(Source, Sink, or Neither Source nor Sink):
Method Name
  Source: starts with “get”
Method has Parameters
  Source: fewer parameters
  Sink: more parameters
Method Return Value Type
  Source: returns a cursor
  Sink: void return type
Method Parameter Type
  Source: specific types, e.g. java.io.*
  Sink: specific types, e.g. java.io.*
Method Parameter is an Interface
  Neither: performs no actual operation on the data itself
Method Modifiers
  Source: public methods
  Sink: public methods
  Neither: static methods
Class Modifiers
  Neither: methods declared in protected classes
Dataflow to Sink
  Sink: a method parameter flows into a call to another specific method, e.g. update()
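A few of the syntactic feature classes can be computed directly from a method signature. The sketch below is an assumption-laden illustration: the `MethodSignature` class and the feature names are invented here, and `sendTextMessage` is given the simplified two-parameter form used on the earlier slide rather than its full Android signature.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MethodSignature:
    """Hypothetical, simplified view of one API method."""
    name: str
    return_type: str
    param_types: Tuple[str, ...]
    modifiers: Tuple[str, ...]


def extract_features(sig: MethodSignature) -> dict:
    """Turn a signature into boolean features a classifier could consume."""
    return {
        "name_starts_with_get": sig.name.startswith("get"),
        "has_parameters": len(sig.param_types) > 0,
        "returns_void": sig.return_type == "void",
        "has_java_io_parameter": any(
            t.startswith("java.io.") for t in sig.param_types
        ),
        "is_public": "public" in sig.modifiers,
        "is_static": "static" in sig.modifiers,
    }


get_device_id = MethodSignature(
    "getDeviceId", "java.lang.String", (), ("public",)
)
send_text_message = MethodSignature(
    "sendTextMessage", "void",
    ("java.lang.String", "java.lang.String"),  # simplified, as on the slide
    ("public",),
)

source_features = extract_features(get_device_id)
sink_features = extract_features(send_text_message)
print(source_features)
print(sink_features)
```

Each boolean becomes one dimension of the vector handed to the classifier; no single feature decides the class on its own.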
4.7 Dataflow features
• It becomes apparent that semantic features are much more
suitable for identifying sources and sinks than for categorizing them.
• At the source-code level, Android’s sources and sinks share
common patterns that the dataflow features can exploit.
After initialization, a fixed-point iteration is run
with the following rules:
When the first source-to-sink connection is found, the iteration is
aborted and the feature returns “True”. If the dataflow analysis completes
without finding any source-to-sink connection, the feature returns
“False”.
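The termination behaviour described above can be sketched with a toy taint propagation. The statement format (`"assign"` and `"call"` tuples), the propagation rule, and all variable names are assumptions made for this example; SuSi's real analysis works on Android bytecode and its exact rules are not reproduced here.

```python
def dataflow_feature(statements, initial_taint, sink_calls):
    """Return True as soon as tainted data reaches a sink-like call.

    statements: list of ("assign", target, operand) or ("call", callee, arg)
    initial_taint: variables treated as tainted at the start
    sink_calls: names of methods treated as sink-like
    """
    tainted = set(initial_taint)
    changed = True
    while changed:  # iterate until no new variable becomes tainted
        changed = False
        for stmt in statements:
            if stmt[0] == "assign":
                _, target, operand = stmt
                if operand in tainted and target not in tainted:
                    tainted.add(target)
                    changed = True
            elif stmt[0] == "call":
                _, callee, arg = stmt
                if callee in sink_calls and arg in tainted:
                    return True  # first connection found: abort
    return False  # fixed point reached without any connection


# The call appears first on purpose, so several passes are needed.
leaky = [
    ("call", "update", "c"),
    ("assign", "c", "b"),
    ("assign", "b", "param0"),
]
harmless = [
    ("assign", "b", "const"),
    ("call", "update", "b"),
]

print(dataflow_feature(leaky, {"param0"}, {"update"}))
print(dataflow_feature(harmless, {"param0"}, {"update"}))
```

Because the taint set only grows and is bounded by the number of variables, the loop always terminates.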
5.1 Evaluation (Cross
validation)
• Precision: the fraction of correctly classified elements in a
class among all elements that were assigned to that
class.
• Recall: the fraction of correctly classified elements in a
class among all elements that should have been assigned to that
class.
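Both measures can be computed per class from a list of true labels and a list of predicted labels. The labels below are a made-up five-method example, not results from the SuSi evaluation.

```python
def precision_recall(truth, predicted, cls):
    """Per-class precision and recall for parallel label lists."""
    pairs = list(zip(truth, predicted))
    correct = sum(1 for t, p in pairs if t == cls and p == cls)
    assigned = sum(1 for _, p in pairs if p == cls)   # predicted as cls
    expected = sum(1 for t, _ in pairs if t == cls)   # truly cls
    precision = correct / assigned if assigned else 0.0
    recall = correct / expected if expected else 0.0
    return precision, recall


# Hypothetical annotations and classifier output for five methods.
truth = ["source", "source", "sink", "neither", "sink"]
predicted = ["source", "sink", "sink", "neither", "sink"]

for cls in ("source", "sink", "neither"):
    p, r = precision_recall(truth, predicted, cls)
    print(f"{cls}: precision={p:.2f} recall={r:.2f}")
```

Here one true source is misclassified as a sink, which lowers the recall of "source" and the precision of "sink" while leaving the other two values at 1.0.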
Interestingly, the average
precision and recall are almost the
same with and without the
permission feature.
Implicit annotations for virtual
dispatch: a generic machine-learning
tool has no knowledge
of the language semantics of
Java.
SuSi was evaluated on the extended
test set using the implicit
annotations and again achieved more
than 92% precision.
5.2 Sources and Sinks in Malware
Apps
• Tested 11,000 malware apps from VirusShare and
found that current malware is leaking more
private information.
• A second example is LeakMiner. It creates its own
source and sink list from a permission map. But
SuSi found additional, less well-known
methods that do not need a permission.
Ex: getSimOperatorName(), getCountry(),
getSimCountryIso()
• SuSi found that there are plenty of wrapper
methods in internal Android classes or pre-installed
apps that return privacy-sensitive information, such
as the IMEI.
5.3 Changes across different Android
versions
From the figure we can
clearly deduce that new
sources are introduced
with every version.
The results show that
SuSi detects the changes
between API versions
very well.
SuSi reliably finds new
sources and sinks that
were added to the Android
platform.
However, newly detected
sources and sinks that
SuSi cannot categorize
must be categorized by
hand (by creating a new
category).
5.4 Source and Sink lists used by other analysis
tools
Analysis tool    Method for identifying the source list
LeakMiner        Permission map
CHEX             Semi-automatic approach (not public)
ScanDal          Not provided
AndroidLeaks     Not provided
Aurasium         Intercepts calls in system-level libraries (Linux and Android)
TaintDroid       Like Aurasium, but at a lower level inside the system
SCanDroid        Not public, but the source and sink specifications were
                 extracted from the source code, and the resulting
                 list is fully covered by SuSi’s output.
5.5 Source and Sink lists used by other analysis 
tools
5.6 Disadvantages of SuSi
• If only a few methods fall into a specific category, the
precision of categorization for that category decreases.
(Ex: the BLUETOOTH category has just a few methods among the
110,000 Android API methods.)
• Many developers of the Android framework do in fact follow a
certain regular coding style or duplicate parts of a
method’s implementation. These aspects lead to
regularity and redundancy in the code base,
which is why a machine-learning approach can take
advantage of it.
But what if a developer does not follow a regular coding style?
• There are callback methods (which receive data from the operating
system); SuSi cannot detect these methods as sources
or sinks.
(e.g. onNmeaReceived() instead of onLocationChanged())
6.1 Related Work
• MERLIN
• Probabilistic approach
• Uses incomplete specifications (of sources and
sinks)
• Targets string-related vulnerabilities (cross-site scripting or
SQL injection)
• Needs information about the client or application
• Fits a web-application scenario, whereas SuSi focuses on
privacy-related aspects of Android, where data is
usually not of type string
6.2 Related Work
• Machine learning used for security:
1) Automatic spam detection
2) Anomaly detection (network traffic)
3) MCA (Multiple Correspondence
Analysis)
Identifies malware from different markets
Difference between MCA and SuSi:
SuSi works on independent and discrete classes,
whereas MCA requires a logical ordering of records.
Conclusion
Future aims for improving the project:
1) Apply it to other platforms (J2EE, PHP, C++, etc.)
2) Automated detection of sensitive callbacks
Thanks for your attention

HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 

Susi

  • 1. HACETTEPE UNIVERSITY. Department: Computer Engineering. Course: BIL 656 (Advanced Computer and Network Security). Supervisor: Asst. Prof. Sevil Sen. Student: Ulvi Ismayilov (ID: N14124839). Date: 04/11/2014
  • 2. A Machine-Learning Approach for Classifying and Categorizing Android Sources and Sinks. Technical University of Darmstadt. Authors: Steven Arzt (EC SPRIDE), Siegfried Rasthofer (EC SPRIDE), Eric Bodden (EC SPRIDE)
  • 3. CONTENTS: 1. Introduction 2. Definition of Sources and Sinks 3. Example of a source-sink connection (leak) 4. Classification Approach 5. Evaluation (Tradeoffs) 6. Related Work 7. Conclusion
  • 4. 1.1 Motivation
    Why do we need a machine-learning approach for identifying sources and sinks?
    - Information-flow tools require specifications of sources and sinks.
    - Analysis approaches often use a small, hand-selected set of sources and sinks known from the literature.
    - These lists are incomplete, so analyses that rely on them miss many data leaks.
    - Identifying sources and sinks manually is impractical: Android 4.2 has over 110,000 public methods.
  • 5. 1.2 What is SuSi?
    An automated, machine-learning-guided approach for identifying sources and sinks directly from the code of the Android API.
    Features:
    - SuSi analyzes not only the framework API methods but also the code of pre-installed applications.
    - Cross-validation results exceed 92%.
    - SuSi does not use permission lists to detect sources and sinks.
    - SuSi is an open-source project, available at https://github.com/secure-software-engineering/SuSi
    - SuSi can detect sources and sinks in new, previously unseen Android versions.
    Main goal: fully automated generation of a categorized list of sources and sinks for Android applications. The list can be used directly by existing static and dynamic analysis approaches.
  • 6. 2.1 Definition of Sources and Sinks
    Two concepts must be introduced before defining sources and sinks:
    1) Data: a value or a reference to a value.
    2) Resource method: a method that reads data from or writes data to a shared resource.
    One restriction applies: if a method’s values (return value, parameter values) are constant, the resource method is neither a source nor a sink.

                       Example 1                               Example 2
    Resource           the phone’s hardware                    GSM network
    Data               IMEI (as numerical value)               Message (as string value)
    Resource method    getDeviceId() in TelephonyManager       sendTextMessage() in SmsManager
    Source             getDeviceId()                           -
    Sink               -                                       sendTextMessage()
  • 7. 2.2 Definition of Sources and Sinks
    - Android source: a call into a resource method that returns non-constant values into the application code.
      Ex: getLac() returns the Location Area Code, which is not a constant value.
    - Android sink: a call into a resource method that accepts at least one non-constant data value from the application code as a parameter, if and only if a new value is written or an existing one is overwritten on the resource.
      Ex: sendTextMessage(a, b) receives two non-constant parameters: a) the message text, b) the phone number.
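The two definitions can be read as a simple decision rule. The following Python sketch is only an illustration of that reading, not SuSi's implementation: the `ResourceMethod` record, its boolean fields, and the choice to test the source condition first are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceMethod:
    """Hypothetical summary of one resource method (fields are illustrative)."""
    name: str
    returns_non_constant: bool      # hands non-constant data to the app code
    takes_non_constant_param: bool  # accepts non-constant data from the app code
    writes_to_resource: bool        # writes or overwrites a value on the resource


def label(method: ResourceMethod) -> str:
    """Apply the slide's definitions literally (source is checked first, by assumption)."""
    if method.returns_non_constant:
        return "source"
    if method.takes_non_constant_param and method.writes_to_resource:
        return "sink"
    return "neither"


# Flag values below are assumed for illustration.
get_lac = ResourceMethod("getLac", True, False, False)
send_text_message = ResourceMethod("sendTextMessage", False, True, True)
constant_getter = ResourceMethod("getConstant", False, False, False)

labels = {m.name: label(m) for m in (get_lac, send_text_message, constant_getter)}
```

In practice these properties are not known in advance for the Android API methods, which is why the deck turns to a learned classifier.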
  • 8. 3.1 Example of a source-sink connection
    - The example creates a publicly accessible file on the phone’s internal storage, which arbitrary other applications can access without requiring any permissions.
    - The code uses a resource method that SuSi identifies as a "FILE" sink but that is normally hidden from the SDK.
  • 9. 3.2 Example of a source-sink connection
    Line 12: the code checks for a specific, well-known cell-tower ID in Berlin (the check evaluates to true or false).
    Line 14: converts the needed data to a string (this value is the taint).
    Line 15: creates a shared directory through which an attacker can easily reach the private data.
    Line 17: the code uses a little-known Android system function instead of Java’s normal file-writing functions.
  • 10. 4.1 Classification Approach
    Android resource methods are classified in two steps:
    - Identification: SuSi decides whether a method is a source, a sink, or neither.
    - Categorization: SuSi assigns the sources and sinks identified in the first step to specific categories.
    Note: methods identified as neither source nor sink are ignored in the second step.
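The two-step structure can be sketched as a small pipeline. In this Python sketch the functions `toy_identify` and `toy_categorize` are hypothetical stand-ins based on method-name patterns; they are not SuSi's trained classifiers and only show how methods labelled "neither" drop out before categorization.

```python
def classify_methods(methods, identify, categorize):
    """Run identification first, then categorize only sources and sinks."""
    results = {}
    for method in methods:
        kind = identify(method)  # "source", "sink" or "neither"
        if kind == "neither":
            continue  # ignored in the second step
        results[method] = (kind, categorize(method, kind))
    return results


def toy_identify(name):
    """Hypothetical name-based rule, used only to make the sketch runnable."""
    if name.startswith("get"):
        return "source"
    if name.startswith("send"):
        return "sink"
    return "neither"


def toy_categorize(name, kind):
    """Hypothetical category lookup using category names from the deck."""
    if "Device" in name:
        return "Unique-identifier"
    if "TextMessage" in name:
        return "SMS/MMS"
    return "No-category"


results = classify_methods(
    ["getDeviceId", "sendTextMessage", "toString"],
    toy_identify,
    toy_categorize,
)
```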
  • 11. 4.2 Simple machine-learning explanation
    As shown in Table I, there are three features (inputs):
    1) Driving experience: negatively correlated with the accident rate
    2) Blood alcohol level: positively correlated with the accident rate
    3) Driver’s phone number: completely unrelated
    Note: the impact of a single feature on the overall estimate is deduced from its value distribution over the annotated training set.
  • 12. 4.3 Support Vector Machines
    Tested approaches:
    - A simple rule-based classifier. Problem: in some cases the classifier would pick randomly, since both "accident: yes" and "accident: no" are equally likely.
    - A probabilistic classifier (Naive Bayes). Problem: gives very imprecise results, because our classification is almost rule-based and has fixed semantics.
    - A pruned C4.5 decision tree. Problem: the rule set lacks flexibility.
    - A Support Vector Machine (SMO in Weka). Chosen for the implementation: it usually, though not always, gives the best results, because the classification can be expressed more appropriately by shifting the separating hyperplane.
  • 13. 4.3 Support Vector Machines
    - An SVM is a supervised learning model used to train a classifier.
    - The main principle is to represent the data of two classes (in our scenario, "sink" and "source") as vectors in a vector space and to separate the classes with a hyperplane.
    - If the data is not linearly separable, the problem can be transformed into a higher-dimensional space.
    - SMO can only separate two classes, but SuSi’s first step has three classes (sink, source, neither). Solution: the one-against-all technique is applied.
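The one-against-all idea can be illustrated without Weka. The Python sketch below trains one binary linear classifier per class and picks the class whose classifier scores highest. It swaps in a plain perceptron for SMO to stay dependency-free, and the two-dimensional feature vectors are made-up numbers, so this is an assumption-laden illustration rather than SuSi's setup.

```python
def train_perceptron(samples, targets, max_epochs=1000):
    """Train a binary linear classifier; targets are +1 or -1."""
    weights = [0.0] * (len(samples[0]) + 1)  # last entry is the bias
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, targets):
            score = sum(w * v for w, v in zip(weights, x)) + weights[-1]
            if y * score <= 0:  # misclassified: move the hyperplane
                for i, v in enumerate(x):
                    weights[i] += y * v
                weights[-1] += y
                mistakes += 1
        if mistakes == 0:  # every training sample is on the correct side
            break
    return weights


def train_one_against_all(samples, labels):
    """One binary model per class: that class (+1) against all others (-1)."""
    return {
        cls: train_perceptron(samples, [1 if l == cls else -1 for l in labels])
        for cls in sorted(set(labels))
    }


def predict(models, x):
    """Return the class whose binary model gives the highest score."""
    def score(w):
        return sum(wi * v for wi, v in zip(w, x)) + w[-1]
    return max(models, key=lambda cls: score(models[cls]))


# Made-up, linearly separable toy data (not taken from SuSi).
SAMPLES = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (0.0, 0.0), (0.1, 0.1)]
LABELS = ["source", "source", "sink", "sink", "neither", "neither"]

models = train_one_against_all(SAMPLES, LABELS)
predictions = [predict(models, x) for x in SAMPLES]
```

Each binary model only answers "this class or not", so three models are enough to cover the three classes of the identification step.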
  • 14. 4.4 SuSi’s overall architecture
    Training dataset << test dataset
    - Identification: 0.7% training and 99.3% test data
    - Categorization: 0.4% training and 99.6% test data
    - No-category concept and adding a new category
  • 15. 4.5 Categories
    - 12 source categories: 1. Account 2. Bluetooth 3. Browser 4. Calendar 5. Contact 6. Database 7. File 8. Network 9. NFC 10. Settings 11. Sync 12. Unique-identifier
    - 15 sink categories: 1. Account 2. Audio 3. Browser 4. Calendar 5. Contact 6. File 7. Log 8. Network 9. NFC 10. Phone-connection 11. Phone-state 12. SMS/MMS 13. Sync 14. System 15. VoIP
  • 16. 4.6 Feature Database
    Feature classes and the classifier output they indicate:
    - Method Name: source if the name starts with "get"
    - Method has Parameters: source with fewer parameters; sink with more parameters
    - Method Return Value Type: source for a returned cursor; sink for a void return type
    - Method Parameter Type: specific types (Ex: java.io.*), listed for both source and sink
    - Method Parameter is an Interface: such methods don’t perform any actual operation on the data itself
    - Method Modifiers: public methods (source), public methods (sink), static methods (neither source nor sink)
    - Class Modifiers: methods declared in protected classes
    - Dataflow to Sink: a method parameter flows into a call to another specific method, such as update()
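A few of the syntactic feature classes above can be sketched as a function that turns a method signature into a feature dictionary. This Python sketch is illustrative only: the function name, the signature representation (plain strings and lists), and the example signature passed to it are assumptions, and SuSi's actual feature set is larger.

```python
def extract_features(name, return_type, param_types, modifiers):
    """Map a (hypothetical) method signature to boolean syntactic features."""
    return {
        "name_starts_with_get": name.startswith("get"),
        "has_parameters": len(param_types) > 0,
        "returns_void": return_type == "void",
        "has_java_io_parameter": any(p.startswith("java.io.") for p in param_types),
        "is_public": "public" in modifiers,
        "is_static": "static" in modifiers,
    }


# Signature details below are assumed for illustration.
features = extract_features(
    name="getDeviceId",
    return_type="java.lang.String",
    param_types=[],
    modifiers=["public"],
)
```

A feature vector like this one is what a classifier such as the SVM would consume for each method.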
  • 17. 4.7 Dataflow features
    - Semantic features are much more suitable for identifying sources and sinks than for categorizing them.
    - At the source-code level, Android’s sources and sinks share common patterns that the dataflow features can exploit.
    After initialization, a fixed-point iteration runs with the following rules: as soon as the first source-to-sink connection is found, the iteration is aborted and the feature returns "True". If the dataflow analysis completes without finding any source-to-sink connection, the feature returns "False".
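The fixed-point rule can be sketched on a toy statement list. In this Python sketch the statement encoding (`("assign", dst, src)` and `("call", callee, args)`), the function name, and the idea of starting the taint set from the method parameters are assumptions chosen for illustration; it shows the abort-on-first-connection behaviour, not SuSi's actual analysis.

```python
def param_flows_to_sink(params, statements, sink_calls):
    """Return True as soon as tainted data reaches a sink-like call."""
    tainted = set(params)  # initialization: the parameters are tainted
    changed = True
    while changed:  # iterate until the taint set stops growing
        changed = False
        for stmt in statements:
            if stmt[0] == "assign":
                _, dst, src = stmt
                if src in tainted and dst not in tainted:
                    tainted.add(dst)
                    changed = True
            elif stmt[0] == "call":
                _, callee, args = stmt
                if callee in sink_calls and any(a in tainted for a in args):
                    return True  # first connection found: abort
    return False  # fixed point reached without any connection


# Statements are deliberately out of order to show why iteration is needed.
leaky = [("call", "write", ["buf"]), ("assign", "buf", "msg")]
clean = [("assign", "buf", "other"), ("call", "write", ["buf"])]

leaky_result = param_flows_to_sink(["msg"], leaky, {"write"})
clean_result = param_flows_to_sink(["msg"], clean, {"write"})
```

Because the loop repeats until nothing changes, the result does not depend on the order in which the statements are listed.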
  • 18. 5.1 Evaluation (Cross-validation)
    - Precision: the fraction of correctly classified elements in a class among all elements that were assigned to that class.
    - Recall: the fraction of correctly classified elements in a class among all elements that should have been assigned to that class.
    Interestingly, the average precision and recall are almost the same with and without the permission feature.
    Implicit annotations for virtual dispatch: a generic machine-learning tool has no knowledge of the language semantics of Java. SuSi was evaluated on the extended test set using the implicit annotations and again achieved more than 92% precision.
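The two metrics can be computed per class from a list of true labels and a list of predicted labels. The Python sketch below follows the definitions above; the label lists are made-up example data, not SuSi's results.

```python
def precision_recall(true_labels, predicted_labels, cls):
    """Per-class precision and recall, following the definitions above."""
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for t, p in pairs if t == cls and p == cls)
    assigned = sum(1 for _, p in pairs if p == cls)  # assigned to the class
    expected = sum(1 for t, _ in pairs if t == cls)  # should have been assigned
    precision = correct / assigned if assigned else 0.0
    recall = correct / expected if expected else 0.0
    return precision, recall


# Made-up labels for five methods.
TRUE = ["source", "source", "sink", "neither", "source"]
PRED = ["source", "sink", "sink", "source", "source"]

source_precision, source_recall = precision_recall(TRUE, PRED, "source")
sink_precision, sink_recall = precision_recall(TRUE, PRED, "sink")
```

Here two of the three methods predicted as sources are correct, and two of the three real sources are found, so both values for "source" are 2/3.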
  • 19. 5.2 Sources and Sinks in Malware Apps
    - The authors tested 11,000 malware apps from VirusShare and found that current malware leaks more private information.
    - A second example is LeakMiner, which creates its own source and sink list from a permission map. SuSi, however, found further, less well-known methods that do not need a permission. Ex: getSimOperatorName(), getCountry(), getSimCountryIso()
    - SuSi found plenty of wrapper methods in internal Android classes or pre-installed apps that return privacy-sensitive information, such as the IMEI.
  • 20. 5.3 Changes during different Android versions
    The figure shows that new sources are introduced with every version. The results show that SuSi detects the changes between API versions very well: it reliably finds new sources and sinks that were added to the Android platform. Newly detected sources and sinks that SuSi cannot categorize must be handled by hand, by creating a new category.
  • 21. 5.4 Source and Sink lists used by other analysis tools

    Analysis tool    How the source list is identified
    LeakMiner        Permission map
    CHEX             Semi-automatic approach (not public)
    ScanDal          Not provided
    AndroidLeaks     Not provided
    Aurasium         Intercepts calls at system-level libraries (Linux and Android)
    TaintDroid       Like Aurasium, but at a lower level inside the system
    Scandroid        Not public

    Scandroid’s source and sink specifications were extracted from its source code, and the resulting list is fully covered by SuSi’s output.
  • 22. 5.5 Source and Sink lists used by other analysis tools
  • 23. 5.6 Disadvantages of SuSi
    - If a category contains only a few methods, the precision of categorization decreases (Ex: the BLUETOOTH category has just a few methods among the 110,000 Android API methods).
    - Many developers of the Android framework follow a regular coding style or duplicate parts of a method’s implementation. This leads to regularity and redundancy in the code base, which a machine-learning approach can exploit. But what if a developer does not use a regular coding style?
    - Callback methods receive data from the operating system. SuSi cannot detect these methods as sources or sinks (Ex: onNmeaReceived() used instead of onLocationChanged()).
  • 24. 6.1 Related Work
    MERLIN:
    - Probabilistic approach
    - Uses incomplete specifications of sources and sinks
    - Based on string-related vulnerabilities (cross-site scripting or SQL injection)
    - Needs information about the client or application
    - Fits a web-application scenario, whereas SuSi focuses on privacy-related aspects of Android, where data is usually not of type string
  • 25. 6.2 Related Work
    Machine learning used for security:
    1) Automatic spam detection
    2) Anomaly detection (network traffic)
    3) MCA (Multiple Correspondence Analysis), which identifies malware from different markets
    Difference between MCA and SuSi: SuSi works on independent, discrete classes, whereas MCA requires a logical ordering of records.
  • 26. Conclusion
    Future aims for improving the project:
    1) Port it to other platforms (J2EE, PHP, C++, etc.)
    2) Automatically detect sensitive callbacks