Finding the Hidden Scenes Behind Android Applications
Joey Allen
Mentor: Xiangyu Niu
CURENT REU Program: Final Presentation
7/16/2014
Previous Work
• Crawled Google Play Store
• Scraped Descriptions, Author, and Categories of Applications
• Applied LDA Model
   • Descriptions
   • Permissions
• Applied Author Topic Model
   • Descriptions
APPIC Framework
Figure 1. Flow Chart of APPIC Framework.
1.  User requests to download App A.
2.  Description, category, and permissions are filtered.
3.  Category is assigned to C_a.
4.  Embedded topic models auto-tag the description, S_a, and permissions, p_a.
5.  C_a, S_a, and p_a are compared.
6.  If they all match, the app is considered safe.
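The comparison in steps 5 and 6 can be sketched in a few lines of Python. This is only an illustration: the slides do not define what "match" means, so the sketch assumes the declared category must appear among both the description tags and the permission tags. The function name `appic_check` and the sample tags are hypothetical.

```python
def appic_check(category, description_tags, permission_tags):
    """Return True when the declared category agrees with both tag sets.

    Assumption (not stated in the slides): "match" means the declared
    category appears among the tags inferred from the description and
    among the tags inferred from the permissions.
    """
    c_a = category.strip().lower()
    s_a = {tag.strip().lower() for tag in description_tags}
    p_a = {tag.strip().lower() for tag in permission_tags}
    return c_a in s_a and c_a in p_a


# Hypothetical apps: the first is consistent, the second is flagged.
print(appic_check("Weather", ["weather", "travel"], ["weather", "tools"]))   # True
print(appic_check("Weather", ["weather", "travel"], ["finance", "social"]))  # False
```

A stricter or looser rule (for example, requiring only an overlap between the two tag sets) would change which apps are flagged.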
LDA MODEL
•  Latent Dirichlet Allocation (LDA) is a generative probabilistic
model for collections of discrete data such as text corpora
[1].
•  The LDA Model creates topics that are distributions over words.
•  The words in a document can then be compared to a set of
topics, and a category can be chosen for a document.
Figure 2. Graphical Representations of LDA Model [1].
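As a rough illustration of comparing a document's words to a set of topics, the sketch below scores a document against hand-written topic-word distributions and keeps the k best-scoring topics as tags. The scoring rule (a sum of log word probabilities with a small floor for unseen words), the function name `top_k_topics`, and the toy topics are all assumptions for illustration. A trained LDA model would instead infer per-document topic proportions.

```python
import math


def top_k_topics(words, topic_word_probs, k=3, floor=1e-6):
    """Rank topics by the log-probability they assign to the words.

    `floor` is an assumed stand-in probability for words a topic
    has never seen, so that log() stays defined.
    """
    scores = {}
    for topic, probs in topic_word_probs.items():
        scores[topic] = sum(math.log(probs.get(w, floor)) for w in words)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy topics: each is a distribution over words (hypothetical values).
topics = {
    "weather": {"forecast": 0.5, "rain": 0.3, "map": 0.2},
    "travel": {"map": 0.5, "hotel": 0.3, "flight": 0.2},
    "finance": {"bank": 0.6, "stock": 0.4},
}
description = ["forecast", "rain", "map"]
print(top_k_topics(description, topics, k=2))  # ['weather', 'travel']
```

The parameter `k` plays the role of the number of tags assigned to each document.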
Author Topic Model
•  The author-topic model is a generative model for documents that
extends LDA to include authorship information [2].
•  Each author has a distribution over topics, and each topic has a
distribution over words.
Figure 3. Graphical Model of Author-Topic Model [2].
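The generative story behind the author-topic model can be sketched as follows: for each word, pick one of the document's authors uniformly, draw a topic from that author's topic distribution, then draw a word from that topic's word distribution. In the full model [2] these distributions are themselves drawn from Dirichlet priors. Here they are fixed, hypothetical tables to keep the sketch short, and the names `generate_document`, `author_topic`, and `topic_word` are illustrative.

```python
import random


def generate_document(authors, author_topic, topic_word, n_words, rng):
    """Sample (author, topic, word) triples following the author-topic story."""
    triples = []
    for _ in range(n_words):
        x = rng.choice(authors)  # author chosen uniformly for this word
        topic_names, topic_weights = zip(*author_topic[x].items())
        z = rng.choices(topic_names, weights=topic_weights)[0]  # topic given author
        vocab, word_weights = zip(*topic_word[z].items())
        w = rng.choices(vocab, weights=word_weights)[0]  # word given topic
        triples.append((x, z, w))
    return triples


# Hypothetical, fixed distributions (the real model learns these).
author_topic = {
    "alice": {"games": 0.9, "tools": 0.1},
    "bob": {"games": 0.2, "tools": 0.8},
}
topic_word = {
    "games": {"play": 0.6, "score": 0.4},
    "tools": {"scan": 0.5, "backup": 0.5},
}

for triple in generate_document(["alice", "bob"], author_topic, topic_word, 5, random.Random(0)):
    print(triple)
```

Fitting the model runs this story in reverse: given only the words and the author lists, it estimates the two sets of distributions.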
Calculating Results
User Reads Application Description
Compare APPIC tags with Author’s Tags
CI = Correct Inference
II = Incorrect Inference
•  APPIC finds App in wrong category. (CI + 1)
•  APPIC incorrectly categorizes application. (II + 1)
•  APPIC and author incorrectly categorize app. (II + 1)
•  APPIC and author incorrectly categorize app. (II + 1)
$$\text{Accuracy} = \frac{CI}{II + CI} \qquad (5)$$
Calculating Accuracy
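The accuracy formula is simple enough to check by hand. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def accuracy(correct_inferences, incorrect_inferences):
    """Accuracy = CI / (II + CI), as in Equation (5)."""
    total = incorrect_inferences + correct_inferences
    if total == 0:
        raise ValueError("no inferences recorded")
    return correct_inferences / total


# Made-up counts for illustration: 3 correct and 1 incorrect inference.
print(accuracy(3, 1))  # 0.75
```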
LDA Results (Descriptions)
Figure: Accuracy vs. Category (LDA Model). Y-axis: Accuracy (0 to 1); x-axis: Categories; series: 3 Tags, 2 Tags.
LDA Results (Permissions)
Figure: Categories vs. Accuracy. Y-axis: Accuracy (0 to 1.2); x-axis: Categories.
AT Results (Descriptions)
Figure: Accuracy vs. Categories (AT Model). Y-axis: Accuracy (0 to 1); x-axis: Categories.
Comparison of Results
Topic Model     Results
LDA (3 Tags)    83%
LDA (2 Tags)    64%
Author-topic    58%
PLDA [3]        88%

Topic Model     Results
LDA (4 Tags)    34%
PLDA [3]        77%
Conclusion
•  LDA performed better than AT at categorizing descriptions.
•  More tags increase accuracy but decrease efficiency.
•  The AT model was not as accurate in categorizing applications.
•  The AT model is still useful for finding authors that create similar apps.
Future Work
•  Find a better method to calculate accuracy.
•  Learn a different method to categorize permissions
•  Explore dependencies between permissions and descriptions.
•  Modify AT Model
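Author-Topic Model (Modified)
Figure: Graphical model of the modified Author-Topic Model. Labels recovered from the diagram:
•  D: Document
•  β, ϕ (plate T): Topic distribution over words
•  w: Word
•  z: Topic
•  α, θ (plate A): Distribution of permissions over topics
•  x (plate Nd): Permissions
•  pd: Uniform distribution of documents over permissions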
References
[1] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of
Machine Learning Research, vol. 3, pp. 993–1022, 2003.
[2] M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth, “The author-topic model for
authors and documents,” in Proceedings of the 20th Conference on Uncertainty
in Artificial Intelligence, 2004, pp. 487–494.
[3] Y. Yang, J. S. Sun, and M. W. Berry, “APPIC: Finding The Hidden Scene Behind
Description Files for Android Apps.”
