AndroidDescriptionsAndPermissions

Finding the Hidden Scenes Behind Android Applications
Joey Allen
Mentor: Xiangyu Niu
CURENT REU Program: Final Presentation
7/16/2014

Previous Work
• Crawled the Google Play Store
• Scraped the descriptions, authors, and categories of applications
• Applied the LDA model to:
  • Descriptions
  • Permissions
• Applied the author-topic model to:
  • Descriptions

APPIC Framework
Figure 1. Flow Chart of APPIC Framework.
1.  The user requests to download app A.
2.  The description, category, and permissions of A are filtered.
3.  The category is assigned to Cₐ.
4.  Embedded topic models auto-tag the description, Sₐ, and the permissions, pₐ.
5.  Cₐ, Sₐ, and pₐ are compared.
6.  If they all match, the app is considered safe.
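Steps 5 and 6 can be sketched as a small consistency check. This is a minimal illustration under stated assumptions, not the APPIC implementation: it assumes each embedded topic model has already produced a topic-weight vector for the description and for the permissions, that every topic has been given a category label, and that "match" means the declared category Cₐ appears among the top-k tags of both Sₐ and pₐ. The helper names `top_k_tags` and `is_consistent` are hypothetical.

```python
from typing import Dict, List


def top_k_tags(topic_weights: Dict[str, float], k: int) -> List[str]:
    """Hypothetical helper: return the k category labels with the largest topic weight."""
    ranked = sorted(topic_weights.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:k]]


def is_consistent(category: str,
                  description_topics: Dict[str, float],
                  permission_topics: Dict[str, float],
                  k: int = 3) -> bool:
    """Assumed matching rule: C_a must appear among the top-k tags of both S_a and p_a."""
    s_a = top_k_tags(description_topics, k)   # tags auto-assigned to the description
    p_a = top_k_tags(permission_topics, k)    # tags auto-assigned to the permissions
    return category in s_a and category in p_a


if __name__ == "__main__":
    # Illustrative topic weights, not real APPIC output.
    s_a = {"Weather": 0.55, "Travel": 0.25, "News": 0.15, "Games": 0.05}
    p_a = {"Weather": 0.40, "Travel": 0.30, "Social": 0.20, "Games": 0.10}
    print(is_consistent("Weather", s_a, p_a, k=3))  # True
    print(is_consistent("Games", s_a, p_a, k=3))    # False
```

Varying `k` mirrors the 2-tag and 3-tag settings used in the results: a larger k makes a match easier to obtain.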

LDA Model
•  Latent Dirichlet Allocation (LDA) is a generative probabilistic model for collections of discrete data such as text corpora [1].
•  The LDA model learns topics, each of which is a distribution over words.
•  Each document is modeled as a mixture of topics, so the words in a document can be compared against the learned topics and a category can be chosen for the document.
Figure 2. Graphical Representation of the LDA Model [1].
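The generative story behind LDA [1] can be made concrete with a short simulation: each topic k has a word distribution φₖ, each document draws topic proportions θ ~ Dirichlet(α), and each word is produced by first drawing a topic z from θ and then a word w from φ_z. The sketch below uses only Python's standard library; the vocabulary, the two hand-written topics, and the value of α are illustrative assumptions. Fitting LDA to real app descriptions would instead require an inference method such as variational inference [1] or collapsed Gibbs sampling.

```python
import random
from typing import Dict, List


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Draw from a Dirichlet by normalising independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_lda_document(phi: List[Dict[str, float]], alpha: List[float],
                          n_words: int, rng: random.Random) -> List[str]:
    """Generate one document under the LDA generative process."""
    theta = sample_dirichlet(alpha, rng)  # topic proportions for this document
    words = []
    for _ in range(n_words):
        z = rng.choices(range(len(phi)), weights=theta)[0]  # topic assignment z ~ theta
        vocab = list(phi[z])
        w = rng.choices(vocab, weights=[phi[z][v] for v in vocab])[0]  # word w ~ phi_z
        words.append(w)
    return words


if __name__ == "__main__":
    rng = random.Random(0)
    # Two illustrative topics (hypothetical, not learned from the Play Store data).
    phi = [
        {"rain": 0.4, "forecast": 0.4, "map": 0.2},  # a "weather"-like topic
        {"score": 0.5, "level": 0.3, "map": 0.2},    # a "games"-like topic
    ]
    print(generate_lda_document(phi, alpha=[0.5, 0.5], n_words=8, rng=rng))
```

Inference runs this story in reverse: given observed words, it estimates θ for each document and φ for each topic, and the largest entries of θ become the document's tags.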

Author-Topic Model
•  The author-topic (AT) model is a generative model for documents that extends LDA to include authorship information [2].
•  Each author is a distribution over topics, and each topic is a distribution over words.
Figure 3. Graphical Model of the Author-Topic Model [2].
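To contrast with LDA, the sketch below simulates the author-topic generative process [2]: for every word, an author x is drawn uniformly from the document's author list, a topic z is drawn from that author's topic distribution θₓ, and the word w is drawn from φ_z. For brevity, θ and φ are fixed, hand-written dictionaries here rather than Dirichlet draws, and the author names and words are illustrative assumptions.

```python
import random
from typing import Dict, List, Tuple


def generate_at_document(authors: List[str],
                         theta: Dict[str, List[float]],
                         phi: List[Dict[str, float]],
                         n_words: int,
                         rng: random.Random) -> List[Tuple[str, int, str]]:
    """Generate (author, topic, word) triples under the author-topic generative process."""
    out = []
    for _ in range(n_words):
        x = rng.choice(authors)                                 # author drawn uniformly from the document's authors
        z = rng.choices(range(len(phi)), weights=theta[x])[0]  # topic drawn from that author's theta_x
        vocab = list(phi[z])
        w = rng.choices(vocab, weights=[phi[z][v] for v in vocab])[0]  # word drawn from phi_z
        out.append((x, z, w))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    phi = [{"rain": 0.6, "forecast": 0.4}, {"score": 0.7, "level": 0.3}]
    # Hypothetical developers: one mostly writes weather apps, the other mostly games.
    theta = {"dev_a": [0.9, 0.1], "dev_b": [0.1, 0.9]}
    for triple in generate_at_document(["dev_a", "dev_b"], theta, phi, 6, rng):
        print(triple)
```

Because topics attach to authors rather than to individual documents, a fitted AT model describes what each author tends to write about, which is why it can surface authors who create similar apps.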

Calculating Results
The user reads the application description, and APPIC's tags are then compared with the author's tags.
CI = Correct Inference; II = Incorrect Inference
• APPIC finds the app in the wrong category: CI + 1
• APPIC incorrectly categorizes the application: II + 1
• APPIC and the author both incorrectly categorize the app: II + 1

$$\text{Accuracy} = \frac{CI}{II + CI} \qquad (5)$$
(5) Calculating accuracy
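Equation (5) can be computed by tallying one outcome per evaluated app. The sketch below maps the flow-chart outcomes to CI or II as the slide does; the outcome strings are hypothetical labels, and any outcome not shown on the flow chart is rejected rather than guessed.

```python
from collections import Counter
from typing import Iterable

# Hypothetical labels for the flow-chart cases and the counter each one increments.
OUTCOME_TO_COUNTER = {
    "appic_finds_wrong_category": "CI",  # APPIC finds the app in the wrong category -> CI + 1
    "appic_incorrect": "II",             # APPIC incorrectly categorizes the app -> II + 1
    "both_incorrect": "II",              # APPIC and the author both incorrect -> II + 1
}


def accuracy(outcomes: Iterable[str]) -> float:
    """Accuracy = CI / (II + CI), as in Eq. (5). Raises KeyError on unknown labels."""
    counts = Counter(OUTCOME_TO_COUNTER[o] for o in outcomes)
    ci, ii = counts["CI"], counts["II"]
    if ci + ii == 0:
        raise ValueError("no inferences to score")
    return ci / (ii + ci)


if __name__ == "__main__":
    sample = ["appic_finds_wrong_category"] * 3 + ["appic_incorrect", "both_incorrect"]
    print(accuracy(sample))  # 0.6
```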

LDA Results (Descriptions)
[Chart: Accuracy vs. Category (LDA model); y-axis: accuracy (0 to 1), x-axis: categories; series: 3 tags, 2 tags.]

LDA Results (Permissions)
[Chart: Categories vs. Accuracy; y-axis: accuracy, x-axis: categories.]

AT Results (Descriptions)
[Chart: Accuracy vs. Categories (AT model); y-axis: accuracy (0 to 1), x-axis: categories.]

Comparison of Results
Descriptions
Topic Model        Result
LDA (3 tags)       83%
LDA (2 tags)       64%
Author-topic       58%
PLDA [3]           88%

Permissions
Topic Model        Result
LDA (4 tags)       34%
PLDA [3]           77%

Conclusion
•  LDA performed better than AT at categorizing descriptions (83% with 3 tags and 64% with 2 tags, vs. 58% for AT).
•  More tags increase accuracy but decrease efficiency.
•  The AT model was less accurate at categorizing applications, but it is useful for finding authors who create similar apps.

Future Work
•  Find a better method to calculate accuracy.
•  Find a different method to categorize permissions.
•  Model the dependencies between permissions and descriptions.
•  Modify the AT model (e.g., let permissions play the role of authors).

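Author-Topic Model (Modified)
[Figure: plate diagram of the modified author-topic model. Labels: D, documents; N_d, words in document d; p_d, permissions of document d; x, drawn from the uniform distribution of documents over permissions; A, permissions, each with a distribution over topics θ (prior α); T, topics, each with a distribution over words ϕ (prior β); z, topic; w, word.]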

References
[1] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
[2] M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth, “The author-topic model for authors and documents,” in Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 2004, pp. 487–494.
[3] Y. Yang, J. S. Sun, and M. W. Berry, “APPIC: Finding the Hidden Scene Behind Description Files for Android Apps.”
