PRADA: Prioritizing Android
Devices for Apps by Mining
Large-Scale Usage Data
Presented by: Akshay Mittal
Course: CS 5393
Professor: Dr. Guowei Yang
Authors
• Lu, Xuan
• Liu, Xuanzhe
• Li, Huoran
• Xie, Tao
• Mei, Qiaozhu
• Hao, Dan
• Huang, Gang
10/20/2016 CS5393 Software Quality 2
Overview
• Motivation
• Approach
• Evaluation
• Strength
• Weakness
• Conclusion
• Q&A
Overview of PRADA Approach
Motivation
• Android fragmentation: concern over the alarming number of different Android operating system (OS) versions available in the market.
• Goal: explore whether accurate estimates of the major device models can be made for a new app.
Approach
Wandoujia
• Native management tool
In a nutshell
• Usage data collection
• Similar app selection
• Device model clustering
• Collaborative filtering
• Operational profiling: how to increase productivity and reliability.
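The data-collection and collaborative-filtering steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout `(app, device_model, browsing_seconds)`, the function names, the sample data, and the unweighted averaging across similar apps are all assumptions made for the example.

```python
from collections import defaultdict

def time_share(records, app):
    """Fraction of an app's total browsing time spent on each device model."""
    totals = defaultdict(float)
    for rec_app, model, seconds in records:
        if rec_app == app:
            totals[model] += seconds
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {model: t / grand_total for model, t in totals.items()}

def predict_top_models(records, similar_apps, n):
    """Rank device models by mean time share over similar apps; return the top n."""
    scores = defaultdict(float)
    for app in similar_apps:
        for model, share in time_share(records, app).items():
            scores[model] += share / len(similar_apps)
    # Sort by descending score, breaking ties by model name for determinism.
    return sorted(scores, key=lambda m: (-scores[m], m))[:n]

# Hypothetical usage records: (app, device model, browsing seconds).
records = [
    ("game_a", "model_x", 600.0),
    ("game_a", "model_y", 200.0),
    ("game_b", "model_x", 100.0),
    ("game_b", "model_z", 300.0),
]
top = predict_top_models(records, ["game_a", "game_b"], n=2)
```

Here `top` ranks the device models that a new app similar to `game_a` and `game_b` would be tested on first.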
Wandoujia Features
• Network activity statistics
• Permission monitoring
• Content recommendation
Approach (cont.)
Effectiveness Metrics
• Device Model Hit
• Average Precision
• Usage Data Coverage
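One plausible reading of the three metrics is sketched below. The slides do not give formulas, so these definitions are assumptions: device model hit counts predicted models that fall in the app's actual top N, coverage sums the actual usage share of the predicted models, and average precision is the standard information-retrieval form. The function names and sample data are hypothetical.

```python
def device_model_hit(predicted, actual_ranked):
    """Number of predicted models found in the actual top N (N = len(predicted))."""
    actual_top = set(actual_ranked[:len(predicted)])
    return sum(1 for m in predicted if m in actual_top)

def coverage(predicted, actual_share):
    """Total actual usage share accounted for by the predicted models."""
    return sum(actual_share.get(m, 0.0) for m in predicted)

def average_precision(predicted, actual_ranked):
    """Mean of precision@k taken at each rank k where a relevant model appears."""
    relevant = set(actual_ranked[:len(predicted)])
    hits, precision_sum = 0, 0.0
    for k, model in enumerate(predicted, start=1):
        if model in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant) if relevant else 0.0

# Hypothetical ground truth for one app, and a top-3 prediction.
actual_share = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
actual_ranked = ["a", "b", "c", "d"]
predicted = ["a", "d", "b"]

dh = device_model_hit(predicted, actual_ranked)
tc = coverage(predicted, actual_share)
ap = average_precision(predicted, actual_ranked)
```

Unlike device model hit and coverage, average precision rewards placing correct device models earlier in the predicted list.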
Time-Share Driven Prioritization
• Browsing time on app
• Collaborative filtering by time share
• Uses leave-one-out cross-validation (LOOCV)
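A minimal sketch of leave-one-out cross-validation for this setting: each app in turn is treated as the "new" app, and its top device models are predicted from the time shares of the remaining apps only. The unweighted mean used as the filtering step, the function name, and the sample data are assumptions, not the paper's exact algorithm.

```python
def loocv_predictions(shares_by_app, n):
    """For each app, predict its top-n device models from the other apps only."""
    predictions = {}
    for held_out in shares_by_app:
        others = [s for app, s in shares_by_app.items() if app != held_out]
        scores = {}
        for shares in others:
            for model, share in shares.items():
                # Average the time share each model receives across the other apps.
                scores[model] = scores.get(model, 0.0) + share / len(others)
        ranked = sorted(scores, key=lambda m: (-scores[m], m))
        predictions[held_out] = ranked[:n]
    return predictions

# Hypothetical time shares per app: device model -> fraction of browsing time.
shares_by_app = {
    "app1": {"x": 0.5, "y": 0.5},
    "app2": {"x": 0.75, "z": 0.25},
    "app3": {"y": 0.75, "z": 0.25},
}
preds = loocv_predictions(shares_by_app, n=2)
```

Each held-out prediction can then be scored against that app's own time shares, so no app's usage data leaks into its own prediction.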
Algorithm-I
Device model hit (DH), time share coverage (TC), and average precision (AP) are measured against the top N device models, using K apps in the same category.
Evaluation
1. Device Model Distribution
• Addresses RQ1: How many device models account for the majority of the browsing time?
• Uses the Pareto principle
2. Predicting Top Device Models
• Addresses RQ2: How effectively can PRADA identify major device models for a new app, given that developers have no knowledge about this app's actual usage?
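The question behind RQ1 can be made concrete with a small calculation: sort device models by browsing time and count how many are needed to reach a target share of the total. The 80% default, the function name, and the sample figures below are illustrative assumptions, not values from the paper.

```python
def models_needed(time_by_model, target=0.8):
    """Smallest number of top device models whose browsing time reaches `target`."""
    total = sum(time_by_model.values())
    if total <= 0:
        return 0
    covered = 0.0
    ordered = sorted(time_by_model.values(), reverse=True)
    for count, seconds in enumerate(ordered, start=1):
        covered += seconds
        if covered / total >= target:
            return count
    return len(time_by_model)

# Hypothetical browsing time (seconds) per device model for one app.
time_by_model = {"a": 500, "b": 250, "c": 150, "d": 60, "e": 40}
needed = models_needed(time_by_model)
```

A small `needed` relative to the number of device models indicates a Pareto-like distribution, where a few models account for most of the usage.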
Device Model Distribution Statistics
Number of device models and users that use top 100 apps
from each of the two categories.
Results from Predicting Top Device Models
Using collaborative filtering algorithm on Game and Media Apps
Top 10 device models with the most time share for two apps (Temple Run 2 and Xunlei Movie),
and the selected device models by AppBrain, Wandoujia, and PRADA.
Comparison of Device Model Hit, Time Share Coverage, and AP when using market share and PRADA to recommend the top 10 device models for Game apps.
Results of Device Model Hit, Time Share Coverage, and AP of the top 10 device models predicted by PRADA for 100 apps in each category, i.e., N = 10 and K = 100.
Strengths of PRADA
• Mines large-scale data
• Leverages usage data
• Satisfactory accuracy
• Operational profiling
Weaknesses
• Restricted to only two categories of networked apps (Game and Media)
• Needs access to existing usage data
• Not accurate for offline apps
• Relies on the accuracy of Wandoujia data
Related work
• A framework for detecting similar mobile applications by online
kernel learning.
• Rescaling reliability bounds for a new operational profile.
• Mining large-scale smartphone data for personality studies.
• Prioritizing the devices to test your app on: A case study of
Android game apps.
• Understanding Android fragmentation with topic analysis of
vendor-specific bugs.
Conclusion
PRADA includes a collaborative filtering technique to accurately predict major device models for a new app, given the usage data from existing apps with similar functionalities.
Future work:
a) the impact of localization on device model prioritization;
b) how to cluster device models at different granularities.
Questions for Deeper Analysis
• How can the system remain effective when the time-share data it relies on is not included in the Wandoujia dataset?
• Why is browsing time the only main parameter in the analysis?
Any Questions?