These are the presentation slides for our WCRE 2012 paper: Understanding Android Fragmentation with Topic Analysis of Vendor-Specific Bugs (http://webdocs.cs.ualberta.ca/~chenlei1/publication/Zhang-wcre-2012.pdf).
Understanding Android Fragmentation with Topic Analysis of Vendor-Specific Bugs
1. Understanding Android Fragmentation with Topic Analysis of Vendor-Specific Bugs
Dan Han, Chenlei Zhang, Xiaochao Fan, Abram Hindle, Kenny Wong and Eleni Stroulia
Department of Computing Science
University of Alberta
6. Why do we care
More than 20 Android device manufacturers
Multiple Android versions
Hundreds of different Android devices
Developers
Users
Stakeholders
7. What do we do in this study
Goal: search for evidence of Android fragmentation within the Android ecosystem, based on Android bug reports
Approach: apply topic models and topic analysis
9. Topic Model and Topic Analysis
Topic Model: a statistical model for discovering abstract topics that occur in a collection of documents.
Latent Dirichlet Allocation (LDA)
Topic Analysis: extract and evaluate the topics from a corpus of text documents through topic models
Traceability recovery: Asuncion et al., Lukins et al., Hindle et al.
Feature location: Marcus et al., Poshyvanyk et al., Grant et al.
Software evolution and trend analysis: Thomas et al., Martie et al.
10. Differences between previous work and our work
Previous work applied unsupervised topic models, e.g. LDA
We applied Labeled-LDA, a supervised topic model, to analyze topic evolution
We compared the performance of LDA and Labeled-LDA on our dataset
11. LDA and Labeled-LDA
Labeled-LDA
So far a novel method in software engineering
Manual labeling
Supervised topic modeling algorithm
Labeled-LDA only predicts the relevance between each document and its labels
LDA
Well studied in software engineering
Unsupervised topic modeling algorithm
Needs documents and the number of topics N as input
LDA predicts the relevance between each document and all N topics
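The contrast between the two models can be made concrete with a small sketch. This is not the implementation used in the study: it is a toy collapsed Gibbs sampler in which the only difference between the models is the set of topics a document may use. Plain LDA lets every document draw on all N topics, while Labeled-LDA restricts each document to the topics of its own labels. The function name `gibbs_topics`, the hyperparameter values and the five toy bug reports are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def gibbs_topics(docs, allowed, n_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler (hypothetical helper, not the study's code).

    docs:    list of token lists
    allowed: per-document list of topic ids the document may use
             (all topics -> LDA; only the document's labels -> Labeled-LDA)
    Returns the document-topic matrix; each row sums to 1.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]                 # tokens of doc d in topic k
    topic_word = [defaultdict(int) for _ in range(n_topics)]   # count of word w in topic k
    topic_total = [0] * n_topics                               # tokens in topic k

    def update(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Random initial assignment, restricted to each document's allowed topics.
    assign = []
    for d, doc in enumerate(docs):
        zs = [rng.choice(allowed[d]) for _ in doc]
        for w, k in zip(doc, zs):
            update(d, w, k, +1)
        assign.append(zs)

    # Resample every token's topic from its conditional distribution.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assign[d][i], -1)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in allowed[d]
                ]
                assign[d][i] = rng.choices(allowed[d], weights=weights)[0]
                update(d, w, assign[d][i], +1)

    # Smoothed, normalised relevance; topics outside `allowed` get 0.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + alpha * len(allowed[d])
        theta.append([
            (doc_topic[d][t] + alpha) / denom if t in allowed[d] else 0.0
            for t in range(n_topics)
        ])
    return theta


# Hypothetical bug-report snippets; topic 0 = "bluetooth", topic 1 = "keyboard".
docs = [
    "bluetooth headset pairing fails".split(),
    "bluetooth car kit disconnects".split(),
    "keyboard input lag typing".split(),
    "keyboard layout language typing".split(),
    "bluetooth keyboard pairing fails".split(),
]
lda_theta = gibbs_topics(docs, [[0, 1]] * len(docs), n_topics=2)
llda_theta = gibbs_topics(docs, [[0], [0], [1], [1], [0, 1]], n_topics=2)
```

In `lda_theta` every row spreads relevance over both topics, and the topics still have to be named afterwards. In `llda_theta` a report labeled only "bluetooth" has zero relevance to "keyboard" by construction, which is why Labeled-LDA needs the manual labels up front.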
12. Difference between a topic and a label
Topic: a word distribution extracted from bug reports by topic models
Label: the annotation of a document
15. Case Study
Android bug reports, 2008-2011, 20,000+
Vendor-specific bug reports
HTC (1503) vs. Motorola (1058)
http://www.puremobile.co.uk/insiderblog/wp-content/uploads/2011/08/Motorola-Mobile_logo.jpg
http://www.finestdaily.com/news/htc-jetstream-to-be-launched-on-september-6t.html/attachment/htc_cmyk_white_strapline
16. Create labels for Android bug reports
Feature-oriented labels for Android bug reports
Android labels
Features in Android versions, e.g. Language, Bluetooth
Popular applications, e.g. Google Maps, Gmail
Hardware of Android devices, e.g. Keyboard, GPS
17. Label Android bug reports
60 person-hours of manual labeling effort
The labeled bug reports are now public
HTC: 72 labels in total
Motorola: 58 labels in total
19. Apply LDA
Try a range of N to find the most distinct topics
Label each topic using our manual labels for the bug reports of HTC and Motorola
2 hours of labeling effort
20. Comparing LDA and Labeled-LDA
Each topic model generates a document-topic matrix
Determine whether LDA generates results similar to Labeled-LDA
Compute and compare the Jaccard similarity of the documents related to each topic generated by LDA and Labeled-LDA
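The comparison step can be sketched as follows. This is a minimal illustration rather than the study's code: it assumes a document counts as "related" to a topic when its relevance exceeds a cutoff, and the cutoff value, the helper names and the two toy matrices are hypothetical.

```python
def related_docs(doc_topic, threshold):
    """For each topic, the set of document indices whose relevance exceeds the threshold."""
    n_topics = len(doc_topic[0])
    return [
        {d for d, row in enumerate(doc_topic) if row[t] > threshold}
        for t in range(n_topics)
    ]


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(matrix_a, matrix_b, threshold):
    """Similarity of every topic in matrix_a against every topic in matrix_b."""
    sets_a = related_docs(matrix_a, threshold)
    sets_b = related_docs(matrix_b, threshold)
    return [[jaccard(a, b) for b in sets_b] for a in sets_a]


# Toy document-topic matrices (rows: 4 bug reports, columns: 2 topics).
lda = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.5, 0.5]]
llda = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

sim = pairwise_jaccard(lda, llda, threshold=0.3)  # threshold is an assumed cutoff
# Topics paired under the same label sit on the diagonal.
diagonal_mean = sum(sim[i][i] for i in range(len(sim))) / len(sim)
```

A diagonal close to 1 would mean both models attach the same bug reports to the same label; low diagonal values mean the models disagree about which reports belong to a label.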
24. Comparing Topic Models in Motorola
Figure: pairwise Jaccard similarity between each topic in LDA and Labeled-LDA (axes: LDA topics, Labeled-LDA topics)
25. Conclusion of comparing LDA and Labeled-LDA
Mean Jaccard similarities of the diagonal entries are 0.2 for HTC and 0.08 for Motorola
The numbers of bug reports related to the same labels in LDA and Labeled-LDA are different (χ² tests: p<0.01) for both HTC and Motorola
Labeled-LDA produced more feature-relevant topics than LDA
33. Fragmentation Discussion
Software-Based Fragmentation
New features and changes contribute to the bug reports
Difficult to test across all vendors and product lines
Figure: relevance of the common topic “bluetooth” in HTC and Motorola (release markers: Android 2.1, Android 2.2)
34. Fragmentation Discussion
Hardware-Based Fragmentation
Different product lines were associated with different topics
Evidenced by differing bug topics and product-specific issues
Label: display
HTC: screen, version, desire, behavior, app, home, number, code, final, press, sure, user, black, new, power
Motorola: droid, screen, button, correct, home, display, behavior, landscape, 2.1, menu, bar, xoom, device, user, status
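The two "display" word lists above can be compared mechanically. The sketch below is only an illustration, assuming each list holds that vendor's top terms for the label; it separates the terms both vendors share from the terms that appear for one vendor only, which is where product names such as "desire", "droid" and "xoom" surface. The variable names are hypothetical.

```python
# Top terms for the "display" label, copied from the slide.
htc = ["screen", "version", "desire", "behavior", "app", "home", "number", "code",
       "final", "press", "sure", "user", "black", "new", "power"]
motorola = ["droid", "screen", "button", "correct", "home", "display", "behavior",
            "landscape", "2.1", "menu", "bar", "xoom", "device", "user", "status"]

htc_set, motorola_set = set(htc), set(motorola)

# Keep the original ranking order while filtering.
shared = [w for w in htc if w in motorola_set]
htc_only = [w for w in htc if w not in motorola_set]
motorola_only = [w for w in motorola if w not in htc_set]

# Jaccard overlap of the two word lists.
overlap = len(htc_set & motorola_set) / len(htc_set | motorola_set)

print("shared:", shared)
print("HTC only:", htc_only)
print("Motorola only:", motorola_only)
```

The shared terms describe the feature itself, while the vendor-only terms point at specific devices and versions.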
36. Conclusion
Found how fragmentation is manifested within Android between HTC and Motorola
Incompatibility issues
Portability issues
Compared the performance of Labeled-LDA and LDA
Labeled-LDA produced more feature-relevant topics than LDA
Labeled-LDA needs more manual effort
http://softwareprocess.es/static/Fragmentation.html
(http://goo.gl/SwGDT)
Could be useful for project dashboards, process mining and software process recovery
Editor's Notes
According to the time-series evolution of each topic's relevance, we categorized the topics into common topics and unique topics. Common topics are labels shared by both vendors that show a similar evolution of average relevance over time. Unique topics show significantly different relevance over time and are more specific to one vendor than the other.
In our study, there are 14 common topics shared between the vendors. Let’s take a look at one of the most frequent topics, Bluetooth. This table shows the “bluetooth” topic and, for each vendor, the top 15 terms generated by Labeled-LDA. We can see that both vendors share many identical topic words. The bottom figure shows the evolution of the average relevance of “bluetooth” in HTC and Motorola over time. From the figure, we can observe that the bluetooth topic has a cross-vendor peak with the releases of both Android 2.1 and 2.2.
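A relevance-over-time curve of this kind could be computed along the following lines. This is a hedged sketch, not the procedure from the paper: it assumes reports are grouped by month and that a label's relevance is averaged over all reports filed in that month, counting reports without the label as 0. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def average_relevance(reports, label):
    """Average relevance of `label` per period.

    reports: iterable of (period, {label: relevance}) pairs, one per bug report.
    Reports that do not carry the label contribute a relevance of 0.0 (an assumption).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for period, relevance in reports:
        totals[period] += relevance.get(label, 0.0)
        counts[period] += 1
    return {period: totals[period] / counts[period] for period in sorted(totals)}


# Hypothetical per-report relevance values for one vendor.
reports = [
    ("2010-01", {"bluetooth": 0.8, "gps": 0.2}),
    ("2010-01", {"keyboard": 1.0}),
    ("2010-02", {"bluetooth": 0.5, "gps": 0.5}),
]
curve = average_relevance(reports, "bluetooth")
```

Computing one such curve per vendor for the same label and plotting them together shows whether the peaks coincide, as they do for a common topic, or diverge, as they do for a unique topic.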
Another feature of common topics is that each vendor's topics tend to contain vendor-specific terms. By reading the bug reports, we found that in HTC, 9 topics share the term “desire”, which refers to the HTC Desire phone. Motorola topics share the terms “Droid” and “Xoom”, which refer to the Motorola Droid phone and the Xoom tablet. This is evidence that different product lines face different issues. For example, display issues differ between the Motorola Droid phone and the Xoom tablet because of the screen size.
Keyboard is one of the unique topics in HTC. We can see that there are two peaks in the figure. By reading the bug reports, we found that most HTC devices have no physical keyboard, so the virtual keyboard is frequently used by HTC users. In contrast, Motorola’s Android devices tend to have physical keyboards, which might explain the lack of keyboard bug activity in the Motorola bug reports. The figure shows that HTC keyboard relevance peaks and drops off, while keyboard relevance in Motorola is steady. This behavior suggests that hardware and software configuration dictate the importance of the keyboard topic.
GPS is one of the unique topics in Motorola. By reading the bug reports and the release history of Android, we found that Motorola and HTC used different GPS software before Android 2.2. Starting with Android 2.2, the Motorola Droid smartphone uses Google Maps as its GPS navigation service. As a result, this new feature contributes three peaks for Motorola in the figure.
In our study, we found how fragmentation is manifested within Android by comparing and contrasting the bug reports of two Android smartphone vendors: HTC and Motorola. Based on Labeled-LDA topic analysis, we found that different topics tended to be associated with different products, providing even more evidence of vendor-specific fragmentation. As a result, hardware-based fragmentation in Android is evidenced by differing bug topics and product-specific issues. Moreover, the word lists of the different topics show that software-based and hardware-based fragmentation within Android appears through incompatibility issues and portability issues. We also compared the performance of Labeled-LDA and LDA. We found that Labeled-LDA produced more feature-relevant topics than LDA. However, applying Labeled-LDA needs more effort than LDA. You can download our labeled dataset from the following link. Finally, our findings can be used for project dashboards, process mining and software process recovery.