1. Mobipedia
Building a Mobile Applications Knowledge Base for the Linked Data Cloud
Primal Pappachan, Roberto Yus, Prajit Kumar Das,
Sharad Mehrotra, Tim Finin, and Anupam Joshi
bit.ly/MPSlides
7. Our Goal
Build a Knowledge Base for apps
1. Create an ontology for apps
2. Identify interesting sources
3. Add semantics to the information
4. Interlink the KB with others
5. Establish access mechanisms
12. Current Sources
1. PlayDrone, a scalable Google Play Store crawler: includes 1.1M apps (Viennot et al. 2014)
2. PrivacyGrade, a website that assigns privacy grades to apps
3. Android permissions model: includes 152 official permissions used by Android apps
16. Extraction Process
• Download the data
o JSON files (PlayDrone)
o HTML files (PrivacyGrade, Android Permissions)
o Crawler: crawler4j
• Parse the data
o JSON: GSON
o HTML: jsoup
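The parsing step can be sketched with Python's standard library as a stand-in for GSON and jsoup. This is a minimal sketch under assumptions: the JSON field names (app_id, title, permissions) are hypothetical and may not match the PlayDrone schema, and the HTML parser assumes permission names appear inside <code> elements, which is a guess about the page layout.

```python
import json
from html.parser import HTMLParser

# Hypothetical PlayDrone-style record; the real field names may differ.
SAMPLE_JSON = ('{"app_id": "com.example.flashlight", "title": "Flashlight", '
               '"permissions": ["android.permission.CAMERA"]}')


def parse_app_json(text):
    """Parse one JSON app record into a plain dict with the fields used later."""
    record = json.loads(text)
    return {
        "app_id": record["app_id"],
        "title": record.get("title", ""),
        "permissions": list(record.get("permissions", [])),
    }


class PermissionNameParser(HTMLParser):
    """Collect the text of every <code> element (assumed to hold permission names)."""

    def __init__(self):
        super().__init__()
        self._in_code = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self._in_code = True

    def handle_endtag(self, tag):
        if tag == "code":
            self._in_code = False

    def handle_data(self, data):
        # Keep only non-blank text that appears inside a <code> element.
        if self._in_code and data.strip():
            self.names.append(data.strip())


def parse_permission_names(html):
    parser = PermissionNameParser()
    parser.feed(html)
    parser.close()
    return parser.names


if __name__ == "__main__":
    print(parse_app_json(SAMPLE_JSON))
    print(parse_permission_names("<ul><li><code>CAMERA</code></li></ul>"))
```

Each parsed dict can then be handed to the semantic labeling step.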
17. Semantic Labeling
• Match data with entities
o Mobipedia’s ontology
o Custom code
• Create RDF data
o We used the OWL API
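The OWL API is a Java library; as a language-neutral illustration, the sketch below writes the same kind of RDF as N-Triples text in plain Python. The namespaces and the property names (description, hasVersion, hasPermission) are hypothetical, chosen to mirror the app/version/permission structure of the SPARQL query slide, not taken from the actual Mobipedia ontology.

```python
# Hypothetical namespaces: the real Mobipedia ontology and resource IRIs may differ.
MP = "http://example.org/mobipedia/ontology#"
RES = "http://example.org/mobipedia/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Build an N-Triples string literal, escaping backslash, quote, LF and CR."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def app_to_ntriples(app):
    """Turn one parsed app record into N-Triples: app -> version -> permissions."""
    app_node = iri(RES + app["app_id"])
    version_node = iri(f"{RES}{app['app_id']}/version/{app['version']}")
    lines = [
        f"{app_node} {iri(RDF_TYPE)} {iri(MP + 'App')} .",
        f"{app_node} {iri(MP + 'description')} {literal(app['description'])} .",
        f"{app_node} {iri(MP + 'hasVersion')} {version_node} .",
    ]
    for permission in app["permissions"]:
        lines.append(f"{version_node} {iri(MP + 'hasPermission')} {iri(RES + permission)} .")
    return "\n".join(lines)
```

Writing N-Triples keeps the example dependency-free; the resulting lines can be bulk-loaded into a triple store or concatenated into an RDF dump.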
20. Linking with DBpedia
• Why not other KBs (Xpedia)?
o We couldn't find other KBs describing apps
• Found two DBpedia categories related to apps
o Android_(operating_system)_software: 409
o Mobile_software: 221
• Filtered for duplicates and identified 600 entities
• Linked using the owl:sameAs property
o Retrieve a list of candidate links based on the entity's name
o Manually select the appropriate ones
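The linking workflow can be approximated as: merge the category members, drop duplicates, propose candidates by name, and emit owl:sameAs triples only for pairs a human has approved. This is a sketch under assumptions: the category member lists are supplied as plain lists (fetching them from DBpedia is not shown), matching is a simple normalized-name comparison standing in for whatever lookup the authors used, and the sample IRIs in the test are illustrative only.

```python
import re

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(name):
    """Lower-case and collapse runs of non-alphanumerics to one space,
    so 'Angry_Birds' and 'Angry Birds' compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def merge_categories(*categories):
    """Union DBpedia resource IRIs from several categories, keeping the
    first occurrence of each IRI (dicts preserve insertion order)."""
    merged = {}
    for members in categories:
        for resource in members:
            merged.setdefault(resource, None)
    return list(merged)


def candidate_links(app_names, dbpedia_iris):
    """Map each app IRI to the DBpedia IRIs whose local name matches the app name."""
    by_name = {}
    for resource in dbpedia_iris:
        local_name = resource.rsplit("/", 1)[-1]
        by_name.setdefault(normalize(local_name), []).append(resource)
    return {app: by_name.get(normalize(name), []) for app, name in app_names.items()}


def same_as_triples(approved_pairs):
    """Emit owl:sameAs N-Triples only for manually approved (app, dbpedia) pairs."""
    return [f"<{app}> <{OWL_SAME_AS}> <{dbp}> ." for app, dbp in approved_pairs]
```

DBpedia names often carry disambiguation suffixes such as _(video_game), which exact name matching misses; that is one reason to keep the final selection manual.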
21. Benefits
• Centralized repository with different sources
• Common format for representation of Apps
• Derive inferences from different sources of related
information
24. SPARQL Query Support
• SELECT App
• WHERE
o App description contains “flashlight” and
o App has version Version
o Version has Permission
• Group by App and order by Permissions
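One possible SPARQL 1.1 rendering of the pseudo-query above, embedded in Python together with an in-memory equivalent that makes its semantics explicit. Assumptions: the mp: prefix and property names are hypothetical, the text match is made case-insensitive with LCASE, and "order by Permissions" is read as ordering by the number of distinct permissions, most first.

```python
from collections import defaultdict

# Hypothetical prefix and properties; the real Mobipedia IRIs may differ.
QUERY = """\
PREFIX mp: <http://example.org/mobipedia/ontology#>
SELECT ?app (COUNT(DISTINCT ?permission) AS ?numPermissions)
WHERE {
  ?app mp:description ?description ;
       mp:hasVersion ?version .
  ?version mp:hasPermission ?permission .
  FILTER(CONTAINS(LCASE(?description), "flashlight"))
}
GROUP BY ?app
ORDER BY DESC(?numPermissions)
"""


def run_locally(descriptions, versions, permissions, keyword="flashlight"):
    """In-memory equivalent of QUERY.

    descriptions: app -> description text
    versions:     app -> list of version ids
    permissions:  version id -> list of permission names
    """
    per_app = defaultdict(set)
    for app, text in descriptions.items():
        if keyword not in text.lower():
            continue
        for version in versions.get(app, []):
            per_app[app].update(permissions.get(version, []))
    # Apps with no permissions never match the triple patterns, so drop them.
    ranked = [(app, len(perms)) for app, perms in per_app.items() if perms]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

Counting DISTINCT permissions avoids double-counting a permission that several versions of the same app request.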
27. Semantic Search
• Support searches like
o Superhero games with parental control
o Todo list with location reminder
• Translate search terms to SPARQL queries (Han et al.)
• Execute the converted queries on Mobipedia to obtain results
• Convert the RDF results into an App store compatible format
Lushan Han, Tim Finin, and Anupam Joshi. 2011. GoRelations: an intuitive query system for DBpedia. In Proceedings of the 2011 Joint International Conference on The Semantic Web (JIST'11).
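To give a feel for the translation step, here is a deliberately naive keyword-to-SPARQL translator. It is not GoRelations (Han et al.), which interprets the relationships between query terms far more intelligently; the lexicon, stop-word list, prefix and property names are all hypothetical.

```python
import re

# Hypothetical lexicon: which search words name an app category.
CATEGORY_WORDS = {"game": "mp:Games", "games": "mp:Games"}
# Words that carry no meaning for the query.
STOP_WORDS = {"a", "an", "and", "the", "with", "for"}


def search_to_sparql(phrase):
    """Naive translation: category words become a category triple pattern,
    every other content word becomes a case-insensitive description filter."""
    # Keeping only [a-z0-9] tokens also prevents quote injection into the query.
    words = [w for w in re.findall(r"[a-z0-9]+", phrase.lower()) if w not in STOP_WORDS]
    patterns = ["?app mp:description ?description ."]
    filters = []
    for word in words:
        if word in CATEGORY_WORDS:
            patterns.append(f"?app mp:category {CATEGORY_WORDS[word]} .")
        else:
            filters.append(f'CONTAINS(LCASE(?description), "{word}")')
    body = "\n  ".join(patterns)
    if filters:
        body += "\n  FILTER(" + " && ".join(filters) + ")"
    return (
        "PREFIX mp: <http://example.org/mobipedia/ontology#>\n"
        "SELECT DISTINCT ?app WHERE {\n  " + body + "\n}"
    )
```

For "Superhero games with parental control" this yields one category pattern plus three description filters, which shows why a smarter interpretation of terms like "parental control" is needed.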
28. Recommendation
Apps
• Requires user history of app usage
• Augment with user context
App Permissions
• Verify permissions and third-party libraries used by similar apps
• Evolution of a privacy guideline
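One way to verify permissions against similar apps is to flag permissions that an app requests but few of its peers request. The sketch assumes the set of similar apps is already known (for example, apps in the same category) and uses a hypothetical 20% threshold; neither choice comes from the slides.

```python
def unusual_permissions(app_permissions, similar_apps_permissions, threshold=0.2):
    """Return, sorted, the app's permissions that fewer than `threshold`
    (a fraction) of the similar apps also request."""
    if not similar_apps_permissions:
        # No peers to compare against: every permission is unverified.
        return sorted(app_permissions)
    total = len(similar_apps_permissions)
    flagged = []
    for permission in sorted(app_permissions):
        share = sum(permission in peer for peer in similar_apps_permissions) / total
        if share < threshold:
            flagged.append(permission)
    return flagged
```

The same comparison could be run over third-party libraries instead of permissions.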
29. Policy Representation
• Use rule languages to represent policies about apps
• Capture user preferences about apps and what data they can access
• Leverage Mobipedia to represent concepts and obtain data for policies
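As a rough illustration of such a policy, the rule below states "these permissions are not acceptable, except for apps in these categories" and checks it against app data of the kind Mobipedia holds. The rule shape and field names are hypothetical; a rule language would express the same condition declaratively rather than in Python.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """One user preference: forbid these permissions, except for apps
    in the exempt categories (hypothetical rule shape)."""
    forbidden_permissions: set
    exempt_categories: set = field(default_factory=set)


def violations(policy, app):
    """Return the forbidden permissions this app requests (empty set if none)."""
    if app["category"] in policy.exempt_categories:
        return set()
    return set(app["permissions"]) & policy.forbidden_permissions
```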
30. Others
• Linking application user experiences
o Capture user experiences during app usage and link them to the app entity
• Mining app reviews
o Link concepts from app reviews to the apps themselves, capturing user sentiment in the knowledge base
• Saving the world one App at a time ;)
31. Where do we go now?
• Include more data sources
o Android Malware Genome Project
o BlueSeal project
o Amazon.com, GetJar, Apple App Store, Baidu store, Tencent App Gem
• Community participation
o User submissions
o Moderation
o Tools for easy contribution
32. Takeaways
• Mobipedia is the Knowledge Base of mobile apps
• Current version has information on ~1 million apps
• Three Access Mechanisms
o Linked Data Interface
o SPARQL endpoint
o RDF Dumps
Thank you, NSF, for the travel grant
Go to http://mobipedia.link today
If there's something strange with the semantics, who you gonna call?
Dublin Core Metadata Initiative (DCMI) and Description of a Project (DOAP) are vocabularies used to describe web resources and software projects, respectively. Since neither of them is focused on mobile development, their concepts and properties didn't match the requirements for modeling mobile apps. We subclassed some of the terms in the DCMI ontology using rdfs:subClassOf to define the terms in the Mobipedia ontology.
The Google Play Store has no public API that we could use for crawling, so we looked for a public dataset of app metadata that someone had already released.
152 official permissions. Other custom permissions can be included too.
Why OWL API? Reuse of code developed for other projects
Set up the SPARQL endpoint using the OpenVirtuoso project
We generated the Linked Data interface for the SPARQL endpoint using the Pubby project. Pubby is a Java web application that makes non-dereferenceable URIs dereferenceable by querying the SPARQL endpoint on their behalf.
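The core idea behind such a Linked Data interface can be sketched as mapping a requested web URI to a DESCRIBE query sent to the SPARQL endpoint through the SPARQL 1.1 Protocol's query parameter. This is not Pubby's code: all three base addresses are hypothetical placeholders, and Pubby itself also handles content negotiation and HTML rendering, which are omitted here.

```python
from urllib.parse import urlencode

# Hypothetical addresses; the deployed Mobipedia URIs and endpoint may differ.
WEB_BASE = "http://example.org/resource/"
DATASET_BASE = "http://example.org/dataset/"
ENDPOINT = "http://example.org/sparql"


def describe_url(requested_url):
    """Rewrite a public resource URL into a GET request URL that asks the
    SPARQL endpoint to DESCRIBE the corresponding dataset IRI."""
    if not requested_url.startswith(WEB_BASE):
        raise ValueError(f"not served by this interface: {requested_url}")
    local_name = requested_url[len(WEB_BASE):]
    query = f"DESCRIBE <{DATASET_BASE}{local_name}>"
    return f"{ENDPOINT}?{urlencode({'query': query})}"
```

Separating the web base from the dataset base lets the public URIs stay dereferenceable even when the IRIs stored in the triple store live under a different namespace.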
Can you see the Mobipedia circle? Are you looking closely?
Execute the query on the website and show the results (live demo)
Android Malware Genome Project (one of the first): studied the genetic makeup of Android malware; collected 1200 apps, classified them into malware families, and identified what these apps do at various points of their lifetime
BlueSeal: proposed a different permission model for apps, analyzed the flow of permissions or information, and classified apps into flow categories