Presentation ClassRank WikidataCon 2017
1. ClassRank
Applied to Wikidata
Daniel Fernández Álvarez
Department of Computer Science
University of Oviedo
danifdezalvarez@gmail.com
SlideShare: https://es.slideshare.net/DanielFernndezlvarez1
2. Introduction
• What is ClassRank?
• An algorithm to measure class
relevance in RDF graphs.
• It assigns a score to each class and
produces a ranking.
• How does it calculate that score?
• From the centrality of the
instances of each class.
• How does it measure centrality?
• Using the PageRank algorithm.
3. Motivation (real case)
Thesis
Model knowledge: natural language in social media + pattern recognition → new structured knowledge
Applications:
• Improvement of social search engines
• “Boost” for Linked Data
• Dataset to explore knowledge dimensions (social, spatial, time…)
5. Motivation (real case)
• Discovering relevant
topics in Wikidata:
PageRank
• Summary of the top:
• Human / social
products.
• Geopolitical
subdivisions /
countries.
• Biological taxonomies
1º human 13º Mexico
2º Taxon 14º Germany
3º Species 15º Russia
4º male 16º village
5º People's Republic of China 17º street
6º village-level division in China 18º association football
7º United States of America 19º Italy
8º album 20º France
9º human settlement 21º Sweden
10º United Kingdom 22º Poland
11º Netherlands 23º film
12º female 24º genus
6. PageRank
• The basis of Google:
Developed by S. Brin and L. Page for use in their
web search engine.
• Centrality measure:
It assigns each element a score that represents its relevance according to its
links with other elements.
• Directed graphs:
Originally designed for ranking web pages, it can be applied to any kind of directed
graph.
• Quantity and quality of links:
• Incoming links increase the score.
• Links from entities with high scores have a greater influence.
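The two bullets on quantity and quality of links can be illustrated with a minimal power-iteration sketch. This is not the implementation used for the Wikidata computation; the function name `pagerank`, the toy graph, the damping value 0.85 and the iteration count are assumptions made for illustration only.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [target, ...]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Nodes without outgoing links spread their score over all nodes.
        dangling = sum(scores[v] for v in nodes if not links.get(v))
        new_scores = {
            v: (1.0 - damping) / n + damping * dangling / n for v in nodes
        }
        for source, targets in links.items():
            if not targets:
                continue
            # Each outgoing link carries an equal share of the source score.
            share = damping * scores[source] / len(targets)
            for target in targets:
                new_scores[target] += share
        scores = new_scores
    return scores


# Hypothetical toy graph: "a" and "b" link to "c", and "c" links to "d".
toy_links = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
scores = pagerank(toy_links)
```

In this toy graph "c" outranks "a" because it has more incoming links, and "d" outranks "a" because its single incoming link comes from a high-scoring node.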
8. Motivation (real case)
1º human 13º Mexico
2º Taxon 14º Germany
3º Species 15º Russia
4º male 16º village
5º People's Republic of China 17º street
6º village-level division in China 18º association football
7º United States of America 19º Italy
8º album 20º France
9º human settlement 21º Sweden
10º United Kingdom 22º Poland
11º Netherlands 23º film
12º female 24º genus
Legend: the ranking mixes classes with instances of country.
9. Motivation (real case)
Classes instead of topics:
• Groupings of similar individuals: classes are hubs for
entities that share many characteristics (instances).
• Common interfaces: the instances can be queried with
SPARQL using shared properties (similar shape).
• Summarization: Class relevance helps to summarize the
content of a graph better than the relevance of specific
entities.
10. ClassRank
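[Figure: the instances Hungary (PageRank: 0.1), Finland (PageRank: 0.3) and Italy (PageRank: 0.2) point to classes. Country has ClassRank: 0.6 and Parliamentary republic has ClassRank: 0.4; the PageRank values of the classes themselves are shown as "…".]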
• PageRank-based. The ClassRank score
of a class is…
• The accumulated centrality
(PageRank score) of its instances.
• The chance of reaching one of its
instances while surfing the graph
randomly.
• Classpointers:
• We consider properties beyond
instance of and subclass of as
linkers between classes and
instances/pseudo-instances.
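The definition above (accumulated PageRank of the instances reached through classpointers) can be sketched with the numbers from the slide's figure. Everything beyond those numbers is an assumption: the function name `classrank`, the triple layout, the choice of P122 as a third classpointer, and which two countries point to Parliamentary republic (Hungary and Finland are assumed because 0.1 + 0.3 = 0.4).

```python
from collections import defaultdict

# Illustrative subset of classpointers, not the set used in the talk.
CLASSPOINTERS = {"P31", "P279", "P122"}

# Hypothetical triples (subject, property, object).
triples = [
    ("Hungary", "P31", "Country"),
    ("Finland", "P31", "Country"),
    ("Italy", "P31", "Country"),
    ("Hungary", "P122", "Parliamentary republic"),
    ("Finland", "P122", "Parliamentary republic"),
    ("Italy", "P47", "Austria"),  # P47 is not a classpointer: ignored
]

# PageRank scores of the instances, as shown in the figure.
pagerank_scores = {"Hungary": 0.1, "Finland": 0.3, "Italy": 0.2}


def classrank(triples, pagerank_scores, classpointers):
    """Sum the PageRank of every instance that points to each class."""
    instances_of = defaultdict(set)
    for subject, prop, obj in triples:
        if prop in classpointers:
            # A set counts each instance once per class (an assumption).
            instances_of[obj].add(subject)
    return {
        cls: sum(pagerank_scores.get(inst, 0.0) for inst in instances)
        for cls, instances in instances_of.items()
    }


class_scores = classrank(triples, pagerank_scores, CLASSPOINTERS)
```

Up to floating-point rounding, Country accumulates 0.6 and Parliamentary republic 0.4, matching the figure; Austria gets no score because no classpointer leads to it.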
12. ClassRank
• Inputs:
• Graph.
• Set of classpointers.
• PageRank related params:
• Damping factor: sets the probability that a random surfer keeps following links
instead of getting bored and jumping to a random node.
• Iterations (a fixed number when we computed Wikidata’s dump).
• Thresholds θI and θC :
• They are used to filter noisy triples in some stages of the algorithm.
• Outputs:
• PageRank scores.
• ClassRank scores.
• A matrix recording which instances point to which classes through
which classpointers.
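The third output can be pictured as a nested mapping from class to instance to classpointers. The sketch below is only one possible shape for it. The name `build_class_matrix` and the toy triples are hypothetical, and treating θI as a minimum number of instances per class is an assumption, since the slides only say that the thresholds filter noisy triples; θC is left out for the same reason.

```python
from collections import defaultdict


def build_class_matrix(triples, classpointers, theta_i=1):
    """Map each class to its instances and the classpointers linking them.

    theta_i is treated here as a minimum number of instances per class;
    this reading is an assumption, not a definition taken from the slides.
    """
    matrix = defaultdict(lambda: defaultdict(set))
    for subject, prop, obj in triples:
        if prop in classpointers:
            matrix[obj][subject].add(prop)
    # Keep only the classes that reach the assumed instance threshold.
    return {
        cls: {inst: set(props) for inst, props in instances.items()}
        for cls, instances in matrix.items()
        if len(instances) >= theta_i
    }


# Hypothetical toy triples (subject, property, object).
toy_triples = [
    ("Douglas Adams", "P31", "human"),
    ("Douglas Adams", "P21", "male"),
    ("Ada Lovelace", "P31", "human"),
    ("Ada Lovelace", "P21", "female"),
]
matrix = build_class_matrix(toy_triples, {"P31", "P279", "P21"}, theta_i=2)
```

With `theta_i=2` only human survives, because male and female each have a single instance in this toy input.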
13. ClassRank Applied to Wikidata*
*Dump computed: 2016/10/16. Wikimedia special items are excluded from the results.
1º country 13º male
2º member state of UN 14º member of the Council of Europe
3º sovereign state 15º constitutional monarchy
4º taxon 16º male given name
5º person 17º village
6º common name 18º profession
7º class 19º species
8º taxonomic rank 20º state
9º genus 21º republic
10º human 22º admin. territ. of China
11º member state of EU 23º admin. territ. entity
12º federal republic 24º island nation
14. ClassRank vs PageRank of classes
• Different notions:
• PageRank: relevance
of the idea of the class
itself.
• ClassRank: aggregated
relevance of a group of
individuals with shared
characteristics,
represented by their
class.
1º human 13º Mexico
2º Taxon 14º Germany
3º Species 15º Russia
4º male 16º village
5º People's Republic of China 17º street
6º village-level division in China 18º association football
7º United States of America 19º Italy
8º album 20º France
9º human settlement 21º Sweden
10º United Kingdom 22º Poland
11º Netherlands 23º film
12º female 24º genus
…
1798º country
15. ClassRank vs instance counting
• Instance counting:
• Wikidata uses this measure:
https://www.wikidata.org/wiki/Wikidata:Statistics/en
• It yields a list of heavily populated classes:
• These support queries that involve many elements.
• ClassRank can achieve this by setting a high value of θI.
• It does not capture the relevance of classes that cannot have many
instances:
• Country
• Ball game
• …
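The contrast between the two measures can be sketched on a toy input. All names and scores below are invented for illustration and are not Wikidata results: a class with a few central instances (country) is compared with a class with many peripheral ones (street).

```python
from collections import Counter, defaultdict

# Hypothetical PageRank scores of instances (not real Wikidata values).
instance_scores = {
    "Germany": 0.3,
    "France": 0.2,
    "Street A": 0.01,
    "Street B": 0.01,
    "Street C": 0.01,
    "Street D": 0.01,
}

# Hypothetical class of each instance.
class_of = {
    "Germany": "country",
    "France": "country",
    "Street A": "street",
    "Street B": "street",
    "Street C": "street",
    "Street D": "street",
}

# Instance counting: rank classes by how many instances they have.
counts = Counter(class_of.values())
by_count = [cls for cls, _ in counts.most_common()]

# Accumulated centrality: rank classes by the summed score of their instances.
accumulated = defaultdict(float)
for instance, cls in class_of.items():
    accumulated[cls] += instance_scores[instance]
by_score = sorted(accumulated, key=accumulated.get, reverse=True)
```

Counting puts street first (4 instances against 2), while the accumulated score puts country first, which is the effect described for classes that cannot have many instances.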
16. ClassRank “without classpointers”
• Using only P31 (instance of) and P279 (subclass of) as
classpointers:
• We speed up the entire process: less discussion, fewer computations.
• We obtain relations of pure instantiation.
• We miss useful classes:
• Federal republic, reached through P122 (basic form of government).
• Female or male, reached through P21 (sex or gender).
• Politician, reached through P106 (occupation).
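A small sketch of what restricting the classpointers does to the set of detected classes. The helper `detected_classes` and the toy triples are hypothetical; only the property identifiers come from the slide.

```python
def detected_classes(triples, classpointers):
    """Return every object reached through one of the classpointers."""
    return {obj for _, prop, obj in triples if prop in classpointers}


# Hypothetical toy triples (subject, property, object).
toy_triples = [
    ("Germany", "P31", "country"),
    ("Germany", "P122", "federal republic"),
    ("Marie Curie", "P31", "human"),
    ("Marie Curie", "P21", "female"),
]

minimal = detected_classes(toy_triples, {"P31", "P279"})
extended = detected_classes(toy_triples, {"P31", "P279", "P122", "P21"})

# Classes that only show up with the extended set of classpointers.
missed = extended - minimal
```

Here `missed` contains federal republic and female, the kind of classes that the two-property configuration cannot see.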
17. Differences between approaches
Rank  ClassRank |Pc| = 62   ClassRank |Pc| = 2    Instance counting |Pc| = 62   Instance counting |Pc| = 2
1º    country               country               human                         human
2º    member state of UN    member state of UN    male                          taxon
3º    sovereign state       sovereign state       taxon                         village of China
4º    taxon                 taxon                 species                       street
5º    person                person                village of China              human settlement
6º    common name           common name           female                        village
7º    class                 class                 politician                    album
8º    taxonomic rank        taxonomic rank        street                        film
9º    genus                 human                 human settlement              gene
10º   human                 member state of EU    village                       painting
|Pc| = 62: complete set of classpointers
|Pc| = 2: classpointers = {P31, P279}
18. Differences between approaches
Unshared elements between top lists of ClassRank with |Pc| = 62 and other approaches
[Bar chart: y-axis in decimal from 0.0 to 0.8; x-axis groups TOP-100, TOP-500, TOP-1000; series: ClassRank |Pc| = 2, Instance counting |Pc| = 62, Instance counting |Pc| = 2.]
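One plausible reading of "unshared elements" is the fraction of the reference top list that is missing from the other list. The sketch below assumes that definition (the slides do not state the exact formula) and applies it to the top-10 lists from the comparison table rather than to the TOP-100/500/1000 lists of the chart; `unshared_fraction` is a hypothetical name.

```python
def unshared_fraction(reference_top, other_top):
    """Fraction of reference_top entries that do not appear in other_top."""
    reference = set(reference_top)
    if not reference:
        raise ValueError("reference_top must not be empty")
    return len(reference - set(other_top)) / len(reference)


# Top-10 lists copied from the comparison table.
classrank_62 = [
    "country", "member state of UN", "sovereign state", "taxon", "person",
    "common name", "class", "taxonomic rank", "genus", "human",
]
classrank_2 = [
    "country", "member state of UN", "sovereign state", "taxon", "person",
    "common name", "class", "taxonomic rank", "human", "member state of EU",
]
counting_62 = [
    "human", "male", "taxon", "species", "village of China",
    "female", "politician", "street", "human settlement", "village",
]

vs_classrank_2 = unshared_fraction(classrank_62, classrank_2)
vs_counting_62 = unshared_fraction(classrank_62, counting_62)
```

Under this assumed definition, the top 10 of ClassRank with |Pc| = 62 differs from ClassRank with |Pc| = 2 in one element out of ten (genus) and from instance counting with |Pc| = 62 in eight out of ten.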
19. Differences between approaches
Relative rank variations between the elements shared in the top lists of
ClassRank with |Pc| = 62 and the top lists of the other approaches
[Bar chart: y-axis in decimal from 0.0 to 3.5; x-axis groups TOP-100, TOP-500, TOP-1000; series: ClassRank |Pc| = 2, Instance counting |Pc| = 62, Instance counting |Pc| = 2.]
20. ClassRank Online Demo
• Features:
• ClassRank computation for small graphs online.
• ClassRank overview.
• Access to the results of Wikidata computation.
• Access to ClassRank source code in Python
(prototype).
• URL of the online demo:
• http://boa.weso.es/
• URL of ClassRank repository:
• https://github.com/DaniFdezAlvarez/classrank
21. Conclusions
• ClassRank:
• An algorithm to measure class relevance in RDF graphs.
• PageRank-based.
• Online demo and source code available.
• Wikidata overview:
• Analysis with different approaches oriented to measure class relevance.
• Main classes: geopolitical divisions, human/human products, biological
taxonomies
• The ClassRank results over Wikidata are available online.