ClassRank
Applied to Wikidata
Daniel Fernández Álvarez
Department of Computer Science
University of Oviedo
danifdezalvarez@gmail.com
SlideShare: https://es.slideshare.net/DanielFernndezlvarez1
Introduction
• What is ClassRank?
• An algorithm to measure class
relevance in RDF graphs.
• It assigns a score to each class and
produces a ranking.
• How does it calculate that score?
• Based on the centrality of the
instances of each class.
• How does it measure centrality?
• Using the PageRank algorithm.
Motivation (real case)
Thesis
Applications:
Improvement of social search engines
“Boost” for Linked Data
Dataset to explore knowledge dimensions (social, spatial, time…)
Model knowledge + Natural language in social media + Pattern recognition → New structured knowledge
Motivation (real case)
Thesis
Question:
Which topics are exchanged most between Wikidata and Twitter/Reddit?
Motivation (real case)
• Discovering relevant
topics in Wikidata:
PageRank
• Summary of the top:
• Human / social
products.
• Geopolitical
subdivisions /
countries.
• Biological taxonomies
1º human 13º Mexico
2º Taxon 14º Germany
3º Species 15º Russia
4º male 16º village
5º People's Republic of China 17º street
6º village-level division in China 18º association football
7º United States of America 19º Italy
8º album 20º France
9º human settlement 21º Sweden
10º United Kingdom 22º Poland
11º Netherlands 23º film
12º female 24º genus
PageRank
• The basis of Google:
Developed by S. Brin and L. Page for use in their
web search engine.
• Centrality measure:
Assigns each element a score that represents its relevance based on its
links with other elements.
• Directed graphs:
Originally designed for ranking web pages, it can be applied to any kind of directed
graph.
• Quantity and quality of links:
• Incoming links increase the score.
• Links from entities with high scores have a greater influence.
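The behaviour described on this slide can be sketched with a small power iteration in Python. This is a minimal sketch, not the implementation used for the Wikidata computation: the function name `pagerank`, the default damping value of 0.85, the fixed iteration count, and the uniform redistribution of score from nodes without outgoing links are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed graph given as (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Assumption: score held by nodes without outgoing links is spread uniformly.
        dangling = sum(scores[node] for node in nodes if not out_links[node])
        new_scores = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node passes its score on in equal shares to the nodes it links to,
        # so incoming links from high-scoring nodes contribute more.
        for node in nodes:
            for target in out_links[node]:
                new_scores[target] += damping * scores[node] / len(out_links[node])
        scores = new_scores
    return scores


# Toy graph: "c" receives links from both "a" and "b"; "b" receives none.
toy_scores = pagerank([("a", "c"), ("b", "c"), ("c", "a")])
```

In the toy graph, "b" has no incoming links and ends up with the lowest score, while "c", which is linked from two nodes, ends up with the highest.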
Motivation (real case)
… what now?
[Figure labels: People's Republic of China; Places; Sports; Music; People; Arts]
Motivation (real case)
1º human 13º Mexico
2º Taxon 14º Germany
3º Species 15º Russia
4º male 16º village
5º People's Republic of China 17º street
6º village-level division in China 18º association football
7º United States of America 19º Italy
8º album 20º France
9º human settlement 21º Sweden
10º United Kingdom 22º Poland
11º Netherlands 23º film
12º female 24º genus
Legend: classes; instances of country
Motivation (real case)
Classes instead of topics:
• Groupings of similar individuals: classes are hubs for
entities that share many characteristics (instances).
• Common interfaces: the instances can be queried with
SPARQL using shared properties (similar shape).
• Summarization: Class relevance helps to summarize the
content of a graph better than the relevance of specific
entities.
ClassRank
[Figure: the instances Hungary, Finland and Italy have PageRank scores 0.1, 0.3 and 0.2. The class Parliamentary republic has ClassRank 0.4 and the class Country has ClassRank 0.6; the PageRank scores of the two classes are elided (…).]
• PageRank-based. The ClassRank score
of a class is…
• The accumulated centrality
(PageRank score) of its instances.
• The chance of reaching one of its
instances while surfing the graph
randomly.
• Classpointers:
• Properties beyond instance of and
subclass of are also treated as
links between classes and their
instances/pseudo-instances.
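The accumulation described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the author's implementation: the function name `classrank`, the triple representation, and the choice to count each instance once per class are hypothetical, the thresholds are not modelled, and the assignment of Hungary and Finland to "parliamentary republic" is an assumption chosen so that the totals match the scores shown on this slide.

```python
from collections import defaultdict


def classrank(triples, classpointers, pagerank_scores):
    """Sum the PageRank score of every distinct instance pointing to each class."""
    instances_of = defaultdict(set)
    for subject, prop, obj in triples:
        # Only triples whose property is a classpointer link an instance to a class.
        if prop in classpointers:
            instances_of[obj].add(subject)
    return {
        cls: sum(pagerank_scores.get(instance, 0.0) for instance in instances)
        for cls, instances in instances_of.items()
    }


# Hypothetical triples; the PageRank scores are the ones shown on the slide.
triples = [
    ("Hungary", "P31", "country"),
    ("Finland", "P31", "country"),
    ("Italy", "P31", "country"),
    ("Hungary", "P122", "parliamentary republic"),
    ("Finland", "P122", "parliamentary republic"),
    ("Hungary", "P36", "Budapest"),  # P36 is not in the classpointer set: ignored
]
scores = {"Hungary": 0.1, "Finland": 0.3, "Italy": 0.2}
ranks = classrank(triples, {"P31", "P122"}, scores)
```

With these inputs, "country" accumulates roughly 0.6 and "parliamentary republic" roughly 0.4, and "Budapest" is not treated as a class because nothing points to it through a classpointer.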
Classpointers
Core properties
P31/ instance of
P279/ subclass of
Some other examples
P106/ occupation
P122/ basic form of government
P412/ voice type
P136/ genre
…
ClassRank
• Inputs:
• Graph.
• Set of classpointers.
• PageRank related params:
• Damping factor: controls the probability that a random surfer gets bored of
following links and jumps to a random node.
• Number of iterations (fixed for the computation over the Wikidata dump).
• Thresholds θI and θC:
• They are used to filter noisy triples in some stages of the algorithm.
• Outputs:
• PageRank scores.
• ClassRank scores.
• A matrix recording which classes are pointed to by which instances through
which classpointers.
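One way to picture the third output is as a sparse, nested mapping from class to instance to the set of classpointers used. The sketch below is only an assumption about a convenient representation: the slides do not specify the data structure, and the name `class_pointer_matrix` and the sample triples are hypothetical.

```python
def class_pointer_matrix(triples, classpointers):
    """Map each class to the instances pointing at it and the classpointers they use."""
    matrix = {}
    for subject, prop, obj in triples:
        if prop in classpointers:
            # matrix[class][instance] is the set of classpointers linking the two.
            matrix.setdefault(obj, {}).setdefault(subject, set()).add(prop)
    return matrix


# Hypothetical sample triples using properties listed on the Classpointers slide.
sample_triples = [
    ("Hungary", "P31", "country"),
    ("Italy", "P31", "country"),
    ("Hungary", "P122", "parliamentary republic"),
]
matrix = class_pointer_matrix(sample_triples, {"P31", "P279", "P122"})
```

A nested dictionary keeps only the cells that are actually populated, which matters when most instance and class pairs are unrelated.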
ClassRank Applied to Wikidata*
*Computed dump: 2016/10/16. Excluding Wikimedia special items from the results
1º country 13º male
2º member state of UN 14º member of the CoE
3º sovereign state 15º constitutional monarchy
4º taxon 16º male given name
5º person 17º village
6º common name 18º profession
7º class 19º species
8º taxonomic rank 20º state
9º genus 21º republic
10º human 22º admin. territ. of China
11º member state of EU 23º admin. territ. entity
12º federal republic 24º island nation
ClassRank vs PageRank of classes
• Different notions:
• PageRank: relevance
of the idea of the class
itself.
• ClassRank: aggregated
relevance of a group of
individuals with shared
characteristics,
represented by their
class.
1º human 13º Mexico
2º Taxon 14º Germany
3º Species 15º Russia
4º male 16º village
5º People's Republic of China 17º street
6º village-level division in China 18º association football
7º United States of America 19º Italy
8º album 20º France
9º human settlement 21º Sweden
10º United Kingdom 22º Poland
11º Netherlands 23º film
12º female 24º genus
…
1798º country
ClassRank vs instance counting
• Instance counting:
• Wikidata uses this measure:
https://www.wikidata.org/wiki/Wikidata:Statistics/en
• It yields a list of heavily populated classes:
• You can make queries involving many elements.
• ClassRank can achieve this by setting a high value of θI.
• It does not capture the relevance of classes that cannot have many
instances:
• Country
• Ball game
• …
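The contrast between the two measures can be illustrated with a toy example. All names and numbers below are invented for illustration and are not taken from the Wikidata results: a class with few but central instances ranks first by accumulated score, while a class with many peripheral instances ranks first by instance count.

```python
# Invented memberships and PageRank-like scores, for illustration only.
memberships = {
    "country": ["Italy", "France"],
    "village": ["v1", "v2", "v3", "v4", "v5"],
}
instance_scores = {
    "Italy": 0.30, "France": 0.25,
    "v1": 0.01, "v2": 0.01, "v3": 0.01, "v4": 0.01, "v5": 0.01,
}

# Instance counting: rank classes by how many instances they have.
by_count = sorted(memberships, key=lambda cls: len(memberships[cls]), reverse=True)

# ClassRank-style ranking: rank classes by the accumulated score of their instances.
by_score = sorted(
    memberships,
    key=lambda cls: sum(instance_scores[i] for i in memberships[cls]),
    reverse=True,
)
```

Here `by_count` places "village" first and `by_score` places "country" first, which mirrors the point made on this slide about classes that cannot have many instances.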
ClassRank “without classpointers”
• By using a set of classpointers formed by P31/ instance of and
P279/subclass of:
• We speed up the entire process: less discussion, fewer computations.
• We obtain relations of pure instantiation.
• We miss useful classes:
• Federal republic → P122/ basic form of government
• Female or male → P21/ sex or gender
• Politician → P106/ occupation
Differences between approaches
Rank  ClassRank           ClassRank           Instance counting  Instance counting
      |Pc| = 62           |Pc| = 2            |Pc| = 62          |Pc| = 2
1º    country             country             human              human
2º    member state of UN  member state of UN  male               taxon
3º    sovereign state     sovereign state     taxon              village of China
4º    taxon               taxon               species            street
5º    person              person              village of China   human settlement
6º    common name         common name         female             village
7º    class               class               politician         album
8º    taxonomic rank      taxonomic rank      street             film
9º    genus               human               human settlement   gene
10º   human               member state of EU  village            painting
|Pc| = 62 → Complete set of classpointers
|Pc| = 2 → Classpointers = {P31, P279}
Differences between approaches
[Chart: unshared elements between the top lists of ClassRank with |Pc| = 62 and other approaches, expressed as a decimal fraction (axis from 0.0 to 0.8), for TOP-100, TOP-500 and TOP-1000. Series: ClassRank |Pc| = 2, Instance counting |Pc| = 62, Instance counting |Pc| = 2.]
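A plausible reading of the "unshared elements" measure is the fraction of a reference top list that does not appear in another top list of the same length. The slides do not give the exact definition, so the function below is an assumption; the name `unshared_fraction` is hypothetical. The example lists are the top 10 entries from the comparison table in this deck.

```python
def unshared_fraction(reference_top, other_top):
    """Fraction of the reference top list that is missing from the other top list."""
    reference = set(reference_top)
    if not reference:
        raise ValueError("reference_top must not be empty")
    return len(reference - set(other_top)) / len(reference)


# Top 10 lists copied from the comparison table in this deck.
classrank_62 = [
    "country", "member state of UN", "sovereign state", "taxon", "person",
    "common name", "class", "taxonomic rank", "genus", "human",
]
classrank_2 = [
    "country", "member state of UN", "sovereign state", "taxon", "person",
    "common name", "class", "taxonomic rank", "human", "member state of EU",
]
counting_2 = [
    "human", "taxon", "village of China", "street", "human settlement",
    "village", "album", "film", "gene", "painting",
]

vs_classrank_2 = unshared_fraction(classrank_62, classrank_2)
vs_counting_2 = unshared_fraction(classrank_62, counting_2)
```

Under this definition, only "genus" is missing from the ClassRank |Pc| = 2 top 10, whereas only "taxon" and "human" are shared with the instance counting |Pc| = 2 top 10.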
Differences between approaches
[Chart: relative rank variations between the elements shared in the top lists of ClassRank with |Pc| = 62 and the top lists of other approaches, expressed as a decimal (axis from 0.0 to 3.5), for TOP-100, TOP-500 and TOP-1000. Series: ClassRank |Pc| = 2, Instance counting |Pc| = 62, Instance counting |Pc| = 2.]
ClassRank Online Demo
• Features:
• ClassRank computation for small graphs online.
• ClassRank overview.
• Access to the results of Wikidata computation.
• Access to ClassRank source code in Python
(prototype).
• URL of the online demo:
• http://boa.weso.es/
• URL of ClassRank repository:
• https://github.com/DaniFdezAlvarez/classrank
Conclusions
• ClassRank:
• An algorithm to measure class relevance in RDF graphs.
• PageRank-based.
• Online demo and source code available.
• Wikidata overview:
• Analysis with different approaches oriented to measure class relevance.
• Main classes: geopolitical divisions, human/human products, biological
taxonomies
• The ClassRank results over Wikidata are available online.

Presentation ClassRank WikidataCon 2017
