Aristotle University of Thessaloniki
School of Computer Science - Master Studies - Spring Semester
Web Data Mining course - Instructor: Vakali Athena
Kouroupetroglou
Praxitelis Nikolaos
Master Student
Linked data and Graph properties
Overview
● A Graph Analysis of the Linked Data Cloud (2009)
Open Linked Data Graph Analysis (1)
● Connected Components
○ Without SCC, 31 WCC
○ Top WCC Sizes: DBpedia, DBLP
● Degree Correlations
○ In-degree vs. out-degree correlation and its
statistical significance
○ Degree assortativity: data sets tend to
connect to data sets with differing degrees.
● PageRank Data Centrality
○ Top central datasets: DBLP Berlin, DBLP
Hannover, DBpedia, KEGG, UniProt, GeneID
● Communities
○ Communities based on dataset content
○ Datasets with similar content exist in the
same structural area of the graph.
● Open Linked Data Cloud 2009 analysis
● Graph Construction
○ G = (V, E) model, directed graph
● General Statistics
○ #edges: 274, #vertices: 86
○ diameter: 10, avg path length: 3.916
● Degrees - Dataset references
○ In-degree - references
■ Top Nodes: DBpedia (14), DBLP (13),
ACM (10), GeneID (10), Geonames (10)
○ Out-degree - references
■ Top Nodes: DBpedia (17), DBLP (14),
ACM (10), CiteSeer (9), EPrints (9)
● Degree Distribution
○ On a log-log plot, a power-law distribution fits
with a = 1.496
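
The degree counts and the log-log fit listed above can be illustrated with a small sketch. It assumes a hypothetical toy edge list (not the 274 edges of the 2009 cloud) and uses a plain least-squares line on the log-log degree distribution as one simple way to estimate a power-law exponent. The slides do not say which fitting method produced a = 1.496, so the function names and the fitting choice here are illustrative assumptions only.

```python
import math
from collections import Counter

# Hypothetical toy edge list (source dataset -> target dataset).
# These are NOT the edges of the 2009 Linked Data cloud.
edges = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "A"), ("B", "C"),
    ("C", "A"), ("D", "A"), ("E", "A"),
]
vertices = sorted({v for edge in edges for v in edge})


def degree_tables(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg


def degree_distribution(degrees, vertices):
    """Map each degree k to the number of vertices that have degree k."""
    return Counter(degrees[v] for v in vertices)


def loglog_slope(distribution):
    """Least-squares slope of log(count) against log(degree).

    Degree 0 is skipped because log(0) is undefined. This is a rough
    estimator chosen for brevity, not the method used in the source.
    """
    points = [(math.log(k), math.log(n)) for k, n in distribution.items() if k > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


in_deg, out_deg = degree_tables(edges)
dist = degree_distribution(in_deg, vertices)
slope = loglog_slope(dist)

print("in-degree:", dict(in_deg))
print("out-degree:", dict(out_deg))
print("in-degree distribution:", dict(dist))
print("estimated exponent:", -slope)
```

On a real graph the fit would be run on all 86 vertices. With so few distinct degree values, a least-squares estimate of this kind is only indicative.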
Open Linked Data Graph Analysis (2)
Figure from Source [1]
● Visualization:
○ Vertex: Data Sets
○ Edge: Dataset Links
○ Vertex color: denotes the structural
community
● Conclusions:
○ Open Linked Data with RDF
technologies provides data that is
useful for reuse and distribution,
leading to a Web of Data
○ Graphs are becoming a flexible
representational data structure in
contrast to RDBMS tables
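
Two of the measures from the analysis slides, weakly connected components and PageRank centrality, can be sketched on a directed graph stored as a plain edge list. The toy edges below are hypothetical, and the damping factor 0.85 and the uniform handling of vertices without out-links are common conventions assumed here. The slides do not state the parameters used in the source.

```python
# Hypothetical toy graph (source dataset -> target dataset); not the 2009 cloud.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "A"), ("E", "F")]
vertices = sorted({v for edge in edges for v in edge})


def weakly_connected_components(vertices, edges):
    """Group vertices that are connected when edge direction is ignored."""
    neighbors = {v: set() for v in vertices}
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        # Depth-first traversal over the undirected view of the graph.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            v = stack.pop()
            component.add(v)
            for w in neighbors[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components


def pagerank(vertices, edges, damping=0.85, iterations=100):
    """Power-iteration PageRank; damping=0.85 is an assumed default."""
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices without out-links is spread uniformly.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in vertices}
        for src, targets in out_links.items():
            if not targets:
                continue
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


components = weakly_connected_components(vertices, edges)
rank = pagerank(vertices, edges)

print("weakly connected components:", [sorted(c) for c in components])
print("most central vertex:", max(rank, key=rank.get))
```

Because the graph is just a list of (source, target) pairs, new vertices or links can be added without changing any schema, which is the flexibility the conclusion contrasts with RDBMS tables.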
References
1. Rodriguez, M. A. (2009). A graph analysis of the linked data cloud. arXiv
preprint arXiv:0903.0194.
Thank you for listening
