SlideShare a Scribd company logo
1 of 17
Download to read offline
Discovering	
  Related	
  Data	
  Sources	
  	
  
in	
  Data	
  Portals	
  
	
  
Andreas	
  Wagner,	
  Peter	
  Haase,	
  	
  
Achim	
  Re4nger,	
  Holger	
  Lamm	
  

1st	
  Interna:onal	
  Workshop	
  on	
  Seman:c	
  Sta:s:cs	
  
Sydney,	
  Oct	
  22,	
  2013	
  

	
  
Poten&al	
  of	
  Open	
  (Sta&s&cs)	
  Data	
  

WORLD BANK
fluidOps	
  Open	
  Data	
  Portal	
  
•  Data	
  collec&on	
  

•  Integra&on	
  of	
  major	
  open	
  data	
  catalogs	
  
•  Automated	
  provisioning	
  of	
  10.000s	
  data	
  sets	
  

•  Portal	
  for	
  search	
  and	
  explora&on	
  of	
  data	
  sets	
  
•  Rich	
  metadata	
  based	
  on	
  open	
  standards	
  
•  Both	
  descrip&ve	
  and	
  structural	
  metadata	
  

•  Integrated	
  querying	
  across	
  interlinked	
  data	
  sets	
  
•  Easy	
  to	
  use	
  queries	
  against	
  mul&ple	
  data	
  sets	
  
•  Using	
  federa&on	
  technologies	
  

•  Self-­‐service	
  UI	
  

•  Custom	
  queries	
  and	
  visualiza&ons	
  
•  Widgets,	
  dashboarding,	
  etc.	
  

WORLD BANK
Finding	
  Related	
  Data	
  Sets	
  
•  Many	
  informa&on	
  needs	
  require	
  analysis	
  of	
  mul&ple	
  data	
  sets	
  
•  Example:	
  Compare	
  and	
  correlate	
  GDP,	
  popula&on	
  and	
  public	
  debt	
  
of	
  countries	
  over	
  &me	
  
•  Task	
  of	
  finding	
  related	
  data	
  sets	
  

•  Iden&fy	
  data	
  sets	
  that	
  are	
  similar,	
  but	
  complementary	
  
•  To	
  support	
  queries	
  across	
  mul&ple	
  data	
  sets,	
  e.g.	
  in	
  the	
  form	
  of	
  joins	
  
and	
  unions	
  

•  Inspira&on:	
  Finding	
  related	
  tables	
  

•  En&ty	
  complement:	
  same	
  aVributes,	
  complemen&ng	
  en&&es	
  
•  Schema	
  complement:	
  same	
  en&&es,	
  complemen&ng	
  aVributes	
  
Finding	
  Related	
  Data	
  Sources	
  
via	
  Related	
  En&&es	
  
•  Data	
  Model:	
  Data	
  source	
  is	
  a	
  set	
  of	
  mul&ple	
  
RDF	
  graphs	
  
•  Intui&on:	
  if	
  data	
  sources	
  contain	
  similar	
  
en&&es,	
  they	
  are	
  somehow	
  related	
  
Cluster	
  2	
  
Cluster	
  1	
  
•  Approach:	
  
En&&es	
  

1.  En&ty	
  Extrac&on	
  
2.  En&ty	
  Similarity	
  
3.  En&ty	
  Clustering	
  

Related?!	
  

Source	
  1	
  

Source	
  3	
  
Source	
  2	
  
Related	
  En&&es	
  (2)	
  
1.  En&ty	
  Extrac&on	
  

–  Sample	
  over	
  en&&es	
  in	
  data	
  graphs	
  in	
  D	
  
–  For	
  each	
  en&ty	
  crawl	
  its	
  surrounding	
  sub-­‐graph	
  [1]	
  

2.  En&ty	
  Similarity	
  

–  Define	
  dissimilarity	
  measure	
  between	
  two	
  en&&es	
  
based	
  on	
  kernel	
  func&ons	
  
–  Compare	
  en&ty	
  structure	
  and	
  literals	
  via	
  different	
  
kernels	
  [2,3]	
  

3.  En&ty	
  Clustering	
  

–  Apply	
  k-­‐means	
  clustering	
  to	
  discover	
  similar	
  	
  
	
  en&&es	
  [4]	
  
Contextualisa&on	
  Score	
  
•  Contextualiza&on	
  score	
  for	
  data	
  source	
  D’’	
  
given	
  D’:	
  ec(D’’|D’)	
  and	
  sc(D’’|D’)	
  
•  En*ty	
  complement	
  score	
  

•  Schema	
  complement	
  score	
  
Search	
  for	
  Gross	
  Domes&c	
  Product	
  
Querying	
  the	
  Data	
  Set	
  
Visualizing	
  the	
  Results	
  
Queries	
  Across	
  Related	
  Data	
  Sets	
  
•  Query	
  for	
  GDP	
  of	
  Germany	
  
•  Union	
  of	
  results	
  from	
  	
  
•  Worldbank:	
  GDP	
  (current	
  US$	
  )	
  (up	
  to	
  2010)	
  
•  Eurostat:	
  GDP	
  at	
  Market	
  Prices	
  (including	
  projected	
  values	
  un&l	
  2014)	
  
Queries	
  Across	
  Related	
  Data	
  Sets	
  

Data	
  from	
  Worldbank	
  

Data	
  from	
  Eurostat	
  
Summary	
  and	
  Outlook	
  
•  Techniques	
  for	
  finding	
  related	
  data	
  sets	
  
–  Based	
  on	
  finding	
  related	
  en&&es	
  

•  Implementa&on	
  available	
  in	
  open	
  data	
  portal	
  
•  Outlook	
  

–  Finding	
  relevant	
  related	
  data	
  sources	
  for	
  a	
  given	
  
informa&on	
  need	
  
–  End	
  user	
  interfaces	
  for	
  formula&ng	
  queries	
  	
  
across	
  data	
  sets	
  (see	
  Op&que	
  project)	
  
–  Operators	
  for	
  combining	
  data	
  cubes	
  
–  Interac&ve	
  visualiza&on	
  and	
  explora&on	
  of	
  	
  
combined	
  data	
  cubes	
  (see	
  OpenCube	
  project)	
  
References	
  
[1]	
   	
  G.	
  A.	
  Grimnes,	
  P.	
  Edwards,	
  and	
  A.	
  Preece.	
  
	
  Instance	
  based	
  clustering	
  of	
  seman:c	
  web	
  
	
  resources.	
  In	
  ESWC,	
  2008.	
  
[2] 	
  U.	
  Lösch,	
  S.	
  Bloehdorn,	
  and	
  A.	
  Reenger.	
  
	
  Graph	
  kernels	
  for	
  RDF	
  data.	
  In	
  ESWC,	
  2012.	
  
[3] 	
  J.	
  Shawe-­‐Taylor	
  and	
  N.	
  Cris&anini.	
  Kernel	
  
	
  Methods	
  for	
  PaPern	
  Analysis.	
  2004.	
  
[4]	
   	
  R.	
  Zhang	
  and	
  A.	
  Rudnicky.	
  A	
  large	
  scale	
  
	
  clustering	
  scheme	
  for	
  kernel	
  k-­‐means.	
  In	
  
	
  PaVern	
  Recogni&on,	
  2002.	
  
	
  
	
  

More Related Content

What's hot

Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Michele Pasin
 
Sören Auer | Enterprise Knowledge Graphs
Sören Auer | Enterprise Knowledge GraphsSören Auer | Enterprise Knowledge Graphs
Sören Auer | Enterprise Knowledge Graphssemanticsconference
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphIoan Toma
 
A distributed network of digital heritage information - Semantics Amsterdam
A distributed network of digital heritage information - Semantics AmsterdamA distributed network of digital heritage information - Semantics Amsterdam
A distributed network of digital heritage information - Semantics AmsterdamEnno Meijers
 
Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the webChiara Del Vescovo
 
DSpace standard Data model and DSpace-CRIS
DSpace standard Data model and DSpace-CRISDSpace standard Data model and DSpace-CRIS
DSpace standard Data model and DSpace-CRISAndrea Bollini
 
Making Use of the Linked Open Data Services for OpenAIRE (DI4R 2016 tutorial ...
Making Use of the Linked Open Data Services for OpenAIRE (DI4R 2016 tutorial ...Making Use of the Linked Open Data Services for OpenAIRE (DI4R 2016 tutorial ...
Making Use of the Linked Open Data Services for OpenAIRE (DI4R 2016 tutorial ...OpenAIRE
 
6.15.17 DSpace-Cris Webinar Presentation Slides
6.15.17 DSpace-Cris Webinar Presentation Slides6.15.17 DSpace-Cris Webinar Presentation Slides
6.15.17 DSpace-Cris Webinar Presentation SlidesDuraSpace
 
TIB AV-Portal: Semantic Content Mining with Semi-Automatic Metadata Editing. ...
TIB AV-Portal: Semantic Content Mining with Semi-Automatic Metadata Editing. ...TIB AV-Portal: Semantic Content Mining with Semi-Automatic Metadata Editing. ...
TIB AV-Portal: Semantic Content Mining with Semi-Automatic Metadata Editing. ...LIBER Europe
 
DSpace-CRIS: a CRIS enhanced repository platform
DSpace-CRIS: a CRIS enhanced repository platformDSpace-CRIS: a CRIS enhanced repository platform
DSpace-CRIS: a CRIS enhanced repository platformAndrea Bollini
 
Session 1.6 slovak public metadata governance and management based on linke...
Session 1.6   slovak public metadata governance and management based on linke...Session 1.6   slovak public metadata governance and management based on linke...
Session 1.6 slovak public metadata governance and management based on linke...semanticsconference
 
Linked Data efforts for data standards in biopharma and healthcare
Linked Data efforts for data standards in biopharma and healthcareLinked Data efforts for data standards in biopharma and healthcare
Linked Data efforts for data standards in biopharma and healthcareKerstin Forsberg
 
Enhancing Interoperability: The Implementation of OpenAIRE Guidelines and COA...
Enhancing Interoperability: The Implementation of OpenAIRE Guidelines and COA...Enhancing Interoperability: The Implementation of OpenAIRE Guidelines and COA...
Enhancing Interoperability: The Implementation of OpenAIRE Guidelines and COA...4Science
 
Adoption and Integration of Persistent Identifiers in European Research Infor...
Adoption and Integration of Persistent Identifiers in European Research Infor...Adoption and Integration of Persistent Identifiers in European Research Infor...
Adoption and Integration of Persistent Identifiers in European Research Infor...LIBER Europe
 
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Fabrizio Orlandi
 
DSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstreamDSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstreamAndrea Bollini
 
Beyond 2022 project presentation 2021
Beyond 2022 project presentation 2021Beyond 2022 project presentation 2021
Beyond 2022 project presentation 2021Fabrizio Orlandi
 

What's hot (20)

Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...Linked data experience at Macmillan: Building discovery services for scientif...
Linked data experience at Macmillan: Building discovery services for scientif...
 
Sören Auer | Enterprise Knowledge Graphs
Sören Auer | Enterprise Knowledge GraphsSören Auer | Enterprise Knowledge Graphs
Sören Auer | Enterprise Knowledge Graphs
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge Graph
 
A distributed network of digital heritage information - Semantics Amsterdam
A distributed network of digital heritage information - Semantics AmsterdamA distributed network of digital heritage information - Semantics Amsterdam
A distributed network of digital heritage information - Semantics Amsterdam
 
Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the web
 
DSpace standard Data model and DSpace-CRIS
DSpace standard Data model and DSpace-CRISDSpace standard Data model and DSpace-CRIS
DSpace standard Data model and DSpace-CRIS
 
Making Use of the Linked Open Data Services for OpenAIRE (DI4R 2016 tutorial ...
Making Use of the Linked Open Data Services for OpenAIRE (DI4R 2016 tutorial ...Making Use of the Linked Open Data Services for OpenAIRE (DI4R 2016 tutorial ...
Making Use of the Linked Open Data Services for OpenAIRE (DI4R 2016 tutorial ...
 
6.15.17 DSpace-Cris Webinar Presentation Slides
6.15.17 DSpace-Cris Webinar Presentation Slides6.15.17 DSpace-Cris Webinar Presentation Slides
6.15.17 DSpace-Cris Webinar Presentation Slides
 
TIB AV-Portal: Semantic Content Mining with Semi-Automatic Metadata Editing. ...
TIB AV-Portal: Semantic Content Mining with Semi-Automatic Metadata Editing. ...TIB AV-Portal: Semantic Content Mining with Semi-Automatic Metadata Editing. ...
TIB AV-Portal: Semantic Content Mining with Semi-Automatic Metadata Editing. ...
 
DSpace-CRIS: a CRIS enhanced repository platform
DSpace-CRIS: a CRIS enhanced repository platformDSpace-CRIS: a CRIS enhanced repository platform
DSpace-CRIS: a CRIS enhanced repository platform
 
Linked Data
Linked DataLinked Data
Linked Data
 
Session 1.6 slovak public metadata governance and management based on linke...
Session 1.6   slovak public metadata governance and management based on linke...Session 1.6   slovak public metadata governance and management based on linke...
Session 1.6 slovak public metadata governance and management based on linke...
 
Linked Data efforts for data standards in biopharma and healthcare
Linked Data efforts for data standards in biopharma and healthcareLinked Data efforts for data standards in biopharma and healthcare
Linked Data efforts for data standards in biopharma and healthcare
 
Enhancing Interoperability: The Implementation of OpenAIRE Guidelines and COA...
Enhancing Interoperability: The Implementation of OpenAIRE Guidelines and COA...Enhancing Interoperability: The Implementation of OpenAIRE Guidelines and COA...
Enhancing Interoperability: The Implementation of OpenAIRE Guidelines and COA...
 
Adoption and Integration of Persistent Identifiers in European Research Infor...
Adoption and Integration of Persistent Identifiers in European Research Infor...Adoption and Integration of Persistent Identifiers in European Research Infor...
Adoption and Integration of Persistent Identifiers in European Research Infor...
 
Wikidata
WikidataWikidata
Wikidata
 
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
 
DSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstreamDSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstream
 
The CIARD RINGValeri
The CIARD RINGValeriThe CIARD RINGValeri
The CIARD RINGValeri
 
Beyond 2022 project presentation 2021
Beyond 2022 project presentation 2021Beyond 2022 project presentation 2021
Beyond 2022 project presentation 2021
 

Similar to Discovering Related Data Sources in Data Portals

Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...AKSHAY BHAGAT
 
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...giuseppe_futia
 
Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...Riccardo Albertoni
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataGiorgos Santipantakis
 
A Framework for Ontology Usage Analysis
A Framework for Ontology Usage AnalysisA Framework for Ontology Usage Analysis
A Framework for Ontology Usage AnalysisJamshaid Ashraf
 
UNIT - 5: Data Warehousing and Data Mining
UNIT - 5: Data Warehousing and Data MiningUNIT - 5: Data Warehousing and Data Mining
UNIT - 5: Data Warehousing and Data MiningNandakumar P
 
Hide the Stack: Toward Usable Linked Data
Hide the Stack:Toward Usable Linked DataHide the Stack:Toward Usable Linked Data
Hide the Stack: Toward Usable Linked Dataaba-sah
 
A scalable architecture for extracting, aligning, linking, and visualizing mu...
A scalable architecture for extracting, aligning, linking, and visualizing mu...A scalable architecture for extracting, aligning, linking, and visualizing mu...
A scalable architecture for extracting, aligning, linking, and visualizing mu...Craig Knoblock
 
Semantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologistsSemantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologistsdgarijo
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW
 
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routing
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routingIEEE 2014 JAVA DATA MINING PROJECTS Keyword query routing
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routingIEEEFINALYEARSTUDENTPROJECTS
 
2014 IEEE JAVA DATA MINING PROJECT Keyword query routing
2014 IEEE JAVA DATA MINING PROJECT Keyword query routing2014 IEEE JAVA DATA MINING PROJECT Keyword query routing
2014 IEEE JAVA DATA MINING PROJECT Keyword query routingIEEEMEMTECHSTUDENTSPROJECTS
 
Relational Database explanation with detail.pdf
Relational Database explanation with detail.pdfRelational Database explanation with detail.pdf
Relational Database explanation with detail.pdf9wldv5h8n
 
WP4: overzicht van de voortgang van WP4 op de CLARIAH-dag 22 januari 2016
WP4: overzicht van de voortgang van WP4 op de CLARIAH-dag 22 januari 2016WP4: overzicht van de voortgang van WP4 op de CLARIAH-dag 22 januari 2016
WP4: overzicht van de voortgang van WP4 op de CLARIAH-dag 22 januari 2016CLARIAH
 
Benchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionBenchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionSymeon Papadopoulos
 

Similar to Discovering Related Data Sources in Data Portals (20)

Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...
 
Linked (Open) Data
Linked (Open) DataLinked (Open) Data
Linked (Open) Data
 
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
 
Unit 3 part i Data mining
Unit 3 part i Data miningUnit 3 part i Data mining
Unit 3 part i Data mining
 
Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
A Framework for Ontology Usage Analysis
A Framework for Ontology Usage AnalysisA Framework for Ontology Usage Analysis
A Framework for Ontology Usage Analysis
 
UNIT - 5: Data Warehousing and Data Mining
UNIT - 5: Data Warehousing and Data MiningUNIT - 5: Data Warehousing and Data Mining
UNIT - 5: Data Warehousing and Data Mining
 
Hide the Stack: Toward Usable Linked Data
Hide the Stack:Toward Usable Linked DataHide the Stack:Toward Usable Linked Data
Hide the Stack: Toward Usable Linked Data
 
A scalable architecture for extracting, aligning, linking, and visualizing mu...
A scalable architecture for extracting, aligning, linking, and visualizing mu...A scalable architecture for extracting, aligning, linking, and visualizing mu...
A scalable architecture for extracting, aligning, linking, and visualizing mu...
 
At33264269
At33264269At33264269
At33264269
 
At33264269
At33264269At33264269
At33264269
 
Semantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologistsSemantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologists
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow Tutorial
 
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routing
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routingIEEE 2014 JAVA DATA MINING PROJECTS Keyword query routing
IEEE 2014 JAVA DATA MINING PROJECTS Keyword query routing
 
2014 IEEE JAVA DATA MINING PROJECT Keyword query routing
2014 IEEE JAVA DATA MINING PROJECT Keyword query routing2014 IEEE JAVA DATA MINING PROJECT Keyword query routing
2014 IEEE JAVA DATA MINING PROJECT Keyword query routing
 
Relational Database explanation with detail.pdf
Relational Database explanation with detail.pdfRelational Database explanation with detail.pdf
Relational Database explanation with detail.pdf
 
Semantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including AstrophysicsSemantic Technologies for Big Sciences including Astrophysics
Semantic Technologies for Big Sciences including Astrophysics
 
WP4: overzicht van de voortgang van WP4 op de CLARIAH-dag 22 januari 2016
WP4: overzicht van de voortgang van WP4 op de CLARIAH-dag 22 januari 2016WP4: overzicht van de voortgang van WP4 op de CLARIAH-dag 22 januari 2016
WP4: overzicht van de voortgang van WP4 op de CLARIAH-dag 22 januari 2016
 
Benchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detectionBenchmarking graph databases on the problem of community detection
Benchmarking graph databases on the problem of community detection
 

More from Peter Haase

Visual Ontology Modeling for Domain Experts and Business Users with metaphactory
Visual Ontology Modeling for Domain Experts and Business Users with metaphactoryVisual Ontology Modeling for Domain Experts and Business Users with metaphactory
Visual Ontology Modeling for Domain Experts and Business Users with metaphactoryPeter Haase
 
Hybrid Enterprise Knowledge Graphs
Hybrid Enterprise Knowledge GraphsHybrid Enterprise Knowledge Graphs
Hybrid Enterprise Knowledge GraphsPeter Haase
 
Building Enterprise-Ready Knowledge Graph Applications in the Cloud
Building Enterprise-Ready Knowledge Graph Applications in the CloudBuilding Enterprise-Ready Knowledge Graph Applications in the Cloud
Building Enterprise-Ready Knowledge Graph Applications in the CloudPeter Haase
 
Mapping, Interlinking and Exposing MusicBrainz as Linked Data
Mapping, Interlinking and Exposing MusicBrainz as Linked DataMapping, Interlinking and Exposing MusicBrainz as Linked Data
Mapping, Interlinking and Exposing MusicBrainz as Linked DataPeter Haase
 
On demand access to Big Data through Semantic Technologies
 On demand access to Big Data through Semantic Technologies On demand access to Big Data through Semantic Technologies
On demand access to Big Data through Semantic TechnologiesPeter Haase
 
Linked Data as a Service
Linked Data as a ServiceLinked Data as a Service
Linked Data as a ServicePeter Haase
 
Fedbench - A Benchmark Suite for Federated Semantic Data Processing
Fedbench - A Benchmark Suite for Federated Semantic Data ProcessingFedbench - A Benchmark Suite for Federated Semantic Data Processing
Fedbench - A Benchmark Suite for Federated Semantic Data ProcessingPeter Haase
 
Everything Self-Service:Linked Data Applications with the Information Workbench
Everything Self-Service:Linked Data Applications with the Information WorkbenchEverything Self-Service:Linked Data Applications with the Information Workbench
Everything Self-Service:Linked Data Applications with the Information WorkbenchPeter Haase
 
The Information Workbench as a Self-Service Platform for Linked Data Applicat...
The Information Workbench as a Self-Service Platform for Linked Data Applicat...The Information Workbench as a Self-Service Platform for Linked Data Applicat...
The Information Workbench as a Self-Service Platform for Linked Data Applicat...Peter Haase
 
Cloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentCloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentPeter Haase
 
Semantic Technologies for Enterprise Cloud Management
Semantic Technologies for Enterprise Cloud ManagementSemantic Technologies for Enterprise Cloud Management
Semantic Technologies for Enterprise Cloud ManagementPeter Haase
 

More from Peter Haase (11)

Visual Ontology Modeling for Domain Experts and Business Users with metaphactory
Visual Ontology Modeling for Domain Experts and Business Users with metaphactoryVisual Ontology Modeling for Domain Experts and Business Users with metaphactory
Visual Ontology Modeling for Domain Experts and Business Users with metaphactory
 
Hybrid Enterprise Knowledge Graphs
Hybrid Enterprise Knowledge GraphsHybrid Enterprise Knowledge Graphs
Hybrid Enterprise Knowledge Graphs
 
Building Enterprise-Ready Knowledge Graph Applications in the Cloud
Building Enterprise-Ready Knowledge Graph Applications in the CloudBuilding Enterprise-Ready Knowledge Graph Applications in the Cloud
Building Enterprise-Ready Knowledge Graph Applications in the Cloud
 
Mapping, Interlinking and Exposing MusicBrainz as Linked Data
Mapping, Interlinking and Exposing MusicBrainz as Linked DataMapping, Interlinking and Exposing MusicBrainz as Linked Data
Mapping, Interlinking and Exposing MusicBrainz as Linked Data
 
On demand access to Big Data through Semantic Technologies
 On demand access to Big Data through Semantic Technologies On demand access to Big Data through Semantic Technologies
On demand access to Big Data through Semantic Technologies
 
Linked Data as a Service
Linked Data as a ServiceLinked Data as a Service
Linked Data as a Service
 
Fedbench - A Benchmark Suite for Federated Semantic Data Processing
Fedbench - A Benchmark Suite for Federated Semantic Data ProcessingFedbench - A Benchmark Suite for Federated Semantic Data Processing
Fedbench - A Benchmark Suite for Federated Semantic Data Processing
 
Everything Self-Service:Linked Data Applications with the Information Workbench
Everything Self-Service:Linked Data Applications with the Information WorkbenchEverything Self-Service:Linked Data Applications with the Information Workbench
Everything Self-Service:Linked Data Applications with the Information Workbench
 
The Information Workbench as a Self-Service Platform for Linked Data Applicat...
The Information Workbench as a Self-Service Platform for Linked Data Applicat...The Information Workbench as a Self-Service Platform for Linked Data Applicat...
The Information Workbench as a Self-Service Platform for Linked Data Applicat...
 
Cloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentCloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application Development
 
Semantic Technologies for Enterprise Cloud Management
Semantic Technologies for Enterprise Cloud ManagementSemantic Technologies for Enterprise Cloud Management
Semantic Technologies for Enterprise Cloud Management
 

Recently uploaded

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Recently uploaded (20)

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

Discovering Related Data Sources in Data Portals

  • 1. Discovering  Related  Data  Sources     in  Data  Portals     Andreas  Wagner,  Peter  Haase,     Achim  Re4nger,  Holger  Lamm   1st  Interna:onal  Workshop  on  Seman:c  Sta:s:cs   Sydney,  Oct  22,  2013    
  • 2. Poten&al  of  Open  (Sta&s&cs)  Data   WORLD BANK
  • 3. fluidOps  Open  Data  Portal   •  Data  collec&on   •  Integra&on  of  major  open  data  catalogs   •  Automated  provisioning  of  10.000s  data  sets   •  Portal  for  search  and  explora&on  of  data  sets   •  Rich  metadata  based  on  open  standards   •  Both  descrip&ve  and  structural  metadata   •  Integrated  querying  across  interlinked  data  sets   •  Easy  to  use  queries  against  mul&ple  data  sets   •  Using  federa&on  technologies   •  Self-­‐service  UI   •  Custom  queries  and  visualiza&ons   •  Widgets,  dashboarding,  etc.   WORLD BANK
  • 4.
  • 5. Finding  Related  Data  Sets   •  Many  informa&on  needs  require  analysis  of  mul&ple  data  sets   •  Example:  Compare  and  correlate  GDP,  popula&on  and  public  debt   of  countries  over  &me   •  Task  of  finding  related  data  sets   •  Iden&fy  data  sets  that  are  similar,  but  complementary   •  To  support  queries  across  mul&ple  data  sets,  e.g.  in  the  form  of  joins   and  unions   •  Inspira&on:  Finding  related  tables   •  En&ty  complement:  same  aVributes,  complemen&ng  en&&es   •  Schema  complement:  same  en&&es,  complemen&ng  aVributes  
  • 6. Finding  Related  Data  Sources   via  Related  En&&es   •  Data  Model:  Data  source  is  a  set  of  mul&ple   RDF  graphs   •  Intui&on:  if  data  sources  contain  similar   en&&es,  they  are  somehow  related   Cluster  2   Cluster  1   •  Approach:   En&&es   1.  En&ty  Extrac&on   2.  En&ty  Similarity   3.  En&ty  Clustering   Related?!   Source  1   Source  3   Source  2  
  • 7. Related  En&&es  (2)   1.  En&ty  Extrac&on   –  Sample  over  en&&es  in  data  graphs  in  D   –  For  each  en&ty  crawl  its  surrounding  sub-­‐graph  [1]   2.  En&ty  Similarity   –  Define  dissimilarity  measure  between  two  en&&es   based  on  kernel  func&ons   –  Compare  en&ty  structure  and  literals  via  different   kernels  [2,3]   3.  En&ty  Clustering   –  Apply  k-­‐means  clustering  to  discover  similar      en&&es  [4]  
  • 8. Contextualisa&on  Score   •  Contextualiza&on  score  for  data  source  D’’   given  D’:  ec(D’’|D’)  and  sc(D’’|D’)   •  En*ty  complement  score   •  Schema  complement  score  
  • 9.
  • 10. Search  for  Gross  Domes&c  Product  
  • 11.
  • 14. Queries  Across  Related  Data  Sets   •  Query  for  GDP  of  Germany   •  Union  of  results  from     •  Worldbank:  GDP  (current  US$  )  (up  to  2010)   •  Eurostat:  GDP  at  Market  Prices  (including  projected  values  un&l  2014)  
  • 15. Queries  Across  Related  Data  Sets   Data  from  Worldbank   Data  from  Eurostat  
  • 16. Summary  and  Outlook   •  Techniques  for  finding  related  data  sets   –  Based  on  finding  related  en&&es   •  Implementa&on  available  in  open  data  portal   •  Outlook   –  Finding  relevant  related  data  sources  for  a  given   informa&on  need   –  End  user  interfaces  for  formula&ng  queries     across  data  sets  (see  Op&que  project)   –  Operators  for  combining  data  cubes   –  Interac&ve  visualiza&on  and  explora&on  of     combined  data  cubes  (see  OpenCube  project)  
  • 17. References   [1]    G.  A.  Grimnes,  P.  Edwards,  and  A.  Preece.    Instance  based  clustering  of  seman:c  web    resources.  In  ESWC,  2008.   [2]  U.  Lösch,  S.  Bloehdorn,  and  A.  Reenger.    Graph  kernels  for  RDF  data.  In  ESWC,  2012.   [3]  J.  Shawe-­‐Taylor  and  N.  Cris&anini.  Kernel    Methods  for  PaPern  Analysis.  2004.   [4]    R.  Zhang  and  A.  Rudnicky.  A  large  scale    clustering  scheme  for  kernel  k-­‐means.  In    PaVern  Recogni&on,  2002.