Web Data Management
Advanced Database Presentation
By:
Navid Sedighpour
Professor :
Dr. Alireza Bagheri
November 2015
1
Interest
Lack of schema
Data is unstructured or at best “semi-structured”
Missing data, additional attributes, similar data but not identical
Volatility
May conform to one schema now, but not later
Scale
How to capture everything?
Querying Difficulty
What is the user language?
What are the primitives?
Aren’t Search Engines sufficient?
2
More Recent Approaches to Web Querying
Fusion Tables
Users contribute data in spreadsheets
Possible joins between multiple data sets
Extensive visualization
3
More Recent Approaches to Web Querying
XML
Data exchange language
Tree-based structure
4
More Recent Approaches to Web Querying
RDF
W3C Recommendation
Simple, self-descriptive model
5
RDF Data Volumes
90% of the world's data was generated over the last two years
Data are growing fast
Size is almost doubling every year
6
RDF Data Volumes
March 2009 – 89 datasets
7
RDF Data Volumes
September 2010 – 203 datasets
8
RDF Data Volumes
September 2011 – 295 datasets
9
RDF Data Volumes
April 2014 – 1091 datasets
10
RDF Introduction
Everything is a uniquely named resource (identified by a URI)
Prefixes can be used to shorten names
Properties of resources can be defined
Relationships with other resources can be defined
Resource descriptions can be contributed by different people/groups and can be located anywhere on the web
Integrated web “database”
11
RDF Data Model
Triple: Subject, Predicate (Property), Object
Subject: the entity being described (URI or blank node)
Predicate: a feature (property) of the entity, identified by a URI
Object: the value of the feature (URI, literal, or blank node)
A set of RDF triples is called an “RDF graph”
12
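The triple model above can be sketched in a few lines of Python. This is a minimal illustration that assumes plain strings for both URIs and literals; the `ex:` prefix, the `http://example.org/` namespace, and the sample resources are hypothetical.

```python
from collections import defaultdict

# Hypothetical namespace; "ex:" plays the role of an RDF prefix that shortens names.
PREFIXES = {"ex": "http://example.org/"}

def expand(name: str) -> str:
    """Expand a prefixed name such as 'ex:alice' into a full URI."""
    prefix, local = name.split(":", 1)
    return PREFIXES[prefix] + local

# An RDF graph is a set of (subject, predicate, object) triples.
# Objects may be URIs (other resources) or literals (here, plain strings like "Alice").
graph = {
    (expand("ex:alice"), expand("ex:knows"), expand("ex:bob")),
    (expand("ex:alice"), expand("ex:name"), "Alice"),
    (expand("ex:bob"), expand("ex:name"), "Bob"),
}

def describe(graph, subject):
    """Collect the predicate -> objects description of one subject."""
    props = defaultdict(list)
    for s, p, o in graph:
        if s == subject:
            props[p].append(o)
    return dict(props)
```

A real system would distinguish URIs from literals with dedicated term types (RDFLib, for example, has `URIRef` and `Literal`); strings keep the sketch short.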
RDF Example Instance
13
RDF Graph
14
SPARQL Queries
15
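At its core, a SPARQL query such as `SELECT ?n WHERE { ?x ex:knows ?y . ?y ex:name ?n }` is a set of triple patterns whose variables must be bound consistently across all patterns. The sketch below evaluates such a basic graph pattern by backtracking; it assumes short names in place of full URIs, and the data is hypothetical.

```python
# Terms starting with "?" are variables; everything else must match exactly.
def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None on a conflict."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.get(term, value) != value:
                return None          # variable already bound to a different value
            new[term] = value
        elif term != value:
            return None              # constant term does not match
    return new

def evaluate_bgp(graph, patterns, binding=None):
    """Return every binding that satisfies all triple patterns (a conjunctive join)."""
    binding = {} if binding is None else binding
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in graph:
        extended = match_pattern(first, triple, binding)
        if extended is not None:
            results.extend(evaluate_bgp(graph, rest, extended))
    return results

example_graph = {
    ("alice", "knows", "bob"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
}
# ?x knows ?y . ?y name ?n
answers = evaluate_bgp(example_graph, [("?x", "knows", "?y"), ("?y", "name", "?n")])
```

Each additional triple pattern adds another join step, which is what a relational triple store has to express as self-joins.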
Naïve Triple Store Design
16
17
Naïve Triple Store Design
Easy to implement
But
Too many self-joins: each additional triple pattern in a query adds another join of the triples table with itself
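A minimal sketch of the naïve design, assuming SQLite through Python's standard `sqlite3` module: all triples live in one three-column table, so even the two-pattern query `?x knows ?y . ?y name ?n` needs a self-join. The table name, column names, and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("alice", "knows", "bob"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
    ("bob", "age", "30"),
])

# ?x knows ?y . ?y name ?n  -> two patterns, so one self-join on the triples table
QUERY = """
SELECT t1.s AS x, t2.o AS n
FROM triples AS t1
JOIN triples AS t2 ON t1.o = t2.s  -- object of pattern 1 = subject of pattern 2
WHERE t1.p = 'knows' AND t2.p = 'name'
"""
rows = conn.execute(QUERY).fetchall()
```

Here `rows` is `[("alice", "Bob")]`; a query with k triple patterns would join the same table k - 1 times.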
Property Tables
Grouping by Entities
Types:
Clustered Property Tables
Property Class Tables
18
Clustered Property Tables
Group together properties that tend to occur together on the same (or similar) subjects
19
Property Class Tables
Cluster subjects of the same type (the same rdf:type class) into one property table
20
Property Tables
Advantages:
Fewer joins
Disadvantages:
Lots of NULLs
Clustering is not trivial
Multi-valued properties are complicated
21
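A clustered property table can be sketched the same way, again assuming SQLite via `sqlite3`. The cluster of book properties (`title`, `author`, `year`) and the data are hypothetical; the sketch shows the trade-offs listed above: lookups on one subject need no join, a missing property becomes a NULL, and a single column cannot hold a second value of a multi-valued property.

```python
import sqlite3

triples = [
    ("book1", "title", "XYZ"), ("book1", "author", "Fox"), ("book1", "year", "2001"),
    ("book1", "author", "Green"),                            # second author: multi-valued
    ("book2", "title", "ABC"), ("book2", "author", "Lee"),   # no year: becomes NULL
]
CLUSTER = ("title", "author", "year")  # hypothetical properties that tend to co-occur

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book_props (s TEXT PRIMARY KEY, title TEXT, author TEXT, year TEXT)")
for subject in sorted({s for s, _, _ in triples}):
    # One row per subject. The dict keeps only the LAST value of a repeated property,
    # which is exactly the multi-valued-property problem: one author is silently lost.
    values = {p: o for s, p, o in triples if s == subject and p in CLUSTER}
    conn.execute(
        "INSERT INTO book_props VALUES (?, ?, ?, ?)",
        (subject, values.get("title"), values.get("author"), values.get("year")),
    )

# All clustered properties of one subject share a row, so this lookup needs no join.
no_join = conn.execute("SELECT title, year FROM book_props WHERE s = 'book1'").fetchall()
nulls = conn.execute("SELECT s FROM book_props WHERE year IS NULL").fetchall()
```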
Binary Tables
Grouping by properties: for each property, build a two-column table containing the subject and object, sorted by subject
Also called the “Vertically Partitioned Approach” [1]
n two-column tables (n is the number of unique properties in the data)
22
Binary Tables
Advantages:
Supports multi-valued properties
No NULLs
No clustering needed
Good performance for subject-subject joins (merge joins on subject-sorted tables)
Disadvantages:
Less efficient for subject-object joins
Expensive inserts (triples about one subject are spread across many tables)
23
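The vertically partitioned layout, and the subject-subject merge join it enables, can be sketched in plain Python. This is an in-memory illustration with hypothetical data, not the column-store implementation evaluated in [1]; note that the multi-valued `author` property simply becomes two rows.

```python
from collections import defaultdict

def vertically_partition(triples):
    """Build one two-column (subject, object) table per property, sorted by subject."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}

def merge_join_on_subject(left, right):
    """Subject-subject merge join of two subject-sorted tables (handles repeated subjects)."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the run of rows with this subject on each side, then pair them up.
            i_end = i
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rs:
                j_end += 1
            for _, lo in left[i:i_end]:
                for _, ro in right[j:j_end]:
                    out.append((ls, lo, ro))
            i, j = i_end, j_end
    return out

triples = [
    ("book1", "title", "XYZ"), ("book1", "author", "Fox"),
    ("book1", "author", "Green"), ("book2", "title", "ABC"),
]
tables = vertically_partition(triples)
joined = merge_join_on_subject(tables["title"], tables["author"])
```

Because both inputs are already sorted by subject, the join is a single linear pass; a subject-object join would first need one side re-sorted (or hashed) on the object column.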
Graph-Based Approach
Answering a SPARQL query = subgraph matching
gStore [2], [3]
24
Graph-Based Approach
Two steps need to be done:
1. For each node of Q*, get the list of nodes in G* whose signatures include that node's signature
2. Do a multi-way join over these lists to get the candidate matches
(Q* and G* are the signature graphs of the query graph Q and the data graph G: each vertex is encoded as a bit-string signature.)
Alternatives:
Sequential scan of G*
  - Both steps are inefficient
S-Tree
  - Height-balanced tree over signatures
  - Run an inclusion query for each node of Q* to get the list of nodes in G* that include it (q & s = q)
VS-Tree
  - Supports both steps efficiently
  - Grouping by vertices
25
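Step 1 relies on vertex signatures. The sketch below assumes a simplified, hypothetical encoding (gStore's actual encoding differs in detail): each adjacent edge sets one bit for its predicate and, when the neighbor is a constant, another bit for the predicate-neighbor pair. A data vertex can match a query vertex only if q & s = q, so the test may admit false positives (hash collisions) but never drops a true match; surviving candidates still have to be verified.

```python
import zlib

SIG_BITS = 64  # hypothetical signature width

def bit_for(token: str) -> int:
    """Hash a token to a single bit position (a simplified, hypothetical scheme)."""
    return 1 << (zlib.crc32(token.encode("utf-8")) % SIG_BITS)

def vertex_signature(edges):
    """OR together bits for each adjacent (predicate, neighbor) edge of a vertex."""
    sig = 0
    for predicate, neighbor in edges:
        sig |= bit_for("p:" + predicate)                  # the predicate alone
        if not neighbor.startswith("?"):                   # constants also encode the neighbor
            sig |= bit_for("pn:" + predicate + "|" + neighbor)
    return sig

def includes(s: int, q: int) -> bool:
    """Inclusion test: every bit set in q is also set in s (q & s = q)."""
    return (q & s) == q

def candidates(data_sigs, query_sig):
    """Step 1 by sequential scan of G*: data vertices whose signatures include the query's."""
    return [v for v, s in data_sigs.items() if includes(s, query_sig)]

data_sigs = {
    "alice": vertex_signature([("knows", "bob"), ("name", "Alice")]),
    "bob": vertex_signature([("name", "Bob")]),
}
query_sig = vertex_signature([("knows", "?y")])  # query vertex ?x in the pattern ?x knows ?y
cands = candidates(data_sigs, query_sig)
```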
S-Tree
26
Pruning
S-Tree
27
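Pruning in an S-tree follows from how internal nodes are built: each internal node stores the bitwise OR of its children's signatures, so if a node fails q & s = q, no leaf beneath it can pass either, and the whole subtree is skipped. A minimal search sketch under that assumption, using a small hand-built tree with hypothetical 4-bit signatures (insertion and node splitting are omitted):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    sig: int
    vertex: Optional[str] = None                   # set only on leaf entries
    children: List["Node"] = field(default_factory=list)

def internal(children):
    """Internal S-tree node: its signature is the bitwise OR of its children's signatures."""
    sig = 0
    for child in children:
        sig |= child.sig
    return Node(sig=sig, children=children)

def search(node, q, visited):
    """Collect leaf vertices whose signature includes q, skipping subtrees that cannot match."""
    visited.append(node)
    if (q & node.sig) != q:
        return []                                  # prune: no descendant can include q
    if node.vertex is not None:
        return [node.vertex]
    found = []
    for child in node.children:
        found.extend(search(child, q, visited))
    return found

# A small height-balanced tree over hypothetical 4-bit vertex signatures.
leaves = [Node(0b0011, "a"), Node(0b0101, "b"), Node(0b1000, "c"), Node(0b1100, "d")]
root = internal([internal(leaves[:2]), internal(leaves[2:])])

visited = []
matches = search(root, 0b0001, visited)
```

For the query signature `0b0001`, the right internal node (`0b1100`) fails the inclusion test, so leaves `c` and `d` are never visited and only five of the seven nodes are examined.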
S-Tree
28
S-Tree
29
S-Tree
30
VS-Tree
31
VS-Tree
32
Conclusion
RDF data appear to hold considerable promise for web data management
We discussed four approaches to storing and querying RDF data: the naïve triple store design, property tables, binary tables, and the graph-based approach
Among the index options for the graph-based approach (sequential scan, S-tree, VS-tree), the VS-tree performs best
gStore is more efficient than the other approaches
33
References
34
[1] D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach, "Scalable semantic web data management using vertical partitioning," in Proceedings of the 33rd International Conference on Very Large Data Bases, 2007, pp. 411-422.
[2] L. Zou, J. Mo, L. Chen, M. T. Özsu, and D. Zhao, "gStore: answering SPARQL queries via subgraph matching," Proceedings of the VLDB Endowment, vol. 4, pp. 482-493, 2011.
[3] L. Zou, M. T. Özsu, L. Chen, X. Shen, R. Huang, and D. Zhao, "gStore: a graph-based SPARQL query engine," The VLDB Journal, vol. 23, pp. 565-590, 2014.
[4] X. Shen, L. Zou, M. T. Özsu, L. Chen, Y. Li, S. Han, et al., "A Graph-based RDF Triple Store."
35
