How redundant is it? – An empirical analysis on linked datasets 
Honghan Wu (1), Boris Villazon-Terrazas (2), Jeff Z. Pan (1) and José Manuel Gómez Pérez (2)
(1) University of Aberdeen, UK
(2) iSOCO, Spain
20/10/2014
Content
• What is data redundancy with linked data?
• Why is it of special interest to linked data consumption?
• Linked Data redundancy categorisation
• How to analyse?
• Dataset selection & the result
• Conclusion
What is data redundancy in LD?
• Data Redundancy
  – [Database systems] Same piece of data in multiple places
  – [Information theory] Wasted "space" used to transmit certain data
• (In this work) Linked Data Redundancy
  – Wasted “space” to represent certain meaning (represented in certain semantics)
  – Duplication-free
Why is it of special interest to LD consumption?
• Bad Redundancy & Good Redundancy
  – Bad for exchange: storage, transmission
  – Good for inference computation
• Relevant consumption tasks
  – Hosting/Sharing
  – Query Answering (SPARQL)
  – Ontology Based Data Access
  – Reasoning
Redundancy in Linked Data
• Redundancy Categorisation for RDF Data
• Redundancies caused by the “Linked” nature
RDF Redundancies vs. Succinct Representations
[Rule based] A. K. Joshi, P. Hitzler, and G. Dong. Logical linked data compression. In The Semantic Web: Semantics and Big Data, pages 170–184. Springer, 2013.
[HDT] J. D. Fernández, M. A. Martínez-Prieto, C. Gutiérrez, A. Polleres, and M. Arias. Binary RDF representation for publication and exchange (HDT). Web Semant., 19:22–41, Mar. 2013.
[WaterFowl] O. Curé, G. Blin, D. Revuz, and D. C. Faye. WaterFowl: A compact, self-indexed and inference-enabled immutable RDF store. In The Semantic Web: Trends and Challenges, pages 302–316. Springer, 2014.
J. Z. Pan, J. M. Gómez-Pérez, Y. Ren, H. Wu, H. Wang, and M. Zhu. Graph pattern based RDF data compression. In Proc. of the 4th Joint International Semantic Technology Conference (JIST), 2014. (To appear)
Semantic Redundancy
Rule Representation
- DL Axioms (T-Box)
- Other semantics (graph pattern substitution)
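The rule-based view of semantic redundancy can be sketched as follows. This is a minimal illustration, assuming a toy T-Box that contains only acyclic subclass axioms; the class and instance names are hypothetical, and it does not reproduce the graph pattern substitution method. A type assertion counts as redundant here when it can be derived from another asserted type through the subclass axioms.

```python
# Toy T-Box: subclass axioms (sub -> super). All names are hypothetical.
SUBCLASS_OF = {"Student": "Person", "Professor": "Person"}


def superclasses(cls):
    """Return all strict superclasses of cls under SUBCLASS_OF."""
    found = set()
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls in found:  # stop if the axioms happen to be cyclic
            break
        found.add(cls)
    return found


def split_redundant(type_assertions):
    """Split (instance, class) assertions into kept and derivable ones."""
    asserted = set(type_assertions)
    derivable = {
        (inst, sup)
        for inst, cls in asserted
        for sup in superclasses(cls)
        if (inst, sup) in asserted
    }
    return asserted - derivable, derivable


abox = [("alice", "Student"), ("alice", "Person"), ("bob", "Person")]
kept, redundant = split_redundant(abox)
print(sorted(kept))
print(sorted(redundant))
```

In this toy A-Box only the assertion that alice is a Person is derivable, so it is the only triple a rule-aware representation could drop.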
Syntactic Redundancy
Concise syntax
- RDF abbreviation & striping syntax
- Intra-structure & Inter-structure
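As a rough illustration of what an abbreviation syntax saves, the sketch below groups triples by subject and predicate, in the spirit of Turtle's `;` and `,` shorthand, and counts how many terms have to be written. The prefixed names are hypothetical, and the intra-/inter-structure distinction from the slide is not modelled.

```python
from collections import defaultdict


def abbreviate(triples):
    """Group triples as subject -> predicate -> [objects]."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        grouped[s][p].append(o)
    return grouped


def term_occurrences(grouped):
    """Count terms written when each subject and predicate appears once per group."""
    return sum(
        1 + sum(1 + len(objs) for objs in preds.values())
        for preds in grouped.values()
    )


# Hypothetical triples about one entity.
triples = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
]
grouped = abbreviate(triples)
flat = 3 * len(triples)              # every triple spells out s, p and o
compact = term_occurrences(grouped)  # shared subjects and predicates written once
print(flat, compact)
```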
Symbolic Redundancy
• http://xmlns.com/foaf/0.1/name
  – 30 bytes in ASCII

URI                               ID (4 bytes)
…                                 …
http://xmlns.com/foaf/0.1/name    128
…                                 …

Fewer bytes for basic data units
- (Fixed-length) Dictionary based
- (Variable-length) Huffman coding
- Predictive encoding
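The fixed-length dictionary idea can be sketched in a few lines. This is an illustrative toy, assuming 4-byte unsigned integer IDs as on the slide; the sample triples and helper names are hypothetical, and real systems such as HDT use more elaborate dictionaries.

```python
import struct

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def build_dictionary(triples):
    """Assign each distinct term an integer ID in first-seen order."""
    ids = {}
    for triple in triples:
        for term in triple:
            ids.setdefault(term, len(ids))
    return ids


def encode(triples, ids):
    """Pack each triple as three unsigned 32-bit big-endian integers."""
    return b"".join(
        struct.pack(">III", *(ids[term] for term in triple)) for triple in triples
    )


# Hypothetical sample data.
triples = [
    ("ex:alice", FOAF_NAME, '"Alice"'),
    ("ex:bob", FOAF_NAME, '"Bob"'),
]
ids = build_dictionary(triples)
packed = encode(triples, ids)
print(len(FOAF_NAME.encode("ascii")))  # bytes per occurrence when written as text
print(len(packed))                     # bytes for all triples as fixed-length IDs
```

The dictionary itself still has to be stored or transmitted once, so the saving grows with the number of times each term is repeated.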
Semantic Redundancy Caused by “Linked” Nature
• Vocabulary Linkage
  – Reuse of other vocabularies: more rules
  – Lower redundancy ratio: more triples derivable
  – More redundancy: co-occurrence triples removable
• Instance Linkage
  – sameAs linkages
  – Bring in new assertions (e.g., type assertions)
  – Bring in new axioms
How to analyse?
• Two-dimensional analysis
• Methodology
• Metrics
Two-dimensional analysis
(Columns: RDF Redundancy Dimension. Rows: Linked Semantic Dimension.)

                              Semantic   Syntactic   Symbolic
A-Box                         ✔          ✔
A-Box & T-Box: No Linkage     ✔          -           -
A-Box & T-Box: T-Box Reuse    ✔          -           -
A-Box & T-Box: A-Box Linkage             -           -
Methodology: EDP Summarisation
Virtually Materialised A-Box: expanded EDP
A-Box: A1(o1) B1(o1) A2(o2) B2(o2) R(o1, o2)
T-Box: A1⊆A, A2⊆A, B1⊆B, B2⊆B
[Figure: two EDP nodes, {A1, B1} (1) and {A2, B2} (1), connected by R (1:1); in the expanded EDP each node additionally carries A and B]
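The slide's example can be replayed with a small sketch. It assumes, as a simplification, that an EDP node can be approximated by the set of classes an individual belongs to, with a count of individuals, and that an EDP edge records a relation between two such class sets; the function names are hypothetical and only direct superclasses are added, which is enough for the one-level T-Box shown here.

```python
from collections import Counter, defaultdict

# A-Box and T-Box copied from the slide.
CLASS_ASSERTIONS = [("o1", "A1"), ("o1", "B1"), ("o2", "A2"), ("o2", "B2")]
ROLE_ASSERTIONS = [("o1", "R", "o2")]
SUBCLASS_OF = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}


def class_sets(assertions, expand=False):
    """Map each individual to its frozenset of classes.

    With expand=True the direct superclasses from SUBCLASS_OF are added,
    without writing the derived assertions back into the A-Box.
    """
    sets = defaultdict(set)
    for ind, cls in assertions:
        sets[ind].add(cls)
        if expand and cls in SUBCLASS_OF:
            sets[ind].add(SUBCLASS_OF[cls])
    return {ind: frozenset(classes) for ind, classes in sets.items()}


def summarise(expand=False):
    """Count individuals per class set and role links between class sets."""
    sets = class_sets(CLASS_ASSERTIONS, expand)
    nodes = Counter(sets.values())
    edges = Counter((sets[s], r, sets[o]) for s, r, o in ROLE_ASSERTIONS)
    return nodes, edges


plain_nodes, plain_edges = summarise()
expanded_nodes, expanded_edges = summarise(expand=True)
for classes, count in expanded_nodes.items():
    print(sorted(classes), count)
```

The A-Box itself is never enlarged: the inferred classes A and B only appear in the summary, which is one way to read "virtually materialised".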
Linked Dataset Analysis Results
• Dataset Selection & Summary
• Analysis Results
Dataset Selection and Summary 
LOD 2011
A-Box Only: Semantic Redundancies 
– Redundant Triples 
– Semantic redundancy ratio, i.e. 
– # Graph Patterns used to substitute redundant triples
A-Box Only: Syntactic Redundancies
– the redundant resource occurrences of inter-structural redundancies
– the syntactic redundancy ratio, i.e.
A-Box & T-Box: No Linkage
DBLP2013: SWRC ontology
Ordnance Survey: official published OS ontology
[Chart values: 1.7%, 184%, 108%, 4.7%]
A-Box & T-Box: No Linkage
The first 3 datasets reuse the FOAF ontology
– the number of directly used terms from the reused T-Box
– the number of applicable axioms from the (materialised) reused T-Box
[Chart values: 26.9%, 4%, 45.4%, 1.3%]
Conclusion
• LOD redundancies are heterogeneous and huge
• Vocabulary linkage might lead to a huge number of derivable triples
• Redundancy-aware techniques are needed
Redundancy-aware Consumption
• Compression: different redundancies might need different techniques
• For Data Access: (high inter-structure redundancy) skewed entity distributions over EDPs -> efficient access?
• OBDA/Reasoning: A-Box redundancy = fewer T-Box axioms
• Data Publisher: should be aware of the consequences of reusing vocabularies
Thanks! Q & A

More Related Content

What's hot

Zhishi.me - Weaving Chinese Linking Open Data
Zhishi.me - Weaving Chinese Linking Open DataZhishi.me - Weaving Chinese Linking Open Data
Zhishi.me - Weaving Chinese Linking Open Data
Xing Niu
 
The Maze of Deletion in Ontology Stream Reasoning
The Maze of Deletion in Ontology Stream Reasoning The Maze of Deletion in Ontology Stream Reasoning
The Maze of Deletion in Ontology Stream Reasoning
Jeff Z. Pan
 
Towards the implementation of a refined data model for a Zulu machine-readabl...
Towards the implementation of a refined data model for a Zulu machine-readabl...Towards the implementation of a refined data model for a Zulu machine-readabl...
Towards the implementation of a refined data model for a Zulu machine-readabl...
Guy De Pauw
 
3. Stack - Data Structures using C++ by Varsha Patil
3. Stack - Data Structures using C++ by Varsha Patil3. Stack - Data Structures using C++ by Varsha Patil
3. Stack - Data Structures using C++ by Varsha Patil
widespreadpromotion
 
6. Linked list - Data Structures using C++ by Varsha Patil
6. Linked list - Data Structures using C++ by Varsha Patil6. Linked list - Data Structures using C++ by Varsha Patil
6. Linked list - Data Structures using C++ by Varsha Patil
widespreadpromotion
 
13. Indexing MTrees - Data Structures using C++ by Varsha Patil
13. Indexing MTrees - Data Structures using C++ by Varsha Patil13. Indexing MTrees - Data Structures using C++ by Varsha Patil
13. Indexing MTrees - Data Structures using C++ by Varsha Patil
widespreadpromotion
 
Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
WU (Vienna University of Economics and Business)
 
Effective Data Retrieval in XML using TreeMatch Algorithm
Effective Data Retrieval in XML using TreeMatch AlgorithmEffective Data Retrieval in XML using TreeMatch Algorithm
Effective Data Retrieval in XML using TreeMatch Algorithm
IRJET Journal
 
Efficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data StreamsEfficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data Streams
WU (Vienna University of Economics and Business)
 
A hierarchical approach for semi structured document indexing and
A hierarchical approach for semi structured document indexing andA hierarchical approach for semi structured document indexing and
A hierarchical approach for semi structured document indexing and
Ibrahim Bounhas
 
5. Queue - Data Structures using C++ by Varsha Patil
5. Queue - Data Structures using C++ by Varsha Patil5. Queue - Data Structures using C++ by Varsha Patil
5. Queue - Data Structures using C++ by Varsha Patil
widespreadpromotion
 
On the Way to a Holding Ontology
On the Way to a Holding OntologyOn the Way to a Holding Ontology
On the Way to a Holding Ontology
Jakob .
 
SQL For PHP Programmers
SQL For PHP ProgrammersSQL For PHP Programmers
SQL For PHP Programmers
Dave Stokes
 
Contexts and Importing in RDF
Contexts and Importing in RDFContexts and Importing in RDF
Contexts and Importing in RDF
Jie Bao
 
1. Fundamental Concept - Data Structures using C++ by Varsha Patil
1. Fundamental Concept - Data Structures using C++ by Varsha Patil1. Fundamental Concept - Data Structures using C++ by Varsha Patil
1. Fundamental Concept - Data Structures using C++ by Varsha Patil
widespreadpromotion
 
normalization
normalizationnormalization
10. Search Tree - Data Structures using C++ by Varsha Patil
10. Search Tree - Data Structures using C++ by Varsha Patil10. Search Tree - Data Structures using C++ by Varsha Patil
10. Search Tree - Data Structures using C++ by Varsha Patil
widespreadpromotion
 
7. Tree - Data Structures using C++ by Varsha Patil
7. Tree - Data Structures using C++ by Varsha Patil7. Tree - Data Structures using C++ by Varsha Patil
7. Tree - Data Structures using C++ by Varsha Patil
widespreadpromotion
 
Introduction to LDL 2012
Introduction to LDL 2012Introduction to LDL 2012
Introduction to LDL 2012
Sebastian Hellmann
 
14. Files - Data Structures using C++ by Varsha Patil
14. Files - Data Structures using C++ by Varsha Patil14. Files - Data Structures using C++ by Varsha Patil
14. Files - Data Structures using C++ by Varsha Patil
widespreadpromotion
 

What's hot (20)

Zhishi.me - Weaving Chinese Linking Open Data
Zhishi.me - Weaving Chinese Linking Open DataZhishi.me - Weaving Chinese Linking Open Data
Zhishi.me - Weaving Chinese Linking Open Data
 
The Maze of Deletion in Ontology Stream Reasoning
The Maze of Deletion in Ontology Stream Reasoning The Maze of Deletion in Ontology Stream Reasoning
The Maze of Deletion in Ontology Stream Reasoning
 
Towards the implementation of a refined data model for a Zulu machine-readabl...
Towards the implementation of a refined data model for a Zulu machine-readabl...Towards the implementation of a refined data model for a Zulu machine-readabl...
Towards the implementation of a refined data model for a Zulu machine-readabl...
 
3. Stack - Data Structures using C++ by Varsha Patil
3. Stack - Data Structures using C++ by Varsha Patil3. Stack - Data Structures using C++ by Varsha Patil
3. Stack - Data Structures using C++ by Varsha Patil
 
6. Linked list - Data Structures using C++ by Varsha Patil
6. Linked list - Data Structures using C++ by Varsha Patil6. Linked list - Data Structures using C++ by Varsha Patil
6. Linked list - Data Structures using C++ by Varsha Patil
 
13. Indexing MTrees - Data Structures using C++ by Varsha Patil
13. Indexing MTrees - Data Structures using C++ by Varsha Patil13. Indexing MTrees - Data Structures using C++ by Varsha Patil
13. Indexing MTrees - Data Structures using C++ by Varsha Patil
 
Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
 
Effective Data Retrieval in XML using TreeMatch Algorithm
Effective Data Retrieval in XML using TreeMatch AlgorithmEffective Data Retrieval in XML using TreeMatch Algorithm
Effective Data Retrieval in XML using TreeMatch Algorithm
 
Efficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data StreamsEfficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data Streams
 
A hierarchical approach for semi structured document indexing and
A hierarchical approach for semi structured document indexing andA hierarchical approach for semi structured document indexing and
A hierarchical approach for semi structured document indexing and
 
5. Queue - Data Structures using C++ by Varsha Patil
5. Queue - Data Structures using C++ by Varsha Patil5. Queue - Data Structures using C++ by Varsha Patil
5. Queue - Data Structures using C++ by Varsha Patil
 
On the Way to a Holding Ontology
On the Way to a Holding OntologyOn the Way to a Holding Ontology
On the Way to a Holding Ontology
 
SQL For PHP Programmers
SQL For PHP ProgrammersSQL For PHP Programmers
SQL For PHP Programmers
 
Contexts and Importing in RDF
Contexts and Importing in RDFContexts and Importing in RDF
Contexts and Importing in RDF
 
1. Fundamental Concept - Data Structures using C++ by Varsha Patil
1. Fundamental Concept - Data Structures using C++ by Varsha Patil1. Fundamental Concept - Data Structures using C++ by Varsha Patil
1. Fundamental Concept - Data Structures using C++ by Varsha Patil
 
normalization
normalizationnormalization
normalization
 
10. Search Tree - Data Structures using C++ by Varsha Patil
10. Search Tree - Data Structures using C++ by Varsha Patil10. Search Tree - Data Structures using C++ by Varsha Patil
10. Search Tree - Data Structures using C++ by Varsha Patil
 
7. Tree - Data Structures using C++ by Varsha Patil
7. Tree - Data Structures using C++ by Varsha Patil7. Tree - Data Structures using C++ by Varsha Patil
7. Tree - Data Structures using C++ by Varsha Patil
 
Introduction to LDL 2012
Introduction to LDL 2012Introduction to LDL 2012
Introduction to LDL 2012
 
14. Files - Data Structures using C++ by Varsha Patil
14. Files - Data Structures using C++ by Varsha Patil14. Files - Data Structures using C++ by Varsha Patil
14. Files - Data Structures using C++ by Varsha Patil
 

Viewers also liked

Introduction to Data Mining for Newbies
Introduction to Data Mining for NewbiesIntroduction to Data Mining for Newbies
Introduction to Data Mining for Newbies
Eunjeong (Lucy) Park
 
Jonanthan Leopard's Visual Resume
Jonanthan Leopard's Visual ResumeJonanthan Leopard's Visual Resume
Jonanthan Leopard's Visual Resume
jonleopard
 
Experimental investigation of effectiveness of heat wheel as a rotory heat ex...
Experimental investigation of effectiveness of heat wheel as a rotory heat ex...Experimental investigation of effectiveness of heat wheel as a rotory heat ex...
Experimental investigation of effectiveness of heat wheel as a rotory heat ex...
eSAT Publishing House
 
Design of a usb based data acquisition system
Design of a usb based data acquisition systemDesign of a usb based data acquisition system
Design of a usb based data acquisition system
eSAT Publishing House
 
Performance bounds for unequally punctured
Performance bounds for unequally puncturedPerformance bounds for unequally punctured
Performance bounds for unequally punctured
eSAT Publishing House
 
A comprehensive survey on security issues in cloud computing and data privacy...
A comprehensive survey on security issues in cloud computing and data privacy...A comprehensive survey on security issues in cloud computing and data privacy...
A comprehensive survey on security issues in cloud computing and data privacy...
eSAT Publishing House
 
IVT Företagspresentation
IVT FöretagspresentationIVT Företagspresentation
IVT Företagspresentation
IVT Center Hjelms Rör
 
Implementation of delay measurement technique using signature register for sm...
Implementation of delay measurement technique using signature register for sm...Implementation of delay measurement technique using signature register for sm...
Implementation of delay measurement technique using signature register for sm...
eSAT Publishing House
 
Загадки о животных
Загадки о животныхЗагадки о животных
Загадки о животных
drugsem
 
Solentive / InRule AADI Gartner Summit 2014
Solentive / InRule AADI Gartner Summit 2014Solentive / InRule AADI Gartner Summit 2014
Solentive / InRule AADI Gartner Summit 2014
Solentive
 
Космическое фотопутешествие с телескопом хаббл
Космическое фотопутешествие с телескопом хабблКосмическое фотопутешествие с телескопом хаббл
Космическое фотопутешествие с телескопом хаббл
drugsem
 
NBPC 1613 San Diego, CA Proposed 2014 bylaws draft_july_24_unanimous_consensu...
NBPC 1613 San Diego, CA Proposed 2014 bylaws draft_july_24_unanimous_consensu...NBPC 1613 San Diego, CA Proposed 2014 bylaws draft_july_24_unanimous_consensu...
NBPC 1613 San Diego, CA Proposed 2014 bylaws draft_july_24_unanimous_consensu...
NBPCSanDiego
 
A language independent web data extraction using vision based page segmentati...
A language independent web data extraction using vision based page segmentati...A language independent web data extraction using vision based page segmentati...
A language independent web data extraction using vision based page segmentati...
eSAT Publishing House
 
Road map of development for pull system in thailand small and medium automoti...
Road map of development for pull system in thailand small and medium automoti...Road map of development for pull system in thailand small and medium automoti...
Road map of development for pull system in thailand small and medium automoti...
eSAT Publishing House
 
Study of protein content and effect of p h variation on solubility of seed pr...
Study of protein content and effect of p h variation on solubility of seed pr...Study of protein content and effect of p h variation on solubility of seed pr...
Study of protein content and effect of p h variation on solubility of seed pr...
eSAT Publishing House
 
Ga based dynamic routing in wdm optical networks
Ga based dynamic routing in wdm optical networksGa based dynamic routing in wdm optical networks
Ga based dynamic routing in wdm optical networks
eSAT Publishing House
 
Andrea paola duran 11- 03 trabajo
Andrea paola duran 11- 03 trabajoAndrea paola duran 11- 03 trabajo
Andrea paola duran 11- 03 trabajo
Andrea Duran ʚïɞ
 
Hardware cristian villavicencio 1
Hardware cristian villavicencio 1Hardware cristian villavicencio 1
Hardware cristian villavicencio 1
cristianlukas
 
Kaoru.K_portfolio_ADV124
Kaoru.K_portfolio_ADV124Kaoru.K_portfolio_ADV124
Kaoru.K_portfolio_ADV124
Kaoru Kishigami
 

Viewers also liked (20)

Introduction to Data Mining for Newbies
Introduction to Data Mining for NewbiesIntroduction to Data Mining for Newbies
Introduction to Data Mining for Newbies
 
Jonanthan Leopard's Visual Resume
Jonanthan Leopard's Visual ResumeJonanthan Leopard's Visual Resume
Jonanthan Leopard's Visual Resume
 
Experimental investigation of effectiveness of heat wheel as a rotory heat ex...
Experimental investigation of effectiveness of heat wheel as a rotory heat ex...Experimental investigation of effectiveness of heat wheel as a rotory heat ex...
Experimental investigation of effectiveness of heat wheel as a rotory heat ex...
 
Design of a usb based data acquisition system
Design of a usb based data acquisition systemDesign of a usb based data acquisition system
Design of a usb based data acquisition system
 
Performance bounds for unequally punctured
Performance bounds for unequally puncturedPerformance bounds for unequally punctured
Performance bounds for unequally punctured
 
A comprehensive survey on security issues in cloud computing and data privacy...
A comprehensive survey on security issues in cloud computing and data privacy...A comprehensive survey on security issues in cloud computing and data privacy...
A comprehensive survey on security issues in cloud computing and data privacy...
 
IVT Företagspresentation
IVT FöretagspresentationIVT Företagspresentation
IVT Företagspresentation
 
Implementation of delay measurement technique using signature register for sm...
Implementation of delay measurement technique using signature register for sm...Implementation of delay measurement technique using signature register for sm...
Implementation of delay measurement technique using signature register for sm...
 
Загадки о животных
Загадки о животныхЗагадки о животных
Загадки о животных
 
Solentive / InRule AADI Gartner Summit 2014
Solentive / InRule AADI Gartner Summit 2014Solentive / InRule AADI Gartner Summit 2014
Solentive / InRule AADI Gartner Summit 2014
 
Космическое фотопутешествие с телескопом хаббл
Космическое фотопутешествие с телескопом хабблКосмическое фотопутешествие с телескопом хаббл
Космическое фотопутешествие с телескопом хаббл
 
NBPC 1613 San Diego, CA Proposed 2014 bylaws draft_july_24_unanimous_consensu...
NBPC 1613 San Diego, CA Proposed 2014 bylaws draft_july_24_unanimous_consensu...NBPC 1613 San Diego, CA Proposed 2014 bylaws draft_july_24_unanimous_consensu...
NBPC 1613 San Diego, CA Proposed 2014 bylaws draft_july_24_unanimous_consensu...
 
A language independent web data extraction using vision based page segmentati...
A language independent web data extraction using vision based page segmentati...A language independent web data extraction using vision based page segmentati...
A language independent web data extraction using vision based page segmentati...
 
Ar
ArAr
Ar
 
Road map of development for pull system in thailand small and medium automoti...
Road map of development for pull system in thailand small and medium automoti...Road map of development for pull system in thailand small and medium automoti...
Road map of development for pull system in thailand small and medium automoti...
 
Study of protein content and effect of p h variation on solubility of seed pr...
Study of protein content and effect of p h variation on solubility of seed pr...Study of protein content and effect of p h variation on solubility of seed pr...
Study of protein content and effect of p h variation on solubility of seed pr...
 
Ga based dynamic routing in wdm optical networks
Ga based dynamic routing in wdm optical networksGa based dynamic routing in wdm optical networks
Ga based dynamic routing in wdm optical networks
 
Andrea paola duran 11- 03 trabajo
Andrea paola duran 11- 03 trabajoAndrea paola duran 11- 03 trabajo
Andrea paola duran 11- 03 trabajo
 
Hardware cristian villavicencio 1
Hardware cristian villavicencio 1Hardware cristian villavicencio 1
Hardware cristian villavicencio 1
 
Kaoru.K_portfolio_ADV124
Kaoru.K_portfolio_ADV124Kaoru.K_portfolio_ADV124
Kaoru.K_portfolio_ADV124
 

Similar to Redundancy analysis on linked data #cold2014 #ISWC2014

Data curation and data archiving at different stages of the research process
Data curation and data archiving at different stages of the research processData curation and data archiving at different stages of the research process
Data curation and data archiving at different stages of the research process
Andrea Scharnhorst
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
Sören Auer
 
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...
Matthäus Zloch
 
Efficient Query Answering against Dynamic RDF Databases
Efficient Query Answering against Dynamic RDF DatabasesEfficient Query Answering against Dynamic RDF Databases
Efficient Query Answering against Dynamic RDF Databases
Alexandra Roatiș
 
Democratizing Big Semantic Data management
Democratizing Big Semantic Data managementDemocratizing Big Semantic Data management
Democratizing Big Semantic Data management
WU (Vienna University of Economics and Business)
 
Efficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD VivaEfficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Gezim Sejdiu
 
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases
Explanations in Dialogue Systems through Uncertain RDF Knowledge BasesExplanations in Dialogue Systems through Uncertain RDF Knowledge Bases
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases
Daniel Sonntag
 
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
CONUL Conference
 
Sem facet paper
Sem facet paperSem facet paper
Sem facet paper
DBOnto
 
SemFacet paper
SemFacet paperSemFacet paper
SemFacet paper
DBOnto
 
Fedbench - A Benchmark Suite for Federated Semantic Data Processing
Fedbench - A Benchmark Suite for Federated Semantic Data ProcessingFedbench - A Benchmark Suite for Federated Semantic Data Processing
Fedbench - A Benchmark Suite for Federated Semantic Data Processing
Peter Haase
 
Data Integration at the Ontology Engineering Group
Data Integration at the Ontology Engineering GroupData Integration at the Ontology Engineering Group
Data Integration at the Ontology Engineering Group
Oscar Corcho
 
Quantifying the bias in data links
Quantifying the bias in data linksQuantifying the bias in data links
Quantifying the bias in data links
Vrije Universiteit Amsterdam
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
National Institute of Informatics
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
Rathachai Chawuthai
 
Timbuctoo 2 EASY
Timbuctoo 2 EASYTimbuctoo 2 EASY
Timbuctoo 2 EASY
henkvandenberg16
 
RDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of SemanticsRDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of Semantics
Jean-Paul Calbimonte
 
Automatically converting tabular data to
Automatically converting tabular data toAutomatically converting tabular data to
Automatically converting tabular data to
IJwest
 
semantic web & natural language
semantic web & natural languagesemantic web & natural language
semantic web & natural language
Nurfadhlina Mohd Sharef
 
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Ontotext
 

Similar to Redundancy analysis on linked data #cold2014 #ISWC2014 (20)

Data curation and data archiving at different stages of the research process
Data curation and data archiving at different stages of the research processData curation and data archiving at different stages of the research process
Data curation and data archiving at different stages of the research process
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
 
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...
 
Efficient Query Answering against Dynamic RDF Databases
Efficient Query Answering against Dynamic RDF DatabasesEfficient Query Answering against Dynamic RDF Databases
Efficient Query Answering against Dynamic RDF Databases
 
Democratizing Big Semantic Data management
Democratizing Big Semantic Data managementDemocratizing Big Semantic Data management
Democratizing Big Semantic Data management
 
Efficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD VivaEfficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
 
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases
Explanations in Dialogue Systems through Uncertain RDF Knowledge BasesExplanations in Dialogue Systems through Uncertain RDF Knowledge Bases
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases
 
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
 
Sem facet paper
Sem facet paperSem facet paper
Sem facet paper
 
SemFacet paper
SemFacet paperSemFacet paper
SemFacet paper
 
Fedbench - A Benchmark Suite for Federated Semantic Data Processing
Fedbench - A Benchmark Suite for Federated Semantic Data ProcessingFedbench - A Benchmark Suite for Federated Semantic Data Processing
Fedbench - A Benchmark Suite for Federated Semantic Data Processing
 
Data Integration at the Ontology Engineering Group
Data Integration at the Ontology Engineering GroupData Integration at the Ontology Engineering Group
Data Integration at the Ontology Engineering Group
 
Quantifying the bias in data links
Quantifying the bias in data linksQuantifying the bias in data links
Quantifying the bias in data links
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
 
Timbuctoo 2 EASY
Timbuctoo 2 EASYTimbuctoo 2 EASY
Timbuctoo 2 EASY
 
RDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of SemanticsRDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of Semantics
 
Automatically converting tabular data to
Automatically converting tabular data toAutomatically converting tabular data to
Automatically converting tabular data to
 
semantic web & natural language
semantic web & natural languagesemantic web & natural language
semantic web & natural language
 
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
 


Redundancy analysis on linked data #cold2014 #ISWC2014

  • 1. How redundant is it? – An empirical analysis on linked datasets
      Honghan Wu (1), Boris Villazon-Terrazas (2), Jeff Z. Pan (1) and José Manuel Gómez Pérez (2)
      (1) University of Aberdeen, UK; (2) iSOCO, Spain
      20/10/2014
  • 2. Content
      • What is data redundancy in linked data?
      • Why is it of special interest to linked data consumption?
      • Linked Data redundancy categorisation
      • How to analyse it?
      • Dataset selection & the results
      • Conclusion
  • 3. What is data redundancy in LD?
      • Data Redundancy
          – [Database systems] Same piece of data in multiple places
          – [Information theory] Wasted "space" used to transmit certain data
      • (In this work) Linked Data Redundancy
          – Wasted "space" to represent certain meaning (represented in certain semantics)
          – Duplication-free
  • 4. Why is it of special interest to LD consumption?
      • Bad Redundancy & Good Redundancy
          – Bad for exchange: storage, transmission
          – Good for inference computation
      • Relevant consumption tasks
          – Hosting/Sharing
          – Query Answering (SPARQL)
          – Ontology Based Data Access
          – Reasoning
  • 5. Redundancy in Linked Data
      • Redundancy Categorisation for RDF Data
      • Redundancies caused by the "Linked" nature
  • 6. RDF Redundancies vs. Succinct Representations
      – [Rule based] A. K. Joshi, P. Hitzler, and G. Dong. Logical linked data compression. In The Semantic Web: Semantics and Big Data, pages 170–184. Springer, 2013.
      – [HDT] J. D. Fernández, M. A. Martínez-Prieto, C. Gutiérrez, A. Polleres, and M. Arias. Binary RDF representation for publication and exchange (HDT). Web Semant., 19:22–41, Mar. 2013.
      – [WaterFowl] O. Curé, G. Blin, D. Revuz, and D. C. Faye. WaterFowl: A compact, self-indexed and inference-enabled immutable RDF store. In The Semantic Web: Trends and Challenges, pages 302–316. Springer, 2014.
      – J. Z. Pan, J. M. Gómez-Pérez, Y. Ren, H. Wu, H. Wang, and M. Zhu. Graph pattern based RDF data compression. In Proc. of the 4th Joint International Semantic Technology Conference (JIST), 2014. (To appear)
  • 7. Semantic Redundancy
      Rule representation
          – DL axioms (T-Box)
          – Other semantics (graph pattern substitution)
  • 8. Syntactic Redundancy
      Concise syntax
          – RDF abbreviation & striping syntax
          – Intra-structure & inter-structure
  • 9. Symbolic Redundancy
      • http://xmlns.com/foaf/0.1/name – 31 bytes in ASCII
      • Dictionary table:
          URI                               ID (4 bytes)
          …                                 …
          http://xmlns.com/foaf/0.1/name    128
          …                                 …
      Fewer bytes for basic data units
          – (Fixed-length) Dictionary based
          – (Variable-length) Huffman coding
          – Predictive encoding
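
The fixed-length dictionary idea on this slide can be illustrated with a minimal sketch. It is not taken from the talk or from any of the cited systems: the `example.org` URIs, the function names and the big-endian 4-byte layout are all assumptions made for illustration. Each distinct term gets an integer ID, and a triple is then stored as three 4-byte IDs instead of three full strings.

```python
import struct


def build_dictionary(terms):
    """Assign each distinct term an integer ID in first-seen order."""
    ids = {}
    for term in terms:
        if term not in ids:
            ids[term] = len(ids)
    return ids


def encode_triple(triple, ids):
    """Pack (subject, predicate, object) as three 4-byte unsigned integers."""
    return struct.pack(">III", *(ids[term] for term in triple))


def decode_triple(blob, terms_by_id):
    """Invert encode_triple using the reverse dictionary."""
    return tuple(terms_by_id[i] for i in struct.unpack(">III", blob))


# Hypothetical data: only foaf:name comes from the slide.
triples = [
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/bob", "http://xmlns.com/foaf/0.1/name", "Bob"),
]

ids = build_dictionary(term for triple in triples for term in triple)
terms_by_id = {i: term for term, i in ids.items()}
encoded = [encode_triple(triple, ids) for triple in triples]
decoded = [decode_triple(blob, terms_by_id) for blob in encoded]
```

Every encoded triple occupies 12 bytes regardless of URI length, and the repeated predicate is stored once in the dictionary. A variable-length scheme such as Huffman coding would instead give shorter codes to more frequent terms.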
  • 10. Semantic Redundancy Caused by the "Linked" Nature
      • Vocabulary Linkage
          – Reuse of other vocabularies: more rules
          – Lower redundancy ratio: more triples derivable
          – More redundancy: co-occurrence triples removable
      • Instance Linkage
          – sameAs linkages
          – Bring in new assertions (e.g., type assertions)
          – Bring in new axioms
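
How a sameAs linkage can bring in new type assertions may be easier to see in a small sketch. The code below is an illustration under stated assumptions, not the method used in the talk: the prefixed names (`ex:alice`, `dbp:Alice`, `dbo:Scientist`) are hypothetical, and sameAs is treated simply as an equivalence relation whose members share all of their types.

```python
def same_as_groups(pairs):
    """Group entities connected by sameAs links (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


def propagate_types(types, pairs):
    """Give every entity the union of the types of its sameAs group."""
    merged = {entity: set(classes) for entity, classes in types.items()}
    for group in same_as_groups(pairs):
        shared = set().union(*(types.get(entity, set()) for entity in group))
        for entity in group:
            merged[entity] = set(shared)
    return merged


# Hypothetical local and remote descriptions of the same individual.
types = {"ex:alice": {"foaf:Person"}, "dbp:Alice": {"dbo:Scientist"}}
pairs = [("ex:alice", "dbp:Alice")]
merged = propagate_types(types, pairs)
```

After propagation both identifiers carry both types, so a type that was absent from the local dataset becomes derivable through the link alone.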
  • 11. How to analyse it?
      • Two-dimensional analysis
      • Methodology
      • Metrics
  • 12. Two-dimensional analysis
      Columns (RDF Redundancy Dimension): Semantic, Syntactic, Symbolic
      Rows (Linked Semantic Dimension):
          – A-Box: ✔ ✔
          – A-Box & T-Box, No Linkage: ✔ - -
          – A-Box & T-Box, T-Box Reuse: ✔ - -
          – A-Box & T-Box, A-Box Linkage: - -
  • 13. Methodology: EDP Summarisation
  • 14. Virtually Materialised A-Box: expanded EDP
      A-Box: A1(o1), B1(o1), A2(o2), B2(o2), R(o1, o2)
      T-Box: A1⊆A, A2⊆A, B1⊆B, B2⊆B
      EDP graph: node {A1, B1} (1), node {A2, B2} (1), linked by R (1:1); each node is expanded with A, B
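
The expansion on this slide can be sketched in a few lines. The class and individual names (A1, B1, A2, B2, A, B, o1, o2) come from the slide; everything else, including the function names and the plain-dictionary representation, is an assumption for illustration and not the authors' EDP implementation. The sketch computes, for each individual, the asserted classes plus every superclass reachable through the T-Box.

```python
def superclasses(cls, subclass_of):
    """Return cls together with all of its transitive superclasses."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen


def expand_types(abox_types, subclass_of):
    """Expand each individual's asserted types with inferred superclasses."""
    expanded = {}
    for individual, classes in abox_types.items():
        result = set()
        for cls in classes:
            result |= superclasses(cls, subclass_of)
        expanded[individual] = result
    return expanded


# T-Box from the slide: A1 and A2 are subclasses of A, B1 and B2 of B.
subclass_of = {"A1": ["A"], "A2": ["A"], "B1": ["B"], "B2": ["B"]}
# A-Box type assertions from the slide.
abox_types = {"o1": {"A1", "B1"}, "o2": {"A2", "B2"}}

expanded = expand_types(abox_types, subclass_of)
```

Working on type sets like this avoids writing the inferred triples back into the dataset, which appears to be the sense of "virtually materialised" here.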
  • 15. Linked Dataset Analysis Results
      • Dataset Selection & Summary
      • Analysis Results
  • 16. Dataset Selection and Summary: LOD 2011
  • 17. A-Box Only: Semantic Redundancies
      – Redundant triples
      – Semantic redundancy ratio, i.e.
      – Number of graph patterns used to substitute redundant triples
  • 18. A-Box Only: Syntactic Redundancies
      – The redundant resource occurrences of inter-structural redundancies
      – The syntactic redundancy ratio, i.e.
  • 19. A-Box & T-Box: No Linkage
      – DBLP2013: SWRC ontology
      – Ordnance Survey: officially published OS ontology
      – Values shown: 1.7%, 184%, 108%, 4.7%
  • 20. A-Box & T-Box: No Linkage
      The first 3 datasets reuse the FOAF ontology
      – The number of directly used terms from the reused T-Box
      – The number of applicable axioms from the (materialised) reused T-Box
      – Values shown: 26.9%, 4%, 45.4%, 1.3%
  • 21. Conclusion
      • LOD redundancies are heterogeneous and huge
      • Vocabulary linkage might lead to a huge number of derivable triples
      • Redundancy-aware techniques are in demand
  • 22. Redundancy-aware Consumption
      • Compression: different redundancies might need different techniques
      • Data access: (high inter-structure redundancy) skewed entity distributions over EDPs -> efficient access?
      • OBDA/Reasoning: A-Box redundancy = fewer T-Box axioms
      • Data publishers: should be aware of the consequences of reuse