SlideShare a Scribd company logo
1 of 23
Download to read offline
How redundant is it? – An empirical analysis on linked datasets 
Honghan Wu1, Boris Villazon-Terrazas2, Jeff Z. Pan1 and José Manuel Gómez Pérez2 
University of Aberdeen1, UK 
iSOCO2 , Spain 
20/10/2014 1
2 
Content 
• 
What is data redundancy with linked data? 
• 
Why is it of special interest to linked data consumption? 
• 
Linked Data redundancy categorisation 
• 
How to analysis? 
• 
Dataset selection & The Result 
• 
Conclusion
3 
What is the data redundancy in LD? 
• 
Data Redundancy 
– 
[Database systems] Same piece of data in multiple places 
– 
[Information theory] Wasted "space" used to transmit certain data 
• 
(In this work)Linked Data Redundancy 
– 
Wasted “space” to represent certain meaning (represented in certain semantics) 
– 
Duplication-free
4 
Why is it of special interest to LD consumption? 
• 
Bad Redundancy & Good Redundancy 
– 
Bad for exchange: storage, transmission 
– 
Good for inference computation 
• 
Relevant consumption tasks 
– 
Hosting/Sharing 
– 
Query Answering (SPARQL) 
– 
Ontology Based Data Access 
– 
Reasoning
Redundancy in Linked Data 
• 
Redundancy Categorisation for RDF Data 
• 
Redundancies caused by the “Linked” nature
6 
RDF Redundancies vs. Succinct Representations 
[Rule based] A. K. Joshi, P. Hitzler, and G. Dong. Logical linked data compression. In The Semantic Web: Semantics and Big Data, pages 170–184. Springer, 2013. 
[HDT]J. D. FernáNdez, M. A. MartíNez-Prieto, C. GutiéRrez, A. Polleres, and M. Arias. Binary rdf representation for publication and exchange (hdt). Web Semant., 19:22–41, Mar. 2013. 
[WaterFowl] O. Curé, G. Blin, D. Revuz, and D. C. Faye. Waterfowl: A compact, self-indexed and inference-enabled immutable rdf store. In The Semantic Web: Trends and Challenges, pages 302– 316. Springer, 2014. 
Pan, Jeff Z., Jose Manuel Gomez-Perez, Yuan Ren, Honghan Wu, Haofen Wang and Man Zhu. “Graph Pattern based RDF Data Compression”. In Proc. of 4th Joint International Semantic Technology Conference (JIST). 2014. (To appear)
7 
Semantic redundancy 
Rule Representation 
- 
DL Axioms (T-Box) 
- 
Other semantics (graph pattern substitution)
8 
Syntactic Redundancy 
Concise syntax 
- 
RDF abbreviation & striping syntax 
- 
Intra-structure & Inter- structure
9 
Symbolic Redundancy 
• 
http://xmlns.com/foaf/0.1/name 
– 
31 bytes in ASCII 
URI 
ID (4 bytes) 
… 
… 
http://xmlns.com/foaf/0.1/name 
128 
… 
… 
Less bytes for basic data units 
- 
(Fix-length)Dictionary Based 
- 
(Variable-length) Huffman coding 
- 
Predictive encoding
10 
Semantic Redundancy Caused by “Linked” Nature 
• 
Vocabulary Linkage 
– 
Reuse of other vocabularies: more rules 
– 
Less redundancy ratio: more triples derivable 
– 
More redundancy: co-occurrence triples removable 
• 
Instance Linkage 
– 
sameAs linkages 
– 
Bring in new assertions (e.g., type assertions) 
– 
Bring in new axioms
How to analysis? 
• 
Two dimension analysis 
• 
Methodology 
• 
Metrics
12 
Two dimension analysis 
Semantic 
Syntactic 
Symbolic 
A-Box 
✔ 
✔ 
A-Box & T-Box 
No Linkage 
✔ 
- 
- 
T-Box Reuse 
✔ 
- 
- 
A-Box Linkage 
- 
- 
RDF Redundancy Dimension 
Linked Semantic 
Dimension
13 
Methodology: EDP Summarisation
14 
Virtually Materialised A-Box: expanded EDP 
A1, B1 (1) 
A2, B2 (1) 
A-Box: A1(o1) B1(o1) A2(o2) B2(o2) R(o1, o2) 
T-Box: A1⊆A, A2⊆A, B1⊆B, B2⊆B 
R (1:1) 
A, B, 
A, B,
Linked Dataset Analysis Results 
• 
Dataset Selection & Summary 
• 
Analysis Results
16 
Dataset Selection and Summary 
LOD 2011
17 
A-Box Only: Semantic Redundancies 
– Redundant Triples 
– Semantic redundancy ratio, i.e. 
– # Graph Patterns used to substitute redundant triples
18 
A-Box Only: Syntactic Redundancies 
– the redundant resource occurrences of inter-structural 
redundancies 
– the syntactic redundancy ratio, i.e.
19 
A-Box & T-Box: No Linkage 
DBLP2013: SWRC ontology 
Ordnance Survey: official published OS ontology 
1.7% 
184% 
108% 
4.7%
20 
A-Box & T-Box: No Linkage 
First 3 datasts are reusing FOAF Ontology 
– the number of directly used terms from reused T-Box 
– the number of applicable axioms from (materialised) reused T-Box 
26.9% 
4% 
45.4% 
1.3%
21 
Conclusion 
• 
LOD redundancy are heterogeneous & huge 
• 
Vocabulary linkage might lead to huge number of derivable triples 
• 
Redundancy aware techniques are demanded
22 
Redundancy-aware Consumption 
• 
Compression: different redundancies might need different techniques 
• 
For Data Access: (high inter-structure redundancy) skewed entity distributions over EDPs -> efficient access? 
• 
OBDA/Reasoning: A-Box redundancy = less T-Box axioms 
• 
Data Publisher: should be aware of the consequences of reusing
Thanks! Q & A

More Related Content

What's hot

Zhishi.me - Weaving Chinese Linking Open Data
Zhishi.me - Weaving Chinese Linking Open DataZhishi.me - Weaving Chinese Linking Open Data
Zhishi.me - Weaving Chinese Linking Open DataXing Niu
 
The Maze of Deletion in Ontology Stream Reasoning
The Maze of Deletion in Ontology Stream Reasoning The Maze of Deletion in Ontology Stream Reasoning
The Maze of Deletion in Ontology Stream Reasoning Jeff Z. Pan
 
Towards the implementation of a refined data model for a Zulu machine-readabl...
Towards the implementation of a refined data model for a Zulu machine-readabl...Towards the implementation of a refined data model for a Zulu machine-readabl...
Towards the implementation of a refined data model for a Zulu machine-readabl...Guy De Pauw
 
3. Stack - Data Structures using C++ by Varsha Patil
3. Stack - Data Structures using C++ by Varsha Patil3. Stack - Data Structures using C++ by Varsha Patil
3. Stack - Data Structures using C++ by Varsha Patilwidespreadpromotion
 
6. Linked list - Data Structures using C++ by Varsha Patil
6. Linked list - Data Structures using C++ by Varsha Patil6. Linked list - Data Structures using C++ by Varsha Patil
6. Linked list - Data Structures using C++ by Varsha Patilwidespreadpromotion
 
13. Indexing MTrees - Data Structures using C++ by Varsha Patil
13. Indexing MTrees - Data Structures using C++ by Varsha Patil13. Indexing MTrees - Data Structures using C++ by Varsha Patil
13. Indexing MTrees - Data Structures using C++ by Varsha Patilwidespreadpromotion
 
Effective Data Retrieval in XML using TreeMatch Algorithm
Effective Data Retrieval in XML using TreeMatch AlgorithmEffective Data Retrieval in XML using TreeMatch Algorithm
Effective Data Retrieval in XML using TreeMatch AlgorithmIRJET Journal
 
A hierarchical approach for semi structured document indexing and
A hierarchical approach for semi structured document indexing andA hierarchical approach for semi structured document indexing and
A hierarchical approach for semi structured document indexing andIbrahim Bounhas
 
5. Queue - Data Structures using C++ by Varsha Patil
5. Queue - Data Structures using C++ by Varsha Patil5. Queue - Data Structures using C++ by Varsha Patil
5. Queue - Data Structures using C++ by Varsha Patilwidespreadpromotion
 
On the Way to a Holding Ontology
On the Way to a Holding OntologyOn the Way to a Holding Ontology
On the Way to a Holding OntologyJakob .
 
SQL For PHP Programmers
SQL For PHP ProgrammersSQL For PHP Programmers
SQL For PHP ProgrammersDave Stokes
 
Contexts and Importing in RDF
Contexts and Importing in RDFContexts and Importing in RDF
Contexts and Importing in RDFJie Bao
 
1. Fundamental Concept - Data Structures using C++ by Varsha Patil
1. Fundamental Concept - Data Structures using C++ by Varsha Patil1. Fundamental Concept - Data Structures using C++ by Varsha Patil
1. Fundamental Concept - Data Structures using C++ by Varsha Patilwidespreadpromotion
 
10. Search Tree - Data Structures using C++ by Varsha Patil
10. Search Tree - Data Structures using C++ by Varsha Patil10. Search Tree - Data Structures using C++ by Varsha Patil
10. Search Tree - Data Structures using C++ by Varsha Patilwidespreadpromotion
 
7. Tree - Data Structures using C++ by Varsha Patil
7. Tree - Data Structures using C++ by Varsha Patil7. Tree - Data Structures using C++ by Varsha Patil
7. Tree - Data Structures using C++ by Varsha Patilwidespreadpromotion
 
14. Files - Data Structures using C++ by Varsha Patil
14. Files - Data Structures using C++ by Varsha Patil14. Files - Data Structures using C++ by Varsha Patil
14. Files - Data Structures using C++ by Varsha Patilwidespreadpromotion
 

What's hot (20)

Zhishi.me - Weaving Chinese Linking Open Data
Zhishi.me - Weaving Chinese Linking Open DataZhishi.me - Weaving Chinese Linking Open Data
Zhishi.me - Weaving Chinese Linking Open Data
 
The Maze of Deletion in Ontology Stream Reasoning
The Maze of Deletion in Ontology Stream Reasoning The Maze of Deletion in Ontology Stream Reasoning
The Maze of Deletion in Ontology Stream Reasoning
 
Towards the implementation of a refined data model for a Zulu machine-readabl...
Towards the implementation of a refined data model for a Zulu machine-readabl...Towards the implementation of a refined data model for a Zulu machine-readabl...
Towards the implementation of a refined data model for a Zulu machine-readabl...
 
3. Stack - Data Structures using C++ by Varsha Patil
3. Stack - Data Structures using C++ by Varsha Patil3. Stack - Data Structures using C++ by Varsha Patil
3. Stack - Data Structures using C++ by Varsha Patil
 
6. Linked list - Data Structures using C++ by Varsha Patil
6. Linked list - Data Structures using C++ by Varsha Patil6. Linked list - Data Structures using C++ by Varsha Patil
6. Linked list - Data Structures using C++ by Varsha Patil
 
13. Indexing MTrees - Data Structures using C++ by Varsha Patil
13. Indexing MTrees - Data Structures using C++ by Varsha Patil13. Indexing MTrees - Data Structures using C++ by Varsha Patil
13. Indexing MTrees - Data Structures using C++ by Varsha Patil
 
Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
 
Effective Data Retrieval in XML using TreeMatch Algorithm
Effective Data Retrieval in XML using TreeMatch AlgorithmEffective Data Retrieval in XML using TreeMatch Algorithm
Effective Data Retrieval in XML using TreeMatch Algorithm
 
Efficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data StreamsEfficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data Streams
 
A hierarchical approach for semi structured document indexing and
A hierarchical approach for semi structured document indexing andA hierarchical approach for semi structured document indexing and
A hierarchical approach for semi structured document indexing and
 
5. Queue - Data Structures using C++ by Varsha Patil
5. Queue - Data Structures using C++ by Varsha Patil5. Queue - Data Structures using C++ by Varsha Patil
5. Queue - Data Structures using C++ by Varsha Patil
 
On the Way to a Holding Ontology
On the Way to a Holding OntologyOn the Way to a Holding Ontology
On the Way to a Holding Ontology
 
SQL For PHP Programmers
SQL For PHP ProgrammersSQL For PHP Programmers
SQL For PHP Programmers
 
Contexts and Importing in RDF
Contexts and Importing in RDFContexts and Importing in RDF
Contexts and Importing in RDF
 
1. Fundamental Concept - Data Structures using C++ by Varsha Patil
1. Fundamental Concept - Data Structures using C++ by Varsha Patil1. Fundamental Concept - Data Structures using C++ by Varsha Patil
1. Fundamental Concept - Data Structures using C++ by Varsha Patil
 
normalization
normalizationnormalization
normalization
 
10. Search Tree - Data Structures using C++ by Varsha Patil
10. Search Tree - Data Structures using C++ by Varsha Patil10. Search Tree - Data Structures using C++ by Varsha Patil
10. Search Tree - Data Structures using C++ by Varsha Patil
 
7. Tree - Data Structures using C++ by Varsha Patil
7. Tree - Data Structures using C++ by Varsha Patil7. Tree - Data Structures using C++ by Varsha Patil
7. Tree - Data Structures using C++ by Varsha Patil
 
Introduction to LDL 2012
Introduction to LDL 2012Introduction to LDL 2012
Introduction to LDL 2012
 
14. Files - Data Structures using C++ by Varsha Patil
14. Files - Data Structures using C++ by Varsha Patil14. Files - Data Structures using C++ by Varsha Patil
14. Files - Data Structures using C++ by Varsha Patil
 

Viewers also liked

Introduction to Data Mining for Newbies
Introduction to Data Mining for NewbiesIntroduction to Data Mining for Newbies
Introduction to Data Mining for NewbiesEunjeong (Lucy) Park
 
Jonanthan Leopard's Visual Resume
Jonanthan Leopard's Visual ResumeJonanthan Leopard's Visual Resume
Jonanthan Leopard's Visual Resumejonleopard
 
Experimental investigation of effectiveness of heat wheel as a rotory heat ex...
Experimental investigation of effectiveness of heat wheel as a rotory heat ex...Experimental investigation of effectiveness of heat wheel as a rotory heat ex...
Experimental investigation of effectiveness of heat wheel as a rotory heat ex...eSAT Publishing House
 
Design of a usb based data acquisition system
Design of a usb based data acquisition systemDesign of a usb based data acquisition system
Design of a usb based data acquisition systemeSAT Publishing House
 
Performance bounds for unequally punctured
Performance bounds for unequally puncturedPerformance bounds for unequally punctured
Performance bounds for unequally puncturedeSAT Publishing House
 
A comprehensive survey on security issues in cloud computing and data privacy...
A comprehensive survey on security issues in cloud computing and data privacy...A comprehensive survey on security issues in cloud computing and data privacy...
A comprehensive survey on security issues in cloud computing and data privacy...eSAT Publishing House
 
Implementation of delay measurement technique using signature register for sm...
Implementation of delay measurement technique using signature register for sm...Implementation of delay measurement technique using signature register for sm...
Implementation of delay measurement technique using signature register for sm...eSAT Publishing House
 
Загадки о животных
Загадки о животныхЗагадки о животных
Загадки о животныхdrugsem
 
Solentive / InRule AADI Gartner Summit 2014
Solentive / InRule AADI Gartner Summit 2014Solentive / InRule AADI Gartner Summit 2014
Solentive / InRule AADI Gartner Summit 2014Solentive
 
Космическое фотопутешествие с телескопом хаббл
Космическое фотопутешествие с телескопом хабблКосмическое фотопутешествие с телескопом хаббл
Космическое фотопутешествие с телескопом хабблdrugsem
 
NBPC 1613 San Diego, CA Proposed 2014 bylaws draft_july_24_unanimous_consensu...
NBPC 1613 San Diego, CA Proposed 2014 bylaws draft_july_24_unanimous_consensu...NBPC 1613 San Diego, CA Proposed 2014 bylaws draft_july_24_unanimous_consensu...
NBPC 1613 San Diego, CA Proposed 2014 bylaws draft_july_24_unanimous_consensu...NBPCSanDiego
 
A language independent web data extraction using vision based page segmentati...
A language independent web data extraction using vision based page segmentati...A language independent web data extraction using vision based page segmentati...
A language independent web data extraction using vision based page segmentati...eSAT Publishing House
 
Road map of development for pull system in thailand small and medium automoti...
Road map of development for pull system in thailand small and medium automoti...Road map of development for pull system in thailand small and medium automoti...
Road map of development for pull system in thailand small and medium automoti...eSAT Publishing House
 
Study of protein content and effect of p h variation on solubility of seed pr...
Study of protein content and effect of p h variation on solubility of seed pr...Study of protein content and effect of p h variation on solubility of seed pr...
Study of protein content and effect of p h variation on solubility of seed pr...eSAT Publishing House
 
Ga based dynamic routing in wdm optical networks
Ga based dynamic routing in wdm optical networksGa based dynamic routing in wdm optical networks
Ga based dynamic routing in wdm optical networkseSAT Publishing House
 
Andrea paola duran 11- 03 trabajo
Andrea paola duran 11- 03 trabajoAndrea paola duran 11- 03 trabajo
Andrea paola duran 11- 03 trabajoAndrea Duran ʚïɞ
 
Hardware cristian villavicencio 1
Hardware cristian villavicencio 1Hardware cristian villavicencio 1
Hardware cristian villavicencio 1cristianlukas
 
Kaoru.K_portfolio_ADV124
Kaoru.K_portfolio_ADV124Kaoru.K_portfolio_ADV124
Kaoru.K_portfolio_ADV124Kaoru Kishigami
 

Viewers also liked (20)

Introduction to Data Mining for Newbies
Introduction to Data Mining for NewbiesIntroduction to Data Mining for Newbies
Introduction to Data Mining for Newbies
 
Jonanthan Leopard's Visual Resume
Jonanthan Leopard's Visual ResumeJonanthan Leopard's Visual Resume
Jonanthan Leopard's Visual Resume
 
Experimental investigation of effectiveness of heat wheel as a rotory heat ex...
Experimental investigation of effectiveness of heat wheel as a rotory heat ex...Experimental investigation of effectiveness of heat wheel as a rotory heat ex...
Experimental investigation of effectiveness of heat wheel as a rotory heat ex...
 
Design of a usb based data acquisition system
Design of a usb based data acquisition systemDesign of a usb based data acquisition system
Design of a usb based data acquisition system
 
Performance bounds for unequally punctured
Performance bounds for unequally puncturedPerformance bounds for unequally punctured
Performance bounds for unequally punctured
 
A comprehensive survey on security issues in cloud computing and data privacy...
A comprehensive survey on security issues in cloud computing and data privacy...A comprehensive survey on security issues in cloud computing and data privacy...
A comprehensive survey on security issues in cloud computing and data privacy...
 
IVT Företagspresentation
IVT FöretagspresentationIVT Företagspresentation
IVT Företagspresentation
 
Implementation of delay measurement technique using signature register for sm...
Implementation of delay measurement technique using signature register for sm...Implementation of delay measurement technique using signature register for sm...
Implementation of delay measurement technique using signature register for sm...
 
Загадки о животных
Загадки о животныхЗагадки о животных
Загадки о животных
 
Solentive / InRule AADI Gartner Summit 2014
Solentive / InRule AADI Gartner Summit 2014Solentive / InRule AADI Gartner Summit 2014
Solentive / InRule AADI Gartner Summit 2014
 
Космическое фотопутешествие с телескопом хаббл
Космическое фотопутешествие с телескопом хабблКосмическое фотопутешествие с телескопом хаббл
Космическое фотопутешествие с телескопом хаббл
 
NBPC 1613 San Diego, CA Proposed 2014 bylaws draft_july_24_unanimous_consensu...
NBPC 1613 San Diego, CA Proposed 2014 bylaws draft_july_24_unanimous_consensu...NBPC 1613 San Diego, CA Proposed 2014 bylaws draft_july_24_unanimous_consensu...
NBPC 1613 San Diego, CA Proposed 2014 bylaws draft_july_24_unanimous_consensu...
 
A language independent web data extraction using vision based page segmentati...
A language independent web data extraction using vision based page segmentati...A language independent web data extraction using vision based page segmentati...
A language independent web data extraction using vision based page segmentati...
 
Ar
ArAr
Ar
 
Road map of development for pull system in thailand small and medium automoti...
Road map of development for pull system in thailand small and medium automoti...Road map of development for pull system in thailand small and medium automoti...
Road map of development for pull system in thailand small and medium automoti...
 
Study of protein content and effect of p h variation on solubility of seed pr...
Study of protein content and effect of p h variation on solubility of seed pr...Study of protein content and effect of p h variation on solubility of seed pr...
Study of protein content and effect of p h variation on solubility of seed pr...
 
Ga based dynamic routing in wdm optical networks
Ga based dynamic routing in wdm optical networksGa based dynamic routing in wdm optical networks
Ga based dynamic routing in wdm optical networks
 
Andrea paola duran 11- 03 trabajo
Andrea paola duran 11- 03 trabajoAndrea paola duran 11- 03 trabajo
Andrea paola duran 11- 03 trabajo
 
Hardware cristian villavicencio 1
Hardware cristian villavicencio 1Hardware cristian villavicencio 1
Hardware cristian villavicencio 1
 
Kaoru.K_portfolio_ADV124
Kaoru.K_portfolio_ADV124Kaoru.K_portfolio_ADV124
Kaoru.K_portfolio_ADV124
 

Similar to Redundancy analysis on linked data #cold2014 #ISWC2014

Data curation and data archiving at different stages of the research process
Data curation and data archiving at different stages of the research processData curation and data archiving at different stages of the research process
Data curation and data archiving at different stages of the research processAndrea Scharnhorst
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedSören Auer
 
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...Matthäus Zloch
 
Efficient Query Answering against Dynamic RDF Databases
Efficient Query Answering against Dynamic RDF DatabasesEfficient Query Answering against Dynamic RDF Databases
Efficient Query Answering against Dynamic RDF DatabasesAlexandra Roatiș
 
Efficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD VivaEfficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD VivaGezim Sejdiu
 
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases
Explanations in Dialogue Systems through Uncertain RDF Knowledge BasesExplanations in Dialogue Systems through Uncertain RDF Knowledge Bases
Explanations in Dialogue Systems through Uncertain RDF Knowledge BasesDaniel Sonntag
 
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...CONUL Conference
 
Sem facet paper
Sem facet paperSem facet paper
Sem facet paperDBOnto
 
SemFacet paper
SemFacet paperSemFacet paper
SemFacet paperDBOnto
 
Fedbench - A Benchmark Suite for Federated Semantic Data Processing
Fedbench - A Benchmark Suite for Federated Semantic Data ProcessingFedbench - A Benchmark Suite for Federated Semantic Data Processing
Fedbench - A Benchmark Suite for Federated Semantic Data ProcessingPeter Haase
 
Data Integration at the Ontology Engineering Group
Data Integration at the Ontology Engineering GroupData Integration at the Ontology Engineering Group
Data Integration at the Ontology Engineering GroupOscar Corcho
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeNational Institute of Informatics
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRathachai Chawuthai
 
RDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of SemanticsRDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of SemanticsJean-Paul Calbimonte
 
Automatically converting tabular data to
Automatically converting tabular data toAutomatically converting tabular data to
Automatically converting tabular data toIJwest
 
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesOntotext
 

Similar to Redundancy analysis on linked data #cold2014 #ISWC2014 (20)

Data curation and data archiving at different stages of the research process
Data curation and data archiving at different stages of the research processData curation and data archiving at different stages of the research process
Data curation and data archiving at different stages of the research process
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
 
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...
ESWC 2019 - A Software Framework and Datasets for the Analysis of Graphs Meas...
 
Efficient Query Answering against Dynamic RDF Databases
Efficient Query Answering against Dynamic RDF DatabasesEfficient Query Answering against Dynamic RDF Databases
Efficient Query Answering against Dynamic RDF Databases
 
Democratizing Big Semantic Data management
Democratizing Big Semantic Data managementDemocratizing Big Semantic Data management
Democratizing Big Semantic Data management
 
Efficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD VivaEfficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
Efficient Distributed In-Memory Processing of RDF Datasets - PhD Viva
 
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases
Explanations in Dialogue Systems through Uncertain RDF Knowledge BasesExplanations in Dialogue Systems through Uncertain RDF Knowledge Bases
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases
 
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
 
Sem facet paper
Sem facet paperSem facet paper
Sem facet paper
 
SemFacet paper
SemFacet paperSemFacet paper
SemFacet paper
 
Fedbench - A Benchmark Suite for Federated Semantic Data Processing
Fedbench - A Benchmark Suite for Federated Semantic Data ProcessingFedbench - A Benchmark Suite for Federated Semantic Data Processing
Fedbench - A Benchmark Suite for Federated Semantic Data Processing
 
Data Integration at the Ontology Engineering Group
Data Integration at the Ontology Engineering GroupData Integration at the Ontology Engineering Group
Data Integration at the Ontology Engineering Group
 
Quantifying the bias in data links
Quantifying the bias in data linksQuantifying the bias in data links
Quantifying the bias in data links
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
 
Timbuctoo 2 EASY
Timbuctoo 2 EASYTimbuctoo 2 EASY
Timbuctoo 2 EASY
 
RDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of SemanticsRDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of Semantics
 
Automatically converting tabular data to
Automatically converting tabular data toAutomatically converting tabular data to
Automatically converting tabular data to
 
semantic web & natural language
semantic web & natural languagesemantic web & natural language
semantic web & natural language
 
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
 

Recently uploaded

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Recently uploaded (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Redundancy analysis on linked data #cold2014 #ISWC2014

  • 1. How redundant is it? – An empirical analysis on linked datasets Honghan Wu1, Boris Villazon-Terrazas2, Jeff Z. Pan1 and José Manuel Gómez Pérez2 University of Aberdeen1, UK iSOCO2 , Spain 20/10/2014 1
  • 2. 2 Content • What is data redundancy with linked data? • Why is it of special interest to linked data consumption? • Linked Data redundancy categorisation • How to analysis? • Dataset selection & The Result • Conclusion
  • 3. 3 What is the data redundancy in LD? • Data Redundancy – [Database systems] Same piece of data in multiple places – [Information theory] Wasted "space" used to transmit certain data • (In this work)Linked Data Redundancy – Wasted “space” to represent certain meaning (represented in certain semantics) – Duplication-free
  • 4. 4 Why is it of special interest to LD consumption? • Bad Redundancy & Good Redundancy – Bad for exchange: storage, transmission – Good for inference computation • Relevant consumption tasks – Hosting/Sharing – Query Answering (SPARQL) – Ontology Based Data Access – Reasoning
  • 5. Redundancy in Linked Data • Redundancy Categorisation for RDF Data • Redundancies caused by the “Linked” nature
  • 6. 6 RDF Redundancies vs. Succinct Representations [Rule based] A. K. Joshi, P. Hitzler, and G. Dong. Logical linked data compression. In The Semantic Web: Semantics and Big Data, pages 170–184. Springer, 2013. [HDT]J. D. FernáNdez, M. A. MartíNez-Prieto, C. GutiéRrez, A. Polleres, and M. Arias. Binary rdf representation for publication and exchange (hdt). Web Semant., 19:22–41, Mar. 2013. [WaterFowl] O. Curé, G. Blin, D. Revuz, and D. C. Faye. Waterfowl: A compact, self-indexed and inference-enabled immutable rdf store. In The Semantic Web: Trends and Challenges, pages 302– 316. Springer, 2014. Pan, Jeff Z., Jose Manuel Gomez-Perez, Yuan Ren, Honghan Wu, Haofen Wang and Man Zhu. “Graph Pattern based RDF Data Compression”. In Proc. of 4th Joint International Semantic Technology Conference (JIST). 2014. (To appear)
  • 7. 7 Semantic redundancy Rule Representation - DL Axioms (T-Box) - Other semantics (graph pattern substitution)
  • 8. 8 Syntactic Redundancy Concise syntax - RDF abbreviation & striping syntax - Intra-structure & Inter- structure
  • 9. 9 Symbolic Redundancy • http://xmlns.com/foaf/0.1/name – 31 bytes in ASCII URI ID (4 bytes) … … http://xmlns.com/foaf/0.1/name 128 … … Less bytes for basic data units - (Fix-length)Dictionary Based - (Variable-length) Huffman coding - Predictive encoding
  • 10. 10 Semantic Redundancy Caused by “Linked” Nature • Vocabulary Linkage – Reuse of other vocabularies: more rules – Less redundancy ratio: more triples derivable – More redundancy: co-occurrence triples removable • Instance Linkage – sameAs linkages – Bring in new assertions (e.g., type assertions) – Bring in new axioms
  • 11. How to analysis? • Two dimension analysis • Methodology • Metrics
  • 12. 12 Two dimension analysis Semantic Syntactic Symbolic A-Box ✔ ✔ A-Box & T-Box No Linkage ✔ - - T-Box Reuse ✔ - - A-Box Linkage - - RDF Redundancy Dimension Linked Semantic Dimension
  • 13. 13 Methodology: EDP Summarisation
  • 14. 14 Virtually Materialised A-Box: expanded EDP A1, B1 (1) A2, B2 (1) A-Box: A1(o1) B1(o1) A2(o2) B2(o2) R(o1, o2) T-Box: A1⊆A, A2⊆A, B1⊆B, B2⊆B R (1:1) A, B, A, B,
  • 15. Linked Dataset Analysis Results • Dataset Selection & Summary • Analysis Results
  • 16. 16 Dataset Selection and Summary LOD 2011
  • 17. 17 A-Box Only: Semantic Redundancies – Redundant Triples – Semantic redundancy ratio, i.e. – # Graph Patterns used to substitute redundant triples
  • 18. 18 A-Box Only: Syntactic Redundancies – the redundant resource occurrences of inter-structural redundancies – the syntactic redundancy ratio, i.e.
  • 19. 19 A-Box & T-Box: No Linkage DBLP2013: SWRC ontology Ordnance Survey: official published OS ontology 1.7% 184% 108% 4.7%
  • 20. 20 A-Box & T-Box: No Linkage First 3 datasts are reusing FOAF Ontology – the number of directly used terms from reused T-Box – the number of applicable axioms from (materialised) reused T-Box 26.9% 4% 45.4% 1.3%
  • 21. 21 Conclusion • LOD redundancy are heterogeneous & huge • Vocabulary linkage might lead to huge number of derivable triples • Redundancy aware techniques are demanded
  • 22. 22 Redundancy-aware Consumption • Compression: different redundancies might need different techniques • For Data Access: (high inter-structure redundancy) skewed entity distributions over EDPs -> efficient access? • OBDA/Reasoning: A-Box redundancy = less T-Box axioms • Data Publisher: should be aware of the consequences of reusing