Using Caching for Local Link Discovery on Large Data Sets
Mofeed Hassan, René Speck and Axel-Cyrille Ngonga Ngomo
Agile Knowledge Engineering and Semantic Web
Department of Computer Science
University of Leipzig
Augustusplatz 10, 04109 Leipzig
{mounir,speck,ngonga}@informatik.uni-leipzig.de
June 25, 2015
ICWE-2015
Data Web and Link Discovery
1 Web of Data
2 Fourth Linked Data principle
3 Links are central for
Cross-ontology QA
Data Integration
Reasoning
Federated Queries
...
4 Linked Data on the Web
10+ thousand datasets
89+ billion triples
≈ 500+ million links
M. Hassan, R. Speck and A. Ngonga June 25, 2015 Caching in Link Discovery
Link Discovery - Rotterdam
Problem
Large datasets do not fit in memory during the linking process.
ORCHID
Idea
How to cache the closest segments to compare?
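One way to picture the idea is a bounded cache that keeps recently used segments in memory and loads the rest on demand. The sketch below is only an illustration under stated assumptions: `SegmentCache`, `fake_loader` and the segment ids are hypothetical names, and LRU eviction is chosen for brevity, not because the slides present it as the ORCHID caching strategy.

```python
from collections import OrderedDict


class SegmentCache:
    """Bounded LRU cache for data segments (hypothetical helper, not from the talk)."""

    def __init__(self, capacity, load_segment):
        self.capacity = capacity
        self.load_segment = load_segment  # assumed callback that reads one segment
        self.entries = OrderedDict()      # segment id -> segment, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, segment_id):
        if segment_id in self.entries:
            self.hits += 1
            self.entries.move_to_end(segment_id)  # mark as most recently used
            return self.entries[segment_id]
        self.misses += 1
        segment = self.load_segment(segment_id)
        self.entries[segment_id] = segment
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used segment
        return segment


def fake_loader(segment_id):
    # Stand-in for reading a segment from disk; the contents are made up.
    return [f"resource-{segment_id}-{i}" for i in range(3)]


cache = SegmentCache(capacity=2, load_segment=fake_loader)
for sid in ["a", "b", "a", "c", "b"]:
    cache.get(sid)
print(cache.hits, cache.misses)
```

Counting hits and misses inside the cache mirrors the cache-hit measure reported in the results slides.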
Caching approaches
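Among the approaches named in the results are Fifo and Fifo2ndChance. As a rough illustration of how a second-chance variant differs from plain FIFO, the following sketch gives an entry that was hit since its arrival one more pass through the queue before it can be evicted. The class name `SecondChanceCache` and the access trace are assumptions made for this example; the slides do not show the implementation that was evaluated.

```python
from collections import deque


class SecondChanceCache:
    """FIFO cache with a reference bit per entry (illustrative sketch only)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.queue = deque()   # keys in arrival order, oldest on the left
        self.referenced = {}   # key -> True if hit since it was (re)queued
        self.hits = 0

    def access(self, key):
        """Return True on a cache hit, False on a miss."""
        if key in self.referenced:
            self.hits += 1
            self.referenced[key] = True
            return True
        if len(self.queue) >= self.capacity:
            self._evict()
        self.queue.append(key)
        self.referenced[key] = False
        return False

    def _evict(self):
        # Recently hit entries are requeued once with their bit cleared;
        # the first entry found with a cleared bit is evicted.
        while True:
            oldest = self.queue.popleft()
            if self.referenced[oldest]:
                self.referenced[oldest] = False
                self.queue.append(oldest)
            else:
                del self.referenced[oldest]
                return


cache = SecondChanceCache(capacity=2)
for key in ["a", "b", "a", "c", "a"]:
    cache.access(key)
print(cache.hits)
```

On this trace plain FIFO would evict "a" when "c" arrives, whereas the second-chance variant keeps "a" because it was hit in the meantime.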
Experiment Set-up
Datasets: LinkedGeoData
The experiment has two phases
Phase I : same cache size, different distance thresholds
Phase II: different cache sizes, same distance threshold
Two-phase set-up
           Data size   Cache size                     Dist. threshold
Phase I    10^4        10^3                           0, 0.1, 0.3, 0.5
Phase II   10^5        10^1, 10^2, 10^3, 10^4, 10^5   0.5
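A Phase II style measurement (fixed workload, varying cache size) can be sketched by replaying one access trace against caches of different sizes and counting hits. Everything below is an assumption for illustration: the function `count_hits`, the uniformly random synthetic trace, and the reduced sizes stand in for the actual LinkedGeoData workload and the implementation used in the experiments.

```python
import random
from collections import OrderedDict


def count_hits(trace, cache_size, policy):
    """Replay `trace` against a cache of `cache_size` (>= 1) entries; return the hit count."""
    cache = OrderedDict()  # keys ordered from oldest to newest
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            if policy == "lru":
                cache.move_to_end(key)  # LRU refreshes recency on a hit; FIFO does not
        else:
            if len(cache) >= cache_size:
                cache.popitem(last=False)  # evict the oldest entry
            cache[key] = True
    return hits


random.seed(0)
# Synthetic trace over 10**4 resource ids; NOT the LinkedGeoData access pattern.
trace = [random.randrange(10**4) for _ in range(10**5)]

for cache_size in (10**1, 10**2, 10**3, 10**4):
    fifo = count_hits(trace, cache_size, "fifo")
    lru = count_hits(trace, cache_size, "lru")
    print(f"cache size {cache_size:>6}: fifo hits={fifo}, lru hits={lru}")
```

A real link discovery run would produce a trace with locality that depends on the distance threshold, so the numbers printed here say nothing about the results reported in the talk.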
Results - Phase I
Figure: Cache hits for different distance thresholds (10^4 resources)
[Plot omitted. x-axis: caching approaches (Fifo, Fifo2ndChance, LRU, Slru, Lfu, LfuDA); y-axis: cache hits, ticks from 10^3 to 10^7; series: distance threshold = 0, 0.1, 0.3, 0.5]
Results - Phase I
Figure: Run times for different distance thresholds (10^4 resources)
[Plot omitted. x-axis: caching approaches (Fifo, Fifo2ndChance, LRU, Slru, Lfu, LfuDA); y-axis: run time in milliseconds, ticks from 10^3 to 10^8; series: distance threshold = 0, 0.1, 0.3, 0.5]
Results - Phase II
Figure: Cache hits for different cache sizes (10^5 resources)
[Plot omitted. x-axis: cache size, ticks from 10^1 to 10^5; y-axis: cache hits, ticks from 10^3 to 10^8; series: Fifo, Lru, Slru]
Results - Phase II
Figure: Run times for different cache sizes (10^5 resources)
[Plot omitted. x-axis: cache size, ticks from 10^1 to 10^5; y-axis: run time in milliseconds, ticks from 10^3 to 10^6; series: Fifo, Lru, Slru]
Conclusion and Future Work
Findings of the experiment:
These are preliminary results on caching for Link Discovery
Most of the caching approaches performed similarly
The caching approaches achieved relatively low cache hit rates
A dedicated caching approach for Link Discovery is needed.
Thank you!
Questions?
Mofeed Hassan
University of Leipzig
AKSW Research Group
Augustusplatz 10, Room P616
04109 Leipzig, Germany
mounir@informatik.uni-leipzig.de
@akswgroup