Towards Smart Cache Management for Ontology Based, History-Aware Stream Reasoning
Rui Yan, Deborah L. McGuinness
Tetherless World Constellation
Department of Computer Science
Rensselaer Polytechnic Institute
Brenda Praggastis, William P. Smith
Pacific Northwest National Laboratory,
Richland, WA, USA
Presented on Oct 12, 2015 at Linked Science Workshop, International Semantic Web Conference (ISWC) 2015
Contents
1. Introduction
a. stream reasoning
b. examples of the existing stream reasoning systems
2. Approach
a. motivating use case
b. why cache
c. cache vs. window
d. historical data management
3. Discussion
4. Future work
Introduction / stream reasoning
- RDF streams [1]
- streaming data modeled in RDF
- linked data principles
- Data stream processing systems
- Semantic reasoning
- Stream reasoning [2]
----------
[1] Barbieri, Davide F., and Emanuele Della Valle. "A proposal for publishing data streams as linked data." Linked Data on the Web Workshop. 2010.
[2] Della Valle, Emanuele, et al. "A first step towards stream reasoning." Springer Berlin Heidelberg, 2009.
Introduction / examples of the existing systems
- Existing stream reasoning systems
- C-SPARQL [3]
- continuous SPARQL, an extension of the standard SPARQL
- window-based system
- RDF data are stamped with timepoints
- process RDF streams
- EP-SPARQL [4]
- event processing SPARQL, an extension of the standard SPARQL
- window-based system
- RDF data are stamped with time intervals
- detect complex event patterns
----------
[3] Barbieri, Davide Francesco, et al. "C-SPARQL: SPARQL for continuous querying." Proceedings of the 18th International Conference on World Wide Web. ACM, 2009.
[4] Anicic, Darko, et al. "EP-SPARQL: a unified language for event processing and stream reasoning." Proceedings of the 20th International Conference on World Wide Web. ACM, 201
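Both systems on this slide are window-based. As a rough illustration only (this is not C-SPARQL or EP-SPARQL code), a time-based sliding window over timepoint-stamped triples might be sketched in Python as follows. The class name `SlidingWindow`, the `width` parameter, and the example triples are assumptions made for this sketch.

```python
from collections import deque
from typing import Deque, List, Tuple

Triple = Tuple[str, str, str]


class SlidingWindow:
    """Keeps only triples stamped within the last `width` time units (hypothetical sketch)."""

    def __init__(self, width: float) -> None:
        self.width = width
        self._items: Deque[Tuple[float, Triple]] = deque()

    def push(self, timestamp: float, triple: Triple) -> None:
        # Assumption: timestamps arrive in non-decreasing order.
        self._items.append((timestamp, triple))
        # Everything at or before (timestamp - width) has left the window.
        while self._items and self._items[0][0] <= timestamp - self.width:
            self._items.popleft()

    def contents(self) -> List[Triple]:
        # Triples currently visible to a query, oldest first.
        return [triple for _, triple in self._items]


window = SlidingWindow(width=10)
window.push(0, ("ex:s1", "ex:reads", "ex:v1"))
window.push(5, ("ex:s1", "ex:reads", "ex:v2"))
window.push(12, ("ex:s1", "ex:reads", "ex:v3"))
```

Once the third triple arrives, the first one has aged out and is no longer available to any query. A window forgets purely by time, whatever the data says.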
Approach / motivating use case
Motivating use case:
- Nuclear Magnetic Resonance (NMR)
Approach / background ontology
Background ontology
- 30 different compounds are encoded with their unique frequency ranges
- these compounds are sourced from the Human Metabolome Database (see footnote 1)
- all metabolites (small molecules) that are found in human urine and/or blood plasma
1. http://www.hmdb.ca/
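To make the idea of compounds encoded with frequency ranges concrete, here is a minimal Python sketch of looking up which compounds an observed NMR peak could belong to. The compound names and numeric ranges are made-up placeholders, not HMDB values, and `match_compounds` is a hypothetical helper. The slides encode this knowledge in an ontology rather than a dictionary.

```python
from typing import Dict, List, Tuple

# Placeholder entries for illustration only; these are NOT real HMDB values.
FREQUENCY_RANGES: Dict[str, Tuple[float, float]] = {
    "compound_A": (1.0, 1.5),
    "compound_B": (2.0, 2.4),
    "compound_C": (3.1, 3.3),
}


def match_compounds(
    peak: float, ranges: Dict[str, Tuple[float, float]] = FREQUENCY_RANGES
) -> List[str]:
    """Return the names of all compounds whose range contains the observed peak."""
    return sorted(
        name for name, (low, high) in ranges.items() if low <= peak <= high
    )


candidates = match_compounds(2.2)
```

With the placeholder table above, a peak at 2.2 falls only inside the range of `compound_B`.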
Approach / introducing the cache
What is the cache, and why use one?
- memory-based or disk-based
- identifies and stores the interesting portion of the streaming data
- governed by a cache management policy
- supports historical data management
A cache vs. a window: [comparison figure not reproduced]
Approach / cache-enabling stream reasoning
System architecture
- cache size is limited
- background ontology is preloaded
- size can be measured in number of triples or graphs
- reasoning and querying are executed continuously
- historical data: original data and entailments
- cache manages historical data with a cache eviction policy
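The architecture points above can be sketched as a small bounded triple store. This is an illustrative Python sketch, not the authors' implementation. It assumes the preloaded background ontology is held separately and never evicted, that capacity is counted in triples, and that the oldest historical triple is evicted first. FIFO stands in here only as a placeholder policy. The name `TripleCache` is hypothetical.

```python
from collections import OrderedDict
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


class TripleCache:
    """Bounded store for historical triples (original data and entailments)."""

    def __init__(self, capacity: int, background: Iterable[Triple]) -> None:
        self.capacity = capacity  # limit counted in triples (assumption)
        # Assumption: background ontology is preloaded and never evicted.
        self.background: Set[Triple] = set(background)
        self._history: "OrderedDict[Triple, None]" = OrderedDict()

    def add(self, triple: Triple) -> List[Triple]:
        """Store a streamed or entailed triple; return any triples evicted."""
        if triple in self.background or triple in self._history:
            return []
        self._history[triple] = None
        evicted: List[Triple] = []
        while len(self._history) > self.capacity:
            # Placeholder policy: drop the oldest historical triple first.
            oldest, _ = self._history.popitem(last=False)
            evicted.append(oldest)
        return evicted

    def triples(self) -> Set[Triple]:
        """Everything a reasoner or query would see right now."""
        return self.background | set(self._history)
```

In this sketch, a reasoner or query would read `triples()` while the stream keeps calling `add()`. The eviction policy itself is the part the slides leave open for further study.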
Approach / historical data management step 1
- historical data management
- one of the nine requirements listed in [5]
----------
[5] Margara, Alessandro, et al. "Streaming the web: Reasoning over dynamic data." Web Semantics: Science, Services and Agents on the World Wide Web 25 (2014): 24-44.
Approach / historical data management step 2
Discussion
- scenarios where historical data are needed
- anomaly detection
- trend identification
- historical data provide extra background
- multithreading can be leveraged
- splitting different tasks across different threads makes the system respond faster
- but the threads need to coordinate: no eviction before query
- continuous querying is easy to realize with a dedicated thread
- this reduces the overhead of learning and applying another continuous SPARQL dialect (such as C-SPARQL, which has a different execution model and extra syntax)
- benefits of the semantics
- background ontology
- historical data management
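One possible reading of "no eviction before query" is that a triple may be evicted only after at least one query pass has seen it. The Python sketch below shows that reading with a single lock shared by the ingest, query, and eviction threads. The class `CoordinatedCache` and its methods are hypothetical, and the slides do not say how the actual system coordinates its threads.

```python
import threading
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


class CoordinatedCache:
    """Cache whose eviction skips triples that no query has seen yet."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._lock = threading.Lock()
        # Insertion-ordered map: triple -> has a query already seen it?
        self._seen: Dict[Triple, bool] = {}

    def add(self, triple: Triple) -> None:
        # Called by the ingest thread.
        with self._lock:
            self._seen.setdefault(triple, False)

    def query(self, predicate: Callable[[Triple], bool]) -> List[Triple]:
        # Called by the query thread; marks every current triple as seen.
        with self._lock:
            self._seen = dict.fromkeys(self._seen, True)
            return [t for t in self._seen if predicate(t)]

    def evict(self) -> List[Triple]:
        # Called by the eviction thread; oldest queried triples go first.
        with self._lock:
            evicted: List[Triple] = []
            for triple in list(self._seen):
                if len(self._seen) <= self.capacity:
                    break
                if self._seen[triple]:
                    del self._seen[triple]
                    evicted.append(triple)
            return evicted
```

Holding one lock for each whole operation keeps the sketch simple, but it serializes the threads. A real system would likely need finer-grained coordination to keep the responsiveness the slide aims for.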
Future Work & Next Steps
- explore the performance and system-level effects of different cache eviction policies, such as least frequently used (LFU), least recently used (LRU), and first in, first out (FIFO)
- study how the expressiveness of the background ontology affects the system in terms of reasoning, querying, and evicting
- develop evaluation methods to benchmark the system
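As a starting point for comparing the three eviction policies named above, the following Python sketch replays an access sequence against a fixed-capacity cache and counts hits. The function `simulate`, the policy labels, and the sample access sequence are assumptions for illustration. The slides report no such experiment.

```python
from collections import Counter, OrderedDict
from typing import Hashable, Iterable


def simulate(policy: str, capacity: int, accesses: Iterable[Hashable]) -> int:
    """Return the number of cache hits for an access sequence under a policy."""
    if policy not in {"fifo", "lru", "lfu"}:
        raise ValueError(f"unknown policy: {policy}")
    cache: "OrderedDict[Hashable, None]" = OrderedDict()
    freq: Counter = Counter()  # use counts for keys currently cached
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            freq[key] += 1
            if policy == "lru":
                cache.move_to_end(key)  # most recently used goes last
            continue
        if capacity <= 0:
            continue  # nothing can be stored
        if len(cache) >= capacity:
            if policy == "lfu":
                # Least frequently used; ties go to the oldest insertion.
                victim = min(cache, key=lambda k: freq[k])
            else:
                # FIFO: oldest insertion. LRU: least recently used is first.
                victim = next(iter(cache))
            del cache[victim]
            del freq[victim]
        cache[key] = None
        freq[key] = 1
    return hits


sequence = ["a", "b", "a", "c", "a", "b"]
results = {p: simulate(p, 2, sequence) for p in ("fifo", "lru", "lfu")}
```

On this toy sequence with capacity 2, FIFO scores one hit while LRU and LFU score two each. The point is only that the choice of policy changes what history survives, not that any one policy is best.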
Acknowledgements
- The research described in the paper is part of the AIM Initiative at PNNL. It was conducted under the Laboratory Directed Research and Development (LDRD) program at PNNL, a multiprogram national laboratory operated by Battelle for the U.S. Department of Energy under contract DE-AC06-76RLO 1830.
- Project page: http://aim.pnnl.gov/projects/hypothesis_reasoning.stm
Q & A
Thank you!