ComputableFacts
Cyrille SAVELIEF – csavelief@mncc.fr
Accumulo Summit 2018
MNCC
The Constraints
• Security
• Data Engineering
• Knowledge Engineering
The Problem
How do you explore large and heterogeneous document collections?
• For example, when analyzing an NGO dataset, one may be interested in the following two questions:
  • In which regions of the world do the reports mention the word “refugee” most often?
  • What kinds of events do the reports refer to when they talk about “refugee”?
Shortcomings of Keyword Search
• Keyword selection is hard!
• False positives
• False negatives
Our Approach: Ontology-Based Data Access
[Diagram labels: Index, Facts, Raw Data, Metadata, Accumulo, Ontology, Query]
Not a simple search engine!
Tag names, concepts or key phrases and produce NER training data in record time!
Complex queries made easy!
The Collection & Integration Processes
Data Collection
Data Integration
Data Model
Enriched Document
• Structured fields
• Unstructured text fields
• Structured text fields
Convert JSON to RDF triples
[Diagram labels: Subject, Predicate, Object]
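The slide only names the conversion step, so the following is a minimal sketch of one way a JSON document could be flattened into (subject, predicate, object) triples. The function name `json_to_triples`, the dot-joined predicate names for nested fields, and the use of a random UUID as the subject are all assumptions made for illustration, not the author's actual mapping.

```python
import uuid
from typing import Any, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def json_to_triples(doc: Any, subject: Optional[str] = None) -> List[Triple]:
    """Flatten a parsed JSON document into triples sharing one subject.

    Assumption: nested keys are joined with '.' to form the predicate,
    and each element of a list yields its own triple.
    """
    subject = subject or str(uuid.uuid4())
    triples: List[Triple] = []

    def walk(predicate: str, value: Any) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{predicate}.{key}" if predicate else key, child)
        elif isinstance(value, list):
            for item in value:
                walk(predicate, item)
        else:
            triples.append((subject, predicate, str(value)))

    walk("", doc)
    return triples


# Hypothetical document, used only to exercise the function.
example = json_to_triples(
    {"person": ["Ann", "Bob"], "meta": {"lang": "fr"}}, subject="doc-1"
)
```

Each scalar value becomes one triple, which is the unit the index tables described later store.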
Data Model = Rya + GraphBLAS
Stored triple columns: RowId, CF (column family), CQ (column qualifier)

Table  RowId                    CF  CQ                     D4M Equivalent
PSO    person|32490c02-614d-…   Ø   Frédéric Colin         Ø
SOP    32490c02-614d-…          Ø   Frédéric Colin|person  Edge
OPS    Frédéric Colin|person    Ø   32490c02-614d-…        EdgeTranspose
OPD    Frédéric Colin|person    Ø   Ø                      EdgeTransposeDegree
Text   32490c02-614d-…          Ø   Ø                      EdgeText

Rya: https://rya.apache.org
GraphBLAS: http://graphblas.org
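The layout in the table above can be sketched as a small function that derives the PSO, SOP and OPS entries for a single triple. This is only an illustration in plain Python: the function name `index_entries`, the `doc-1` subject and the empty-string stand-in for Ø are assumptions, and the OPD and Text tables are left out because the slide does not show what they store as values.

```python
from typing import Dict, Tuple

Entry = Tuple[str, str, str]  # (row id, column family, column qualifier)

SEP = "|"  # separator seen in the row ids of the table above


def index_entries(subject: str, predicate: str, obj: str) -> Dict[str, Entry]:
    """Build one entry per index table for a single triple.

    The column family is left empty, mirroring the Ø in the table.
    """
    return {
        "PSO": (predicate + SEP + subject, "", obj),
        "SOP": (subject, "", obj + SEP + predicate),
        "OPS": (obj + SEP + predicate, "", subject),
    }


# Hypothetical subject id, used only to exercise the function.
entries = index_entries("doc-1", "person", "Frédéric Colin")
```

Writing the same triple three times trades storage for the ability to answer every bound/unbound combination with a prefix scan.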
Patterns to Table Scans
• Any pattern can be translated into a scan of one of these 3 tables:
  • PSO = (Predicate, Subject, Object)
  • SOP = (Subject, Object, Predicate)
  • OPS = (Object, Predicate, Subject)

Pattern     Table to Scan
(S, P, O)   Any Table
(S, P, *)   PSO
(S, *, O)   SOP
(*, P, O)   OPS
(S, *, *)   SOP
(*, P, *)   PSO
(*, *, O)   OPS
(*, *, *)   Full Table Scan
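The pattern table above amounts to a lookup on which positions of the triple are bound. Here is a minimal sketch, assuming `None` stands for the wildcard `*`; the function name `table_for_pattern` and the `FULL_SCAN` label are illustrative, and picking PSO for a fully bound pattern is an arbitrary choice since any table works.

```python
from typing import Optional


def table_for_pattern(s: Optional[str], p: Optional[str], o: Optional[str]) -> str:
    """Return the index table to scan for a triple pattern.

    None represents the wildcard '*'.
    """
    bound = (s is not None, p is not None, o is not None)
    choices = {
        (True, True, True): "PSO",  # any table works; PSO is an arbitrary pick
        (True, True, False): "PSO",
        (True, False, True): "SOP",
        (False, True, True): "OPS",
        (True, False, False): "SOP",
        (False, True, False): "PSO",
        (False, False, True): "OPS",
        (False, False, False): "FULL_SCAN",
    }
    return choices[bound]


# Hypothetical pattern: all triples whose object is "Frédéric Colin".
chosen = table_for_pattern(None, None, "Frédéric Colin")
```

In every case the chosen table is one whose key starts with the bound components, so the scan can be restricted to a row prefix.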
Anatomy of a conjunctive query
[Diagram labels: AND, person: Daniel, rendez-vous, SeekingFilter, RemoteSourceIterator, RemoteWriteIterator, Mutations, TwoTableIterator, SeekingFilter]
• We keep track of the lowest and highest Subjects for each entry of the OPD table.
• We compute the intersecting range of Subjects between the left and right nodes and use it as our boundaries.
• We add a SeekingFilter to ensure we do not exceed the boundaries.
Hutchison et al., Graphulo Implementation of Server-Side Sparse Matrix Multiply in the Accumulo Database, 2015
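The three steps above can be sketched in plain Python. This is not the Accumulo or Graphulo iterator code named on the slide: the function names, the sorted lists of Subjects and the sample identifiers are assumptions, and the list comprehension merely plays the role that the SeekingFilter plays server-side.

```python
from typing import List, Optional, Tuple

Range = Tuple[str, str]  # (lowest subject, highest subject), inclusive


def intersect(left: Range, right: Range) -> Optional[Range]:
    """Intersect two inclusive subject ranges; None when they do not overlap."""
    low = max(left[0], right[0])
    high = min(left[1], right[1])
    return (low, high) if low <= high else None


def within(subjects: List[str], bounds: Range) -> List[str]:
    """Keep only the subjects that fall inside the boundaries."""
    low, high = bounds
    return [s for s in subjects if low <= s <= high]


def conjunction(left: List[str], right: List[str]) -> List[str]:
    """AND of two sorted subject lists, restricted to their shared range."""
    if not left or not right:
        return []
    bounds = intersect((left[0], left[-1]), (right[0], right[-1]))
    if bounds is None:
        return []
    right_in_range = set(within(right, bounds))
    return [s for s in within(left, bounds) if s in right_in_range]


# Hypothetical subject ids for the two terms of the query.
matches = conjunction(["d02", "d05", "d09"], ["d05", "d07", "d09", "d12"])
```

Narrowing both sides to the shared range first means entries that can never match (here `d02` and `d12`) are skipped before the actual intersection is computed.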
Back to ontologies…
• Why add an ontology layer on top of your data?
  • It enriches the vocabulary.
  • It allows inferring new facts that are not explicitly stored.
  • It provides a unified view of multiple sources.
An example of a Datalog query
• Given the fact base below, what is the answer to the following query ? :- hadMeeting("cyrille", Y)
An example of a Datalog query
• Given the fact base below, what is the answer to the following query ? :- hadMeeting("pierre", Y)
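To illustrate how a Datalog query can return facts that are not explicitly stored, here is a minimal sketch of naive forward chaining in plain Python. The facts, the names `alice` and `bob`, and the symmetry rule are hypothetical and deliberately differ from the slide's fact base, which is not reproduced in this transcript.

```python
from typing import Callable, List, Set, Tuple

Fact = Tuple[str, str, str]  # (predicate, first argument, second argument)
Rule = Callable[[Set[Fact]], Set[Fact]]


def symmetric_meeting(facts: Set[Fact]) -> Set[Fact]:
    """Hypothetical rule: hadMeeting(Y, X) :- hadMeeting(X, Y)."""
    return {("hadMeeting", y, x) for (pred, x, y) in facts if pred == "hadMeeting"}


def saturate(facts: Set[Fact], rules: List[Rule]) -> Set[Fact]:
    """Apply every rule until no new fact is derived (naive evaluation)."""
    known = set(facts)
    while True:
        derived = set()
        for rule in rules:
            derived |= rule(known)
        if derived <= known:
            return known
        known |= derived


def had_meeting_with(facts: Set[Fact], person: str) -> List[str]:
    """Answers Y to the query hadMeeting(person, Y)."""
    return sorted(y for (pred, x, y) in facts if pred == "hadMeeting" and x == person)


base = {("hadMeeting", "alice", "bob")}
answers = had_meeting_with(saturate(base, [symmetric_meeting]), "bob")
```

Without the rule the query on `bob` would return nothing, since only the fact with `alice` in first position is stored.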
Here come Existential Rules!
• Graal gives you the ability to assert the existence of unknown entities: http://graphik-team.github.io/graal
• Graal allows you to rewrite any query as a union of conjunctive queries: Matrix multiply!
M.-L. Mugnier, Reasoning on Data: The Ontology-Mediated Query Answering Problem, 2018
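Asserting the existence of an unknown entity can be pictured as one step of a chase that invents a placeholder (a labelled null) when no witness is stored. The sketch below is plain Python and does not use Graal's API; the `attendedMeeting` predicate, the rule itself, the names and the `_:null` naming scheme are all hypothetical.

```python
import itertools
from typing import Set, Tuple

Fact = Tuple[str, ...]  # (predicate, arg1, ...) with variable arity


def apply_existential_rule(facts: Set[Fact]) -> Set[Fact]:
    """Hypothetical rule: attendedMeeting(X) implies that some Y exists
    with hadMeeting(X, Y).

    A fresh null is created only when no witness already exists, which is
    the restricted variant of the chase.
    """
    result = set(facts)
    fresh = itertools.count()
    for fact in sorted(facts):  # sorted copy: deterministic, safe to mutate result
        if fact[0] != "attendedMeeting":
            continue
        person = fact[1]
        has_witness = any(f[0] == "hadMeeting" and f[1] == person for f in result)
        if not has_witness:
            result.add(("hadMeeting", person, f"_:null{next(fresh)}"))
    return result


base = {
    ("attendedMeeting", "alice"),
    ("attendedMeeting", "bob"),
    ("hadMeeting", "alice", "carol"),
}
chased = apply_existential_rule(base)
```

After this step the query hadMeeting("bob", Y) has a match, although the only thing known about Y is that it exists.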
An example of an Existential query
• Given the fact base below, what is the answer to the following query ? :- hadMeeting("cyrille", Y)
An example of an Existential query
• Given the fact base below, what is the answer to the following query ? :- hadMeeting("pierre", Y)
Any questions?
Thank you for listening!
Address:
ComputableFacts
178, bd Haussmann
75008 Paris
