This document discusses an approach to exploring large heterogeneous document collections using ontology-based data access. It describes indexing facts from raw data and metadata into Accumulo and using an ontology to allow complex queries. The approach converts documents into enriched documents with structured and unstructured fields, converts them to RDF triples, and stores them in a graph database using Rya and GraphBLAS. It explains how conjunctive queries can be executed over the graph database and how existential rules can infer new facts.
3. The Problem
How do you explore large and heterogeneous document collections?
• For example, in the analysis of an NGO dataset one may be interested in the following two questions:
  • In which regions of the world do the reports mention the word “refugee” more often?
  • What kind of events do the reports refer to when they talk about “refugee”?
18. Patterns to Table Scans
Pattern      Table to Scan
(S, P, O)    Any Table
(S, P, *)    PSO
(S, *, O)    SOP
(*, P, O)    OPS
(S, *, *)    SOP
(*, P, *)    PSO
(*, *, O)    OPS
(*, *, *)    Full Table Scan
• Any pattern can be translated into a scan of one of these 3 tables:
  • PSO = (Predicate, Subject, Object)
  • SOP = (Subject, Object, Predicate)
  • OPS = (Object, Predicate, Subject)
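The lookup in the table above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code of Rya or of the authors: the function name `choose_table` and the use of `None` as the wildcard are assumptions made for the example. The idea is that a table is usable when the bound components of the pattern form a prefix of its key order.

```python
from typing import Optional, Tuple

# Key order of each index table: the first letter is the first key component.
TABLES = ("PSO", "SOP", "OPS")


def choose_table(s: Optional[str], p: Optional[str], o: Optional[str]) -> Tuple[str, tuple]:
    """Return (table, scan prefix) for a triple pattern; None is the wildcard '*'.

    Hypothetical helper: a table qualifies when the bound components are
    exactly the first components of its key order.
    """
    bound = {"S": s, "P": p, "O": o}
    known = {name for name, value in bound.items() if value is not None}
    if not known:
        return ("FULL_SCAN", ())
    for table in TABLES:
        prefix = table[: len(known)]
        if set(prefix) == known:
            return (table, tuple(bound[c] for c in prefix))
    raise AssertionError("unreachable: the three tables cover every pattern")


if __name__ == "__main__":
    print(choose_table("s1", None, "o1"))   # bound S and O -> SOP
    print(choose_table(None, "p1", None))   # bound P only  -> PSO
```

For a fully bound pattern the sketch returns the first table in the list, which is consistent with the "Any Table" row.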
19. Anatomy of a conjunctive query
[Figure: query tree with an AND node over "person: Daniel" and "rendez-vous"; iterator labels: SeekingFilter, RemoteSourceIterator]
• We keep track of the lowest and highest Subjects for each entry of the OPS table.
• We compute the intersecting range of Subjects between the left and right nodes and use it as our boundaries.
• We add a SeekingFilter in order to ensure we won’t exceed the boundaries.
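The boundary computation described above can be sketched in plain Python. This is not Graphulo or Accumulo code: `intersect`, `seeking_filter` and `conjunctive_subjects` are hypothetical names, subjects are assumed to be sorted strings, and the filter here is only a stand-in for the behaviour attributed to the SeekingFilter.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Range = Tuple[str, str]  # inclusive (lowest subject, highest subject)


def intersect(left: Range, right: Range) -> Optional[Range]:
    """Intersecting range of subjects between two nodes, or None if disjoint."""
    lo = max(left[0], right[0])
    hi = min(left[1], right[1])
    return (lo, hi) if lo <= hi else None


def seeking_filter(subjects: Iterable[str], bounds: Range) -> Iterator[str]:
    """Yield sorted subjects inside the bounds; stop once the upper bound is passed."""
    lo, hi = bounds
    for subject in subjects:
        if subject < lo:
            continue  # a real iterator would seek forward instead of scanning
        if subject > hi:
            break     # never exceed the boundaries
        yield subject


def conjunctive_subjects(left: List[str], right: List[str]) -> List[str]:
    """Subjects matched by both the left and the right node of an AND."""
    if not left or not right:
        return []
    bounds = intersect((left[0], left[-1]), (right[0], right[-1]))
    if bounds is None:
        return []
    right_hits = set(seeking_filter(right, bounds))
    return [s for s in seeking_filter(left, bounds) if s in right_hits]


if __name__ == "__main__":
    left = ["doc02", "doc05", "doc09"]            # e.g. subjects for one term
    right = ["doc05", "doc07", "doc09", "doc12"]  # e.g. subjects for the other
    print(conjunctive_subjects(left, right))      # ['doc05', 'doc09']
```

Tracking only the lowest and highest subject per entry keeps the bookkeeping small while still letting both sides skip everything outside the shared range.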
[Figure labels: RemoteWriteIterator, Mutations, TwoTableIterator, SeekingFilter]
Hutchison et al., Graphulo Implementation of Server-Side Sparse Matrix Multiply in the Accumulo Database, 2015
20. Back to ontologies…
• Why add an ontology layer on top of your data?
• It enriches the vocabulary.
• It allows new facts to be inferred that are not explicitly stored.
• It provides a unified view of multiple sources.
21. An example of a Datalog query
• Given the fact base below, what is the answer to the
following query ? :- hadMeeting("cyrille", Y)
22. An example of a Datalog query
• Given the fact base below, what is the answer to the
following query ? :- hadMeeting("pierre", Y)
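As a minimal sketch of how such a query can be answered, the following Python performs naive forward chaining to a fixpoint and then matches the query atom. The facts and the two rules are invented for illustration: the predicate `hadRendezVous`, the symmetry rule and all helper names are assumptions, not the fact base of the slide.

```python
def is_var(term):
    """Convention for this sketch: variables start with an uppercase letter."""
    return term[:1].isupper()


def match(atom, fact, env):
    """Extend env so that atom equals fact, or return None if impossible."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    env = dict(env)
    for a, f in zip(atom[1:], fact[1:]):
        if is_var(a):
            if env.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return env


def solve(body, facts, env=None):
    """Yield every variable binding that satisfies all atoms of body."""
    env = {} if env is None else env
    if not body:
        yield env
        return
    for fact in facts:
        extended = match(body[0], fact, env)
        if extended is not None:
            yield from solve(body[1:], facts, extended)


def saturate(facts, rules):
    """Apply the rules until no new fact is produced (naive evaluation)."""
    facts = set(facts)
    while True:
        derived = {
            (head[0],) + tuple(env.get(t, t) for t in head[1:])
            for head, body in rules
            for env in solve(body, facts)
        } - facts
        if not derived:
            return facts
        facts |= derived


def ask(facts, atom, var="Y"):
    """Sorted values of var over all matches of a single query atom."""
    return sorted({env[var] for env in solve([atom], facts)})


# Hypothetical fact base and rules, for illustration only.
FACTS = {("hadRendezVous", "cyrille", "pierre"), ("hadRendezVous", "pierre", "daniel")}
RULES = [
    (("hadMeeting", "X", "Y"), [("hadRendezVous", "X", "Y")]),
    (("hadMeeting", "Y", "X"), [("hadMeeting", "X", "Y")]),  # meetings are symmetric
]

if __name__ == "__main__":
    saturated = saturate(FACTS, RULES)
    print(ask(saturated, ("hadMeeting", "cyrille", "Y")))  # ['pierre']
    print(ask(saturated, ("hadMeeting", "pierre", "Y")))   # ['cyrille', 'daniel']
```

With these invented facts, no `hadMeeting` fact is stored; both answers come entirely from the rules.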
23. Here come Existential Rules!
• Graal gives you the ability to assert the existence of unknown entities: http://graphik-team.github.io/graal
• Graal allows you to rewrite any query as a union of conjunctive queries:
Matrix multiply!
M.-L. Mugnier, Reasoning on Data: The Ontology-Mediated Query Answering Problem, 2018
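Both ideas can be illustrated with a small Python sketch. It does not use the Graal API: every function name, predicate and fact below is hypothetical. The rules are restricted to two simple shapes, an existential rule `body(X) -> exists Z. head(X, Z)` and predicate inclusions `sub(X, Y) -> sup(X, Y)`, and queries are single atoms of the form `pred(constant, Y)`.

```python
from itertools import count

_null_ids = count()  # source of fresh names for unknown entities


def apply_existential_rule(facts, body_pred, head_pred):
    """One chase step for body_pred(X) -> exists Z. head_pred(X, Z).

    A fresh labeled null is created only when X has no witness yet.
    """
    out = set(facts)
    for fact in sorted(facts):
        has_witness = any(f[0] == head_pred and f[1] == fact[1] for f in out)
        if fact[0] == body_pred and not has_witness:
            out.add((head_pred, fact[1], f"_:n{next(_null_ids)}"))
    return out


def rewrite(atom, inclusions):
    """Rewrite a one-atom query into a union of one-atom conjunctive queries."""
    preds, frontier = {atom[0]}, [atom[0]]
    while frontier:
        sup = frontier.pop()
        for sub, parent in inclusions:
            if parent == sup and sub not in preds:
                preds.add(sub)
                frontier.append(sub)
    return {(pred,) + tuple(atom[1:]) for pred in preds}


def evaluate(ucq, facts):
    """Union of the answers of each query pred(constant, Y) over the stored facts."""
    answers = set()
    for pred, subject, _variable in ucq:
        for fact in facts:
            if len(fact) == 3 and fact[0] == pred and fact[1] == subject:
                answers.add(fact[2])
    return answers


# Hypothetical data, for illustration only.
FACTS = {("attendee", "pierre"), ("hadRendezVous", "cyrille", "daniel")}
INCLUSIONS = [("hadRendezVous", "hadMeeting")]  # every rendez-vous is a meeting

if __name__ == "__main__":
    ucq = rewrite(("hadMeeting", "cyrille", "Y"), INCLUSIONS)
    print(sorted(ucq))            # the original atom plus its hadRendezVous variant
    print(evaluate(ucq, FACTS))   # {'daniel'}, without materializing hadMeeting

    chased = apply_existential_rule(FACTS, "attendee", "hadMeeting")
    # pierre now has a meeting with an unknown entity (a labeled null).
    print(evaluate({("hadMeeting", "pierre", "Y")}, chased))
```

Rewriting leaves the stored data untouched and pushes the ontology into the query, whereas the chase step adds facts. A labeled null only records that some partner exists; it is not a named answer.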
24. An example of an existential query
• Given the fact base below, what is the answer to the
following query ? :- hadMeeting("cyrille", Y)
25. An example of an existential query
• Given the fact base below, what is the answer to the
following query ? :- hadMeeting("pierre", Y)