
These are the slides from a talk I presented in the Graph Processing room at FOSDEM 2013, in which I discussed my PhD topic: a query language allowing for the flexible querying of complex paths within graph-structured data.

FOSDEM 2013, Petra Selmer: Flexible querying of graph data

  1. Flexible querying of graph data
     Graph processing room, FOSDEM, 2 Feb 2013
     Petra Selmer
     petra.selmer.uk@gmail.com
     http://www.dcs.bbk.ac.uk/~lselm01/
  2. Introduction
     • I shall be presenting my PhD topic, which involves a declarative query language allowing for the flexible querying of graph-structured data with complex paths.
  3. Agenda
     • Who (am I)?
     • Why (the motivation)?
     • Some background info
     • What (is the query language and what can it do)?
     • Illustrative examples
     • How (is it done)?
  4. Who?
     • Petra Selmer
     • Part-time PhD student:
       • Birkbeck College, University of London
       • Prof. Alexandra Poulovassilis
       • Dr. Peter T. Wood
     • Software Architect:
       • University College London’s Institute of Neurology (Wellcome Trust Centre for Neuroimaging)
  5. Why?
     • The amount of graph-structured data is growing fast
     • The structure of this data is becoming more complex, especially when multiple, heterogeneous data sources are integrated
     • The structure of the data is also always subject to change...
  6. Why?
     • Users of such systems may not be familiar with the underlying data structure: available paths etc.
     • The user may not be able to obtain meaningful answers (or indeed, any answers) from the data IF the querying system is limited to exact matching of users’ queries
     • Also, the user may wish to explore the data by starting from a set of initial answers and proceeding from there
     • The user may additionally wish to derive some intelligence from the connections...
     [Figure labels: The data, The query, The user]
  7. Background: Ontologies
     • Currently part of the Semantic Web stack (Tim Berners-Lee, RDF, triple stores)
     • Models a domain of interest: inferences, reasoning...
     • It can be thought of as a “schema” for graph data
     • The following inference rules are included (among others):
       • Subclass: ‘History’ and ‘Languages’ are subclasses of ‘Humanities’
       • Subproperty, Domain, Range...
  8. 8. What? Data model: G = (V, E)  Very general model  V : vertices (or nodes); each labelled with some constant  E : directed, labelled edges; labels drawn from an alphabet {Ʃ U ‘type’} The query language is called Flex-It (it is declarative) The basis is that of conjunctive regular path queries There are two operators which may be applied to the original query8
  9. 9. What? Conjunctive regular path queries:  This is where the graphs paths to be traversed are expressed with a regular expression A single regular path query conjunct: (X, R, Y)  X, Y: either constants or variables  R: the regular expression “Conjunctive”: joining multiple conjuncts; e.g. (X, R1, Y), (Y, R2, Z), (Z, R3, A)  The Y’s are matched, the Z’s are matched etc 1) (N1, n+, ?Y): n n p • Y = N2, N3 N1 N2 N3 N4 2) (N1, n*p, ?Y): • Y = N49
  10. 10. What? Approximation allows for the approximate matching of labels in the path An edit operation is applied to each edge label in the path denoted by the regular expression:  Edit operations: insertions, deletions, inversions, substitutions and transpositions of labels  Each operation has a ‘cost’: usually 1 Example:  Query conjunct: (X, a*.b, Y)  R = a*.b [answers returned at cost 0]  R’ = p.a*.b (insertion of ‘p’) [answers returned at cost 1]  R’’ = p.a*.b- (inversion of ‘b’) [answers returned at cost 2]10
  11. What?
      • Relaxation is applied by using inference rules from an ontology (if one exists).
        • Achieved by applying logical relaxation of the query conditions using the data’s ontology definition
        • Relaxation operations: subclass, subproperty, domain and range
        • Each operation has a ‘cost’: usually 1
      • Example:
        • We have an ontology:
          • Humanities (superclass)
          • Languages and History (subclasses of Humanities)
        • Assume our query states Languages may be relaxed
        • Languages is relaxed to Humanities:
          • Instances of Languages will be returned at cost 0
          • Instances of History will be returned at cost 1
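The subclass example above can be sketched as follows. The class hierarchy comes from the slide; the instance names (`french101`, `ww2`), the single-inheritance assumption and the function names are hypothetical, and only the subclass rule is modelled (subproperty, domain and range are omitted).

```python
# Ontology from the slide: Languages and History are subclasses of Humanities.
SUBCLASS_OF = {"Languages": "Humanities", "History": "Humanities"}

# Hypothetical instance data: instance -> its declared class.
INSTANCE_TYPE = {"french101": "Languages", "ww2": "History"}


def ancestors(cls):
    """Yield (cost, class): the class itself at cost 0, then each superclass
    one relaxation step (cost 1) further up."""
    cost = 0
    while cls is not None:
        yield cost, cls
        cls = SUBCLASS_OF.get(cls)
        cost += 1


def is_a(cls, target):
    """True if `cls` equals `target` or is a (transitive) subclass of it."""
    return any(c == target for _, c in ancestors(cls))


def relaxed_answers(query_class):
    """Map each instance to the cheapest relaxation cost at which it matches."""
    answers = {}
    for cost, relaxed in ancestors(query_class):
        for instance, cls in INSTANCE_TYPE.items():
            if is_a(cls, relaxed) and instance not in answers:
                answers[instance] = cost
    return answers


print(relaxed_answers("Languages"))  # {'french101': 0, 'ww2': 1}
```

Because the loop visits relaxations in order of increasing cost, the first cost recorded for an instance is also its lowest.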
  12. What?
      • Answers are ranked according to how closely they match the original query; higher-cost answers have a lower ranking
      • All answers at a certain distance d are ranked the same and returned before answers at a higher distance
      • We allow for incremental execution: exact answers returned first; then answers at distance 1; ...
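The delivery order described above can be illustrated with a short sketch. It assumes the answers and their distances are already known and only shows the batching by distance; a truly incremental evaluator would compute each batch on demand. `ranked_batches` is a hypothetical name, and the sample answers are the ones reported later in the talk for the APPROX query.

```python
from itertools import groupby


def ranked_batches(scored_answers):
    """Yield (distance, [answers]) in order of increasing distance.
    Answers sharing a distance are ranked the same and returned together."""
    ordered = sorted(scored_answers, key=lambda pair: pair[1])
    for distance, group in groupby(ordered, key=lambda pair: pair[1]):
        yield distance, [answer for answer, _ in group]


answers = [
    ("ep23 / Journalist", 2),
    ("ep22 / AirTravelAssistant", 1),
    ("ep24 / AssistantEditor", 2),
]
for distance, batch in ranked_batches(answers):
    print(distance, batch)
```

Since the result is a generator, a caller can stop after the first batch that yields enough answers.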
  13. Example: ‘Lifelong learner metadata’
      [Figure not preserved; surviving labels: sc, History]
  14. [Figure not preserved; surviving labels: sc, History]
  15. Query: “What work positions can I reach, having a degree in English?”
      • Y = the episode; Z = the job
      • (?Y, ?Z) ← (?X, type, University), (?X, qualif.type, EnglishStudies), (?X, prereq+, ?Y), (?Y, type, Work), (?Y, job.type, ?Z)
  16. Same query as slide 15:
      • No results from User 2 will be returned, even though it is relevant!
  17. Allowing query approximation can yield some answers:
      • Replacing the edge label prereq by next, at an edit cost of 1, we get this variant of the query:
        (?Y, ?Z) ← (?X, type, University), (?X, qualif.type, EnglishStudies), APPROX(?X, prereq+, ?Y), (?Y, type, Work), (?Y, job.type, ?Z)
      • prereq+ can be approximated by next.prereq* at edit distance 1:
        • Result: Y = ep22, Z = AirTravelAssistant
  18. Same APPROX query as slide 17, one edit further:
      • next.prereq* can be approximated by next.next.prereq*, now at edit distance 2:
        • Results:
          • Y = ep23, Z = Journalist
          • Y = ep24, Z = AssistantEditor
  19. [Figure not preserved; surviving labels: sc, History]
  20. Query: “What jobs are open to me if I study English, or something similar, at University?”
      • (?Y, ?Z) ← (?X, type, University), (?X, qualif, ?D), RELAX (?D, type, EnglishStudies), APPROX (?X, prereq+, ?Y), (?Y, type, Work), (?Y, job.type, ?Z)
      • In addition to the answers (from User 2) obtained by the previous query, we now also have answers from the timeline of User 3
      • prereq+ can be approximated by next.prereq* (distance 1) and EnglishStudies can be relaxed, via Languages, to Humanities (distance 2), encompassing History
        • Result: Y = ep32, Z = PersonalAssistant (distance of 3 from original query)
  21. Same RELAX/APPROX query as slide 20, one edit further:
      • next.prereq* can be approximated by next.next.prereq* (distance 2), with EnglishStudies again relaxed to Humanities (distance 2)
        • Results (both at distance 4 from the original query):
          • Y = ep33, Z = Author
          • Y = ep34, Z = AssociateEditor
  22. How?
      • Theory
        • Construction of a weighted non-deterministic finite automaton (NFA) to represent the regular expression
        • We add new states and transitions to the NFA to represent the approximation and relaxation operations
        • Formation of a product automaton: NFA with data graph G
        • We perform a lowest-cost path traversal of the product automaton; construct the query tree, do joins etc.
        • Polynomial time complexity
        • Correctness of algorithms proven
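The theory steps above can be sketched as a Dijkstra-style traversal over pairs of (graph node, automaton state), which builds the product automaton lazily. This is an illustration under assumptions, not the algorithm from the thesis: the graph is hypothetical, the weighted automaton for APPROX(prereq+) is hand-built, and approximation is reduced to a single wildcard transition (`ANY`) that substitutes any label at cost 1.

```python
import heapq

ANY = "*"  # assumed wildcard: matches any edge label (a substitution)

# Hypothetical timeline graph.
EDGES = [
    ("e1", "prereq", "e2"),
    ("e1", "next", "e3"),
    ("e3", "prereq", "e4"),
    ("e3", "next", "e5"),
]

# Weighted automaton for prereq+ with substitution added at cost 1:
# state -> [(label, weight, next state)]
NFA = (
    {0: [("prereq", 0, 1), (ANY, 1, 1)],
     1: [("prereq", 0, 1), (ANY, 1, 1)]},
    0,
    {1},
)


def lowest_cost_answers(edges, start_node, nfa):
    """Yield (node, distance) in order of non-decreasing distance by exploring
    the product of the data graph and the weighted automaton."""
    transitions, start_state, accepting = nfa
    heap = [(0, start_node, start_state)]
    settled = set()   # product states whose lowest cost is final
    reported = set()  # nodes already returned as answers
    while heap:
        cost, node, state = heapq.heappop(heap)
        if (node, state) in settled:
            continue
        settled.add((node, state))
        if state in accepting and node not in reported:
            reported.add(node)
            yield node, cost
        for src, label, dst in edges:
            if src != node:
                continue
            for nfa_label, weight, next_state in transitions.get(state, []):
                if nfa_label == label or nfa_label == ANY:
                    heapq.heappush(heap, (cost + weight, dst, next_state))


print(list(lowest_cost_answers(EDGES, "e1", NFA)))
# [('e2', 0), ('e3', 1), ('e4', 1), ('e5', 2)]
```

Because the function is a generator and the heap pops states in cost order, exact answers come out first and the caller can stop early, which matches the incremental execution described earlier in the talk.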
  23. How?
      • Implementation of prototype
        • Graph database: DEX (http://www.sparsity-technologies.com/dex)
        • Programming language: C#
      • Further work
        • New flexible operation combining APPROX and RELAX: FLEX
        • Optimisation!
  24. Any questions?
      Thank you for your attention!
      petra.selmer.uk@gmail.com
