Mulgara

  1. 1. Mulgara Open Source Semantic Web Paul Gearon gearon@computer.org
  2. 2. Mulgara RDF Database Open Source Written in Java Over 320,000 lines of code
  3. 3. Math in Mulgara Graphs and Trees (Chapter 5) Recursive Algorithms (Chapter 2) Functions (Chapter 4) Formal Languages and Algebras (Chapter 8) Graph Algorithms (Chapter 6) Prolog, Rules, Logic (Chapter 1) Sets (Chapter 3)
  4. 4. Math in Mulgara Graphs and Trees (Chapter 5) Recursive Algorithms (Chapter 2) Functions (Chapter 4) Formal Languages and Algebras (Chapter 8) Graph Algorithms (Chapter 6) Prolog, Rules, Logic (Chapter 1) Sets (Chapter 3) All programming uses Boolean Logic (Chapter 7)
  5. 5. RDF A simple Description Logic (Chapter 1) Provides structure for data in the Semantic Web Simple data format of binary predicates, or Triples Triples combine to form a directed graph
  6. 6. RDF Simple Describes schemas, ontologies, and instance data Foundation for complex logic systems like OWL Describes relationships between arbitrary things Forms a graph (Chapter 5) Can be used to describe anything
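  7. 7. RDF Triples One statement in three notations: as a triple, :David :knows :Paul; as a binary predicate, :knows(:David, :Paul); and as a graph edge labelled :knows from :David to :Paul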
  8. 8. RDF Graph [Figure: :David and :Paul each have rdf:type :Person; :David :knows :Paul; :email, :fullname and :title edges lead to the values mailto:david@example.com, Dr and David Smith]
  9. 9. RDF Graph [Figure: the previous graph extended with schema statements: :knows has rdfs:domain and rdfs:range :Person; :email, :fullname and :title have rdfs:domain :Person]
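
The idea that triples combine into a directed graph can be sketched in a few lines. This is only an illustration, not Mulgara code: the function name `outgoing_edges` is hypothetical, and attaching the literals to :David (and pairing "Dr" with :title) is an assumption about the figure.

```python
from collections import defaultdict

# Triples from the slides as (subject, predicate, object) tuples.
# Assumption: the literal values belong to :David.
triples = {
    (":David", "rdf:type", ":Person"),
    (":Paul", "rdf:type", ":Person"),
    (":David", ":knows", ":Paul"),
    (":David", ":email", "mailto:david@example.com"),
    (":David", ":fullname", "David Smith"),
    (":David", ":title", "Dr"),
}

def outgoing_edges(triples):
    """Group triples by subject: each subject is a node, each (predicate, object) an edge."""
    graph = defaultdict(list)
    for subject, predicate, obj in sorted(triples):
        graph[subject].append((predicate, obj))
    return dict(graph)

graph = outgoing_edges(triples)
for node, edges in graph.items():
    print(node, edges)
```

Each subject becomes a node and each predicate a labelled, directed edge to the object, which is all the slides mean by "triples combine to form a directed graph".
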
  10. 10. Storage Speed Fast storage Quickly find what we want
  11. 11. Storage Speed Fast storage Quickly find what we want Index the data
  12. 12. Persistent Storage Must be efficient in space Smaller means more data Smaller means faster read/write Must support re-writable data Indexed, rewritable data usually means regular sized data blocks
  13. 13. One Approach Map URIs and strings to numbers Map numbers back to URIs and strings Store a triple as 3 numbers see Adjacency Matrix (page 418) and Adjacency List (page 420)
  14. 14. Representation Triples: :David rdf:type :Person; :Paul rdf:type :Person; :David :knows :Paul. [Figure: the same three triples drawn as a graph]
  15. 15. Representation Map: rdf:type = 1, :knows = 2, :David = 3, :Paul = 4, :Person = 5. Triples as numbers: :David rdf:type :Person = 3 1 5; :Paul rdf:type :Person = 4 1 5; :David :knows :Paul = 3 2 4. [Figure: the graph redrawn with numeric labels]
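
The two-way mapping between names and numbers might look like the following minimal sketch. The class `NodePool` and its `encode`/`decode` methods are hypothetical names chosen for illustration; an in-memory dictionary stands in for whatever on-disk structure a real store would use.

```python
class NodePool:
    """Two-way map between URIs/strings and integer ids (in-memory sketch)."""

    def __init__(self):
        self._to_id = {}
        self._to_value = {}

    def encode(self, value):
        # Allocate the next id the first time a value is seen.
        if value not in self._to_id:
            node_id = len(self._to_id) + 1
            self._to_id[value] = node_id
            self._to_value[node_id] = value
        return self._to_id[value]

    def decode(self, node_id):
        return self._to_value[node_id]

pool = NodePool()
# Register names in the same order as the slide, so the ids match it.
for name in ["rdf:type", ":knows", ":David", ":Paul", ":Person"]:
    pool.encode(name)

statements = [
    (":David", "rdf:type", ":Person"),
    (":Paul", "rdf:type", ":Person"),
    (":David", ":knows", ":Paul"),
]
encoded = [tuple(pool.encode(term) for term in st) for st in statements]
print(encoded)  # [(3, 1, 5), (4, 1, 5), (3, 2, 4)]
```

Storing fixed-size numbers instead of variable-length strings is what makes the regular-sized data blocks mentioned earlier possible.
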
  16. 16. Finding Triples Sort by columns (each row is S P O). Sorted by S, then P, then O: 3 1 5; 3 2 4; 4 1 5. Sorted by P, then O, then S: 3 1 5; 4 1 5; 3 2 4. Sorted by O, then S, then P: 3 2 4; 3 1 5; 4 1 5.
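
A small sketch of why three sort orders are enough to find triples quickly: any known prefix of columns can be located with a binary search in the index that starts with those columns. The names `ORDERS`, `build_index` and `find` are hypothetical, and plain sorted lists stand in for on-disk trees.

```python
from bisect import bisect_left

triples = [(3, 1, 5), (4, 1, 5), (3, 2, 4)]  # (S, P, O) rows from the slide

# Column positions used by each index ordering.
ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

def build_index(triples, order):
    """Reorder the columns of every triple, then sort the rows."""
    return sorted(tuple(t[i] for i in order) for t in triples)

indexes = {name: build_index(triples, order) for name, order in ORDERS.items()}

def find(index_name, prefix):
    """Return every triple whose leading index columns equal prefix, as (S, P, O)."""
    index = indexes[index_name]
    order = ORDERS[index_name]
    start = bisect_left(index, prefix)  # binary search for the first candidate
    rows = []
    for row in index[start:]:
        if row[:len(prefix)] != prefix:
            break  # rows are sorted, so matches are contiguous
        spo = [None] * 3
        for position, column in enumerate(order):
            spo[column] = row[position]
        rows.append(tuple(spo))
    return rows

print(find("pos", (1,)))    # everything with predicate 1 (rdf:type)
print(find("pos", (2, 4)))  # who :knows (2) :Paul (4)
```

With these three orderings, every combination of known columns (S, P, O, SP, PO, OS) is a prefix of one index.
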
  19. 19. Disk Structure Linear layouts do not scale
  22. 22. Disk Structure Linear layouts do not scale Trees scale well
  24. 24. Trees Scale well Basis of every major database Fast writing Fast reading Can be split over a network
  25. 25. Index Searches Use a binary tree search on the trees (page 456): logarithmic complexity. Blocks of data are stored in tree nodes as sorted data. Use a binary search on the sorted data blocks (page 138): logarithmic complexity.
  26. 26. Why Binary? Wider trees have identical complexity (logarithmic). Wider trees have fewer disk seeks, but this is only a constant-factor change, which has no effect on the complexity class. Wider trees have complex rebalancing.
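
The claim that a wider tree changes only a constant factor follows from the change-of-base rule for logarithms. In the sketch below, n is the number of entries and b the branching factor; the concrete values of n and b are illustrative assumptions, not figures from the talk.

```latex
\[
  h_b(n) \approx \log_b n = \frac{\log_2 n}{\log_2 b}
  \quad\Longrightarrow\quad
  O(\log_b n) = O(\log n)
\]
% Illustrative numbers (assumed): n = 10^9 entries, b = 256 children per node
\[
  \log_2 10^9 \approx 29.9,
  \qquad
  \log_{256} 10^9 = \frac{\log_2 10^9}{8} \approx 3.7
\]
```

The wider tree needs roughly 4 levels instead of roughly 30, which matters for disk seeks in practice, yet both heights grow as log n.
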
  27. 27. Better than Trees? Hash Tables (pages 362-366)
  28. 28. Better than Trees? Hash Tables (pages 362-366) Have constant complexity, BUT: Use too much space (scaling issues) Need to be expanded when they get too full Great for smaller data sets in RAM
  29. 29. Better than Trees? Hash Tables (pages 362-366) Have constant complexity, BUT: Use too much space (scaling issues) Need to be expanded when they get too full Great for smaller data sets in RAM Poor for disk usage - Good for clusters
  30. 30. Mapping Bijective Function (page 339). Store key/value pairs, indexed by key. Trees order by key: datatype ordering (lexical, numerical, dates, etc.) means ranges of data can be found, e.g. find all students enrolled between 1-1-2010 and 31-12-2010. Hashmaps have no ordering.
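
The range query from this slide can be sketched with a sorted list and two binary searches. The student names and dates are invented for illustration, and `enrolled_between` is a hypothetical helper; a sorted list stands in for an ordered tree.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical enrolment records, kept sorted by date (the key).
enrolments = sorted([
    (date(2009, 9, 1), "student-a"),
    (date(2010, 2, 15), "student-b"),
    (date(2010, 9, 1), "student-c"),
    (date(2011, 1, 10), "student-d"),
])
keys = [day for day, _ in enrolments]

def enrolled_between(first, last):
    """Students whose enrolment date lies in [first, last], found by two binary searches."""
    lo = bisect_left(keys, first)
    hi = bisect_right(keys, last)
    return [student for _, student in enrolments[lo:hi]]

in_2010 = enrolled_between(date(2010, 1, 1), date(2010, 12, 31))
print(in_2010)  # ['student-b', 'student-c']
```

A hash map over the same records could answer "who enrolled on exactly this date" in constant time, but it would have to scan every key to answer the range question.
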
  31. 31. Real Data Searches Combination searches: "The list of people who know :Paul" is the list of people AND the things that know :Paul
  32. 32. Constraints Bind a variable with a constraint ?x rdf:type :Person ?x :knows :Paul Describe requirements with a formal language Tucana Query Language (TQL) SPARQL Protocol and RDF Query Language (SPARQL)
  33. 33. Query Languages Formal language Context Free Grammar Chapter 8, section 8.4 SPARQL example: SELECT ?person WHERE { ?person a :Person . ?person :knows :Paul }
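
How constraints bind a variable can be sketched with a naive pattern matcher. This is an illustration of the idea only, not how TQL or SPARQL engines are implemented; `match` and `select` are hypothetical names, and the :Rex triples are invented so that something other than a person knows :Paul. In SPARQL, `a` is shorthand for rdf:type.

```python
triples = {
    (":David", "rdf:type", ":Person"),
    (":Paul", "rdf:type", ":Person"),
    (":David", ":knows", ":Paul"),
    (":Rex", "rdf:type", ":Dog"),   # invented data
    (":Rex", ":knows", ":Paul"),    # invented data
}

def is_variable(term):
    return term.startswith("?")

def match(pattern, bindings):
    """Yield the bindings extended by each triple that satisfies the pattern."""
    for triple in triples:
        candidate = dict(bindings)
        for term, value in zip(pattern, triple):
            if is_variable(term):
                # Bind the variable, or reject if it is already bound differently.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield candidate

def select(variable, patterns):
    """Apply each constraint in turn, keeping only solutions consistent with all of them."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [ext for b in solutions for ext in match(pattern, b)]
    return sorted({s[variable] for s in solutions})

people_who_know_paul = select("?person", [
    ("?person", "rdf:type", ":Person"),
    ("?person", ":knows", ":Paul"),
])
print(people_who_know_paul)  # [':David']
```

Dropping the first constraint returns both :David and :Rex, which is the difference between "people who know :Paul" and "things that know :Paul".
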
  34. 34. Algebra Formal language converted to an Algebra (Section 8.1). Constraints are combined and manipulated algebraically. Optimization through algebraic manipulation. Example: before optimization, ~600 seconds; after optimization, 0.8 seconds.
  35. 35. Algebraic Operations AND operations (Conjunctions): Mergesort (page 179). OR operations (Disjunctions): Union then sort (Chapter 2). Others: Filter, Minus, LeftJoin, Datatype, etc.
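
A minimal sketch of the two main operations, assuming each constraint has already been resolved to a sorted, duplicate-free list of numeric bindings for one shared variable. The conjunction uses a single merge pass in the style of the merge step of mergesort; the function names and sample ids are hypothetical.

```python
def conjunction(left, right):
    """AND: values present in both sorted lists, found in one merge pass."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            result.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return result

def disjunction(left, right):
    """OR: union of both lists, then sort."""
    return sorted(set(left) | set(right))

persons = [3, 4]      # bindings for: ?x rdf:type :Person
knows_paul = [3, 6]   # bindings for: ?x :knows :Paul (6 is an invented id)

print(conjunction(persons, knows_paul))  # [3]
print(disjunction(persons, knows_paul))  # [3, 4, 6]
```

Because the indexes already return sorted data, the merge pass needs no extra sorting and runs in time linear in the two inputs.
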
  36. 36. Graph Operations List operations Graph traversal Transitivity (page 289) Distance between nodes Algorithm similar to Euler Path (page 490) on constrained graph
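
Transitivity and distance between nodes can both be illustrated with a breadth-first traversal. Note that this is a plain breadth-first search, swapped in for simplicity, and not the Euler-path-like algorithm the slide refers to; the `reachable` function and the class names in the sample data are invented.

```python
from collections import deque

def reachable(edges, start):
    """Map each node reachable from start (via one or more edges) to its distance in edges."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, []).append(target)
    distance = {}
    queue = deque([(start, 0)])
    while queue:
        node, steps = queue.popleft()
        for following in successors.get(node, []):
            if following not in distance:  # first visit is the shortest path
                distance[following] = steps + 1
                queue.append((following, steps + 1))
    return distance

# Invented edges for a single transitive predicate, e.g. a subclass relation.
edges = [(":Student", ":Person"), (":Person", ":Agent"), (":Agent", ":Thing")]
distances = reachable(edges, ":Student")
print(distances)  # {':Person': 1, ':Agent': 2, ':Thing': 3}
```

The keys of the result are the transitive closure from the start node, and the values are the distances between nodes.
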
  37. 37. Ontologies Formal representation of knowledge Set of concepts in a domain Relationship between concepts Vocabularies for building ontologies expressed in RDF RDF Schema (RDFS) Simple Knowledge Organization System (SKOS) Web Ontology Language (OWL)
  38. 38. Rules RDF has few semantics. Support for higher languages through Rules (page 72). Uses a Prolog-style language (pages 64-71) to express Horn clauses (page 66), and therefore modus ponens (page 23). RDFS, SKOS and most of OWL are all supported through Rules.
  39. 39. Rule Examples If A is the same as B (owl:sameAs), and A relates to X, then B relates to X the same way. If P is a symmetric property (owl:SymmetricProperty), and P relates A to B, then P also relates B to A. owl:sameAs is a symmetric property. As rules: P(B,X) :- owl:sameAs(A,B), P(A,X). P(B,A) :- owl:SymmetricProperty(P), P(A,B). owl:SymmetricProperty(owl:sameAs).
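
The two rules above can be applied by naive forward chaining until no new triples appear. This is a sketch of the idea only, not Mulgara's rule engine; the `infer` function and the :Dave alias are invented, and the fact owl:SymmetricProperty(owl:sameAs) is written as an rdf:type triple.

```python
def infer(facts):
    """Apply both rules repeatedly until a fixed point is reached."""
    facts = set(facts)
    while True:
        new = set()
        for a, p, b in facts:
            # P(B,A) :- owl:SymmetricProperty(P), P(A,B).
            if (p, "rdf:type", "owl:SymmetricProperty") in facts:
                new.add((b, p, a))
            # P(B,X) :- owl:sameAs(A,B), P(A,X).
            if p == "owl:sameAs":
                for s, q, x in facts:
                    if s == a:
                        new.add((b, q, x))
        if new <= facts:  # nothing new was derived
            return facts
        facts |= new

closed = infer({
    ("owl:sameAs", "rdf:type", "owl:SymmetricProperty"),
    (":Dave", "owl:sameAs", ":David"),   # invented alias
    (":David", ":knows", ":Paul"),
})
print((":Dave", ":knows", ":Paul") in closed)  # True
```

The symmetry rule first derives that :David is the same as :Dave, and only then can the sameAs rule copy :David's statements to :Dave, which is why the rules have to run to a fixed point.
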
  40. 40. OWL Properties Transitive Properties (page 289) Symmetric/Asymmetric Properties (page 289) Reflexive/Irreflexive Properties (page 289) Functional/Inverse-Functional Properties (page 341) Property inverses (page 342) Disjoint properties (page 195)
  41. 41. OWL Classes Defined with Set semantics (Chapter 3, section 3.1) Handles both instance data (set membership, pg 187) and set descriptions Types described with a Unary Predicate (pg 36, 188) RDF represents this with predicate of rdf:type Existential (pg 36), Universal (pg 35), Complementary (pg 195), Cardinality (pg 320), and Datatype operations
  42. 42. OWL in Use Represent schemas, similar to database schemas Automated research for candidate drug treatments NASA inventories
