# Mulgara


1. Mulgara: Open Source Semantic Web. Paul Gearon, gearon@computer.org
2. Mulgara: RDF database; open source; written in Java; over 320,000 lines of code.
3. Math in Mulgara: Graphs and Trees (Chapter 5); Recursive Algorithms (Chapter 2); Functions (Chapter 4); Formal Languages and Algebras (Chapter 8); Graph Algorithms (Chapter 6); Prolog, Rules, Logic (Chapter 1); Sets (Chapter 3).
4. Math in Mulgara: Graphs and Trees (Chapter 5); Recursive Algorithms (Chapter 2); Functions (Chapter 4); Formal Languages and Algebras (Chapter 8); Graph Algorithms (Chapter 6); Prolog, Rules, Logic (Chapter 1); Sets (Chapter 3). All programming uses Boolean Logic (Chapter 7).
5. RDF: a simple Description Logic (Chapter 1); provides structure for data in the Semantic Web; simple data format of binary predicates, or triples; triples combine to form a directed graph.
6. RDF: simple; describes schemas, ontologies, and instance data; foundation for complex logic systems like OWL; describes relationships between arbitrary things; forms a graph (Chapter 5); can be used to describe anything.
7. RDF Triples: :David :knows :Paul; :knows(:David, :Paul); :knows :David :Paul.
8. RDF Graph (diagram). Labels shown: :Person, rdf:type (twice), :knows, :David, :Paul, :email, :fullname, :title, mailto:david@example.com, Dr David Smith.
9. RDF Graph (diagram). Labels shown: rdfs:domain (four times), rdfs:range, rdf:type (twice), :knows, :email, :fullname, :title, :Person, :David, :Paul, mailto:david@example.com, Dr David Smith.
10. Storage Speed: fast storage; quickly find what we want.
11. Storage Speed: fast storage; quickly find what we want; index the data.
12. Persistent Storage: must be efficient in space (smaller means more data; smaller means faster read/write); must support re-writable data; indexed, rewritable data usually means regular-sized data blocks.
13. One Approach: map URIs and strings to numbers; map numbers back to URIs and strings; store a triple as 3 numbers; see Adjacency Matrix (page 418) and Adjacency List (page 420).
14. Representation: :David rdf:type :Person; :Paul rdf:type :Person; :David :knows :Paul. (Diagram: the same three triples drawn as a graph.)
15. Representation: rdf:type = 1, :knows = 2, :David = 3, :Paul = 4, :Person = 5. :David rdf:type :Person becomes 3 1 5; :Paul rdf:type :Person becomes 4 1 5; :David :knows :Paul becomes 3 2 4. (Diagram: the same graph with each label replaced by its number.)
16. Finding Triples: sort by columns (each triple written as S P O). Sorted by S, then P, then O: 3 1 5; 3 2 4; 4 1 5. Sorted by P, then O, then S: 3 1 5; 4 1 5; 3 2 4. Sorted by O, then S, then P: 3 2 4; 3 1 5; 4 1 5.
19. Disk Structure: linear layouts do not scale.
22. Disk Structure: linear layouts do not scale; trees scale well.
24. Trees: scale well; basis of every major database; fast writing; fast reading; can be split over a network.
25. Index Searches: use a binary tree search on the trees (page 456), which has logarithmic complexity; blocks of data are stored in the tree nodes as sorted data; use a binary search on the sorted data blocks (page 138), which also has logarithmic complexity.
26. Why Binary? Wider trees have identical complexity (logarithmic); wider trees need fewer disk seeks, but that is only a linear (constant-factor) effect, which does not change the complexity; wider trees have complex rebalancing.
27. Better than Trees? Hash Tables (pages 362-366).
28. Better than Trees? Hash Tables (pages 362-366) have constant complexity, BUT: they use too much space (scaling issues); they need to be expanded when they get too full; they are great for smaller data sets in RAM.
29. Better than Trees? Hash Tables (pages 362-366) have constant complexity, BUT: they use too much space (scaling issues); they need to be expanded when they get too full; they are great for smaller data sets in RAM; poor for disk usage; good for clusters.
30. Mapping: Bijective Function (page 339); store key/value pairs, indexed by key; trees order by key, using datatype ordering (lexical, numerical, dates, etc.); can find ranges of data, e.g. find all students enrolled between 1-1-2010 and 31-12-2010; hashmaps have no ordering.
31. Real Data Searches: combination searches. “The list of people who know :Paul” is the list of people AND the list of things that know :Paul.
32. Constraints: bind a variable with a constraint, e.g. ?x rdf:type :Person and ?x :knows :Paul; describe requirements with a formal language: Tucana Query Language (TQL) or SPARQL Protocol and RDF Query Language (SPARQL).
33. Query Languages: formal language; Context Free Grammar (Chapter 8, section 8.4). SPARQL example: SELECT ?person WHERE { ?person a :Person . ?person :knows :Paul }
34. Algebra: formal language converted to an algebra (Section 8.1); constraints are combined and manipulated algebraically; optimization through algebraic manipulation. Example: before optimization, ~600 seconds; after optimization, 0.8 seconds.
35. Algebraic Operations: AND operations (conjunctions) use Mergesort (page 179); OR operations (disjunctions) use union then sort (Chapter 2); others: Filter, Minus, LeftJoin, Datatype, etc.
36. Graph Operations: list operations; graph traversal; transitivity (page 289); distance between nodes, using an algorithm similar to Euler Path (page 490) on a constrained graph.
37. Ontologies: formal representation of knowledge; set of concepts in a domain; relationships between concepts; vocabularies for building ontologies are expressed in RDF: RDF Schema (RDFS), Simple Knowledge Organization System (SKOS), Web Ontology Language (OWL).
38. Rules: RDF has few semantics; support for higher languages through Rules (page 72); uses a Prolog-style language (pages 64-71) to express Horn clauses (page 66), and therefore modus ponens (page 23); RDFS, SKOS and most of OWL are all supported through Rules.
39. Rule Examples: if A is the same as B (owl:sameAs), and A relates to X, then B relates to X the same way; if P is a symmetric property (owl:SymmetricProperty), and P relates A to B, then P also relates B to A; owl:sameAs is a symmetric property. As rules: P(B,X) :- owl:sameAs(A,B), P(A,X). P(B,A) :- owl:SymmetricProperty(P), P(A,B). owl:SymmetricProperty(owl:sameAs).
40. OWL Properties: Transitive Properties (page 289); Symmetric/Asymmetric Properties (page 289); Reflexive/Irreflexive Properties (page 289); Functional/Inverse-Functional Properties (page 341); Property inverses (page 342); Disjoint properties (page 195).
41. OWL Classes: defined with Set semantics (Chapter 3, section 3.1); handles both instance data (set membership, pg 187) and set descriptions; types described with a Unary Predicate (pg 36, 188); RDF represents this with a predicate of rdf:type; Existential (pg 36), Universal (pg 35), Complementary (pg 195), Cardinality (pg 320), and Datatype operations.
42. OWL in Use: represent schemas, similar to database schemas; automated research for candidate drug treatments; NASA inventories.
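
The storage approach on slides 13 to 16 and the constraint lookup on slides 31 and 32 can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not Mulgara's implementation: `TripleStore`, `node_id`, `add`, and `find` are hypothetical names, sorted Python lists stand in for the on-disk trees, and a plain set intersection stands in for the merge-based AND on slide 35.

```python
from bisect import bisect_left


class TripleStore:
    """Toy in-memory triple store (illustrative names, not Mulgara's API)."""

    # Column permutations for the three sort orders: 0 = S, 1 = P, 2 = O.
    ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

    def __init__(self):
        self.to_id = {}    # URI or string -> number
        self.to_name = []  # number -> URI or string (number n sits at n - 1)
        self.indexes = {name: [] for name in self.ORDERS}

    def node_id(self, name):
        """Return the number for a name, allocating the next one if new."""
        if name not in self.to_id:
            self.to_name.append(name)
            self.to_id[name] = len(self.to_name)
        return self.to_id[name]

    def add(self, s, p, o):
        """Store one triple as three numbers in every sorted index."""
        triple = (self.node_id(s), self.node_id(p), self.node_id(o))
        for name, perm in self.ORDERS.items():
            key = tuple(triple[col] for col in perm)
            index = self.indexes[name]
            pos = bisect_left(index, key)
            if pos == len(index) or index[pos] != key:  # skip duplicates
                index.insert(pos, key)

    def find(self, s=None, p=None, o=None):
        """Return matching (s, p, o) name triples; None is a wildcard."""
        pattern = (s, p, o)
        if any(v is not None and v not in self.to_id for v in pattern):
            return []  # an unknown name cannot match anything
        ids = [None if v is None else self.to_id[v] for v in pattern]

        def bound_prefix(perm):
            prefix = []
            for col in perm:
                if ids[col] is None:
                    break
                prefix.append(ids[col])
            return tuple(prefix)

        # With these three cyclic orderings, every combination of bound
        # columns is a leading prefix of at least one index.
        name = max(self.ORDERS, key=lambda n: len(bound_prefix(self.ORDERS[n])))
        perm, index = self.ORDERS[name], self.indexes[name]
        prefix = bound_prefix(perm)
        lo, hi = 0, len(index)
        if prefix:
            # Binary search for the contiguous run of keys with this prefix.
            lo = bisect_left(index, prefix)
            hi = bisect_left(index, prefix[:-1] + (prefix[-1] + 1,))
        results = []
        for key in index[lo:hi]:
            triple = [None, None, None]
            for position, col in enumerate(perm):
                triple[col] = self.to_name[key[position] - 1]
            results.append(tuple(triple))
        return results


store = TripleStore()
# Pre-register names in the slide's order so the numbers match slide 15.
for name in ["rdf:type", ":knows", ":David", ":Paul", ":Person"]:
    store.node_id(name)
store.add(":David", "rdf:type", ":Person")
store.add(":Paul", "rdf:type", ":Person")
store.add(":David", ":knows", ":Paul")

# "People who know :Paul": two constraints on ?x, combined with AND.
people = {s for s, _, _ in store.find(p="rdf:type", o=":Person")}
know_paul = {s for s, _, _ in store.find(p=":knows", o=":Paul")}
print(sorted(people & know_paul))
```

Each index stores its keys in its own column order, so the "pos" index holds (P, O, S) tuples. Keeping three orderings costs three copies of the data, but it means any pattern of bound and unbound columns resolves to one binary search plus a contiguous scan.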