1.
Mulgara
Open Source Semantic Web
Paul Gearon
gearon@computer.org
2.
Mulgara
RDF Database
Open Source
Written in Java
Over 320,000 lines of code
4.
Math in Mulgara
Graphs and Trees (Chapter 5)
Recursive Algorithms (Chapter 2)
Functions (Chapter 4)
Formal Languages and Algebras (Chapter 8)
Graph Algorithms (Chapter 6)
Prolog, Rules, Logic (Chapter 1)
Sets (Chapter 3)
All programming uses Boolean Logic (Chapter 7)
5.
RDF
A simple Description Logic (Chapter 1)
Provides structure for data in the Semantic Web
Simple data format of binary predicates, called triples
Triples combine to form a directed graph
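Here is a minimal Java sketch of the triple-to-graph idea (Java, since Mulgara is written in Java). Each triple is treated as a labeled edge from subject to object, and the graph keeps an adjacency list per subject. The names `Triple` and `TripleGraph` are hypothetical and are not Mulgara's API. Plain strings stand in for URIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: one RDF statement = one labeled, directed edge.
record Triple(String subject, String predicate, String object) {}

class TripleGraph {
    // Adjacency list: subject -> its outgoing (predicate, object) edges.
    private final Map<String, List<Triple>> outgoing = new LinkedHashMap<>();

    void add(String s, String p, String o) {
        outgoing.computeIfAbsent(s, k -> new ArrayList<>()).add(new Triple(s, p, o));
    }

    // Objects reachable from `s` along a single edge labeled `p`.
    List<String> objects(String s, String p) {
        List<String> result = new ArrayList<>();
        for (Triple t : outgoing.getOrDefault(s, List.of())) {
            if (t.predicate().equals(p)) result.add(t.object());
        }
        return result;
    }
}
```

For example, suppose you add `:David :knows :Paul` and then call `objects(":David", ":knows")`. The call returns `[:Paul]`, which follows one edge of the graph.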
6.
RDF
Simple
Describes schemas, ontologies, and instance data
Foundation for complex logic systems like OWL
Describes relationships between arbitrary things
Forms a graph (Chapter 5)
Can be used to describe anything
8.
RDF Graph
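[Figure: graph in which :David and :Paul are each linked to :Person by rdf:type, a :knows edge connects :David and :Paul, and :David has :email, :fullname and :title edges to literal values (mailto:david@example.com, Dr David Smith)]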
9.
RDF Graph
[Figure: the same graph with schema triples added: rdfs:domain edges from :knows, :email, :fullname and :title to :Person, and an rdfs:range edge for :knows]
11.
Storage
Speed
Fast storage
Quickly find what we want
Index the data
12.
Persistent Storage
Must be efficient in space
Smaller means more data fits in the same space
Smaller means faster reads and writes
Must support rewritable data
Indexed, rewritable data usually means regular-sized data blocks
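This minimal sketch shows why regular-sized blocks help. It assumes triples have already been reduced to three 64-bit numbers. Every record is then exactly 24 bytes, so record `i` sits at a computable offset and can be rewritten in place. The class `FixedTripleBlock` is hypothetical and uses an in-memory `java.nio.ByteBuffer`. This is not Mulgara's on-disk format; a real store would back the buffer with a file, for example a memory-mapped `FileChannel`.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-size record layout: 3 longs = 24 bytes per triple.
// Record i always starts at byte i * 24, so it can be located and
// overwritten without moving any neighbouring records.
class FixedTripleBlock {
    static final int RECORD_SIZE = 3 * Long.BYTES; // 24 bytes

    private final ByteBuffer buffer;

    FixedTripleBlock(int capacity) {
        buffer = ByteBuffer.allocate(capacity * RECORD_SIZE);
    }

    // Overwrite record `index` in place (absolute puts do not move the position).
    void write(int index, long s, long p, long o) {
        int offset = index * RECORD_SIZE;
        buffer.putLong(offset, s);
        buffer.putLong(offset + Long.BYTES, p);
        buffer.putLong(offset + 2 * Long.BYTES, o);
    }

    long[] read(int index) {
        int offset = index * RECORD_SIZE;
        return new long[] {
            buffer.getLong(offset),
            buffer.getLong(offset + Long.BYTES),
            buffer.getLong(offset + 2 * Long.BYTES)
        };
    }
}
```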
13.
One Approach
Map URIs and strings to numbers
Map numbers back to URIs and strings
Store a triple as 3 numbers
See Adjacency Matrix (page 418) and Adjacency List (page 420)
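The two-way mapping can be sketched with a hash map for string-to-number lookups and a list for number-to-string lookups. This is an in-memory illustration that assumes IDs are allocated sequentially. `NodePool`, `idFor` and `valueFor` are hypothetical names, not Mulgara's classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory node pool: a bijection between values and IDs.
class NodePool {
    private final Map<String, Long> toId = new HashMap<>();  // URI/string -> number
    private final List<String> toValue = new ArrayList<>();  // number -> URI/string

    // Returns the existing ID for a value, or allocates the next sequential one.
    long idFor(String value) {
        Long id = toId.get(value);
        if (id != null) return id;
        long newId = toValue.size();
        toValue.add(value);
        toId.put(value, newId);
        return newId;
    }

    // Maps a number back to its original URI or string.
    String valueFor(long id) {
        return toValue.get((int) id);
    }

    // A triple becomes three numbers (evaluated left to right: s, p, o).
    long[] encode(String s, String p, String o) {
        return new long[] {idFor(s), idFor(p), idFor(o)};
    }
}
```

Encoding `:David :knows :Paul` and then `:Paul :knows :David` yields `[0, 1, 2]` and `[2, 1, 0]`. Each distinct URI is stored once, and every later occurrence in a triple costs only a fixed-size number.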
22.
Disk Structure
Linear layouts do not scale
Trees scale well
24.
Trees
Scale well
Basis of every major database
Fast writing
Fast reading
Can be split over a network
25.
Index Searches
Use a binary tree search to find the right tree node (page 456)
Logarithmic complexity
Tree nodes hold blocks of sorted data
Use a binary search within each sorted block (page 138)
Logarithmic complexity
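Within a block, a binary search only needs the triples to be sorted. The sketch below assumes a block is an array of numeric triples kept in subject-predicate-object order. It finds the first triple with a given subject (a "lower bound") in O(log n) comparisons. `SortedBlock` and its methods are illustrative names only.

```java
import java.util.Comparator;

// Hypothetical block of numeric triples, sorted in subject-predicate-object order.
class SortedBlock {
    // The ordering the block is assumed to be kept in.
    static final Comparator<long[]> SPO =
        Comparator.<long[]>comparingLong(t -> t[0])
                  .thenComparingLong(t -> t[1])
                  .thenComparingLong(t -> t[2]);

    // Index of the first triple whose subject is >= s, or block.length if none.
    static int lowerBound(long[][] block, long s) {
        int lo = 0, hi = block.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;          // overflow-safe midpoint
            if (block[mid][0] < s) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Number of triples with subject s: matches are contiguous after the lower bound.
    static int countSubject(long[][] block, long s) {
        int i = lowerBound(block, s), n = 0;
        while (i < block.length && block[i][0] == s) { i++; n++; }
        return n;
    }
}
```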
26.
Why Binary?
Wider trees have identical complexity (logarithmic)
Wider trees have fewer disk seeks
Only a constant-factor saving, which leaves the complexity unchanged
Wider trees need more complex rebalancing
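The "same complexity, fewer seeks" point follows from the change-of-base rule for logarithms. Consider a tree with branching factor b holding N entries. The figure N = 10^9 below is purely an illustrative size.

```latex
\begin{aligned}
h_b(N) &= \lceil \log_b N \rceil = \left\lceil \frac{\log_2 N}{\log_2 b} \right\rceil \\
b = 2:&\quad h = \lceil \log_2 10^9 \rceil = \lceil 29.9 \rceil = 30 \\
b = 1000:&\quad h = \lceil \log_{1000} 10^9 \rceil = 3 \\
\text{comparisons} &\approx h \cdot \log_2 b \approx \log_2 N \approx 30 \quad \text{(for either } b\text{)}
\end{aligned}
```

The wide tree needs about a tenth of the seeks. However, 1/log2(b) is a constant factor, so both trees remain O(log N). Binary search inside each node keeps the total number of comparisons near log2(N).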
29.
Better than Trees?
Hash Tables (pages 362-366)
Have constant (average-case) complexity, BUT:
Use too much space (scaling issues)
Need to be expanded when they get too full
Great for smaller data sets in RAM
Poor for disk storage, good for clusters
30.
Mapping
Bijective Function (page 339)
Store key/value pairs, indexed by key
Trees order by key
Datatype ordering (lexical, numerical, dates, etc.)
Can find ranges of data
e.g. find all students enrolled between 1-1-2010 and 31-12-2010
Hash maps have no ordering
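The range query above can be sketched with Java's `TreeMap`. Its `subMap(from, true, to, true)` returns the entries between two keys, inclusive, in key order. `EnrolmentIndex` and the student data are hypothetical. The point is that a tree-ordered map answers the range directly, whereas a `HashMap` would have to scan every entry.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical index: enrolment date -> students enrolled on that date.
// TreeMap keeps keys in natural (chronological) order.
class EnrolmentIndex {
    private final TreeMap<LocalDate, List<String>> byDate = new TreeMap<>();

    void enrol(String student, LocalDate date) {
        byDate.computeIfAbsent(date, d -> new ArrayList<>()).add(student);
    }

    // All students enrolled between `from` and `to` (both inclusive), in date order.
    List<String> enrolledBetween(LocalDate from, LocalDate to) {
        NavigableMap<LocalDate, List<String>> range = byDate.subMap(from, true, to, true);
        List<String> result = new ArrayList<>();
        for (List<String> students : range.values()) result.addAll(students);
        return result;
    }
}
```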
31.
Real Data Searches
Combination searches
“The list of people who know :Paul”
The list of people
AND
Things that know :Paul
32.
Constraints
Bind a variable with a constraint
?x rdf:type :Person
?x :knows :Paul
Describe requirements with a formal language
Tucana Query Language (TQL)
SPARQL Protocol and RDF Query Language (SPARQL)
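Here is a minimal sketch of resolving constraints. It assumes each constraint is a triple pattern with exactly one variable, marked here by `null`. Each pattern yields the set of values that can bind `?x`. The combined search from the previous slide is then the intersection of those sets. `Constraint` and `resolve` are hypothetical names, and string triples stand in for Mulgara's internal representation.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical single-variable triple pattern; null marks the variable position.
class Constraint {
    final String s, p, o;

    Constraint(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }

    // Values the variable can take, kept sorted (TreeSet) to suit later joins.
    Set<String> resolve(List<String[]> triples) {
        Set<String> bindings = new TreeSet<>();
        for (String[] t : triples) {
            if (matches(s, t[0]) && matches(p, t[1]) && matches(o, t[2])) {
                // Assumes exactly one position is the variable.
                bindings.add(s == null ? t[0] : p == null ? t[1] : t[2]);
            }
        }
        return bindings;
    }

    // A null (variable) position matches anything; otherwise values must be equal.
    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }
}
```

Suppose the triples state that `:David` and `:Paul` are people, and that `:David` and `:Rex` know `:Paul`. Then `?x rdf:type :Person` binds `?x` to {`:David`, `:Paul`}, and `?x :knows :Paul` binds it to {`:David`, `:Rex`}. Intersecting the two with `retainAll` leaves {`:David`}.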
33.
Query Languages
Formal language
Context Free Grammar
Chapter 8, section 8.4
SPARQL example:
PREFIX : <http://example.com/>   # hypothetical namespace for the example
SELECT ?person
WHERE {
  ?person a :Person .
  ?person :knows :Paul
}
34.
Algebra
Formal language converted to an algebra (Section 8.1)
Constraints are combined and manipulated algebraically
Optimization through algebraic manipulation
Example:
before optimization: ~600 seconds
after optimization: 0.8 seconds
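One common algebraic rewrite relies on AND being commutative and associative. The constraints in a conjunction can be reordered so the most selective one runs first, which keeps intermediate results small. This is shown only as an illustration; the slide does not say which manipulations produced the timings above. `TriplePattern` and the `estimatedSize` values are hypothetical. A real optimizer would derive such estimates from index counts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical constraint with an (assumed) estimate of how many rows it matches.
record TriplePattern(String text, long estimatedSize) {}

class ConjunctionPlanner {
    // AND is commutative and associative, so any evaluation order gives the
    // same answer; running the smallest constraints first limits the work.
    static List<TriplePattern> order(List<TriplePattern> conjunction) {
        List<TriplePattern> plan = new ArrayList<>(conjunction);
        plan.sort(Comparator.comparingLong(TriplePattern::estimatedSize));
        return plan;
    }
}
```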
35.
Algebraic Operations
AND operations (Conjunctions)
Mergesort (page 179)
OR operations (Disjunctions)
Union then sort (Chapter 2)
Others
Filter, Minus, LeftJoin, Datatype, etc.
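Here is a sketch of the two core operations on single-variable results. It assumes each result is a sorted, duplicate-free array of node IDs. The conjunction walks both inputs together, like the merge step of mergesort. The disjunction concatenates the inputs, sorts them and removes duplicates. `BindingOps` is a hypothetical name. Real query engines join multi-column tuples, which this one-column version simplifies.

```java
import java.util.Arrays;

// Hypothetical operations on sorted, duplicate-free arrays of node IDs.
class BindingOps {
    // AND: a merge-style pass keeping only values present in both inputs.
    static long[] and(long[] a, long[] b) {
        long[] out = new long[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) i++;
            else if (a[i] > b[j]) j++;
            else { out[n++] = a[i]; i++; j++; }
        }
        return Arrays.copyOf(out, n);
    }

    // OR: union (concatenate), then sort, then drop duplicates.
    static long[] or(long[] a, long[] b) {
        long[] all = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, all, a.length, b.length);
        Arrays.sort(all);
        return Arrays.stream(all).distinct().toArray();
    }
}
```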
36.
Graph Operations
List operations
Graph traversal
Transitivity (page 289)
Distance between nodes
Algorithm similar to Euler Path (page 490) on a constrained graph
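For a transitive property, a breadth-first traversal finds the nodes reachable from a start node and their distances. Note that this sketch swaps in plain breadth-first search, not the Euler-path-like algorithm the slide mentions. The `Transitive` class and the example edges are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: breadth-first traversal along the edges of one (assumed transitive)
// property. Every node reachable from `start` is returned with its distance in
// hops; `start` itself is included at distance 0.
class Transitive {
    static Map<String, Integer> distances(Map<String, List<String>> edges, String start) {
        Map<String, Integer> dist = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            for (String next : edges.getOrDefault(node, List.of())) {
                // The first visit in BFS order gives the shortest hop count;
                // checking `dist` also stops the traversal looping on cycles.
                if (!dist.containsKey(next)) {
                    dist.put(next, dist.get(node) + 1);
                    queue.add(next);
                }
            }
        }
        return dist;
    }
}
```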
37.
Ontologies
Formal representation of knowledge
Set of concepts in a domain
Relationships between concepts
Vocabularies for building ontologies expressed in RDF
RDF Schema (RDFS)
Simple Knowledge Organization System (SKOS)
Web Ontology Language (OWL)
38.
Rules
RDF has limited semantics of its own
Support for richer languages through rules (page 72)
Uses a Prolog-style language (pages 64-71) to express Horn clauses (page 66), and therefore modus ponens (page 23)
RDFS, SKOS and most of OWL are all supported through rules
39.
Rule Examples
If A is the same as B (owl:sameAs), and A relates to X, then B relates to X the same way
If P is a symmetric property (owl:SymmetricProperty), and P relates A to B, then P also relates B to A
owl:sameAs is a symmetric property
P(B,X) :- owl:sameAs(A,B), P(A,X).
P(B,A) :- owl:SymmetricProperty(P), P(A,B).
owl:SymmetricProperty(owl:sameAs).
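The two rules above can be run by naive forward chaining. Each pass applies every rule to the current facts and adds what it derives (modus ponens). Passes repeat until nothing new appears. This evaluation strategy is swapped in for illustration and does not describe Mulgara's rule engine. `Stmt` and `RuleEngine` are hypothetical. `owl:SymmetricProperty(P)` is encoded as the triple `P rdf:type owl:SymmetricProperty`, as RDF does.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical triple type; records give value-based equals/hashCode for the set.
record Stmt(String s, String p, String o) {}

class RuleEngine {
    // Naive forward chaining: derive, add, repeat until a fixpoint is reached.
    static Set<Stmt> saturate(Set<Stmt> input) {
        Set<Stmt> facts = new HashSet<>(input);
        boolean changed = true;
        while (changed) {
            Set<Stmt> derived = new HashSet<>();
            for (Stmt a : facts) {
                // P(B,A) :- owl:SymmetricProperty(P), P(A,B).
                if (facts.contains(new Stmt(a.p(), "rdf:type", "owl:SymmetricProperty"))) {
                    derived.add(new Stmt(a.o(), a.p(), a.s()));
                }
                // P(B,X) :- owl:sameAs(A,B), P(A,X).
                if (a.p().equals("owl:sameAs")) {
                    for (Stmt b : facts) {
                        if (b.s().equals(a.s())) derived.add(new Stmt(a.o(), b.p(), b.o()));
                    }
                }
            }
            // addAll returns false once no new statement was derived.
            changed = facts.addAll(derived);
        }
        return facts;
    }
}
```

Start from three facts: `owl:sameAs` is symmetric, `:Dave owl:sameAs :David`, and `:David :knows :Paul`. Saturation derives `:David owl:sameAs :Dave` and `:Dave :knows :Paul`. It does not derive `:Paul :knows :David`, because `:knows` is not declared symmetric.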
41.
OWL Classes
Defined with set semantics (Chapter 3, section 3.1)
Handles both instance data (set membership, pg 187) and set descriptions
Types described with a unary predicate (pg 36, 188)
RDF represents this with the rdf:type predicate
Existential (pg 36), Universal (pg 35), Complementary (pg 195), Cardinality (pg 320), and Datatype operations
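If each class is treated as the set of its members, several of these constructs reduce to set operations. The sketch below assumes a closed, finite universe of individuals, which real OWL, with its open-world semantics, does not. `ClassSets` and its method names are hypothetical. `someValuesFrom` and `allValuesFrom` borrow the names of OWL's existential and universal restrictions.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch under a closed-world, finite-universe assumption: a class is the set
// of its instances, and a property maps each individual to its set of values.
class ClassSets {
    static Set<String> intersection(Set<String> a, Set<String> b) {
        Set<String> r = new HashSet<>(a); r.retainAll(b); return r;
    }

    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> r = new HashSet<>(a); r.addAll(b); return r;
    }

    // Complement relative to the assumed universe of individuals.
    static Set<String> complement(Set<String> universe, Set<String> a) {
        Set<String> r = new HashSet<>(universe); r.removeAll(a); return r;
    }

    // Existential: individuals with at least one p-value that is in class c.
    static Set<String> someValuesFrom(Map<String, Set<String>> p, Set<String> c) {
        Set<String> r = new HashSet<>();
        p.forEach((x, values) -> { if (!intersection(values, c).isEmpty()) r.add(x); });
        return r;
    }

    // Universal: individuals all of whose p-values are in c
    // (vacuously true for individuals with no p-values at all).
    static Set<String> allValuesFrom(Set<String> universe, Map<String, Set<String>> p, Set<String> c) {
        Set<String> r = new HashSet<>();
        for (String x : universe) if (c.containsAll(p.getOrDefault(x, Set.of()))) r.add(x);
        return r;
    }

    // Cardinality: individuals with at least n distinct p-values.
    static Set<String> minCardinality(Set<String> universe, Map<String, Set<String>> p, int n) {
        Set<String> r = new HashSet<>();
        for (String x : universe) if (p.getOrDefault(x, Set.of()).size() >= n) r.add(x);
        return r;
    }
}
```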
42.
OWL in Use
Represent schemas, similar to database schemas
Automated research for candidate drug treatments
NASA inventories