1. Distributed Graph Databases and the
Emerging Web of Data
Marko A. Rodriguez
T-5, Center for Nonlinear Studies
Los Alamos National Laboratory
http://markorodriguez.com
April 16, 2009
2. Abstract
The World Wide Web is the de facto medium for publicly exposing a corpus
of interrelated documents. In its current form, the World Wide Web is the
Web of Documents. The next generation of the World Wide Web will
support the Web of Data. The Web of Data utilizes the same Uniform
Resource Identifier (URI) address space as the Web of Documents, but
instead of exposing a graph of documents, the Web of Data exposes a
graph of data. Given that the URI address space of the Web is distributed
and infinite, the Web of Data provides a single unified space by which the
world's data can be publicly exposed and interrelated. The Web of Data is
supported by both graph databases (which structure the data) and
distributed computing mechanisms (which process the data). This
presentation will discuss the Web of Data, graph databases, and models of
computing in this emerging space.
Computer Science Department Colloquium – University of New Mexico – April 16, 2009
3. Outline
• The Relational Database vs. the Graph Database
• The Web of Documents vs. the Web of Data
• Local Computing vs. Distributed Computing
• Multi-Relational Network Analysis with Grammar Walkers
5. The Relational Database vs. the Graph Database
• A relational database’s (e.g. MySQL, PostgreSQL, Oracle) data model
is a collection of interlinked tables.
• A graph database’s (e.g. OpenRDF Sesame, AllegroGraph, Neo4j) data model
is a multi-relational graph.
[Figure: a relational database and a graph database, drawn with vertices labeled a, b, c, and d and the hosts 127.0.0.1 and 127.0.0.2.]
6. Types of Graphs
• Undirected single-relational graph: homogeneous set of symmetric links.
• Directed single-relational graph: homogeneous set of links.
• Directed multi-relational graph: heterogeneous set of links.
[Figure: an undirected single-relational graph, a directed single-relational graph, and a directed multi-relational graph over the vertices x, y, and z.]
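The three graph types can be sketched as plain Python sets. This is only an illustration: the vertex names follow the slide, while the edge labels "knows" and "cites" and the helper `labels` are assumptions made for the example.

```python
# Vertices shared by all three hypothetical example graphs.
vertices = {"x", "y", "z"}

# Undirected single-relational: one kind of link, each link an unordered pair.
undirected = {frozenset({"x", "y"}), frozenset({"y", "z"})}

# Directed single-relational: one kind of link, each link an ordered pair.
directed = {("x", "y"), ("y", "z")}

# Directed multi-relational: each link also carries a label (its relation type).
# The labels "knows" and "cites" are illustrative assumptions.
multi_relational = {("x", "knows", "y"), ("y", "cites", "z")}


def labels(graph):
    """Return the set of edge labels used in a multi-relational graph."""
    return {label for (_, label, _) in graph}
```

A single-relational graph is the special case in which `labels` would return exactly one label.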
7. Our Make Believe World - Phase 1
• Marko is a human and Fluffy is a dog.
8. Our World Modeled in a Relational Database - Phase 1
Object_Table
ID    Name    Type   Legs  Fur
0001  Marko   Human  2     false
0002  Fluffy  Dog    4     true
9. Our World Modeled in a Graph Database - Phase 1
[Figure: a graph with the edges (0001, type, Human), (0001, name, Marko), (0001, legs, 2), (0001, fur, false), (0002, type, Dog), (0002, name, Fluffy), (0002, legs, 4), and (0002, fur, true).]
10. Our Make Believe World - Phase 2
• Marko is a human and Fluffy is a dog.
• Marko and Fluffy are good friends.
11. Our World Modeled in a Relational Database - Phase 2
Object_Table
ID    Name    Type   Legs  Fur
0001  Marko   Human  2     false
0002  Fluffy  Dog    4     true

Friendship_Table
ID1   ID2
0001  0002
0002  0001
12. Our World Modeled in a Graph Database - Phase 2
[Figure: the Phase 1 graph extended with the edges (0001, friend, 0002) and (0002, friend, 0001).]
13. Our Make Believe World - Phase 3
• Marko is a human and Fluffy is a dog.
• Marko and Fluffy are good friends.
• Human and dog are subclasses of mammal.
14. Our World Modeled in a Relational Database - Phase 3
Object_Table
ID    Name    Type   Legs  Fur
0001  Marko   Human  2     false
0002  Fluffy  Dog    4     true

Friendship_Table
ID1   ID2
0001  0002
0002  0001

Subclass_Table
Type1  Type2
Human  Mammal
Dog    Mammal
15. Our World Modeled in a Graph Database - Phase 3
[Figure: the Phase 2 graph extended with the edges (Human, subclassof, Mammal) and (Dog, subclassof, Mammal).]
16. Our Make Believe World - Phase 4
• Marko is a human and Fluffy is a dog.
• Marko and Fluffy are good friends.
• Human and dog are subclasses of mammal.
• Fluffy peed on the carpet.
17. Our World Modeled in a Relational Database - Phase 4
Object_Table
ID    Name    Type    Legs  Fur
0001  Marko   Human   2     false
0002  Fluffy  Dog     4     true
0003  My_Rug  Carpet  N/A   N/A

Friendship_Table
ID1   ID2
0001  0002
0002  0001

Subclass_Table
Type1  Type2
Human  Mammal
Dog    Mammal

Pee_Table
ID1   ID2
0002  0003
18. Our World Modeled in a Graph Database - Phase 4
[Figure: the Phase 3 graph extended with the vertex 0003 and the edges (0003, type, Carpet), (0003, name, My_Rug), and (0002, peedOn, 0003).]
19. Our Make Believe World - Phase 5
• Marko is a human and Fluffy is a dog.
• Marko and Fluffy are good friends.
• Human and dog are subclasses of mammal.
• Fluffy peed on the carpet.
• Marko and Fluffy are both mammals.
20. Our World Modeled in a Relational Database - Phase 5
Object_Table
ID    Name    Type    Legs  Fur
0001  Marko   Human   2     false
0002  Fluffy  Dog     4     true
0003  My_Rug  Carpet  N/A   N/A

Friendship_Table
ID1   ID2
0001  0002
0002  0001

Subclass_Table
Type1  Type2
Human  Mammal
Dog    Mammal

Pee_Table
ID1   ID2
0002  0003

Type_Table
ID    Type
0001  Human
0002  Dog
0003  Carpet
0001  Mammal
0002  Mammal
21. Our World Modeled in a Graph Database - Phase 5
[Figure: the Phase 4 graph extended with the edges (0001, type, Mammal) and (0002, type, Mammal).]
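The five phases can be sketched as a growing set of (subject, label, object) triples, where each new fact is just another edge. The sketch below is an illustration under assumptions: the triple encoding and the `infer_types` helper are not from the slides, and the helper derives the Phase 5 "type Mammal" edges from the subclassof edges instead of adding them by hand.

```python
# Phases 1 to 4 of the make-believe world as labeled edges.
triples = {
    ("0001", "type", "Human"), ("0001", "name", "Marko"),
    ("0001", "legs", 2), ("0001", "fur", False),
    ("0002", "type", "Dog"), ("0002", "name", "Fluffy"),
    ("0002", "legs", 4), ("0002", "fur", True),
    ("0001", "friend", "0002"), ("0002", "friend", "0001"),
    ("Human", "subclassof", "Mammal"), ("Dog", "subclassof", "Mammal"),
    ("0003", "type", "Carpet"), ("0003", "name", "My_Rug"),
    ("0002", "peedOn", "0003"),
}


def infer_types(graph):
    """Add (s, type, C2) whenever (s, type, C1) and (C1, subclassof, C2) hold."""
    inferred = set(graph)
    changed = True
    while changed:  # repeat until no new type edge can be derived
        changed = False
        for (s, p, c1) in list(inferred):
            if p != "type":
                continue
            for (c, q, c2) in list(inferred):
                if c == c1 and q == "subclassof" and (s, "type", c2) not in inferred:
                    inferred.add((s, "type", c2))
                    changed = True
    return inferred


# Phase 5: Marko and Fluffy are both mammals.
phase5 = infer_types(triples)
```

No schema change is needed between phases in this representation; the relational model above needed a new table for almost every new kind of fact.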
22. The Graph as the Natural World Model
• The world is inherently (or perceived as) object-oriented.
• The world is filled with objects and relations among them.
• The multi-relational graph is a very natural representation of the world.
23. The Graph as the Natural Programming Model
• High-level computer languages are object-oriented.
• There is almost no impedance mismatch between the multi-relational graph and
the programming object.
• It is easy to go from graph database to in-memory object.
// Assumes Human and Dog classes that expose these setters.
Dog fluffy = new Dog();
Human marko = new Human();
marko.setName("Marko");
marko.addFriend(fluffy);
marko.setHasFur(false);
marko.setLegs(2);
24. SQL vs. SPARQL
SQL:
SELECT OTY.Name
FROM Object_Table AS OTX, Object_Table AS OTY, Friendship_Table
WHERE OTX.Name = 'Marko'
  AND Friendship_Table.ID1 = OTY.ID
  AND Friendship_Table.ID2 = OTX.ID;
SPARQL:
SELECT ?z WHERE {
  ?x name "Marko" .
  ?y friend ?x .
  ?y name ?z }
E. Prud’hommeaux and A. Seaborne. SPARQL Query Language for RDF, WWW Consortium,
http://www.w3.org/TR/2004/WD-rdf-sparql-query-20041012/, 2004.
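What the SPARQL query does can be sketched as nested triple-pattern matching over a set of edges. This is a hand-rolled illustration, not a SPARQL engine: the triple encoding and the function name `friends_names_of` are assumptions made for the example.

```python
triples = {
    ("0001", "name", "Marko"),
    ("0002", "name", "Fluffy"),
    ("0001", "friend", "0002"),
    ("0002", "friend", "0001"),
}


def friends_names_of(name, graph):
    """Mimic: ?x name <name> . ?y friend ?x . ?y name ?z, returning every ?z."""
    results = set()
    for (x, p1, o1) in graph:                # pattern 1: ?x name <name>
        if p1 != "name" or o1 != name:
            continue
        for (y, p2, o2) in graph:            # pattern 2: ?y friend ?x
            if p2 != "friend" or o2 != x:
                continue
            for (s3, p3, z) in graph:        # pattern 3: ?y name ?z
                if s3 == y and p3 == "name":
                    results.add(z)
    return results
```

Each shared variable (`?x`, `?y`) plays the role that a join condition plays in the SQL version.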
26. Internet Address Spaces
• The Uniform Resource Identifier (URI) is the superclass of the Uniform
Resource Locator (URL) and Uniform Resource Name (URN).
27. The Uniform Resource Locator
• The set of all URLs is the address space of all resources that can be
located and retrieved on the Web. URLs denote where a resource is.
http://markorodriguez.com/index.html
∗ Domain name server (DNS): markorodriguez.com → 216.251.43.6.
∗ http:// means an HTTP GET on port 80.
∗ /index.html means the resource to get at that Internet location.
[Figure: the web server for markorodriguez.com at 216.251.43.6 serving index.html.]
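The parts of the example URL can be pulled apart with Python's standard `urllib.parse` module. This is a small sketch; falling back to port 80 when the URL names no port is an assumption that matches the http scheme default described above.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://markorodriguez.com/index.html")

scheme = parts.scheme    # the protocol used to retrieve the resource
host = parts.hostname    # the name that DNS resolves to an IP address
path = parts.path        # the resource to get at that location
# The URL names no explicit port, so fall back to the http default (assumed 80).
port = parts.port or 80
```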
28. The Uniform Resource Name
• The set of all URNs is the address space of all resources within the urn:
namespace.
urn:uuid:bd93def0-8026-11dd-842be54955baa12
urn:issn:0892-3310
urn:doi:10.1016/j.knosys.2008.03.030
• Named resources need not be retrievable through the Web.
• URNs denote what a resource is.
29. The Uniform Resource Identifier
• The URI address space is an infinite space for all Internet resources.
urn:issn:0892-3310
ftp://markorodriguez.com/private/markos_secrets.txt
http://www.lanl.gov#fluffy
• Important: URIs can denote concepts, instances, and data.
lanl:fluffy lanl:fluffy_legs
lanl is a namespace prefix that expands to http://www.lanl.gov#.
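Prefix expansion is plain string substitution. The sketch below assumes a hypothetical `expand` helper and a prefix table holding only the lanl prefix from the slide.

```python
# Prefix table; only the lanl prefix comes from the slide.
PREFIXES = {"lanl": "http://www.lanl.gov#"}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a prefixed name such as lanl:fluffy into a full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError("unknown or missing prefix in " + repr(prefixed_name))
    return prefixes[prefix] + local
```

For example, `expand("lanl:fluffy")` yields the URI `http://www.lanl.gov#fluffy` shown on the slide.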
30. The Web of Documents
• The Web of Documents is primarily concerned with the Hypertext
Transfer Protocol (HTTP) and with retrievable resources in the URL
address space.
• These retrievable resources are files: HTML documents, images, audio,
etc. The “web” is created when HTML documents contain URLs.
[Figure: index.html at http://markorodriguez.com/ links via href to Resume.html, Home.html, and Research.html.]
31. The Web of Data
• The Web of Data is primarily concerned with URIs.
• The Resource Description Framework (RDF) is the standard for
representing the relationship between URIs and literals (e.g. float, string,
date time, etc.).
[Figure: the triple (subject, predicate, object) = (lanl:marko, foaf:knows, lanl:fluffy); lanl:marko has foaf:name "Marko A. Rodriguez"^^xsd:string and lanl:fluffy has foaf:name "Fluffy P. Everywhere"^^xsd:string.]
C. Bizer, T. Heath, K. Idehen, and T. Berners-Lee. Linked Data on the Web, International World Wide Web Conference, 2008.
32. Our Make Believe World in RDF
[Figure: the make-believe world as an RDF graph. lanl:Human and lanl:Dog are each rdfs:subClassOf lanl:Mammal; lanl:marko is rdf:type lanl:Human and lanl:Mammal; lanl:fluffy is rdf:type lanl:Dog and lanl:Mammal; lanl:marko and lanl:fluffy are linked by lanl:friend in both directions; lanl:marko has lanl:fur "false"^^xsd:boolean, lanl:legs "2"^^xsd:integer, and foaf:name "Marko A. Rodriguez"^^xsd:string; lanl:fluffy has lanl:fur "true"^^xsd:boolean, lanl:legs "4"^^xsd:integer, and foaf:name "Fluffy P. Everywhere"^^xsd:string.]
33. The Web of Data is a Distributed Database
• The URI address space is distributed.
• URIs can denote data.
• RDF denotes the relationships between URIs.
• The Web of Data’s foundational standard is RDF.
• Therefore, the Web of Data is a distributed database.
34. The Web of Documents vs. the Web of Data
[Figure: in the Web of Documents, an HTML document on the web server at 127.0.0.1 links via href to an HTML document on the web server at 127.0.0.2; in the Web of Data, a resource in the graph database at 127.0.0.1 links via lanl:friend to a resource in the graph database at 127.0.0.2.]
35. The Current Web of Data - March 2009
[Figure: network diagram of the interlinked data sets in the Linked Data cloud, March 2009.]
M.A. Rodriguez. A Graph Analysis of the Linked Data Cloud, in review, http://arxiv.org/abs/0903.0194, 2009.
36. The Current Web of Data - March 2009
data set | domain | data set | domain | data set | domain
audioscrobbler | music | govtrack | government | pubguide | books
bbclatertotp | music | homologene | biology | qdos | social
bbcplaycountdata | music | ibm | computer | rae2001 | computer
bbcprogrammes | media | ieee | computer | rdfbookmashup | books
budapestbme | computer | interpro | biology | rdfohloh | social
chebi | biology | jamendo | music | resex | computer
crunchbase | business | laascnrs | computer | riese | government
dailymed | medical | libris | books | semanticweborg | computer
dblpberlin | computer | lingvoj | reference | semwebcentral | social
dblphannover | computer | linkedct | medical | siocsites | social
dblprkbexplorer | computer | linkedmdb | movie | surgeradio | music
dbpedia | general | magnatune | music | swconferencecorpus | computer
doapspace | social | musicbrainz | music | taxonomy | reference
drugbank | medical | myspacewrapper | social | umbel | general
eurecom | computer | opencalais | reference | uniref | biology
eurostat | government | opencyc | general | unists | biology
flickrexporter | images | openguides | reference | uscensusdata | government
flickrwrappr | images | pdb | biology | virtuososponger | reference
foafprofiles | social | pfam | biology | w3cwordnet | reference
freebase | general | pisa | computer | wikicompany | business
geneid | biology | prodom | biology | worldfactbook | government
geneontology | biology | projectgutenberg | books | yago | general
geonames | geographic | prosite | biology | ... |
37. Cultural Differences that are Leading to Web-Based
Data Management - Part 1
• Relational databases tend to not maintain public access points.
• Relational database users tend to not publish their schemas.
• Web of Data graph databases maintain public access points called
SPARQL end-points or Linked Data URLs.
• Web of Data graph database users tend to reuse and extend public
schemas called ontologies.
38. Cultural Differences that are Leading to Web-Based
Data Management - Part 2
[Figure: in the conventional model, Applications 1 to 3 at 127.0.0.1 to 127.0.0.3 each process the data structures held on their own host; in the Web of Data model, Applications 1 to 3 at 127.0.0.1 to 127.0.0.3 process a shared Web of Data whose structures are held at 127.0.0.4 to 127.0.0.6.]
40. SPARQLing a Data Provider - Local Computing
SELECT ?x WHERE { lanl:marko lanl:friend ?x }
[Figure: the client at 127.0.0.1 sends this SPARQL query to the end-point of the graph database at 127.0.0.2 and receives the result { lanl:fluffy }.]
• The 127.0.0.1 client is querying the 127.0.0.2 server.
• The query is any read-based SPARQL query.
• The results are the resources bound to the query variables.
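A client typically sends such a query to an end-point as an HTTP GET with the query text URL-encoded in a `query` parameter, following the SPARQL Protocol. The sketch below only builds the request URL and performs no network access; the end-point address `http://127.0.0.2/sparql` and the helper name are assumptions for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sparql_get_url(endpoint, query):
    """Build the GET URL that carries a SPARQL query in its 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


QUERY = (
    "PREFIX lanl: <http://www.lanl.gov#> "
    "SELECT ?x WHERE { lanl:marko lanl:friend ?x }"
)

# Hypothetical end-point address on the 127.0.0.2 server.
url = sparql_get_url("http://127.0.0.2/sparql", QUERY)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```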
41. GETing Linked Data as RDF - Local Computing
[Figure: the client at 127.0.0.1 issues an HTTP GET for http://www.lanl.gov#marko and receives the subgraph (lanl:marko, lanl:friend, lanl:fluffy), (lanl:marko, lanl:wrote, vub:1010); it then issues an HTTP GET for http://www.vub.edu#1010 and receives the subgraph (vub:1010, lanl:cites, ieee:2020).]
42. Problem with the Current Web of Data Infrastructure
• The only interfaces are SPARQL end-points and HTTP GETs of RDF
subgraphs.
• For human-based document retrieval, this is fine. For machine-based
data processing, this does not scale.
M.A. Rodriguez. A Distributed Process Infrastructure for a Distributed Data Structure. Semantic Web and Information Systems
Bulletin, AIS Special Interest Group on Semantic Web and Information Systems, http://arxiv.org/abs/0807.3908, 2008.
43. Problem with the Current Web of Data Infrastructure
• We cannot rely on the “download and index” philosophy of the World
Wide Web.
∗ As of March 2009, the Web of Data maintains 4.5 billion triples.
• The Web of Data cannot rely on a single service provider.
∗ Too much data.
∗ Too many types of algorithms that can utilize this data.
∗ Too many clock cycles to locally process this data.
44. The Open Virtual Machine Farm
[Figure: two graph databases, at 127.0.0.1 and 127.0.0.2, joined by a lanl:friend edge; each host also runs a virtual machine farm, and code/machines migrate between the farms.]
• Distributed computing through code/machine migration between farms.
• Move the process to the data, not the data to the process.
M.A. Rodriguez. General Purpose Computing on a Semantic Network Substrate. in Emergent Web Intelligence, eds. R. Chbeir,
A. Hassanien, A. Abraham and Y. Badr, Springer-Verlag, http://arxiv.org/abs/0704.3395, 2009.
M.A. Rodriguez. The RDF Virtual Machine, in review, LA-UR-08-03925, 2009.
47. A Collection of Interlinked Graph Databases - Currently
[Figure: a network of interlinked graph databases hosted at 127.0.0.2 through 127.0.0.11.]
48. A Collection of Interlinked Graph Databases and Processors - Future
[Figure: the same network of hosts, 127.0.0.2 through 127.0.0.11, each now providing both a graph database and a processor.]
49. The Future of Web-Based Distributed Computing
• The HTTP GET approach to the Web of Data does not scale.
• The Neno/Fhat (or any general-purpose computing) environment is
unsafe.
• The Web of Data needs an open, safe, flexible, and easy-to-adopt
computing infrastructure.
50. What Type of Processing?
• Object-oriented programming: Web of Data as an object repository.
• Logic: Web of Data as a knowledge-base.
• Graph/network analysis: Web of Data as a multi-relational graph.
• The future computing environment should support at least these popular
processing models.
• We will focus on graph/network analysis for the remainder of this
presentation.
52. Introduction to Random Walkers
• Random walkers can be used in single-relational networks to calculate:
∗ stationary probability distribution: primary eigenvector calculation
∗ spreading activation: search by means of diffusion
• There is a continuous and a discrete form of the general random walk
method.
53. Random Walks in a Single-Relational Network
• Suppose a single-relational network $G$, where
$G = (V, E \subseteq (V \times V))$.
• Let’s represent that network as a row stochastic adjacency matrix
$A \in [0, 1]^{|V| \times |V|}$, where
$A_{i,j} = \begin{cases} \frac{1}{\Gamma(i)} & \text{if } (i, j) \in E \\ 0 & \text{otherwise.} \end{cases}$
Here $\Gamma(i)$ is taken to be the number of outgoing edges of vertex $i$, so that every row of $A$ sums to 1.
• Finally, assume an “energy vector” $\pi \in \mathbb{R}^{|V|}$.
54. Random Walks in a Single-Relational Network
$G$: vertices a, b, c, d with edges a → b, a → d, b → c, c → a, c → d, d → b.
$A = \begin{pmatrix} 0 & 0.5 & 0 & 0.5 \\ 0 & 0 & 1 & 0 \\ 0.5 & 0 & 0 & 0.5 \\ 0 & 1 & 0 & 0 \end{pmatrix}$ (rows and columns ordered a, b, c, d)
$\pi = (1, 0, 0, 0)$
• $\pi A$ can be interpreted as the continuous form of propagating random
walkers over $G$.
55. Stationary Probability Distribution in a
Single-Relational Network
$A = \begin{pmatrix} 0 & 0.5 & 0 & 0.5 \\ 0 & 0 & 1 & 0 \\ 0.5 & 0 & 0 & 0.5 \\ 0 & 1 & 0 & 0 \end{pmatrix}$ (rows and columns ordered a, b, c, d)
time  a     b     c     d
π1    1     0     0     0
π2    0     0.5   0     0.5
π3    0     0.5   0.5   0
π4    0.25  0     0.5   0.25
π5    0.25  0.38  0     0.38
π6    0     0.5   0.38  0.13
...
π∞    0.15  0.31  0.31  0.23
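The time series above can be reproduced by repeatedly multiplying the energy vector by $A$. This is a minimal pure-Python sketch; the function names `step` and `propagate` and the choice of 200 iterations as a stand-in for "infinity" are assumptions.

```python
# Row stochastic adjacency matrix of G (rows and columns ordered a, b, c, d).
A = [
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 0.0],
]


def step(pi, A):
    """One propagation step: the row vector pi multiplied by the matrix A."""
    n = len(A)
    return [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]


def propagate(pi, A, steps):
    """Apply `steps` propagation steps to pi."""
    for _ in range(steps):
        pi = step(pi, A)
    return pi


pi1 = [1.0, 0.0, 0.0, 0.0]          # all energy starts at vertex a
pi4 = propagate(pi1, A, 3)          # three steps after pi1
pi_inf = propagate(pi1, A, 200)     # many steps approximate the stationary vector
```

Solving $\pi = \pi A$ by hand for this $A$ gives $(2/13, 4/13, 4/13, 3/13)$, which rounds to the last row of the table.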
56. Stationary Probability Distribution in a
Single-Relational Network
• If $G$ is strongly connected and aperiodic, then there exists a $\pi$ such that
$\pi = \pi A$.
• This stationary $\pi^{\infty}$ is the primary eigenvector of $A$.
• PageRank computes the stationary $\pi$ by forcing $G$ (the Web citation
graph) to be strongly connected and aperiodic.
57. Spreading Activation in a Single-Relational Network
• Spreading activation can be thought of as a “local rank” algorithm, while
calculating the stationary probability provides you a “global rank”.
• With spreading activation, you iterate for only a certain number of
timesteps.
• Also, you record how much energy has flowed through each vertex.
• Let’s demonstrate using a single discrete walker...
58. Spreading Activation in a Single-Relational Network
• The walker moves from vertex to vertex, with each choice dependent on the
probability distribution of $A$.
• At every step, if the walker is at vertex $i$, then $\pi_i = \pi_i + 1$.
[Figure: on $G$, a single walker visits a at time 1, b at time 2, c at time 3, and a again at time 4.]
time  a  b  c  d
π1    1  0  0  0
π2    1  1  0  0
π3    1  1  1  0
π4    2  1  1  0
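A single discrete walker can be sketched with Python's `random` module. This is an illustration under assumptions: the function name `spread`, the seed, and the returned visit list are not from the slides, and the exact path depends on the random choices made.

```python
import random

# Row stochastic adjacency matrix of G (rows and columns ordered a, b, c, d).
A = [
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 0.0],
]


def spread(start, A, steps, rng):
    """Walk for `steps` timesteps and count how often each vertex is visited."""
    counts = [0] * len(A)
    visited = []
    current = start
    for _ in range(steps):
        counts[current] += 1            # pi_i = pi_i + 1 at the current vertex
        visited.append(current)
        # Choose the next vertex according to row `current` of A.
        current = rng.choices(range(len(A)), weights=A[current])[0]
    return counts, visited


counts, visited = spread(0, A, 4, random.Random(42))
```

Stopping after a fixed number of timesteps is what makes this a "local rank" around the start vertex.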
59. Random Walks in a Multi-Relational Network
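• Suppose a multi-relational network $M$, where
$M = (V, E = \{E_1, E_2, \ldots, E_k \subseteq (V \times V)\})$.
• Represent it as a $\{0, 1\}$-adjacency tensor $A \in \{0, 1\}^{|V| \times |V| \times |E|}$, where
$A^m_{i,j} = \begin{cases} 1 & \text{if } (i, j) \in E_m : 1 \leq m \leq k \\ 0 & \text{otherwise.} \end{cases}$
• Then assume an “energy vector” $\pi \in \mathbb{R}^{|V|}$.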
M.A. Rodriguez and J. Shinavier. Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms, in
review, http://arxiv.org/abs/0806.2274, 2009.
60. Random Walks in a Multi-Relational Network
[Figure: a multi-relational network $M$ over the vertices a, b, c, and d with the edge labels authored, cites, and contains; its adjacency tensor $A$ with one 4 × 4 slice per edge label; and the energy vector $\pi = (1, 0, 0, 0)$.]
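The adjacency tensor can be sketched as one 0/1 matrix per edge label. In this illustration the edges a authored b and b cites c follow the figure, while the contains edge (d, c) and the helper name `adjacency_tensor` are assumptions.

```python
VERTICES = ["a", "b", "c", "d"]

# One edge set per relation type; the "contains" edge is an assumed example.
EDGES = {
    "authored": {("a", "b")},
    "cites": {("b", "c")},
    "contains": {("d", "c")},
}


def adjacency_tensor(vertices, edges):
    """Return {label: |V| x |V| 0/1 matrix}, i.e. one tensor slice per label."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    tensor = {}
    for label, pairs in edges.items():
        slice_ = [[0] * n for _ in range(n)]
        for (i, j) in pairs:
            slice_[index[i]][index[j]] = 1
        tensor[label] = slice_
    return tensor


A = adjacency_tensor(VERTICES, EDGES)
```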
61. The Operations of the Multi-Relational Path Algebra
• $A \cdot B$: ordinary matrix multiplication determines the number of $(A, B)$-paths between vertices.
• $A^\top$: matrix transpose inverts path directionality.
• $A \circ B$: Hadamard, entry-wise multiplication applies a filter to selectively exclude paths.
• $n(A)$: not generates the complement of a $\{0, 1\}^{n \times n}$ matrix.
• $c(A)$: clip generates a $\{0, 1\}^{n \times n}$ matrix from a $\mathbb{R}_+^{n \times n}$ matrix.
• $v^{\pm}(A)$: vertex generates a $\{0, 1\}^{n \times n}$ matrix from a $\mathbb{R}_+^{n \times n}$ matrix, where only certain rows or columns contain non-zero values.
• $\lambda A$: scalar multiplication weights the entries of a matrix.
• $A + B$: matrix addition merges paths.
62. The Traverse Operation
• An interesting aspect of the single-relational adjacency matrix $A \in \{0, 1\}^{n \times n}$ is that when it is raised to the $k$th power, the entry $A^{(k)}_{i,j}$ is equal to the number of paths of length $k$ that connect vertex $i$ to vertex $j$.
• Given, by definition, that $A^{(1)}_{i,j}$ (i.e. $A_{i,j}$) represents the number of paths that go from $i$ to $j$ of length 1 (i.e. a single edge) and by the rules of ordinary matrix multiplication,
$A^{(k)}_{i,j} = \sum_{l \in V} A^{(k-1)}_{i,l} \cdot A_{l,j} : k \geq 2.$
Example on the graph a → b → c (rows and columns ordered a, b, c):
$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$
There is a path of length 2 from a to c.
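The path-counting property can be checked on the three-vertex example with a few lines of pure Python. The helper names `matmul` and `matpow` are assumptions for this sketch.

```python
def matmul(X, Y):
    """Ordinary matrix multiplication of two list-of-lists matrices."""
    return [
        [sum(X[i][l] * Y[l][j] for l in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def matpow(A, k):
    """Raise A to the k-th power (k >= 1) by repeated multiplication."""
    result = A
    for _ in range(k - 1):
        result = matmul(result, A)
    return result


# Adjacency matrix of the graph a -> b -> c (rows and columns ordered a, b, c).
A = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]

A2 = matpow(A, 2)   # entry [0][2] counts the paths of length 2 from a to c
A3 = matpow(A, 3)   # the graph has no path of length 3
```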
63. The Traverse Operation
$A^1$: authored, $A^2$: cites, $A^3$: contains
$Z = A^1 \cdot A^2 \cdot A^{1\top}$,
$Z_{i,j}$ defines the number of paths from vertex $i$ to vertex $j$ such that a path goes from author $i$ to one of the
articles he or she has authored, from that article to one of the articles it cites, and finally, from that cited
article to its author $j$. Semantically, $Z$ is an author-citation single-relational path matrix.
[Figure: lanl:marko lanl:authored vub:1010 ($A^1$), vub:1010 lanl:cites ieee:2020 ($A^2$), and vub:fheyligh lanl:authored ieee:2020 ($A^1$, traversed in reverse), which together yield lanl:marko lanl:author-citation vub:fheyligh ($Z$).]
* NOTE: All diagrams are with respect to a “source” vertex (the blue vertex) in order to preserve clarity. In reality, the operations
operate on all vertices in parallel.
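The author-citation matrix can be computed for a tiny network as a sketch. The four vertices follow the figure; reading the figure as "lanl:marko authored vub:1010" and "vub:fheyligh authored ieee:2020" is an assumption, as are the helper names.

```python
VERTICES = ["lanl:marko", "vub:fheyligh", "vub:1010", "ieee:2020"]
INDEX = {v: i for i, v in enumerate(VERTICES)}


def matrix(edges):
    """Build a 0/1 adjacency matrix over VERTICES from (source, target) pairs."""
    m = [[0] * len(VERTICES) for _ in VERTICES]
    for s, t in edges:
        m[INDEX[s]][INDEX[t]] = 1
    return m


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][l] * Y[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(X):
    return [list(row) for row in zip(*X)]


# A1: authored (author -> article), A2: cites (article -> article).
A1 = matrix([("lanl:marko", "vub:1010"), ("vub:fheyligh", "ieee:2020")])
A2 = matrix([("vub:1010", "ieee:2020")])

# Z = A1 . A2 . A1^T: author -> article -> cited article -> its author.
Z = matmul(matmul(A1, A2), transpose(A1))
```

The transpose on the last factor is what turns "author to article" into "article to author".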
65. The Filter Operation
• $A \circ \mathbf{1} = A$
• $A \circ \mathbf{0} = \mathbf{0}$
• $A \circ B = B \circ A$
• $A \circ (B + C) = (A \circ B) + (A \circ C)$
• $A^\top \circ B^\top = (A \circ B)^\top$.
66. The Not Filter
The not filter is useful for excluding a set of paths to or from a vertex.
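$n : \{0, 1\}^{n \times n} \to \{0, 1\}^{n \times n}$
with a function rule of
$n(A)_{i,j} = \begin{cases} 1 & \text{if } A_{i,j} = 0 \\ 0 & \text{otherwise.} \end{cases}$
$n \begin{pmatrix} 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$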
67. The Not Filter
If $A \in \{0, 1\}^{n \times n}$, then
• $n(n(A)) = A$
• $A \circ n(A) = \mathbf{0}$
• $n(A) \circ n(A) = n(A)$.
68. The Not Filter
$A^1$: authored, $A^2$: cites, $A^3$: contains
A coauthorship path matrix is
$Z = A^1 \cdot A^{1\top} \circ n(I)$
[Figure: lanl:marko lanl:authored acm:0505 ($A^1$) and lanl:jbollen lanl:authored acm:0505 ($A^1$, traversed in reverse) yield lanl:marko lanl:coauthor lanl:jbollen ($Z$); the $n(I)$ filter excludes the lanl:coauthor self-loop on lanl:marko.]
69. The Clip Filter
The general purpose of clip is to take a path matrix and “clip”, or
normalize, it to a {0, 1}n×n matrix.
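$c : \mathbb{R}_+^{n \times n} \to \{0, 1\}^{n \times n}$
$c(Z)_{i,j} = \begin{cases} 1 & \text{if } Z_{i,j} > 0 \\ 0 & \text{otherwise.} \end{cases}$
$c \begin{pmatrix} 24 & 1 & 0 & 0 & 0 \\ 0 & 72 & 0 & 4 & 0 \\ 23 & 0 & 0 & 0 & 0 \\ 0 & 0 & 15.3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 12 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$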
70. The Clip Filter
If $A, B \in \{0, 1\}^{n \times n}$ and $Y, Z \in \mathbb{R}_+^{n \times n}$, then
• $c(A) = A$
• $c(n(A)) = n(c(A)) = n(A)$
• $c(Y \circ Z) = c(Y) \circ c(Z)$
• $n(A \circ B) = c(n(A) + n(B))$
• $n(A + B) = n(A) \circ n(B)$
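The filter operations and several of the identities above can be checked numerically. This is a pure-Python sketch: the function names `hadamard`, `add`, `n`, and `c` mirror the slide notation, and the two test matrices are the ones shown in the not and clip examples.

```python
def hadamard(A, B):
    """Entry-wise (Hadamard) product of two equally sized matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def add(A, B):
    """Entry-wise matrix addition."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def n(A):
    """Not filter: 1 where the entry is 0, and 0 otherwise."""
    return [[1 if a == 0 else 0 for a in row] for row in A]


def c(Z):
    """Clip filter: 1 where the entry is greater than 0, and 0 otherwise."""
    return [[1 if z > 0 else 0 for z in row] for row in Z]


# Input matrices from the not-filter and clip-filter example slides.
A = [[0, 0, 1, 1, 1], [1, 0, 1, 0, 1], [0, 1, 1, 1, 1],
     [1, 1, 0, 1, 1], [1, 1, 1, 1, 0]]
Y = [[24, 1, 0, 0, 0], [0, 72, 0, 4, 0], [23, 0, 0, 0, 0],
     [0, 0, 15.3, 0, 0], [0, 0, 0, 0, 12]]

B = c(Y)                                  # a {0, 1} matrix to pair with A
ZERO = [[0] * 5 for _ in range(5)]
```

With these definitions, `n(hadamard(A, B))` and `c(add(n(A), n(B)))` give the same matrix, which is the matrix form of De Morgan's law.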
71. The Clip Filter
$A^1$: authored, $A^2$: cites, $A^3$: contains
Suppose we want to create an author citation path matrix that does not allow self citation or coauthor
citations.
$Z = \underbrace{A^1 \cdot A^2 \cdot A^{1\top}}_{\text{cites}} \circ \underbrace{n\left(c\left(A^1 \cdot A^{1\top} \circ n(I)\right)\right)}_{\text{no coauthors}} \circ \underbrace{n(I)}_{\text{no self}}$
[Figure: lanl:marko reaches odu:nelson through lanl:authored, lanl:cites (lanl:3030 to lanl:4040), and reversed lanl:authored edges, giving lanl:marko lanl:author-citation odu:nelson ($Z$); the path ending at the coauthor lanl:jbollen is excluded by $n(c(A^1 \cdot A^{1\top} \circ n(I)))$ and the path returning to lanl:marko is excluded by $n(I)$.]
72. The Clip Filter
$A^1$: authored, $A^2$: cites, $A^3$: contains
However, using various theorems of the path algebra and abstract algebra
in general,
$Z = \underbrace{A^1 \cdot A^2 \cdot A^{1\top}}_{\text{cites}} \circ \underbrace{n\left(c\left(A^1 \cdot A^{1\top} \circ n(I)\right)\right)}_{\text{no coauthors}} \circ \underbrace{n(I)}_{\text{no self}}$
becomes
$Z = A^1 \cdot A^2 \cdot A^{1\top} \circ n\left(c\left(A^1 \cdot A^{1\top}\right)\right) \circ n(I).$
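The full expression and a simplified form that drops the inner $n(I)$ can be compared on a small network. Everything in this sketch is an assumption for illustration: the six vertices, the authored and cites edges (including the extra article lanl:5050 that creates a self-citation), the helper names, and the reading of the simplified form as cites ∘ n(c(A1 · A1ᵀ)) ∘ n(I).

```python
V = ["marko", "jbollen", "nelson", "lanl:3030", "lanl:4040", "lanl:5050"]
IX = {v: i for i, v in enumerate(V)}
N = len(V)


def matrix(edges):
    m = [[0] * N for _ in range(N)]
    for s, t in edges:
        m[IX[s]][IX[t]] = 1
    return m


def matmul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(X):
    return [list(r) for r in zip(*X)]


def hadamard(X, Y):
    return [[a * b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def n(X):
    return [[1 if a == 0 else 0 for a in r] for r in X]


def c(X):
    return [[1 if a > 0 else 0 for a in r] for r in X]


I = [[1 if i == j else 0 for j in range(N)] for i in range(N)]

# Hypothetical data: marko and jbollen coauthored lanl:3030, jbollen and nelson
# coauthored lanl:4040, and marko's lanl:5050 cites his own lanl:3030.
A1 = matrix([("marko", "lanl:3030"), ("jbollen", "lanl:3030"),
             ("jbollen", "lanl:4040"), ("nelson", "lanl:4040"),
             ("marko", "lanl:5050")])
A2 = matrix([("lanl:3030", "lanl:4040"), ("lanl:5050", "lanl:3030")])

cites = matmul(matmul(A1, A2), transpose(A1))
coauthor = matmul(A1, transpose(A1))

# Full form: cites o n(c(coauthor o n(I))) o n(I)
Z_full = hadamard(hadamard(cites, n(c(hadamard(coauthor, n(I))))), n(I))
# Simplified form: cites o n(c(coauthor)) o n(I)
Z_simple = hadamard(hadamard(cites, n(c(coauthor))), n(I))
```

On this data both forms keep only the marko to nelson citation path; the coauthor and self-citation paths are filtered out.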
73. Other Filters and Operations...
• Please refer to the article for more information on these filters and
operations.
74. Problems with the Path Algebra
• As a matrix algebra, it is computationally infeasible to compute matrix
operations over the entire Web of Data.
• However, it is possible to approximate these calculations using “random”
walkers.
75. Mapping Paths to Grammar-Based Random Walkers
• A grammar-based random walker is a walker that obeys a path
description.
• Able to compute “semantically rich” spreading activation and stationary
probability distributions in a multi-relational network.
• Able to approximate through the convergence properties of these
operations.
• Provides a convenient application to the Web of Data and linked graph
databases.
M.A. Rodriguez. Grammar-Based Random Walkers in Semantic Networks. Knowledge-Based Systems, 21(7), 727–739, 2008.
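The idea of a walker that obeys a path description can be sketched for the coauthorship path (authored, then authored in reverse, excluding the source itself). This is a simplified illustration and not the algorithm of the cited article: the data, the function names, the seed, and the rejection of walks that end at the source are all assumptions.

```python
import random

# Hypothetical authored edges: author -> list of articles.
AUTHORED = {
    "lanl:marko": ["acm:0505"],
    "lanl:jbollen": ["acm:0505"],
}


def invert(authored):
    """Build the reverse index: article -> list of its authors."""
    authors_of = {}
    for author, articles in authored.items():
        for article in articles:
            authors_of.setdefault(article, []).append(author)
    return authors_of


def coauthor_walk(source, authored, authors_of, rng):
    """Follow authored, then authored reversed; reject walks ending at source."""
    article = rng.choice(authored[source])       # step along A1
    end = rng.choice(authors_of[article])        # step along A1 transposed
    return None if end == source else end        # the n(I) filter


def sample_coauthors(source, authored, walks, rng):
    """Count where `walks` independent grammar-obeying walkers end up."""
    authors_of = invert(authored)
    counts = {}
    for _ in range(walks):
        end = coauthor_walk(source, authored, authors_of, rng)
        if end is not None:
            counts[end] = counts.get(end, 0) + 1
    return counts


counts = sample_coauthors("lanl:marko", AUTHORED, 1000, random.Random(7))
```

Each walker only ever touches the edges along its own path, so no global matrix needs to be built.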
76. A Grammar Walker
[Figure: a grammar walker carrying the path description $A^1 \cdot A^{1\top} \circ n(I)$ moves across the Web of Data structures hosted at 127.0.0.4, 127.0.0.5, and 127.0.0.6 at timesteps t = 1, t = 2, and t = 3.]
77. Grammar Walking the Web of Data
[Figure: a grammar walker dispatched from 127.0.0.1 traverses the network of hosts 127.0.0.2 through 127.0.0.11 in steps numbered 1 to 7.]
78. Conclusion
• Graph databases will increasingly support the Web of Data.
• The Web of Data is about open, global-scale data management.
• Distributed computing is required for global-scale data processing.
• Grammar walkers can be used for distributed network analysis on the
Web of Data.
79. Thank You For Your Time
My homepage: http://markorodriguez.com
Neno/Fhat: http://neno.lanl.gov
Collective Decision Making Systems: http://cdms.lanl.gov
Faith in the Algorithm: http://faithinthealgorithm.net
MESUR: http://www.mesur.org