Multi-Relational Graph Structures: From Algebra to Application
Multi-Relational Graph Structures: From Algebra to Application Presentation Transcript

  • 1. Multi-Relational Graph Structures: From Algebra to Application. Marko A. Rodriguez, T-5, Center for Nonlinear Studies, Los Alamos National Laboratory, http://markorodriguez.com, October 27, 2009
  • 2. Abstract In a single-relational graph, all edges share the same meaning. In contrast, a multi-relational graph represents a heterogeneous set of edges, where each edge is labeled to denote the type of relationship that exists between the two vertices it connects. While less prevalent than the single-relational graph, the multi-relational graph structure is beginning to see widespread adoption in both academia and industry. An algebra for manipulating multi-relational graph structures and the realization of this algebra in various application scenarios is presented in this talk. MIT Lincoln Laboratory Lecture – Lexington, Massachusetts – October 27, 2009
  • 3. My Computer Eco-System • Articles/Lectures: LaTeX, OmniGraffle, LaTeXiT • Software Development: Java, R Statistics • Large-Scale Data Management: MySQL, Neo4j, Linked Process • Graph/Network Analysis: iGraph, rPath, Confluence, JUNG • Web of Data/Semantic Web: Open Sesame (SAIL), Protégé • 3D Modeling/Programming: Java Monkey Engine, Blender, Gimp • Audio Synthesis/Processing: Max/MSP, ProTools
  • 4. Outline • Introduction to Graph Structures: The Single-Relational Graph; The Multi-Relational Graph • A Multi-Relational Path Algebra • Application to Recommender Systems
  • 5. Outline • Introduction to Graph Structures: The Single-Relational Graph; The Multi-Relational Graph • A Multi-Relational Path Algebra • Application to Recommender Systems
  • 6. A Single-Relational Graph Example [Figure: a citation graph over Articles A, B, C, D, E, and F.] An article citation graph. Each vertex is an article and each edge denotes that the tail article cites the head article.
  • 7. Single-Relational Graph Notation • Homogeneous set of vertex and edge types.¹ • There are undirected and directed forms, where $V$ is the set of vertices and $E$ is a set of unordered or ordered vertex pairs, respectively: $G = (V, E \subseteq \{V \times V\})$ and $G = (V, E \subseteq (V \times V))$. (We will focus on directed graphs in this talk.) • There is an adjacency matrix representation $\mathbf{A} \in \{0,1\}^{n \times n}$, where $n = |V|$ and $A_{i,j} = 1$ if $(i,j) \in E$ and $A_{i,j} = 0$ otherwise. ¹Unless the graph is bipartite.
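As a minimal sketch of the adjacency-matrix definition above, assuming vertices are given as a list and directed edges as (tail, head) pairs, the following Python builds $\mathbf{A}$ with plain nested lists. The article names and citation edges are hypothetical and only illustrate the indexing convention (row = tail, column = head).

```python
from typing import Hashable, List, Sequence, Tuple


def adjacency_matrix(vertices: Sequence[Hashable],
                     edges: Sequence[Tuple[Hashable, Hashable]]) -> List[List[int]]:
    """Return the n x n {0,1} adjacency matrix of a directed graph.

    A[i][j] == 1 iff (vertices[i], vertices[j]) is an edge (tail -> head).
    """
    index = {v: k for k, v in enumerate(vertices)}
    n = len(vertices)
    A = [[0] * n for _ in range(n)]
    for tail, head in edges:
        A[index[tail]][index[head]] = 1
    return A


# Hypothetical citation edges: (citing article, cited article).
articles = ["A", "B", "C"]
cites = [("A", "B"), ("B", "C")]
A = adjacency_matrix(articles, cites)
print(A)  # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
```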
  • 8. The Use of Single-Relational Graphs in Research • The most common graph structure used in research in the 1990s and 2000s: scholarly graphs (citations, coauthorship relationships, article/journal usage, acknowledgements, funding sources); technological graphs (software dependencies, Internet architecture, web citations); communication graphs (email correspondence, cell phone calls, micro-blog “following”). • Numerous algorithms have been developed for analyzing such structures: geodesic (radius, diameter, eccentricity, closeness, betweenness); spectral (eigenvector centrality, PageRank, spreading activation); community detection (walktrap, edge betweenness, leading eigenvector, spin-glass).
  • 9. My Work with Single-Relational Graphs • Articles of mine that make use of the single-relational graph structure. Bollen, J., Van de Sompel, H., Hagberg, A., Bettencourt, L.M.A, Chute, R., Rodriguez, M.A., Balakireva, L.L., “Clickstream Data Yields High-Resolution Maps of Science,” PLoS One, 4(3), e4803, 2009. Bollen, J., Van de Sompel, H., Rodriguez, M.A., “Towards Usage-Based Impact Metrics: First Results from the MESUR Project,” Joint Conference on Digital Libraries (JCDL), 2008. Rodriguez, M.A., Pepe, A., “On the Relationship Between the Structural and Socioacademic Communities of a Coauthorship Network,” Journal of Informetrics, 2(3), pp. 195–201, 2008. Rodriguez, M.A., Bollen, J., “An Algorithm to Determine Peer-Reviewers,” Conference on Information and Knowledge Management (CIKM), pp. 319–328, 2008. Rodriguez, M.A., Bollen, J., Van de Sompel, H., “Mapping the Bid Behavior of Conference Referees,” Journal of Informetrics, 1(1), pp. 62–82, 2007. Bollen, J., Rodriguez, M.A., Van de Sompel, H., “Journal Status,” Scientometrics, 69(3), pp. 669–687, 2006. Rodriguez, M.A., Bollen, J., Van de Sompel, H., “The Convergence of Digital Libraries and the Peer-Review Process,” Journal of Information Science, 32(2), pp. 149–159, 2006. Rodriguez, M.A., Steinbock, D.J., “A Social Network for Societal-Scale Decision-Making Systems,” Proceedings of the North American Association for Computational Social and Organizational Science Conference, 2004. • They focus on supporting/analyzing/ranking/visualizing the scholarly community and large-scale decision support systems (i.e. governance systems).
  • 10. Studying the Reading Behavior of Scholars Bollen, J., Van de Sompel, H., Hagberg, A., Bettencourt, L.M.A, Chute, R., Rodriguez, M.A., Balakireva, L.L., “Clickstream Data Yields High-Resolution Maps of Science,” PLoS One, 4(3), e4803, 2009.
  • 11. Studying Characteristics that Lead to Coauthorship Rodriguez, M.A., Pepe, A., “On the Relationship Between the Structural and Socioacademic Communities of a Coauthorship Network,” Journal of Informetrics, 2(3), pp. 195–201, 2008.
  • 12. Predicting Referees Based on Coauthorship Patterns [Figure: a network of researchers, each vertex labeled with a surname.] Rodriguez, M.A., Bollen, J., Van de Sompel, H., “Mapping the Bid Behavior of Conference Referees,” Journal of Informetrics, 1(1), pp. 62–82, 2007.
  • 13. A Multi-Relational Graph Example [Figure: Articles B, C, D, and F and Persons A and E, connected by edges labeled cites, acknowledges, authored, and peer-reviewed.] A scholarly graph. Each vertex is a scholarly artifact and each edge denotes the type of directed relationship that exists between the two scholarly artifacts it connects.
  • 14. Multi-Relational Graph Notation • Heterogeneous set of vertex types and a heterogeneous set of edge types. • This data structure is becoming more prevalent due to both the Semantic Web/Web of Data movement and the graph database movement. • $G = (V, E = \{E_1, E_2, \ldots, E_m \subseteq (V \times V)\})$, where $E$ is a family of $m$ typed edge sets. For example, $E_1$ is the set of “authored” edges, $E_2$ is the set of “cites” edges, etc. • There is a three-way tensor representation $\mathbf{A} \in \{0,1\}^{n \times n \times m}$, where $\mathbf{A}^k_{i,j} = 1$ if $(i,j) \in E_k$ (for $1 \le k \le m$) and $\mathbf{A}^k_{i,j} = 0$ otherwise; each slice $\mathbf{A}^k$ is the adjacency matrix of $E_k$.
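The tensor definition above can be sketched in Python under two assumptions: edges arrive as hypothetical (tail, label, head) triples, and the tensor is stored slice-first, so `A[k]` is the adjacency matrix of the k-th label. Because Python indexes from 0, `A[0]` plays the role of $\mathbf{A}^1$.

```python
from typing import Hashable, List, Sequence, Tuple

Triple = Tuple[Hashable, str, Hashable]  # (tail, edge label, head)


def adjacency_tensor(vertices: Sequence[Hashable],
                     labels: Sequence[str],
                     triples: Sequence[Triple]) -> List[List[List[int]]]:
    """Return the tensor as a list of m adjacency matrices, one per label.

    A[k][i][j] == 1 iff (vertices[i], labels[k], vertices[j]) is a triple.
    """
    index = {v: i for i, v in enumerate(vertices)}
    layer = {label: k for k, label in enumerate(labels)}
    n = len(vertices)
    A = [[[0] * n for _ in range(n)] for _ in labels]
    for tail, label, head in triples:
        A[layer[label]][index[tail]][index[head]] = 1
    return A


# Hypothetical scholarly data in the spirit of the example graph.
vertices = ["person_a", "article_b", "article_c"]
labels = ["authored", "cites"]
triples = [("person_a", "authored", "article_b"),
           ("article_b", "cites", "article_c")]
A = adjacency_tensor(vertices, labels, triples)
print(A[0])  # "authored" slice: [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
print(A[1])  # "cites" slice:    [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
```

Storing one dense matrix per label keeps each slice usable on its own by single-relational operations; a sparse representation would be preferable for large graphs.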
  • 15. A Three-Way Tensor Representation of a Multi-Relational Graph [Figure: the tensor $\mathbf{A} \in \{0,1\}^{n \times n \times m}$ drawn as a stack of $|E| = m$ binary adjacency matrices, each of size $|V| \times |V| = n \times n$, one per edge label (e.g., authored, cites, ...).]
  • 16. My Work with Multi-Relational Graphs • Articles of mine that make use of the multi-relational graph structure. Rodriguez, M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms,” Journal of Informetrics, in press, 2009. [Presented in the second part of this presentation.] Rodriguez, M.A., Geldart, J., “An Evidential Path Logic for Multi-Relational Networks,” Proceedings of the Association for the Advancement of Artificial Intelligence Spring Symposium: Technosocial Predictive Analytics Symposium, volume SS-09-09, pp. 114–119, 2009. Rodriguez, M.A., Bollen, J., Van de Sompel, H., “Automatic Metadata Generation using Associative Networks,” ACM Transactions on Information Systems, 27(2), pp. 1–20, 2009. Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks,” Knowledge-Based Systems, 21(7), pp. 727–739, 2008. [Presented in the third part of this presentation.] Rodriguez, M.A., “Social Decision Making with Multi-Relational Networks and Grammar-Based Particle Swarms,” Hawaii International Conference on Systems Science (HICSS), pp. 39–49, 2007. Bollen, J., Rodriguez, M.A., Van de Sompel, H., Balakireva, L.L., Hagberg, A., “The Largest Scholarly Semantic Network...Ever.,” ACM World Wide Web Conference, 2007. Rodriguez, M.A., “A Multi-Relational Network to Support the Scholarly Communication Process,” International Journal of Public Information Systems, 2007(1), pp. 13–29, 2007. • They focus on multi-relational graph algorithms, logic, information retrieval, decision support systems, bibliometrics, recommender systems.
  • 17. Resource Description Framework Graph [Figure: the scholarly graph of slide 13 with every vertex and edge label written as a URI in the lanl: namespace, e.g., lanl:article_b, lanl:person_a, lanl:cites, lanl:authored, lanl:peer_reviewed, lanl:acknowledges.] lanl: → http://lanl.gov# A scholarly graph. Each vertex and edge type is identified by a Uniform Resource Identifier and is thus encoded in the address space of the World Wide Web.
  • 18. Resource Description Framework Graph • Vertices and edge labels are identified by Uniform Resource Identifiers (URIs). Thus, there is a single address space in which the world’s data can be interrelated. • $G \subseteq (U \cup B) \times U \times (U \cup B \cup L)$, where $U$ is the set of all URIs, $B$ is the set of all blank nodes, and $L$ is the set of all literals. • There exist various implementations of this standard model: Open Sesame (http://openrdf.org/), AllegroGraph (http://www.franz.com/agraph/allegrograph/), OWLim (http://www.ontotext.com/owlim/), and Jena (http://jena.sourceforge.net/).
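To make the set $(U \cup B) \times U \times (U \cup B \cup L)$ concrete, here is a small Python sketch that models the three kinds of term as hypothetical classes (`URI`, `BlankNode`, `Literal`; these are not the API of any RDF store listed above) and checks whether a triple respects the allowed positions. The `lanl:` namespace comes from the previous slide; the specific triples are illustrative.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


Term = Union[URI, BlankNode, Literal]


def is_rdf_triple(s: Term, p: Term, o: Term) -> bool:
    """True iff (s, p, o) lies in (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (isinstance(s, (URI, BlankNode))
            and isinstance(p, URI)
            and isinstance(o, (URI, BlankNode, Literal)))


LANL = "http://lanl.gov#"  # the lanl: prefix
t = (URI(LANL + "article_b"), URI(LANL + "cites"), URI(LANL + "article_c"))
print(is_rdf_triple(*t))                                      # True
print(is_rdf_triple(Literal("x"), URI(LANL + "cites"), t[2]))  # False: literal subject
```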
  • 19. Linked Data and the Web of Data [Figure: the resource http://dbpedia.org/resource/Albert_Einstein (dbpedia:Albert_Einstein), linked by dbpedia:doctoralAdvisor to dbpedia:Alfred_Kleiner, by dbpedia:citizenship to dbpedia:United_States, and by dbpprop:hasPhotoCollection to flickr:Albert_Einstein (http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Albert_Einstein), which is linked by foaf:depiction to the images http://farm1.static.flickr.com/60/170621225_661c705eb4_m.jpg and http://farm4.static.flickr.com/3408/3547607847_65abfd03a5_m.jpg.]
  • 20. My Work with Resource Description Framework Graphs • Articles of mine that make use of RDF/Web of Data/Semantic Web. Rodriguez, M.A., “Interpretations of the Web of Data,” Data Management in the Semantic Web, eds. H. Jin and Z. Lv, Nova, in press, 2009. Rodriguez, M.A., “A Reflection on the Structure and Process of the Web of Data,” Bulletin of the American Society for Information Science and Technology, 35(6), pp. 38–43, 2009. Rodriguez, M.A., “A Graph Analysis of the Linked Data Cloud,” http://arxiv.org/abs/0903.0194, February 2009. Rodriguez, M.A., Allen, D.W., Shinavier, J., Ebersole, G., “A Recommender System to Support the Scholarly Communication Process,” KRS-2009-02, 2009. [Presented in the third part of this presentation.] Rodriguez, M.A., Watkins, J., “Faith in the Algorithm, Part 2: Computational Eudaemonics,” Lecture Notes in Artificial Intelligence, eds. Velásquez, J.D., Howlett, R.J., and Jain, L.C., volume 5712, pp. 813–820, 2009. Rodriguez, M.A., “General-Purpose Computing on a Semantic Network Substrate,” Emergent Web Intelligence, Advanced Information and Knowledge Processing series, eds. R. Chbeir, A. Hassanien, A. Abraham, and Y. Badr, in press, 2008. Rodriguez, M.A., Pepe, A., Shinavier, J., “The Dilated Triple,” Emergent Web Intelligence, Advanced Information and Knowledge Processing series, eds. R. Chbeir, A. Hassanien, A. Abraham, and Y. Badr, in press, 2008. • They focus on graph algorithms, distributed computing, graph-based computing, recommender systems.
  • 21. The Web of Data as of March 2009 [Figure: the Linked Data cloud as of March 2009, drawn as a graph of interlinked data sets such as dbpedia, geonames, musicbrainz, freebase, uniprot, pubmed, and dblpberlin.] Rodriguez, M.A., “A Graph Analysis of the Linked Data Cloud,” http://arxiv.org/abs/0903.0194, February 2009.
  • 22. The Web of Data as of March 2009 Data set (domain): audioscrobbler (music), bbclatertotp (music), bbcplaycountdata (music), bbcprogrammes (media), budapestbme (computer), chebi (biology), crunchbase (business), dailymed (medical), dblpberlin (computer), dblphannover (computer), dblprkbexplorer (computer), dbpedia (general), doapspace (social), drugbank (medical), eurecom (computer), eurostat (government), flickrexporter (images), flickrwrappr (images), foafprofiles (social), freebase (general), geneid (biology), geneontology (biology), geonames (geographic), govtrack (government), homologene (biology), ibm (computer), ieee (computer), interpro (biology), jamendo (music), laascnrs (computer), libris (books), lingvoj (reference), linkedct (medical), linkedmdb (movie), magnatune (music), musicbrainz (music), myspacewrapper (social), opencalais (reference), opencyc (general), openguides (reference), pdb (biology), pfam (biology), pisa (computer), prodom (biology), projectgutenberg (books), prosite (biology), pubguide (books), qdos (social), rae2001 (computer), rdfbookmashup (books), rdfohloh (social), resex (computer), riese (government), semanticweborg (computer), semwebcentral (social), siocsites (social), surgeradio (music), swconferencecorpus (computer), taxonomy (reference), umbel (general), uniref (biology), unists (biology), uscensusdata (government), virtuososponger (reference), w3cwordnet (reference), wikicompany (business), worldfactbook (government), yago (general), ...
  • 23. Application Development on the Web of Data [Figure: (a) Applications 1, 2, and 3 each run their own processes over their own structures on separate hosts (127.0.0.1, 127.0.0.2, 127.0.0.3); (b) the same applications' processes operate over structures that are interlinked as a single Web of Data.] a.) standard model; b.) Web of Data model: public data changes the development paradigm.
  • 24. A Key/Value Graph Example [Figure: the scholarly graph with key/value pairs on every element. Vertices: A (type = person, name = Marko, age = 29); E (type = person, name = Johan, age = 37); articles C (name = "Network...", created = 2/1/08), F (name = "A Distributed...", created = 12/1/07), B (name = "Algori...", created = 1/1/09), and D (name = "Linked...", created = 1/30/09). Edges carry a type (cites, acknowledges, authored, peer-reviewed) and a weight (1.0 for most edges, 0.5 on one authored edge, and -1.0 on the peer-reviewed edge).] A scholarly graph. Both vertices and edges maintain a key/value map that allows metadata to be attached to them.
  • 25. Key/Value Graph • $G = (V, E \subseteq (V \times V), \lambda : (V \cup E) \times \Omega \to \Sigma)$, where $\Omega$ is the set of keys and $\Sigma$ is the set of values. • This structure has a convenient representation in object-oriented programming languages and is used by various standards and graph packages: GraphML (http://graphml.graphdrawing.org/), Neo4j (http://neo4j.org), NetworkX (http://networkx.lanl.gov), Confluence (http://markorodriguez.com/docs/conf/api/), and iGraph (http://igraph.sourceforge.net/).
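The key/value graph definition can be sketched in Python with a dictionary playing the role of $\lambda$, keyed by (element, key) pairs. The class name `KeyValueGraph` and its methods are hypothetical and do not mirror the API of Neo4j, NetworkX, or the other packages listed. The vertex properties come from the key/value example on the previous slide; pairing Person A with Article B through an authored edge is assumed for illustration.

```python
from typing import Any, Dict, Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


class KeyValueGraph:
    """Sketch of G = (V, E ⊆ V × V, λ) with λ stored as a dict."""

    def __init__(self) -> None:
        self.vertices: Set[Hashable] = set()
        self.edges: Set[Edge] = set()
        # λ : (V ∪ E) × Ω → Σ, keyed by (element, key).
        self.props: Dict[Tuple[Any, str], Any] = {}

    def add_vertex(self, v: Hashable, **props: Any) -> Hashable:
        self.vertices.add(v)
        for key, value in props.items():
            self.props[(v, key)] = value
        return v

    def add_edge(self, tail: Hashable, head: Hashable, **props: Any) -> Edge:
        self.vertices.update((tail, head))
        edge = (tail, head)
        self.edges.add(edge)
        for key, value in props.items():
            self.props[(edge, key)] = value
        return edge

    def get(self, element: Any, key: str, default: Any = None) -> Any:
        return self.props.get((element, key), default)


g = KeyValueGraph()
g.add_vertex("A", type="person", name="Marko", age=29)
g.add_vertex("B", type="article", created="1/1/09")
e = g.add_edge("A", "B", type="authored", weight=1.0)  # assumed pairing
print(g.get("A", "name"), g.get(e, "type"), g.get(e, "weight"))  # Marko authored 1.0
```

Because $E \subseteq V \times V$ is a set of ordered pairs, this sketch stores at most one edge per (tail, head) pair; graph databases such as Neo4j instead give each edge its own identity so that parallel edges can carry different key/value maps.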
  • 26. Outline • Introduction to Graph Structures: The Single-Relational Graph; The Multi-Relational Graph • A Multi-Relational Path Algebra • Application to Recommender Systems
  • 27. Problem Statement • There is a need to port all the known single-relational graph analysis algorithms over to the multi-relational domain. Why?: There is a large body of algorithms in the domain of single-relational graph analysis. Why?: Multi-relational graph structures are becoming more prevalent and can be used to model more complex structures. • The set of single-relational graph analysis algorithms should not be “blindly” applied to multi-relational graphs. Why?: For example, ⟨marko, knows, johan⟩ says more about social communication than ⟨marko, livesInSameCityAs, bob⟩. Why?: Multi-relational graph analysis algorithms must respect the meaning of the edges.
  • 28. Solution Statement • Provide an algebra to map a multi-relational graph to a “semantically-rich” single-relational graph that can be subjected to all the known single-relational graph analysis algorithms. Rodriguez, M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms,” Journal of Informetrics, ISSN:1751-1577, Elsevier, doi:10.1016/j.joi.2009.06.004, http://arxiv.org/abs/0806.2274, LA-UR-08-03931, in press, 2009.
  • 29. A Three-Way Tensor Representation of a Multi-Relational Graph As stated previously, a three-way tensor can be used to represent a multi-relational graph. If $G = (V, E = \{E_1, E_2, \ldots, E_m \subseteq (V \times V)\})$ is a multi-relational graph, then $\mathbf{A} \in \{0,1\}^{n \times n \times m}$ with $\mathbf{A}^k_{i,j} = 1$ if $(i,j) \in E_k$ (for $1 \le k \le m$) and $\mathbf{A}^k_{i,j} = 0$ otherwise. $\mathbf{A}$ is the three-way tensor representation of the multi-relational graph.
  • 30. The General Purpose of the Path Algebra • Map a multi-relational tensor $\mathbf{A} \in \{0,1\}^{n \times n \times m}$ to a single-relational path matrix $\mathbf{Z} \in \mathbb{R}_+^{n \times n}$; this path matrix is a weighted single-relational graph. [Figure: a binary tensor $\mathbf{A}$ mapped to a path matrix $\mathbf{Z}$ with weighted entries (e.g., 24, 72, 23, 15.3, 12), shown alongside the equivalent weighted graph.] • The created single-relational graph’s edges are loaded with meaning. For example, given the right tensor, it is possible to create a coauthorship graph for scholars from the same university who are not on the same project, but share a graduate student. • The theorems of the algebra can be used to rewrite an operation into an equivalent, more efficient form.
  • 31. The Elements of the Path Algebra • $\mathbf{A} \in \{0,1\}^{n \times n \times m}$: a three-way tensor representation of a multi-relational graph. • $\mathbf{Z} \in \mathbb{R}_+^{n \times n}$: a path matrix derived by means of operations applied to $\mathbf{A}$. • $\mathbf{C}_j \in \{0,1\}^{n \times n}$: a “to” path filter. • $\mathbf{R}_i \in \{0,1\}^{n \times n}$: a “from” path filter. • $\mathbf{E}_{i,j} \in \{0,1\}^{n \times n}$: an entry path filter. • $\mathbf{I} \in \{0,1\}^{n \times n}$: the identity matrix as a self-loop filter. • $\mathbf{1} \in \{1\}^{n \times n}$: a matrix in which all entries are equal to 1. • $\mathbf{0} \in \{0\}^{n \times n}$: a matrix in which all entries are equal to 0.
  • 32. The Operations of the Path Algebra • $\mathbf{A} \cdot \mathbf{B}$: ordinary matrix multiplication determines the number of $(\mathbf{A}, \mathbf{B})$-paths between vertices. • $\mathbf{A}^\top$: matrix transpose inverts path directionality. • $\mathbf{A} \circ \mathbf{B}$: the Hadamard (entry-wise) product applies a filter to selectively exclude paths. • $n(\mathbf{A})$: not generates the complement of a $\{0,1\}^{n \times n}$ matrix. • $c(\mathbf{A})$: clip generates a $\{0,1\}^{n \times n}$ matrix from an $\mathbb{R}_+^{n \times n}$ matrix. • $v^\pm(\mathbf{A})$: vertex generates a $\{0,1\}^{n \times n}$ matrix from an $\mathbb{R}_+^{n \times n}$ matrix, where only certain rows or columns contain non-zero values. • $\lambda \mathbf{A}$: scalar multiplication weights the entries of a matrix. • $\mathbf{A} + \mathbf{B}$: matrix addition merges paths.
  • 33. Example Scholarly Tensor Used in the Remainder of the Presentation [Figure: the tensor drawn as five slices, $\mathbf{A}^1$: authored, $\mathbf{A}^2$: cites, $\mathbf{A}^3$: contains, $\mathbf{A}^4$: category, $\mathbf{A}^5$: developed.] • $\mathbf{A}^1$ authored: human → article • $\mathbf{A}^2$ cites: article → article • $\mathbf{A}^3$ contains: journal → article • $\mathbf{A}^4$ category: journal → subject category • $\mathbf{A}^5$ developed: human → program/software.
  • 34. The Traverse Operation • An interesting aspect of the single-relational adjacency matrix $\mathbf{A} \in \{0,1\}^{n \times n}$ is that when it is raised to the $k$th power, the entry $A^{(k)}_{i,j}$ is equal to the number of paths of length $k$ that connect vertex $i$ to vertex $j$. • Given, by definition, that $A_{i,j}$ (i.e., $A^{(1)}_{i,j}$) represents the number of paths of length 1 (i.e., a single edge) that go from $i$ to $j$, and by the rules of ordinary matrix multiplication, $A^{(k)}_{i,j} = \sum_{l \in V} A^{(k-1)}_{i,l} \cdot A_{l,j}$ for $k \ge 2$. Example (rows and columns ordered $a, b, c$): $\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \cdot \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$, so there is a path of length 2 from $a$ to $c$.
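The recurrence above can be checked with a short pure-Python sketch (no numerical library assumed) that computes $\mathbf{A}^{(k)}$ by repeated multiplication. The first matrix is the slide's $a \to b \to c$ example; the four-vertex "diamond" is a hypothetical addition showing that entries count paths rather than merely flag them.

```python
from typing import List

Matrix = List[List[int]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Ordinary matrix product: (X·Y)[i][j] = sum over l of X[i][l] * Y[l][j]."""
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def matpow(A: Matrix, k: int) -> Matrix:
    """A^(k) for k >= 1, built by the recurrence A^(k) = A^(k-1) · A."""
    result = A
    for _ in range(k - 1):
        result = matmul(result, A)
    return result


# The slide's example: a -> b -> c.
A = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
print(matpow(A, 2))  # [[0, 0, 1], [0, 0, 0], [0, 0, 0]]

# Hypothetical diamond: a -> b -> d and a -> c -> d give two length-2 paths a -> d.
D = [[0, 1, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
print(matpow(D, 2)[0][3])  # 2
```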
  • 35. The Traverse Operation $\mathbf{Z} = \mathbf{A}^1 \cdot \mathbf{A}^2 \cdot {\mathbf{A}^1}^\top$, where $Z_{i,j}$ is the number of paths from vertex $i$ to vertex $j$ such that a path goes from author $i$ to one of the articles he or she has authored, from that article to one of the articles it cites, and finally, from that cited article to its author $j$. Semantically, $\mathbf{Z}$ is an author-citation single-relational path matrix. [Figure: Human A authored Article B, which cites Article C, which Human D authored; the composed path yields an author-citation edge in $\mathbf{Z}$ from Human A to Human D.] • NOTE: All diagrams are drawn with respect to a “source” vertex (the blue vertex) to preserve clarity. In reality, the operations operate on all vertices in parallel.
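A minimal sketch of the author-citation composition, assuming a hypothetical four-vertex graph that mirrors the diagram (Human A authored Article B, which cites Article C, authored by Human D). Both relation slices are plain nested lists, and $(\mathbf{A}^1)^\top$ is obtained with `zip`.

```python
from typing import List

Matrix = List[List[int]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Ordinary matrix product; entries count composed paths."""
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X: Matrix) -> Matrix:
    """Reverse the direction of every path."""
    return [list(col) for col in zip(*X)]


# Hypothetical vertex order: 0 human_a, 1 article_b, 2 article_c, 3 human_d.
n = 4
A1 = [[0] * n for _ in range(n)]  # authored: human -> article
A2 = [[0] * n for _ in range(n)]  # cites: article -> article
A1[0][1] = 1  # human_a authored article_b
A1[3][2] = 1  # human_d authored article_c
A2[1][2] = 1  # article_b cites article_c

Z = matmul(matmul(A1, A2), transpose(A1))  # Z = A^1 · A^2 · (A^1)^T
print(Z[0][3])  # 1: one author-citation path from human_a to human_d
```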
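  • 36. The Filter Operation Various path filters can be defined and applied using the entry-wise Hadamard matrix product, denoted $\circ$, where $\mathbf{A} \circ \mathbf{B} = \begin{pmatrix} A_{1,1} \cdot B_{1,1} & \cdots & A_{1,m} \cdot B_{1,m} \\ \vdots & \ddots & \vdots \\ A_{n,1} \cdot B_{n,1} & \cdots & A_{n,m} \cdot B_{n,m} \end{pmatrix}$. Example (path matrix $\circ$ path filter $=$ filtered path matrix): $\begin{pmatrix} 24 & 1 & 0 & 0 & 0 \\ 0 & 72 & 0 & 4 & 0 \\ 23 & 0 & 0 & 0 & 0 \\ 0 & 0 & 15.3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 12 \end{pmatrix} \circ \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 72 & 0 & 0 & 0 \\ 23 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$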
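  • 37. The Filter Operation • $\mathbf{A} \circ \mathbf{1} = \mathbf{A}$ • $\mathbf{A} \circ \mathbf{0} = \mathbf{0}$ • $\mathbf{A} \circ \mathbf{B} = \mathbf{B} \circ \mathbf{A}$ • $\mathbf{A} \circ (\mathbf{B} + \mathbf{C}) = (\mathbf{A} \circ \mathbf{B}) + (\mathbf{A} \circ \mathbf{C})$ • $\mathbf{A}^\top \circ \mathbf{B}^\top = (\mathbf{A} \circ \mathbf{B})^\top$.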
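The Hadamard filter and the identities above can be exercised in a few lines of Python. The path matrix and filter are the ones from the previous slide's example; the assertions check each listed property on that data rather than prove it in general.

```python
from typing import List

Matrix = List[List[float]]


def hadamard(X: Matrix, Y: Matrix) -> Matrix:
    """Entry-wise (Hadamard) product X ∘ Y of two equally sized matrices."""
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def transpose(X: Matrix) -> Matrix:
    return [list(col) for col in zip(*X)]


# Path matrix and path filter from the example.
Z = [[24, 1, 0, 0, 0],
     [0, 72, 0, 4, 0],
     [23, 0, 0, 0, 0],
     [0, 0, 15.3, 0, 0],
     [0, 0, 0, 0, 12]]
F = [[0, 1, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0]]
filtered = hadamard(Z, F)  # keeps only the paths the filter allows

ones = [[1] * 5 for _ in range(5)]
zeros = [[0] * 5 for _ in range(5)]
assert hadamard(Z, ones) == Z                                       # A ∘ 1 = A
assert hadamard(Z, zeros) == zeros                                  # A ∘ 0 = 0
assert hadamard(Z, F) == hadamard(F, Z)                             # A ∘ B = B ∘ A
assert transpose(filtered) == hadamard(transpose(Z), transpose(F))  # (A ∘ B)ᵀ = Aᵀ ∘ Bᵀ
```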
  • 38. The Not Filter The not filter is useful for excluding a set of paths to or from a vertex. $n : \{0,1\}^{n \times n} \to \{0,1\}^{n \times n}$ with the function rule $n(\mathbf{A})_{i,j} = 1$ if $A_{i,j} = 0$ and $n(\mathbf{A})_{i,j} = 0$ otherwise. Example: $n\begin{pmatrix} 0&0&1&1&1 \\ 1&0&1&0&1 \\ 0&1&1&1&1 \\ 1&1&0&1&1 \\ 1&1&1&1&0 \end{pmatrix} = \begin{pmatrix} 1&1&0&0&0 \\ 0&1&0&1&0 \\ 1&0&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&0&1 \end{pmatrix}$
  • 39. The Not Filter If $\mathbf{A} \in \{0,1\}^{n \times n}$, then • $n(n(\mathbf{A})) = \mathbf{A}$ • $\mathbf{A} \circ n(\mathbf{A}) = \mathbf{0}$ • $n(\mathbf{A}) \circ n(\mathbf{A}) = n(\mathbf{A})$.
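A sketch of the not filter in Python, applied to the 5 × 5 example from the previous slide; the assertions check the three identities above on that matrix.

```python
from typing import List

Matrix = List[List[int]]


def not_filter(A: Matrix) -> Matrix:
    """n(A): complement of a {0,1} matrix (1 where A is 0, else 0)."""
    return [[1 if a == 0 else 0 for a in row] for row in A]


def hadamard(X: Matrix, Y: Matrix) -> Matrix:
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


A = [[0, 0, 1, 1, 1],
     [1, 0, 1, 0, 1],
     [0, 1, 1, 1, 1],
     [1, 1, 0, 1, 1],
     [1, 1, 1, 1, 0]]
nA = not_filter(A)
print(nA)  # [[1, 1, 0, 0, 0], [0, 1, 0, 1, 0], [1, 0, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 1]]

assert not_filter(nA) == A                             # n(n(A)) = A
assert hadamard(A, nA) == [[0] * 5 for _ in range(5)]  # A ∘ n(A) = 0
assert hadamard(nA, nA) == nA                          # n(A) ∘ n(A) = n(A)
```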
  • 40. The Not Filter A coauthorship path matrix is $\mathbf{Z} = \left(\mathbf{A}^1 \cdot {\mathbf{A}^1}^\top\right) \circ n(\mathbf{I})$. [Figure: Human A and Human C both authored Article B, yielding a coauthor edge in $\mathbf{Z}$; the $n(\mathbf{I})$ filter removes the coauthor self-loop at Human A.]
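The coauthorship composition can be sketched as follows, assuming a hypothetical three-vertex graph in which Human A and Human C both authored Article B. Without the $n(\mathbf{I})$ filter, the diagonal of $\mathbf{A}^1 \cdot (\mathbf{A}^1)^\top$ would count each author's own articles as self-coauthorship.

```python
from typing import List

Matrix = List[List[int]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X: Matrix) -> Matrix:
    return [list(col) for col in zip(*X)]


def hadamard(X: Matrix, Y: Matrix) -> Matrix:
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def not_filter(A: Matrix) -> Matrix:
    return [[1 if a == 0 else 0 for a in row] for row in A]


def identity(n: int) -> Matrix:
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


# Hypothetical vertex order: 0 human_a, 1 human_c, 2 article_b.
A1 = [[0, 0, 1],   # human_a authored article_b
      [0, 0, 1],   # human_c authored article_b
      [0, 0, 0]]
shared = matmul(A1, transpose(A1))             # shared-article counts, diagonal included
Z = hadamard(shared, not_filter(identity(3)))  # n(I) removes the self-loops
print(shared)  # [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
print(Z)       # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```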
  • 41. The Clip Filter The general purpose of clip is to take a path matrix and “clip,” or normalize, it to a $\{0,1\}^{n \times n}$ matrix: $c : \mathbb{R}_+^{n \times n} \to \{0,1\}^{n \times n}$ with $c(\mathbf{Z})_{i,j} = 1$ if $Z_{i,j} > 0$ and $c(\mathbf{Z})_{i,j} = 0$ otherwise. Example: $c\begin{pmatrix} 24&1&0&0&0 \\ 0&72&0&4&0 \\ 23&0&0&0&0 \\ 0&0&15.3&0&0 \\ 0&0&0&0&12 \end{pmatrix} = \begin{pmatrix} 1&1&0&0&0 \\ 0&1&0&1&0 \\ 1&0&0&0&0 \\ 0&0&1&0&0 \\ 0&0&0&0&1 \end{pmatrix}$
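  • 42. The Clip Filter If $\mathbf{A}, \mathbf{B} \in \{0,1\}^{n \times n}$ and $\mathbf{Y}, \mathbf{Z} \in \mathbb{R}_+^{n \times n}$, then • $c(\mathbf{A}) = \mathbf{A}$ • $c(n(\mathbf{A})) = n(c(\mathbf{A})) = n(\mathbf{A})$ • $c(\mathbf{Y} \circ \mathbf{Z}) = c(\mathbf{Y}) \circ c(\mathbf{Z})$ • $n(\mathbf{A} \circ \mathbf{B}) = c(n(\mathbf{A}) + n(\mathbf{B}))$ • $n(\mathbf{A} + \mathbf{B}) = n(\mathbf{A}) \circ n(\mathbf{B})$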
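A Python sketch of clip on the path matrix from the previous slide, followed by checks of three of the identities above on small matrices. The matrix `Y` and the binary matrices `P` and `Q` are hypothetical test data.

```python
from typing import List

Matrix = List[List[float]]


def clip(Z: Matrix) -> Matrix:
    """c(Z): 1 wherever Z has a positive entry, else 0."""
    return [[1 if z > 0 else 0 for z in row] for row in Z]


def not_filter(A: Matrix) -> Matrix:
    return [[1 if a == 0 else 0 for a in row] for row in A]


def hadamard(X: Matrix, Y: Matrix) -> Matrix:
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def add(X: Matrix, Y: Matrix) -> Matrix:
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


Z = [[24, 1, 0, 0, 0],
     [0, 72, 0, 4, 0],
     [23, 0, 0, 0, 0],
     [0, 0, 15.3, 0, 0],
     [0, 0, 0, 0, 12]]
print(clip(Z))  # [[1, 1, 0, 0, 0], [0, 1, 0, 1, 0], [1, 0, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 1]]

# Hypothetical test data for the identities.
Y = [[0, 3, 0, 0, 0],
     [1, 1, 0, 0, 0],
     [2, 0, 0, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 5]]
P = [[1, 0], [1, 1]]
Q = [[1, 1], [0, 0]]
assert clip(hadamard(Y, Z)) == hadamard(clip(Y), clip(Z))
assert not_filter(hadamard(P, Q)) == clip(add(not_filter(P), not_filter(Q)))
assert not_filter(add(P, Q)) == hadamard(not_filter(P), not_filter(Q))
```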
  • 43. The Clip Filter Suppose we want to create an author-citation path matrix that does not allow self-citations or coauthor citations: $\mathbf{Z} = \left(\mathbf{A}^1 \cdot \mathbf{A}^2 \cdot {\mathbf{A}^1}^\top\right) \circ \underbrace{n\left(c\left(\left(\mathbf{A}^1 \cdot {\mathbf{A}^1}^\top\right) \circ n(\mathbf{I})\right)\right)}_{\text{no coauthors}} \circ \underbrace{n(\mathbf{I})}_{\text{no self cites}}$. [Figure: Human A authored Article B, which cites Article C, authored by Human D, yielding an author-citation edge from Human A to Human D; the coauthor relation between Human A and Human E, $n(c((\mathbf{A}^1 \cdot {\mathbf{A}^1}^\top) \circ n(\mathbf{I})))$, and the self relation, $n(\mathbf{I})$, act as the filters.]
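  • 44. The Clip Filter However, using various theorems of the algebra, $\mathbf{Z} = \left(\mathbf{A}^1 \cdot \mathbf{A}^2 \cdot {\mathbf{A}^1}^\top\right) \circ n\left(c\left(\left(\mathbf{A}^1 \cdot {\mathbf{A}^1}^\top\right) \circ n(\mathbf{I})\right)\right) \circ n(\mathbf{I})$ becomes $\mathbf{Z} = \left(\mathbf{A}^1 \cdot \mathbf{A}^2 \cdot {\mathbf{A}^1}^\top\right) \circ n\left(c\left(\mathbf{A}^1 \cdot {\mathbf{A}^1}^\top\right)\right) \circ n(\mathbf{I})$. Off the diagonal, $n(\mathbf{I})$ is all ones, so the inner self-loop filter changes nothing there; on the diagonal, the outer $n(\mathbf{I})$ zeroes every entry regardless.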
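The equivalence of the two forms can be checked numerically. The sketch below builds a hypothetical six-vertex graph: Humans A and E coauthored Article B, Human D authored Article C, Human A also authored Article Y, and Article B cites both C and Y. It evaluates the original and simplified expressions and confirms that they agree on this data.

```python
from typing import List

Matrix = List[List[int]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X: Matrix) -> Matrix:
    return [list(col) for col in zip(*X)]


def hadamard(X: Matrix, Y: Matrix) -> Matrix:
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def not_filter(A: Matrix) -> Matrix:
    return [[1 if a == 0 else 0 for a in row] for row in A]


def clip(Z: Matrix) -> Matrix:
    return [[1 if z > 0 else 0 for z in row] for row in Z]


def identity(n: int) -> Matrix:
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


# Hypothetical vertices (0-based indices).
H_A, H_E, H_D, ART_B, ART_C, ART_Y = range(6)
n = 6
A1 = [[0] * n for _ in range(n)]  # authored
A2 = [[0] * n for _ in range(n)]  # cites
for human, article in [(H_A, ART_B), (H_E, ART_B), (H_D, ART_C), (H_A, ART_Y)]:
    A1[human][article] = 1
for citing, cited in [(ART_B, ART_C), (ART_B, ART_Y)]:
    A2[citing][cited] = 1

not_I = not_filter(identity(n))
citations = matmul(matmul(A1, A2), transpose(A1))  # A^1 · A^2 · (A^1)^T
coauthors = matmul(A1, transpose(A1))              # A^1 · (A^1)^T

# Original form, then the simplified form.
Z = hadamard(hadamard(citations, not_filter(clip(hadamard(coauthors, not_I)))), not_I)
Z_simple = hadamard(hadamard(citations, not_filter(clip(coauthors))), not_I)

assert Z == Z_simple
print(Z[H_A][H_D], Z[H_E][H_D])          # 1 1  (citations of a non-coauthor survive)
print(citations[H_A][H_A], Z[H_A][H_A])  # 1 0  (self-citation removed)
print(citations[H_E][H_A], Z[H_E][H_A])  # 1 0  (coauthor citation removed)
```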
  • 45. The Vertex Filter In many cases, it is important to filter out particular paths to and from a vertex. $v^- : \mathbb{R}_+^{n \times n} \to \{0,1\}^{n \times n}$, with $v^-(\mathbf{Z})_{i,j} = 1$ if $\sum_{k \in V} Z_{k,j} > 0$ and $v^-(\mathbf{Z})_{i,j} = 0$ otherwise, turns a non-zero column into an all-1 column, and $v^+ : \mathbb{R}_+^{n \times n} \to \{0,1\}^{n \times n}$, with $v^+(\mathbf{Z})_{i,j} = 1$ if $\sum_{k \in V} Z_{i,k} > 0$ and $v^+(\mathbf{Z})_{i,j} = 0$ otherwise, turns a non-zero row into an all-1 row.
  • 46. The Vertex Filter $v^-\begin{pmatrix} 0&1&0&1&0 \\ 0&0&0&0&0 \\ 0&2&0&32&0 \\ 0&23&0&0&0 \\ 0&0&0&0&0 \end{pmatrix} = \begin{pmatrix} 0&1&0&1&0 \\ 0&1&0&1&0 \\ 0&1&0&1&0 \\ 0&1&0&1&0 \\ 0&1&0&1&0 \end{pmatrix}$. $v^+$ is not diagrammed, but acts the same except that it makes all-1 rows. Two important filters are the column and row filters, $\mathbf{C} \in \{0,1\}^{n \times n}$ and $\mathbf{R} \in \{0,1\}^{n \times n}$, respectively: $\mathbf{C}_2 = \begin{pmatrix} 0&1&0&0&0 \\ 0&1&0&0&0 \\ 0&1&0&0&0 \\ 0&1&0&0&0 \\ 0&1&0&0&0 \end{pmatrix}$, $\mathbf{R}_3 = \begin{pmatrix} 0&0&0&0&0 \\ 0&0&0&0&0 \\ 1&1&1&1&1 \\ 0&0&0&0&0 \\ 0&0&0&0&0 \end{pmatrix}$.
  • 47. The Vertex Filter • $v^-(\mathbf{C}_i) = \mathbf{C}_i$ • $v^+(\mathbf{R}_j) = \mathbf{R}_j$ • $v^-(\mathbf{Z})^\top = v^+(\mathbf{Z}^\top)$ • $v^+(\mathbf{Z})^\top = v^-(\mathbf{Z}^\top)$.
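The vertex filters can be sketched in Python as below. Because path matrices are non-negative, testing whether a column (or row) sum is positive is the same as testing for any non-zero entry. The helper names `column_filter` and `row_filter` are hypothetical and use 0-based indices, so the slide's $\mathbf{C}_2$ and $\mathbf{R}_3$ become `column_filter(5, 1)` and `row_filter(5, 2)`.

```python
from typing import List

Matrix = List[List[float]]


def v_minus(Z: Matrix) -> Matrix:
    """v⁻(Z): each column with a positive sum becomes an all-1 column."""
    keep = [sum(row[j] for row in Z) > 0 for j in range(len(Z[0]))]
    return [[1 if k else 0 for k in keep] for _ in Z]


def v_plus(Z: Matrix) -> Matrix:
    """v⁺(Z): each row with a positive sum becomes an all-1 row."""
    return [[1 if sum(row) > 0 else 0 for _ in row] for row in Z]


def column_filter(n: int, j: int) -> Matrix:
    """Hypothetical helper: C_j, all ones in column j (0-based)."""
    return [[1 if c == j else 0 for c in range(n)] for _ in range(n)]


def row_filter(n: int, i: int) -> Matrix:
    """Hypothetical helper: R_i, all ones in row i (0-based)."""
    return [[1 if r == i else 0 for _ in range(n)] for r in range(n)]


def transpose(X: Matrix) -> Matrix:
    return [list(col) for col in zip(*X)]


Z = [[0, 1, 0, 1, 0],
     [0, 0, 0, 0, 0],
     [0, 2, 0, 32, 0],
     [0, 23, 0, 0, 0],
     [0, 0, 0, 0, 0]]
print(v_minus(Z))  # five copies of [0, 1, 0, 1, 0]

C2, R3 = column_filter(5, 1), row_filter(5, 2)  # the slide's C_2 and R_3
assert v_minus(C2) == C2
assert v_plus(R3) == R3
assert transpose(v_minus(Z)) == v_plus(transpose(Z))
assert transpose(v_plus(Z)) == v_minus(transpose(Z))
```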
  • 48. The Vertex Filter Assume that vertex 1 is the social science subject category vertex and we want to create a journal citation graph for social science journals only: $\mathbf{Z} = \underbrace{\left[v^+\left(\mathbf{C}_1 \circ \mathbf{A}^4\right) \circ \mathbf{A}^3\right]}_{\text{soc. sci. journal articles}} \cdot \mathbf{A}^2 \cdot \underbrace{\left[{\mathbf{A}^3}^\top \circ v^-\left(\mathbf{R}_1 \circ {\mathbf{A}^4}^\top\right)\right]}_{\text{articles in soc. sci. journals}}$. [Figure: Journals A and E have category Social Science (vertex 1), while Journal F does not; Journal A contains Article B, which cites Article C (contained in Journal E) and Article D (contained in Journal F); $\mathbf{Z}$ is the resulting social-science journal citation graph.]
  • 49. The Vertex Filter Computing the social-science journal articles, $v^+(\mathbf{C}_1 \circ \mathbf{A}^4) \circ \mathbf{A}^3$, with rows and columns ordered (S, J-A, J-E, J-F, A-B, A-C, A-D), where S is the Social Science category, J-A, J-E, and J-F are Journals A, E, and F, and A-B, A-C, and A-D are Articles B, C, and D: $\underbrace{\begin{pmatrix} 1&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0 \end{pmatrix}}_{\mathbf{C}_1} \circ \underbrace{\begin{pmatrix} 0&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0 \end{pmatrix}}_{\mathbf{A}^4} = \underbrace{\begin{pmatrix} 0&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0 \end{pmatrix}}_{\mathbf{C}_1 \circ \mathbf{A}^4}$ and $\underbrace{\begin{pmatrix} 0&0&0&0&0&0&0 \\ 1&1&1&1&1&1&1 \\ 1&1&1&1&1&1&1 \\ 0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0 \end{pmatrix}}_{v^+(\mathbf{C}_1 \circ \mathbf{A}^4)} \circ \underbrace{\begin{pmatrix} 0&0&0&0&0&0&0 \\ 0&0&0&0&1&0&0 \\ 0&0&0&0&0&1&0 \\ 0&0&0&0&0&0&1 \\ 0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0 \end{pmatrix}}_{\mathbf{A}^3} = \underbrace{\begin{pmatrix} 0&0&0&0&0&0&0 \\ 0&0&0&0&1&0&0 \\ 0&0&0&0&0&1&0 \\ 0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0 \end{pmatrix}}_{v^+(\mathbf{C}_1 \circ \mathbf{A}^4) \circ \mathbf{A}^3}$.
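  • 50. The Vertex Filter $\mathbf{Z} = \underbrace{\left[v^+(\mathbf{C}_1 \circ \mathbf{A}^4) \circ \mathbf{A}^3\right]}_{\text{soc. sci. journal articles}} \cdot \mathbf{A}^2 \cdot \underbrace{\left[{\mathbf{A}^3}^\top \circ v^-(\mathbf{R}_1 \circ {\mathbf{A}^4}^\top)\right]}_{\text{articles in soc. sci. journals}}$. However, since $\mathbf{C}_x^\top = \mathbf{R}_x$ and $v^+(\mathbf{Z})^\top = v^-(\mathbf{Z}^\top)$, it follows that $v^-(\mathbf{R}_1 \circ {\mathbf{A}^4}^\top) = v^-((\mathbf{C}_1 \circ \mathbf{A}^4)^\top) = v^+(\mathbf{C}_1 \circ \mathbf{A}^4)^\top$. Therefore, because $\mathbf{A}^\top \circ \mathbf{B}^\top = (\mathbf{A} \circ \mathbf{B})^\top$, $\mathbf{Z} = \underbrace{\left[v^+(\mathbf{C}_1 \circ \mathbf{A}^4) \circ \mathbf{A}^3\right]}_{\text{reused}} \cdot \mathbf{A}^2 \cdot \underbrace{\left[v^+(\mathbf{C}_1 \circ \mathbf{A}^4) \circ \mathbf{A}^3\right]^\top}_{\text{reused}}$.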
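The worked example can be reproduced in Python with the seven vertices from slide 49 (0-based here, so the Social Science vertex S, the slide's vertex 1, is index 0). The journal-contains-article and journal-category edges come from slide 49. The two citations (Article B cites Articles C and D) are read off the diagram on slide 48 and should be treated as an assumption. The sketch evaluates both the original and the reused form of $\mathbf{Z}$.

```python
from typing import List

Matrix = List[List[int]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X: Matrix) -> Matrix:
    return [list(col) for col in zip(*X)]


def hadamard(X: Matrix, Y: Matrix) -> Matrix:
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def v_minus(Z: Matrix) -> Matrix:
    keep = [sum(row[j] for row in Z) > 0 for j in range(len(Z[0]))]
    return [[1 if k else 0 for k in keep] for _ in Z]


def v_plus(Z: Matrix) -> Matrix:
    return [[1 if sum(row) > 0 else 0 for _ in row] for row in Z]


def column_filter(n: int, j: int) -> Matrix:
    return [[1 if c == j else 0 for c in range(n)] for _ in range(n)]


def row_filter(n: int, i: int) -> Matrix:
    return [[1 if r == i else 0 for _ in range(n)] for r in range(n)]


def zeros(n: int) -> Matrix:
    return [[0] * n for _ in range(n)]


S, J_A, J_E, J_F, A_B, A_C, A_D = range(7)
n = 7
A2, A3, A4 = zeros(n), zeros(n), zeros(n)  # cites, contains, category
for journal, article in [(J_A, A_B), (J_E, A_C), (J_F, A_D)]:
    A3[journal][article] = 1
for journal in (J_A, J_E):
    A4[journal][S] = 1
for citing, cited in [(A_B, A_C), (A_B, A_D)]:  # assumed from the diagram
    A2[citing][cited] = 1

C1, R1 = column_filter(n, S), row_filter(n, S)
soc_sci_articles = hadamard(v_plus(hadamard(C1, A4)), A3)
Z_original = matmul(matmul(soc_sci_articles, A2),
                    hadamard(transpose(A3), v_minus(hadamard(R1, transpose(A4)))))
Z = matmul(matmul(soc_sci_articles, A2), transpose(soc_sci_articles))

assert Z == Z_original
print(Z[J_A][J_E], Z[J_A][J_F])  # 1 0
```

Reusing the same filtered matrix on both sides means the vertex filter is evaluated once, which is the efficiency gain the rewrite is after.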
  • 51. The Weight and Merge Operations • λZ: scalar multiplication weights paths. • Y + Z: matrix addition merges paths.
$$\begin{bmatrix}24&1&0&0&0\\0&72&0&4&0\\23&0&0&0&0\\0&0&15.3&0&0\\0&0&0&0&12\end{bmatrix} + \begin{bmatrix}0&1&0&0&0\\0&10&0&0&0\\1&0&34&0&0\\0&0&0&0&0\\0&0&0&0&2\end{bmatrix} = \begin{bmatrix}24&2&0&0&0\\0&82&0&4&0\\24&0&34&0&0\\0&0&15.3&0&0\\0&0&0&0&14\end{bmatrix}$$
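Both operations are entry-wise, so a short sketch over nested lists reproduces the slide's example. The function names `merge` and `weight` are illustrative, not part of any library.

```python
def merge(a, b):
    """Y + Z: the entry-wise sum merges two path matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def weight(lam, a):
    """λZ: scalar multiplication weights every path in Z by λ."""
    return [[lam * x for x in row] for row in a]


# The two matrices from the slide.
Y = [[24, 1, 0, 0, 0], [0, 72, 0, 4, 0], [23, 0, 0, 0, 0], [0, 0, 15.3, 0, 0], [0, 0, 0, 0, 12]]
Z = [[0, 1, 0, 0, 0], [0, 10, 0, 0, 0], [1, 0, 34, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 2]]

print(merge(Y, Z))
# [[24, 2, 0, 0, 0], [0, 82, 0, 4, 0], [24, 0, 34, 0, 0], [0, 0, 15.3, 0, 0], [0, 0, 0, 0, 14]]
print(weight(0.5, Z)[2])  # [0.5, 0.0, 17.0, 0.0, 0.0]
```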
  • 52. $A^1$: authored, $A^2$: cites, $A^3$: contains, $A^4$: category, $A^5$: developed. The Weight and Merge Operations
$$Z = 0.6\,\underbrace{\left(\left(A^{1} \cdot A^{1\top}\right) \circ n(I)\right)}_{\text{coauthorship}} + 0.4\,\underbrace{\left(\left(A^{5} \cdot A^{5\top}\right) \circ n(I)\right)}_{\text{co-development}}$$
merges the article collaboration and software program collaboration path matrices with respective weights of 0.6 and 0.4. The resulting path matrix captures both kinds of collaboration but favors article collaboration over software program collaboration. Because the Hadamard product distributes over addition, a simplification of the previous composition is
$$Z = \left(0.6\, A^{1} \cdot A^{1\top} + 0.4\, A^{5} \cdot A^{5\top}\right) \circ n(I).$$
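A small sketch can confirm that both forms of the composition agree. It assumes that n(·) is the entry-wise "not" filter on a binary matrix, so n(I) is all ones except on the diagonal and removes self-collaboration. The people, article, and software vertices are hypothetical.

```python
V = ["p1", "p2", "p3", "article", "software"]  # hypothetical vertices
N = len(V)
IDX = {v: i for i, v in enumerate(V)}


def from_edges(edges):
    m = [[0] * N for _ in range(N)]
    for tail, head in edges:
        m[IDX[tail]][IDX[head]] = 1
    return m


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def hadamard(a, b):
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(lam, a):
    return [[lam * x for x in row] for row in a]


def n(a):
    """Assumed 'not' filter: 1 where A is 0 and 0 elsewhere, so n(I) zeroes the diagonal."""
    return [[0 if x else 1 for x in row] for row in a]


identity = [[1 if i == j else 0 for j in range(N)] for i in range(N)]
A1 = from_edges([("p1", "article"), ("p2", "article")])  # authored (hypothetical)
A5 = from_edges([("p2", "software"), ("p3", "software")])  # developed (hypothetical)

coauthorship = hadamard(matmul(A1, transpose(A1)), n(identity))
codevelopment = hadamard(matmul(A5, transpose(A5)), n(identity))
Z = add(scale(0.6, coauthorship), scale(0.4, codevelopment))

# The simplified form applies n(I) once, after the weighted merge.
Z_simple = hadamard(
    add(scale(0.6, matmul(A1, transpose(A1))), scale(0.4, matmul(A5, transpose(A5)))),
    n(identity),
)
assert Z == Z_simple

print(Z[IDX["p1"]][IDX["p2"]], Z[IDX["p2"]][IDX["p3"]], Z[IDX["p2"]][IDX["p2"]])  # 0.6 0.4 0.0
```

Person p2 both coauthored and co-developed, yet the diagonal entry is 0.0 in both forms because n(I) removes every self-loop.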
  • 53. Outline • Introduction to Graph Structures (The Single-Relational Graph; The Multi-Relational Graph) • A Multi-Relational Path Algebra • Application to Recommender Systems
  • 54. kReef: A Scholarly Recommendation Engine 1. The scholarly community is modeled using a multi-relational graph. 2. A “walker” version of the path algebra is applied to the graph to support scholars. [Figure: kReef architecture, with a graphical user interface, translators, an analytics engine, and a grammar walker engine (2) built on a multi-relational graph database (1) that stores the ontology and its instances.] Rodriguez, M.A., Allen, D.W., Shinavier, J., Ebersole, G., “A Recommender System to Support the Scholarly Communication Process,” KRS-2009-02, http://arxiv.org/abs/0905.1594, 2009.
  • 55. kReef: Ontology Classes [Figure: kReef class hierarchy rooted at core:Reefsource. Classes shown: core:Agent, core:Item, core:Event, core:Group, core:Person, core:Document, core:Collection, core:Conference, core:Course, core:Organization, core:Project, core:Article, core:Book, core:Meeting, core:Viewgraph, core:Journal, core:FundingOpportunity, core:Panel, core:Academic, core:Webpage, core:Library, core:Dataset, core:Presentation, core:Commerical, core:Magazine, core:Media, core:Software, core:Government, core:Newspaper, core:Session, core:Keynote, core:Call, core:Audio, core:Proceedings, core:SocialEvent, core:Image, core:CallForChapters, core:Tutorial, core:Video, core:CallForPapers, core:Workshop, core:CallForProposals, core:CallForTutorials, core:CallForWorkshops.] • NOTE: All edges denote an rdfs:subClassOf relationship (either direct or inferred).
  • 56. kReef: Ontology Properties
Table 1: core:Reefsource rdf:Property relations (rdf:Property | rdfs:domain | rdfs:range)
core:title | core:Reefsource | xsd:string
core:abstract | core:Reefsource | xsd:string
core:guid | core:Reefsource | xsd:string
Table 2: core:Agent rdf:Property relations (rdf:Property | rdfs:domain | rdfs:range)
core:attends | core:Agent | core:Event
core:created | core:Agent | core:Item
core:member | core:Group | core:Person
core:subGroup | core:Group | core:Group
core:firstName | core:Person | xsd:string
core:lastName | core:Person | xsd:string
core:occupation | core:Person | xsd:string
core:sex | core:Person | core:Gender
Table 3: core:Item rdf:Property relations (rdf:Property | rdfs:domain | rdfs:range)
core:cites | core:Item | core:Item
core:containedIn | core:Item | core:Collection
core:creationTime | core:Item | xsd:dateTime
core:doi | core:Item | xsd:anyURI
core:publisher | core:Item | core:Group
core:dueDate | core:Call | xsd:dateTime
core:callFor | core:Call | core:Reefsource
core:contains | core:Collection | core:Item
core:editor | core:Collection | core:Agent
core:isbn | core:Collection | xsd:anyURI
core:issn | core:Collection | xsd:anyURI
core:oaipmh | core:Library | xsd:anyURI
core:startPage | core:Article | xsd:int
core:endPage | core:Article | xsd:int
core:number | core:Article | xsd:int
core:volume | core:Article | xsd:int
Table 4: core:Event rdf:Property relations (rdf:Property | rdfs:domain | rdfs:range)
core:startTime | core:Event | xsd:dateTime
core:endTime | core:Event | xsd:dateTime
core:presents | core:Event | core:Item
core:organizedBy | core:Event | core:Agent
core:subEvent | core:Event | core:Event
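The rdfs:domain and rdfs:range columns act as typing constraints on edges. The sketch below checks whether an edge between two typed vertices respects a property's domain and range, following subclass links so that inferred subclass relationships count. Only the domains and ranges come from the slide's tables; the single-parent subclass fragment in `PARENT` is a hypothetical reading of the class diagram.

```python
# Domains and ranges copied from the slide's property tables (a subset).
PROPERTIES = {
    "core:cites": ("core:Item", "core:Item"),
    "core:contains": ("core:Collection", "core:Item"),
    "core:created": ("core:Agent", "core:Item"),
    "core:member": ("core:Group", "core:Person"),
}

# Hypothetical subclass fragment; the exact parentage is an assumption.
PARENT = {
    "core:Person": "core:Agent",
    "core:Group": "core:Agent",
    "core:Agent": "core:Reefsource",
    "core:Article": "core:Document",
    "core:Document": "core:Item",
    "core:Journal": "core:Collection",
    "core:Collection": "core:Item",
    "core:Item": "core:Reefsource",
}


def is_a(cls, target):
    """True if cls equals target or reaches it through (inferred) subclass links."""
    while cls is not None:
        if cls == target:
            return True
        cls = PARENT.get(cls)
    return False


def edge_is_valid(subject_type, prop, object_type):
    """Check an edge's endpoint types against the property's domain and range."""
    domain, rng = PROPERTIES[prop]
    return is_a(subject_type, domain) and is_a(object_type, rng)


print(edge_is_valid("core:Journal", "core:contains", "core:Article"))  # True
print(edge_is_valid("core:Person", "core:contains", "core:Article"))  # False
```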
  • 57. kReef: Instance Data Ingestion [Figure: instance data from Connotea, arXiv, CiteULike, CogPrints, CiteSeer, BibSonomy, CrossRef, and publishers (ACM, IEEE, IOP, Springer, Blackwell, Elsevier, etc.) flows into the multi-relational graph database, which holds the ontology and its instances.]
  • 58. kReef: Grammar Walker Engine Overview • A walker-based implementation of the path algebra is applied to the scholarly model to support scholars in their professional lives. The path description is called a “grammar” because it can be modeled as a finite state machine embedded in the walker. Example uses: identify articles related to a resource of interest; identify collaborators for a funding opportunity; identify a publication venue for a newly created article; identify referees to review an article; identify resources of interest in one’s community. Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks,” Knowledge-Based Systems, 21(7), pp. 727–739, http://arxiv.org/abs/0803.4355, 2008.
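To make the finite-state-machine idea concrete, here is a minimal sketch in which the grammar is a transition table (state, edge label) → next state, and a walker may only take edges the current state allows. The graph, state names, and the `^-1` suffix for traversing an edge backward are all hypothetical. For clarity it enumerates every grammar-accepted path breadth-first rather than sampling random walkers.

```python
from collections import deque

EDGES = [  # hypothetical (subject, label, object) edges
    ("paperX", "cites", "paperY"),
    ("alice", "authored", "paperY"),
    ("bob", "authored", "paperX"),
]

# Hypothetical grammar as a finite state machine: state -> {edge label: next state}.
GRAMMAR = {"start": {"cites": "cited"}, "cited": {"authored^-1": "author"}}
ACCEPTING = {"author"}


def build(edges):
    """Adjacency lists with an explicit inverse edge so walkers can step backward."""
    graph = {}
    for s, label, o in edges:
        graph.setdefault(s, []).append((label, o))
        graph.setdefault(o, []).append((label + "^-1", s))
    return graph


def walk(graph, source):
    """Return every vertex reached in an accepting state along a grammar-accepted path."""
    frontier, seen, results = deque([(source, "start")]), set(), set()
    while frontier:
        vertex, state = frontier.popleft()
        if (vertex, state) in seen:
            continue
        seen.add((vertex, state))
        if state in ACCEPTING:
            results.add(vertex)
        for label, neighbor in graph.get(vertex, []):
            next_state = GRAMMAR.get(state, {}).get(label)
            if next_state is not None:
                frontier.append((neighbor, next_state))
    return results


print(walk(build(EDGES), "paperX"))  # {'alice'}
```

Bob is adjacent to paperX but is never reached, because the start state only accepts a `cites` edge.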
  • 59. kReef: Grammar Walker Engine Algorithm, Part 1 • First, when trying to solve a recommendation problem, determine which abstract path should be searched to find a solution. This is usually based on a hunch and then validated using real-world data. For example, what makes a good peer-reviewer/referee for an article? The authors cited by the article, along with their respective coauthors. Moreover, the referees should not include the authors of the article or their direct (one-step) coauthors in the coauthorship network, since that would be a conflict of interest. • Let us denote the path description/grammar/constraint ψ. Rodriguez, M.A., Bollen, J., “An Algorithm to Determine Peer-Reviewers,” Conference on Information and Knowledge Management (CIKM), pp. 319–328, http://arxiv.org/abs/cs/0605112, 2008.
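The referee heuristic can be written down as set operations, which is a handy way to sanity-check a candidate ψ before turning it into a walker grammar. This is a hypothetical, unranked set-based rendering of the heuristic stated on the slide, not the algorithm from the cited CIKM paper; the people and articles are invented.

```python
# Hypothetical data: person -> articles they authored; article -> articles it cites.
AUTHORED = {
    "erin": {"p0"}, "frank": {"p0", "p5"},  # authors of the submission p0
    "alice": {"p2"}, "bob": {"p2", "p6"}, "hank": {"p2", "p5"},
    "carol": {"p3"}, "gina": {"p6"},
}
CITES = {"p0": {"p2", "p3"}}


def authors_of(article):
    return {person for person, arts in AUTHORED.items() if article in arts}


def coauthors(person):
    """People who share at least one article with person (one step in the coauthorship network)."""
    return {q for art in AUTHORED.get(person, set()) for q in authors_of(art)} - {person}


def referee_candidates(article):
    own = authors_of(article)
    # Conflict of interest: the article's authors and their direct coauthors.
    conflict = own | {c for a in own for c in coauthors(a)}
    # Authors the article cites, plus their coauthors.
    cited = {a for ref in CITES.get(article, set()) for a in authors_of(ref)}
    pool = cited | {c for a in cited for c in coauthors(a)}
    return pool - conflict


print(sorted(referee_candidates("p0")))  # ['alice', 'bob', 'carol', 'gina']
```

Hank is a cited author but is excluded because he coauthored p5 with Frank, one of the submission's authors.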
  • 60. kReef: Grammar Walker Engine Algorithm, Part 2 • Program a collection of discrete walkers to traverse the abstract path defined by ψ. Each walker starts at some vertex i ∈ V with an energy value in ℝ. As it walks the graph, its energy decays. In the peer-review/referee example, the source vertex is the article that requires a set of referees. [Figure: walkers leaving source vertex i and following ψ over time steps t = 1, 2, 3.]
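A minimal sketch of discrete walkers with decaying energy follows. It assumes the graph has already been restricted to the moves ψ allows, that each walker picks uniformly among them, and that energy is multiplied by a fixed `decay` factor per step and deposited at each visited vertex. All parameter names and defaults are hypothetical choices, not kReef settings.

```python
import random
from collections import defaultdict


def run_walkers(graph, source, num_walkers=100, steps=3, energy=1.0, decay=0.5, seed=0):
    """Release walkers from source; at each step a walker moves to a random neighbor,
    its energy is multiplied by decay, and the remaining energy is deposited there."""
    rng = random.Random(seed)
    flow = defaultdict(float)
    for _ in range(num_walkers):
        vertex, e = source, energy
        for _ in range(steps):
            neighbors = graph.get(vertex, [])
            if not neighbors:
                break  # dead end: this walker stops early
            vertex = rng.choice(neighbors)
            e *= decay
            flow[vertex] += e
    return dict(flow)


# Hypothetical graph already restricted to the moves allowed by ψ.
chain = {"i": ["a"], "a": ["b"], "b": ["c"]}
print(run_walkers(chain, "i", num_walkers=10))  # {'a': 5.0, 'b': 2.5, 'c': 1.25}
```

On this single-path chain every walker makes the same moves, so the output is deterministic; on branching graphs the seeded `random.Random` keeps runs reproducible.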
  • 61. kReef: Grammar Walker Engine Algorithm, Part 3 • The solution to the problem is wherever the highest energy flow in the network exists after k time steps. In the peer-review example, the highest-energy vertices are the people most competent to review the article in question. In short, $\Psi \times \mathcal{P}(V) \rightarrow \omega$, where Ψ is the set of all grammars, $\mathcal{P}(V)$ is the set of all sets of source vertices, and $\omega : V \rightarrow \mathbb{R}$ is the resultant energy flow for each vertex in the graph. Or, in programming terms, Grammar × Set<Vertex> → Map<Vertex, Double>, that is, path description × source vertices → ranked results.
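The signature Grammar × Set<Vertex> → Map<Vertex, Double> can be sketched as a deterministic energy-flow function. Here the grammar ψ is simplified to one edge label per time step (so k = len(ψ)), energy is split evenly over the valid moves and decays by a factor each step, and the result is the energy held after step k, sorted from highest to lowest. These are assumptions for illustration, and the graph is hypothetical.

```python
from collections import defaultdict


def rank(graph, psi, sources, decay=0.5):
    """Hypothetical Grammar × Set<Vertex> → Map<Vertex, Double>.

    graph: vertex -> list of (edge label, neighbor); psi: the edge label to follow
    at each of the k = len(psi) time steps. Energy at a vertex with no valid move
    is lost. The result is ordered from highest to lowest energy.
    """
    energy = {s: 1.0 for s in sources}
    for label in psi:
        nxt = defaultdict(float)
        for vertex, e in energy.items():
            moves = [w for lab, w in graph.get(vertex, []) if lab == label]
            for w in moves:
                nxt[w] += e * decay / len(moves)
        energy = nxt
    return dict(sorted(energy.items(), key=lambda kv: (-kv[1], kv[0])))


graph = {
    "paper0": [("cites", "paper1"), ("cites", "paper2")],
    "paper1": [("authored^-1", "alice"), ("authored^-1", "bob")],
    "paper2": [("authored^-1", "alice")],
}
print(rank(graph, ["cites", "authored^-1"], {"paper0"}))  # {'alice': 0.1875, 'bob': 0.0625}
```

Alice ranks first because energy reaches her along two cited papers, while Bob is reached along only one.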
  • 62. Other Application Scenarios • Populating metadata-poor resources with data propagated from metadata-rich resources. Walkers take particular paths, pick up metadata from rich resources, and attach that metadata to the atrophied resources. Rodriguez, M.A., Bollen, J., Van de Sompel, H., “Automatic Metadata Generation using Associative Networks,” ACM Transactions on Information Systems, 27(2), pp. 1–20, http://arxiv.org/abs/0807.0023, 2009. • Generating a context-sensitive representative decision-making structure that reflects the voting behavior of the full population even as the actual voting population shrinks. Rodriguez, M.A., “Social Decision Making with Multi-Relational Networks and Grammar-Based Particle Swarms,” Hawaii International Conference on Systems Science (HICSS), pp. 39–49, http://arxiv.org/abs/cs/0609034, 2007.
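As a rough illustration of metadata propagation, the sketch below lets each metadata-poor resource collect keywords from its weighted neighbors and rank them by summed edge weight. It is a one-hop simplification assumed for clarity, not the associative-network method of the cited paper, and the resources, weights, and keywords are hypothetical.

```python
from collections import defaultdict


def propagate(neighbors, metadata, poor):
    """One-hop sketch: each metadata-poor resource collects keywords from its
    neighbors, scoring each keyword by the summed weight of the edges it came along."""
    suggestions = {}
    for resource in poor:
        scores = defaultdict(float)
        for neighbor, weight in neighbors.get(resource, []):
            for keyword in metadata.get(neighbor, ()):
                scores[keyword] += weight
        suggestions[resource] = sorted(scores, key=lambda k: (-scores[k], k))
    return suggestions


neighbors = {"poor1": [("rich1", 0.8), ("rich2", 0.2)]}  # hypothetical weighted edges
metadata = {"rich1": {"graphs", "algebra"}, "rich2": {"graphs", "ranking"}}
print(propagate(neighbors, metadata, ["poor1"]))
# {'poor1': ['graphs', 'algebra', 'ranking']}
```

A multi-hop version would replace the single neighbor loop with walkers following a grammar, as on the earlier slides.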
  • 63. Future Work in this Area • Further develop the path algebra. Explore other matrix and tensor operations and determine whether they are meaningful in the context of manipulating multi-relational graphs. • Develop a programming language (Turing complete?) to easily represent path descriptions for walkers, making it easier for developers to deploy swarms of walkers within a multi-relational network for application scenarios such as recommender systems, vertex and edge ranking systems, information retrieval systems, and general graph analysis.
  • 64. Conclusion • Thank you for your time...
My homepage: http://markorodriguez.com
Linked Process: http://linkedprocess.org
Neno/Fhat: http://neno.lanl.gov
Collective Decision Making Systems: http://cdms.lanl.gov
Faith in the Algorithm: http://faithinthealgorithm.net
MESUR: http://www.mesur.org