The Network:
A Data Structure that Links Domains

             Marko A. Rodriguez

       Los Alamos National Laboratory
           Vrije Universiteit Brussel
     University of California at Santa Cruz




            marko@lanl.gov
    http://www.soe.ucsc.edu/~okram


                    Marko A. Rodriguez
      University of New Mexico, September 14, 2007
About me.

•   Marko Antonio Rodriguez.




•   Bachelor of Science in Cognitive Science from U.C. San Diego.
•   Minor in the Arts in Computer Music from U.C. San Diego.
•   Master of Science in Computer Science from U.C. Santa Cruz.
•   Visiting Researcher at the Center for Evolution, Complexity, and
    Cognition at the Free University of Brussels.
•   Ph.D. in Computer Science from U.C. Santa Cruz [soon].
     o   I defend November 15, 2007!
•   Researcher at the Los Alamos National Laboratory since 2005.



Research trends.

•   MESUR: Metrics from Scholarly Usage of Resources.
    (http://www.mesur.org)




•   Neno/Fhat: A Semantic Network Programming Language and Virtual
    Machine Architecture. (http://neno.lanl.gov)




•   CDMS: Collective Decision Making Systems. (http://cdms.lanl.gov)




What is a network?

•   A network is a data structure that is used to connect vertices/nodes/dots by
    means of edges/links/lines.

•   Networks are everywhere.
     o   Social: friendship, trust, communication, collaboration.
     o   Technological: web-pages, communication, software dependencies, circuits.
     o   Scholarly: journals, authors, articles, institutions.
     o   Natural: protein interaction, neural, food web.




The undirected network.

•   There is the undirected network of common knowledge.
     o   Sometimes called an undirected single-relational network.
     o   e.g. vertex i and vertex j are “related”.


•   The semantic of the edge denotes the network type.
     o   e.g. friendship network, collaboration network, etc.


                       i ------------------ j




Example undirected network.



[Figure: an undirected network whose vertices are Jen, Alberto, Whenzong, Luda,
Aric, Herbert, Zhiwu, Ed, Johan, Stephan, and Marko.]



The directed network.


•   Then there is the directed network of common knowledge.
     o   Sometimes called a directed single-relational network.
     o   For example, vertex i is related to vertex j, but j is not related to i.



                        i -----------------> j




Example directed network.



[Figure: a directed network (a food web) whose vertices are Human, Lion, Hyena,
Deer, Fish, Muskrat, Meerkat, Fox, Wolf, Beetle, and Bear.]



The semantic network.


•   Finally, there is the semantic network
     o   Sometimes called a directed multi-relational network.
     o   For example, vertex i is related to vertex j by the semantic s, but j is not
         related to i by the semantic s.



                       i --------s--------> j




Example semantic network.



[Figure: a semantic network.
 Vertices: LANL, UnitedStates, Arnold, Atoms, NewMexico, SantaFe, California,
 Ryan, Cells, Oregon, Marko.
 Edge labels: hasLab, researches, locatedIn, stateOf, governerOf, cityOf, madeOf,
 hasResident, originallyFrom, northOf, southOf, livesIn, worksWith.]



What are the techniques for analysis?

•   Degree statistics
     o   How many in- and out-edges does vertex i have?
     o   What is the maximum and minimum in- and out-degree of the network?



•   Shortest-path metrics
     o   What is the smallest number of steps to get from vertex i to vertex j?
     o   How many of the shortest-paths go through vertex i?



•   Power metrics
     o   What vertices are the most “influential”?



•   Metadata distributions
     o   What is the probability that a vertex of type x is connected to a vertex of type y?




Degree statistics.

    Vertex     out   in
    Human       2     1
    Lion        3     0
    Hyena       1     0
    Deer        0     2
    Muskrat     1     1
    Fish        0     4
    Meerkat     1     3
    Wolf        1     1
    Fox         1     1
    Beetle      0     1
    Bear        4     0

    Max_out = 4, Max_in = 4, Min_out = 0, Min_in = 0
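
The degree statistics above can be computed directly from an edge list. The following is a minimal Python sketch, assuming a small made-up directed network: the edges of the food web on the slide are not recoverable, so the `edges` list and its vertex names are hypothetical.

```python
from collections import Counter

# Hypothetical directed edges (source, target); not the slide's food web.
edges = [("bear", "fish"), ("bear", "deer"), ("lion", "deer"), ("fox", "fish")]
vertices = {v for edge in edges for v in edge}

# Out-degree counts edges leaving a vertex; in-degree counts edges arriving.
out_degree = Counter(src for src, _ in edges)
in_degree = Counter(dst for _, dst in edges)

# A Counter returns 0 for vertices that never appear in that role.
max_out = max(out_degree[v] for v in vertices)
min_out = min(out_degree[v] for v in vertices)
max_in = max(in_degree[v] for v in vertices)
min_in = min(in_degree[v] for v in vertices)

print(max_out, min_out, max_in, min_in)
```

Taking the extremes over the full vertex set, rather than over the counters alone, ensures that vertices with degree 0 are included in the minimum.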

Shortest-path between Marko and Aric.

[Figure: the example undirected network.]
                                Shortest path = 1

Eccentricity of Marko.

[Figure: the example undirected network.]
                                 Eccentricity = 3

Radius of the network.

[Figure: the example undirected network.]
                                     Radius = 3

Diameter of the network.

[Figure: the example undirected network.]
                                    Diameter = 4

Closeness of Marko.

[Figure: the example undirected network.]
                                 Closeness = 0.0526

Shortest-path metrics.

•   Shortest-path
•   Eccentricity
•   Radius
•   Diameter
•   Closeness
•   Betweenness
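
Most of these metrics follow from a single breadth-first search per vertex. The sketch below is illustrative only: the `graph` is a hypothetical connected undirected network (not the one in the figures), and closeness is taken to be the reciprocal of the sum of shortest-path distances, which is one common definition. The slides do not state their formula, although 0.0526 is consistent with 1/19. Betweenness is omitted.

```python
from collections import deque

# Hypothetical connected undirected network as an adjacency map.
graph = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

def distances_from(graph, source):
    """Breadth-first search: steps from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def eccentricity(graph, v):
    """Longest shortest-path distance from v to any other vertex."""
    return max(distances_from(graph, v).values())

def closeness(graph, v):
    """Reciprocal of the summed distances from v (assumed definition)."""
    return 1.0 / sum(distances_from(graph, v).values())

radius = min(eccentricity(graph, v) for v in graph)    # smallest eccentricity
diameter = max(eccentricity(graph, v) for v in graph)  # largest eccentricity
```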




Power metrics.

•   Eigenvector Centrality
     o   Rank vertices according to the primary eigenvector of the adjacency matrix
         representing the network.
     o   In the language of Markov chains, find the stationary probability distribution of the
         chain.




•   PageRank
     o   Ensure a real-valued ranking by introducing a “teleportation-network” which
         ensures strong connectivity (used by Google).




The components to calculate a stationary probability distribution.

•   Take a single “random walker”.

•   Place that random walker on any random vertex in the network.

•   At every time step, the random walker transitions from its current node to an
    adjacent node in the network (i.e. takes a random outgoing edge from its current
    node).

•   Anytime the random walker is at a node, increment a “times visited” counter by 1.

•   Let this algorithm run for an “infinite” amount of time.

•   Normalize the “times visited” counters.
     o   That is your centrality vector.
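
The procedure above can be sketched in a few lines of Python. This is an approximation under stated assumptions: the four-vertex directed network `out_edges` is hypothetical, "infinite" time is replaced by a fixed number of steps, and the seed is fixed only to make the run repeatable. For this particular network the exact stationary distribution is 1/6 for a and d and 1/3 for b and c, so the normalized counters should land close to those values.

```python
import random

# Hypothetical directed network: vertex -> list of vertices it points to.
out_edges = {"a": ["b"], "b": ["c", "d"], "c": ["a", "b"], "d": ["c"]}

def random_walk_centrality(out_edges, steps, seed=0):
    rng = random.Random(seed)
    visits = dict.fromkeys(out_edges, 0)
    current = rng.choice(sorted(out_edges))       # start on a random vertex
    for _ in range(steps):
        visits[current] += 1                      # increment "times visited"
        current = rng.choice(out_edges[current])  # take a random outgoing edge
    total = sum(visits.values())
    return {v: count / total for v, count in visits.items()}  # normalize

centrality = random_walk_centrality(out_edges, steps=200_000)
print(centrality)
```

The walker would get stuck on a vertex with no outgoing edges, which is one of the problems the teleportation network in PageRank addresses.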




Random walker example.

The “times visited” counters of vertices a, b, c, and d after each step of the walk:

    step               vertex visited    a        b        c        d
    0                  (none)            0        0        0        0
    1                  a                 1        0        0        0
    2                  b                 1        1        0        0
    3                  d                 1        1        0        1
    4                  c                 1        1        1        1
    5                  b                 1        2        1        1
    6                  c                 1        2        2        1
    7                  a                 2        2        2        1
    8                  b                 2        3        2        1
    9                  d                 2        3        2        2
    10                 c                 2        3        3        2
    11                 b                 2        4        3        2
    ...
    after a long run                     66785    133321   133310   66784
    normalized                           0.167    0.332    0.332    0.167

PageRank.

•   The random walker has a 0.85 probability of using the original network G as its
    propagation network and a 0.15 probability of using the teleportation network H
    as its propagation network (0.85 is Google’s published alpha value).
•   In the combined network, every node is reachable from every other node; it is
    thus strongly connected.
•   A strongly connected network guarantees a stationary probability distribution.
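
A minimal power-iteration sketch of this idea follows. Several details are assumptions rather than anything stated on the slide: H is taken to be a uniform jump to any vertex, vertices without outgoing edges are treated as jumping uniformly, and the network `G` is a hypothetical four-vertex example.

```python
def pagerank(out_edges, alpha=0.85, iterations=100):
    """Power iteration: follow G with probability alpha, teleport via H otherwise."""
    vertices = sorted(out_edges)
    n = len(vertices)
    rank = dict.fromkeys(vertices, 1.0 / n)
    for _ in range(iterations):
        # H (assumed uniform): with probability 1 - alpha, jump to any vertex.
        nxt = dict.fromkeys(vertices, (1.0 - alpha) / n)
        for v in vertices:
            # Assumption: a vertex with no outgoing edges spreads its rank uniformly.
            targets = out_edges[v] or vertices
            share = alpha * rank[v] / len(targets)
            for w in targets:
                nxt[w] += share
        rank = nxt
    return rank

# Hypothetical directed network G.
G = {"a": ["b"], "b": ["c", "d"], "c": ["a", "b"], "d": ["c"]}
rank = pagerank(G)
print(rank)
```

Each iteration keeps the ranks summing to 1, so the result can be read directly as a probability distribution over vertices.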




Metadata distribution metrics.

•   Scalar Assortativity
     o  What’s the probability of encountering a node with degree x?
     o  What’s the probability of encountering a node with degree x that is
        connected to a node of degree y?
     o  What’s the probability of encountering a node with degree x that is
        connected to a node of degree y that is connected to a node of degree z?
     o  …

•   Discrete Assortativity
     o  What’s the probability of encountering a node with metadata x?
     o  What’s the probability of encountering a node with metadata x that is
        connected to a node of metadata y?
     o  What’s the probability of encountering a node with metadata x that is
        connected to a node of metadata y that is connected to a node of metadata
        z?
     o  …


The CENS network dataset.

•   Center for Embedded Network Sensing at U.C. Los Angeles.
•   “An interdisciplinary and multi-institutional venture, CENS involves hundreds of
    faculty, engineers, graduate student researchers, and undergraduate students
    from multiple disciplines at the partner institutions of University of California at
    Los Angeles (UCLA), University of Southern California (USC), University of
    California Riverside (UCR), California Institute of Technology (Caltech),
    University of California at Merced (UCM), and California State University at Los
    Angeles (CSULA).”




Everything is metadata.


   Affiliation    LANL
   Department     Research Library
   Gender         Male
   JobRank        Ph.D. Student
   Building       P362
   Lab            Prototyping Team
   Advisor        Johan Bollen
   Degree         2

                Marko




 ….                               ….




The CENS coauthorship network.




•   UCLA - red
•   USC - orange
•   Coventry - green
•   …

•   Ph.D. - ellipse
•   Professor - hexagon
•   …




1st-order degree distributions in CENS coauthorship network.




2nd-order degree distributions in CENS coauthorship network.




2nd-order degree assortativity in CENS coauthorship network.




•   Pearson correlation of the degrees at the two ends of each edge.
     o   r in [-1,1]

•   r = 0.212
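
The coefficient can be sketched as below. This is an illustrative implementation, not the code behind the value reported above: it counts every undirected edge in both directions so that the measure is symmetric, and the star network used as input is hypothetical.

```python
from collections import Counter

def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge."""
    degree = Counter(v for edge in edges for v in edge)
    # Count every undirected edge in both directions.
    xs = [degree[u] for u, v in edges] + [degree[v] for u, v in edges]
    ys = [degree[v] for u, v in edges] + [degree[u] for u, v in edges]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so they share a mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var    # note: var is zero when every vertex has the same degree

# Hypothetical star network: one hub joined to three leaves.
star = [("hub", "x"), ("hub", "y"), ("hub", "z")]
r = degree_assortativity(star)
print(r)
```

In the star network every edge joins the highest-degree vertex to a lowest-degree vertex, so the correlation sits at the disassortative extreme of the range.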




2nd-order degree assortativity in other networks.


•   Physics coauthorship: 0.363
•   Biology coauthorship: 0.127
•   Mathematics coauthorship: 0.120
•   Film actor collaborations: 0.208

•   Internet: -0.189
•   World Wide Web: -0.065

•   Neural network: -0.163
•   Marine food web: -0.247

•   Random graph: 0.0
•   Regular graph: 1.0


Newman, M.E.J., “Assortative Mixing in Networks”, Physical Review Letters, 89(20), 2002.


Metadata path frequencies.

Affiliation:
•   1064.0   UCLA        UCLA
•   442.0    USC         USC
•   336.0    USC         UCLA
•   76.0     MIT         UCLA
•   58.0     UCLA        UCM
•   32.0     Caltech     UCLA

JobRank:
•   376.0    Phd         Professor
•   254.0    Phd         Researcher
•   242.0    Researcher  Professor
•   184.0    Phd         Phd
•   142.0    Professor   Professor

Gender:
•   1186.0   Male        Male
•   508.0    Male        Female
•   78.0     Female      Female

Origin:
•   304.0    US          US
•   156.0    India       US
•   70.0     India       India
•   58.0     US          China
•   36.0     India       China
•   28.0     China       China
•   24.0     US          Italy
•   18.0     Iran        India
•   14.0     Iran        US
•   12.0     Greece      India

Department:
•   750.0    CS          CS
•   388.0    EE          CS
•   340.0    EE          EE
•   84.0     CS          CivilEng
•   78.0     Biology     CS
•   74.0     CivilEng    CivilEng
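
Frequencies of this kind can be tallied by looking up the metadata value at each end of every edge. The sketch below uses hypothetical people, affiliations, and coauthorship edges, and it counts each undirected edge once; the figures above may follow a different counting convention (for example, counting each edge in both directions).

```python
from collections import Counter

# Hypothetical vertex metadata and coauthorship edges.
affiliation = {"ann": "UCLA", "bo": "UCLA", "cy": "USC", "di": "USC"}
edges = [("ann", "bo"), ("ann", "cy"), ("cy", "di"), ("bo", "cy")]

def pair_frequencies(edges, metadata):
    """Count how often each unordered pair of metadata values shares an edge."""
    counts = Counter()
    for u, v in edges:
        # Sort so that (USC, UCLA) and (UCLA, USC) are tallied together.
        counts[tuple(sorted((metadata[u], metadata[v])))] += 1
    return counts

freq = pair_frequencies(edges, affiliation)
for pair, count in freq.most_common():
    print(count, *pair)
```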

2nd-order metadata assortativity in CENS coauthorship network.

•   0.696    Gender
•   0.641    Affiliation
•   0.513    Department
•   0.482    Advisor
•   0.426    Lab
•   0.319    Building
•   0.290    Origin
•   0.168    JobRank
•   0.042    Room




3rd-order metadata assortativity in CENS coauthorship network.

•   0.471    Gender
•   0.435    Affiliation
•   0.290    Department
•   0.225    Origin
•   0.207    Advisor
•   0.195    Lab
•   0.170    Building
•   0.032    JobRank
•   0.004    Room




Metadata compression.

•   229.0    US:Male                               US:Female
•   228.0    US:Male                               US:Male
•   197.0    US:Male                               India:Male
•   121.0    India:Male                            India:Male
•   76.0     India:Male                            US:Female
•   36.0     US:Male                               China:Male
•   30.0     US:Female                             US:Female
•   21.0     Taiwan:Male                           China:Male
•   19.0     US:Male                               Italy:Male
•   18.0     India:Male                            South Korea:Male
•   17.0     India:Male                            China:Male
•   17.0     India:Male                            Australia:Male
•   16.0     China:Male                            China:Male
•   16.0     India:Male                            Greece:Male
•   16.0     US:Male                               Mexico:Male




3rd-order metadata compression.




•   1402.0    UCLA:Phd               UCLA:Professor             UCLA:Phd
•   879.0     UCLA:Researcher        UCLA:Professor             UCLA:Phd
•   605.0     UCLA:Professor         UCLA:Professor             UCLA:Phd
•   512.0     UCLA:Researcher        UCLA:Professor             UCLA:Researcher
•   380.0     UCLA:Researcher        UCLA:Professor             UCLA:Professor
•   304.0     USC:Phd                UCLA:Professor             UCLA:Phd
•   294.0     USC:Phd                USC:Professor              USC:Phd
•   272.0     UCLA:Phd               UCLA:Phd                   UCLA:Phd
•   270.0     UCLA:Professor         UCLA:Phd                   UCLA:Phd




Breather.




What is the Semantic Web?


•   The figurehead of the Semantic Web initiative, Tim Berners-Lee, describes the
    Semantic Web as
     o   “... an extension of the current web in which information is given well-defined
         meaning, better enabling computers and people to work in cooperation.”

•   Perhaps not the best definition. It implies a particular application space--namely
    the “web metadata and intelligent agents” space.

•   My definition is that the Semantic Web is
     o   “a highly-distributed, standardized semantic network data model--a URG (Uniform
         Resource Graph). It’s a uniform way of graphing resources.”




What is a resource?

•   Resource = Anything.
     o   Anything that can be identified.

•   The Uniform Resource Identifier (URI):
     o   <scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]
           -   http://www.lanl.gov
           -   urn:uuid:550e8400-e29b-41d4-a716-446655440000
           -   urn:issn:0892-3310
           -   http://www.lanl.gov#MarkoRodriguez
                  –   prefix it to make it easier on the eyes -- lanl:MarkoRodriguez


•   The Semantic Web
     o   “first identify it, then relate it!”
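
The URI parts listed above can be inspected with Python's standard `urllib.parse.urlsplit`. The `prefixes` table and the `abbreviate` helper are hypothetical, added only to illustrate how a prefix such as `lanl:` shortens a full URI.

```python
from urllib.parse import urlsplit

# Split two of the example URIs into scheme, authority, path, and fragment.
for uri in ["http://www.lanl.gov#MarkoRodriguez", "urn:issn:0892-3310"]:
    parts = urlsplit(uri)
    print(parts.scheme, parts.netloc, parts.path, parts.fragment)

# Hypothetical prefix table mapping a short name to a namespace URI.
prefixes = {"lanl": "http://www.lanl.gov#"}

def abbreviate(uri, prefixes):
    """Replace a known namespace with its prefix; otherwise return the URI as is."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return prefix + ":" + uri[len(namespace):]
    return uri

print(abbreviate("http://www.lanl.gov#MarkoRodriguez", prefixes))
```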




The technologies of the Semantic Web.


•   Resource Description Framework (RDF): The foundation technology of the
    Semantic Web. RDF is a highly-distributed, semantic network data model. In
    RDF, URIs and literals (e.g. ints, doubles, strings) are related to one another in
    triples.
     o   <lanl:marko> <lanl:worksWith> <lanl:jhw>
     o   <lanl:jhw> <lanl:wrote> <lanl:LAUR-07-2028>
     o   <lanl:LAUR-07-2028> <lanl:hasTitle> “Web-Based Collective Decision Making
         Systems”^^<xsd:string>


•   RDF Schema (RDFS): The ontology is to the Semantic Web as the schema is to
    the relational database.
     o   “Anything of rdf:type lanl:Human can lanl:drive anything of rdf:type lanl:Car.”


•   Triple-Store: The triple-store is to semantic networks what the relational
    database is to the data table.
     o   a.k.a. semantic repository, graph database, RDF database.
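
To make the triple idea concrete, the example triples above can be held in memory as Python tuples and queried by pattern. This is a toy sketch, not a real triple-store: the `match` helper is hypothetical, prefixed names are kept as plain strings, and the literal's datatype is dropped.

```python
# The example triples, stored as (subject, predicate, object) tuples.
triples = {
    ("lanl:marko", "lanl:worksWith", "lanl:jhw"),
    ("lanl:jhw", "lanl:wrote", "lanl:LAUR-07-2028"),
    ("lanl:LAUR-07-2028", "lanl:hasTitle",
     "Web-Based Collective Decision Making Systems"),
}

def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every non-None position of the pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "What did lanl:jhw write?"
written_by_jhw = match(triples, s="lanl:jhw", p="lanl:wrote")
print(written_by_jhw)
```

Leaving a position as `None` plays the role of a variable in a query pattern.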



RDF and RDFS.




ontology
     o   <lanl:isEating> <rdfs:domain> <lanl:Human>
     o   <lanl:isEating> <rdfs:range> <lanl:Food>

instance
     o   <lanl:marko> <rdf:type> <lanl:Human>
     o   <lanl:cookie> <rdf:type> <lanl:Food>
     o   <lanl:marko> <lanl:isEating> <lanl:cookie>


 RDF is not a syntax. It’s a data model. Various syntaxes exist to encode RDF, including RDF/XML, N-Triples, TriX, and N3.




The triple-store.

•    There are two primary ways to distribute information on the Semantic Web.
      o   1.) Publish an RDF/XML document on a web server.
      o   2.) Expose a public interface to an RDF triple-store.
•    The triple store is to semantic networks what the relational database is to data
     tables.
      o   Storing and querying triples in a triple store.
      o   SPARQLUpdate query language.
            - like SQL, but for triple-stores.


    SELECT ?a ?c WHERE
          { ?a type human
            ?a wrote ?b
            ?b type article
            ?c wrote ?b
            ?c type human
            ?a != ?c }

    INSERT ?a coauthor ?c WHERE
          { ?a type human
            ?a wrote ?b
            ?b type article
            ?c wrote ?b
            ?c type human
            ?a != ?c }

    DELETE ?s ?p ?o WHERE { ?s ?p ?o }
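
The pattern matching behind the SELECT and INSERT queries can be imitated over an in-memory set of triples. This Python sketch is not a SPARQL engine: the data is hypothetical, and the `coauthor_pairs` helper hard-codes the one pattern used in the queries above.

```python
from itertools import permutations

# Hypothetical triples using the unprefixed names from the queries above.
triples = {
    ("marko", "type", "human"), ("johan", "type", "human"),
    ("paper1", "type", "article"),
    ("marko", "wrote", "paper1"), ("johan", "wrote", "paper1"),
}

def coauthor_pairs(triples):
    """Ordered pairs (?a, ?c) of distinct humans who wrote the same article."""
    humans = {s for s, p, o in triples if p == "type" and o == "human"}
    articles = {s for s, p, o in triples if p == "type" and o == "article"}
    pairs = set()
    for article in articles:
        authors = {s for s, p, o in triples
                   if p == "wrote" and o == article and s in humans}
        pairs.update(permutations(authors, 2))  # enforces ?a != ?c
    return pairs

# The INSERT form: add one coauthor triple per matched pair.
new_triples = triples | {(a, "coauthor", c) for a, c in coauthor_pairs(triples)}
print(sorted(coauthor_pairs(triples)))
```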


Triple-store vs. relational database.


Triple-store (SPARQL interface):

    SELECT (?x4)
     WHERE {
      ?x1 dc:creator lanl:LAUR-06-2139 .
      ?x1 lanl:hasFriend ?x2 .
      ?x2 lanl:worksFor ?x3 .
      ?x3 lanl:collaboratesWith ?x4 .
      ?x4 lanl:hasEmployee ?x1 . }

Relational database (SQL interface):

    SELECT collaboratesWithTable.orgId2
      FROM personTable, authorTable, articleTable, friendTable,
           hasEmployeeTable, organizationTable, worksForTable,
           collaboratesWithTable
     WHERE
      personTable.id = authorTable.personId AND
      authorTable.articleId = "dc:creator LAUR-06-2139" AND
      personTable.id = friendTable.personId1 AND
      friendTable.personId2 = worksForTable.personId AND
      worksForTable.orgId = collaboratesWithTable.orgId2 AND
      collaboratesWithTable.orgId2 = personTable.id




Semantic network metrics?

•   What does it mean to run a shortest-path calculation on a semantic network?
     o   Shortest path along which semantic, i.e. which edge type(s)?




•   What does it mean to calculate PageRank on a semantic network?
     o   What are legal semantics for the random walker?




Shortest-path metrics in a semantic network?




[Figure: a semantic network over lanl:marko, lanl:johan, lanl:chuck, lanl:bob, and lanl:jill, connected by lanl:hasFriend edges and one lanl:livesInSameCityAs edge.]

“What is the shortest path between lanl:marko and lanl:jill by taking only lanl:hasFriend edges?”
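
One way to read the question above is as an ordinary breadth-first search that ignores every edge whose label is not allowed. The sketch below assumes that reading. The edge list is hypothetical, because the figure's exact edges are not recoverable here, and `shortest_path` is an illustrative helper rather than the author's grammar-based method.

```python
from collections import deque


def shortest_path(triples, source, target, allowed_labels=None):
    """Breadth-first search over out-edges whose label is in `allowed_labels`.

    allowed_labels=None means every edge label is legal.
    Returns the list of vertices on one shortest path, or None if unreachable.
    """
    out_edges = {}
    for s, p, o in triples:
        if allowed_labels is None or p in allowed_labels:
            out_edges.setdefault(s, []).append(o)

    parent = {source: None}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if vertex == target:
            path = []
            while vertex is not None:      # walk the parent links back to the source
                path.append(vertex)
                vertex = parent[vertex]
            return path[::-1]
        for neighbor in out_edges.get(vertex, []):
            if neighbor not in parent:
                parent[neighbor] = vertex
                queue.append(neighbor)
    return None


# Hypothetical edges (assumed for illustration only).
triples = [
    ("lanl:marko", "lanl:hasFriend", "lanl:johan"),
    ("lanl:marko", "lanl:hasFriend", "lanl:chuck"),
    ("lanl:johan", "lanl:hasFriend", "lanl:bob"),
    ("lanl:bob", "lanl:hasFriend", "lanl:jill"),
    ("lanl:marko", "lanl:livesInSameCityAs", "lanl:jill"),
]

friend_path = shortest_path(triples, "lanl:marko", "lanl:jill", {"lanl:hasFriend"})
any_path = shortest_path(triples, "lanl:marko", "lanl:jill")
```

With these assumed edges, the unrestricted search uses the lanl:livesInSameCityAs shortcut, while the lanl:hasFriend-only search has to go through lanl:johan and lanl:bob. The "shortest path" therefore depends on which semantics are legal.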

PageRank in a semantic network?

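[Figure: lanl:marko, lanl:johan, and lanl:chuck are rdf:type lanl:Human; lanl:p1 is rdf:type lanl:Article. The network also contains two lanl:wrote edges and one lanl:hasFriend edge. Question marks in the figure ask which edges the random walker may take.]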

Grammar-based geodesics and random walkers.

•   How do you port many of the undirected and directed single-relational network
    analysis algorithms over to the semantic network domain?
     o   My solution is what I call a grammar.



•   Nearly every network analysis algorithm can be represented in terms of a walker
    traversing a network.
     o   Geodesics.
     o   PageRank.
     o   Metadata paths.
     o   etc.




Components of a grammar-based walker.

•   A walker.
     o   Discrete element.

•   A grammar.
     o   An abstract representation of the legal paths the walker can take.
           - e.g. “you can traverse a lanl:friendOf edge from a lanl:Human to another
             lanl:Human.”
           - Also includes rules: “increment a counter.”, “don’t ever return to this vertex.”


•   A data set that respects the ontological “expectations” of the grammar.




Grammar-based PageRank example.

[Figure: the author network of lanl:marko, lanl:johan, lanl:chuck, and article lanl:p1, with visit counters lanl:marko = 0, lanl:johan = 0, lanl:chuck = 0.]

“Take only lanl:wrote out-edge to a resource of rdf:type lanl:Article. Then take a
lanl:wrote in-edge to a resource of rdf:type lanl:Human. Increment only lanl:Humans.
Make sure that the lanl:Human seen before is not the same lanl:Human currently.
Repeat infinitely.”


Grammar-based PageRank example.

[Figure: the author network of lanl:marko, lanl:johan, lanl:chuck, and article lanl:p1, with visit counters lanl:marko = 1, lanl:johan = 0, lanl:chuck = 0.]

“Take only lanl:wrote out-edge to a resource of rdf:type lanl:Article. Then take a
lanl:wrote in-edge to a resource of rdf:type lanl:Human. Increment only lanl:Humans.
Make sure that the lanl:Human seen before is not the same lanl:Human currently.
Repeat infinitely.”


Grammar-based PageRank example.

[Figure: the author network of lanl:marko, lanl:johan, lanl:chuck, and article lanl:p1, with visit counters lanl:marko = 1, lanl:johan = 1, lanl:chuck = 0.]

“Take only lanl:wrote out-edge to a resource of rdf:type lanl:Article. Then take a
lanl:wrote in-edge to a resource of rdf:type lanl:Human. Increment only lanl:Humans.
Make sure that the lanl:Human seen before is not the same lanl:Human currently.
Repeat infinitely.”


Grammar-based PageRank example.

[Figure: the author network of lanl:marko, lanl:johan, lanl:chuck, and article lanl:p1, with visit counters lanl:marko = 2, lanl:johan = 1, lanl:chuck = 0.]

“Take only lanl:wrote out-edge to a resource of rdf:type lanl:Article. Then take a
lanl:wrote in-edge to a resource of rdf:type lanl:Human. Increment only lanl:Humans.
Make sure that the lanl:Human seen before is not the same lanl:Human currently.
Repeat infinitely.”
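
The walker in these frames can be sketched as follows. This is a minimal illustration, assuming the grammar is hard-coded as one function (`coauthor_steps`) instead of being expressed in a general grammar language. The triples are an assumed reading of the figure, in particular that lanl:marko and lanl:johan both wrote lanl:p1, and the lanl:hasFriend endpoints are hypothetical.

```python
import random
from collections import Counter

# Assumed reading of the figure; the lanl:hasFriend endpoints are hypothetical.
TRIPLES = [
    ("lanl:marko", "rdf:type", "lanl:Human"),
    ("lanl:johan", "rdf:type", "lanl:Human"),
    ("lanl:chuck", "rdf:type", "lanl:Human"),
    ("lanl:p1", "rdf:type", "lanl:Article"),
    ("lanl:marko", "lanl:wrote", "lanl:p1"),
    ("lanl:johan", "lanl:wrote", "lanl:p1"),
    ("lanl:marko", "lanl:hasFriend", "lanl:chuck"),
]


def has_type(vertex, rdf_class):
    return (vertex, "rdf:type", rdf_class) in TRIPLES


def coauthor_steps(human):
    """All (article, other_human) moves the grammar allows from `human`."""
    steps = []
    for s, p, article in TRIPLES:
        # lanl:wrote out-edge to a resource of rdf:type lanl:Article ...
        if s == human and p == "lanl:wrote" and has_type(article, "lanl:Article"):
            for other, p2, o2 in TRIPLES:
                # ... then a lanl:wrote in-edge to a different lanl:Human.
                if (p2 == "lanl:wrote" and o2 == article and other != human
                        and has_type(other, "lanl:Human")):
                    steps.append((article, other))
    return steps


def walk(start, n_increments, rng):
    """Run the walker until `n_increments` counter increments, or until it is stuck."""
    counts = Counter()
    human = start
    counts[human] += 1                  # increment only lanl:Humans
    while sum(counts.values()) < n_increments:
        steps = coauthor_steps(human)
        if not steps:                   # no legal path: the walker halts
            break
        _, human = rng.choice(steps)
        counts[human] += 1
    return counts


counts = walk("lanl:marko", 3, random.Random(0))
```

In this assumed data set the walker never uses the lanl:hasFriend edge and never reaches lanl:chuck, because the grammar only admits lanl:wrote edges. "Repeat infinitely" is replaced by a finite increment budget so the sketch terminates.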


Grammars create implicit relationships.

[Figure: the author network of lanl:marko, lanl:johan, lanl:chuck, and article lanl:p1, now with a lanl:hasCoauthor edge drawn between lanl:marko and lanl:johan.]

Conclusions.

•   Many data sets can be represented as a network of “actors”.
•   There exist many network analysis algorithms.
     o   Shortest-path metrics.
     o   Eigenvector-based metrics.
     o   Assortativity coefficients.

•   The semantic network data structure is a less studied data model.
     o   The Semantic Web community doesn’t take a network approach to its substrate.

•   The grammar technique can be used to port many of the common network
    analysis algorithms to the semantic network domain.
     o   Grammar-based geodesics.
     o   Grammar-based random walkers.




Related publications.

•   Rodriguez, M.A., Watkins, J.H., Bollen, J., Gershenson, C., “Using RDF to Model the Structure and
    Process of Systems”, International Conference on Complex Systems, Boston, Massachusetts,
    LA-UR-07-5720, October 2007.
•   Rodriguez, M.A., Bollen, J., Van de Sompel, H., “A Practical Ontology for the Large-Scale Modeling of
    Scholarly Artifacts and their Usage”, 2007 ACM/IEEE Joint Conference on Digital Libraries,
    pages 278-287, Vancouver, Canada, ACM/IEEE Computing, doi:10.1145/1255175.1255229,
    LA-UR-07-0665, June 2007.
•   Rodriguez, M.A., “Social Decision Making with Multi-Relational Networks and Grammar-Based
    Particle Swarms”, 2007 Hawaii International Conference on Systems Science (HICSS), pages 39-49,
    Waikoloa, Hawaii, IEEE Computer Society, ISSN: 1530-1605, doi:10.1109/HICSS.2007.487,
    LA-UR-06-2139, January 2007.
•   Rodriguez, M.A., “A Multi-Relational Network to Support the Scholarly Communication Process”,
    International Journal of Public Information Systems, volume 2007, issue 1, pages 13-29, ISSN: 1653-4360,
    LA-UR-06-2416, March 2007.
•   Rodriguez, M.A., “Mapping Semantic Networks to Undirected Networks”, LA-UR-07-5287, August 2007.
•   Rodriguez, M.A., Watkins, J.H., “Grammar-Based Geodesics in Semantic Networks”, LA-UR-07-4042,
    June 2007.
•   Rodriguez, M.A., Bollen, J., “Modeling Computations in a Semantic Network”, LA-UR-07-3678, May
    2007.
•   Rodriguez, M.A., “General-Purpose Computing on a Semantic Network Substrate”, LA-UR-07-2885,
    April 2007.
•   Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks”, LA-UR-06-7791, November
    2006.





The Network: A Data Structure that Links Domains

  • 1. The Network: A Data Structure that Links Domains Marko A. Rodriguez Los Alamos National Laboratory Vrije Universiteit Brussel University of California at Santa Cruz marko@lanl.gov http://www.soe.ucsc.edu/~okram Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 2. About me. • Marko Antonio Rodriguez. • Bachelors of Science in Cognitive Science from U.C. San Diego. • Minor in the Arts in Computer Music from U.C. San Diego. • Masters of Science in Computer Science from U.C. Santa Cruz. • Visiting Researcher at the Center for Evolution, Complexity, and Cognition at the Free University of Brussels. • Ph.d. in Computer Science from U.C. Santa Cruz [soon]. o I defend November 15, 2007! • Researcher at the Los Alamos National Laboratory since 2005. Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 3. Research trends. • MESUR: Metrics from Scholarly Usage of Resources. (http://www.mesur.org) • Neno/Fhat: A Semantic Network Programming Language and Virtual Machine Architecture. (http://neno.lanl.gov) • CDMS: Collective Decision Making Systems. (http://cdms.lanl.gov) Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 4. What is a network? • A network is a data structure that is used to connect vertices/nodes/dots by means of edges/links/lines. • Networks are everywhere. o Social: friendship, trust, communication, collaboration. o Technological: web-pages, communication, software dependencies, circuits. o Scholarly: journals, authors, articles, institutions. o Natural: protein interaction, neural, food web. Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 5. The undirected network. • There is the undirected network of common knowledge. o Sometimes called an undirected single-relational network. o e.g. vertex i and vertex j are “related”. • The semantic of the edge denotes the network type. o e.g. friendship network, collaboration network, etc. i j Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 6. Example undirected network. Jen Alberto Whenzong Luda Aric Herbert Zhiwu Ed Johan Stephan Marko Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 7. The directed network. • Then there is the directed network of common knowledge. o Sometimes called a directed single-relational network. o For example, vertex i is related to vertex j, but j is not related to i. i j Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 8. Example directed network. Human Lion Hyena Deer Fish Muskrat Meerkat Fox Wolf Beetle Bear Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 9. The semantic network. • Finally, there is the semantic network o Sometimes called a directed multi-relational network. o For example, vertex i is related to vertex j by the semantic s, but j is not related to i by the semantic s. i j s Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 10. Example semantic network. LANL hasLab UnitedStates Arnold researches locatedIn stateOf stateOf governerOf Atoms cityOf NewMexico SantaFe California madeOf hasResident originallyFrom Ryan northOf livesIn southOf Cells madeOf worksWith Oregon Marko Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 11. What are the techniques for analysis? • Degree statistics o How many in- and out-edges does vertex i have? o What is the maximum and minimum in- and out-degree of the network? • Shortest-path metrics o What is the smallest number of steps to get from vertex i to vertex j? o How many of the shortest-paths go through vertex i? • Power metrics o What vertices are the most “influential”? • Metadata distributions o What is the probability that a vertex of type x is connected to a vertex of type y? Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 12. Degree statistics. Max_out = 4 Max_in = 4 Min_out = 0 out = 2 Min_in = 0 in = 1 out = 3 Human in = 0 out = 1 Lion Hyena in = 0 out = 0 in = 2 out = 0 out = 1 in = 4 Deer in = 1 Fish out = 1 Muskrat Meerkat in = 3 out = 1 Fox in = 1 out = 1 Wolf out = 0 in = 1 Beetle in = 1 Bear out = 4 in = 0 Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 13. Shortest-path between Marko and Aric. Jen Alberto Whenzong Luda Aric Herbert Zhiwu Ed Johan Stephan Marko Shortest path = 1 Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 14. Eccentricity of Marko. Jen Alberto Whenzong Luda Aric Herbert Zhiwu Ed Johan Stephan Marko Eccentricity = 3 Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 15. Radius of the network. Jen Alberto Whenzong Luda Aric Herbert Zhiwu Ed Johan Stephan Marko Radius = 3 Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 16. Diameter of the network. Jen Alberto Whenzong Luda Aric Herbert Zhiwu Ed Johan Stephan Marko Diameter = 4 Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 17. Closeness of Marko. Jen Alberto Whenzong Luda Aric Herbert Zhiwu Ed Johan Stephan Marko Closeness = 0.0526 Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 18. Shortest-path metrics. Shortest-path Eccentricity Radius Diameter Closeness Betweenness Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 19. Power metrics. • Eigenvector Centrality o Rank vertices according to the primary eigenvector of the adjacency matrix representing the network. o In the language of Markov chains, find the stationary probability distribution of the chain. • PageRank o Ensure a real-valued ranking by introducing a “teleportation-network” which ensures strong connectivity (used by Google). Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 20. The components to calculate a stationary probability distribution. • Take a single “random walker”. • Place that random walker on any random vertex in the network. a • At every time step, the random walker transitions from its current node to an adjacent node in the network (i.e. takes a random outgoing edge from its current node.) • Anytime the random walker is at a node, increment a “times visited” counter by 1. 1 • Let this algorithm run for an “infinite” amount of time. • Normalize the “times visited” counters. o That is your centrality vector. 0.0123 Marko A. Rodriguez University of New Mexico, September 14, 2007
  • 21-34. Random walker example. [Animation over a four-vertex network with vertices a, b, c, and d. Every “times visited” counter starts at 0 and is incremented by 1 each time the walker lands on that vertex. After a long run the counters read 133321, 66785, 66784, and 133310, and the final frame shows the normalized values 0.332, 0.167, 0.167, and 0.332.]
  • 35. PageRank. • At each step, the random walker has a 0.85 probability of using G (the original network) as its propagation network and a 0.15 probability of using H (the “teleportation network”) as its propagation network (0.85 is Google’s published alpha value). • In the combined network every node is reachable from every other node, so the network is strongly connected. • A strongly connected network guarantees a stationary probability distribution.
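The mixture of G and H can be sketched as a power iteration. This is a minimal sketch under stated assumptions: H is taken to be a complete network in which every vertex links to every vertex with equal probability, vertices without outgoing edges are treated as always teleporting, and the function name `pagerank` and the example network are hypothetical.

```python
def pagerank(graph, alpha=0.85, iterations=100):
    """Power iteration for PageRank.

    `graph` maps each vertex to the list of vertices it links to (network G).
    With probability `alpha` the walker follows an edge of G; with probability
    1 - alpha it follows an edge of the teleportation network H, assumed here
    to connect every vertex to every vertex uniformly.
    """
    nodes = sorted(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Teleportation share: every vertex receives (1 - alpha) / n.
        new_rank = {v: (1.0 - alpha) / n for v in nodes}
        for v in nodes:
            targets = graph[v]
            if targets:
                share = alpha * rank[v] / len(targets)
                for w in targets:
                    new_rank[w] += share
            else:
                # Assumption: a vertex with no outgoing edge always teleports.
                for w in nodes:
                    new_rank[w] += alpha * rank[v] / n
        rank = new_rank
    return rank


# Hypothetical directed network: d links into a cycle but nothing links to d.
graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
rank = pagerank(graph)
```

Without H, vertex d could never be revisited and the walker's visit counts would depend on where it started. With H, d keeps the small teleportation share (1 - alpha) / n and the ranking no longer depends on the starting vertex.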
  • 36. Metadata distribution metrics. • Scalar Assortativity o What’s the probability of encountering a node with degree x? o What’s the probability of encountering a node with degree x that is connected to a node of degree y? o What’s the probability of encountering a node with degree x that is connected to a node of degree y that is connected to a node of degree z? o … • Discrete Assortativity o What’s the probability of encountering a node with metadata x? o What’s the probability of encountering a node with metadata x that is connected to a node of metadata y? o What’s the probability of encountering a node with metadata x that is connected to a node of metadata y that is connected to a node of metadata z? o …
  • 37. The CENS network dataset. • Center for Embedded Network Sensing at U.C. Los Angeles. • “An interdisciplinary and multi-institutional venture, CENS involves hundreds of faculty, engineers, graduate student researchers, and undergraduate students from multiple disciplines at the partner institutions of University of California at Los Angeles (UCLA), University of Southern California (USC), University of California Riverside (UCR), California Institute of Technology (Caltech), University of California at Merced (UCM), and California State University at Los Angeles (CSULA).”
  • 38. Everything is metadata. [Figure: the vertex Marko annotated with metadata.] • Affiliation: LANL • Department: Research Library • Gender: Male • JobRank: Ph.D. Student • Building: P362 • Lab: Prototyping Team • Advisor: Johan Bollen • Degree: 2 • …
  • 39. The CENS coauthorship network. • UCLA - red • USC - orange • Coventry - green • … • Ph.D. - ellipse • Professor - hexagon • …
  • 40. 1st-order degree distributions in CENS coauthorship network.
  • 41. 2nd-order degree distributions in CENS coauthorship network.
  • 42. 2nd-order degree assortativity in CENS coauthorship network. • Pearson correlation of the degrees found at the two ends of each edge. o r in [-1, 1] • r = 0.212
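The Pearson correlation on edge degrees can be sketched as follows. This is an illustrative sketch, assuming an undirected network given as a list of vertex pairs, with each edge contributing its two endpoint degrees in both orders; the function name `degree_assortativity` and the star network are hypothetical and are not the CENS data.

```python
from math import sqrt


def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge.

    `edges` is a list of (u, v) pairs describing an undirected network.
    Returns None when every edge end has the same degree, because the
    Pearson formula then divides by a zero variance.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    xs, ys = [], []
    for u, v in edges:
        # Count each undirected edge in both directions.
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Hypothetical star network: one hub joined to three leaves.
star = [("hub", "x"), ("hub", "y"), ("hub", "z")]
r_star = degree_assortativity(star)
```

In the star network every edge joins the highest-degree vertex to a lowest-degree vertex, so the coefficient sits at the disassortative extreme of the range.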
  • 43. 2nd-order degree assortativity in other networks. • Physics coauthorship: 0.363 • Biology coauthorship: 0.127 • Mathematics coauthorship: 0.120 • Film actor collaborations: 0.208 • Internet: -0.189 • World Wide Web: -0.065 • Neural network: -0.163 • Marine food web: -0.247 • Random graph: 0.0 • Regular graph: 1.0 Newman, M.E.J., “Assortative Mixing in Networks”, Physical Review Letters, 89(20), 2002.
  • 44. Metadata path frequencies. • Affiliation: 1064.0 UCLA UCLA; 442.0 USC USC; 336.0 USC UCLA; 76.0 MIT UCLA; 58.0 UCLA UCM; 32.0 Caltech UCLA • JobRank: 376.0 Phd Professor; 254.0 Phd Researcher; 242.0 Researcher Professor; 184.0 Phd Phd; 142.0 Professor Professor • Gender: 1186.0 Male Male; 508.0 Male Female; 78.0 Female Female • Origin: 304.0 US US; 156.0 India US; 70.0 India India; 58.0 US China; 36.0 India China; 28.0 China China; 24.0 US Italy; 18.0 Iran India; 14.0 Iran US; 12.0 Greece India • Department: 750.0 CS CS; 388.0 EE CS; 340.0 EE EE; 84.0 CS CivilEng; 78.0 Biology CS; 74.0 CivilEng CivilEng
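Path frequencies of this kind can be tallied by walking the edge list and counting the pair of metadata values found at each edge's endpoints. The sketch below is an assumption-laden illustration: the function name `metadata_path_frequencies`, the three-person network, and the choice to count each undirected edge once as an unordered pair are all hypothetical, and the talk's exact counting convention is not stated.

```python
from collections import Counter


def metadata_path_frequencies(edges, metadata, key):
    """Count how often each unordered pair of metadata values shares an edge.

    `edges` is a list of (u, v) pairs; `metadata` maps each vertex to a dict
    of metadata fields; `key` selects the field to tally (e.g. "gender").
    """
    counts = Counter()
    for u, v in edges:
        pair = tuple(sorted((metadata[u][key], metadata[v][key])))
        counts[pair] += 1
    return counts


# Hypothetical coauthorship data (not the CENS dataset).
metadata = {
    "ann": {"gender": "Female", "affiliation": "UCLA"},
    "bob": {"gender": "Male", "affiliation": "UCLA"},
    "raj": {"gender": "Male", "affiliation": "USC"},
}
edges = [("ann", "bob"), ("bob", "raj"), ("ann", "raj")]

gender_counts = metadata_path_frequencies(edges, metadata, "gender")
affiliation_counts = metadata_path_frequencies(edges, metadata, "affiliation")
```

Longer paths (the 3rd-order case) would extend the same idea from single edges to pairs of adjacent edges.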
  • 45. 2nd-order metadata assortativity in CENS coauthorship network. • 0.696 Gender • 0.641 Affiliation • 0.513 Department • 0.482 Advisor • 0.426 Lab • 0.319 Building • 0.290 Origin • 0.168 JobRank • 0.042 Room
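One common way to turn pair frequencies into a single coefficient is Newman's assortativity coefficient for discrete characteristics, r = (Σ e_ii - Σ a_i b_i) / (1 - Σ a_i b_i), where e_ij is the fraction of edge ends joining type i to type j. Whether the values on this slide were computed exactly this way is an assumption; the sketch below only illustrates that definition, and the function name and example labels are hypothetical.

```python
def discrete_assortativity(edges, label):
    """Newman-style assortativity coefficient for a discrete vertex label.

    `edges` is a list of (u, v) pairs of an undirected network and `label`
    maps each vertex to its metadata value. Returns None when the
    denominator is zero (for example, when every vertex has the same label).
    """
    total = 2 * len(edges)  # each undirected edge is counted in both directions
    e = {}
    for u, v in edges:
        for pair in ((label[u], label[v]), (label[v], label[u])):
            e[pair] = e.get(pair, 0.0) + 1.0 / total

    kinds = {k for pair in e for k in pair}
    trace = sum(e.get((k, k), 0.0) for k in kinds)
    a = {k: sum(e.get((k, j), 0.0) for j in kinds) for k in kinds}
    b = {k: sum(e.get((i, k), 0.0) for i in kinds) for k in kinds}
    expected = sum(a[k] * b[k] for k in kinds)
    if abs(1.0 - expected) < 1e-12:
        return None
    return (trace - expected) / (1.0 - expected)


# Hypothetical labels: two vertices of each gender.
label = {"m1": "Male", "m2": "Male", "f1": "Female", "f2": "Female"}
r_same = discrete_assortativity([("m1", "m2"), ("f1", "f2")], label)
r_mixed = discrete_assortativity([("m1", "f1"), ("m2", "f2")], label)
```

When edges only join vertices with the same label the coefficient reaches its maximum, and when edges only join vertices with different labels it reaches its minimum for that label distribution.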
  • 46. 3rd-order metadata assortativity in CENS coauthorship network. • 0.471 Gender • 0.435 Affiliation • 0.290 Department • 0.225 Origin • 0.207 Advisor • 0.195 Lab • 0.170 Building • 0.032 JobRank • 0.004 Room
  • 47. Metadata compression. • 229.0 US:Male US:Female • 228.0 US:Male US:Male • 197.0 US:Male India:Male • 121.0 India:Male India:Male • 76.0 India:Male US:Female • 36.0 US:Male China:Male • 30.0 US:Female US:Female • 21.0 Taiwan:Male China:Male • 19.0 US:Male Italy:Male • 18.0 India:Male South Korea:Male • 17.0 India:Male China:Male • 17.0 India:Male Australia:Male • 16.0 China:Male China:Male • 16.0 India:Male Greece:Male • 16.0 US:Male Mexico:Male
  • 48. 3rd-order metadata compression. • 1402.0 UCLA:Phd UCLA:Professor UCLA:Phd • 879.0 UCLA:Researcher UCLA:Professor UCLA:Phd • 605.0 UCLA:Professor UCLA:Professor UCLA:Phd • 512.0 UCLA:Researcher UCLA:Professor UCLA:Researcher • 380.0 UCLA:Researcher UCLA:Professor UCLA:Professor • 304.0 USC:Phd UCLA:Professor UCLA:Phd • 294.0 USC:Phd USC:Professor USC:Phd • 272.0 UCLA:Phd UCLA:Phd UCLA:Phd • 270.0 UCLA:Professor UCLA:Phd UCLA:Phd
  • 49. Breather.
  • 50. Example semantic network. [Figure: a semantic network whose vertices include LANL, UnitedStates, Arnold, Atoms, NewMexico, SantaFe, California, Ryan, Cells, Oregon, and Marko, connected by typed edges such as hasLab, researches, locatedIn, stateOf, governerOf, cityOf, madeOf, hasResident, originallyFrom, northOf, southOf, livesIn, and worksWith.]
  • 51. What is the Semantic Web? • The figurehead of the Semantic Web initiative, Tim Berners-Lee, describes the Semantic Web as o “... an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” • Perhaps not the best definition. It implies a particular application space--namely the “web metadata and intelligent agents” space. • My definition is that the Semantic Web is o “a highly-distributed, standardized semantic network data model--a URG (Uniform Resource Graph). It’s a uniform way of graphing resources.”
  • 52. What is a resource? • Resource = Anything. o Anything that can be identified. • The Uniform Resource Identifier (URI): o <scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ] - http://www.lanl.gov - urn:uuid:550e8400-e29b-41d4-a716-446655440000 - urn:issn:0892-3310 - http://www.lanl.gov#MarkoRodriguez (with a namespace prefix it is easier on the eyes: lanl:MarkoRodriguez) • The Semantic Web o “first identify it, then relate it!”
  • 53. The technologies of the Semantic Web. • Resource Description Framework (RDF): The foundation technology of the Semantic Web. RDF is a highly-distributed, semantic network data model. In RDF, URIs and literals (e.g. ints, doubles, strings) are related to one another in triples. o <lanl:marko> <lanl:worksWith> <lanl:jhw> o <lanl:jhw> <lanl:wrote> <lanl:LAUR-07-2028> o <lanl:LAUR-07-2028> <lanl:hasTitle> “Web-Based Collective Decision Making Systems”^^<xsd:string> • RDF Schema (RDFS): The ontology is to the Semantic Web as the schema is to the relational database. o “Anything of rdf:type lanl:Human can lanl:drive anything of rdf:type lanl:Car.” • Triple-Store: The triple-store is to semantic networks what the relational database is to the data table. o a.k.a. semantic repository, graph database, RDF database.
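The triple model can be illustrated with plain Python tuples. This is only a toy sketch of the data model, not an RDF library: the helper `match` is hypothetical, prefixed names are kept as plain strings, and the typed literal is stored as an ordinary string.

```python
# Each statement is a (subject, predicate, object) triple, as on the slide.
triples = {
    ("lanl:marko", "lanl:worksWith", "lanl:jhw"),
    ("lanl:jhw", "lanl:wrote", "lanl:LAUR-07-2028"),
    ("lanl:LAUR-07-2028", "lanl:hasTitle",
     "Web-Based Collective Decision Making Systems"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Who wrote what?
wrote = match(triples, p="lanl:wrote")
# Everything stated about the report.
about_report = match(triples, s="lanl:LAUR-07-2028")
```

Because every statement has the same three-part shape, a single pattern-matching function covers all lookups, which is the property a triple-store builds on.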
  • 54. RDF and RDFS. [Figure: at the ontology level, lanl:isEating has rdfs:domain lanl:Human and rdfs:range lanl:Food. At the instance level, lanl:marko lanl:isEating lanl:cookie, where lanl:marko is of rdf:type lanl:Human and lanl:cookie is of rdf:type lanl:Food.] • RDF is not a syntax. It’s a data model. Various syntaxes exist to encode RDF, including RDF/XML, N-Triples, TriX, and N3.
  • 55. The triple-store. • There are two primary ways to distribute information on the Semantic Web. o 1.) publish an RDF/XML document on a web server. o 2.) expose a public interface to an RDF triple-store. • The triple-store is to semantic networks what the relational database is to data tables. o Storing and querying triples in a triple-store. o SPARQLUpdate query language. - like SQL, but for triple-stores.
        SELECT ?a ?c WHERE {
          ?a type human
          ?a wrote ?b
          ?b type article
          ?c wrote ?b
          ?c type human
          ?a != ?c }

        INSERT ?a coauthor ?c WHERE {
          ?a type human
          ?a wrote ?b
          ?b type article
          ?c wrote ?b
          ?c type human
          ?a != ?c }

        DELETE ?s ?p ?o WHERE { ?s ?p ?o }
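To make the SELECT and INSERT patterns concrete, the same coauthor pattern can be evaluated over an in-memory set of triples. This is a sketch of what the queries compute, not of how a triple-store executes them; the function names and the sample data are hypothetical.

```python
def coauthor_pairs(triples):
    """Evaluate the SELECT pattern: distinct humans ?a and ?c who wrote
    the same article ?b."""
    humans = {s for s, p, o in triples if p == "type" and o == "human"}
    articles = {s for s, p, o in triples if p == "type" and o == "article"}
    authors_of = {}
    for s, p, o in triples:
        if p == "wrote" and s in humans and o in articles:
            authors_of.setdefault(o, set()).add(s)
    return {
        (a, c)
        for authors in authors_of.values()
        for a in authors
        for c in authors
        if a != c
    }


def insert_coauthors(triples):
    """Evaluate the INSERT pattern: add a coauthor triple for every match."""
    return triples | {(a, "coauthor", c) for a, c in coauthor_pairs(triples)}


# Hypothetical data.
triples = {
    ("marko", "type", "human"),
    ("johan", "type", "human"),
    ("chuck", "type", "human"),
    ("p1", "type", "article"),
    ("marko", "wrote", "p1"),
    ("johan", "wrote", "p1"),
}
pairs = coauthor_pairs(triples)
enriched = insert_coauthors(triples)
```

The INSERT form materializes a relationship that was only implicit in the data, which is the same idea the later grammar slides build on.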
  • 56. Triple-store vs. relational database.
    o Triple-store (SPARQL interface):
        SELECT (?x4)
        WHERE {
          ?x1 dc:creator lanl:LAUR-06-2139 .
          ?x1 lanl:hasFriend ?x2 .
          ?x2 lanl:worksFor ?x3 .
          ?x3 lanl:collaboratesWith ?x4 .
          ?x4 lanl:hasEmployee ?x1 . }
    o Relational database (SQL interface):
        SELECT collaboratesWithTable.orgId2
        FROM personTable, authorTable, articleTable, friendTable,
             hasEmployeeTable, organizationTable, worksForTable,
             collaboratesWithTable
        WHERE personTable.id = authorTable.personId AND
              authorTable.articleId = "dc:creator LAUR-06-2139" AND
              personTable.id = friendTable.personId1 AND
              friendTable.personId2 = worksForTable.personId AND
              worksForTable.orgId = collaboratesWithTable.orgId2 AND
              collaboratesWithTable.orgId2 = personTable.id
  • 57. Semantic network metrics? • What does it mean to run a shortest-path calculation on a semantic network? o Shortest-path along which semantic--which edge type(s)? • What does it mean to calculate PageRank on a semantic network? o What are legal semantics for the random walker?
  • 58. Shortest-path metrics in a semantic network? [Figure: a semantic network over lanl:marko, lanl:johan, lanl:chuck, lanl:bob, and lanl:jill, connected by four lanl:hasFriend edges and one lanl:livesInSameCityAs edge.] • “What is the shortest path between lanl:marko and lanl:jill by taking only lanl:hasFriend edges?”
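The question on this slide amounts to a breadth-first search that is only allowed to follow certain edge types. The sketch below assumes a particular wiring of the five vertices, since the figure's exact topology is not recoverable from the transcript; the edge list and the function name `shortest_path` are hypothetical.

```python
from collections import deque


def shortest_path(triples, source, target, allowed_predicates):
    """Breadth-first search that follows only edges whose predicate is allowed.

    `triples` is a list of (subject, predicate, object) statements, read as
    directed edges. Returns the list of vertices on a shortest path, or None.
    """
    adjacency = {}
    for s, p, o in triples:
        if p in allowed_predicates:
            adjacency.setdefault(s, []).append(o)

    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in adjacency.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# Assumed edges (four lanl:hasFriend, one lanl:livesInSameCityAs).
triples = [
    ("lanl:marko", "lanl:hasFriend", "lanl:johan"),
    ("lanl:johan", "lanl:hasFriend", "lanl:chuck"),
    ("lanl:chuck", "lanl:hasFriend", "lanl:jill"),
    ("lanl:marko", "lanl:livesInSameCityAs", "lanl:bob"),
    ("lanl:bob", "lanl:hasFriend", "lanl:jill"),
]

friends_only = shortest_path(
    triples, "lanl:marko", "lanl:jill", {"lanl:hasFriend"})
any_edge = shortest_path(
    triples, "lanl:marko", "lanl:jill",
    {"lanl:hasFriend", "lanl:livesInSameCityAs"})
```

In this assumed network the answer depends on the allowed edge types: the friends-only path takes three hops, while allowing lanl:livesInSameCityAs shortens it to two.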
  • 59. PageRank in a semantic network? [Figure: lanl:marko, lanl:johan, and lanl:chuck are of rdf:type lanl:Human and lanl:p1 is of rdf:type lanl:Article; two lanl:wrote edges point to lanl:p1 and one lanl:hasFriend edge involves lanl:chuck. Question marks stand in for the humans' rank values.]
  • 60. Grammar-based geodesics and random walkers. • How do you port many of the undirected and directed single-relational network analysis algorithms over to the semantic network domain? o My solution is what I call a grammar. • Nearly every network analysis algorithm can be represented in terms of a walker traversing a network. o Geodesics. o PageRank. o Metadata paths. o etc.
  • 61. Components of a grammar-based walker. • A walker. o Discrete element. • A grammar. o An abstract representation of a legal path for the walker to take. - e.g. “you can traverse a lanl:friendOf edge from a lanl:Human to another lanl:Human.” - Also includes rules: “increment a counter.”, “don’t ever return to this vertex.” • A data set that respects the ontological “expectations” of the grammar.
  • 62-68. Grammar-based PageRank example. [Animation over a semantic network in which lanl:marko, lanl:johan, and lanl:chuck are of rdf:type lanl:Human, lanl:p1 is of rdf:type lanl:Article, lanl:marko and lanl:johan each have a lanl:wrote edge to lanl:p1, and a lanl:hasFriend edge involves lanl:chuck. The counters start at 0 for all three humans. Over the frames lanl:marko's counter rises to 1, then lanl:johan's to 1, then lanl:marko's to 2, while lanl:chuck's stays at 0.] • Grammar: “Take only a lanl:wrote out-edge to a resource of rdf:type lanl:Article. Then take a lanl:wrote in-edge to a resource of rdf:type lanl:Human. Increment only lanl:Humans. Make sure that the lanl:Human seen before is not the same lanl:Human currently. Repeat infinitely.”
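The grammar quoted in this example can be sketched as a walker whose moves are filtered by edge label and vertex type. This is a hand-coded illustration of that one grammar, not the author's grammar formalism: the function name `coauthor_walk`, the direction of the lanl:hasFriend edge, and the finite step count standing in for "repeat infinitely" are assumptions.

```python
import random
from collections import Counter


def coauthor_walk(triples, start, steps, seed=0):
    """Walk human -> article -> different human along lanl:wrote edges,
    incrementing a counter only on lanl:Human vertices."""
    rng = random.Random(seed)
    # Assumes each resource has a single rdf:type.
    rdf_type = {s: o for s, p, o in triples if p == "rdf:type"}
    counts = Counter()
    human = start
    counts[human] += 1
    for _ in range(steps):
        # Rule 1: take a lanl:wrote out-edge to a lanl:Article.
        articles = [o for s, p, o in triples
                    if s == human and p == "lanl:wrote"
                    and rdf_type.get(o) == "lanl:Article"]
        if not articles:
            break  # the grammar offers no legal move
        article = rng.choice(articles)
        # Rule 2: take a lanl:wrote in-edge to a different lanl:Human.
        coauthors = [s for s, p, o in triples
                     if o == article and p == "lanl:wrote" and s != human
                     and rdf_type.get(s) == "lanl:Human"]
        if not coauthors:
            break
        human = rng.choice(coauthors)
        counts[human] += 1  # Rule 3: increment only lanl:Humans.
    return counts


triples = [
    ("lanl:marko", "rdf:type", "lanl:Human"),
    ("lanl:johan", "rdf:type", "lanl:Human"),
    ("lanl:chuck", "rdf:type", "lanl:Human"),
    ("lanl:p1", "rdf:type", "lanl:Article"),
    ("lanl:marko", "lanl:wrote", "lanl:p1"),
    ("lanl:johan", "lanl:wrote", "lanl:p1"),
    ("lanl:marko", "lanl:hasFriend", "lanl:chuck"),  # direction assumed
]
counts = coauthor_walk(triples, start="lanl:marko", steps=2)
```

Because the grammar never permits a lanl:hasFriend step, lanl:chuck is never visited, which is why the walker ranks only the humans connected through coauthorship.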
  • 69. Grammars create implicit relationships. [Figure: the same network as in the grammar-based PageRank example, with an additional lanl:hasCoauthor edge between lanl:marko and lanl:johan, implied by both having a lanl:wrote edge to lanl:p1.]
  • 70. Conclusions. • Many data sets can be represented as a network of “actors”. • There exist many network analysis algorithms. o Shortest-path metrics. o Eigenvector-based metrics. o Assortativity coefficients. • The semantic network data structure is a less studied data model. o The Semantic Web community doesn’t take a network approach to its substrate. • The grammar technique can be used to port many of the common network analysis algorithms to the semantic network domain. o Grammar-based geodesics. o Grammar-based random walkers.
  • 71. Related publications. • Rodriguez, M.A., Watkins, J.H., Bollen, J., Gershenson, C., “Using RDF to Model the Structure and Process of Systems”, International Conference on Complex Systems, Boston, Massachusetts, LA-UR-07-5720, October 2007. • Rodriguez, M.A., Bollen, J., Van de Sompel, H., “A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage”, 2007 ACM/IEEE Joint Conference on Digital Libraries, pages 278-287, Vancouver, Canada, ACM/IEEE Computing, doi:10.1145/1255175.1255229, LA-UR-07-0665, June 2007. • Rodriguez, M.A., “Social Decision Making with Multi-Relational Networks and Grammar-Based Particle Swarms”, 2007 Hawaii International Conference on Systems Science (HICSS), pages 39-49, Waikoloa, Hawaii, IEEE Computer Society, ISSN: 1530-1605, doi:10.1109/HICSS.2007.487, LA-UR-06-2139, January 2007. • Rodriguez, M.A., “A Multi-Relational Network to Support the Scholarly Communication Process”, International Journal of Public Information Systems, volume 2007, issue 1, pages 13-29, ISSN: 1653-4360, LA-UR-06-2416, March 2007. • Rodriguez, M.A., “Mapping Semantic Networks to Undirected Networks”, LA-UR-07-5287, August 2007. • Rodriguez, M.A., Watkins, J.H., “Grammar-Based Geodesics in Semantic Networks”, LA-UR-07-4042, June 2007. • Rodriguez, M.A., Bollen, J., “Modeling Computations in a Semantic Network”, LA-UR-07-3678, May 2007. • Rodriguez, M.A., “General-Purpose Computing on a Semantic Network Substrate”, LA-UR-07-2885, April 2007. • Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks”, LA-UR-06-7791, November 2006.