Graph Data
Linked Data and
Property Graphs
Contents
● Example
● LD/RDF
● PG
● SPARQL
● Gremlin/Cypher
● BIG data
– Apache Giraph / Bulk Synchronous Processing
● Ideally?
– +arrays+URIs+attributes
Graphs
● A set of vertices (nodes) and edges (arcs)
● Except the useful kind have labels on edges
● … and the nodes are just dots.
Graphs
G = (V, E)
V – Vertices (Nodes)
E – Edges (Arcs, Links)
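A minimal sketch of G = (V, E) with labeled edges, written in Python. The vertex names and the two edges are illustrative assumptions, not data taken from the slides.

```python
# A graph G = (V, E) whose edges carry labels.
# ASSUMPTION: the vertices and edges below are illustrative only.
V = {"Alice", "Bob", "Eve"}

# Each edge is a (source, label, target) triple.
E = {
    ("Alice", "knows", "Bob"),
    ("Eve", "listensTo", "Alice"),
}


def neighbours(vertex, label):
    """Return the vertices reached from `vertex` by edges with `label`."""
    return {dst for (src, lbl, dst) in E if src == vertex and lbl == label}


print(neighbours("Alice", "knows"))
```

Keeping the label inside the edge tuple is what makes the graph useful for information: the same pair of nodes can be connected by several differently labeled edges.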
Graphs
Graphs For Information
[Figure: a graph with nodes Alice, Bob and Eve, and edges labeled knows and listensTo]
Graphs For Information
Alice
... has a name “Alice Hacker”
... has an employee number
Linked Data / RDF
● Standards
– What it means
– Syntaxes for exchanging data
– Query language
● URIs name things globally
● Uniform representation
– A link to another thing is the same as a link to a value
● Complex structures encoded in the basic mechanism
● “Schemaless” data integration
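The "uniform representation" point can be sketched in Python as a list of (subject, predicate, object) triples, where the object is either a URI or a literal value. The `URI` and `Literal` classes are hypothetical helpers for this illustration, not part of any RDF library.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


class URI(str):
    """A global name for a thing (hypothetical helper)."""


class Literal(str):
    """A plain value such as a name (hypothetical helper)."""


# Links and values share one shape: (subject, predicate, object).
triples = [
    (URI("http://example/alice"), URI(FOAF + "name"), Literal("Alice Hacker")),
    (URI("http://example/alice"), URI(FOAF + "knows"), URI("http://example/bob")),
]


def describe(subject):
    """List every (predicate, object) pair for a subject, links and values alike."""
    return [(p, o) for (s, p, o) in triples if s == subject]


for predicate, obj in describe(URI("http://example/alice")):
    kind = "link" if isinstance(obj, URI) else "value"
    print(predicate, "->", obj, f"({kind})")
```

Because both kinds of statement have the same shape, merging two datasets amounts to taking the union of their triples, which is one way to read "schemaless" integration.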
Linked Data
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of image scan of a table)
★★★ use non-proprietary formats (e.g., CSV instead of Excel)
★★★★ use URIs to denote things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
http://5stardata.info/
Linked Data / RDF
[Figure: <http://example/alice> and <http://example/bob> as nodes, each with a foaf:name arc to its name literal ("Alice Hacker", "Bob Tester") and a foaf:knows arc to the other]
prefix person: <http://example/person/>
prefix foaf: <http://xmlns.com/foaf/0.1/>

<http://example/alice>
    foaf:name "Alice Hacker" ;
    foaf:knows <http://example/bob> .

<http://example/bob>
    foaf:name "Bob Tester" ;
    foaf:knows <http://example/alice> .
JSON-LD
● Links and semantics for the JSON ecosystem
{
  "@context" : "http://example/person.jsonld",
  "@graph" : [ {
    "@id" : "http://example/alice",
    "knows" : "http://example/bob",
    "name" : "Alice Hacker"
  }, {
    "@id" : "http://example/bob",
    "knows" : "http://example/alice",
    "name" : "Bob Tester"
  } ]
}
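A rough sketch of what the context does: it maps short JSON keys to full property URIs and says which values are links. The contents of http://example/person.jsonld are not shown on the slide, so the `context` mapping below is an assumption, and `to_triples` is a hypothetical helper that handles only this flat case, not the JSON-LD expansion algorithm.

```python
import json

FOAF = "http://xmlns.com/foaf/0.1/"

# ASSUMPTION: a plausible term mapping for http://example/person.jsonld.
context = {
    "name": {"@id": FOAF + "name"},
    "knows": {"@id": FOAF + "knows", "@type": "@id"},
}

document = json.loads("""
{ "@graph": [
    { "@id": "http://example/alice", "knows": "http://example/bob", "name": "Alice Hacker" },
    { "@id": "http://example/bob", "knows": "http://example/alice", "name": "Bob Tester" }
] }
""")


def to_triples(doc, ctx):
    """Expand short keys through the context; covers only flat node objects."""
    triples = []
    for node in doc["@graph"]:
        subject = node["@id"]
        for key, value in node.items():
            if key.startswith("@"):
                continue  # keywords such as @id are not properties
            term = ctx[key]
            is_link = term.get("@type") == "@id"
            obj = ("uri", value) if is_link else ("literal", value)
            triples.append((subject, term["@id"], obj))
    return triples


for triple in to_triples(document, context):
    print(triple)
```

Under this assumed context the JSON document yields the same four statements as the Turtle example: two names and two knows links.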
SPARQL Query
prefix person: <http://example/person/>
prefix foaf: <http://xmlns.com/foaf/0.1/>

<http://example/alice> foaf:name "Alice Hacker" ;
                       foaf:knows <http://example/bob> .
<http://example/bob> foaf:name "Bob Tester" ;
                     foaf:knows <http://example/alice> .
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE
{ ?person foaf:name "Alice Hacker" ;
          foaf:knows ?friend .
  ?friend foaf:name ?name . }
----------------
| name |
================
| "Bob Tester" |
----------------
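Finding the name of whoever Alice knows can be read as three triple patterns joined on shared variables. The Python sketch below matches such patterns against an in-memory set of triples. `match` and `solve` are hypothetical helpers, URIs and literals are both plain strings for brevity, and this is not how a SPARQL engine is implemented.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
ALICE, BOB = "http://example/alice", "http://example/bob"

triples = {
    (ALICE, FOAF + "name", "Alice Hacker"),
    (ALICE, FOAF + "knows", BOB),
    (BOB, FOAF + "name", "Bob Tester"),
    (BOB, FOAF + "knows", ALICE),
}


def match(pattern, binding):
    """Yield extensions of `binding` under which `pattern` equals some triple."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must keep the value it was bound to earlier.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new


def solve(patterns):
    """Join the patterns left to right, like a basic graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b for s in solutions for b in match(pattern, s)]
    return solutions


query = [
    ("?person", FOAF + "name", "Alice Hacker"),
    ("?person", FOAF + "knows", "?friend"),
    ("?friend", FOAF + "name", "?name"),
]
names = [s["?name"] for s in solve(query)]
print(names)
```

The third pattern is what turns the friend's URI into a name; without it the only available answer is the URI <http://example/bob>.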
Property Graphs
● Separates Links and Attributes
● Nodes have attributes
– … and so do edges
● Different definitions
– https://github.com/tinkerpop/ is the de facto standard
– Not universal
● Data exchange (web publishing) is not an objective
● Analysis and schema-less data applications
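A minimal Python sketch of the property graph model, with attributes stored on both vertices and edges. The `Vertex` and `Edge` classes are hypothetical and do not mirror the TinkerPop API; the `since` attribute is invented to show a property on an edge.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: str
    out_v: str   # id of the source vertex
    in_v: str    # id of the target vertex
    label: str
    properties: dict = field(default_factory=dict)


vertices = {
    "alice": Vertex("alice", {"name": "Alice Hacker"}),
    "bob": Vertex("bob", {"name": "Bob Coder"}),
}
# ASSUMPTION: "since" is an invented edge attribute, not from the slides.
edges = [
    Edge("k1", "alice", "bob", "knows", {"since": 2010}),
    Edge("k2", "bob", "alice", "knows"),
]

for e in edges:
    print(vertices[e.out_v].properties["name"], e.label,
          vertices[e.in_v].properties["name"], e.properties)
```

Here links live in `edges` and attributes live in `properties`, which is the separation the first bullet describes. In RDF both would be triples.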
Build
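// Imports assume the TinkerPop Blueprints 2.x package layout.
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.tg.TinkerGraph;

Graph graph = new TinkerGraph();
Vertex a = graph.addVertex("alice");
Vertex b = graph.addVertex("bob");
a.setProperty("name", "Alice Hacker");
b.setProperty("name", "Bob Coder");
// addEdge(id, outVertex, inVertex, label)
Edge e1 = graph.addEdge("k1", a, b, "knows");
Edge e2 = graph.addEdge("k2", b, a, "knows");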
GraphSON
{
  "edges" : [
    {
      "_id" : "k1" ,
      "_inV" : "bob" ,
      "_label" : "knows" ,
      "_outV" : "alice" ,
      "_type" : "edge"
    } ,
    {
      "_id" : "k2" ,
      "_inV" : "alice" ,
      "_label" : "knows" ,
      "_outV" : "bob" ,
      "_type" : "edge"
    }
  ] ,
  "mode" : "NORMAL" ,
  "vertices" : [
    {
      "_id" : "bob" ,
      "_type" : "vertex" ,
      "name" : "Bob Coder"
    } ,
    {
      "_id" : "alice" ,
      "_type" : "vertex" ,
      "name" : "Alice Hacker"
    }
  ]
}
Gremlin
// Groovy to Java
@SuppressWarnings("unchecked")
Pipe<Vertex,Vertex> pipe = Gremlin.compile("g.v('alice').out('knows').name");
for(Object name : pipe) {
System.out.println((String) name);
}
g.v('alice').out('knows').name
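To show what the traversal g.v('alice').out('knows').name computes, here is a Python sketch that walks the same steps over the JSON serialization of the graph. The functions `v`, `out` and `prop` are hypothetical stand-ins for the Gremlin steps, not a Gremlin implementation.

```python
import json

graphson = json.loads("""
{ "mode": "NORMAL",
  "vertices": [
    { "_id": "alice", "_type": "vertex", "name": "Alice Hacker" },
    { "_id": "bob", "_type": "vertex", "name": "Bob Coder" } ],
  "edges": [
    { "_id": "k1", "_outV": "alice", "_inV": "bob", "_label": "knows", "_type": "edge" },
    { "_id": "k2", "_outV": "bob", "_inV": "alice", "_label": "knows", "_type": "edge" } ] }
""")

vertices = {vertex["_id"]: vertex for vertex in graphson["vertices"]}


def v(vertex_id):
    """Start a traversal at one vertex, like g.v(id)."""
    return [vertices[vertex_id]]


def out(current, label):
    """Follow outgoing edges with the given label, like .out(label)."""
    ids = {vertex["_id"] for vertex in current}
    return [vertices[e["_inV"]] for e in graphson["edges"]
            if e["_outV"] in ids and e["_label"] == label]


def prop(current, key):
    """Read one property from each vertex, like .name."""
    return [vertex[key] for vertex in current]


print(prop(out(v("alice"), "knows"), "name"))
```

Each step takes a list of vertices and returns a new list, which mirrors the pipe-style composition of the Gremlin expression.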
Cypher Query
● Neo4j-specific
● Property Graph + “labels” (= types) – node names
CREATE (alice { name: 'Alice Hacker' }),
       (bob { name: 'Bob Tester' }),
       (alice)-[:knows]->(bob),
       (bob)-[:knows]->(alice)

MATCH (a)-[:knows]->(x)
WHERE a.name = 'Alice Hacker'
RETURN x.name
