GraphTO, February 2013, Mozilla Toronto. David Colebatch & Darrick Wiebe, us@xnlogic.com
20130204 graph to-pacer-xml
Intro slides; Detailed talk from Darrick at https://gist.github.com/4710452

  • Four trends underpin the NoSQL movement, and the GraphDB movement specifically: 1) Data set size: the volume of data we manage is more than doubling every two years, with around 2.4 zettabytes expected by the end of this year (or 250 million years of the TV show “24”). 2) Connectivity: data is more highly connected than ever before, e.g. friend-of-a-friend (FOAF) on social networks, or configuration management for a datacenter. 3) Semi-structure: schema-less data persistence means adding a field to just one record is no problem. Sparkes on Toyota. 4) Application architecture has changed from flat files and batch processing, to shared RDBMS, to SOA and web services.
  • *This is a somewhat contrived example, as “person” & “friend” would normally be one table with a self join.
  • A slide borrowed from Neo Technology.
  • Gephi - example of high-level graph visualization where you might be looking for clustering of data types and super nodes.
  • d3js.org - example of mixing high-level overview of relationships, with specific relationships on hover
  • A few options exist for graph query languages, some of which you may have heard of. SPARQL is a recursive acronym for “SPARQL Protocol and RDF Query Language”, where RDF is the Resource Description Framework. Cypher and Gremlin are modern graph query languages with strong ties to the Neo4j community. Pacer is a Ruby gem that you can include in your projects and get jamming on embedded graph databases straight away.
  • Chris compared Traffic-based and Content-based message ranking approaches to discover Ego Networks. We don’t need to worry about the details here though. Chris has left us with a nice property graph which identifies official reporting relationships by an edge labelled “Directly_Reported_To”.
  • Go here, cool stuff.
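The note above about a property graph whose edges are labelled “Directly_Reported_To” can be illustrated with a small plain-Ruby sketch. This is not the Pacer API: `TinyGraph`, `Edge`, `out`, and `chain` are hypothetical names made up for illustration, and the people and the “Emailed” label are invented sample data. It only shows the idea of following one edge label from vertex to vertex.

```ruby
# Hypothetical in-memory property graph; NOT the Pacer API.
Edge = Struct.new(:from, :label, :to)

class TinyGraph
  def initialize
    @edges = []
  end

  # Record a directed, labelled edge between two vertex names.
  def add_edge(from, label, to)
    @edges << Edge.new(from, label, to)
    self
  end

  # Vertices reached by following one outgoing edge with the given label.
  def out(vertex, label)
    @edges.select { |e| e.from == vertex && e.label == label }.map(&:to)
  end

  # Follow the first matching edge repeatedly, collecting each vertex visited.
  # The `seen` hash guards against cycles in the data.
  def chain(vertex, label)
    path = []
    seen = { vertex => true }
    current = out(vertex, label).first
    while current && !seen[current]
      path << current
      seen[current] = true
      current = out(current, label).first
    end
    path
  end
end

# Invented sample data: alice reports to bob, bob reports to carol.
ORG = TinyGraph.new
ORG.add_edge("alice", "Directly_Reported_To", "bob")
ORG.add_edge("bob", "Directly_Reported_To", "carol")
ORG.add_edge("alice", "Emailed", "dave")

puts ORG.chain("alice", "Directly_Reported_To").inspect
```

Because the label is part of each edge, the “Emailed” edge is ignored when walking the reporting chain; no join table or foreign-key lookup is involved.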

    1. GraphTO, February 2013, Mozilla Toronto. David Colebatch & Darrick Wiebe, us@xnlogic.com
    2. Agenda
       • Who We Are
       • Intro to GraphDB
       • Intro to Patent-Grant Data
       • Graph Concepts
       • Pacer::Xml
       Sponsored By: (sponsor logo not preserved in the transcript)
    3. Why?
       • Data Set Size
       • Connectivity of Data
       • Semi-structure
       • Evolution of SOA and REST
    4. The Zone of SQL Adequacy (figure; labels: SQL database, Requirement of application, Performance, Data complexity; example applications: Salary List, ERP, CRM, MDM, Network / Cloud Management, Social, Geo)
    5. How?
       • Nodes / Vertices
       • Relationships / Edges
    6. Relational Model vs. Graph: each of these models expresses the same thing (figure; tables: Person*, Person-Friend, Friend*)
    7. Graph db performance
       • a sample social graph
         • with ~1,000 persons
       • average 50 friends per person
       • pathExists(a,b) limited to depth 4
       • caches warmed up to eliminate disk I/O

       Database   # persons   query time
       MySQL      1,000       2,000 ms
       Neo4j      1,000       2 ms
       Neo4j      1,000,000   2 ms
    8. Different Visualization
    9. Query Languages
       • Pacer - gem install pacer
       • Cypher
       • SPARQL - if you grok RDF already
    10. US PTO Data
       • Patent Grant Data in XML
       • bi-weekly chunks
       • Pacer::Xml has handy loader as an example:
         jruby-1.7.0 > g = PacerXml::Sample.load_100
         Downloading a sample xml file from...
    11. 001> PacerXml
       Importing XML into a graph? What do you do next?
    12. Resources
       • https://github.com/xnlogic/pacer-xml
       • https://github.com/pangloss/pacer
       • http://neo4j.org/
       • http://tinkerpop.com/
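The performance slide mentions a pathExists(a,b) query limited to depth 4 over a friend graph. As a rough sketch of what such a query does when vertices and edges are stored directly, here is a plain-Ruby breadth-first search. `FriendGraph`, `add_friendship`, and `path_exists?` are hypothetical names chosen for this example; this is not how Neo4j, MySQL, or Pacer implement the query, and it says nothing about the timings quoted on the slide.

```ruby
require "set"

# Hypothetical adjacency-list graph; names are illustrative only.
class FriendGraph
  def initialize
    # Each person maps to the set of people they are friends with.
    @friends = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Friendship is stored as an undirected edge (both directions).
  def add_friendship(a, b)
    @friends[a] << b
    @friends[b] << a
    self
  end

  # Breadth-first search that gives up after max_depth hops.
  def path_exists?(from, to, max_depth = 4)
    return true if from == to
    visited = Set[from]
    frontier = [from]
    max_depth.times do
      next_frontier = []
      frontier.each do |person|
        # fetch avoids creating empty entries for unknown people.
        @friends.fetch(person, []).each do |friend|
          return true if friend == to
          next_frontier << friend if visited.add?(friend)
        end
      end
      frontier = next_frontier
      break if frontier.empty?
    end
    false
  end
end

# Invented sample data: a simple chain a - b - c - d - e - f.
CHAIN = FriendGraph.new
%w[a b c d e f].each_cons(2) { |x, y| CHAIN.add_friendship(x, y) }

puts CHAIN.path_exists?("a", "e") # 4 hops away, within the depth limit
puts CHAIN.path_exists?("a", "f") # 5 hops away, beyond the depth limit
```

Each hop only looks at the neighbours of the current frontier, so the work depends on how many vertices are reachable within the depth limit rather than on the total number of persons stored.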
