20130204 graph to-pacer-xml


Intro slides; detailed talk from Darrick at https://gist.github.com/4710452

  • There are four trends underpinning the NoSQL movement, and the GraphDB movement specifically: 1) The size of the data we manage is more than doubling every two years, with around 2.4 zettabytes expected by the end of this year (or 250 million years of the TV show “24”). 2) Data is more highly connected than ever before: FOAF on social networks; configuration management for a datacenter. 3) Schema-less data persistence: add a field to just one record, no problem. Sparkes on Toyota. 4) Application architecture has changed from flat files and batch processing, to a shared RDBMS, to SOA + web services.
  • *This is a somewhat contrived example, as “person” & “friend” would normally be one table with a self join.
  • A slide borrowed from Neo Technology.
  • Gephi - example of high-level graph visualization where you might be looking for clustering of data types and super nodes.
  • d3js.org - example of mixing high-level overview of relationships, with specific relationships on hover
  • A few options exist for graph query languages, some of which you may have heard of. SPARQL is a recursive acronym for “SPARQL Protocol and RDF Query Language”, where RDF stands for Resource Description Framework. Cypher and Gremlin are modern graph query languages with strong ties to the Neo4j community. Pacer is a Ruby gem that you can include in your projects and get jamming on embedded graph databases straight away.
  • Chris compared traffic-based and content-based message ranking approaches to discover ego networks. We don’t need to worry about the details here, though. Chris has left us with a nice property graph that identifies official reporting relationships with an edge labelled “Directly_Reported_To”.
  • Go here, cool stuff.
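Chris's reporting relationships map naturally onto a labelled property graph. The sketch below is plain Ruby, not Pacer. It uses a hypothetical in-memory `TinyGraph` class and made-up people, and follows outgoing `Directly_Reported_To` edges to list everyone above a given person in the hierarchy. It assumes each person has at most one direct manager.

```ruby
# Hypothetical minimal property graph for illustration only (NOT the Pacer API).
class TinyGraph
  Edge = Struct.new(:from, :to, :label)

  def initialize
    @vertices = {} # vertex id => properties hash
    @edges = []    # Edge structs
  end

  def add_vertex(id, props = {})
    @vertices[id] = props
  end

  def add_edge(from, to, label)
    @edges << Edge.new(from, to, label)
  end

  # Ids reached by following one outgoing edge with the given label.
  def out(id, label)
    @edges.select { |e| e.from == id && e.label == label }.map(&:to)
  end

  # Repeatedly follow the first outgoing edge with `label`, collecting each
  # vertex visited. The `seen` hash stops the walk if the data has a cycle.
  def chain(id, label)
    path = []
    seen = { id => true }
    current = out(id, label).first
    while current && !seen[current]
      path << current
      seen[current] = true
      current = out(current, label).first
    end
    path
  end
end

g = TinyGraph.new
%w[alice bob carol dave].each { |name| g.add_vertex(name, { name: name }) }
g.add_edge('dave', 'carol', 'Directly_Reported_To')
g.add_edge('carol', 'bob', 'Directly_Reported_To')
g.add_edge('bob', 'alice', 'Directly_Reported_To')

p g.chain('dave', 'Directly_Reported_To') # prints ["carol", "bob", "alice"]
```

In a real graph database the edge lookup in `out` would use the vertex's stored edges instead of scanning a global list. The traversal logic itself stays the same.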

    1. GraphTO, February 2013, Mozilla Toronto
       David Colebatch & Darrick Wiebe, us@xnlogic.com
    2. Agenda
       • Who We Are
       • Intro to GraphDB
       • Intro to Patent-Grant Data
       • Graph Concepts
       • Pacer::Xml
       Sponsored By:
    3. Why?
       • Data Set Size
       • Connectivity of Data
       • Semi-structure
       • Evolution of SOA and REST
    4. The Zone of SQL Adequacy
       [Chart: requirement of application vs. data complexity, marking the zone where a SQL database is adequate; plotted examples: Salary List, ERP, CRM, MDM, Geo, Performance, Network / Cloud Management, Social]
    5. How?
       • Nodes / Vertices
       • Relationships / Edges
    6. Relational Model vs. Graph
       Each of these models expresses the same thing.
       Tables: Person*, Person-Friend, Friend*
    7. Graph db performance
       • a sample social graph with ~1,000 persons
       • average 50 friends per person
       • pathExists(a,b) limited to depth 4
       • caches warmed up to eliminate disk I/O

       Database    # persons    query time
       MySQL           1,000      2,000 ms
       Neo4j           1,000          2 ms
       Neo4j       1,000,000          2 ms
    8. Different Visualization
    9. Query Languages
       • Pacer - gem install pacer
       • Cypher
       • SPARQL - if you grok RDF already
    10. US PTO Data
       • Patent Grant Data in XML
       • bi-weekly chunks
       • Pacer::Xml has a handy loader as an example:
           jruby-1.7.0 > g = PacerXml::Sample.load_100
           Downloading a sample xml file from...
    11. 001> PacerXml
       Importing XML into a graph? What do you do next?
    12. Resources
       • https://github.com/xnlogic/pacer-xml
       • https://github.com/pangloss/pacer
       • http://neo4j.org/
       • http://tinkerpop.com/
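The performance slide's benchmark query is pathExists(a,b) limited to depth 4. The slide does not show how either database implements it. The sketch below is one plain-Ruby reading of that query: a breadth-first search over an adjacency hash that stops after `max_depth` hops. The `path_exists?` helper and the sample friendship data are hypothetical. Each hop simply reads a vertex's neighbour list, which is the kind of lookup a graph store keeps cheap. A relational schema would instead typically join the Person-Friend table once per hop.

```ruby
require 'set'

# Hypothetical helper: is there a path from `a` to `b` using at most
# `max_depth` edges? `adjacency` maps each vertex to an array of neighbours.
def path_exists?(adjacency, a, b, max_depth = 4)
  return true if a == b
  visited = Set[a]
  frontier = [a]
  max_depth.times do
    next_frontier = []
    frontier.each do |v|
      adjacency.fetch(v, []).each do |n|
        return true if n == b
        # Set#add? returns nil when n was already visited, so each vertex is expanded once.
        next_frontier << n if visited.add?(n)
      end
    end
    return false if next_frontier.empty?
    frontier = next_frontier
  end
  false
end

friends = {
  'ann' => %w[ben],
  'ben' => %w[ann cat],
  'cat' => %w[ben dan],
  'dan' => %w[cat eve],
  'eve' => %w[dan fay],
  'fay' => %w[eve]
}

p path_exists?(friends, 'ann', 'eve') # eve is 4 hops away: prints true
p path_exists?(friends, 'ann', 'fay') # fay is 5 hops away: prints false
```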
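These slides do not show Pacer::Xml's own import logic. As a rough illustration of the general idea only, the sketch below uses Ruby's REXML library (not Pacer::Xml). It turns each XML element into a vertex holding its attributes and any non-blank text, and links each element to its parent with an edge labelled by the child's tag name. The `xml_to_graph` helper, the `Vertex`/`Edge` structs, and the tiny patent-like document are all hypothetical. The sample does not follow the real USPTO schema.

```ruby
require 'rexml/document'

# Hypothetical XML-to-graph mapping (illustrative; NOT Pacer::Xml's algorithm).
Vertex = Struct.new(:id, :tag, :props)
Edge   = Struct.new(:from, :to, :label)

# Walk the element tree depth-first, creating one vertex per element and
# one parent -> child edge labelled with the child's tag name.
def xml_to_graph(xml)
  vertices = []
  edges = []
  visit = lambda do |element, parent_id|
    props = {}
    element.attributes.each { |name, value| props[name] = value }
    text = element.text.to_s.strip
    props['text'] = text unless text.empty?
    id = vertices.size
    vertices << Vertex.new(id, element.name, props)
    edges << Edge.new(parent_id, id, element.name) if parent_id
    element.elements.each { |child| visit.call(child, id) }
  end
  visit.call(REXML::Document.new(xml).root, nil)
  [vertices, edges]
end

sample = <<~XML
  <patent id="P1">
    <title>Widget</title>
    <inventor name="Ada"/>
    <inventor name="Grace"/>
  </patent>
XML

vertices, edges = xml_to_graph(sample)
vertices.each { |v| puts "#{v.id} #{v.tag} #{v.props}" }
edges.each { |e| puts "#{e.from} -[#{e.label}]-> #{e.to}" }
```

Mapping every element to a vertex keeps the import generic. Deciding which elements should collapse into properties of their parent is the modelling question that comes next.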