There are four trends underpinning the NoSQL, and specifically the GraphDB, movements: 1) Data set size: the amount of data we manage is more than doubling every two years, with around 2.4 zettabytes expected by the end of this year (or 250 million years of the TV show “24”). 2) Connectivity: data is more highly connected than ever before, e.g. friend-of-a-friend (FOAF) on social networks, or configuration management for a datacenter. 3) Schema-less data persistence: add a field to just one record, no problem. Sparkes on Toyota. 4) Application architecture has changed from flat files and batch processing, to shared RDBMS, to SOA and web services.
*This is a somewhat contrived example, as “person” & “friend” would normally be one table with a self join.
A slide borrowed from Neo Technology.
Gephi - example of high-level graph visualization where you might be looking for clustering of data types and super nodes.
d3js.org - example of mixing high-level overview of relationships, with specific relationships on hover
A few options exist for graph query languages, some of which you may have heard of. SPARQL is a recursive acronym for “SPARQL Protocol and RDF Query Language”; it queries Resource Description Framework (RDF) data. Cypher and Gremlin are modern graph query languages with strong ties to the Neo4j community. Pacer is a Ruby gem that you can include in your projects and get jamming on embedded graph databases straight away.
Chris compared Traffic-based and Content-based message ranking approaches to discover Ego Networks. We don’t need to worry about the details here though. Chris has left us with a nice property graph which identifies official reporting relationships by an edge labelled “Directly_Reported_To”.
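To make the idea of a property graph with labelled edges concrete, here is a minimal sketch in Python. It is not Pacer or Neo4j code: the `PropertyGraph` class, its method names, the people, and the "Emailed" label are all hypothetical, chosen only for illustration. Only the "Directly_Reported_To" label comes from the notes above.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    out_v: str          # id of the vertex the edge leaves
    label: str          # edge label, e.g. "Directly_Reported_To"
    in_v: str           # id of the vertex the edge points to
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.vertices = {}   # vertex id -> property dict
        self.edges = []      # list of Edge

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = properties

    def add_edge(self, out_v, label, in_v, **properties):
        self.edges.append(Edge(out_v, label, in_v, properties))

    def out(self, vertex_id, label):
        """Ids reached by following outgoing edges with the given label."""
        return [e.in_v for e in self.edges
                if e.out_v == vertex_id and e.label == label]

    def in_(self, vertex_id, label):
        """Ids that point at vertex_id through edges with the given label."""
        return [e.out_v for e in self.edges
                if e.in_v == vertex_id and e.label == label]


g = PropertyGraph()
for name in ("alice", "bob", "carol"):      # made-up people
    g.add_vertex(name, type="person")
g.add_edge("alice", "Directly_Reported_To", "carol")
g.add_edge("bob", "Directly_Reported_To", "carol")
g.add_edge("alice", "Emailed", "bob", count=12)   # made-up second label

managers_of_alice = g.out("alice", "Directly_Reported_To")
reports_of_carol = g.in_("carol", "Directly_Reported_To")
```

Filtering on the edge label is what lets the official reporting relationships be picked out from any other kind of edge in the same graph.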
Go here, cool stuff.
20130204 graph to-pacer-xml
GraphTO, February 2013, Mozilla Toronto. David Colebatch & Darrick Wiebe, email@example.com
Agenda • Who We Are • Intro to GraphDB • Intro to Patent-Grant Data • Graph Concepts • Pacer::Xml Sponsored By:
Why? • Data Set Size • Connectivity of Data • Semi-structure • Evolution of SOA and REST
The Zone of SQL Adequacy [Figure: performance plotted against data complexity, comparing "SQL database" with "Requirement of application". Example applications labelled on the chart: Salary List, ERP, CRM, MDM, Network / Cloud Management, Geo, Social.]
Relational Model vs. Graph. Each of these models expresses the same thing. Relational tables: Person*, Person-Friend, Friend*
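As a rough sketch of how the two models can express the same thing, the Python below stores a few friendships once relationally and once as an adjacency list. It uses the single `person` table with a self join mentioned in the note above, not the three tables on the slide. The table names, column names and sample people are assumptions made for illustration.

```python
import sqlite3

# Relational form: one person table plus a join table, queried with a self join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_friend (person_id INTEGER, friend_id INTEGER);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cat');
    INSERT INTO person_friend VALUES (1, 2), (1, 3), (2, 3);
""")


def friends_sql(name):
    """Friends of `name`, found by joining person to itself via person_friend."""
    rows = conn.execute(
        """
        SELECT f.name
        FROM person p
        JOIN person_friend pf ON pf.person_id = p.id
        JOIN person f ON f.id = pf.friend_id
        WHERE p.name = ?
        ORDER BY f.name
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


# Graph form: each person is a node, each friendship an outgoing edge.
graph = {"Ann": ["Bob", "Cat"], "Bob": ["Cat"], "Cat": []}


def friends_graph(name):
    """Friends of `name`, read straight off the node's adjacency list."""
    return sorted(graph[name])
```

Both functions return the same friends for a given person. The difference is that the relational version resolves each hop with joins, while the graph version follows references held by the node itself.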
Graph db performance
๏ a sample social graph with ~1,000 persons
๏ average 50 friends per person
๏ pathExists(a,b) limited to depth 4
๏ caches warmed up to eliminate disk I/O

Database    # persons    query time
MySQL       1,000        2,000 ms
Neo4j       1,000        2 ms
Neo4j       1,000,000    2 ms
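The slide does not show how pathExists(a,b) was implemented. One way to read "limited to depth 4" is a breadth-first search that stops expanding after four hops, sketched below in Python. The function name `path_exists`, the `max_depth` parameter and the tiny friends graph are assumptions for illustration, not the benchmark's code or data.

```python
from collections import deque


def path_exists(graph, a, b, max_depth=4):
    """Return True if b can be reached from a in at most max_depth hops."""
    if a == b:
        return True
    visited = {a}
    frontier = deque([(a, 0)])           # (node, hops taken to reach it)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue                     # do not expand beyond the depth limit
        for neighbour in graph.get(node, ()):
            if neighbour == b:
                return True
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return False


# Made-up chain of friends: ann - bob - cat - dan - eve - fay
friends = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat", "eve"],
    "eve": ["dan", "fay"],
    "fay": ["eve"],
}

within_four = path_exists(friends, "ann", "eve")   # eve is 4 hops away
beyond_four = path_exists(friends, "ann", "fay")   # fay is 5 hops away
```

Each step only touches the neighbours of nodes already reached, so the work depends on the size of the neighbourhood explored, not on the total number of persons stored.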