InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast



Here is the presentation from Warren Davidson, Director of Business Development, and Darren Wood, InfiniteGraph chief architect. The October 21, 2010 webinar, hosted by DBTA with InfiniteGraph and Riptano, covered new data technologies. It looked at how the NoSQL ("Not Only SQL") approach helps address the more complex application, scalability and performance requirements of handling vast amounts of data, and how it makes advanced analytics on those data volumes easier and faster.

Published in: Technology
  • Key-value pair stores have a simple interface: Put, Get and Delete
    Voldemort – a distributed key-value storage system implemented as a fault-tolerant hash table
    Dynamo – a distributed, highly available, fault-tolerant key-value store
    BigTable – fast and extremely large scale; uses the distributed Google File System; MapReduce – distributed parallel processing
    Cassandra – structured key-value store with a column-family based data model; eventually consistent; distributed systems technology from Dynamo, data model from Google's BigTable
    HBase –
    HyperTable –
    CouchDB – a document-oriented database that can be queried and indexed in a MapReduce fashion using JavaScript
    MongoDB – document-oriented, with a more complex schema model than plain key/value pairs; written in C++; uses MapReduce for processing
    Neo4j –
    HyperGraphDB –
    Sones -
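The Put, Get and Delete interface mentioned in the notes above can be illustrated with a minimal in-memory sketch. The class and method names here are hypothetical and chosen for illustration only; this is not the API of Voldemort, Dynamo or any other product named in the notes, and it leaves out distribution and fault tolerance entirely.

```python
class KeyValueStore:
    """Hypothetical single-process store exposing only put, get and delete."""

    def __init__(self):
        # A plain dict stands in for the hash table; real systems spread
        # keys across many nodes and replicate them.
        self._data = {}

    def put(self, key, value):
        """Store value under key, overwriting any previous value."""
        self._data[key] = value

    def get(self, key, default=None):
        """Return the value stored under key, or default if it is absent."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove key; return True if it existed, False otherwise."""
        if key in self._data:
            del self._data[key]
            return True
        return False


if __name__ == "__main__":
    store = KeyValueStore()
    store.put("user:42", {"name": "Alice"})
    print(store.get("user:42"))
    print(store.delete("user:42"))
    print(store.get("user:42"))
```

The point of the sketch is how small the interface is: there is no query language and no notion of a relationship between two keys, which is the gap the graph database slides go on to address.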
  • InfiniteGraph Presentation from Oct 21, 2010 DBTA Webcast

    1. October 21, 2010; Warren Davidson, Darren Wood; InfiniteGraph
    2. Agenda
       • The NoSQL Landscape
       • InfiniteGraph
       • Solving what problems and how?
       Copyright © InfiniteGraph
    3. Some NoSQL Notes
       • NoSQL = Not Only SQL
       • NoSQL is requirements driven
       • NoSQL = open source?
       • NoSQL = cloud computing?
    4. The NoSQL Landscape: Cassandra, InfiniteGraph
    5. NoSQL Landscape (diagram; axes: Performance, Complexity)
       • Key Value Stores: Voldemort (LinkedIn), Dynamo (Amazon)
       • BigTable Clones: Cassandra (Facebook), HBase (Apache/Hadoop), Hypertable
       • Document databases: CouchDB (Apache), MongoDB
       • Graph Databases: Neo4j, HypergraphDB, AllegroGraph, Sones (labelled: Social Network Analysis, Intelligence Community)
    6. Graph Databases
       • A graph database is used to trace relationships among entities, most commonly people, to any depth. Its characteristics are:
         – Very simple, fixed schema
         – Very complex data relationships
         – Used to support complex associations among like entities
       [Figure: meta-model (Node, Edge, Attribute(s)) and a simplified instance example with the nodes John Jones, Jane Jones-Smith, Nancy Jones, Paul Jones, Doris Smith, Jim Smith and Jeff Smith]
    7. InfiniteGraph: A business unit of Objectivity
       • In the business of distributed data management for over 10 years
       • Solving graph data problems for over 8 years
       • Focusing on the emerging requirements of graph data for cloud and on-premise distributed systems
    8. Graphs are everywhere
       • Enterprise and government 2.0, bio-engineering, gene sequencing, drug development…
       • LinkedIn, Facebook…
       • Social network analytics, social CRM…
       • Network analysis, complex BoM, predictive and real-time ISR, fraud detection and response…
    9. Graph Databases – What’s so Different? Darren Wood, Chief Architect, InfiniteGraph
    10. Graph Databases
       • Key technical attributes
       • How InfiniteGraph addresses these
       • Query and navigation
       • Challenges/Requirements of Distribution
       • Practical applications
    11. Graph Databases
       • Optimized around data relationships
         – Relationships as first class citizens
         – Super fast navigation between entities
         – Rich/flexible annotation of connections
       • Small focused API (typically not SQL)
         – Natively work with concepts of Vertex/Edge
         – SQL has no concept of “navigation”
         – Most attempts based in SQL are convoluted
    12. Physical Storage Comparison
       Rows/Columns/Tables:
         Meetings:  P1     | Place   | Time    | P2
                    Alice  | Denver  | 5-27-10 | Bob
         Calls:     From   | Time    | Duration | To
                    Bob    | 13:20   | 25       | Carlos
                    Bob    | 17:10   | 15       | Charlie
         Payments:  From   | Date    | Amount   | To
                    Carlos | 5-12-10 | 100000   | Charlie
       Relationship/Graph Optimized:
         Alice –[Met 5-27-10]– Bob
         Bob –[Called 13:20]– Carlos
         Bob –[Called 17:10]– Charlie
         Carlos –[Paid 100000]– Charlie
    13. Query and Navigation
       • Queries – but not as you know them
       • More like a rules based search and discovery
       • Asynchronous Results
       [Figure: graph of Alice, Bob, Carlos and Charlie connected by Meets, Calls, Pays and Calls edges]
       Example queries:
         – “Find all paths between Alice and Charlie”
         – “Find all paths between Alice and Charlie – within 2 degrees”
         – “Find all paths between Alice and Charlie – events in May 2010”
    14. Management of Large Data Graphs
       • Graphs grow quickly
         – Billions of phone calls / day in US
         – Emails, social media events, IP Traffic
         – Financial transactions
       • Some analytics require navigation of large sections of the graph
       • Each step (often) depends on the last
       • Must distribute data and go parallel
    15. Graph Partitioning
       • Graph partitioning is not as simple
       • Graph operations are rarely partition bound
       • Graphs are ‘alive’
       • Repartitioning is expensive
       • Partitions must co-operate
    16. Graph Partitioning – Reality!
       [Figure: Application(s) on top of a Distributed API, over Partition 1, Partition 2, Partition 3 … Partition n, each with its own Processor]
    17. Distributed Graph Must Haves
       • High performance distributed persistence
       • Ability to deal with remote data reads (fast)
       • Intelligent local cache of subgraphs
       • Distributed navigation processing
       • Distributed, multi-source concurrent ingest
       • Write modes supporting both strict and eventual consistency
    18. Practical Applications
    19. Graph Analysis (Algorithms)
       • Social Networks
         – Most connected participants
         – Influencers
         – Important Syndicates or Sub-networks
       • Central figures in crime organisations
       • Business Intelligence
         – Discovering Knowledge Assets
         – Complex analytics
    20. Graph Analysis (Patterns)
       • Crime (again)
         – Recognize common patterns of activity
         – Complex chains of interaction
       • Security
         – Recognize attack/threat patterns
         – Auditing / log analytics
       • Targeting Advertising
         – To specific browsing patterns
    21. Many Many More!
       • Spatial data
       • Defence / Situational Awareness
       • Sciences
       • Health Care
       • Genealogy
       • Logistics
       • Tracking
    22. Thank you! Twitter - @infinitegraph
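The path queries quoted on the Query and Navigation slide can be sketched in a few lines of Python over the Alice/Bob/Carlos/Charlie example from the Physical Storage Comparison slide. This is only an illustration under stated assumptions: `EDGES`, `build_adjacency` and `find_paths` are hypothetical names, edges are treated as undirected, "within 2 degrees" is read as "at most 2 hops", and none of this is InfiniteGraph's API, which the slides do not show.

```python
from collections import defaultdict

# Edges transcribed from the slide example: (source, label, target, attributes).
EDGES = [
    ("Alice", "Met", "Bob", {"date": "5-27-10", "place": "Denver"}),
    ("Bob", "Called", "Carlos", {"time": "13:20", "duration": 25}),
    ("Bob", "Called", "Charlie", {"time": "17:10", "duration": 15}),
    ("Carlos", "Paid", "Charlie", {"date": "5-12-10", "amount": 100000}),
]


def build_adjacency(edges):
    """Index each edge under both endpoints (assumption: undirected navigation)."""
    adjacency = defaultdict(list)
    for source, label, target, attrs in edges:
        adjacency[source].append((target, label, attrs))
        adjacency[target].append((source, label, attrs))
    return adjacency


def find_paths(adjacency, start, goal, max_hops=None):
    """Return every simple path from start to goal as a list of vertex names.

    max_hops limits the number of edges in a path; None means no limit.
    """
    results = []

    def walk(vertex, path):
        if vertex == goal:
            results.append(list(path))
            return
        if max_hops is not None and len(path) - 1 >= max_hops:
            return  # hop budget used up before reaching the goal
        for neighbour, _label, _attrs in adjacency[vertex]:
            if neighbour not in path:  # never revisit a vertex on this path
                path.append(neighbour)
                walk(neighbour, path)
                path.pop()

    walk(start, [start])
    return results


if __name__ == "__main__":
    graph = build_adjacency(EDGES)
    print(find_paths(graph, "Alice", "Charlie"))
    print(find_paths(graph, "Alice", "Charlie", max_hops=2))
```

On this four-edge graph the unrestricted query finds two paths (Alice, Bob, Carlos, Charlie and Alice, Bob, Charlie), and the 2-hop limit keeps only the second. Each step of the walk depends on the vertex reached by the previous step, which is the property the slides point to when they say large graphs are hard to partition and navigate in parallel.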