Making Sense of Graph Databases



As new technologies emerge, it can be difficult to identify the benefits of the many options available. To better understand the NoSQL options, specifically graph databases, Objectivity, Inc. has formed an internal Performance Center to evaluate the features, performance and functionality of the graph database solutions available today. This webinar focuses on the complementary nature, use cases and value of graph databases for “Big Data” solutions. Please join us with guest speaker Noel Yuhanna, Principal Analyst serving Enterprise Architecture Professionals, Forrester Research, Inc., for an overview of the NoSQL market, and Brian Clark, Vice President, Objectivity, presenting an overview of initial Performance Center findings.

Guest Speaker:
Noel Yuhanna
Principal Analyst serving Enterprise Architecture Professionals, Forrester Research, Inc.

Noel serves Enterprise Architecture Professionals. He primarily covers database management systems (DBMSes), infrastructure-as-a-service (IaaS), data replication and integration, data security, data management tools, and related online transaction processing issues. His current primary research focus is on customer usage experiences and broad industry trends of DBMS, IaaS, data security, enterprise data grids, outsourcing, information life-cycle management, open source databases, and other emerging database technologies.

Brian Clark

Corporate Vice President, Objectivity

Brian Clark has nearly 30 years of software and technology experience, and was one of the early architects of Objectivity/DB. Before joining Objectivity, Brian worked at Automation Technology Products, providing leading tools in the MCAD market. Prior to that, he was with Project Management Services at International Computers Limited, one of Europe’s leading computer companies at the time. Brian holds a B.S

View the webinar at:

Published in: Technology

Making Sense of Graph Databases

  1. 1. Making Sense of Graph Databases. Noel Yuhanna, Principal Analyst, Forrester Research. Teleconference.
  2. 2. Today, we live in a digital world that’s generating billions of data points every millisecond.
  4. 4. © 2014 Forrester Research, Inc. Reproduction Prohibited 4 of data is on the public Net.
  5. 5. We are digitizing everything. “Why is the amount of data stored by your firm increasing?” (Please select the top three reasons.)
  6. 6. …but the bad news is that we are creating too many data silos… that fail to deliver unified and connected data Apps
  7. 7. Drivers and trends affecting databases (DBMS strategy) › Increased data volumes › Strong data security controls › Increased transaction volume › Nonstop 24x7 availability › All types of data storage › New apps: social, mobile › Cost control and stalled budget › Faster real-time data access › Integrated app/data › Unpredictable workloads
  8. 8. New business requirements are making older data management methods inadequate › Business challenges: • Customer is king: offer more personalization • Deliver more innovative products • Deliver more customer-driven products and services… › Technology challenges: • Increasing data volume, velocity, silos • Need for continuous availability of information • Increasing number of users, apps, workloads, patterns
  9. 9. New applications are changing database requirements… › Social networking apps › Mobile applications › High-performance apps › Real-time apps › Real-time data mashups › Departmental and collaboration › Predictive analytics. Real-time data; unstructured data; faster access; self-service; automated. Many are building a dozen apps every week!
  10. 10. Database categorization based on function. Source: June 7, 2013, “The Steadily Growing Database Market Is Increasing Enterprises’ Choices” Forrester report
  11. 11. TechRadar: Database Management 2014. Source: February 13, 2014, “TechRadar™: Enterprise DBMS, Q1 2014” Forrester report
  12. 12. Connected data has become critical for any business to succeed. Diagram labels: Customer, Company, Products, Friends, GeoLocation, Devices, Services, Support, Billing, Tweets, Facebook, Yelp, LinkedIn
  13. 13. …but dealing with connected data is complex and resource intensive… › Imagine doing a 100-table join in a relational database. • How long will it take to run? • How long will your SQL statement be? • What kind of indexes would be needed? • Will it try to create a Cartesian product? • What kind of system resources are needed?
  14. 14. Graph databases overcome these issues… and offer new possibilities! › Graph databases simplify and speed up access to data containing many relationships. › Graph structures consist of nodes (things), edges (relationships), and properties (key/value pairs) to store and access complex data relationships, which are challenging to handle in other database types. › Graph databases directly support relationships and can rapidly access complex networks of connected data.
  15. 15. Graph databases support many use cases… › Social network apps, e.g., Facebook, Twitter, LinkedIn › Pattern analysis, e.g., detecting fraud, consumer behavior › Analysis of massive data: communication/network management › Recommendation engines › Consumer personalization › Mobile apps › Gaming › Up-sell/cross-sell › Real-time apps › Others
  16. 16. Recommendations › Graph databases should be part of your DBMS strategy. › NoSQL has become more mature, with 25% adoption. › Graph databases offer many use cases that go beyond traditional application and business requirements, so think differently. › Train your developers, data architects and administrators on graph databases. › Remember that not all applications are a good fit for graph, so pick ones that deal with lots of connected data. › Start small and grow. Build smaller graph apps to understand their business and technology value, then expand to larger ones. › Graph databases offer endless possibilities. Remember that enterprises that leverage data more efficiently are more likely to succeed and have a competitive edge.
  17. 17. Thank you Noel Yuhanna +1 650.581.3807
  18. 18. Discovering Valuable Connections in Big Data: Making Sense of Graph Database Technologies. Brian Clark, VP Product Management, Objectivity, Inc. © 2014, Confidential
  19. 19. Agenda • An overview of NoSQL • Why Graph? • Graph databases • Business value: what to look for? • Technical value: what to look for? • Objectivity Performance Centre
  20. 20. NOSQL An Overview of Four Primary NOSQL Technologies.
  21. 21. The “Not Only SQL” Market. Axes: Connected Data; Query and Navigational Complexity. Big Table Clones: BigTable (Google), Cassandra, Cloudera, HBase, Hypertable. Graph Databases: FlockDB (Twitter), AllegroGraph, DEX, InfoGrid, Neo4j, Titan. Key-Value Stores: Dynamo (Amazon), Voldemort (LinkedIn), Citrusleaf, Membase, Riak, Tokyo Cabinet. Document Databases: CouchOne, MongoDB, OrientDB, Terrastore. Other labels: Scalable, Distributed; Graph & Object Databases.
  22. 22. Big Data Tools. Diagram labels: Massively Parallel Data Streams, Ingest, Hadoop, Process, Map/Reduce, Store/Database, Analysis, Visualization, Palantir, NoSQL, Files, Objectivity/DB, Custom Analytics & Visualization, Graph/Object DB, Analytics & Visualization Apps, RDBMS, InfiniteGraph. © Copyright 2014 Objectivity, Inc. All Rights Reserved. Strictly Confidential.
  23. 23. The New Big Data Workflow: Ingest; Process & Correlation; Analysis & Visualization
  24. 24. WHY GRAPH?
  25. 25. Why Graph? According to a report by industry observer DB-Engines, “Graph DBMSs are gaining in popularity faster than any other database category,” growing 300 percent since January of last year.
  26. 26. Why Graph? The real world is not a set of neatly lined rows and columns. • It’s all about understanding relationships and connections. • Graph’s relationship-based data model enables modeling of real-world, complex, interconnected use cases. • Find hidden value to improve business decisions and efficiencies and increase growth. • High-performance, complex query capabilities.
  28. 28. Object Databases (diagram labels: OID, Object, Connections) • Data model: – Every object instance belongs to a class (type) and has a group of values (properties). – Every object instance has a unique object identifier (OID). – Connections are implemented using OIDs. • Examples: – Objectivity/DB and db4objects. • Strengths: – Simple, powerful data model that includes inheritance and polymorphism. – Good scalability if sharding is supported. – Uses object identifiers instead of “JOINs” to support very fast navigational operations. • Weaknesses: – Supports standard object-oriented languages but isn't supported by a wide range of third-party tools in the way that SQL is.
  29. 29. Graph Databases (diagram labels: Vertex, Edge) • Data model: – Node (Vertex) and Relationship (Edge) objects. – Directed. • Examples: – InfiniteGraph, Neo4j, OrientDB, AllegroGraph, TitanDB. • Strengths: – Extremely fast for connected data. – Scales out, typically. – Easy to query (navigation). – Simple data model. • Weaknesses: – Requires a conceptual shift... a different way of thinking.
  30. 30. Graph Computing • Graph Databases: transactions, indices, concurrency, availability, schema, ‘user time’. Examples: IG, Neo4j (Neo Technology), Titan (Aurelius), Dex (Sparsity). • Graph Analytics: processing, stateless, batch, supersteps, algorithms, ‘business time’. Examples: GraphLab, Faunus (Aurelius), Apache Giraph / Pregel (Google). • Also listed: queries, pathfinding, graph views, pipelining, formatters, exporters.
  31. 31. Graph DB Use Cases
  32. 32. Sample of Graph Database Options
  33. 33. BUSINESS VALUE What to look for?
  34. 34. Business Values • Enterprise ready and proven: – Optimized for multi-user/multi-application environments – Distributed and scalable – Real-time access to data – High-performance search and discovery • Lower total cost of ownership: – How does the graph database maximize the use of expensive, scarce resources (cores, memory, disk and network)?
  35. 35. Business Value: Enterprise Ready & Proven • Is the graph database optimized for the enterprise? – Concurrency: are you able to run many threads and many processes against the graph database? – Can many different applications from many different locations access the graph database? • Does the graph database work in a distributed environment? – Are both distributed data and distributed processing supported? – Does it scale out (rather than scale up)? • What levels of support are available?
  36. 36. Business Value: Enterprise Ready & Proven • Data availability: – Is the graph data immediately available? • Or what is the latency? – Are indexes immediately consistent? • Some 3rd party indexes are not immediately consistent. – Can you use 3rd party key/value stores during ingest? • Can improve ingest performance. • High-performance search and discovery: – Does the graph database support schema-less or schema-full approaches? • Trade-off between flexibility and performance. • Trade-off between flexibility and reliability (constraints implemented by schema).
  37. 37. Business Value: TCO • Lower TCO (total cost of ownership): – Can the graph processing be distributed across multiple computers? – Does the whole graph have to fit in memory? – How is the network utilized? Send the data to the processing or send the processing to the data? – How much space does the graph database occupy on disk?
  38. 38. TECHNICAL VALUE What to look for?
  39. 39. Performance Measurement • Measurement criteria: – Performance: measure throughput (ingest nodes & edges per second; lookups per second; traversals (paths, hops) per second). – Parallelism (distributed): scalability (processing; storage; measurement: how much, how many?) and concurrency (multi-threaded; multi-user; multi-computer; measurement: number of concurrent users, transactions). – Usability. • Different resources can affect performance: – CPU: how many, number of cores? – Memory: how much? – Storage: local & remote? How many? Type: SSD or rotational? – Network: bandwidth & latency? – Technology: price/performance?
  40. 40. Technical Value: A Test Case • Clickstream data is a good test for concurrency because the files can be split up for parallel ingest of vertices and edges. • Clickstream data was loaded into InfiniteGraph three ways: • single-threaded: create vertices, create edges and make connections; • multiple threads within a single process for improved throughput (beware of deadlocks): create vertices, create edges and make connections; • multiple threads within a single process: create vertices, create edges, then use pipeline agents to complete the graph, overcoming deadlocks. • Clickstream data was used to perform explore and navigate (shortest path) queries. The generated graph has good “connectedness” but no real schema.
  42. 42. Goals & Objectives of the Performance Centre • Improve understanding of NoSQL products and technologies. • Internal and external education and training. • Encourage partner collaboration. • Discover areas for improvement. • Develop a customer-centric suite of tests for performance comparisons.
  43. 43. Example of memory use: createTriples, memory usage in MB, at sizes 1,000,000 / 2,000,000 / 4,000,000 / 8,000,000 / 16,000,000 / 32,000,000. IG33: 513 / 671 / 692 / 652 / 753 / 561. Neo4j: 4790 / 5512 / 6298 / 5639 / 7347 / 7324. Titan-B: 1866 / 3517 / 5834 / 5834 / 6283 / 6310. Titan-C: 763 / 1435 / 3797 / 2407 / 4177 / 3548. Titan-H: 1389 / 1389 (only two values shown).
  44. 44. Q&A Thank you for your time! Please contact us for a complimentary solution  evaluation at Visit our website for access to technical  resources, demos and free trial downloads of our products.  Objectivity, Inc.© 2014, Confidential
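
The graph data model described on slides 14 and 29 (vertices, directed edges, key/value properties) and the shortest-path query mentioned in the slide 40 test case can be sketched in a few lines of Python. This is only a minimal in-memory illustration under stated assumptions, not how InfiniteGraph or any other product named above works: the PropertyGraph class, the build_clickstream helper and the sample page names are all hypothetical, and "shortest" here simply means fewest hops.

```python
from collections import deque
import time


class PropertyGraph:
    """Tiny in-memory directed property graph (illustrative only)."""

    def __init__(self):
        self.vertices = {}   # vertex id -> dict of properties
        self.out_edges = {}  # vertex id -> list of (target id, edge properties)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props
        self.out_edges.setdefault(vid, [])

    def add_edge(self, src, dst, **props):
        # Edges are directed: src -> dst.
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must exist before connecting them")
        self.out_edges[src].append((dst, props))

    def shortest_path(self, start, goal):
        """Breadth-first search; returns a fewest-hop path or None."""
        if start not in self.vertices or goal not in self.vertices:
            return None
        parents = {start: None}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if current == goal:
                # Walk the parent links back to the start, then reverse.
                path = []
                while current is not None:
                    path.append(current)
                    current = parents[current]
                return path[::-1]
            for neighbor, _props in self.out_edges[current]:
                if neighbor not in parents:
                    parents[neighbor] = current
                    queue.append(neighbor)
        return None


def build_clickstream(graph, clicks):
    """Ingest (from_page, to_page) pairs; returns (vertices, edges) created."""
    created_vertices = created_edges = 0
    for src, dst in clicks:
        for page in (src, dst):
            if page not in graph.vertices:
                graph.add_vertex(page, kind="page")
                created_vertices += 1
        graph.add_edge(src, dst, kind="click")
        created_edges += 1
    return created_vertices, created_edges


if __name__ == "__main__":
    # Hypothetical clickstream: each pair is one click from a page to the next.
    clicks = [
        ("home", "search"),
        ("search", "product"),
        ("product", "cart"),
        ("home", "product"),
        ("cart", "checkout"),
    ]
    g = PropertyGraph()
    started = time.perf_counter()
    n_vertices, n_edges = build_clickstream(g, clicks)
    elapsed = time.perf_counter() - started
    print(g.shortest_path("home", "checkout"))
    print(f"ingested {n_vertices} vertices and {n_edges} edges in {elapsed:.6f}s")
```

Timing the ingest and dividing the vertex and edge counts by the elapsed time gives the kind of "nodes & edges per second" throughput figure listed on slide 39, although a toy single-threaded run like this says nothing about the concurrency or deadlock behaviour the test case was designed to probe.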