Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

How we Learned to Stop Worrying and Solve the Distributed Graph Problem

984 views

Published on

Graphs, what are they and why?
Graph Data Management. Why do we need it?
Problems in Distributed Graph
How we solved the problems?
Finding the value in Big Data.

Published in: Technology
  • Be the first to comment

  • Be the first to like this

How we Learned to Stop Worrying and Solve the Distributed Graph Problem

  1. 1. www.Objectivity.comIbrahim SallamDirector of Development
  2. 2. What we’ll talk about• Graphs, what are they and why?• Graph Data Management. Why do we need it?• Problems in Distributed Graph• How we solved the problems
  3. 3. Graphs
  4. 4. Graphshttp://opte.orgSimple GraphNode <-> Node
  5. 5. The Value of Data• “[finding] Japanese restaurants in New YorkCity liked by people from Japan” – a challengingFacebook query.• "The value of any piece of information is onlyknown when you can connect it withsomething else that arrives at a future point intime,” – CIA’s CTO Ira "Gus" Hunt
  6. 6. Connecting DataPerson Building?Work Live RR Visit Eat Shop
  7. 7. Relationships are everywhereCRM, Sales &Marketing NetworkMgmt, TelecomIntelligence(Government& Business)FinanceHealthcareResearch:GenomicsSocialNetworksPLM (ProductLifecycleMgmt)IG
  8. 8. Graph Data• Data structure for representing complicatedrelationships between entities.• Wide applications in…– Bioinformatics.– Social networks.– Network management… etc.
  9. 9. Why a Graph Database ?
  10. 10. The Graph Specialist !• Everyone specializes– Doctors, Lawyers, Bankers, Developers • Why was data so normalized for so long !• NoSQL is all about the data specialist• Specializing in…– Distribution / deployment– Physical data storage– Logical data model– Query mechanism
  11. 11. What Problem!IngestUpdateNavigateMassive graph data require efficient and intelligent toolsto analyze and understand it.
  12. 12. Scaling Writes• Big/Fast data demands write performance• Most NoSQL solutions allow you to scalewrites by…– Partitioning the data– Understanding your consistency requirements– Allowing you to defer conflictsIngest
  13. 13. App-2(Ingest V2)App-2(E23{ V2V3})Scaling Graph WritesACID TransactionsInfiniteGraphObjectivity/DB Persistence LayerApp-1(Ingest V1)App-3(Ingest V3)V1 V2 V3App-1(E1 2{ V1V2})App-3E12 E23Ingest
  14. 14. High Performance Edge IngestIG Core/APIC1C2C3E12E23TargetContainersPipelineContainersE(1->2)E(3->1)E(2->3)E(2->1)E(2->3)E(3->1)E(1->2)E(3->2)E(1->2)E(2->3)E(3->1)E(2->1)E(2->3)E(3->1)E(3->2)E(1->2)PipelineAgentIngest
  15. 15. Trade offs• Excellent for efficient use of page cache• Able to maintain full database consistency• Achieves highest ingest rate in distributedenvironments• Almost always has highest “perceived” rate• Trading Off :• Eventual consistency in graph (connections)• Updates are still atomic, isolated and durable but phased• External agent performs graph buildingIngest
  16. 16. Result…1 client2 clients4 clients8 clients050000100000150000200000250000300000350000400000450000500000124NodesandEdgespersecond1 client2 clients4 clients8 clientsIngest
  17. 17. Scaling Reads and QueryDistributed APIApplication(s)Partition 1 Partition 3Partition 2 Partition ...nProcessor Processor Processor ProcessorPartitioning and Read Replicas… easy right !
  18. 18. Why are Graphs Different ?Distributed APIApplication(s)Partition 1 Partition 3Partition 2 Partition ...nProcessor Processor Processor ProcessorNavigate
  19. 19. Distributed Navigation• Detect local hops and perform in memorytraversal• Send the partial path to the distributedprocessing to continue the navigation.• Intelligently cache remote data when accessedfrequently• Route tasks to other hosts when it is optimalNavigate
  20. 20. Distributed Navigation ServerProcessorDistributed APIPartition 1 Partition 2ProcessorApplicationAXYBCDEP(A,B,C,D)FGNavigate
  21. 21. Schema – It’s not your enemy !(at least not all the time...)• Schema vs Schema-less– Database religion– InfiniteGraph supports schema, but does notrestrict connection types between vertices– We take advantage of the schema when pruning.– …Working on a hybrid support.
  22. 22. GraphViewsLeveraging Schema in the GraphPatient PrescriptionDrugIngredientOutcomeComplaintVisitAllergyPhysicianNavigate
  23. 23. Schema Enables Views• GraphViews are extremely powerful• Allow Big Data to appear small !• Connection inference can lead to exponentialgains in query performance• Views are reusable between queries• Built into the native kernelNavigate
  24. 24. Why InfiniteGraph™?• Objectivity/DB is a proven foundation– Building distributed databases since 1993– A complete database management system• Concurrency, transactions, cache, schema, query, indexing• It’s a Graph Specialist !– Simple but powerful API tailored for data navigation.– Easy to configure distribution model
  25. 25. Advanced Configured Placement• Physically co-locate “closely related” data• Driven through a declarative placement model• Dramatically speeds “local” readsFacility Data Page(s)Patient Data Page(s)MrCitizenVisit VisitDrJonesSanJoseFacilityDrSmithPrimaryPhysicianHasHas WithAtLocated LocatedFacility Data Page(s)DrBlakeSunny-valeDrQuinnLocated LocatedWithAt
  26. 26. Fully Distributed Data ModelZone 2Zone 1HostAIG Core/APIDistributed Object and Relationship Persistence LayerCustomizable PlacementHostB HostC HostXAddVertex()
  27. 27. Polyglot NoSQL ArchitecturesDistributed DataProcessingPlatform DocumentGraphDatabaseRDBMSPartitioned Distributed DB (often Document / KV)UsersApplicationsExternal/LegacyDataTransformationMDMBusiness
  28. 28. What else!• Distributed update.Update… we are working on it.
  29. 29. ConclusionHave we solved all the distributed graph problems?Not quite, but…We Learned to Stop Worrying.
  30. 30. QUESTIONS?

×