How we Learned to Stop Worrying and Solve the Distributed Graph Problem

868 views

Published on

Graphs, what are they and why?
Graph Data Management. Why do we need it?
Problems in Distributed Graph
How we solved the problems?
Finding the value in Big Data.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
868
On SlideShare
0
From Embeds
0
Number of Embeds
10
Actions
Shares
0
Downloads
9
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • http://opte.org/maps/Asia Pacific - RedEurope/Middle East/Central Asia/Africa - GreenNorth America - BlueLatin American and Caribbean - YellowRFC1918 IP Addresses - CyanUnknown - White
  • http://www.computerworld.com/s/article/9237037/Facebook_engineers_identify_big_data_challenges_for_Graph_Searchhttp://www.huffingtonpost.com/2013/03/20/cia-gus-hunt-big-data_n_2917842.html
  • Some data is only actionable momentarilyIntelligenceIT SecuritySite/page visitFinancial / trading behaviorPresents a different type of challengeLatency of batch data processing becomes problematic
  • How we Learned to Stop Worrying and Solve the Distributed Graph Problem

    1. 1. www.Objectivity.comIbrahim SallamDirector of Development
    2. 2. What we’ll talk about• Graphs, what are they and why?• Graph Data Management. Why do we need it?• Problems in Distributed Graph• How we solved the problems
    3. 3. Graphs
    4. 4. Graphshttp://opte.orgSimple GraphNode <-> Node
    5. 5. The Value of Data• “[finding] Japanese restaurants in New YorkCity liked by people from Japan” – a challengingFacebook query.• "The value of any piece of information is onlyknown when you can connect it withsomething else that arrives at a future point intime,” – CIA’s CTO Ira "Gus" Hunt
    6. 6. Connecting DataPerson Building?Work Live RR Visit Eat Shop
    7. 7. Relationships are everywhereCRM, Sales &Marketing NetworkMgmt, TelecomIntelligence(Government& Business)FinanceHealthcareResearch:GenomicsSocialNetworksPLM (ProductLifecycleMgmt)IG
    8. 8. Graph Data• Data structure for representing complicatedrelationships between entities.• Wide applications in…– Bioinformatics.– Social networks.– Network management… etc.
    9. 9. Why a Graph Database ?
    10. 10. The Graph Specialist !• Everyone specializes– Doctors, Lawyers, Bankers, Developers • Why was data so normalized for so long !• NoSQL is all about the data specialist• Specializing in…– Distribution / deployment– Physical data storage– Logical data model– Query mechanism
    11. 11. What Problem!IngestUpdateNavigateMassive graph data require efficient and intelligent toolsto analyze and understand it.
    12. 12. Scaling Writes• Big/Fast data demands write performance• Most NoSQL solutions allow you to scalewrites by…– Partitioning the data– Understanding your consistency requirements– Allowing you to defer conflictsIngest
    13. 13. App-2(Ingest V2)App-2(E23{ V2V3})Scaling Graph WritesACID TransactionsInfiniteGraphObjectivity/DB Persistence LayerApp-1(Ingest V1)App-3(Ingest V3)V1 V2 V3App-1(E1 2{ V1V2})App-3E12 E23Ingest
    14. 14. High Performance Edge IngestIG Core/APIC1C2C3E12E23TargetContainersPipelineContainersE(1->2)E(3->1)E(2->3)E(2->1)E(2->3)E(3->1)E(1->2)E(3->2)E(1->2)E(2->3)E(3->1)E(2->1)E(2->3)E(3->1)E(3->2)E(1->2)PipelineAgentIngest
    15. 15. Trade offs• Excellent for efficient use of page cache• Able to maintain full database consistency• Achieves highest ingest rate in distributedenvironments• Almost always has highest “perceived” rate• Trading Off :• Eventual consistency in graph (connections)• Updates are still atomic, isolated and durable but phased• External agent performs graph buildingIngest
    16. 16. Result…1 client2 clients4 clients8 clients050000100000150000200000250000300000350000400000450000500000124NodesandEdgespersecond1 client2 clients4 clients8 clientsIngest
    17. 17. Scaling Reads and QueryDistributed APIApplication(s)Partition 1 Partition 3Partition 2 Partition ...nProcessor Processor Processor ProcessorPartitioning and Read Replicas… easy right !
    18. 18. Why are Graphs Different ?Distributed APIApplication(s)Partition 1 Partition 3Partition 2 Partition ...nProcessor Processor Processor ProcessorNavigate
    19. 19. Distributed Navigation• Detect local hops and perform in memorytraversal• Send the partial path to the distributedprocessing to continue the navigation.• Intelligently cache remote data when accessedfrequently• Route tasks to other hosts when it is optimalNavigate
    20. 20. Distributed Navigation ServerProcessorDistributed APIPartition 1 Partition 2ProcessorApplicationAXYBCDEP(A,B,C,D)FGNavigate
    21. 21. Schema – It’s not your enemy !(at least not all the time...)• Schema vs Schema-less– Database religion– InfiniteGraph supports schema, but does notrestrict connection types between vertices– We take advantage of the schema when pruning.– …Working on a hybrid support.
    22. 22. GraphViewsLeveraging Schema in the GraphPatient PrescriptionDrugIngredientOutcomeComplaintVisitAllergyPhysicianNavigate
    23. 23. Schema Enables Views• GraphViews are extremely powerful• Allow Big Data to appear small !• Connection inference can lead to exponentialgains in query performance• Views are reusable between queries• Built into the native kernelNavigate
    24. 24. Why InfiniteGraph™?• Objectivity/DB is a proven foundation– Building distributed databases since 1993– A complete database management system• Concurrency, transactions, cache, schema, query, indexing• It’s a Graph Specialist !– Simple but powerful API tailored for data navigation.– Easy to configure distribution model
    25. 25. Advanced Configured Placement• Physically co-locate “closely related” data• Driven through a declarative placement model• Dramatically speeds “local” readsFacility Data Page(s)Patient Data Page(s)MrCitizenVisit VisitDrJonesSanJoseFacilityDrSmithPrimaryPhysicianHasHas WithAtLocated LocatedFacility Data Page(s)DrBlakeSunny-valeDrQuinnLocated LocatedWithAt
    26. 26. Fully Distributed Data ModelZone 2Zone 1HostAIG Core/APIDistributed Object and Relationship Persistence LayerCustomizable PlacementHostB HostC HostXAddVertex()
    27. 27. Polyglot NoSQL ArchitecturesDistributed DataProcessingPlatform DocumentGraphDatabaseRDBMSPartitioned Distributed DB (often Document / KV)UsersApplicationsExternal/LegacyDataTransformationMDMBusiness
    28. 28. What else!• Distributed update.Update… we are working on it.
    29. 29. ConclusionHave we solved all the distributed graph problems?Not quite, but…We Learned to Stop Worrying.
    30. 30. QUESTIONS?

    ×