
B-4: Gravty


  1. Agenda: What Is Gravty? / The Internals of Gravty / Fine-Tuning Gravty / Future Plans
  2. Section 1: What Is Gravty?
  3. A graph database is "a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data" (Wikipedia). It stores objects (vertices) and relationships (edges), and provides graph search capabilities.
  4. Vertices and Edges in a Graph Database (diagram: vertices connected by "Friends" and "Likes" edges).
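To make vertices, edges, and graph search concrete, here is a small TinkerPop example using the in-memory TinkerGraph reference implementation (not Gravty itself); the labels and names are illustrative:

```java
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.T;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

public class GraphModelExample {
    public static void main(String[] args) {
        Graph graph = TinkerGraph.open();                        // in-memory reference graph

        // Vertices represent objects; properties hang off them.
        Vertex brown = graph.addVertex(T.label, "user", "name", "Brown");
        Vertex cony  = graph.addVertex(T.label, "user", "name", "Cony");
        Vertex choco = graph.addVertex(T.label, "item", "name", "Choco");

        // Edges represent relationships between vertices.
        brown.addEdge("friends", cony);
        brown.addEdge("likes", choco);

        // Graph search: who are Brown's friends?
        graph.traversal().V(brown).out("friends").values("name")
             .forEachRemaining(System.out::println);             // prints "Cony"
    }
}
```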
  5. Use Cases of a Graph Database: the Facebook Social Graph (social networks), Google PageRank (ranking websites), and Walmart and eBay (product recommendation).
  6. Need for a Large Graph Database System: Gravty provides the social graph behind LINE Timeline, LINE Talk, ranking, recommendation, LINE Friends Shop, and LINE News.
  7. Need for a Large Graph Database System: the social graph holds 7 billion vertices, 100 billion edges, and 200 billion indexes, with 5 billion writes a day (create / update / delete).
  8. Gravty is a scalable graph database that searches relational information efficiently across a large pool of data using graph search techniques.
  9. Requirements for Gravty: easy to scale out (to support ever-increasing data); easy to develop (add, modify, and remove features as necessary; tailored to the LINE development environment; not dependent on LINE-specific components; full control over everything); easy to use (a graph query language and a REST API).
  10. Section 2: The Internals of Gravty. Topics: Technology Stack and Architecture; Data Model.
  11. Technology Stack and Architecture (diagram): applications and the TinkerPop3 Gremlin console access Gravty through the TinkerPop3 Graph API; Gravty itself is a graph processing layer on top of a storage layer, backed by MySQL (config, meta), HBase, and Kafka.
  12. The graph processing layer implements the TinkerPop 3.2.0 Graph API and is OLTP only; the storage layer runs on HBase, alongside MySQL (config, meta) and Kafka.
  13. The storage layer is an abstract interface with two repositories: the Phoenix repository (default; Phoenix 4.8.0 on HBase 1.1.x, with Kafka 0.10.0.0) and the memory repository (standalone; local memory).
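The slides name the two repositories but not the interface itself; below is a hedged sketch of what such a storage abstraction could look like. All method names are illustrative assumptions, not Gravty's actual API:

```java
import java.util.Iterator;
import java.util.Map;

// Hypothetical shape of the storage layer's abstract interface. The graph
// processing layer codes against this; PhoenixRepository (default) and
// MemoryRepository (standalone) would implement it.
public interface GraphRepository {
    void putVertex(String vertexId, Map<String, Object> properties);
    void putEdge(String srcVertexId, String label, String tgtVertexId,
                 Map<String, Object> properties);

    // Tall-narrow model: a vertex's out-edges form a contiguous row-key range,
    // so this can be served by a single prefix scan (see the data-model slides).
    Iterator<String> getOutEdgeTargets(String srcVertexId, String label);

    void deleteEdge(String srcVertexId, String label, String tgtVertexId);
}
```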
  14. Data Model, Flat-Wide Table: the row key is the vertex-id, and each vertex's properties and edges are stored as columns in its row (row vertex-id1 holds property and edge columns, and so on for vertex-id2, vertex-id3). Disadvantages: column scans are slow, and columns cannot be split across regions.
  15. Data Model, Tall-Narrow Table (Gravty): the row key is the edge-id, SrcVertexId-Label-TgtVertexId (e.g. svtxid1-label-tvtxid2), so each edge is stored in its own row with edge-property columns. Advantages: more effective edge scans and parallel execution.
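A minimal sketch of assembling such a row key; the "-" delimiter and UTF-8 encoding are assumptions read off the slide's SrcVertexId-Label-TgtVertexId layout:

```java
import java.nio.charset.StandardCharsets;

public class EdgeRowKey {
    // Tall-narrow model: one edge per row, keyed SrcVertexId-Label-TgtVertexId.
    static byte[] edgeRowKey(String srcVertexId, String label, String tgtVertexId) {
        return (srcVertexId + "-" + label + "-" + tgtVertexId)
                .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // All of brown's "friends" edges sort next to each other in the table:
        System.out.println(new String(edgeRowKey("brown", "friends", "cony")));
        System.out.println(new String(edgeRowKey("brown", "friends", "moon")));
        System.out.println(new String(edgeRowKey("brown", "friends", "sally")));
    }
}
```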
  16. Flat-Wide vs Tall-Narrow, running example: Brown is friends with Cony, Moon, and Sally, and the query g.V("brown").out("friends").id().limit(3) returns [cony, moon, sally].
  17. Flat-Wide vs Tall-Narrow, flat-wide model: answering the query takes 2 operations on Brown's row: (1) a row scan to locate the row, then (2) a column scan to separate the 'friends' edge columns from the 'likes' columns, yielding [cony, moon, sally].
  18. Flat-Wide vs Tall-Narrow, tall-narrow model (Gravty): 1 operation, a single row scan over brown-friends-cony, brown-friends-moon, brown-friends-sally, yields [cony, moon, sally]. The table can split by rows (regions), hotspot rows can be isolated, and scans can run in parallel.
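Because all of brown's 'friends' edges share the row-key prefix brown-friends-, that one-operation scan maps naturally onto an HBase prefix scan. A hedged sketch using the plain HBase 1.1 client API (the EDGE table name is an assumption):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class EdgePrefixScan {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table edges = conn.getTable(TableName.valueOf("EDGE"))) {  // assumed table name

            // One row-scan operation: every brown-friends-* row, in key order.
            Scan scan = new Scan();
            scan.setRowPrefixFilter(Bytes.toBytes("brown-friends-"));
            scan.setCaching(100);

            try (ResultScanner scanner = edges.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow())); // brown-friends-cony, ...
                }
            }
        }
    }
}
```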
  19. Flat-Wide vs Tall-Narrow: the two-hop query g.V("brown").out("friends").out("friends").id().limit(10) needs 4 searches in total; flat-wide spends 8 operations on them, tall-narrow (Gravty) only 4.
  20. Section 3: Fine-Tuning Gravty. Topics: Faster, Compact Querying; Avoiding Hot-Spotting; Efficient Secondary Indexing.
  21. Faster, Compact Querying, reducing graph traversal steps: a traversal like g.V(brown).hasLabel("user").out("friends").order().by("name", Order.incr).limit(5) normally compiles to a chain of generic steps (GraphStep, FilterStep, VertexStep, FilterStep, RangeStep); Gravty fuses them into two provider-specific steps, GGraphStep and GVertexStep.
  22. Faster, Compact Querying, querying in parallel and pre-loading vertex properties: in g.V(brown).outE("friends").limit(5).inV().order().by("name", Order.incr).properties("name"), inV() is a pipelined iterator fed by outE(). TinkerPop consumes it sequentially; Gravty instead queries in parallel and pre-loads the target vertices' properties.
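A minimal sketch of that idea using a plain thread pool: once outE("friends").limit(5) has yielded the target vertex ids, their properties are fetched concurrently instead of one by one as inV() is consumed. The property loader is a stub standing in for Gravty's storage reads, and the pool size is illustrative:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class ParallelVertexPrefetch {
    private static final ExecutorService POOL = Executors.newFixedThreadPool(8);

    // Stub standing in for a storage-layer read (an HBase/Phoenix lookup in Gravty).
    static String loadNameProperty(String vertexId) {
        return "name-of-" + vertexId;
    }

    static List<String> prefetchNames(List<String> targetVertexIds) {
        // Fan out: one asynchronous storage read per target vertex.
        List<CompletableFuture<String>> futures = targetVertexIds.stream()
                .map(id -> CompletableFuture.supplyAsync(() -> loadNameProperty(id), POOL))
                .collect(Collectors.toList());
        // Fan in: joining in order preserves the original edge order.
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(prefetchNames(Arrays.asList("cony", "moon", "sally")));
        POOL.shutdown();
    }
}
```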
  23. Avoiding Hot-Spotting: row keys with sequential order can make individual HBase RegionServers suffer. In the edge table, rows such as (u000001, 1, u000002), (u000001, 1, u000003), (u000002, 1, u000001), (u000003, 1, u000001), (u000004, 2, u000009) sort together, concentrating heavy read/write load on one RegionServer and splitting regions inefficiently.
  24. Avoiding Hot-Spotting, solutions: pre-splitting regions and salting row keys with a hashed prefix (salted tables in Apache Phoenix). But salted tables have a scan performance issue with the LIMIT clause: SELECT * FROM index … LIMIT 100;
  25. Avoiding Hot-Spotting, Phoenix salted table: the LIMIT 100 query must scan 100 rows from every salt bucket (four scans here, a maximum of 400 rows), and the Phoenix client merge-sorts the partial results before returning.
  26. Avoiding Hot-Spotting, custom salting + pre-splitting (Gravty): the row-key prefix is hash(source-vertex-id), so all edges of one vertex share a prefix and land in one pre-split region, and the Phoenix client can scan its 100 rows sequentially.
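A hedged sketch of the scheme: unlike Phoenix's table salting, which hashes the entire row key, the prefix here depends only on the source vertex id, so every edge of a vertex lands in the same bucket (and, with pre-splitting, the same region). The bucket count and hash function are assumptions:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CustomSalting {
    static final int BUCKETS = 16;  // illustrative; would match the pre-split region count

    // Salt depends only on the source vertex id, so brown-friends-cony,
    // brown-friends-moon, ... all share one prefix and one region.
    static byte saltFor(String srcVertexId) {
        return (byte) ((srcVertexId.hashCode() & Integer.MAX_VALUE) % BUCKETS);
    }

    static byte[] saltedEdgeRowKey(String src, String label, String tgt) {
        byte[] key = (src + "-" + label + "-" + tgt).getBytes(StandardCharsets.UTF_8);
        byte[] salted = new byte[key.length + 1];
        salted[0] = saltFor(src);                      // 1-byte region-routing prefix
        System.arraycopy(key, 0, salted, 1, key.length);
        return salted;
    }

    public static void main(String[] args) {
        byte[] a = saltedEdgeRowKey("brown", "friends", "cony");
        byte[] b = saltedEdgeRowKey("brown", "friends", "moon");
        System.out.println(a[0] == b[0]);              // true: same bucket, sequential scan
        System.out.println(Arrays.toString(a));
    }
}
```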
  27. Efficient Secondary Indexing: an indexed graph view for faster graph search, asynchronous index processing using Kafka, and tools for failure recovery.
  28. Default Phoenix IndexCommitter: the Indexer coprocessor on each region processes index update requests synchronously, sending Put/Delete mutations from every RegionServer to every other, so numConnections = regionServers * regionServers * needConnections. Each RegionServer holds too many connections, and the network is heavily congested.
  29. Gravty IndexCommitter: the Indexer coprocessor instead publishes mutations to Kafka, and dedicated indexer processes consume and apply the index updates asynchronously, so numConnections = indexers * regionServers * needConnections.
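To see the scale of the difference, take hypothetical numbers (not from the slides): with 40 RegionServers and needConnections = 4, the default committer opens 40 × 40 × 4 = 6,400 connections, while 5 dedicated indexers need only 5 × 40 × 4 = 800, i.e. indexers / regionServers = 1/8 of the connections, the kind of reduction the metrics slide reports. A hedged sketch of the publishing side; the topic name, serializers, and configs are assumptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class IndexMutationPublisher {
    private final Producer<byte[], byte[]> producer;

    public IndexMutationPublisher(String brokers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokers);
        props.put("acks", "all");  // don't lose index updates on broker failover
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        this.producer = new KafkaProducer<>(props);
    }

    // Called instead of a direct cross-RegionServer RPC: keying by row key keeps
    // the mutations for one index row ordered within a partition.
    public void publish(byte[] indexRowKey, byte[] serializedMutation) {
        producer.send(new ProducerRecord<>("gravty-index-mutations", indexRowKey, serializedMutation));
    }

    public void close() {
        producer.close();
    }
}
```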
  30. Default Phoenix IndexCommitter, flow: (1) the Phoenix client UPSERTs, issuing a PUT to the primary table's RegionServer; (2) its Phoenix coprocessor requests the HBase mutations (PUT / DELETE) for INDEX 1 and INDEX 2 in parallel; (3) only then does the Phoenix client return.
  31. Gravty IndexCommitter, flow: (1) PUT to the primary table; (2) its coprocessor publishes the HBase mutations for INDEX 1 and INDEX 2 to Kafka; (3) the call returns; (4) an index consumer consumes the mutations; (5) the consumer applies the PUT / DELETE to each index table.
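A matching sketch of the indexer side (steps 4 and 5): a Kafka consumer polls the mutation topic and applies the writes to an index table. Mutation deserialization is stubbed, and the topic, group, column family, and table names are assumptions:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class IndexMutationConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("group.id", "gravty-indexer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
             Connection hbase = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table index = hbase.getTable(TableName.valueOf("INDEX1"))) {

            consumer.subscribe(Collections.singletonList("gravty-index-mutations"));
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // Stub: real code would deserialize the mutation and pick Put vs Delete.
                    Put put = new Put(record.key());
                    put.addColumn(Bytes.toBytes("e"), Bytes.toBytes("m"), record.value());
                    index.put(put);  // step 5: apply the write to the index table
                }
            }
        }
    }
}
```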
  32. Secondary indexing metrics: server TPS went up 3x, and the number of connections per RegionServer dropped to 1/8.
  33. Best-Effort Failover: fail fast, fix later. Event processing is reentrant because every row is versioned in HBase (timestamp); failures are logged and failed requests are replayed; a "time machine" resumes processing at a chosen point by resetting the runtime offset of the Kafka consumers, as sketched below.
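A minimal sketch of that offset reset, assuming offsets are checkpointed somewhere durable; the partition, offset, and names are illustrative:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class OffsetTimeMachine {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("group.id", "gravty-indexer");
        props.put("enable.auto.commit", "false");  // offsets are managed explicitly
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        long checkpointedOffset = 123_456L;        // illustrative: offset logged before the failure

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("gravty-index-mutations", 0);
            consumer.assign(Collections.singletonList(partition));
            consumer.seek(partition, checkpointedOffset);  // rewind: everything after is replayed
            // Subsequent poll() calls re-deliver the mutations; HBase row versioning
            // makes re-applying them harmless ("fail fast, fix later").
        }
    }
}
```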
  34. Monitoring Tools for Failure Recovery, setting alerts and displaying metrics: Prometheus, Dropwizard Metrics, jmx_exporter, Grafana, and Ambari.
  35. Section 4: Future Plans.
  36. Multiple Graph Clusters: before Gravty, the client's Graph API talked to a single HBase cluster; after Gravty, one Graph API spans multiple HBase clusters.
  37. HBase Repository: alongside the existing memory repository (standalone, on local memory) and Phoenix repository (default, on Phoenix), the storage layer's abstract interface will gain an HBase repository that talks to HBase directly via region coprocessors.
  38. OLAP Functionality: a graph analytics system for graph computation, built on the TinkerPop graph computing API.
  39. We will open-source Gravty.
