hbaseconasia2019 Recent work on HBase at Pinterest

Lianghong Xu
Track 3: Applications
https://open.mi.com/conference/hbasecon-asia-2019
THE COMMUNITY EVENT FOR APACHE HBASE™
July 20th, 2019 - Sheraton Hotel, Beijing, China
https://hbase.apache.org/hbaseconasia-2019/

  1. Recent work on HBase at Pinterest
     Lianghong Xu, Pinterest (Software Engineer, Tech Lead)
  2. Introduction
  3. HBase at Pinterest
     • Backend for many critical services
       • Graph database (Zen)
       • Generic KV store (UMS)
     • Around 50 HBase clusters
     • HBase 0.94 since 2013, HBase 1.2 since 2016
     • Internal repo with ZSTD, CCSMAP, Bucket cache, etc.
  4. Agenda
     • Omid: transaction layer for NoSQL database
     • Sparrow: Omid made scalable
     • Argus: database observer framework
     • Ixia: near-realtime HBase indexing
  6. NoSQL Embracing Transactions
     • SQL: relational, transactional, expressive
     • NoSQL: simple, fast, scalable
  10. Apache Omid at Pinterest
      • Omid (Optimistically transaction Management In Datastores)
      • Transaction framework on top of KV stores with HBase support
      • Open-sourced by Yahoo! in 2016
      • Powers next generation of Ads indexing at Pinterest
      • Pros: simple, reasonable performance, HA, pluggable backend with native HBase support
      • Cons: no SQL interface, limited isolation levels, requires MVCC support
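  11. Omid Architecture
      [Diagram. Components: Client, Transaction Manager (TM), Data tables, Commit table. Client and TM: begin/commit requests, timestamp/commit status replies. Client and Data tables: read/write. Client and Commit table: check commit. TM and Commit table: persist commit.]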
  12. Omid internals
      • Leverages Multi-Version Concurrency Control (MVCC) support in HBase
      • Transaction ID (begin timestamp) in version, commit timestamp in shadow cell
      • OCC (optimistic concurrency control): lock-free implementation with central conflict detection mechanism
      [Figure: Omid data and commit table]
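
The ideas on slide 12 can be illustrated with a small in-memory sketch. It is not Omid's code: the class and method names below are hypothetical, plain dictionaries stand in for HBase tables, and the shadow-cell optimization is left out, so readers always consult the commit table. It only shows how a begin timestamp used as the cell version, a commit table, and a central conflict check might combine to give snapshot reads without locks.

```python
import itertools


class TransactionManager:
    """Toy stand-in for a TM: timestamp allocation plus central conflict detection."""

    def __init__(self):
        self._clock = itertools.count(1)  # monotonically increasing timestamps
        self.last_commit = {}             # row key -> commit ts of the last committed writer
        self.commit_table = {}            # begin ts -> commit ts

    def begin(self):
        """Hand out a begin timestamp, which doubles as the transaction ID."""
        return next(self._clock)

    def commit(self, begin_ts, write_set):
        """Return a commit timestamp, or None if the transaction must abort."""
        # Conflict: some row we wrote was committed by another writer after we began.
        if any(self.last_commit.get(row, 0) > begin_ts for row in write_set):
            return None
        commit_ts = next(self._clock)
        for row in write_set:
            self.last_commit[row] = commit_ts
        self.commit_table[begin_ts] = commit_ts
        return commit_ts


class DataTable:
    """Toy multi-versioned table: each cell version is keyed by the writer's begin ts."""

    def __init__(self):
        self.cells = {}  # row key -> {begin ts: value}

    def put(self, row, begin_ts, value):
        self.cells.setdefault(row, {})[begin_ts] = value

    def get(self, row, reader_begin_ts, commit_table):
        """Snapshot read: newest version whose writer committed before the reader began."""
        visible = [
            (commit_table[ts], value)
            for ts, value in self.cells.get(row, {}).items()
            if ts in commit_table and commit_table[ts] < reader_begin_ts
        ]
        return max(visible, key=lambda pair: pair[0])[1] if visible else None
```

In this sketch, two transactions that write the same row concurrently both store their versions without blocking; the first to reach `commit` wins, the second gets `None`, and its version stays invisible because it never appears in the commit table.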
  16. Omid Scalability Problem
      [Diagram. Same components and flows as the Omid Architecture slide (11), annotated with two bottlenecks in the TM:]
      • Centralized batch commit to HBase
      • Single-threaded request/reply processor for serializability
  19. Sparrow Architecture
      [Diagram. Same components as the Omid Architecture slide (11): Client, Transaction Manager (TM), Data tables, Commit table, with two changes:]
      • Distributed client-side commit: "persist commit" moves from the TM to the Client
      • Parallel request processing in the TM replaces the single-threaded request/reply processor
  20. Sparrow: Omid made scalable
      [Diagram. The Sparrow architecture with distributed client-side commit and parallel conflict detection; one "persist commit" path is labeled "Performance bottleneck".]
  21. Sparrow techniques
      • Client-side commit
        • Client writes to commit table when there are no conflicts
        • Explicitly mark aborted txn in commit table (-1)
        • Reader may back off and abort concurrent writer in case of client failure or network partition
        • Avoid performance bottleneck on TM
      • Parallel request processing
        • Multi-threaded request processor with in-memory conflict map
        • beginTx no longer needs to wait until whole commit batch is written to HBase
        • Timestamp allocation still needs to be synchronized (with negligible overhead)
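
The client-side commit technique on slide 21 can be sketched as a race on a single commit-table row. This is an illustration under stated assumptions, not Sparrow's implementation: `CommitTable`, `client_commit`, and `reader_resolve` are hypothetical names, a locked dictionary stands in for an atomic put-if-absent on HBase, and the `writer_seems_stuck` flag stands in for whatever back-off or timeout policy a reader would apply. Only the `-1` abort marker comes from the slide.

```python
import threading

ABORTED = -1  # marker for an explicitly aborted transaction (value taken from the slide)


class CommitTable:
    """Toy commit table: begin ts -> commit ts, or ABORTED."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def put_if_absent(self, begin_ts, value):
        """Atomically record an outcome unless one exists; return the outcome that stands."""
        with self._lock:
            return self._rows.setdefault(begin_ts, value)

    def get(self, begin_ts):
        with self._lock:
            return self._rows.get(begin_ts)


def client_commit(commit_table, begin_ts, commit_ts):
    """Writer side: after the TM reports no conflict, the client persists the commit itself.

    Returns False if a reader already marked this transaction as aborted.
    """
    return commit_table.put_if_absent(begin_ts, commit_ts) == commit_ts


def reader_resolve(commit_table, writer_begin_ts, writer_seems_stuck):
    """Reader side: return the writer's commit ts, or None if its version is not visible."""
    outcome = commit_table.get(writer_begin_ts)
    if outcome is None and writer_seems_stuck:
        # Try to abort the stalled writer. If the writer committed first, its commit ts stands.
        outcome = commit_table.put_if_absent(writer_begin_ts, ABORTED)
    if outcome is None or outcome == ABORTED:
        return None
    return outcome
```

Because both sides use the same put-if-absent on the same row, exactly one outcome stands: either the writer's commit timestamp or the reader's abort marker, which is what lets the TM stay out of the commit-table write path.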
  22. Sparrow vs. Omid
      • beginTx P99: ~100X reduction
      • commitTx P99: ~3X reduction
  24. Argus: Motivation and Problem Statement
      • Clients request a real-time notification feature similar to a database trigger
      • Incremental processing based on database changes
      • Notifications cannot be missed: "at least once" delivery
      • Notification events could have different priorities and object types
  27. Kafka-based Notification Pipeline
      Percolator (Google):
      • Special notification column
      • Observer threads periodically scan for changes
      • Heavy-weight distributed scan and locking
      Argus:
      • Async notification by tailing HBase WAL
      • Kafka for replayable DB change stream
      • Support different priorities and types
      • Lightweight, minimal impact on DB
  28. Argus Architecture
      [Diagram. Components: Client, HBase, Replication proxy, Argus Observer. Labels: annotated requests (Client to HBase), WAL (HBase to Replication proxy), notification events via Kafka (Replication proxy to Argus Observer), read/write.]
      • HBase annotation: extra metadata in HBase requests to be passed down into WAL
      • Replication Proxy: "fake" regionservers with only replication RPC implemented
  30. Argus Observers
      • Process notification events in parallel with user-defined handlers
      • Event dispatching, filtering, collapse, etc.
      • Notification handlers can be chained
      Use case on Ads indexing: batch processing (15 mins) -> incremental indexing (several seconds)
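
The observer behavior listed on the Argus Observers slide (dispatching, filtering, collapse, chained handlers, parallel processing) can be sketched roughly as follows. Everything here is hypothetical: the `Event` fields, the `Observer` API, and the rule that a lower priority number is more urgent are assumptions made for illustration, and a plain list of events stands in for the Kafka stream. Offset handling, which at-least-once delivery would depend on, is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    row_key: str
    object_type: str
    priority: int


def collapse(events):
    """Keep only the latest event per (row key, object type)."""
    latest = {}
    for event in events:
        latest[(event.row_key, event.object_type)] = event
    return list(latest.values())


def chain(*handlers):
    """Compose handlers; a handler returns the event to pass on, or None to drop it."""
    def run(event):
        for handler in handlers:
            event = handler(event)
            if event is None:
                return None
        return event
    return run


class Observer:
    def __init__(self, workers=4):
        self._handlers = {}      # object type -> chained handler
        self._workers = workers

    def register(self, object_type, *handlers):
        self._handlers[object_type] = chain(*handlers)

    def process(self, events):
        """Collapse a batch, dispatch by object type, and run the handlers in parallel."""
        batch = [e for e in collapse(events) if e.object_type in self._handlers]
        batch.sort(key=lambda e: e.priority)  # lower number = more urgent (assumption)
        with ThreadPoolExecutor(max_workers=self._workers) as pool:
            results = pool.map(lambda e: self._handlers[e.object_type](e), batch)
            return [r for r in results if r is not None]
```

A filter is just a handler that returns `None`, so filtering and chaining share one mechanism; an indexing handler would sit at the end of such a chain.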
  32. Ixia: Motivation
      • Clients ask for secondary indexing support in HBase
      • Analytics queries on HBase columns (filtering, range, aggregation)
      • Why not SQL?
        • Index build could take a long time
        • Lack of horizontal scalability and tuning expertise
  33. Ixia: Near-realtime Indexing with HBase + Muse (In-house Search Engine)
      • Inspired by Lily indexer (HBase + Solr)
      • Secondary indexes in Muse (written in C++, fast in-memory inverted/forward index)
      • Source-of-truth data in HBase
      • Index built asynchronously with HBase WAL through Kafka
      • Ixia query engine: Thrift-based query service with a SQL-like interface
  34. Ixia: Architecture
      [Diagram. Components: Client, HBase, Replication proxy, Indexer, Muse, Query engine, Index manager. Labels: write, WAL, index events (Kafka), index docs, query, search query, DB retrieval, index schema.]
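
The two paths in the Ixia architecture, asynchronous index building and query-time "search query, then DB retrieval", can be sketched with in-memory stand-ins. None of this is Ixia or Muse code: `InvertedIndex`, `run_indexer`, and `query` are hypothetical names, a dictionary plays the role of the HBase table, and a list of `(row_key, columns)` tuples plays the role of the index events arriving through Kafka.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy stand-in for the search engine: (column, value) -> set of row keys."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._indexed = {}  # row key -> {column: value} as currently indexed

    def apply(self, row_key, columns):
        """Index the latest state of a row; columns=None means the row was deleted."""
        for column, value in self._indexed.pop(row_key, {}).items():
            self._postings[(column, value)].discard(row_key)
        if columns is not None:
            for column, value in columns.items():
                self._postings[(column, value)].add(row_key)
            self._indexed[row_key] = dict(columns)

    def search(self, column, value):
        return set(self._postings.get((column, value), ()))


def run_indexer(index_events, index, indexed_columns):
    """Consume (row_key, columns) change events, indexing only the schema's columns."""
    for row_key, columns in index_events:
        if columns is not None:
            columns = {c: v for c, v in columns.items() if c in indexed_columns}
        index.apply(row_key, columns)


def query(index, table, column, value):
    """Search the index for row keys, then fetch the rows from the source-of-truth table."""
    hits = sorted(index.search(column, value))
    # The index is updated asynchronously, so a hit may point at a row that is already gone.
    return [(key, table[key]) for key in hits if key in table]
```

Because the indexer runs off the change stream and not in the write path, writes to the table are unaffected by indexing, at the cost that the index can briefly lag behind the table.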
  35. Ixia: Pros and Cons
      Pros:
      • Minimal impact on write path
      • Index and data stores scaled separately
      • Efficient indexing & retrieval
      Cons:
      • No strong index consistency
  36. Ixia: status and future work
      • Batch indexing in prod, reducing indexing time by ~15X
      • Query engine serving full dark traffic, reducing query latency by up to 100X
      • Future work:
        • Realtime indexing into production
        • SQL support
        • Dynamic index backfilling
  37. Thanks!
