Recent work on HBase at Pinterest
Lianghong Xu
Pinterest Software Engineer, Tech Lead
Introduction
HBase at Pinterest
• Backend for many critical services
• Graph database (Zen)
• Generic KV store (UMS)
• Around 50 HBase clusters
• HBase 0.94 since 2013, HBase 1.2 since 2016
• Internal repo with ZSTD, CCSMAP, Bucket cache, etc.
Agenda
• Omid: transaction layer for NoSQL database
• Sparrow: Omid made scalable
• Argus: database observer framework
• Ixia: near-realtime HBase indexing
NoSQL Embracing Transactions
SQL: Relational, Transactional, Expressive
NoSQL: Simple, Fast, Scalable
Apache Omid at Pinterest
• Omid (Optimistically transaction Management In Datastores)
• Transaction framework on top of KV stores with HBase support
• Open-sourced by Yahoo! in 2016
• Powers next generation of Ads indexing at Pinterest
• Pros: simple, reasonable performance, HA, pluggable backend with native HBase support
• Cons: No SQL interface, limited isolation levels, requires MVCC support
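Omid Architecture
[Diagram: Client, Transaction Manager (TM), data tables, and commit table. The client sends begin/commit requests to the TM and receives a timestamp or commit status. The client reads and writes the data tables and checks commit status in the commit table. The TM persists commits to the commit table.]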
Omid internals
• Leverages Multi-version Concurrency Control (MVCC) support in HBase
• Transaction ID (begin timestamp) stored as the cell version, commit timestamp stored in a shadow cell
• Optimistic concurrency control (OCC): lock-free implementation with a central conflict detection mechanism
[Figure: Omid data and commit table]
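The bullets above can be illustrated with a small in-memory sketch. It is not Omid's code: the class `ToyTransactionManager`, the helper `is_visible`, and the single integer clock are assumptions made for illustration. It shows central write-write conflict detection keyed on begin timestamps, and a commit table that maps a begin timestamp to its commit timestamp.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ToyTransactionManager:
    """Hypothetical in-memory stand-in for a central transaction manager."""

    clock: int = 0                                              # toy timestamp oracle
    last_commit: Dict[str, int] = field(default_factory=dict)   # row -> latest commit ts
    commit_table: Dict[int, int] = field(default_factory=dict)  # begin ts -> commit ts

    def begin(self) -> int:
        """Hand out a begin timestamp, which doubles as the transaction ID."""
        self.clock += 1
        return self.clock

    def commit(self, start_ts: int, write_set: Set[str]) -> Optional[int]:
        """Return a commit timestamp, or None if the transaction must abort."""
        # Conflict: some row in the write set was committed after we began.
        for row in write_set:
            if self.last_commit.get(row, 0) > start_ts:
                return None
        self.clock += 1
        commit_ts = self.clock
        for row in write_set:
            self.last_commit[row] = commit_ts
        self.commit_table[start_ts] = commit_ts
        return commit_ts


def is_visible(tm: ToyTransactionManager, cell_version: int, reader_start_ts: int) -> bool:
    """A cell is visible if its writer committed before the reader began."""
    commit_ts = tm.commit_table.get(cell_version)
    return commit_ts is not None and commit_ts < reader_start_ts
```

No locks are taken on data rows: a transaction writes freely and learns at commit time whether it lost a conflict. In a real deployment the shadow cell would let readers skip the commit table lookup for most cells; the sketch leaves that optimization out.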
Omid Scalability Problem
[Diagram: the Omid components (Client, Transaction Manager, data tables, commit table), annotated with two bottlenecks on the TM side.]
• Centralized batch commit to HBase
• Single-threaded request/reply processor for serializability
Sparrow Architecture
[Diagram: Client, Transaction Manager (TM), data tables, and commit table. The client sends begin/commit requests to the TM and receives a timestamp or commit status. The client reads and writes the data tables, checks commit status in the commit table, and persists commits to the commit table itself.]
• Distributed client-side commit
• Parallel request processing, in place of the single-threaded request/reply processor
Sparrow: Omid made scalable
[Diagram: the same components, with these annotations: distributed client-side commit, parallel conflict detection, and a "persist commit" path labeled as the performance bottleneck.]
Sparrow techniques
• Client-side commit
  • Client writes to the commit table when there are no conflicts
  • Aborted transactions are explicitly marked in the commit table (-1)
  • A reader may back off and abort a concurrent writer in case of client failure or network partition
  • Avoids the performance bottleneck on the TM
• Parallel request processing
  • Multi-threaded request processor with an in-memory conflict map
  • beginTx no longer needs to wait until the whole commit batch is written to HBase
  • Timestamp allocation still needs to be synchronized (with negligible overhead)
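The client-side commit bullets can be sketched as a race on a single commit-table row. This is a toy model, not Sparrow's implementation: `ToyCommitTable`, `put_if_absent`, `client_commit`, and `reader_resolve` are hypothetical names, and the atomic put-if-absent is an assumption standing in for whatever atomic check-and-write primitive the real store provides. Only the -1 abort marker comes from the slides.

```python
import threading
from typing import Dict, Optional

ABORTED = -1  # from the slides: aborted transactions are marked with -1


class ToyCommitTable:
    """In-memory stand-in for the commit table (hypothetical API)."""

    def __init__(self) -> None:
        self._rows: Dict[int, int] = {}
        self._lock = threading.Lock()

    def put_if_absent(self, start_ts: int, value: int) -> int:
        """Atomically store value unless one exists; return the stored value."""
        with self._lock:
            return self._rows.setdefault(start_ts, value)

    def get(self, start_ts: int) -> Optional[int]:
        with self._lock:
            return self._rows.get(start_ts)


def client_commit(table: ToyCommitTable, start_ts: int, commit_ts: int) -> bool:
    """Writer side: persist our own commit record after the TM reports no conflict.

    Returns False if a reader already marked this transaction as aborted.
    """
    return table.put_if_absent(start_ts, commit_ts) == commit_ts


def reader_resolve(table: ToyCommitTable, writer_start_ts: int) -> Optional[int]:
    """Reader side: after backing off, try to abort a writer with no commit record.

    Returns the writer's commit timestamp if it committed first, else None.
    """
    outcome = table.put_if_absent(writer_start_ts, ABORTED)
    return None if outcome == ABORTED else outcome
```

Whichever side writes the row first wins, so a writer that crashed or was partitioned away after conflict detection cannot leave readers blocked indefinitely. The back-off delay before `reader_resolve` is omitted here.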
Sparrow vs. Omid
beginTx P99 latency: ~100X reduction
commitTx P99 latency: ~3X reduction
Argus: Motivation and Problem Statement
• Clients request a real-time notification feature similar to a database trigger
• Incremental processing based on database changes
• Notifications cannot be missed: "at least once" delivery
• Notification events could have different priorities and object types
Kafka-based Notification Pipeline
Percolator (Google)
• Special notification column
• Observer threads periodically scan for changes
• Heavy-weight distributed scan and locking
Argus
• Async notification by tailing HBase WAL
• Kafka for replayable DB change stream
• Support different priorities and types
• Lightweight, minimal impact on DB
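Argus Architecture
[Diagram: Client, HBase, Replication proxy, Argus Observer. The client sends annotated requests to HBase; the HBase WAL flows to the replication proxy; notification events flow through Kafka to the Argus observer; a read/write path to HBase is also shown.]
• HBase annotation: extra metadata in HBase requests, passed down into the WAL
• Replication proxy: "fake" regionservers with only the replication RPC implemented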
Argus Observers
• Process notification events in parallel with user-defined handlers
• Event dispatching, filtering, collapse, etc.
• Notification Handlers can be chained
Use case on Ads indexing:
Batch processing (15 mins) -> incremental indexing (several seconds)
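The observer behavior listed above (dispatching, filtering, collapse, chained handlers) can be sketched as follows. The `Event` fields, the `Handler` type, and the functions `collapse`, `chain`, and `dispatch` are all hypothetical names chosen for illustration, not the Argus API. The sketch runs sequentially, whereas the slides describe parallel processing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    row: str
    object_type: str
    priority: int
    seq: int  # position in the change stream


# A handler returns the (possibly transformed) event, or None to drop it.
Handler = Callable[[Event], Optional[Event]]


def collapse(events: Iterable[Event]) -> List[Event]:
    """Keep only the latest event per (row, object_type), highest priority first."""
    latest: Dict[Tuple[str, str], Event] = {}
    for ev in events:
        key = (ev.row, ev.object_type)
        if key not in latest or ev.seq > latest[key].seq:
            latest[key] = ev
    return sorted(latest.values(), key=lambda e: (-e.priority, e.seq))


def chain(*handlers: Handler) -> Handler:
    """Compose handlers; the chain stops as soon as one of them drops the event."""
    def run(ev: Event) -> Optional[Event]:
        current: Optional[Event] = ev
        for handler in handlers:
            current = handler(current)
            if current is None:
                return None
        return current
    return run


def dispatch(events: Iterable[Event], routes: Dict[str, Handler]) -> List[Event]:
    """Route collapsed events to the handler registered for their object type."""
    processed = []
    for ev in collapse(events):
        handler = routes.get(ev.object_type)
        if handler is None:
            continue  # no observer registered for this type
        result = handler(ev)
        if result is not None:
            processed.append(result)
    return processed
```

Because delivery is at least once, handlers in a setup like this would need to tolerate seeing the same event more than once.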
Ixia: Motivation
• Clients ask for secondary indexing support in HBase
• Analytics queries on HBase columns (filtering, range, aggregation)
• Why not SQL?
  • Index build could take a long time
  • Lack of horizontal scalability and tuning expertise
Ixia: Near-realtime Indexing with HBase + Muse (In-house Search Engine)
• Inspired by Lily indexer (HBase + Solr)
• Secondary indexes in Muse (written in C++, fast in-memory inverted/forward index)
• Source-of-truth data in HBase
• Index built asynchronously with HBase WAL through Kafka
• Ixia query engine: Thrift-based query service with a SQL-like interface
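Ixia: Architecture
[Diagram: Client, HBase, Replication proxy, Indexer, Muse, Query engine, Index manager. Write path: the client writes to HBase; the HBase WAL flows to the replication proxy; index events flow through Kafka to the indexer, which writes index docs to Muse. Query path: the client sends a query to the query engine, which issues a search query to Muse and performs DB retrieval from HBase. The index manager holds the index schema.]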
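The query path in the architecture (search the index, then retrieve from the database) can be sketched with in-memory stand-ins. `ToyInvertedIndex` and `query` are hypothetical, and a plain dict plays the role of HBase. Rechecking the predicate against the source-of-truth row is an assumption of this sketch, added because the index is built asynchronously and may lag behind the data.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]


class ToyInvertedIndex:
    """Stand-in for the search engine: maps (column, value) to row keys."""

    def __init__(self) -> None:
        self._postings: Dict[Tuple[str, object], Set[str]] = {}

    def index(self, row_key: str, row: Row) -> None:
        """Add postings for every column of the row (stale postings are not removed)."""
        for column, value in row.items():
            self._postings.setdefault((column, value), set()).add(row_key)

    def search(self, column: str, value: object) -> List[str]:
        return sorted(self._postings.get((column, value), set()))


def query(index: ToyInvertedIndex, store: Dict[str, Row], column: str, value: object) -> List[Row]:
    """Two-step lookup: candidate keys from the index, rows from the data store."""
    rows = []
    for key in index.search(column, value):
        row = store.get(key)
        # The index may be stale, so verify against the source of truth.
        if row is not None and row.get(column) == value:
            rows.append(row)
    return rows
```

The recheck filters out rows that changed or disappeared after they were indexed, but it cannot surface rows the index has not caught up with yet.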
Ixia: Pros and Cons
Pros
• Minimal impact on write path
• Index and data stores scaled separately
• Efficient indexing & retrieval
Cons
• No strong index consistency
Ixia: status and future work
• Batch indexing in prod, reducing indexing time by ~15X
• Query engine serving full dark traffic, reducing query latency by up to 100X
• Future work:
  • Realtime indexing into production
  • SQL support
  • Dynamic index backfilling
Thanks!

HBaseCon Asia 2019: Recent work on HBase at Pinterest
