Apache HBase @Pinterest
Scaling our “Feed” storage
Varun Sharma
Software Engineer
June 13, 2013
What is Pinterest?
An online pinboard where you “curate” and “discover” things you love and go do them in real life
Discovery - “Follow” Model
[Diagram: “Follower” follows “Followee” in the follow interest graph]
• Follower indicates interest in Followee’s content
• Following feed - content collated from followees
“Following” Feed @Pinterest
[Diagram: a new pin fans out to Follower 1, Follower 2, Follower 3, ...]
Challenges @scale
• 100s of millions of pins/repins per month
• High fanout - billions of writes per day (High throughput)
• Billions of requests per month (Low latency and high availability)
“Following” Feed on HBase
Row key: UserId=678 | Columns: CreationTs=100,PinId=8 → <Empty> | CreationTs=99,PinId=6 → <Empty> | ...
• Pins in following feed reverse chronologically sorted
• HBase schema - choose wide over tall (see the sketch below)
- Exploit lexicographic sorting within columns for ordering
- Atomic transactions per user (inconsistencies get noticed at scale)
- Row level bloom filters to eliminate unnecessary seeks
- Prefix compression (FAST_DIFF)
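A minimal sketch (not from the deck) of the wide-row schema with the 0.94-era Java client. The table name "following_feed", the family "p", and the inverted-timestamp qualifier encoding are assumptions; the slides only show "CreationTs,PinId" qualifiers with empty values, and inverting the timestamp is one common way to make lexicographic column order come out reverse chronological.

```java
// Wide-row feed schema sketch: one row per user, one column per feed entry.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class FeedSchemaSketch {
  static final byte[] FAMILY = Bytes.toBytes("p");   // assumed family name

  // Qualifier = inverted creation ts + pin id, so newer pins sort first.
  static byte[] qualifier(long creationTs, long pinId) {
    return Bytes.add(Bytes.toBytes(Long.MAX_VALUE - creationTs), Bytes.toBytes(pinId));
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable feed = new HTable(conf, "following_feed");   // assumed table name
    long userId = 678, pinId = 8, creationTs = 100;
    Put put = new Put(Bytes.toBytes(userId));            // row key = user id
    put.add(FAMILY, qualifier(creationTs, pinId), new byte[0]);  // <Empty> value
    feed.put(put);   // all of a user's feed entries live in one atomic row
    feed.close();
  }
}
```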
Write Path
[Diagram: Frontend → async task enqueue → Message Bus → task dequeue → Workers → Thrift + Finagle layer → HBase (Follow Store, Pin Store); tasks: Follow, Unfollow, New Pin]
• Follow => put
• Unfollow => delete
• New Pin => multi put
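A minimal sketch of the three write operations, assuming hypothetical table names ("follow_store", "following_feed"), the family "p", and the qualifier helper from the schema sketch above; this is not Pinterest's actual worker code.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class FeedWriters {
  static final byte[] F = Bytes.toBytes("p");   // assumed family name

  // Follow => single put into the follow store
  static void follow(HTable followStore, long follower, long followee) throws Exception {
    Put put = new Put(Bytes.toBytes(follower));
    put.add(F, Bytes.toBytes(followee), new byte[0]);
    followStore.put(put);
  }

  // Unfollow => delete from the follow store
  static void unfollow(HTable followStore, long follower, long followee) throws Exception {
    Delete del = new Delete(Bytes.toBytes(follower));
    del.deleteColumns(F, Bytes.toBytes(followee));
    followStore.delete(del);
  }

  // New pin => multi put: fan the pin out into every follower's feed row
  static void fanOutNewPin(HTable feed, List<Long> followers, byte[] pinQualifier) throws Exception {
    List<Put> puts = new ArrayList<Put>(followers.size());
    for (long followerId : followers) {
      Put put = new Put(Bytes.toBytes(followerId));
      put.add(F, pinQualifier, new byte[0]);
      puts.add(put);
    }
    feed.put(puts);   // batched puts, grouped per region server by the client
  }
}
```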
Optimizing Writes
• Increase per region memstore size
- 512M memstore -> 40M HFile
- Fewer HFiles and hence less frequent compactions
• GC tuning
- More frequent but smaller pauses
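A hedged sketch of the write-side tuning described above: raise the per-region memstore flush size to ~512 MB so each flush produces fewer, larger HFiles. The table/family names are illustrative assumptions, and the GC flags shown in the comment are generic CMS examples, not the exact flags used by Pinterest.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class WriteTuning {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Cluster-wide default (hbase-site.xml equivalent): 512 MB memstore flush size.
    conf.setLong("hbase.hregion.memstore.flush.size", 512L * 1024 * 1024);

    // Or set it per table when creating the feed table (names assumed).
    HTableDescriptor feed = new HTableDescriptor("following_feed");
    feed.setMemStoreFlushSize(512L * 1024 * 1024);
    feed.addFamily(new HColumnDescriptor("p"));

    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.createTable(feed);
    admin.close();

    // GC tuning lives in hbase-env.sh, not here. Illustrative flags only: CMS
    // with a small young gen trades throughput for more frequent, shorter pauses:
    //   -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmn256m -XX:CMSInitiatingOccupancyFraction=70
  }
}
```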
Read Path
[Diagram: Frontend → Thrift + Finagle layer → HBase; first retrieve PinId(s) from the following feed, then retrieve pin metadata from the Pin Metadata Store]
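A minimal sketch of the read path, assuming the tables and qualifier layout from the earlier schema sketch: fetch the newest N columns of the user's feed row, then batch-get the pin metadata for those pin ids.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class FeedReader {
  static final byte[] F = Bytes.toBytes("p");   // assumed family name

  static Result[] readFeed(HTable feed, HTable pinStore, long userId, int n) throws Exception {
    // 1. Retrieve PinId(s): one Get on the user's feed row.
    Result row = feed.get(new Get(Bytes.toBytes(userId)).addFamily(F));

    // Columns come back in lexicographic (i.e. reverse chronological, given the
    // assumed inverted-timestamp qualifiers) order, so the first n are the newest.
    List<Get> metadataGets = new ArrayList<Get>();
    KeyValue[] kvs = row.raw();
    for (int i = 0; i < kvs.length && i < n; i++) {
      byte[] qualifier = kvs[i].getQualifier();
      // Assumed layout: last 8 bytes of the qualifier are the pin id.
      long pinId = Bytes.toLong(qualifier, qualifier.length - Bytes.SIZEOF_LONG);
      metadataGets.add(new Get(Bytes.toBytes(pinId)));
    }

    // 2. Retrieve pin metadata: one batched multi-get against the pin store.
    return pinStore.get(metadataGets);
  }
}
```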
Optimizing Reads
Schema
• Prefix compression - FAST_DIFF - 4x size reduction
• Reduced block size - 16K
Cache
• More block cache (hot set/temporal locality)
• High cache hit rates
Other standard optimizations
• Short circuit local reads
• HBase checksums
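A hedged sketch of the read-side settings from this slide, applied to an assumed "p" column family with the 0.94-era API (the bloom-filter enum moved packages in later releases): FAST_DIFF encoding, 16K blocks, row-level blooms, block cache on. The short-circuit-read and checksum keys in the comment belong in the site configs, and their exact names vary slightly by version.

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.regionserver.StoreFile;

public class ReadTuning {
  static HColumnDescriptor feedFamily() {
    HColumnDescriptor p = new HColumnDescriptor("p");
    p.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF); // prefix compression, ~4x smaller
    p.setBlocksize(16 * 1024);                           // 16K blocks: cheaper point reads
    p.setBloomFilterType(StoreFile.BloomType.ROW);       // row blooms skip useless HFile seeks
    p.setBlockCacheEnabled(true);                        // keep the hot set in the block cache
    return p;
    // hbase-site.xml / hdfs-site.xml (key names vary slightly by version):
    //   dfs.client.read.shortcircuit = true        (short circuit local reads)
    //   hbase.regionserver.checksum.verify = true  (HBase-level checksums)
  }
}
```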
“Scale” Challenges
“Follow Unfollow” race
[Diagram: user A follows B at t1, then unfollows B at t2, with t1 < t2 (user order); the messages M(A→B follow) and M(A→B unfollow) are processed at t1’ and t2’ (msg queue order)]
• Lack of total ordering inside message queue
• Resolution - client side timestamps
• Example - use t1 and t2 as cell timestamps (see the sketch below)
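A minimal sketch of the client-timestamp resolution, assuming the follow edge lives at (row = follower, qualifier = followee) in a hypothetical "follow_store" table: the worker stamps each cell with the user-side time (t1/t2), so even if the unfollow message happens to be processed before the follow, the delete at t2 still masks the put at t1.

```java
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class FollowUnfollowRace {
  static final byte[] F = Bytes.toBytes("p");   // assumed family name

  static void applyFollow(HTable followStore, long a, long b, long t1) throws Exception {
    Put put = new Put(Bytes.toBytes(a));
    put.add(F, Bytes.toBytes(b), t1, new byte[0]);   // cell timestamp = t1 (client time)
    followStore.put(put);
  }

  static void applyUnfollow(HTable followStore, long a, long b, long t2) throws Exception {
    Delete del = new Delete(Bytes.toBytes(a));
    del.deleteColumns(F, Bytes.toBytes(b), t2);      // masks every version with ts <= t2
    followStore.delete(del);
  }
}
```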
Unbounded dataset growth
[Diagram, before: user1’s feed row keeps accumulating columns - user1,pin1  user1,pin2  user1,pin3 ... user1,pin4  user1,pin5  user1,pin6 ...]
• MapReduce / realtime trimming unfeasible
• Coprocessors - trim during compactions (see the sketch below)
[Diagram, after a trimming compaction: the same row holds far fewer columns - user1,pin1  user1,pin2  user1,pin4 ...]
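A sketch (not Pinterest's actual coprocessor) of trimming a feed row during compaction with a RegionObserver, written against the 0.94-era API where the compaction scanner yields KeyValue lists; later HBase versions change these signatures. The MAX_PINS_PER_USER cap is an illustrative assumption, as is the reverse-chronological qualifier encoding from the earlier schema sketch.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.Store;
import org.apache.hadoop.hbase.util.Bytes;

public class FeedTrimObserver extends BaseRegionObserver {
  private static final int MAX_PINS_PER_USER = 2000;   // assumed trim threshold

  // Overrides BaseRegionObserver.preCompact (0.94-era signature; later versions
  // add ScanType/CompactionRequest parameters). Cells dropped by the wrapper are
  // simply not rewritten into the new HFile, so old feed entries vanish as a
  // side effect of compaction.
  public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> c,
                                    Store store, final InternalScanner scanner) {
    return new InternalScanner() {
      private byte[] currentRow = null;
      private int cellsInRow = 0;

      private void trim(List<KeyValue> results) {
        Iterator<KeyValue> it = results.iterator();
        while (it.hasNext()) {
          KeyValue kv = it.next();
          if (currentRow == null || !Bytes.equals(currentRow, kv.getRow())) {
            currentRow = kv.getRow();   // new user row: reset the per-row counter
            cellsInRow = 0;
          }
          if (++cellsInRow > MAX_PINS_PER_USER) {
            it.remove();                // drop columns past the cap (the oldest ones,
          }                             // given reverse-chronological qualifiers)
        }
      }

      public boolean next(List<KeyValue> results) throws IOException {
        boolean more = scanner.next(results);
        trim(results);
        return more;
      }

      public boolean next(List<KeyValue> results, int limit) throws IOException {
        boolean more = scanner.next(results, limit);
        trim(results);
        return more;
      }

      // 0.94-era metric overloads (removed in later versions); delegate and trim.
      public boolean next(List<KeyValue> results, String metric) throws IOException {
        return next(results);
      }

      public boolean next(List<KeyValue> results, int limit, String metric) throws IOException {
        return next(results, limit);
      }

      public void close() throws IOException {
        scanner.close();
      }
    };
  }
}
```

Such an observer is typically attached either cluster-wide via hbase.coprocessor.region.classes or per table; the deck does not say which deployment Pinterest used.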
MTTR
[Diagram: failed status checks → 30s ZooKeeper session timeout (HBase Master) and 20s stale node timeout (HDFS NN)]
• MTTR < 2 minutes consistently (see the config sketch below)
HBase
• ZK session timeout 30 sec
HDFS
• Tight timeouts
- socket timeout < 5 sec
- connect timeout - 1 sec x 3
• Stale node timeout - 20 sec
• Stale nodes avoided during
- WAL reads
- Lease recovery
- Writing splits
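A hedged sketch of the MTTR-related timeouts from this slide, expressed as the equivalent hbase-site.xml / hdfs-site.xml keys set on a Configuration. Key names and defaults vary a bit across Hadoop/HBase versions; the values are the ones on the slide, and the mapping of the "avoid stale nodes" flags onto WAL reads, lease recovery, and split writes is an interpretation, not a quote.

```java
import org.apache.hadoop.conf.Configuration;

public class MttrTimeouts {
  static Configuration tighten(Configuration conf) {
    // HBase: detect a dead region server via a 30 sec ZK session timeout.
    conf.setInt("zookeeper.session.timeout", 30 * 1000);

    // HDFS client: tight socket and connect timeouts so reads fail over quickly.
    conf.setInt("dfs.socket.timeout", 5 * 1000);                  // a.k.a. dfs.client.socket-timeout
    conf.setInt("ipc.client.connect.timeout", 1 * 1000);          // 1 sec per attempt...
    conf.setInt("ipc.client.connect.max.retries.on.timeouts", 3); // ...x 3 attempts

    // HDFS NN: mark a datanode stale after 20 sec and steer reads/writes away
    // from it (WAL reads, lease recovery, writing splits).
    conf.setLong("dfs.namenode.stale.datanode.interval", 20 * 1000);
    conf.setBoolean("dfs.namenode.avoid.read.stale.datanode", true);
    conf.setBoolean("dfs.namenode.avoid.write.stale.datanode", true);
    return conf;
  }
}
```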
Single Points of Failure
[Diagram: Frontend → Message Queue → dual writes to Cluster 1 and Cluster 2, plus cross cluster replication and a ZK quorum; reads are served from one cluster]
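A hedged sketch of the dual-write idea on the async write path: a worker applies the same mutation to two independent clusters. The quorum addresses are placeholders, the "following_feed" table name is assumed, and the slide does not specify how ZooKeeper is shared between the clusters or how failed writes are retried; retrying off the message queue is one plausible policy, not a stated one.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

public class DualWriter {
  private final HTable cluster1Feed;
  private final HTable cluster2Feed;

  public DualWriter() throws Exception {
    Configuration c1 = HBaseConfiguration.create();
    c1.set("hbase.zookeeper.quorum", "zk-cluster1.example.com");   // placeholder
    Configuration c2 = HBaseConfiguration.create();
    c2.set("hbase.zookeeper.quorum", "zk-cluster2.example.com");   // placeholder
    cluster1Feed = new HTable(c1, "following_feed");
    cluster2Feed = new HTable(c2, "following_feed");
  }

  // Apply the same put to both clusters; if either write fails, the enqueued
  // task can be retried, since the mutation is idempotent under a fixed timestamp.
  public void write(Put put) throws Exception {
    cluster1Feed.put(put);
    cluster2Feed.put(put);
  }
}
```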
Single Points of Failure (contd)
[Diagram: HDFS NN storage options on EC2 - ephemeral disk vs. EBS]
• No concept of HA shared storage on EC2
• Keep it simple
- HA namenode + QJM - hell, no!
- Operate two clusters, each in its own AZ
Am I Better Off?
Redis vs HBase
• Sharding, load balancing and fault tolerance
• Longer feeds
• Resolve data inconsistencies
• Savings in $$
Cluster configuration
• hi1.4xlarge - SSD backed for performance parity
• HBase - 0.94.3 and 0.94.7
• HDFS - CDH 4.2.0
And many more...
• Rich pins
• Duplicate pin notifications
• Pinterest analytics
• Recommendations - “People who pinned this also pinned”
More to come...