HBASE @ FLIPBOARD
What is Flipboard?
Publications
People
Topics
Scale
100+ million users
250,000+ new users/day
15+ million magazines
Why HBase?
• Transferrable Hadoop operational expertise
• MapReduce!
• Write throughput
• Better elasticity than MySQL, our other primary data store
• Strong consistency
• Column-oriented, as opposed to simple K/V
What do we use it for?
• User-generated magazines, likes, comments
• Vanity metrics for those magazines: daily and all-time counters/HLLs (HyperLogLogs)
• Follow graph
• RSS feeds
• More and more every day…
Magazine Storage
• Stored in a single HBase table
• Magazines live in one column family (“magazine”)
• Articles in temporal order in another CF (“article”)
• Logically, everything shared is tagged with magazine ID (prefix compression helps here)
• Makes the calculation of everything a user shared efficient
User Magazines
Row key | magazine:<magazineid> | magazine:<magazineid> | magazine:<magazineid>
sha1(userid) | MagazineData (serialized JSON) | MagazineData (serialized JSON) | MagazineData (serialized JSON)
Magazine CF of Collection Table
Listing magazines a user has created is a single read
Data is stored as serialized JSON for language interoperability, but is parsed and serialized by plain old Java objects
User Magazines
Row key | [Reverse Unix TS]:[magazineid] | [Reverse Unix TS]:[magazineid]
sha1(userid) | Article Data (serialized JSON) | Article Data (serialized JSON)
Articles CF of Collection Table
Kept in temporal order so that most recently shared articles are first
Access patterns are usually newest first.
HBase Filters are used to slice wide rows.
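A minimal sketch of how such keys could be built, assuming the reverse timestamp is `Long.MAX_VALUE - ts` packed as 8 big-endian bytes and the row key is the raw SHA-1 digest. The slides do not state either encoding, and the helper names are hypothetical. Because HBase returns column qualifiers in ascending byte order, a larger timestamp yields a smaller qualifier and therefore comes back first.

```python
import hashlib
import struct

# Assumption: reversal base is Java's Long.MAX_VALUE; the slides only say "Reverse Unix TS".
MAX_LONG = 2**63 - 1


def row_key(user_id: str) -> bytes:
    """Row key from the slide: sha1(userid). Using the raw 20-byte digest is an assumption."""
    return hashlib.sha1(user_id.encode("utf-8")).digest()


def article_qualifier(shared_at: int, magazine_id: str) -> bytes:
    """Qualifier shaped like [Reverse Unix TS]:[magazineid] (byte encoding is assumed)."""
    reverse_ts = MAX_LONG - shared_at
    return struct.pack(">q", reverse_ts) + b":" + magazine_id.encode("utf-8")


def newest_first(cells: dict) -> list:
    """Mimic HBase's ascending byte-order return of qualifiers within a row."""
    return [cells[q] for q in sorted(cells)]


cells = {
    article_qualifier(1_000, "mag1"): "older article",
    article_qualifier(2_000, "mag1"): "newer article",
}
ordered = newest_first(cells)
```

With this layout a "latest N articles" read needs no sorting on the client: it reads the first N cells of the row.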
User Magazines
Row key | like:[userid] | reflip:[new article id] | comment:[timestamp][userid]
Article ID | JSON (who/when) | JSON (where it was reflipped) | JSON (comment/person)
Social Activity
One cell per like, since you can only do it once per user
Can be many comment and reflip cells by one user per article
Alternative orderings can be computed from Elasticsearch indexes
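The qualifier design is what enforces these rules, which a small in-memory sketch can show. `ArticleRow` is a hypothetical stand-in for one article's row, and the `:` separator inside the comment qualifier is an assumption, since the slide does not show one.

```python
class ArticleRow:
    """In-memory stand-in for one article's row (not an HBase client)."""

    def __init__(self):
        self.cells = {}

    def put(self, qualifier: str, value: dict) -> None:
        # Like an HBase put: writing the same qualifier again overwrites the cell.
        self.cells[qualifier] = value


def like(row: ArticleRow, user_id: str, when: int) -> None:
    # One qualifier per user, so a repeated like lands on the same cell.
    row.put(f"like:{user_id}", {"who": user_id, "when": when})


def comment(row: ArticleRow, user_id: str, timestamp: int, text: str) -> None:
    # The timestamp in the qualifier gives every comment its own cell.
    row.put(f"comment:{timestamp}:{user_id}", {"person": user_id, "comment": text})


row = ArticleRow()
like(row, "alice", 100)
like(row, "alice", 200)          # second like by the same user: still one cell
comment(row, "alice", 300, "nice")
comment(row, "alice", 301, "really nice")

like_cells = [q for q in row.cells if q.startswith("like:")]
comment_cells = [q for q in row.cells if q.startswith("comment:")]
```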
User Magazines
Row key | magazine:<magazineid> | contributor:<magazine>:<userid>
sha1(userid) | JSON metadata | JSON metadata
Multiple Contributors
Magazine CF contains magazines that user can share into.
Contributor CF contains user’s magazines that others are allowed to
share into.
User Magazines
Row key | <metric>:<day> | <metric>_count
magID | long count for day | all-time count
Per magazine metrics live in Stats CF
Atomic increments for counters, both a per day count and a total count:
Total Articles
Contributors
etc.
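The double increment can be sketched as follows. This is an in-memory illustration only: `StatsRow` is a hypothetical stand-in for a magazine's row, the ISO date format for `<day>` is an assumption, and a plain dictionary does not provide the server-side atomicity that HBase increments do.

```python
from collections import defaultdict
from datetime import date


class StatsRow:
    """In-memory stand-in for one magazine's row in the stats CF."""

    def __init__(self):
        self.cells = defaultdict(int)

    def increment(self, qualifier: str, amount: int = 1) -> int:
        # In HBase this would be a single atomic increment on the cell.
        self.cells[qualifier] += amount
        return self.cells[qualifier]


def record_metric(row: StatsRow, metric: str, day: date, amount: int = 1) -> None:
    """Bump both the per-day cell (<metric>:<day>) and the all-time cell (<metric>_count)."""
    row.increment(f"{metric}:{day.isoformat()}", amount)  # day format is assumed
    row.increment(f"{metric}_count", amount)


row = StatsRow()
record_metric(row, "article", date(2015, 5, 6))
record_metric(row, "article", date(2015, 5, 7))
record_metric(row, "article", date(2015, 5, 7))
```

Keeping the all-time total in its own cell means it can be read directly, without summing every daily cell.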
User Magazines
Row key | <unique>:<day> | <unique>_count
magID | HLL for individual day | Premerged HLL
Unique readers kept in each magazine’s row as a serialized HyperLogLog
Allows for merging unique data over day ranges or displaying all time
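Merging works because two HyperLogLogs with the same precision combine by taking the per-register maximum, which gives exactly the sketch of the union. The class below is a small textbook-style illustration of that property, not the implementation or serialization format used at Flipboard; the SHA-1 hash and precision `p=12` are assumptions.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog for illustration (no bias correction tables)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p                  # number of registers
        self.registers = bytearray(self.m)

    def add(self, item: str) -> None:
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                       # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        if other.p != self.p:
            raise ValueError("precision mismatch")
        merged = HyperLogLog(self.p)
        merged.registers = bytearray(
            max(a, b) for a, b in zip(self.registers, other.registers)
        )
        return merged

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)   # linear counting for small sets
        return raw


day1, day2, union = HyperLogLog(), HyperLogLog(), HyperLogLog()
for i in range(1000):
    day1.add(f"user-{i}")
    union.add(f"user-{i}")
for i in range(500, 1500):
    day2.add(f"user-{i}")
    union.add(f"user-{i}")

two_day = day1.merge(day2)   # readers seen on either day, overlap counted once
```

A range query over several days folds the daily sketches together with `merge`, while the premerged all-time cell avoids repeating that fold on every read.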
Social Graph
Row key | follow:userid | follower:userid | stats:<counter>
sha1(userid) | JSON person that I follow | JSON person that follows me | long count of followers/following
Stored in friends table, follow/follower/stats CFs; metadata in MySQL
Alternative indexes in Elasticsearch
HBase Table Access Patterns
• Tables optimized for application access patterns (“design for the questions, not the answers”)
• Fetching an individual magazine: collection table, magazine CF, [magazine ID] -> cell
• Fetching an individual article: article table, article:[article ID] cell
• Fetching an article’s stats: article table, article:stats cells
• Fetching a magazine’s articles: collection table, article CF, with a cell limit and a column qualifier prefix of the magazine ID
• Fetching a user’s magazines: collection table, magazine CF, [magazine ID] in the column qualifier (CQ)
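The "qualifier prefix plus cell limit" read can be pictured as a slice over one wide row. The sketch below simulates that on a dictionary; it is not HBase client code, and the `<magazineid>:<suffix>` qualifier layout is an assumption made for the example. In the stock HBase Java client, `ColumnPrefixFilter` and `ColumnPaginationFilter` cover similar ground.

```python
def slice_row(cells: dict, prefix: bytes, limit: int) -> list:
    """Return up to `limit` (qualifier, value) pairs whose qualifier starts with `prefix`.

    Qualifiers are visited in ascending byte order, as HBase stores them.
    """
    if limit <= 0:
        return []
    matched = []
    for qualifier in sorted(cells):
        if qualifier.startswith(prefix):
            matched.append((qualifier, cells[qualifier]))
            if len(matched) == limit:
                break
    return matched


# Hypothetical row holding articles from two magazines.
row = {
    b"mag1:0002": "second",
    b"mag1:0001": "first",
    b"mag2:0001": "other magazine",
    b"mag1:0003": "third",
}
page = slice_row(row, b"mag1:", 2)
```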
Client Stats
• Articles: sum(magazine stats:article_count for
each magazine)
• Magazines: count(collection:magazine) cells
• Followers: friends:stats:follower_count +
sum(magazine stats:subscriber_count for each
magazine)
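The formulas above amount to a few sums over cells that have already been read. A minimal sketch, assuming the counters arrive as plain dictionaries (the function and parameter names are hypothetical):

```python
def profile_stats(friends_stats: dict, magazine_cells: list, magazine_stats: dict) -> dict:
    """Combine per-user and per-magazine counters into the client-facing totals.

    friends_stats  : counters from the friends table, stats CF
    magazine_cells : magazine IDs found in the collection table, magazine CF
    magazine_stats : per-magazine counters keyed by magazine ID
    """
    articles = sum(
        magazine_stats.get(m, {}).get("article_count", 0) for m in magazine_cells
    )
    subscribers = sum(
        magazine_stats.get(m, {}).get("subscriber_count", 0) for m in magazine_cells
    )
    return {
        "articles": articles,
        "magazines": len(magazine_cells),
        "followers": friends_stats.get("follower_count", 0) + subscribers,
    }


stats = profile_stats(
    friends_stats={"follower_count": 10},
    magazine_cells=["m1", "m2"],
    magazine_stats={
        "m1": {"article_count": 5, "subscriber_count": 3},
        "m2": {"article_count": 7, "subscriber_count": 4},
    },
)
```

Missing counters default to zero, so a magazine whose stats have not been read still contributes to the magazine count.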
More Client Stats
• Summary stats use counters from the article
table; detailed stats (who liked the article?)
read the individual cells
• We can cache the feed of items, but the
stats/like state is calculated per user
• likes: article:stats:like_count
• reflips: article:stats:reflip_count
• comments: article:stats:comment_count
Even More Client Stats
AsyncHBase Usage
• Our fork adds column filters on wide rows; we’d like to get these upstream
• Stats requests require scatter/gather reads for several tables, sometimes over
multiple HBase clusters
• AsyncHBase requests are grouped into a single Deferred
• Most requests are a get on a single row, no multi row scans
• Most requests wait once until the results are returned or a deadline expires
• If data is returned late or HBase regions are not available, partial calculations are
allowed (we just display the stats we’ve got)
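The wait-once-with-deadline pattern can be sketched with Python's `asyncio` in place of AsyncHBase's `Deferred`. This is an analogy under that substitution, not the production code: `fake_get` is a hypothetical stand-in for a single-row get, and the delays are invented.

```python
import asyncio


async def fake_get(value, delay_s: float):
    """Stand-in for one HBase get that answers after `delay_s` seconds."""
    await asyncio.sleep(delay_s)
    return value


async def gather_with_deadline(requests: dict, deadline_s: float) -> dict:
    """Fire all requests at once, wait a single time, and keep whatever finished."""
    if not requests:
        return {}
    tasks = {name: asyncio.ensure_future(coro) for name, coro in requests.items()}
    done, pending = await asyncio.wait(tasks.values(), timeout=deadline_s)
    for task in pending:
        task.cancel()  # late answers are dropped; the stat is simply not shown
    results = {}
    for name, task in tasks.items():
        if task in done and not task.cancelled() and task.exception() is None:
            results[name] = task.result()
    return results


stats = asyncio.run(
    gather_with_deadline(
        {
            "likes": fake_get(3, 0.01),
            "reflips": fake_get(1, 0.01),
            "comments": fake_get(7, 5.0),  # simulates a slow or unavailable region
        },
        deadline_s=0.5,
    )
)
```

Requests that fail are treated the same way as requests that time out: they are left out of the result, and the caller renders the stats it has.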
Handling HBase Failures
• Most patterns are read-before-write, which causes early failure
• We can tolerate some data loss (atomic increments, vanity stats)
• Individual servers track inflight requests to HBase, slow puts and gets, and report
to Graphite
• Various levels of caching allow HBase recovery/region reassignment without end
users noticing
• Read Only mode - writes are stopped at the application layer
• Ability to switch to replica under duress
Current HBase Fleet
• 15 clusters
• ~100 tables
• ~250TB in HDFS
• ~250 RegionServers
• Busiest clusters: 100,000+ qps, 1000 regions
HBase Fleet, continued
• All in EC2 😳
• Nothing in VPC, yet
• Each cluster lives within an AZ
• 1 durable cluster doing cross-AZ HBase-level replication
• 1 cluster running Stargate (it works, but we’re not in love with it)
HBase History at Flipboard
Oldest current production instances launched in 2011
(This cluster is going away soon 😀)
HBase Version Distribution
• 0.90: dwindling, thankfully
• 0.94: Moved to Snappy; Stargate cluster for RSS storage; Python
writers, Go readers
• 0.96: First CDH5/Java 7/Ubuntu Precise clusters; magazines live in
one of these
• 0.98: pre-calculated user homefeeds, more
• 1.0, 1.1: Soon…
Which instance types?
• Started off with m1.xlarges for the 1.6TB of ephemeral (spinning)
disk; when we started using HBase, AWS didn’t have SSDs
• Moved to hi1.4xlarges (16 cores, 60GB RAM, 2x1TB local SSD)
• Moved to i2s (next-gen SSD instances, made for databases) as soon
as AWS let us launch them!
• ❤ i2s; some 2x, some 4x
AWS tips
• Use instance storage, not EBS
• Rely on HDFS to keep your bits replicated instead of using EBS
• Cross-AZ latency is minimal, but traffic is expensive!
• Push HDFS snapshots to S3 (we trigger this from Jenkins)
• If you jack up your network-related timeouts to handle AWS’ network
flakiness, your MTTR rises, so be careful…
• Upgrade often, you’ll get more sleep!
Clients
• Java, Scala: AsyncHBase, which we love. We added column filtering
for wide rows.
• Python + Go: protobufs over HTTP via Stargate, which works
• We use HAProxy everywhere, so we use that to load balance
requests to Stargate servers
What’s next for HBase at Flipboard?
• Moar HBase
• 1.0, now that CDH has it in 5.4.0; 1.1 when CDH gets it, hopefully soon…
• Region replicas (HBASE-10070) will help with use cases that can tolerate timeline consistency; 1.1
will have many improvements here
• Compaction throttling! (HBASE-8329)
• Java 8 + G1, Ubuntu Trusty, 3.13 kernel
• EC2 placement groups + VPC enhanced networking, once we’re in (no charge for these)
• HTrace (we use a little Zipkin, would love to get more HBase visibility)
• Multitenancy improvements in Apache HBase will help us put more customers on a cluster
Wish List $
HydraBase! (HBASE-12259)
Async client in Apache HBase so it keeps pace (HBASE-12684 is a
start!)
Native Go client for 0.96+
Thanks!
• Matt Blair (@mb on Flipboard, @mattyblair on Twitter)
• Jason Culverhouse (@jsonculverhouse on both)
• Sang Chi (@sangchi on Flipboard, @sandbreaker on Twitter)

More Related Content

What's hot

HBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
HBaseCon 2012 | Solbase - Kyungseog Oh, PhotobucketHBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
HBaseCon 2012 | Solbase - Kyungseog Oh, PhotobucketCloudera, Inc.
 
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBBenchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBAthiq Ahamed
 
Hadoop and Cassandra at Rackspace
Hadoop and Cassandra at RackspaceHadoop and Cassandra at Rackspace
Hadoop and Cassandra at RackspaceStu Hood
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...Cloudera, Inc.
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalabilityjbellis
 
Apache Drill (ver. 0.1, check ver. 0.2)
Apache Drill (ver. 0.1, check ver. 0.2)Apache Drill (ver. 0.1, check ver. 0.2)
Apache Drill (ver. 0.1, check ver. 0.2)Camuel Gilyadov
 
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetCloudera, Inc.
 
Digital Library Collection Management using HBase
Digital Library Collection Management using HBaseDigital Library Collection Management using HBase
Digital Library Collection Management using HBaseHBaseCon
 
HBaseCon 2015: HBase at Scale in an Online and High-Demand Environment
HBaseCon 2015: HBase at Scale in an Online and  High-Demand EnvironmentHBaseCon 2015: HBase at Scale in an Online and  High-Demand Environment
HBaseCon 2015: HBase at Scale in an Online and High-Demand EnvironmentHBaseCon
 
HBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on MesosHBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on MesosHBaseCon
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster Cloudera, Inc.
 
Partners in Crime: Cassandra Analytics and ETL with Hadoop
Partners in Crime: Cassandra Analytics and ETL with HadoopPartners in Crime: Cassandra Analytics and ETL with Hadoop
Partners in Crime: Cassandra Analytics and ETL with HadoopStu Hood
 
Introduction to AWS Big Data
Introduction to AWS Big Data Introduction to AWS Big Data
Introduction to AWS Big Data Omid Vahdaty
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesPhil Peace
 
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsightHBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsightHBaseCon
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseCloudera, Inc.
 
HBase: Extreme Makeover
HBase: Extreme MakeoverHBase: Extreme Makeover
HBase: Extreme MakeoverHBaseCon
 
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsightOptimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsightHBaseCon
 
HBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestHBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestCloudera, Inc.
 

What's hot (20)

HBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
HBaseCon 2012 | Solbase - Kyungseog Oh, PhotobucketHBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
HBaseCon 2012 | Solbase - Kyungseog Oh, Photobucket
 
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBBenchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
 
Hadoop and Cassandra at Rackspace
Hadoop and Cassandra at RackspaceHadoop and Cassandra at Rackspace
Hadoop and Cassandra at Rackspace
 
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
HBaseCon 2012 | HBase Coprocessors – Deploy Shared Functionality Directly on ...
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalability
 
Apache Drill (ver. 0.1, check ver. 0.2)
Apache Drill (ver. 0.1, check ver. 0.2)Apache Drill (ver. 0.1, check ver. 0.2)
Apache Drill (ver. 0.1, check ver. 0.2)
 
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
 
Digital Library Collection Management using HBase
Digital Library Collection Management using HBaseDigital Library Collection Management using HBase
Digital Library Collection Management using HBase
 
HBaseCon 2015: HBase at Scale in an Online and High-Demand Environment
HBaseCon 2015: HBase at Scale in an Online and  High-Demand EnvironmentHBaseCon 2015: HBase at Scale in an Online and  High-Demand Environment
HBaseCon 2015: HBase at Scale in an Online and High-Demand Environment
 
HBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on MesosHBaseCon 2015: Elastic HBase on Mesos
HBaseCon 2015: Elastic HBase on Mesos
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
 
Partners in Crime: Cassandra Analytics and ETL with Hadoop
Partners in Crime: Cassandra Analytics and ETL with HadoopPartners in Crime: Cassandra Analytics and ETL with Hadoop
Partners in Crime: Cassandra Analytics and ETL with Hadoop
 
Introduction to AWS Big Data
Introduction to AWS Big Data Introduction to AWS Big Data
Introduction to AWS Big Data
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsightHBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
 
HBase: Extreme Makeover
HBase: Extreme MakeoverHBase: Extreme Makeover
HBase: Extreme Makeover
 
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsightOptimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
 
HBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestHBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at Pinterest
 

Viewers also liked

Flipboard presentation
Flipboard presentationFlipboard presentation
Flipboard presentationbolu_alextaiwo
 
Contently Pitch Deck
Contently Pitch DeckContently Pitch Deck
Contently Pitch DeckRyan Gum
 
Pendo Series B Investor Deck External
Pendo Series B Investor Deck ExternalPendo Series B Investor Deck External
Pendo Series B Investor Deck ExternalTodd Olson
 
Tinder Pitch Deck
Tinder Pitch DeckTinder Pitch Deck
Tinder Pitch DeckRyan Gum
 
Airbnb Pitch Deck From 2008
Airbnb Pitch Deck From 2008Airbnb Pitch Deck From 2008
Airbnb Pitch Deck From 2008Ryan Gum
 
Intercom's first pitch deck!
Intercom's first pitch deck!Intercom's first pitch deck!
Intercom's first pitch deck!Eoghan McCabe
 
How Wealthsimple raised $2M in 2 weeks
How Wealthsimple raised $2M in 2 weeksHow Wealthsimple raised $2M in 2 weeks
How Wealthsimple raised $2M in 2 weeksWealthsimple
 
AdPushup Fundraising Deck - First Pitch
AdPushup Fundraising Deck - First PitchAdPushup Fundraising Deck - First Pitch
AdPushup Fundraising Deck - First Pitchadpushup
 
Zenpayroll Pitch Deck Template
Zenpayroll Pitch Deck TemplateZenpayroll Pitch Deck Template
Zenpayroll Pitch Deck TemplateJoseph Hsieh
 
The deck we used to raise $270k for our startup Castle
The deck we used to raise $270k for our startup CastleThe deck we used to raise $270k for our startup Castle
The deck we used to raise $270k for our startup Castleentercastle
 
AppVirality.com - Investor Pitch Deck
AppVirality.com - Investor Pitch DeckAppVirality.com - Investor Pitch Deck
AppVirality.com - Investor Pitch DeckLaxman Papineni
 
The 10 most interesting slides that helped our SaaS company raise 9 million
The 10 most interesting slides that helped our SaaS company raise 9 millionThe 10 most interesting slides that helped our SaaS company raise 9 million
The 10 most interesting slides that helped our SaaS company raise 9 millionGoCanvas
 
Swipes pitch deck for Beta Pitch 2013 Finals in Berlin
Swipes pitch deck for Beta Pitch 2013 Finals in BerlinSwipes pitch deck for Beta Pitch 2013 Finals in Berlin
Swipes pitch deck for Beta Pitch 2013 Finals in BerlinSwipes App
 
500’s Demo Day Batch 16 >> Podozi
500’s Demo Day Batch 16 >>  Podozi500’s Demo Day Batch 16 >>  Podozi
500’s Demo Day Batch 16 >> Podozi500 Startups
 
Fittr Pitch Deck
Fittr Pitch DeckFittr Pitch Deck
Fittr Pitch Decknolanperk
 
Mattermark 2nd (Final) Series A Deck
Mattermark 2nd (Final) Series A DeckMattermark 2nd (Final) Series A Deck
Mattermark 2nd (Final) Series A DeckDanielle Morrill
 

Viewers also liked (20)

Flipboard presentation
Flipboard presentationFlipboard presentation
Flipboard presentation
 
BuzzFeed Pitch Deck
BuzzFeed Pitch DeckBuzzFeed Pitch Deck
BuzzFeed Pitch Deck
 
Contently Pitch Deck
Contently Pitch DeckContently Pitch Deck
Contently Pitch Deck
 
Pendo Series B Investor Deck External
Pendo Series B Investor Deck ExternalPendo Series B Investor Deck External
Pendo Series B Investor Deck External
 
Tinder Pitch Deck
Tinder Pitch DeckTinder Pitch Deck
Tinder Pitch Deck
 
Airbnb Pitch Deck From 2008
Airbnb Pitch Deck From 2008Airbnb Pitch Deck From 2008
Airbnb Pitch Deck From 2008
 
Intercom's first pitch deck!
Intercom's first pitch deck!Intercom's first pitch deck!
Intercom's first pitch deck!
 
Front series A deck
Front series A deckFront series A deck
Front series A deck
 
How Wealthsimple raised $2M in 2 weeks
How Wealthsimple raised $2M in 2 weeksHow Wealthsimple raised $2M in 2 weeks
How Wealthsimple raised $2M in 2 weeks
 
AdPushup Fundraising Deck - First Pitch
AdPushup Fundraising Deck - First PitchAdPushup Fundraising Deck - First Pitch
AdPushup Fundraising Deck - First Pitch
 
Zenpayroll Pitch Deck Template
Zenpayroll Pitch Deck TemplateZenpayroll Pitch Deck Template
Zenpayroll Pitch Deck Template
 
The deck we used to raise $270k for our startup Castle
The deck we used to raise $270k for our startup CastleThe deck we used to raise $270k for our startup Castle
The deck we used to raise $270k for our startup Castle
 
SteadyBudget's Seed Funding Pitch Deck
SteadyBudget's Seed Funding Pitch DeckSteadyBudget's Seed Funding Pitch Deck
SteadyBudget's Seed Funding Pitch Deck
 
AppVirality.com - Investor Pitch Deck
AppVirality.com - Investor Pitch DeckAppVirality.com - Investor Pitch Deck
AppVirality.com - Investor Pitch Deck
 
The 10 most interesting slides that helped our SaaS company raise 9 million
The 10 most interesting slides that helped our SaaS company raise 9 millionThe 10 most interesting slides that helped our SaaS company raise 9 million
The 10 most interesting slides that helped our SaaS company raise 9 million
 
Swipes pitch deck for Beta Pitch 2013 Finals in Berlin
Swipes pitch deck for Beta Pitch 2013 Finals in BerlinSwipes pitch deck for Beta Pitch 2013 Finals in Berlin
Swipes pitch deck for Beta Pitch 2013 Finals in Berlin
 
500’s Demo Day Batch 16 >> Podozi
500’s Demo Day Batch 16 >>  Podozi500’s Demo Day Batch 16 >>  Podozi
500’s Demo Day Batch 16 >> Podozi
 
Square Pitch Deck
Square Pitch DeckSquare Pitch Deck
Square Pitch Deck
 
Fittr Pitch Deck
Fittr Pitch DeckFittr Pitch Deck
Fittr Pitch Deck
 
Mattermark 2nd (Final) Series A Deck
Mattermark 2nd (Final) Series A DeckMattermark 2nd (Final) Series A Deck
Mattermark 2nd (Final) Series A Deck
 

Similar to HBaseCon 2015- HBase @ Flipboard

HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practicelarsgeorge
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012Chris Huang
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars GeorgeJAX London
 
Introduction to Apache HBase
Introduction to Apache HBaseIntroduction to Apache HBase
Introduction to Apache HBaseGokuldas Pillai
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series DatabasePramit Choudhary
 
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"Inhacking
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars GeorgeJAX London
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics PlatformN Masahiro
 
Thug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen ZhangThug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen ZhangChen Zhang
 
Schema Design
Schema DesignSchema Design
Schema DesignQBurst
 
Hbasepreso 111116185419-phpapp02
Hbasepreso 111116185419-phpapp02Hbasepreso 111116185419-phpapp02
Hbasepreso 111116185419-phpapp02Gokuldas Pillai
 
Unit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptxUnit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptxBhavanaHotchandani
 

Similar to HBaseCon 2015- HBase @ Flipboard (20)

HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
 
Apache HBase Workshop
Apache HBase WorkshopApache HBase Workshop
Apache HBase Workshop
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
 
Introduction to Apache HBase
Introduction to Apache HBaseIntroduction to Apache HBase
Introduction to Apache HBase
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series Database
 
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
 
Valerii Moisieienko Apache hbase workshop
Valerii Moisieienko	Apache hbase workshopValerii Moisieienko	Apache hbase workshop
Valerii Moisieienko Apache hbase workshop
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars George
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Apache hadoop hbase
Apache hadoop hbaseApache hadoop hbase
Apache hadoop hbase
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Thug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen ZhangThug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen Zhang
 
Schema Design
Schema DesignSchema Design
Schema Design
 
HBase ArcheTypes
HBase ArcheTypesHBase ArcheTypes
HBase ArcheTypes
 
Hbasepreso 111116185419-phpapp02
Hbasepreso 111116185419-phpapp02Hbasepreso 111116185419-phpapp02
Hbasepreso 111116185419-phpapp02
 
Unit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptxUnit II Hadoop Ecosystem_Updated.pptx
Unit II Hadoop Ecosystem_Updated.pptx
 

Recently uploaded

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
HBaseCon 2015- HBase @ Flipboard

  • 7. Scale 100+ million users 250,000+ new users/day 15+ million magazines
  • 8. Why HBase? • Transferrable Hadoop operational expertise • MapReduce! • Write throughput • Better elasticity than MySQL, our other primary data store • Strong consistency • Column-oriented, as opposed to simple K/V
  • 9. What do we use it for? • User-generated magazines, likes, comments • Vanity metrics for those magazines- daily + all-time counters/HLLs • Follow graph • RSS feeds • More and more every day…
  • 10. Magazine Storage • Stored in a single HBase table • Magazines live in one column family (“magazine”) • Articles in temporal order in another CF (“article”) • Logically, everything shared is tagged with magazine ID (prefix compression helps here) • Makes the calculation of everything a user shared efficient
  • 11. User Magazines magazine:<magazineid> magazine:<magazineid> magazine:<magazineid> sha1(userid) MagazineData (serialized JSON) MagazineData (serialized JSON) MagazineData (serialized JSON) Magazine CF of Collection Table Listing magazines a user has created is a single read Data is stored in serialized JSON for language interoperability but is parsed and serialized by plain old java objects
  • 12. User Magazines [Reverse Unix TS]: [magazineid] [Reverse Unix TS]: [magazineid] sha1(userid) Article Data(Serialized JSON) Article Data(Serialized JSON) Articles CF of Collection Table Kept in temporal order so that most recently shared articles are first Access patterns are usually newest first. HBase Filters are used to slice wide rows.
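The reverse-timestamp qualifier on slide 12 can be illustrated with a small sketch. HBase sorts column qualifiers as raw bytes, so storing `MAX - timestamp` in big-endian form makes the newest article sort first. The constant `MAX_TS`, the 8-byte encoding, the `:` separator, and the helper names below are assumptions for illustration; the slides do not say how Flipboard encodes the value.

```python
import hashlib
import struct

# Assumed upper bound for the reversal; the slides do not name the constant used.
MAX_TS = 2**63 - 1


def row_key(user_id: str) -> bytes:
    """Row key of the form sha1(userid), as shown on the slide."""
    return hashlib.sha1(user_id.encode("utf-8")).digest()


def article_qualifier(unix_ts: int, magazine_id: str) -> bytes:
    """Qualifier of the form [reverse unix ts]:[magazineid] (hypothetical encoding)."""
    reverse_ts = MAX_TS - unix_ts
    # Big-endian bytes of a non-negative number compare in numeric order.
    return struct.pack(">q", reverse_ts) + b":" + magazine_id.encode("utf-8")


def newest_first(qualifiers, limit):
    """Mimic a wide-row slice: byte-wise sort, then take the first `limit` cells."""
    return sorted(qualifiers)[:limit]
```

With this layout a "most recent N articles" read is a slice from the start of the row, with no sorting needed on the client.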
  • 13. User Magazines like:[userid] reflip:[new article id] comment:[timestamp] [userid] Article ID JSON (who/when) JSON (where it was reflipped) JSON (comment/person) Social Activity One cell per like, since you can only do it once per user Can be many comment and reflip cells by one user per article Alternative orderings can be computed from Elasticsearch indexes
  • 14. User Magazines magazine:<magazineid> contributor:<magazine>: <userid> sha1(userid) JSON metadata JSON metadata Multiple Contributors Magazine CF contains magazines that user can share into. Contributor CF contains user’s magazines that others are allowed to share into.
  • 15. User Magazines <metric>:<day> <metric>_count magID long count for day alltime count Per magazine metrics live in Stats CF Atomic increments for counters, both a per day count and a total count: Total Articles Contributors etc.
  • 16. User Magazines <unique>:<day> <unique>_count magID HLL for individual day Premerged HLL Unique readers kept in each magazine’s row as a serialized HyperLogLog Allows for merging unique data over day ranges or displaying all time
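To show why per-day HyperLogLogs can be merged over a day range (slide 16), here is a minimal, self-contained HyperLogLog sketch. It is not the library or serialization format Flipboard uses, which the slides do not specify; the precision `p`, the SHA-1 based hash, and the class layout are all assumptions. The key property is that merging is a register-wise maximum, so the merge of two days equals the sketch of the combined readers.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**p one-byte registers (illustrative only)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = bytearray(self.m)

    def add(self, item: str) -> None:
        # Take 64 bits of a SHA-1 digest as the hash value.
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                # first p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank: position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        if other.p != self.p:
            raise ValueError("precision mismatch")
        merged = HyperLogLog(self.p)
        merged.registers = bytearray(
            max(a, b) for a, b in zip(self.registers, other.registers)
        )
        return merged

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            estimate = self.m * math.log(self.m / zeros)
        return estimate
```

A day-range query would merge the stored per-day sketches, while the premerged all-time sketch avoids repeating that work on every read.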
  • 17. Social Graph follow:userid follower:userid stats:<counter> sha1(userid) JSON person that I follow JSON person that follows me long count of followers/ following Stored in friends table, follow/follower/stats CFs; metadata in MySQL Alternative indexes in Elasticsearch
  • 18. HBase Table Access Patterns • Tables optimized for application access patterns (“design for the questions, not the answers”) • Fetching an individual magazine- collection table, magazine CF, [magazine ID] -> cell • Fetching an individual article - article table, article:[article ID] cell • Fetch an article’s stats - article table, article:stats cells • Fetching a magazine’s articles: collection table, article CF, with cell limit and column qualifier starts with magazine id • Fetching a user’s magazines- collection table, magazine CF, [magazine ID] in the CQ
  • 19. Client Stats • Articles: sum(magazine stats:article_count for each magazine) • Magazines: count(collection:magazine) cells • Followers: friends:stats:follower_count + sum(magazine stats:subscriber_count for each magazine)
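The formulas on slide 19 amount to a few sums over values already read from HBase. The sketch below assumes those values have been fetched and decoded into plain Python data; the function name, argument names, and dictionary keys are hypothetical and only mirror the counter names on the slide.

```python
def user_stats(magazine_stats, magazine_cell_count, follower_count):
    """Combine per-magazine counters into the three profile numbers.

    magazine_stats: one dict per magazine, holding counters from its Stats CF.
    magazine_cell_count: number of cells in the user's magazine CF.
    follower_count: value of friends:stats:follower_count.
    """
    articles = sum(m.get("article_count", 0) for m in magazine_stats)
    subscribers = sum(m.get("subscriber_count", 0) for m in magazine_stats)
    return {
        "articles": articles,
        "magazines": magazine_cell_count,
        "followers": follower_count + subscribers,
    }
```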
  • 20. More Client Stats • Summary stats use counters from the article table, detailed stats (who liked the article?) read cells • We can cache the feed of items, but the stats/like state is calculated per user • likes: article:stats:like_count • reflips: article:stats:reflip_count • comments: article:stats:comment_count
  • 22. AsyncHBase Usage • Our fork adds column filters on wide rows; we’d like to get these upstream • Stats requests require scatter/gather reads for several tables, sometimes over multiple HBase clusters • AsyncHBase requests are grouped into a single Deferred • Most requests are a get on a single row, no multi-row scans • Most requests wait once until the results are returned or a deadline expires • If data is returned late or HBase regions are not available, partial calculations are allowed (we just display the stats we’ve got)
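The "wait once, then accept partial results" pattern on slide 22 can be sketched without HBase at all. The example below uses Python's `asyncio` rather than AsyncHBase's Java `Deferred`, so it shows the pattern only, not Flipboard's code; `gather_with_deadline`, `fake_get`, and the delays are hypothetical stand-ins for real reads.

```python
import asyncio


async def gather_with_deadline(requests, deadline_s):
    """Run all reads concurrently, wait once, and keep whatever finished in time.

    requests: non-empty dict mapping a stat name to a coroutine.
    """
    tasks = {name: asyncio.create_task(coro) for name, coro in requests.items()}
    done, pending = await asyncio.wait(list(tasks.values()), timeout=deadline_s)
    for task in pending:
        task.cancel()  # late reads are dropped, not awaited
    results = {}
    for name, task in tasks.items():
        if task in done and not task.cancelled() and task.exception() is None:
            results[name] = task.result()
    return results


async def fake_get(value, delay_s):
    """Stand-in for a single-row get that takes delay_s seconds."""
    await asyncio.sleep(delay_s)
    return value


def demo():
    async def run():
        return await gather_with_deadline(
            {
                "like_count": fake_get(12, 0.01),
                "reflip_count": fake_get(3, 0.01),
                "comment_count": fake_get(7, 5.0),  # simulates a slow region
            },
            deadline_s=0.2,
        )

    return asyncio.run(run())
```

In `demo()` the slow read misses the deadline, so the caller receives only the two counters that arrived and can display those.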
  • 23. Handling HBase Failures • Most patterns are read before write which causes early failure • We can tolerate some data loss (atomic increments, vanity stats) • Individual servers track inflight requests to HBase, slow puts and gets, and report to Graphite • Various levels of caching allow HBase recovery/region reassignment without end users noticing • Read Only mode - writes are stopped at the application layer • Ability to switch to replica under duress
  • 24. Current HBase Fleet • 15 clusters • ~100 tables • ~250TB in HDFS • ~250 RegionServers • Busiest clusters: 100,000+ qps, 1000 regions
  • 25. HBase Fleet, continued • All in EC2 😳 • Nothing in VPC, yet • Each cluster lives within an AZ • 1 durable cluster doing cross-AZ HBase-level replication • 1 cluster running Stargate (it works, but we’re not in love with it)
  • 26. HBase History at Flipboard Oldest current production instances launched in 2011
  • 27. (This cluster is going away soon 😀)
  • 28. HBase Version Distribution • 0.90: dwindling, thankfully • 0.94: Moved to Snappy; Stargate cluster for RSS storage; Python writers, Go readers • 0.96: First CDH5/Java 7/Ubuntu Precise clusters; magazines live in one of these • 0.98: pre-calculated user homefeeds, more • 1.0, 1.1: Soon…
  • 29. Which instance types? • Started off with m1.xlarges for the 1.6TB of ephemeral (spinning) disk; when we started using HBase, AWS didn’t have SSDs • Moved to hi1.4xlarges (16 cores, 60GB RAM, 2x1TB local SSD) • Moved to i2s (next-gen SSD instances, made for databases) as soon as AWS let us launch them! • ❤ i2s; some 2x, some 4x
  • 30. AWS tips • Use instance storage, not EBS • Rely on HDFS to keep your bits replicated instead of using EBS • Cross-AZ latency is minimal, but traffic is expensive! • Push HDFS snapshots to S3 (we trigger this from Jenkins) • If you jack up your network-related timeouts to handle AWS’ network flakiness, your MTTR rises, so be careful… • Upgrade often, you’ll get more sleep!
  • 31. Clients • Java, Scala- AsyncHBase, which we love. Added column filtering for wide rows. • Python + Go: protobufs over HTTP via Stargate, which works • We use HAProxy everywhere, so we use that to load balance requests to Stargate servers
  • 32. What’s next for HBase at Flipboard? • Moar HBase • 1.0, now that CDH has it in 5.4.0; 1.1 when CDH gets it, hopefully soon… • Region replicas (HBASE-10070) will help with use cases that can tolerate timeline consistency; 1.1 will have many improvements here • Compaction throttling! (HBASE-8329) • Java 8 + G1, Ubuntu Trusty, 3.13 kernel • EC2 placement groups + VPC enhanced networking, once we’re in (no charge for these) • HTrace (we use a little Zipkin, would love to get more HBase visibility) • Multitenancy improvements in Apache HBase will help us put more customers on a cluster
  • 33. Wish List • HydraBase! (HBASE-12259) • Async client in Apache HBase so it keeps pace (HBASE-12684 is a start!) • Native Go client for 0.96+
  • 34. Thanks! • Matt Blair (@mb on Flipboard, @mattyblair on Twitter) • Jason Culverhouse (@jsonculverhouse on both) • Sang Chi (@sangchi on Flipboard, @sandbreaker on Twitter)