SlideShare a Scribd company logo
HBase In
Production
Hey we’re hiring!
Contents
● Bronto Overview
● HBase Architecture
● Operations
● Table Design
● Questions?
Bronto Overview
Bronto Software provides a cloud-based
marketing platform for organizations to drive
revenue through their email, mobile and social
campaigns
Bronto Contd.
● ESP for E-Commerce retailers
● Our customers are marketers
● Charts, graphs, reports
● Market segmentation
● Automation
● We are also hiring
Where We Use HBase
● High volume scenarios
● Realtime data
● Batch processing
● HDFS staging area
● Sorting/Indexing not a priority
○ We are working on this
HBase Overview
● Implementation of Google’s BigTable
● Sparse, sorted, versioned map
● Built on top of HDFS
● Row level ACID
● Get, Put, Scan
● Assorted RMW operations
Tables Overview
Tables are sorted (lexicographically) key value
pairs of uninterpreted byte[]s. Keyspace is
divided up into regions of keys. Each region is
hosted by exactly one machine.
R3R1
Server 1
Key Value
a byte[]
aa byte[]
b byte[]
bb byte[]
c byte[]
ca byte[]
R1: [a, b)
R2: [b, c)
R3: [c, d)
R2
Server 1
Table Overview
Operations
● Layers of complexity
● Normal failure modes
○ Hardware dies (or combust)
○ Human error
● JVM
● HDFS considerations
● Lots of knobs
Cascading Failure
1. High write volume fragments heap
2. GC promotion failure
3. Stop the world GC
4. ZK timeout
5. Receive YouAreDeadException, die
6. Failover
7. Goto 1
Useful Tunings
● MSLAB enabled
● hbase.regionserver.handler.count
○ Increasing puts more IO load on RS
○ 50 is our sweet spot
● JVM tuning
○ UseConcMarkSweepGC
○ UseParNewGC
Monitoring Tools
● Nagios for hardware checks
● Cloudera Manager
○ Reporting and health checks
○ Apache Ambari and MapR provide similar tools
● Hannibal + custom scripts
○ Identify hot regions for splitting
Table Design
● Table design is deceptively simple
● Main Considerations:
○ Row key structure
○ Number of column families
● Know your queries in advance
Additional Context
● SAAS environment
○ “Twitter clone” model won’t work
● Thousands of users millions, of attributes
● Skewed customer base
○ Biggest clients have 10MM+ contacts
○ Smallest have thousands
Row Keys
● Most important decision
● The only (native) index in HBase
● Random reads and writes are fast
○ Sorted on disk and in memory
○ Bloom filters speed read performance (not in use)
Hotspotting
● Associated with monotonically increasing
keys
○ MySql AUTO_INCREMENT
● Writes lock onto one region at a time
● Consequences:
○ Flush and compaction storms
○ $500K cluster limited by $10K machine
Row Key Advice
● Read/Write ratio should drive design
○ We pay a write time penalty for faster reads
● Identify queries you need to support
● Consider composite keys instead of indexes
● Bucketed/Salted keys are an option
○ Distribute writes across N buckets
○ Rebucketing is difficult
○ Requires N reads, slow workers
Variable Width Keys
customer_hash::email
● Allows scans for a single customer
● Hashed id distributes customers
● Sorted by email address
○ Could also use reverse domain for gmail, yahoo, etc.
Fixed Width Keys
site::contact::create::email
● FuzzyRowFilter
○ Can fix site, contact, and reverse_create
○ Can search for any email address
○ Could use a fixed width encoding for domain
■ Search for just gmail, yahoo, etc
● Distributes sites and users
● Contacts sorted by create date
Column Families
● Groupings of named columns
● Versioning, compression, TTL
● Different than BigTable
○ BigTable: 100s
○ HBase: 1 or 2
Column Family Example
Id d {VERSIONS => 2} s7 {TTL => 604800}
a (address) p (phone) o:3-27 (open) c:3-20 (click)
dfajkdh byte[] byte[]:555-5555 byte[]
hnvdzu9 byte[]:1234 St. XXXX
hnvdzu9 byte[]:1233 St.
hnvdzu9 XXXX byte[]
er9asyjk byte[]: 324 Ave
Column Family Example
● PROTIP: Keep CF and qualifier names short
○ They are repeated on disk for every cell
● “d” supports 2 versions of each column, maps to demographics
● “s7” has seven day TTL, maps to stats kept for 7 days.
MemStore
HDFS
s2s1 s3
f1
Column Families In Depth
MemStore
HDFS
s2s1
f2
my_table,,1328551097416.12921
bbc0c91869f88ba6a044a6a1c50.
● StoreFile(s) for each CF
in region
● Sparse
● One memstore per CF
○ Must flush together
● Compactions happen at
region level
(Region)
(family) (family)
Compactions
● Rewrites StoreFiles
○ Improves read performance
○ IO Intensive
● Region scope
● Used to take > 50 hours
● Custom script took it down to 18
○ Can (theoretically) run during the day
MemStore
HDFS
S1
f1
my_table,,
1328551097416.12921
bbc0c91869f88ba6a044a6a1c5
0.
(Region)
MemStore
HDFS
s
2
s
1 s3
f1
my_table,,1328551097416.12921
bbc0c91869f88ba6a044a6a1c50.
(Region)
Compaction Before and After
s4 s5 s6
Before After
K-Way
Merge
The Table From Hell
● 19 Column Families
● 60% of our region count
● Skewed write pattern
○ KB size store files
○ Frequent compaction storms
○ hbase.hstore.compaction.min.size (HBASE-5461)
● Moved to it’s own cluster
And yet...
● Cluster remained operational
○ Table is still in use today
● Met read and write demand
● Regions only briefly active
○ Rowkeys by date and customer
What saved us
● Keyed by customer and date
● Effectively write once
○ Kept “active” region count low
● Custom compaction script
○ Skipped old regions
● More hardware
● Were able to selectively migrate
Column Family Advice
● Bad choice for fine grained partitioning
● Good for
○ Similarly typed data
○ Varying versioning/retention requirements
● Prefer intra row scans
○ CF and qualifiers are sorted
○ ColumnRangeFilter
Questions?

More Related Content

What's hot

Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)
Eric Evans
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simple
Dori Waldman
 
HBase, dances on the elephant back.
HBase, dances on the elephant back.HBase, dances on the elephant back.
HBase, dances on the elephant back.
Roman Nikitchenko
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloud
Avinash Ramineni
 
Datomic rtree-pres
Datomic rtree-presDatomic rtree-pres
Datomic rtree-presjsofra
 
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GRGlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
Theophanis Kontogiannis
 
RubiX
RubiXRubiX
Davraz - A graph visualization and exploration software.
Davraz - A graph visualization and exploration software.Davraz - A graph visualization and exploration software.
Davraz - A graph visualization and exploration software.
TigerGraph
 
Sharding - patterns & antipatterns, Константин Осипов, Алексей Рыбак
Sharding -  patterns & antipatterns, Константин Осипов, Алексей РыбакSharding -  patterns & antipatterns, Константин Осипов, Алексей Рыбак
Sharding - patterns & antipatterns, Константин Осипов, Алексей Рыбак
Ontico
 
Hadoop at datasift
Hadoop at datasiftHadoop at datasift
Hadoop at datasift
Jairam Chandar
 
Hadoop at datasift
Hadoop at datasiftHadoop at datasift
Hadoop at datasift
Jairam Chandar
 
Alluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudAlluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the Cloud
Shubham Tagra
 
Tms training
Tms trainingTms training
Tms training
Chi Lee
 
A journey through cosmos - 5th el
A journey through cosmos - 5th elA journey through cosmos - 5th el
A journey through cosmos - 5th el
AvinashRamakanth
 
In-memory database
In-memory databaseIn-memory database
In-memory database
Chien Nguyen Dang
 
Falando de MySQL
Falando de MySQLFalando de MySQL
Falando de MySQL
Márcio Junior
 
Improve Presto Architectural Decisions with Shadow Cache
 Improve Presto Architectural Decisions with Shadow Cache Improve Presto Architectural Decisions with Shadow Cache
Improve Presto Architectural Decisions with Shadow Cache
Alluxio, Inc.
 
Redis Overview
Redis OverviewRedis Overview
Redis Overview
Hoang Long
 
Thrashing allocation frames.43
Thrashing allocation frames.43Thrashing allocation frames.43
Thrashing allocation frames.43myrajendra
 

What's hot (19)

Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simple
 
HBase, dances on the elephant back.
HBase, dances on the elephant back.HBase, dances on the elephant back.
HBase, dances on the elephant back.
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloud
 
Datomic rtree-pres
Datomic rtree-presDatomic rtree-pres
Datomic rtree-pres
 
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GRGlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
 
RubiX
RubiXRubiX
RubiX
 
Davraz - A graph visualization and exploration software.
Davraz - A graph visualization and exploration software.Davraz - A graph visualization and exploration software.
Davraz - A graph visualization and exploration software.
 
Sharding - patterns & antipatterns, Константин Осипов, Алексей Рыбак
Sharding -  patterns & antipatterns, Константин Осипов, Алексей РыбакSharding -  patterns & antipatterns, Константин Осипов, Алексей Рыбак
Sharding - patterns & antipatterns, Константин Осипов, Алексей Рыбак
 
Hadoop at datasift
Hadoop at datasiftHadoop at datasift
Hadoop at datasift
 
Hadoop at datasift
Hadoop at datasiftHadoop at datasift
Hadoop at datasift
 
Alluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudAlluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the Cloud
 
Tms training
Tms trainingTms training
Tms training
 
A journey through cosmos - 5th el
A journey through cosmos - 5th elA journey through cosmos - 5th el
A journey through cosmos - 5th el
 
In-memory database
In-memory databaseIn-memory database
In-memory database
 
Falando de MySQL
Falando de MySQLFalando de MySQL
Falando de MySQL
 
Improve Presto Architectural Decisions with Shadow Cache
 Improve Presto Architectural Decisions with Shadow Cache Improve Presto Architectural Decisions with Shadow Cache
Improve Presto Architectural Decisions with Shadow Cache
 
Redis Overview
Redis OverviewRedis Overview
Redis Overview
 
Thrashing allocation frames.43
Thrashing allocation frames.43Thrashing allocation frames.43
Thrashing allocation frames.43
 

Viewers also liked

HBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table designHBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table design
phanleson
 
Hbase at Salesforce.com
Hbase at Salesforce.comHbase at Salesforce.com
Hbase at Salesforce.com
Salesforce Engineering
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
Carol McDonald
 
Column-Stores vs. Row-Stores: How Different are they Really?
Column-Stores vs. Row-Stores: How Different are they Really?Column-Stores vs. Row-Stores: How Different are they Really?
Column-Stores vs. Row-Stores: How Different are they Really?
Daniel Abadi
 
HBase Sizing Guide
HBase Sizing GuideHBase Sizing Guide
HBase Sizing Guide
larsgeorge
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best Practices
Venu Anuganti
 
[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview
Stratio
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
DataWorks Summit/Hadoop Summit
 
Apache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseApache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAse
enissoz
 
Intro to column stores
Intro to column storesIntro to column stores
Intro to column stores
Justin Swanhart
 
Introduction to column oriented databases
Introduction to column oriented databasesIntroduction to column oriented databases
Introduction to column oriented databasesArangoDB Database
 
Apache Phoenix + Apache HBase
Apache Phoenix + Apache HBaseApache Phoenix + Apache HBase
Apache Phoenix + Apache HBase
DataWorks Summit/Hadoop Summit
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
Anil Gupta
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
HBaseCon
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
Nick Dimiduk
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
alexbaranau
 
Rise of Column Oriented Database
Rise of Column Oriented DatabaseRise of Column Oriented Database
Rise of Column Oriented Database
Suvradeep Rudra
 
Hbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databasesHbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databases
Luis Cipriani
 

Viewers also liked (20)

HBASE Overview
HBASE OverviewHBASE Overview
HBASE Overview
 
HBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table designHBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table design
 
Hbase at Salesforce.com
Hbase at Salesforce.comHbase at Salesforce.com
Hbase at Salesforce.com
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Column-Stores vs. Row-Stores: How Different are they Really?
Column-Stores vs. Row-Stores: How Different are they Really?Column-Stores vs. Row-Stores: How Different are they Really?
Column-Stores vs. Row-Stores: How Different are they Really?
 
HBase Sizing Guide
HBase Sizing GuideHBase Sizing Guide
HBase Sizing Guide
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best Practices
 
[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
 
Apache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseApache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAse
 
Intro to column stores
Intro to column storesIntro to column stores
Intro to column stores
 
Introduction to column oriented databases
Introduction to column oriented databasesIntroduction to column oriented databases
Introduction to column oriented databases
 
Apache Phoenix + Apache HBase
Apache Phoenix + Apache HBaseApache Phoenix + Apache HBase
Apache Phoenix + Apache HBase
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
 
Rise of Column Oriented Database
Rise of Column Oriented DatabaseRise of Column Oriented Database
Rise of Column Oriented Database
 
Hbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databasesHbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databases
 

Similar to TriHUG 3/14: HBase in Production

Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
ScyllaDB
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
ScyllaDB
 
Everything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBEverything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDB
jhugg
 
Understanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreUnderstanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStore
MariaDB plc
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Amazon Web Services
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
Amazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Amazon Web Services
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDB
Amazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Amazon Web Services
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
Application Caching: The Hidden Microservice
Application Caching: The Hidden MicroserviceApplication Caching: The Hidden Microservice
Application Caching: The Hidden Microservice
Scott Mansfield
 
Cassandra in production
Cassandra in productionCassandra in production
Cassandra in production
valstadsve
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
Amazon Web Services
 
Amazon Redshift
Amazon RedshiftAmazon Redshift
Amazon Redshift
Jeff Patti
 
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
Omid Vahdaty
 
Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20
Jelena Zanko
 
Introduction to NoSql
Introduction to NoSqlIntroduction to NoSql
Introduction to NoSql
Omid Vahdaty
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyGuillaume Lefranc
 
Amazon DynamoDB 深入探討
Amazon DynamoDB 深入探討Amazon DynamoDB 深入探討
Amazon DynamoDB 深入探討
Amazon Web Services
 
Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)
Scott Mansfield
 

Similar to TriHUG 3/14: HBase in Production (20)

Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Everything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBEverything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDB
 
Understanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreUnderstanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStore
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDB
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Application Caching: The Hidden Microservice
Application Caching: The Hidden MicroserviceApplication Caching: The Hidden Microservice
Application Caching: The Hidden Microservice
 
Cassandra in production
Cassandra in productionCassandra in production
Cassandra in production
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
Amazon Redshift
Amazon RedshiftAmazon Redshift
Amazon Redshift
 
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive AWS Big Data Demystified #2 |  Athena, Spectrum, Emr, Hive
AWS Big Data Demystified #2 | Athena, Spectrum, Emr, Hive
 
Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20
 
Introduction to NoSql
Introduction to NoSqlIntroduction to NoSql
Introduction to NoSql
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative study
 
Amazon DynamoDB 深入探討
Amazon DynamoDB 深入探討Amazon DynamoDB 深入探討
Amazon DynamoDB 深入探討
 
Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)
 

More from trihug

TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Ranger
trihug
 
TriHUG Feb: Hive on spark
TriHUG Feb: Hive on sparkTriHUG Feb: Hive on spark
TriHUG Feb: Hive on spark
trihug
 
TriHUG 2/14: Apache Sentry
TriHUG 2/14: Apache SentryTriHUG 2/14: Apache Sentry
TriHUG 2/14: Apache Sentry
trihug
 
TriHUG talk on Spark and Shark
TriHUG talk on Spark and SharkTriHUG talk on Spark and Shark
TriHUG talk on Spark and Shark
trihug
 
Impala presentation
Impala presentationImpala presentation
Impala presentation
trihug
 
Practical pig
Practical pigPractical pig
Practical pig
trihug
 
Financial services trihug
Financial services trihugFinancial services trihug
Financial services trihug
trihug
 
TriHUG January 2012 Talk by Chris Shain
TriHUG January 2012 Talk by Chris ShainTriHUG January 2012 Talk by Chris Shain
TriHUG January 2012 Talk by Chris Shain
trihug
 
TriHUG November HCatalog Talk by Alan Gates
TriHUG November HCatalog Talk by Alan GatesTriHUG November HCatalog Talk by Alan Gates
TriHUG November HCatalog Talk by Alan Gatestrihug
 
TriHUG November Pig Talk by Alan Gates
TriHUG November Pig Talk by Alan GatesTriHUG November Pig Talk by Alan Gates
TriHUG November Pig Talk by Alan Gatestrihug
 
MapR, Implications for Integration
MapR, Implications for IntegrationMapR, Implications for Integration
MapR, Implications for Integration
trihug
 

More from trihug (11)

TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Ranger
 
TriHUG Feb: Hive on spark
TriHUG Feb: Hive on sparkTriHUG Feb: Hive on spark
TriHUG Feb: Hive on spark
 
TriHUG 2/14: Apache Sentry
TriHUG 2/14: Apache SentryTriHUG 2/14: Apache Sentry
TriHUG 2/14: Apache Sentry
 
TriHUG talk on Spark and Shark
TriHUG talk on Spark and SharkTriHUG talk on Spark and Shark
TriHUG talk on Spark and Shark
 
Impala presentation
Impala presentationImpala presentation
Impala presentation
 
Practical pig
Practical pigPractical pig
Practical pig
 
Financial services trihug
Financial services trihugFinancial services trihug
Financial services trihug
 
TriHUG January 2012 Talk by Chris Shain
TriHUG January 2012 Talk by Chris ShainTriHUG January 2012 Talk by Chris Shain
TriHUG January 2012 Talk by Chris Shain
 
TriHUG November HCatalog Talk by Alan Gates
TriHUG November HCatalog Talk by Alan GatesTriHUG November HCatalog Talk by Alan Gates
TriHUG November HCatalog Talk by Alan Gates
 
TriHUG November Pig Talk by Alan Gates
TriHUG November Pig Talk by Alan GatesTriHUG November Pig Talk by Alan Gates
TriHUG November Pig Talk by Alan Gates
 
MapR, Implications for Integration
MapR, Implications for IntegrationMapR, Implications for Integration
MapR, Implications for Integration
 

Recently uploaded

1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
3ipehhoa
 
How to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptxHow to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptx
Gal Baras
 
Comptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guideComptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guide
GTProductions1
 
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC
 
guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...
Rogerio Filho
 
test test test test testtest test testtest test testtest test testtest test ...
test test  test test testtest test testtest test testtest test testtest test ...test test  test test testtest test testtest test testtest test testtest test ...
test test test test testtest test testtest test testtest test testtest test ...
Arif0071
 
1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...
JeyaPerumal1
 
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesMulti-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Sanjeev Rampal
 
BASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptxBASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptx
natyesu
 
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shopHistory+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
laozhuseo02
 
Latest trends in computer networking.pptx
Latest trends in computer networking.pptxLatest trends in computer networking.pptx
Latest trends in computer networking.pptx
JungkooksNonexistent
 
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
ufdana
 
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdfJAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
Javier Lasa
 
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptxBridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Brad Spiegel Macon GA
 
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
eutxy
 
The+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptxThe+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptx
laozhuseo02
 
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
keoku
 
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
3ipehhoa
 
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
3ipehhoa
 
This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!
nirahealhty
 

Recently uploaded (20)

1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
 
How to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptxHow to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptx
 
Comptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guideComptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guide
 
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
 
guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...
 
test test test test testtest test testtest test testtest test testtest test ...
test test  test test testtest test testtest test testtest test testtest test ...test test  test test testtest test testtest test testtest test testtest test ...
test test test test testtest test testtest test testtest test testtest test ...
 
1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...
 
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesMulti-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
 
BASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptxBASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptx
 
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shopHistory+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
 
Latest trends in computer networking.pptx
Latest trends in computer networking.pptxLatest trends in computer networking.pptx
Latest trends in computer networking.pptx
 
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
 
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdfJAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
 
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptxBridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
 
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
 
The+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptxThe+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptx
 
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
 
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
 
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
 
This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!
 

TriHUG 3/14: HBase in Production

  • 2. Contents ● Bronto Overview ● HBase Architecture ● Operations ● Table Design ● Questions?
  • 3. Bronto Overview Bronto Software provides a cloud-based marketing platform for organizations to drive revenue through their email, mobile and social campaigns
  • 4. Bronto Contd. ● ESP for E-Commerce retailers ● Our customers are marketers ● Charts, graphs, reports ● Market segmentation ● Automation ● We are also hiring
  • 5. Where We Use HBase ● High volume scenarios ● Realtime data ● Batch processing ● HDFS staging area ● Sorting/Indexing not a priority ○ We are working on this
  • 6. HBase Overview ● Implementation of Google’s BigTable ● Sparse, sorted, versioned map ● Built on top of HDFS ● Row level ACID ● Get, Put, Scan ● Assorted RMW operations
  • 7. Tables Overview Tables are sorted (lexicographically) key value pairs of uninterpreted byte[]s. Keyspace is divided up into regions of keys. Each region is hosted by exactly one machine.
  • 8. R3R1 Server 1 Key Value a byte[] aa byte[] b byte[] bb byte[] c byte[] ca byte[] R1: [a, b) R2: [b, c) R3: [c, d) R2 Server 1 Table Overview
  • 9. Operations ● Layers of complexity ● Normal failure modes ○ Hardware dies (or combust) ○ Human error ● JVM ● HDFS considerations ● Lots of knobs
  • 10. Cascading Failure 1. High write volume fragments heap 2. GC promotion failure 3. Stop the world GC 4. ZK timeout 5. Receive YouAreDeadException, die 6. Failover 7. Goto 1
  • 11. Useful Tunings ● MSLAB enabled ● hbase.regionserver.handler.count ○ Increasing puts more IO load on RS ○ 50 is our sweet spot ● JVM tuning ○ UseConcMarkSweepGC ○ UseParNewGC
  • 12. Monitoring Tools ● Nagios for hardware checks ● Cloudera Manager ○ Reporting and health checks ○ Apache Ambari and MapR provide similar tools ● Hannibal + custom scripts ○ Identify hot regions for splitting
  • 13. Table Design ● Table design is deceptively simple ● Main Considerations: ○ Row key structure ○ Number of column families ● Know your queries in advance
  • 14. Additional Context ● SAAS environment ○ “Twitter clone” model won’t work ● Thousands of users millions, of attributes ● Skewed customer base ○ Biggest clients have 10MM+ contacts ○ Smallest have thousands
  • 15. Row Keys ● Most important decision ● The only (native) index in HBase ● Random reads and writes are fast ○ Sorted on disk and in memory ○ Bloom filters speed read performance (not in use)
  • 16. Hotspotting ● Associated with monotonically increasing keys ○ MySql AUTO_INCREMENT ● Writes lock onto one region at a time ● Consequences: ○ Flush and compaction storms ○ $500K cluster limited by $10K machine
  • 17. Row Key Advice ● Read/Write ratio should drive design ○ We pay a write time penalty for faster reads ● Identify queries you need to support ● Consider composite keys instead of indexes ● Bucketed/Salted keys are an option ○ Distribute writes across N buckets ○ Rebucketing is difficult ○ Requires N reads, slow workers
  • 18. Variable Width Keys customer_hash::email ● Allows scans for a single customer ● Hashed id distributes customers ● Sorted by email address ○ Could also use reverse domain for gmail, yahoo, etc.
  • 19. Fixed Width Keys site::contact::create::email ● FuzzyRowFilter ○ Can fix site, contact, and reverse_create ○ Can search for any email address ○ Could use a fixed width encoding for domain ■ Search for just gmail, yahoo, etc ● Distributes sites and users ● Contacts sorted by create date
  • 20. Column Families ● Groupings of named columns ● Versioning, compression, TTL ● Different than BigTable ○ BigTable: 100s ○ HBase: 1 or 2
  • 21. Column Family Example Id d {VERSIONS => 2} s7 {TTL => 604800} a (address) p (phone) o:3-27 (open) c:3-20 (click) dfajkdh byte[] byte[]:555-5555 byte[] hnvdzu9 byte[]:1234 St. XXXX hnvdzu9 byte[]:1233 St. hnvdzu9 XXXX byte[] er9asyjk byte[]: 324 Ave Column Family Example ● PROTIP: Keep CF and qualifier names short ○ They are repeated on disk for every cell ● “d” supports 2 versions of each column, maps to demographics ● “s7” has seven day TTL, maps to stats kept for 7 days.
  • 22. MemStore HDFS s2s1 s3 f1 Column Families In Depth MemStore HDFS s2s1 f2 my_table,,1328551097416.12921 bbc0c91869f88ba6a044a6a1c50. ● StoreFile(s) for each CF in region ● Sparse ● One memstore per CF ○ Must flush together ● Compactions happen at region level (Region) (family) (family)
  • 23. Compactions ● Rewrites StoreFiles ○ Improves read performance ○ IO Intensive ● Region scope ● Used to take > 50 hours ● Custom script took it down to 18 ○ Can (theoretically) run during the day
  • 25. The Table From Hell ● 19 Column Families ● 60% of our region count ● Skewed write pattern ○ KB size store files ○ Frequent compaction storms ○ hbase.hstore.compaction.min.size (HBASE-5461) ● Moved to it’s own cluster
  • 26. And yet... ● Cluster remained operational ○ Table is still in use today ● Met read and write demand ● Regions only briefly active ○ Rowkeys by date and customer
  • 27. What saved us ● Keyed by customer and date ● Effectively write once ○ Kept “active” region count low ● Custom compaction script ○ Skipped old regions ● More hardware ● Were able to selectively migrate
  • 28. Column Family Advice ● Bad choice for fine grained partitioning ● Good for ○ Similarly typed data ○ Varying versioning/retention requirements ● Prefer intra row scans ○ CF and qualifiers are sorted ○ ColumnRangeFilter