SlideShare a Scribd company logo
1 of 21
Download to read offline
© Hortonworks Inc. 2011
Apache HBase
For Architects
Nick Dimiduk
Member of Technical Staff, HBase
Seattle Technical Forum, 2013-05-15
Page 1
© Hortonworks Inc. 2011
Page 2
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Agenda
•  Background
–  (how did we get here?)
•  High-level Architecture
–  (where are we?)
•  Anatomy of a RegionServer
–  (how does this thing work?)
•  TL;DR
–  (what did we learn?)
•  Resources
–  (where do we go from here?)
Page 3
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Background
Architecting the Future of Big Data
Page 4
© Hortonworks Inc. 2011
Apache Hadoop in Review
•  Apache Hadoop Distributed Filesystem (HDFS)
–  Distributed, fault-tolerant, throughput-optimized data storage
–  Uses a filesystem analogy, not structured tables
–  The Google File System, 2003, Ghemawat et al.
–  http://research.google.com/archive/gfs.html
•  Apache Hadoop MapReduce (MR)
–  Distributed, fault-tolerant, batch-oriented data processing
–  Line- or record-oriented processing of the entire dataset *
–  “[Application] schema on read”
–  MapReduce: Simplified Data Processing on Large Clusters, 2004, Dean and
Ghemawat
–  http://research.google.com/archive/mapreduce.html
Page 5
Architecting the Future of Big Data
* For more on writing MapReduce applications, see “MapReduce
Patterns, Algorithms, and Use Cases”
http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/
© Hortonworks Inc. 2011
So what is HBase anyway?
•  BigTable paper from Google, 2006, Dean et al.
–  “Bigtable is a sparse, distributed, persistent multi-dimensional sorted map.”
–  http://research.google.com/archive/bigtable.html
•  Key Features:
–  Distributed storage across cluster of machines
–  Random, online read and write data access
–  Schemaless data model (“NoSQL”)
–  Self-managed data partitions
Page 6
Architecting the Future of Big Data
© Hortonworks Inc. 2011
High-level Architecture
Architecting the Future of Big Data
Page 7
© Hortonworks Inc. 2011
Page 9
Architecting the Future of Big Data
Logical Architecture
Distributed, persistent partitions of a BigTable
a
b
d
c
e
f
h
g
i
j
l
k
m
n
p
o
Table A
Region 1
Region 2
Region 3
Region 4
Region Server 7
Table A, Region 1
Table A, Region 2
Table G, Region 1070
Table L, Region 25
Region Server 86
Table A, Region 3
Table C, Region 30
Table F, Region 160
Table F, Region 776
Region Server 367
Table A, Region 4
Table C, Region 17
Table E, Region 52
Table P, Region 1116
Legend:
- A single table is partitioned into Regions of roughly equal size.
- Regions are assigned to Region Servers across the cluster.
- Region Servers host roughly the same number of regions.
© Hortonworks Inc. 2011
Page 11
Architecting the Future of Big Data
Physical Architecture
Distribution and Data Path
...
Zoo
Keeper
Zoo
Keeper
Zoo
Keeper
HBase
Client
JavaApp
HBase
Client
JavaApp
HBase
Client
HBase Shell
HBase
Client
REST/Thrift
Gateway
HBase
Client
JavaApp
HBase
Client
JavaApp
Region
Server
Data
Node
Region
Server
Data
Node
...
Region
Server
Data
Node
Region
Server
Data
Node
HBase
Master
Name
Node
Legend:
- An HBase RegionServer is collocated with an HDFS DataNode.
- HBase clients communicate directly with Region Servers for sending and receiving data.
- HMaster manages Region assignment and handles DDL operations.
- Online configuration state is maintained in ZooKeeper.
- HMaster and ZooKeeper are NOT involved in data path.
© Hortonworks Inc. 2011
Page 13
Architecting the Future of Big Data
Logical Data Model
A sparse, multi-dimensional, sorted map
Legend:
- Rows are sorted by rowkey.
- Within a row, values are located by column family and qualifier.
- Values also carry a timestamp; there can me multiple versions of a value.
- Within a column family, data is schemaless. Qualifiers and values are treated as arbitrary bytes.
1368387247 [3.6 kb png data]"thumb"cf2b
a
cf1
1368394583 7
1368394261 "hello"
"bar"
1368394583 22
1368394925 13.6
1368393847 "world"
"foo"
cf2
1368387684 "almost the loneliest number"1.0001
1368396302 "fourth of July""2011-07-04"
Table A
rowkey
column
family
column
qualifier
timestamp value
© Hortonworks Inc. 2011
Anatomy of a
RegionServer
Architecting the Future of Big Data
Page 14
© Hortonworks Inc. 2011
Page 16
Architecting the Future of Big Data
RegionServer
HDFS
HLog
(WAL)
HRegion
HStore
StoreFile
HFile
StoreFile
HFile
MemStore
...
...
HStore
BlockCache
HRegion
...
HStoreHStore
...
Legend:
- A RegionServer contains a single WAL, single BlockCache, and multiple Regions.
- A Region contains multiple Stores, one for each Column Family.
- A Store consists of multiple StoreFiles and a MemStore.
- A StoreFile corresponds to a single HFile.
- HFiles and WAL are persisted on HDFS.
Storage Machinery
Implementing the data model
© Hortonworks Inc. 2011
TL;DR
Architecting the Future of Big Data
Page 21
© Hortonworks Inc. 2011
For what kinds of workloads is it well suited?
•  It depends on how you tune it, but…
•  HBase is good for:
–  Large datasets
–  Sparse datasets
–  Loosely coupled (denormalized) records
–  Lots of concurrent clients
•  Try to avoid:
–  Small datasets (unless you have lots of them)
–  Highly relational records
–  Schema designs requiring transactions *
Page 22
Architecting the Future of Big Data
* Transactions might not be as necessary as you think, see “Eric
Brewer on why banks are BASE not ACID”
http://highscalability.com/blog/2013/5/1/myth-eric-brewer-on-why-
banks-are-base-not-acid-availability.html
** Or maybe not, “We believe it is better to have application
programmers deal with performance problems due to overuse of
transactions as bottlenecks arise, rather than always coding around
the lack of transactions.” – Google Spanner paper, http://
research.google.com/archive/spanner.html
© Hortonworks Inc. 2011
How does it integrate with my infrastructure?
•  Horizontally scale application data
–  Highly concurrent, read/write access
–  Consistent, persisted shared state
–  Distributed online data processing via Coprocessors (experimental)
•  Gateway between online services and offline storage/analysis
–  Staging area to receive new data
–  Serve online, indexed “views” on datasets from HDFS
–  Glue between batch (HDFS, MR1) and online (CEP, Storm) systems
Page 23
Architecting the Future of Big Data
© Hortonworks Inc. 2011
What data semantics does it provide?
•  GET, PUT, DELETE key-value operations
•  SCAN for queries
•  INCREMENT, CAS server-side atomic operations
•  Row-level write atomicity
•  MapReduce integration
–  Online API (today)
–  Bulkload (today)
–  Snapshots (coming)
Page 24
Architecting the Future of Big Data
© Hortonworks Inc. 2011
What about operational concerns?
•  Provision hardware with more spindles/TB
•  Balance memory and IO for reads
–  Contention between random and sequential access
–  Configure Block size, BlockCache, compression, codecs based on access patterns
–  Additional resources
–  “HBase: Performance Tuners,” http://labs.ericsson.com/blog/hbase-performance-tuners
–  “Scanning in HBase,” http://hadoop-hbase.blogspot.com/2012/01/scanning-in-
hbase.html
•  Balance IO for writes
–  Configure C1 (compactions, region size, compression, pre-splits, &c.) based on
write pattern
–  Balance IO contention between maintaining C1 and serving reads
–  Additional resources
–  “Configuring HBase Memstore: what you should know,” http://blog.sematext.com/
2012/07/16/hbase-memstore-what-you-should-know/
–  “Visualizing HBase Flushes And Compactions,” http://www.ngdata.com/visualizing-
hbase-flushes-and-compactions/
Page 25
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Resources
Architecting the Future of Big Data
Page 26
© Hortonworks Inc. 2011
Join the Community!
•  hbase.apache.org
–  blogs.apache.org/hbase/
•  Mailing lists
–  hbase.apache.org/mail-lists.html
–  user@hbase.apache.org
•  IRC
–  irc.freenode.net#hbase
•  JIRA
–  issues.apache.org/jira/browse/HBASE
•  Source
–  git clone git://git.apache.org/hbase.git
–  svn checkout http://svn.apache.org/repos/asf/hbase/trunk hbase
•  Conference Season
–  HBaseCon 2013, June 13, hbasecon.com
–  Hadoop Summit, June 26-27, hadoopsummit.org
Page 27
Architecting the Future of Big Data
© Hortonworks Inc. 2011
HBase@Hortonworks
•  Mean Time To Recovery (MTTR)
–  HDFS improvements, faster recovery of META, log replay instead of log splitting,
improving failure detection
•  Testing
–  Integration test suite, system tests, destructive testing, ChaosMonkey, load tests,
Namenode HA, test coverage and consistency
•  Compaction Improvements
–  Pluggable compaction, tier based compaction, stripe / leveldb compactions, etc
•  IPC / Wire compatibility
–  Migration to Google’s Protocol Buffers
•  HBase MapReduce improvements (Import / Export, etc)
–  Performance improvements, API uniformity/usability
•  Hardening 0.94
–  Assignment Manager, Log splitting, Region splits, Replication
•  Not to mention:
–  Windows support, Security, Snapshots, Hadoop2, 0.96, LOTS of bug fixes and
community reviews
Page 28
Architecting the Future of Big Data
© Hortonworks Inc. 2011
Thanks!
Architecting the Future of Big Data
Page 29
M A N N I N G
Nick Dimiduk
Amandeep Khurana
FOREWORD BY
Michael Stack
hbaseinaction.com
Nick Dimiduk
github.com/ndimiduk
@xefyr
n10k.com

More Related Content

What's hot

HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
Apache spark 소개 및 실습
Apache spark 소개 및 실습Apache spark 소개 및 실습
Apache spark 소개 및 실습동현 강
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...Databricks
 
Building real-time applications with Amazon ElastiCache - ADB204 - Anaheim AW...
Building real-time applications with Amazon ElastiCache - ADB204 - Anaheim AW...Building real-time applications with Amazon ElastiCache - ADB204 - Anaheim AW...
Building real-time applications with Amazon ElastiCache - ADB204 - Anaheim AW...Amazon Web Services
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to ElasticsearchClifford James
 
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DBDistributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DBYugabyteDB
 
Using AI to Build a Self-Driving Query Optimizer with Shivnath Babu and Adria...
Using AI to Build a Self-Driving Query Optimizer with Shivnath Babu and Adria...Using AI to Build a Self-Driving Query Optimizer with Shivnath Babu and Adria...
Using AI to Build a Self-Driving Query Optimizer with Shivnath Babu and Adria...Databricks
 
SQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialSQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialDaniel Abadi
 
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)Amy W. Tang
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안SANG WON PARK
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Presentation Hadoop Québec
Presentation Hadoop QuébecPresentation Hadoop Québec
Presentation Hadoop QuébecMathieu Dumoulin
 
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...DataWorks Summit
 

What's hot (20)

Influxdb and time series data
Influxdb and time series dataInfluxdb and time series data
Influxdb and time series data
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Apache spark 소개 및 실습
Apache spark 소개 및 실습Apache spark 소개 및 실습
Apache spark 소개 및 실습
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
 
Building real-time applications with Amazon ElastiCache - ADB204 - Anaheim AW...
Building real-time applications with Amazon ElastiCache - ADB204 - Anaheim AW...Building real-time applications with Amazon ElastiCache - ADB204 - Anaheim AW...
Building real-time applications with Amazon ElastiCache - ADB204 - Anaheim AW...
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
 
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DBDistributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB
 
Using AI to Build a Self-Driving Query Optimizer with Shivnath Babu and Adria...
Using AI to Build a Self-Driving Query Optimizer with Shivnath Babu and Adria...Using AI to Build a Self-Driving Query Optimizer with Shivnath Babu and Adria...
Using AI to Build a Self-Driving Query Optimizer with Shivnath Babu and Adria...
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
SQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialSQL-on-Hadoop Tutorial
SQL-on-Hadoop Tutorial
 
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Presentation Hadoop Québec
Presentation Hadoop QuébecPresentation Hadoop Québec
Presentation Hadoop Québec
 
噛み砕いてKafka Streams #kafkajp
噛み砕いてKafka Streams #kafkajp噛み砕いてKafka Streams #kafkajp
噛み砕いてKafka Streams #kafkajp
 
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...
 

Viewers also liked

Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)alexbaranau
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceCloudera, Inc.
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance TuningLars Hofhansl
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance ImprovementBiju Nair
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBaseCloudera, Inc.
 
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case StudiesEvan Liu
 
Apache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesApache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesData Con LA
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignCloudera, Inc.
 
HBase Blockcache 101
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101Nick Dimiduk
 
Apache HBase for Architects
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for ArchitectsNick Dimiduk
 
A Survey of HBase Application Archetypes
A Survey of HBase Application ArchetypesA Survey of HBase Application Archetypes
A Survey of HBase Application ArchetypesHBaseCon
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the BasicsHBaseCon
 
HBase Sizing Guide
HBase Sizing GuideHBase Sizing Guide
HBase Sizing Guidelarsgeorge
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars GeorgeJAX London
 
Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 ReleaseNick Dimiduk
 
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)tatsuya6502
 
HBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBaseHBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBaseCloudera, Inc.
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseEdureka!
 

Viewers also liked (20)

Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
 
HBase Storage Internals
HBase Storage InternalsHBase Storage Internals
HBase Storage Internals
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBase
 
20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
 
Apache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesApache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use Cases
 
Hadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema DesignHadoop World 2011: Advanced HBase Schema Design
Hadoop World 2011: Advanced HBase Schema Design
 
HBase Blockcache 101
HBase Blockcache 101HBase Blockcache 101
HBase Blockcache 101
 
Apache HBase for Architects
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for Architects
 
A Survey of HBase Application Archetypes
A Survey of HBase Application ArchetypesA Survey of HBase Application Archetypes
A Survey of HBase Application Archetypes
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
 
HBase Sizing Guide
HBase Sizing GuideHBase Sizing Guide
HBase Sizing Guide
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
 
Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 Release
 
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
 
HBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBaseHBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2013: Full-Text Indexing for Apache HBase
 
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
 
Apache hadoop hbase
Apache hadoop hbaseApache hadoop hbase
Apache hadoop hbase
 

Similar to HBase for Architects

Integration of Hive and HBase
Integration of Hive and HBaseIntegration of Hive and HBase
Integration of Hive and HBaseHortonworks
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBaseHortonworks
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshotsenissoz
 
Sept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionSept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionAdam Muise
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseCloudera, Inc.
 
Techincal Talk Hbase-Ditributed,no-sql database
Techincal Talk Hbase-Ditributed,no-sql databaseTechincal Talk Hbase-Ditributed,no-sql database
Techincal Talk Hbase-Ditributed,no-sql databaseRishabh Dugar
 
La big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixitLa big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixitData Con LA
 
HBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseHBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseCloudera, Inc.
 
Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Ashish Narasimham
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosLester Martin
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconYiwei Ma
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统yongboy
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...Michael Stack
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolutionDataWorks Summit
 

Similar to HBase for Architects (20)

Hbase mhug 2015
Hbase mhug 2015Hbase mhug 2015
Hbase mhug 2015
 
Mar 2012 HUG: Hive with HBase
Mar 2012 HUG: Hive with HBaseMar 2012 HUG: Hive with HBase
Mar 2012 HUG: Hive with HBase
 
Integration of Hive and HBase
Integration of Hive and HBaseIntegration of Hive and HBase
Integration of Hive and HBase
 
Integration of HIve and HBase
Integration of HIve and HBaseIntegration of HIve and HBase
Integration of HIve and HBase
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshots
 
Sept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical IntroductionSept 17 2013 - THUG - HBase a Technical Introduction
Sept 17 2013 - THUG - HBase a Technical Introduction
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
 
Techincal Talk Hbase-Ditributed,no-sql database
Techincal Talk Hbase-Ditributed,no-sql databaseTechincal Talk Hbase-Ditributed,no-sql database
Techincal Talk Hbase-Ditributed,no-sql database
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Hive
HiveHive
Hive
 
Horizon for Big Data
Horizon for Big DataHorizon for Big Data
Horizon for Big Data
 
La big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixitLa big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixit
 
HBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseHBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBase
 
Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ...
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolution
 

More from Nick Dimiduk

Apache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBaseApache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBaseNick Dimiduk
 
Apache Big Data EU 2015 - Phoenix
Apache Big Data EU 2015 - PhoenixApache Big Data EU 2015 - Phoenix
Apache Big Data EU 2015 - PhoenixNick Dimiduk
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014Nick Dimiduk
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low LatencyNick Dimiduk
 
HBase Data Types (WIP)
HBase Data Types (WIP)HBase Data Types (WIP)
HBase Data Types (WIP)Nick Dimiduk
 
Bring Cartography to the Cloud
Bring Cartography to the CloudBring Cartography to the Cloud
Bring Cartography to the CloudNick Dimiduk
 
HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)Nick Dimiduk
 
Pig, Making Hadoop Easy
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop EasyNick Dimiduk
 
Introduction to Hadoop, HBase, and NoSQL
Introduction to Hadoop, HBase, and NoSQLIntroduction to Hadoop, HBase, and NoSQL
Introduction to Hadoop, HBase, and NoSQLNick Dimiduk
 

More from Nick Dimiduk (10)

Apache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBaseApache Big Data EU 2015 - HBase
Apache Big Data EU 2015 - HBase
 
Apache Big Data EU 2015 - Phoenix
Apache Big Data EU 2015 - PhoenixApache Big Data EU 2015 - Phoenix
Apache Big Data EU 2015 - Phoenix
 
HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014HBase Low Latency, StrataNYC 2014
HBase Low Latency, StrataNYC 2014
 
HBase Data Types
HBase Data TypesHBase Data Types
HBase Data Types
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
 
HBase Data Types (WIP)
HBase Data Types (WIP)HBase Data Types (WIP)
HBase Data Types (WIP)
 
Bring Cartography to the Cloud
Bring Cartography to the CloudBring Cartography to the Cloud
Bring Cartography to the Cloud
 
HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)
 
Pig, Making Hadoop Easy
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop Easy
 
Introduction to Hadoop, HBase, and NoSQL
Introduction to Hadoop, HBase, and NoSQLIntroduction to Hadoop, HBase, and NoSQL
Introduction to Hadoop, HBase, and NoSQL
 

Recently uploaded

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 

Recently uploaded (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 

HBase for Architects

  • 1. © Hortonworks Inc. 2011 Apache HBase For Architects Nick Dimiduk Member of Technical Staff, HBase Seattle Technical Forum, 2013-05-15 Page 1
  • 2. © Hortonworks Inc. 2011 Page 2 Architecting the Future of Big Data
  • 3. © Hortonworks Inc. 2011 Agenda •  Background –  (how did we get here?) •  High-level Architecture –  (where are we?) •  Anatomy of a RegionServer –  (how does this thing work?) •  TL;DR –  (what did we learn?) •  Resources –  (where do we go from here?) Page 3 Architecting the Future of Big Data
  • 4. © Hortonworks Inc. 2011 Background Architecting the Future of Big Data Page 4
  • 5. © Hortonworks Inc. 2011 Apache Hadoop in Review •  Apache Hadoop Distributed Filesystem (HDFS) –  Distributed, fault-tolerant, throughput-optimized data storage –  Uses a filesystem analogy, not structured tables –  The Google File System, 2003, Ghemawat et al. –  http://research.google.com/archive/gfs.html •  Apache Hadoop MapReduce (MR) –  Distributed, fault-tolerant, batch-oriented data processing –  Line- or record-oriented processing of the entire dataset * –  “[Application] schema on read” –  MapReduce: Simplified Data Processing on Large Clusters, 2004, Dean and Ghemawat –  http://research.google.com/archive/mapreduce.html Page 5 Architecting the Future of Big Data * For more on writing MapReduce applications, see “MapReduce Patterns, Algorithms, and Use Cases” http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/
  • 6. © Hortonworks Inc. 2011 So what is HBase anyway? •  BigTable paper from Google, 2006, Dean et al. –  “Bigtable is a sparse, distributed, persistent multi-dimensional sorted map.” –  http://research.google.com/archive/bigtable.html •  Key Features: –  Distributed storage across cluster of machines –  Random, online read and write data access –  Schemaless data model (“NoSQL”) –  Self-managed data partitions Page 6 Architecting the Future of Big Data
  • 7. © Hortonworks Inc. 2011 High-level Architecture Architecting the Future of Big Data Page 7
  • 8. © Hortonworks Inc. 2011 Page 9 Architecting the Future of Big Data Logical Architecture Distributed, persistent partitions of a BigTable a b d c e f h g i j l k m n p o Table A Region 1 Region 2 Region 3 Region 4 Region Server 7 Table A, Region 1 Table A, Region 2 Table G, Region 1070 Table L, Region 25 Region Server 86 Table A, Region 3 Table C, Region 30 Table F, Region 160 Table F, Region 776 Region Server 367 Table A, Region 4 Table C, Region 17 Table E, Region 52 Table P, Region 1116 Legend: - A single table is partitioned into Regions of roughly equal size. - Regions are assigned to Region Servers across the cluster. - Region Servers host roughly the same number of regions.
  • 9. © Hortonworks Inc. 2011 Page 11 Architecting the Future of Big Data Physical Architecture Distribution and Data Path ... Zoo Keeper Zoo Keeper Zoo Keeper HBase Client JavaApp HBase Client JavaApp HBase Client HBase Shell HBase Client REST/Thrift Gateway HBase Client JavaApp HBase Client JavaApp Region Server Data Node Region Server Data Node ... Region Server Data Node Region Server Data Node HBase Master Name Node Legend: - An HBase RegionServer is collocated with an HDFS DataNode. - HBase clients communicate directly with Region Servers for sending and receiving data. - HMaster manages Region assignment and handles DDL operations. - Online configuration state is maintained in ZooKeeper. - HMaster and ZooKeeper are NOT involved in data path.
  • 10. © Hortonworks Inc. 2011 Page 13 Architecting the Future of Big Data Logical Data Model A sparse, multi-dimensional, sorted map Legend: - Rows are sorted by rowkey. - Within a row, values are located by column family and qualifier. - Values also carry a timestamp; there can me multiple versions of a value. - Within a column family, data is schemaless. Qualifiers and values are treated as arbitrary bytes. 1368387247 [3.6 kb png data]"thumb"cf2b a cf1 1368394583 7 1368394261 "hello" "bar" 1368394583 22 1368394925 13.6 1368393847 "world" "foo" cf2 1368387684 "almost the loneliest number"1.0001 1368396302 "fourth of July""2011-07-04" Table A rowkey column family column qualifier timestamp value
  • 11. © Hortonworks Inc. 2011 Anatomy of a RegionServer Architecting the Future of Big Data Page 14
  • 12. © Hortonworks Inc. 2011 Page 16 Architecting the Future of Big Data RegionServer HDFS HLog (WAL) HRegion HStore StoreFile HFile StoreFile HFile MemStore ... ... HStore BlockCache HRegion ... HStoreHStore ... Legend: - A RegionServer contains a single WAL, single BlockCache, and multiple Regions. - A Region contains multiple Stores, one for each Column Family. - A Store consists of multiple StoreFiles and a MemStore. - A StoreFile corresponds to a single HFile. - HFiles and WAL are persisted on HDFS. Storage Machinery Implementing the data model
  • 13. © Hortonworks Inc. 2011 TL;DR Architecting the Future of Big Data Page 21
  • 14. © Hortonworks Inc. 2011 For what kinds of workloads is it well suited? •  It depends on how you tune it, but… •  HBase is good for: –  Large datasets –  Sparse datasets –  Loosely coupled (denormalized) records –  Lots of concurrent clients •  Try to avoid: –  Small datasets (unless you have lots of them) –  Highly relational records –  Schema designs requiring transactions * Page 22 Architecting the Future of Big Data * Transactions might not be as necessary as you think, see “Eric Brewer on why banks are BASE not ACID” http://highscalability.com/blog/2013/5/1/myth-eric-brewer-on-why- banks-are-base-not-acid-availability.html ** Or maybe not, “We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.” – Google Spanner paper, http:// research.google.com/archive/spanner.html
  • 15. © Hortonworks Inc. 2011 How does it integrate with my infrastructure? •  Horizontally scale application data –  Highly concurrent, read/write access –  Consistent, persisted shared state –  Distributed online data processing via Coprocessors (experimental) •  Gateway between online services and offline storage/analysis –  Staging area to receive new data –  Serve online, indexed “views” on datasets from HDFS –  Glue between batch (HDFS, MR1) and online (CEP, Storm) systems Page 23 Architecting the Future of Big Data
  • 16. © Hortonworks Inc. 2011 What data semantics does it provide? •  GET, PUT, DELETE key-value operations •  SCAN for queries •  INCREMENT, CAS server-side atomic operations •  Row-level write atomicity •  MapReduce integration –  Online API (today) –  Bulkload (today) –  Snapshots (coming) Page 24 Architecting the Future of Big Data
  • 17. © Hortonworks Inc. 2011 What about operational concerns? •  Provision hardware with more spindles/TB •  Balance memory and IO for reads –  Contention between random and sequential access –  Configure Block size, BlockCache, compression, codecs based on access patterns –  Additional resources –  “HBase: Performance Tuners,” http://labs.ericsson.com/blog/hbase-performance-tuners –  “Scanning in HBase,” http://hadoop-hbase.blogspot.com/2012/01/scanning-in- hbase.html •  Balance IO for writes –  Configure C1 (compactions, region size, compression, pre-splits, &c.) based on write pattern –  Balance IO contention between maintaining C1 and serving reads –  Additional resources –  “Configuring HBase Memstore: what you should know,” http://blog.sematext.com/ 2012/07/16/hbase-memstore-what-you-should-know/ –  “Visualizing HBase Flushes And Compactions,” http://www.ngdata.com/visualizing- hbase-flushes-and-compactions/ Page 25 Architecting the Future of Big Data
  • 18. © Hortonworks Inc. 2011 Resources Architecting the Future of Big Data Page 26
  • 19. © Hortonworks Inc. 2011 Join the Community! •  hbase.apache.org –  blogs.apache.org/hbase/ •  Mailing lists –  hbase.apache.org/mail-lists.html –  user@hbase.apache.org •  IRC –  irc.freenode.net#hbase •  JIRA –  issues.apache.org/jira/browse/HBASE •  Source –  git clone git://git.apache.org/hbase.git –  svn checkout http://svn.apache.org/repos/asf/hbase/trunk hbase •  Conference Season –  HBaseCon 2013, June 13, hbasecon.com –  Hadoop Summit, June 26-27, hadoopsummit.org Page 27 Architecting the Future of Big Data
  • 20. © Hortonworks Inc. 2011 HBase@Hortonworks •  Mean Time To Recovery (MTTR) –  HDFS improvements, faster recovery of META, log replay instead of log splitting, improving failure detection •  Testing –  Integration test suite, system tests, destructive testing, ChaosMonkey, load tests, Namenode HA, test coverage and consistency •  Compaction Improvements –  Pluggable compaction, tier based compaction, stripe / leveldb compactions, etc •  IPC / Wire compatibility –  Migration to Google’s Protocol Buffers •  HBase MapReduce improvements (Import / Export, etc) –  Performance improvements, API uniformity/usability •  Hardening 0.94 –  Assignment Manager, Log splitting, Region splits, Replication •  Not to mention: –  Windows support, Security, Snapshots, Hadoop2, 0.96, LOTS of bug fixes and community reviews Page 28 Architecting the Future of Big Data
  • 21. © Hortonworks Inc. 2011 Thanks! Architecting the Future of Big Data Page 29 M A N N I N G Nick Dimiduk Amandeep Khurana FOREWORD BY Michael Stack hbaseinaction.com Nick Dimiduk github.com/ndimiduk @xefyr n10k.com