Using Apache HBase Effectively

Speaker Notes
  • HBase is a project that solves this problem. In a sentence, HBase is an open source, distributed, sorted map modeled after Google's BigTable. Open source: Apache HBase is an open source project with an Apache 2.0 license. Distributed: HBase is designed to use multiple machines to store and serve data. Sorted map: HBase stores data as a map, and guarantees that adjacent keys will be stored next to each other on disk. HBase is modeled after BigTable, a system that is used for hundreds of applications at Google.
  • Is it still the data store for those things?
  • Tested under HBase
  • http://www.slideshare.net/iamcal/scalable-web-architectures-common-patterns-and-approaches-web-20-expo-nyc-presentation/
  • Earlier, I said that HBase is a big sorted map. Here is an example of a table. The map key is (row key + column + timestamp); the value is the cell contents. The rows in the map are sorted by key. In this example, Row1 has 3 columns in the "info" column family; Row2 only has a single column. A column can also be empty. Each row has a timestamp. By default, the timestamp is set to the current time (in milliseconds since the Unix epoch, January 1st 1970) when the row is inserted. A client can specify a timestamp when inserting or retrieving data, and specify how many versions of each cell should be maintained. Data in HBase is non-typed; everything is an array of bytes. Rows are sorted lexicographically. This order is maintained on disk, so Row1 and Row2 can be read together in just one disk seek.
  • Given that HBase stores a large sorted map, the API looks similar to a map. You can get or put individual rows, or scan a range of rows. There is also a very efficient way of incrementing a particular cell; this can be useful for maintaining high-performance counters or statistics. Lastly, it's possible to write MapReduce jobs that analyze the data in HBase.
  • Data layout: a traditional RDBMS uses a fixed schema and a row-oriented storage model. This has drawbacks if the number of columns per row can vary drastically; a semi-structured column-oriented store handles this case very well. Transactions: a benefit that an RDBMS offers is strict ACID compliance with full transaction support. HBase currently offers transactions on a per-row basis; there is work being done to expand HBase's transactional support. Query language: RDBMSs support SQL, a full-featured language for filtering, joining, aggregating, sorting, etc. HBase does not support SQL*. There are two ways to find rows in HBase: get a row by key, or scan a table. Security: in version 0.20.4, authentication and authorization are not yet available for HBase. Indexes: in a typical RDBMS, indexes can be created on arbitrary columns. HBase does not have any traditional indexes**; rows are stored sorted, with a sparse index of row offsets, so it is very fast to find a row by its row key. Max data size: most RDBMS architectures are designed to store GBs or TBs of data; HBase can scale to much larger data sizes. Read/write throughput limits: typical RDBMS deployments can scale to thousands of queries/second; there is virtually no upper bound to the number of reads and writes HBase can handle. (* Hive/HBase integration is being worked on. ** There are contrib packages for building indexes on HBase tables.)
  • Let's look at a typical Hadoop cluster. Most production clusters have at least 5 servers, though you can run it on a laptop for development. A typical server has 8 cores, 24GB of RAM, 4-12TB of disk, and gigabit Ethernet, for example a Dell R410 or an HP SL170. On larger clusters, the machines are spread out in multiple racks, with 20 or 40 nodes per rack. The largest Hadoop clusters have about 4000 servers in them.
  • http://www.facebook.com/video/video.php?v=690851516105 and http://www.slideshare.net/brizzzdotcom/facebook-messages-hbase; Nicolas Spiegelberg and Karthik Ranganathan (software engineers, Facebook)
  • One of the interesting things about NoSQL is that the different systems don't usually compete directly; we all have picked different tradeoffs. HBase is a strongly consistent system, so it does not have as good availability as an eventual-consistency system like Cassandra. But we find that availability is good in practice! Since HBase is built on top of Hadoop, it has very good integration: for example, we have a very efficient bulk load feature, and the ability to run MapReduce into or out of HBase tables. HBase's partitioning is range based, and data is sorted by key on disk. This is different from other systems, which use a hash function to distribute keys; it can be useful for guaranteeing that, for a given user account, all of that user's data can be read with just one disk seek. HBase automatically reshards when necessary, and regions automatically reassign if servers die. Adding more servers is simple: just turn them on. There is no "reshard" step. HBase is not just a key-value store; it is similar to Cassandra in that each row has a sparse set of columns which are efficiently stored.
  • So, if you are interested in Hadoop and HBase, here are some resources. The easiest way to install Hadoop is to use Cloudera's Distribution for Hadoop from cloudera.com. You can also download the Apache source directly from hadoop.apache.org. You can get started on your laptop, in a VM, or running on EC2. I also recommend our free training videos from our website. The book Hadoop: The Definitive Guide is also really great; it is available in Japanese translation as well.
  • CDH1: Hadoop 0.18.3, Pig, Hive. CDH2: Hadoop 0.20.0, Pig, Hive. CDH3: Hadoop 0.20.2 + security + append, ZooKeeper, HBase, Flume, Pig, Hive, Hue, Sqoop, Oozie, Whirr (later Mahout). CDH4: Hadoop 2.0.0!

Using Apache HBase Effectively: Presentation Transcript

  • Using HBase Effectively. Jonathan Hsieh | @jmhsieh | Software Engineer at Cloudera / HBase PMC Member. Himanshu Vashishtha | himanshu@cloudera | Software Engineer at Cloudera. February 2013, Strata Conference.
  • Who are we? Jonathan Hsieh: Cloudera software engineer; Apache HBase committer / PMC; Apache Flume founder / PMC; Apache Sqoop committer / PMC; U of Washington, research in distributed systems. Himanshu Vashishtha: Cloudera software engineer; Apache HBase contributor; U of Alberta, IIT Varanasi.
  • What is Apache HBase? Apache HBase is an open source, distributed, scalable, consistent, low latency, random access non-relational database built on Apache Hadoop.
  • HBase provides low-latency random access. Writes: 1-3ms, 1k-10k writes/sec per node. Reads: 0-3ms cached, 10-30ms from disk; 10-40k reads/second/node from cache. Cell size: 0-3MB preferred. Read, write, and insert data anywhere in the table; no sequential write limitations.
  • Production Apache HBase applications: inbox, storage, web, search, analytics, monitoring. More case studies at http://www.hbasecon.com/agenda/
  • Inspiration: Google BigTable (OSDI 2006). Goal: low latency, consistent, random read/write access to massive amounts of structured data. It was the data store for Google's crawler web table, Gmail, Analytics, Earth, Blogger, ...
  • Implementation: Apache HBase (2013). Web application backends: inboxes, catalogs, search index servers, web caches, social media storage. Monitoring and real-time analytics: OpenTSDB. A data storage layer for higher-level platforms.
  • Outline: Enter Apache HBase; The HBase Data Model; System Architecture; Real-World Applications; Effective Application Schemas; Conclusions.
  • Enter Apache HBase: low latency, consistent, random read/write big data access.
  • What is Apache HBase? Apache HBase is an open source, horizontally scalable, consistent, low latency, random access data store built on top of Apache Hadoop.
  • HBase is open source. Apache 2.0 license. A community project with committers and contributors from diverse organizations: Facebook, Cloudera, Salesforce.com, Huawei, TrendMicro, eBay, Hortonworks, Intel, Twitter, ... The code license means anyone can modify and use the code.
  • HBase is horizontally scalable. Adding more servers linearly increases performance (IOPS, storage, throughput) and capacity: storage capacity and input/output operations. Largest cluster: ~1000 nodes, ~1PB. Most clusters: 10-40 nodes, 100GB-4TB. Store and access data on 1-1000's of commodity servers. [Chart: performance vs. number of servers.]
  • Commodity servers (circa 2012): 12x 1TB hard disks in a JBOD (Just a Bunch Of Disks) configuration; 2 quad-core CPUs, 2+ GHz; 24-96GB of RAM (96GB if you're considering HBase with MR); 2x 1 gigabit Ethernet; $5k-10k per machine.
  • What is Apache HBase? Apache HBase is an open source, horizontally scalable, consistent, low latency, random access big-data store built on top of Apache Hadoop.
  • HBase is consistent. Brewer's CAP theorem. Consistency: DB-style ACID guarantees on rows. Availability: favor recovering from faults over returning stale data. Partition tolerance: if a node goes down, the system continues.
  • HBase depends on Apache Hadoop. Apache Hadoop is an open source, horizontally scalable system for reliably storing and processing massive amounts of data across many commodity servers.
  • HBase dependencies: Apache Hadoop HDFS for data durability and reliability (write-ahead log); Apache ZooKeeper for distributed coordination; Apache Hadoop MapReduce, with built-in support for running MapReduce jobs.
  • HBase on a cluster. [Diagram: HDFS NameNodes, a ZooKeeper quorum, HBase Masters, and slave boxes (DataNode + RegionServer) spread across two racks.]
  • Do I need HBase or some other "NoSQL" data store?
  • Did you try scaling your RDBMS vertically? Upgrading to a beefier machine can be quick (upgrade that m1.large to an m2.4xlarge). This is a good idea. What if this isn't enough?
  • Changed your RDBMS schema and queries? Remove text search queries (LIKE). Remove joins; joins due to normalization require expensive seeks. Remove foreign keys and encode your own relations. Avoid constraint checks. Just put all parts of a query in a single table. Lots of full table scans? Good time for Hadoop. Time to consider HBase.
  • Need to scale RDBMS reads? Use DB replication to make more copies to read from; use Memcached. This assumes an 80/20 read-to-write ratio and works reasonably well if you can tolerate replication lag. Unfortunately, eventually you may need more writes, and replication has diminishing returns with more writes.
  • Need to scale RDBMS writes? Let's shard and federate the DB. This loses consistency and the order of operations, and HA introduces operational complexity. This is definitely a good time to consider HBase.
  • We "optimized the DB" by discarding some fundamental SQL/relational database features.
  • Outline: Enter Apache HBase; The HBase Data Model; System Architecture; Real-World Applications; Effective Application Schemas; Conclusions.
  • The HBase Data Model: rows and columns, gets and puts.
  • Sorted map datastore. It is a big table. Tables consist of rows, each of which has a primary row key. Each row has a set of columns. Rows are stored in sorted order.
  • Sorted map datastore (logical view as "records"):

        Row key   info:height   info:state   roles:hadoop            roles:hbase
        cutting   '9ft'         'CA'         'Founder'
        tlipcon   '5ft7'        'CA'         'PMC' @ts=2011,         'Committer'
                                             'Committer' @ts=2010

    Annotations from the slide build: the row key is the implicit PRIMARY KEY in RDBMS terms; data is all byte[] in HBase; a single cell might have different values at different timestamps; different rows may have different sets of columns (the table is sparse); the column format is family:qualifier.
  • Anatomy of a row. Each row has a primary key: a lexicographically sorted byte[]. A timestamp is associated with each value for keeping multiple versions of data (MVCC for consistency). A row is made up of columns, and each (row, column) pair is referred to as a cell. The contents of a cell are all byte[]'s; apps must "know" the types and handle them. Rows are strongly consistent. A sketch of reading versions follows.
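    The transcript shows no code for the versioning point, so here is a minimal sketch of reading several timestamped versions of one cell with the 0.94-era Java client; the "people" table and column names echo the example table above and are otherwise illustrative:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HBaseConfiguration;
        import org.apache.hadoop.hbase.KeyValue;
        import org.apache.hadoop.hbase.client.Get;
        import org.apache.hadoop.hbase.client.HTable;
        import org.apache.hadoop.hbase.client.Result;
        import org.apache.hadoop.hbase.util.Bytes;

        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "people");   // illustrative table name
        Get g = new Get(Bytes.toBytes("tlipcon"));
        g.setMaxVersions(3);                         // return up to 3 versions per cell
        Result r = table.get(g);
        // Versions come back newest-first, matching the on-disk sort order.
        for (KeyValue kv : r.getColumn(Bytes.toBytes("roles"), Bytes.toBytes("hadoop"))) {
          System.out.println(kv.getTimestamp() + " => " + Bytes.toString(kv.getValue()));
        }
        table.close();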
  • Access HBase data via an API. Data operations: Get, Put, Scan, CheckAndPut, Delete. DDL operations: Create, Alter, Enable/Disable. Access via the HBase shell, the Java API, or the REST proxy.
  • Example in the HBase shell.
  • Shell: create a table, add, and read a row.
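    The original slide is a screenshot; a minimal equivalent shell session (table and column names are illustrative, output abbreviated) would look like this:

        hbase(main):001:0> create 'people', 'info', 'roles'
        hbase(main):002:0> put 'people', 'tlipcon', 'info:state', 'CA'
        hbase(main):003:0> get 'people', 'tlipcon'
        COLUMN          CELL
         info:state     timestamp=..., value=CA
        hbase(main):004:0> scan 'people'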
  • Java API:

        byte[] row = Bytes.toBytes("row");
        byte[] family = Bytes.toBytes("cf1");
        byte[] qualifier = Bytes.toBytes("q");
        byte[] putVal = Bytes.toBytes("your boat");

        Configuration config = HBaseConfiguration.create();
        HTable table = new HTable(config, "table");

        Put p = new Put(row);
        p.add(family, qualifier, putVal);
        table.put(p);

        Get g = new Get(row);
        Result r = table.get(g);
        byte[] getVal = r.getValue(family, qualifier);
        assertTrue(Bytes.equals(putVal, getVal));
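    The slides list Scan and CheckAndPut among the data operations but show no code for them; a short sketch in the same style, continuing from the snippet above (start/stop keys and the new value are illustrative; Scan and ResultScanner live in org.apache.hadoop.hbase.client):

        Scan s = new Scan(Bytes.toBytes("row-a"), Bytes.toBytes("row-z"));  // scans [start, stop)
        s.addFamily(Bytes.toBytes("cf1"));
        ResultScanner scanner = table.getScanner(s);
        try {
          for (Result res : scanner) {
            System.out.println(Bytes.toString(res.getRow()));
          }
        } finally {
          scanner.close();   // always release server-side scanner resources
        }

        // CheckAndPut: an atomic compare-and-set on a single row.
        Put p2 = new Put(row);
        p2.add(family, qualifier, Bytes.toBytes("new value"));
        boolean applied = table.checkAndPut(row, family, qualifier, putVal, p2);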
  • REST server: bin/hbase rest start -p 8070. Browse with curl -H "Accept: application/json" http://localhost:8070/... : http://localhost:8070/ lists tables, http://localhost:8070/people/schema shows a schema, and http://localhost:8070/people/tlipcon fetches a row. More info at http://wiki.apache.org/hadoop/Hbase/Stargate
  • Simple API. What is the catch? With great power comes great responsibility.
  • Cost transparency. Goal: predictable latency of random read and write operations. Now you have to understand some of the physical layout of your datastore; efficiencies are based on locality and your schema. You need to understand some physical concepts: column families, sparse columns, regions, and the row key. Your schema needs to consider these.
  • Column families. A column family is a set of related columns. Group sets of columns that have similar access patterns, and select parameters to tune read performance per column family.
  • Physical storage of column families. Each column family is contained in its own file, sorted on disk by row key, column key, and descending timestamp (timestamps are milliseconds since the Unix epoch):

        info column family:
        Row key   Column key    Timestamp       Cell value
        cutting   info:height   1273516197868   9ft
        cutting   info:state    1043871824184   CA
        tlipcon   info:height   1273878447049   5ft7
        tlipcon   info:state    1273616297446   CA

        roles column family:
        Row key   Column key     Timestamp       Cell value
        cutting   roles:ASF      1273871823022   Director
        cutting   roles:Hadoop   1183746289103   Founder
        tlipcon   roles:Hadoop   1300062064923   PMC
        tlipcon   roles:Hadoop   1293388212294   Committer
        tlipcon   roles:Hive     1273616297446   Contributor
  • Tuning column families. Good for tuning read performance: store related data together for better compression; avoid polluting the cache from another family; derived data can have different retention policies. Column family parameters: block compression (none, gzip, LZO, Snappy), version retention policies, cache priority. A shell sketch follows.
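    As a hedged illustration of those parameters, a table could be created from the shell with per-family settings; the table and family names and the values are made up:

        hbase(main):001:0> create 'urls',
          {NAME => 'content', COMPRESSION => 'SNAPPY', VERSIONS => 1},
          {NAME => 'meta', VERSIONS => 3, IN_MEMORY => 'true'}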
  • Sparse columns. Sparseness provides schema flexibility: add columns later, with no need to transform the entire schema. If you find yourself adding columns to your DB, HBase is a good model.
  • Sparse columns. Sparseness can improve performance: null columns don't take space, and you don't need to read what is not there. If you have a traditional DB table with lots of nulls, your data will probably fit well!
  • Horizontal scaling: regions. Tables are divided into sets of rows called regions. Scale read and write capacity by spreading across many regions.
  • Regions: tradeoffs. Easier to scale cluster capacity: auto-sharding and load balancing capability; greater throughput and storage capacity; horizontal scalability of writes, reads, and storage. Enough consistency for many applications: per-row ACID guarantees, but no built-in atomic multi-row operations, no built-in consistent secondary indices, and no built-in global time ordering.
  • SQL + HBase. No built-in SQL query language or query optimizer. There is work on integrating Apache Hive (a SQL-like query language); it is currently not optimal, about 5x slower than normal Hive + HDFS. Apache Sqoop and HBase integration: copy RDBMS tables into HBase, or copy HBase tables into an RDBMS (there is some impedance mismatch; two HBase tables map to primary/secondary tables in an RDBMS). Impala integration: work is in progress.
  • HBase vs. RDBMS:

        Feature            RDBMS                           HBase
        Data layout        Row-oriented                    Column-family-oriented
        Transactions       Multi-row ACID                  Single row only
        Query language     SQL                             get/put/scan/etc.*
        Security           Authentication/authorization    Authentication / authorization at column qualifier level
        Indexes            On arbitrary columns            Row key only*
        Max data size      TBs                             ~1PB
        Throughput limits  1000s of queries/second         Millions of "queries"/second
  • Outline: Enter Apache HBase; The HBase Data Model; System Architecture; Real-World Applications; Effective Application Schemas; Conclusions.
  • System architecture: clusters, nodes, and dependencies; HBase internals: MemStore flush, compaction.
  • A typical look... 5-4000 commodity servers (8-core, 48GB RAM, 12-24TB, 1 GigE); a 2-level network architecture; 20-40 nodes per rack.
  • HBase processes. HDFS: NameNode, DataNode. HBase: Master, RegionServer. ZooKeeper: quorum peer.
  • HDFS nodes (physically). [Diagram: HDFS NameNodes, a ZooKeeper quorum, and slave boxes (DataNodes) across two racks.]
  • HDFS NameNode. Stores file system metadata on disk and in memory: directory structures, permissions. Modifications are stored as an edit log. Fault tolerant and highly available.
  • HDFS DataNodes. HDFS splits files into 128MB (or 256MB) blocks; DataNodes store and serve these blocks. By default, writes are pipelined to 3 different machines: the local machine, plus machines on other racks. Locality helps significantly on subsequent reads and computation scheduling.
  • HBase + HDFS nodes (physically). [Diagram: HBase Masters join the NameNodes and ZooKeeper quorum, and RegionServers run on the slave boxes alongside the DataNodes.]
  • HMaster and ZooKeeper. HMaster: controls which regions are served by which RegionServers; assigns regions to new RegionServers when they arrive or fail; a standby master becomes the active master if the original master goes down; transitions are coordinated by ZooKeeper. Apache ZooKeeper: a highly available system for coordination; generally 3 or 5 machines (always an odd number); uses consensus to guarantee common shared state; writes are considered expensive.
  • RegionServer. Tables are chopped up into regions. A region is served by only a single region server at a time, and a region server can serve multiple regions. Automatic load balancing if a region server goes down. Co-locate region servers with data nodes to take advantage of HDFS file locality.
  • HBase + HDFS: no SPOF. [Diagram: redundant NameNodes, HBase Masters, and the ZooKeeper quorum remove single points of failure.]
  • HBase write path: MemStore flush / compaction / split. These are region-level background processes. A region is a subset ([start-rowkey, end-rowkey)) of a table and hosts the data (all column families) for that range. A column family has a MemStore (an in-memory sorted map containing the last-modified rows) and HFiles (on-disk data files created by flushing a MemStore).
  • HBase write path. [Diagram: a Put arrives at the RegionServer, is appended to the HLog (Put Put Put Put Del ...), and is then applied to the MemStore of the appropriate Store in the HRegion; each Store also holds HFiles.]
  • HBase write path: flush. All column families are flushed on a flush call: all MemStores in the region are flushed to store files.
  • [Diagram: after the flush, the MemStore contents (Puts and Deletes) become a new HFile in each Store.]
  • HBase write path: compaction. There are too many HFiles, and a read can be inefficient; let's combine them. [Diagram: each Store holds several HFiles before compaction.]
  • [Diagram: after compaction, each Store holds a single merged HFile.]
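    For experimentation, both background processes can also be triggered by hand from the shell (the table name is illustrative):

        hbase(main):001:0> flush 'urls'            # force the region's MemStores out to HFiles
        hbase(main):002:0> major_compact 'urls'    # rewrite all HFiles into one per store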
  • HBase write path: region splitting. After compaction, a region's size may change; if it becomes too large, it splits into two child regions.
  • [Diagram: one region has far more data than the others; load is now imbalanced.]
  • [Diagram: let's split this region into two and share the load.]
  • [Diagram: the HRegion splits into children HRegion A and HRegion B on the same RegionServer.]
  • HBase write path: region balancing. [Diagram: there is still too much load on this RegionServer.]
  • [Diagram: one child region is reassigned to the other RegionServer, balancing the load.]
  • Outline: Enter Apache HBase; The HBase Data Model; System Architecture; Real-World Applications; Effective Application Schemas; Conclusions.
  • Real-world applications: how and where this infrastructure is being used.
  • HBase application architecture. [Diagram: a web application talks through Thrift/REST gateways (each an HBase client) across the cluster boundary; MapReduce analysis and bulk import jobs use the HBase client directly; everything sits on HDFS.]
  • Example Apache HBase applications. Web application backends: inboxes (Facebook Messages, Tumblr); catalogs (Intuit Mint merchant DB, Gap Inc. clothing database, OCLC world library catalog); URL shorteners (StumbleUpon http://su.pr); search indexes (eBay Cassini, Photobucket, YapMap). Massive datastores for analysis: Mozilla Socorro (crash report DB), Yahoo! web crawl cache, Mignify / Internet Memory project. Monitoring and real-time analytics: OpenTSDB, Sproxil, Sematext. More info at http://www.hbasecon.com/agenda/
  • HBase web application. [The same architecture diagram, highlighting the web application path through the Thrift/REST gateways.]
  • RDBMS: data-centric schema design. Entity-relational model; design the schema in normalized form; figure out your queries; the DBA sets primary and secondary keys once the queries are known. Issues: join latency and cost can be difficult to predict, and it is difficult/expensive to change the schema or add columns.
  • HBase: query-centric schema design. Know your queries, then design your schema. Column-family oriented: create families by knowing the fields needed by queries; it's better to have fewer than many. App developers optimize the queries, not DBAs. If you've done the relational DB query optimizations, you are mostly there already!
  • URL shortener service. Look up a hash, track the click, and forward to the full URL. Enter a new long URL, generate and store the mapping to a short URL. Look up all of a user's shortened URLs and display them. Track historical click counts over time.
  • URL shortener RDBMS schema: 1. look up the hash via an index; 2. join on the ID to get the full URL; 3. join to the click table to update click info. All queries have at least one join; there are constraints when adding new URLs and short URLs; and how do we delete users?
  • URL shortener HBase schema: 1. get the URL from the URL hash; 2. put to update click metrics. All single-row queries. Use compression settings on content column families. Use the row key to group all of a user's shortened URLs. Consistency is not guaranteed between tables. A sketch of the click update follows.
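    The click-metrics step maps naturally onto the efficient cell increment the speaker notes mention; a sketch with made-up table, family, and qualifier names:

        HTable clicks = new HTable(config, "shorturl_clicks");  // illustrative table
        // Atomically bump the click counter for this short-url hash,
        // avoiding a read-modify-write round trip from the client.
        clicks.incrementColumnValue(Bytes.toBytes("su.pr/abc123"),
            Bytes.toBytes("stats"), Bytes.toBytes("clicks"), 1L);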
  • Facebook Messages (as of 12/10): 15Bn email-style messages/month at ~1k each = 14TB; 120Bn messages/month at ~100 bytes each = 11TB. Create a new message/conversation; keyword search of messages; show a full conversation; list the most recent conversations.
  • HBase analysis application architecture. [The same architecture diagram, highlighting the MapReduce analysis and bulk import paths.]
  • Example: web tables. Goal: manage web crawls and their data by keeping snapshots of the web. Google used BigTable for its web table; Yahoo! uses HBase for its web crawl cache. Full-scan applications and random-access applications share the same data on HDFS.
  • Hadoop MapReduce and HBase. Use MapReduce to scalably process data into HBase: create new data sets from raw data; ETL into DBs/HBase; high-throughput batch processing. You would not serve live traffic from an MR query or directly from HDFS. Users just write a "map" function and a "reduce" function.
  • MapReduce processes. JobTracker: schedules work and resource usage throughout the cluster; makes sure work gets done; controls retry, speculative execution, etc. TaskTrackers: these slaves do the "map" and "reduce" work; co-located with DataNodes.
  • Processing with MapReduce. [Diagram: inputs flow through map, sort/shuffle/merge, and reduce stages to the reducer outputs.]
  • Processing with MapReduce, with HBase. [Diagram: one mapper per region, e.g. [_,C), [C,M), [M,T), [T,_); output goes via bulk load or individual puts.] A sketch of wiring this up follows.
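    A minimal sketch of a map-only MR job that scans an HBase table, assuming the 0.94-era TableMapReduceUtil API (classes from org.apache.hadoop.hbase.mapreduce, org.apache.hadoop.hbase.io, and org.apache.hadoop.mapreduce; table and class names are illustrative):

        // Mapper: one map() call per HBase row in the scanned table.
        static class RowMapper extends TableMapper<ImmutableBytesWritable, Result> {
          @Override
          public void map(ImmutableBytesWritable row, Result value, Context ctx) {
            // process one row here
          }
        }

        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "scan-urls");
        job.setJarByClass(RowMapper.class);
        Scan scan = new Scan();
        scan.setCaching(500);        // larger scanner batches for MR throughput
        scan.setCacheBlocks(false);  // don't pollute the block cache with a full scan
        TableMapReduceUtil.initTableMapperJob(
            "urls", scan, RowMapper.class,
            ImmutableBytesWritable.class, Result.class, job);
        job.setNumReduceTasks(0);    // map-only: one mapper per region
        job.waitForCompletion(true);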
  • HBase + HDFS + MR nodes (physically). [Diagram: a Hadoop MR JobTracker joins the master boxes, and TaskTrackers run on the slave boxes alongside the DataNodes and RegionServers.]
  • Data loading patterns. Random writes: uses the Put API; simple, native; low latency; less throughput (more HBase overhead); use cases: real-time web apps, real-time serving of data; examples: inboxes, URL shorteners. Bulk import: use MapReduce to generate HBase files, then atomically add metadata; high latency; high throughput; use cases: large-scale analysis, generating derived data, ETL; examples: delayed secondary indexes, building search indexes, exploring HBase schemas.
  • Schema design exploration. Applications are optimized by designing the structure of the data in HBase; MR, HDFS, and HBase complement each other. Exploration steps: save raw data to HDFS/HBase; use MR for data transformation and ETL-like jobs on the raw data; use bulk import from MR to HBase; serve the data from HBase.
  • HBase vs. just HDFS:

        Feature             Plain HDFS/MR                                  HBase
        Abstractions        Files + bytes                                  Tables + rows
        Write pattern       Append-only                                    Random write, bulk incremental
        Read pattern        Full file scan, partition table scan (Hive)    Random read, small range scan, or table scan
        Structured storage  Do-it-yourself / TSV / SequenceFile / Avro     Sparse column-family data model
        Max data size       30+ PB                                         ~1PB

    If you have neither random writes nor random reads, stick to HDFS!
  • Outline: Enter Apache HBase; The HBase Data Model; System Architecture; Real-World Applications; Effective Application Schemas; Conclusions.
  • Effective application schemas: trade-offs, schema design, and row key design.
  • Designing effective application schemas. Characterize your application; understand HBase's strengths; understand row key selection; experiment, measure, and repeat.
  • Characterizing your workload. Write heavy? Read heavy: scan heavy or random-read heavy? Update heavy? Sequential vs. random accesses? What is the distribution of row key reads (uniform or Zipf)?
  • HBase's performance strengths: scaling out the number of writes handled; scaling out the number of reads handled; read workloads with short scans. Techniques: pick the more efficient data arrangement; minimize the number of seeks (files accessed) per operation; effectively balance work across nodes; smart row key selection.
  • How should I arrange my data? These representations are isomorphic!

        Tall skinny table with a compound row key:
        rowkey     d:
        jon-col1   aaaa
        jon-col2   bbbb
        jon-col3   cccc
        jon-col4   dddd
        him-col1   eeee
        him-col2   ffff
        him-col3   gggg
        him-col4   hhhh

        Short fat table using column qualifiers:
        Rowkey   d:col1   d:col2   d:col3   d:col4
        jon      aaaa     bbbb     cccc     dddd
        him      eeee     ffff     gggg     hhhh

        Short fat table using column families:
        Rowkey   col1:   col2:   col3:   col4:
        jon      aaaa    bbbb    cccc    dddd
        him      eeee    ffff    gggg    hhhh
  • HBase read path. [Diagram: the client sends a Get to the RegionServer; the HRegion opens a scanner per HStore; each column family is a separate set of files and requires its own seeks, and each HFile (flush) may require a seek.]
  • HBase read path. [Diagram: the server constructs the result from the scan results; Bloom filters and scan filters can skip or reduce seeks.]
  • HBase read path. [Diagram: the result is returned; filtered results (a la DB push-down predicates) send less data over the wire.]
  • Performance characteristics of writes. [Chart: throughput (rows/s) vs. columns per row for random writes of 10-, 100-, and 1000-byte values, writing 1MM rows.] Linear cost increase for adding more columns. At 10 and 100 bytes per value, overhead dominates; at 1000 bytes per value, IO dominates. It is cheaper to have fewer columns and big values.
  • HBase schema rules of thumb. More writes: tall skinny tables with a compound row key are preferred; consolidate data into single columns if possible; use compression (Snappy/GZ/LZO). More reads: use fewer column families; use Bloom filters; use filters.
  • Proper load balancing. Ideally 10-20 regions per region server. Pre-split your regions: creating a table by default creates one region that then splits. If you know your key distribution, pre-split so that writes and reads can be load balanced, as in the example below.
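    A hedged shell example of pre-splitting at creation time (the split points are made up; derive real ones from your key distribution):

        hbase(main):001:0> create 'users', 'info', SPLITS => ['e', 'j', 'o', 't']

    This yields five regions covering [start,'e'), ['e','j'), ['j','o'), ['o','t'), and ['t',end), so initial writes spread across several region servers instead of one.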
  • Proper load balancing and row key design. Row key design is critical for schemas: make sure the key distributes to spread the write load, and take advantage of the lexicographic sort order. [Diagram: poor load balancing due to region splits vs. good load balancing due to good region splits.]
  • Row key design techniques. Numeric keys and lexicographic sort: store numbers big-endian; pad ASCII numbers with 0's (unpadded, Row100 sorts before Row3 and Row31; padded, Row003 < Row031 < Row100). Use reversal to put the most significant traits first: reverse URLs (blog.cloudera.com, hbase.apache.org, strataconf.com sort as com.cloudera.blog, com.strataconf, org.apache.hbase); reverse timestamps to get the most recent first ((MAX_LONG - ts), so "time" gets monotonically smaller). Use composite keys so the key distributes nicely and works well with sub-scans, e.g. User-ReverseTimeStamp, sketched below. Do not use the current timestamp as the first part of the row key!
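    A minimal sketch of the User-ReverseTimeStamp composite key; the helper and variable names are mine, with Bytes from org.apache.hadoop.hbase.util:

        // Newest events for a user sort first within that user's key range.
        byte[] makeRowKey(String user, long eventTimeMillis) {
          return Bytes.add(Bytes.toBytes(user),
                           Bytes.toBytes(Long.MAX_VALUE - eventTimeMillis));
        }

        // All of one user's rows can then be read with a short sub-scan:
        byte[] prefix = Bytes.toBytes("jon");
        Scan s = new Scan(prefix, Bytes.add(prefix, new byte[] { (byte) 0xFF }));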
  • Row key exercise: a web table. A crawler continuously updates links and pages. We want to track individual pages over time, group related pages from the same site, calculate PageRank (links and backlinks), build a search index, and do ad-hoc analytical queries on page content.
  • Google web table schema. Composite row key: reversed URL + '@' + (MAXINT - millis unix timestamp), so related pages from one site group together and newer snapshots sort first. Columns use the family:qualifier format: the content: family holds the page HTML, and the link: family uses sparse column names, one qualifier per linking site (e.g. link:com.cloudera.www, link:com.twitter.www, link:gov.uspto.www, link:org.apache.hbase), with the anchor text ('CDH4 Homepage', 'Tweet me!', 'Patents', 'Open Source HBase', ...) as the cell value. Sample row keys: com.cloudera.archive@2935290495, com.cloudera.blog@2935280495, com.cloudera.www@2935280495, gov.uspto.www@2935280495, org.apache.hbase@2935280495.
  • Key take-aways. A denormalized schema localizes data for single lookups. The row key is critical for lookups and subset scans; make sure the row keys you write are distributed. Use bulk loads and MapReduce to re-organize or change your schema (during down time). Use multiple clusters for different workloads if you can afford it.
  • Performance benchmark suites. HBase's PerformanceEvaluation tests some basic HBase workloads (random/sequential reads/writes): hbase org.apache.hadoop.hbase.PerformanceEvaluation. Yahoo!'s Cloud Serving Benchmark (YCSB), http://research.yahoo.com/Web_Information_Management/YCSB, offers different query and write distributions (uniform and Zipf) and records latencies and throughput: java -classpath `hbase classpath` com.yahoo.ycsb.Client -db com.yahoo.ycsb.db.HBaseClient
  • Outline: Enter Apache HBase; The HBase Data Model; System Architecture; Real-World Applications; Effective Application Schemas; Conclusions.
  • Conclusions.
  • Key takeaways. Apache HBase is not an RDBMS! There are other scalable databases. Use query-centric schema design, not data-centric schema design. It is in production at 100's-of-TB scale at several large enterprises. If you are restructuring your SQL DB to scale it, you may be a candidate for HBase. HBase complements and depends upon Hadoop. New features are focused on enterprise needs.
  • HBase vs. other "NoSQL". Favors strong consistency over availability (but availability is good in practice!). Great Hadoop integration (very efficient bulk loads, MapReduce analysis). Ordered range partitions (not hash). Automatically shards/scales (just turn on more servers; really proven at petabyte scale). Sparse column storage (not key-value). [Diagram: CAP triangle placing HBase, RDBMS, and Dynamo.]
  • HBase vs. just HDFS:

        Feature             Plain HDFS/MR                                HBase
        Write pattern       Append-only                                  Random write, bulk incremental
        Read pattern        Full table scan, partition table scan        Random read, small range scan, or table scan
        Structured storage  Do-it-yourself / TSV / SequenceFile / Avro   Sparse column-family data model
        Max data size       30+ PB                                       ~1PB

    If you have neither random writes nor random reads, stick to HDFS!
  • Deployment and operations. HBase interacts with many systems. This is powerful: each component excels at a particular function, and it is live and in production at top-100 websites. This can be challenging: there are many knobs to tune; use isolated instances to make managing easier; and monitor, monitor, monitor.
  • More resources? Download HBase (and Hadoop)! CDH, Cloudera's Distribution including Apache Hadoop: http://cloudera.com/ and http://hbase.apache.org/. Try it out! (Locally, in a VM, or on EC2.)
  • The Hadoop big data stack. [Diagram: UI framework; workflow and scheduling; metadata; languages; data integration; fast read/write access; kernel; file system; distributed coordination.] Hadoop is the core of the stack: the kernel of a cluster operating system. Each "friend" is a distributed service that has a similar workstation tool.
  • CDH4's Hadoop stack. [The same stack diagram.] Storage: HDFS, HBase. Processing: MR, MR2, Pig, Hive, Mahout, Oozie. Data integration: Flume, Sqoop. Coordination: ZooKeeper, Avro, Bigtop, (Hive), Hue.
  • Rules for applications. HBase isn't really worth the effort until you have 5-10 machines. There are no performance isolation guarantees in HBase (yet), so start conservative. Isolate applications and workloads by having more clusters: if scale warrants, separate real-time applications into separate HBase clusters, and separate your batch MR workloads from your real-time workloads if possible.
  • Distribution options. Use tarballs: build your own deployment or use Puppet/Chef; Apache Whirr for EC2. Use packages (RPMs/debs): Apache Bigtop (Hadoop stack integration and packaging). Use vendor packages: Cloudera's Distribution including Apache Hadoop (CDH), Cloudera Manager Free Edition, Hortonworks Data Platform.
  • Origin of Apache Hadoop and CDH. [Timeline, 2004-2012: Google publishes the MapReduce and GFS papers; the Hadoop project is created by Doug Cutting; Hadoop wins the terabyte sort benchmark; Hadoop runs a 4,000-node cluster; Cloudera releases CDH1, CDH2, then CDH3 and Cloudera Enterprise, then CDH4 and Cloudera Enterprise 4.]
  • Predictable release schedule. Regular releases and updates; skip updates without penalty. Compatibility policy: only major releases break compatibility; updates can include new features; updates include fixes. [Timeline: CDH2 through the CDH3 betas and updates, the CDH4 betas and releases, and the CDH5 betas, roughly Q3 2010 onward.]
  • Open source methodology. What's different from Apache? All components have code committed upstream first; CDH then backports code on top of a "pristine Apache release" (e.g., CDH 0.92.1+3 on top of Apache release 0.92.1, with backports from trunk / branch 0.94). All patches are available in the tarballs.
  • Install HBase on your laptop. Download and untar:

        wget http://apache.osuosl.org/hbase/hbase-0.94.5/hbase-0.94.5.tar.gz
        tar xvfz hbase-0.94.5.tar.gz
        cd hbase-0.94.5
        bin/start-hbase.sh

    Verify:

        bin/hbase shell

    Browse http://localhost:60010