Michael Stack: The State of Apache HBase

BDTC 2013, Beijing, China

Presentation Transcript

  • The State of Apache HBase Michael Stack
  • stack@apache.org
    ● PMC* Chair of the Apache HBase project
      ○ Caretaker/Janitor
    ● Member of the Hadoop PMC
    ● Engineer at Cloudera in San Francisco
    (*PMC: Project Management Committee)
  • Table of Contents
    ● What is HBase?
    ● Who uses it?
    ● Who runs the project?
    ● HBase Today
    ● Tomorrow
    ● Ecosystem
  • HBase is... “...an open-source, distributed, scalable, consistent, low-latency, non-relational, random-access database”
  • Built on Apache Hadoop
    ● Hadoop core is:
      ○ Distributed file system (HDFS)
      ○ MapReduce
    ● HBase persists all data to HDFS
    ● Uses Apache ZooKeeper
      ○ Cluster coordination
  • Project Goal: “Billions of rows × millions of columns on clusters of 'commodity hardware'” (photo: http://www.flickr.com/photos/ag_gilmore/8170021483/in/photostream/)
  • Inspiration: a Google technology described in a 2006 paper, “Bigtable: A Distributed Storage System for Structured Data” by Chang et al.
  • First commit...
    commit 454a9dbe046194f8eef3dddc3e5942910dd5b7a1
    Author: Douglass Cutting <cutting@apache.org>
    Date: Tue Apr 3 20:34:28 2007 +0000
    HADOOP-1045. Add contrib/hbase, a BigTable-like online database.
  • When to use it?
  • BIG data
  • Scale!
  • Low-latency, online, random reads/writes + “simple” access patterns
  • Data model* (*Like the Google Bigtable model, only with different nomenclature)
  • Data model: A Bigtable!
    ● 0-N Bigtable(s)
    ● A Bigtable has:
      ○ Rows × Column Families
      ○ Rows have a primary key (the row key)
    ● Column Families have:
      ○ Any number of columns
      ○ Grouped by access pattern/attributes
    ● A column is a CF prefix plus a qualifier, e.g. attribute:mimetype
    ● [Figure: Bigtable 'A' with row keys a through x and Column Families A and B; one cell is marked at row key 'p', column 'B:red']
  • Data model: Regions
    ● A Bigtable splits into “regions”
      ○ Automatically, as the table grows
    ● A region holds contiguous rows
      ○ Known by [startRow, endRow)
    ● Regions are distributed over the cluster
      ○ 0 to 100s per server
    ● [Figure: row keys a through o divided into Region a-e, Region e-j, Region k-o, etc.]
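Because each region covers a half-open range [startRow, endRow), finding the region that owns a row key is a floor lookup over the sorted region start keys. A minimal sketch, assuming String row keys and hypothetical region names (real HBase keys are byte[], and clients find regions through the hbase:meta table, which is not modeled here):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical region directory: region start row -> region name.
// Region [startRow, endRow) ends where the next region starts.
class RegionLocator {
    private final TreeMap<String, String> regionsByStartRow = new TreeMap<>();

    void addRegion(String startRow, String regionName) {
        regionsByStartRow.put(startRow, regionName);
    }

    // The owner is the region with the greatest start row <= rowKey.
    String locate(String rowKey) {
        Map.Entry<String, String> e = regionsByStartRow.floorEntry(rowKey);
        if (e == null) {
            throw new IllegalStateException("no region covers row " + rowKey);
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        RegionLocator locator = new RegionLocator();
        locator.addRegion("", "region-1");   // the first region starts at the empty key
        locator.addRegion("f", "region-2");
        locator.addRegion("k", "region-3");
        locator.addRegion("p", "region-4");
        System.out.println(locator.locate("c"));  // region-1
        System.out.println(locator.locate("k"));  // region-3 (start row is inclusive)
        System.out.println(locator.locate("j"));  // region-2 (end row is exclusive)
    }
}
```

Splitting a region in this model is just inserting a new start key; existing lookups keep working because ownership is always derived from the sorted boundaries.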
  • Data model: Sorted & Versioned
    ● All is byte[]
      ○ No native 'types'
      ○ Minor schema or schema-less (NoSQL)
    ● All is SORTED
      ○ Rows in byte-lexicographical order
      ○ Columns sorted along the row
    ● VERSIONED
      ○ Cells are “versioned”
      ○ 3D (timestamp)
    ● [Figure: Region a-e drawn in three dimensions, with each cell's versions stacked along the timestamp axis]
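Since everything is byte[], "sorted" means unsigned byte-by-byte comparison, which has two practical consequences: the row "10" sorts before "9", and a byte 0xFF sorts after 0x01 even though Java's signed byte type reads it as -1. A minimal JDK-only comparator sketch (requires Java 9+ for Arrays.compareUnsigned; HBase has its own byte utilities, which this does not reproduce):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeMap;

class ByteOrder {
    // Unsigned lexicographic order: bytes compared as 0..255, and a prefix
    // sorts before any longer key that extends it.
    static final Comparator<byte[]> UNSIGNED_LEX = Arrays::compareUnsigned;

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> rows = new TreeMap<>(UNSIGNED_LEX);
        rows.put(b("9"), "nine");
        rows.put(b("10"), "ten");
        rows.put(b("100"), "hundred");
        // Iteration order is ten, hundred, nine: '1' (0x31) < '9' (0x39),
        // and "10" is a prefix of "100".
        rows.values().forEach(System.out::println);

        byte[] low = {0x01};
        byte[] high = {(byte) 0xFF};   // -1 as a signed Java byte
        System.out.println(UNSIGNED_LEX.compare(high, low) > 0);  // true
    }
}
```

This is why numeric row keys are usually zero-padded or stored as fixed-width big-endian bytes when numeric order matters.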
  • Data model: Strongly consistent
    ● Favors consistency over availability
      ○ “Designing applications to cope with concurrency anomalies in their data is very error-prone, time-consuming, and ultimately not worth the performance gains” (F1: A Distributed SQL Database That Scales)
    ● Row modifications are atomic
      ○ Even if there are thousands of columns on a row
  • Data model in short: “...a sparse, distributed, persistent multidimensional sorted map” (Bigtable paper, 2006)
    ● (Table, Row, ColumnFamily, Qualifier, Timestamp) → Value
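Row-level atomicity means a reader never sees a half-applied multi-column update. A minimal single-process sketch, assuming one lock per row and a hypothetical AtomicRowStore class (HBase's RegionServer also involves the WAL and MVCC, which are not modeled): every column of a mutation is applied under the row's lock, and reads take the same lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class AtomicRowStore {
    private final Map<String, TreeMap<String, String>> rows = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String row) {
        return rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
    }

    // All columns of the mutation become visible together, or not at all.
    void mutateRow(String row, Map<String, String> columns) {
        ReentrantLock lock = lockFor(row);
        lock.lock();
        try {
            rows.computeIfAbsent(row, r -> new TreeMap<>()).putAll(columns);
        } finally {
            lock.unlock();
        }
    }

    // A consistent copy of the row, taken under the same lock.
    Map<String, String> getRow(String row) {
        ReentrantLock lock = lockFor(row);
        lock.lock();
        try {
            TreeMap<String, String> r = rows.get(row);
            return r == null ? new TreeMap<>() : new TreeMap<>(r);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicRowStore store = new AtomicRowStore();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                Map<String, String> m = new HashMap<>();
                m.put("cf:a", Integer.toString(i));
                m.put("cf:b", Integer.toString(i));
                store.mutateRow("row1", m);
            }
        });
        writer.start();
        // Every read must see cf:a and cf:b from the same mutation.
        for (int i = 0; i < 10_000; i++) {
            Map<String, String> r = store.getRow("row1");
            if (!r.isEmpty() && !r.get("cf:a").equals(r.get("cf:b"))) {
                throw new AssertionError("torn row: " + r);
            }
        }
        writer.join();
        System.out.println(store.getRow("row1"));  // {cf:a=9999, cf:b=9999}
    }
}
```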
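The (Table, Row, ColumnFamily, Qualifier, Timestamp) → Value view maps directly onto nested sorted maps. A minimal in-memory sketch of one table, assuming String keys for readability (real HBase uses byte[] throughout) and a hypothetical SortedMapTable class: timestamps are kept newest-first so a plain get returns the latest version, and rows are sparse because an unwritten column simply has no entry. Requires Java 10+ for `var`.

```java
import java.util.Comparator;
import java.util.TreeMap;

// row -> family -> qualifier -> timestamp (newest first) -> value
class SortedMapTable {
    private final TreeMap<String, TreeMap<String, TreeMap<String, TreeMap<Long, String>>>> rows =
            new TreeMap<>();

    void put(String row, String family, String qualifier, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(ts, value);
    }

    // Latest version of the cell, or null if it was never written.
    String get(String row, String family, String qualifier) {
        TreeMap<Long, String> versions = versions(row, family, qualifier);
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Newest version with timestamp <= asOf, or null.
    String getAsOf(String row, String family, String qualifier, long asOf) {
        // In descending order, ceilingEntry(asOf) is the largest timestamp <= asOf.
        var e = versions(row, family, qualifier).ceilingEntry(asOf);
        return e == null ? null : e.getValue();
    }

    private TreeMap<Long, String> versions(String row, String family, String qualifier) {
        var families = rows.get(row);
        if (families == null) return new TreeMap<>();
        var qualifiers = families.get(family);
        if (qualifiers == null) return new TreeMap<>();
        var versions = qualifiers.get(qualifier);
        if (versions == null) return new TreeMap<>();
        return versions;
    }

    public static void main(String[] args) {
        SortedMapTable t = new SortedMapTable();
        t.put("p", "B", "red", 100L, "v1");
        t.put("p", "B", "red", 200L, "v2");
        System.out.println(t.get("p", "B", "red"));            // v2
        System.out.println(t.getAsOf("p", "B", "red", 150L));  // v1
        System.out.println(t.get("p", "B", "blue"));           // null: sparse, never written
    }
}
```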
  • Architecture: Bird's-eye view [Figure: Application; MapReduce, Impala, Thrift/REST Gateway; HBase Java Client; ZooKeeper, HBase Master, HBase RegionServer; HDFS]
  • Features
    ● Classes to MapReduce over HBase tables
      ○ Hive, Pig, etc.
    ● Query predicate push-down via server-side filters
    ● Coprocessors (stored procedures/triggers)
      ○ e.g. security, secondary indices
    ● Java clients
      ○ REST and Thrift too
    ● Extensible JRuby-based (JIRB) shell
    ● Replication
    ● Security
      ○ Table/Column Family
      ○ Kerberos authentication, ACLs
  • API: get, put, delete, multi, scan, increment, append, checkAnd*, MapReduce
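The slide names the client operations without showing their semantics. To make them concrete without a cluster, here is a hypothetical in-memory stand-in (MiniTable is not the HBase client API; real calls take Get/Put/Delete/Scan objects and byte[] keys, and increments store 8-byte longs rather than decimal strings) covering get, put, delete, a bounded scan over [startRow, stopRow), increment, and checkAndPut, the compare-and-set member of the checkAnd* family.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical one-value-per-row store; each method is atomic via synchronized.
class MiniTable {
    private final TreeMap<String, String> rows = new TreeMap<>();

    synchronized String get(String row) { return rows.get(row); }

    synchronized void put(String row, String value) { rows.put(row, value); }

    synchronized void delete(String row) { rows.remove(row); }

    // Row keys in [startRow, stopRow), in sorted order.
    synchronized List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    // Read-modify-write done server-side in one step (no client round trip).
    synchronized long increment(String row, long delta) {
        long next = Long.parseLong(rows.getOrDefault(row, "0")) + delta;
        rows.put(row, Long.toString(next));
        return next;
    }

    // Write only if the current value equals expected (null = row absent).
    synchronized boolean checkAndPut(String row, String expected, String value) {
        if (!Objects.equals(rows.get(row), expected)) {
            return false;
        }
        rows.put(row, value);
        return true;
    }

    public static void main(String[] args) {
        MiniTable t = new MiniTable();
        t.put("a", "1");
        t.put("b", "2");
        t.put("c", "3");
        System.out.println(t.scan("a", "c"));              // [a, b]
        System.out.println(t.increment("hits", 5));        // 5
        System.out.println(t.checkAndPut("a", "1", "10")); // true
        System.out.println(t.checkAndPut("a", "1", "11")); // false: value is now "10"
        t.delete("b");
        System.out.println(t.get("b"));                    // null
    }
}
```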
  • What to expect
    ● Writes
      ○ 1-3 ms; 1k-20k writes/sec per node
    ● Reads
      ○ 0-3 ms cached, 10-30 ms from disk
      ○ 10-40k reads/sec per node from cache
      ○ Higher with SSD
    ● Cell size
      ○ 0-3 MB preferred
    ● Column-oriented, so wide tables are OK
    ● Sparsely populated rows are OK
  • Who uses it?
  • In Production
  • Facebook
    ● OLTP & Batch
    ● Messages
      ○ 1B+ users
      ○ Tens of PBs (compressed)
      ○ Thousands of machines, pods of ~200
    ● ODS/Real-time monitoring/Timeseries
      ○ Metrics from every server @ FB
      ○ 2.5B writes/16k reads per minute
    ● Post Search Store
      ○ MapReduce to build index
      ○ 1 trillion posts
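For a sense of scale, the ODS per-minute figures on this slide convert to average per-second rates as follows (plain arithmetic on the quoted numbers; peaks will be higher):

```latex
\frac{2.5\times10^{9}\ \text{writes}}{60\ \text{s}} \approx 4.17\times10^{7}\ \text{writes/s},
\qquad
\frac{1.6\times10^{4}\ \text{reads}}{60\ \text{s}} \approx 267\ \text{reads/s}
```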
  • All on AWS; 5 production clusters and growing; mix of SSD and SATA; billions of page views per month
  • Long-time HBase user
    ● Two clusters of 1k nodes each
      ○ Master-master replicating
    ● Separate low-latency cluster
      ○ Up to 1M reads a second
  • Cassini
    ● eBay item search indexing
    ● 600M active items in HBase tables
    ● 1.4 TB of data processed each day
    ● 400M puts to HBase each day
    ● 250M search metrics per day
    ● Two datacenters
    ● Growing clusters...
      ○ 500 → 1k
  • Deploy types
    ● Multitenant multifarious feature store
      ○ a.k.a. dumping ground
      ○ StumbleUpon, Y!, Salesforce
    ● Reconciliation store
      ○ eBay
    ● Timeseries
      ○ Salesforce, FB ODS
    ● Lots-o-entities store
      ○ Flurry, genome
      ○ Lots-o-entities BLOBs: FB Messages
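Spread evenly over a day (86,400 s), the Cassini figures work out to the following average rates, taking 1 TB as 10^12 bytes:

```latex
\frac{4\times10^{8}\ \text{puts/day}}{86\,400\ \text{s/day}} \approx 4.6\times10^{3}\ \text{puts/s},
\qquad
\frac{1.4\times10^{12}\ \text{bytes/day}}{86\,400\ \text{s/day}} \approx 16\ \text{MB/s}
```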
  • Who runs the project?
  • Diverse team*: COMMITTERS! Preferably ALIVE! (*http://hbase.apache.org/team-list.html)
  • Dev Rate
  • # of commits (https://www.ohloh.net/p/hbase)
    ● Total files: 2021
    ● Total lines of code: 832122
    ● Total commits: 6615 (~3/day)
    ● Authors: 39
  • JIRA: 2008-2013
  • Commits/Month Over Time (0.94/trunk)
  • HBase Today
  • Release every month
    ● Each more stable
    ● & more performant
    ● Some features…
    ● Currently at 0.94.13
    ● Wire compatible between releases
  • http://www.flickr.com/photos/sysli/3026288256/sizes/o/in/photostream/
  • hbase-0.96.0
    ● Released October 19th, 2013
    ● 18 months in the making
    ● >2000 fixes
  • Big Themes
    ● Stability
    ● Operability
      ○ Insight, tools
    ● Scalability
    ● Evolvability
  • Sampler
    ● Pluggable Compaction
      ○ Smarter triggers
    ● Hadoop1 AND Hadoop2
    ● Smarter Region Balancer
    ● Region Assignment & Replication
    ● Hardened Coprocessors
      ○ More hooks
  • http://www.flickr.com/photos/allspaw/5815258929/sizes/o/in/photostream/
  • http://www.flickr.com/photos/38595542@N02/3690830720/sizes/o/in/photostream/
  • System tables ● Filesystem ● Up in ZooKeeper ● Over the wire
  • Snapshots
    ● By table
      ○ Snapshot, clone, restore, export
    ● Inexpensive
      ○ Just metadata
    ● Good for...
      ○ Backups
      ○ Replication
      ○ Offline processing
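Snapshots are cheap because they record which immutable store files make up a table rather than copying data. A minimal sketch of that idea, with a hypothetical SnapshotDemo class and made-up file names (HBase's real snapshot manifests, reference files and file archiving are not modeled; a real system must also keep files alive while any snapshot references them): a snapshot copies a list of file names, a clone is a new table sharing those files, and a restore swaps the file list back.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SnapshotDemo {
    // table name -> names of the immutable files holding its data
    final Map<String, List<String>> tables = new HashMap<>();
    // snapshot name -> file list captured at snapshot time (metadata only)
    final Map<String, List<String>> snapshots = new HashMap<>();

    void snapshot(String table, String snapshotName) {
        snapshots.put(snapshotName, List.copyOf(tables.get(table)));
    }

    // A new table referencing the same files; no data is copied.
    void cloneSnapshot(String snapshotName, String newTable) {
        tables.put(newTable, new ArrayList<>(snapshots.get(snapshotName)));
    }

    // Roll an existing table back to the captured file list.
    void restoreSnapshot(String snapshotName, String table) {
        tables.put(table, new ArrayList<>(snapshots.get(snapshotName)));
    }

    public static void main(String[] args) {
        SnapshotDemo fs = new SnapshotDemo();
        fs.tables.put("t1", new ArrayList<>(List.of("hfile-1", "hfile-2")));
        fs.snapshot("t1", "snap1");
        fs.tables.get("t1").add("hfile-3");            // data written after the snapshot
        fs.cloneSnapshot("snap1", "t1_clone");
        System.out.println(fs.tables.get("t1_clone"));  // [hfile-1, hfile-2]
        fs.restoreSnapshot("snap1", "t1");
        System.out.println(fs.tables.get("t1"));        // [hfile-1, hfile-2]
    }
}
```

Because only file names are copied, snapshot cost is proportional to the number of files, not the number of bytes, which is what makes the backup and offline-processing uses above practical.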
  • Namespaces
    ● Grouping of tables
      ○ Like a database in MySQL
    ● System/User
      ○ hbase:meta
    ● Quota
    ● Coming
      ○ Security by namespace
      ○ Grouping on cluster by namespace
  • And more...
    ● Cross-row (in-region) transactions
    ● Query tracing
    ● New UI
    ● Online region merge
    ● Client-side types
    ● Metrics2
      ○ Radical revamp
    ● Windows!
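With namespaces, a table name may be qualified as namespace:table; unqualified names fall into the `default` namespace, and system tables such as hbase:meta live in the `hbase` namespace. A minimal parser sketch of that naming rule, using a hypothetical QualifiedName class (the real client's TableName class also validates allowed characters, which is skipped here):

```java
// Hypothetical parsed table name: "ns:table" or plain "table".
final class QualifiedName {
    static final String DEFAULT_NAMESPACE = "default";
    static final String SYSTEM_NAMESPACE = "hbase";

    final String namespace;
    final String table;

    private QualifiedName(String namespace, String table) {
        this.namespace = namespace;
        this.table = table;
    }

    // "ns:table" -> (ns, table); "table" -> (default, table)
    static QualifiedName parse(String name) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return new QualifiedName(DEFAULT_NAMESPACE, name);
        }
        // Reject an empty namespace, an empty table, or more than one ':'.
        if (colon == 0 || colon == name.length() - 1 || name.indexOf(':', colon + 1) >= 0) {
            throw new IllegalArgumentException("bad table name: " + name);
        }
        return new QualifiedName(name.substring(0, colon), name.substring(colon + 1));
    }

    boolean isSystemTable() {
        return SYSTEM_NAMESPACE.equals(namespace);
    }

    @Override
    public String toString() {
        return namespace + ":" + table;
    }

    public static void main(String[] args) {
        System.out.println(parse("hbase:meta").isSystemTable());  // true
        System.out.println(parse("users"));                       // default:users
        System.out.println(parse("sales:orders").namespace);      // sales
    }
}
```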
  • Branched, released soon
    ● Rolling upgrade from 0.96.0
    ● In-line cell tags
    ● Security++
      ○ ACLs down to the cell level
      ○ Cell-level visibility labels
      ○ Encryption
    ● Reverse scan
  • HBase 2014: HBase 1.0.0
    ● Reining in the 99th percentiles
    ● Multi-WAL
    ● Speculative replica reads
    ● More support for multi-tenancy
    ● Off-heap
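Cell-level visibility labels attach a boolean expression over labels to a cell, such as (secret|topsecret)&!probationary, and the cell is returned only if the reader's authorizations satisfy it. A minimal recursive-descent evaluator sketch for that &, |, ! and parentheses syntax; the VisibilityExpression class and the precedence chosen here (! binds tightest, then &, then |) are assumptions of this sketch, not a description of HBase's own parser.

```java
import java.util.Set;

// Grammar assumed by this sketch:
//   or   := and ('|' and)*
//   and  := not ('&' not)*
//   not  := '!' not | atom
//   atom := '(' or ')' | label
class VisibilityExpression {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilityExpression(String expr, Set<String> auths) {
        this.expr = expr.replace(" ", "");
        this.auths = auths;
    }

    static boolean isVisible(String expression, Set<String> auths) {
        VisibilityExpression p = new VisibilityExpression(expression, auths);
        boolean result = p.or();
        if (p.pos != p.expr.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos + ": " + expression);
        }
        return result;
    }

    // Both sides are always parsed, so evaluation never skips input.
    private boolean or() {
        boolean v = and();
        while (peek('|')) { pos++; boolean rhs = and(); v = v || rhs; }
        return v;
    }

    private boolean and() {
        boolean v = not();
        while (peek('&')) { pos++; boolean rhs = not(); v = v && rhs; }
        return v;
    }

    private boolean not() {
        if (peek('!')) { pos++; return !not(); }
        return atom();
    }

    private boolean atom() {
        if (peek('(')) {
            pos++;
            boolean v = or();
            if (!peek(')')) throw new IllegalArgumentException("missing ')' in " + expr);
            pos++;
            return v;
        }
        int start = pos;
        while (pos < expr.length() && "()&|!".indexOf(expr.charAt(pos)) < 0) pos++;
        if (start == pos) throw new IllegalArgumentException("expected label at " + pos + " in " + expr);
        return auths.contains(expr.substring(start, pos));   // a label is true if the reader holds it
    }

    private boolean peek(char c) {
        return pos < expr.length() && expr.charAt(pos) == c;
    }

    public static void main(String[] args) {
        String cellLabel = "(secret|topsecret)&!probationary";
        System.out.println(isVisible(cellLabel, Set.of("secret")));                  // true
        System.out.println(isVisible(cellLabel, Set.of("secret", "probationary")));  // false
        System.out.println(isVisible(cellLabel, Set.of("public")));                  // false
    }
}
```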
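Speculative (hedged) replica reads send a read to the primary and, only if no answer has arrived after a short delay, also to a replica, then take whichever answers first; this trims the slow tail that the 99th-percentile goal targets. A minimal sketch with CompletableFuture (Java 9+ for delayedExecutor), assuming hypothetical primary/replica suppliers and caller-chosen delay; HBase region replicas, and the possibly stale data a replica may return, are not modeled.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class HedgedRead {
    // Ask the primary now; if nothing has settled after hedgeDelayMs, also ask the
    // replica. The first outcome wins (a real client would prefer a late success
    // over an early failure, and would cancel the losing request).
    static <T> T read(Supplier<T> primary, Supplier<T> replica, long hedgeDelayMs) {
        CompletableFuture<T> result = new CompletableFuture<>();
        CompletableFuture.supplyAsync(primary).whenComplete((v, ex) -> settle(result, v, ex));
        CompletableFuture.runAsync(() -> {
            if (!result.isDone()) {   // primary still outstanding: hedge to the replica
                CompletableFuture.supplyAsync(replica).whenComplete((v, ex) -> settle(result, v, ex));
            }
        }, CompletableFuture.delayedExecutor(hedgeDelayMs, TimeUnit.MILLISECONDS));
        return result.join();
    }

    private static <T> void settle(CompletableFuture<T> result, T value, Throwable ex) {
        if (ex == null) {
            result.complete(value);   // no-op if already completed
        } else {
            result.completeExceptionally(ex);
        }
    }

    // Simulated slow server for the demo.
    static String slow(String value, long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return value;
    }

    public static void main(String[] args) {
        // Primary stalls for 500 ms, so after 50 ms the replica is asked and answers first.
        System.out.println(read(() -> slow("primary", 500), () -> "replica", 50));
        // Primary answers before the 200 ms hedge delay, so no replica read is sent.
        System.out.println(read(() -> "primary", () -> "replica", 200));
    }
}
```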
  • Ecosystem
  • OpenTSDB: Timeseries
    ● Store, index and serve metrics at large scale
    ● Make data easily accessible and graphable
  • Haeinsa
    ● What is Haeinsa? A linearly scalable multi-row, multi-table transaction library for HBase.
    ● Haeinsa uses two-phase locking and optimistic concurrency control to implement transactions.
    ● The isolation level of transactions is serializable.
    ● Inspired by Google Percolator
    ● VCNC
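The two techniques named here combine naturally: a transaction reads without locking while recording the version of each row it saw, then at commit locks every touched row in sorted order (the growing phase of two-phase locking; a fixed order avoids deadlock), checks that no recorded version changed, applies its buffered writes, and releases the locks. A generic single-process sketch with hypothetical OccStore/Txn classes; it illustrates the techniques only and is not Haeinsa's actual protocol or API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical store: each row holds a value and a version bumped on every commit.
class OccStore {
    static final class Versioned {
        final String value;
        final long version;
        Versioned(String value, long version) { this.value = value; this.version = version; }
    }

    final Map<String, Versioned> rows = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    Txn begin() { return new Txn(); }

    private long versionOf(String row) {
        Versioned v = rows.get(row);
        return v == null ? 0L : v.version;
    }

    class Txn {
        private final Map<String, Long> readVersions = new HashMap<>();
        private final Map<String, String> writes = new TreeMap<>();

        // Optimistic read: no lock, just remember the version seen.
        String read(String row) {
            Versioned v = rows.get(row);
            readVersions.put(row, v == null ? 0L : v.version);
            return v == null ? null : v.value;
        }

        // Writes are buffered until commit.
        void write(String row, String value) { writes.put(row, value); }

        boolean commit() {
            TreeSet<String> keys = new TreeSet<>(readVersions.keySet());
            keys.addAll(writes.keySet());
            List<ReentrantLock> held = new ArrayList<>();
            try {
                for (String k : keys) {   // growing phase, in sorted key order
                    ReentrantLock lock = locks.computeIfAbsent(k, x -> new ReentrantLock());
                    lock.lock();
                    held.add(lock);
                }
                for (Map.Entry<String, Long> e : readVersions.entrySet()) {   // validate
                    if (versionOf(e.getKey()) != e.getValue()) {
                        return false;   // another transaction committed a row we read
                    }
                }
                for (Map.Entry<String, String> e : writes.entrySet()) {       // apply
                    rows.put(e.getKey(), new Versioned(e.getValue(), versionOf(e.getKey()) + 1));
                }
                return true;
            } finally {
                for (ReentrantLock lock : held) {   // shrinking phase
                    lock.unlock();
                }
            }
        }
    }

    public static void main(String[] args) {
        OccStore store = new OccStore();
        Txn setup = store.begin();
        setup.write("alice", "100");
        setup.write("bob", "0");
        System.out.println(setup.commit());   // true

        Txn t1 = store.begin();
        Txn t2 = store.begin();
        int a1 = Integer.parseInt(t1.read("alice"));
        int a2 = Integer.parseInt(t2.read("alice"));
        t1.write("alice", Integer.toString(a1 - 30));
        t1.write("bob", "30");
        t2.write("alice", Integer.toString(a2 - 50));
        System.out.println(t1.commit());   // true
        System.out.println(t2.commit());   // false: alice changed after t2 read it
        System.out.println(store.rows.get("alice").value);   // 70
    }
}
```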
  • Chasm
  • How to make it easier writing applications against HBase?
  • Frameworks: Kiji.org
    ● Entity-centric, simple model
      ○ Types: complex, compound types
    ● Each cell is schema-versioned
    ● Works across MR & REST, etc.
    ● Machine-learning libs
    ● Examples, tutorials
    ● Production users
    ● Open source
  • Frameworks: CDK
    ● APIs providing a Dataset abstraction
      ○ get/put/delete API on Avro objects
    ● Highlights:
      ○ Supports multiple components: Flume, Morphlines, Hive, Crunch, HCatalog
      ○ Types using Avro and Parquet formats
      ○ Manages schema evolution
    ● Open source by Cloudera
      ○ http://cloudera.github.io/cdk/docs/current
  • Phoenix
    ● Client-embedded JDBC driver
      ○ Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
    ● Alternate HBase client API (SQL)
    ● Fast!
      ○ Exploits HBase coprocessors/filters
      ○ Types
      ○ Aggregations
      ○ Skip scans
      ○ Secondary indices
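Skip scans, one of the speed-ups listed, pay off when a query constrains the leading part of a composite row key to a few values: instead of reading every key in the range, the scan seeks directly to each wanted prefix and jumps ahead as soon as it leaves it. A minimal sketch over an in-memory sorted key set, assuming hypothetical keys of the form dc|host (with no '|' inside the leading part); it shows the idea only and is not Phoenix's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class SkipScan {
    // Keys whose leading component (before '|') is in wanted, found by seeking
    // to each wanted prefix rather than walking every key.
    static List<String> skipScan(NavigableSet<String> keys, List<String> wanted) {
        List<String> out = new ArrayList<>();
        for (String value : new TreeSet<>(wanted)) {        // visit prefixes in key order
            String prefix = value + "|";
            for (String k : keys.tailSet(prefix, true)) {   // seek to first key >= prefix
                if (!k.startsWith(prefix)) {
                    break;                                   // left this prefix: skip ahead
                }
                out.add(k);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<>(List.of(
                "ams|h1", "ams|h2", "nyc|h1", "nyc|h2", "sfo|h1", "tok|h1"));
        // Equivalent of WHERE dc IN ('tok', 'nyc'): the ams rows are never visited.
        System.out.println(skipScan(keys, List.of("tok", "nyc")));  // [nyc|h1, nyc|h2, tok|h1]
    }
}
```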
  • + Datastores + etc
  • End Thank You! stack@apache.org
  • TODO
    ● DBA: R (read), W (write), C (create), X (execute), A (admin).
    ● Cell-level security: every cell in an Accumulo store can have a label, stored effectively as part of the key, which is used to determine whether a value is visible to a given subject. The label is not an ACL; it is a different way of expressing security policy. Rather than listing who may access the data, a label turns this on its head and describes the sensitivity of the information to a decision engine, which then works out whether the subject is authorized to view data of that sensitivity based on (potentially many) factors.
    ● As of HBASE-7662, HBase can store ACLs in cell tags and apply them, extending the current HBase ACL model down to the cell.
    ● We have also contributed transparent server-side encryption (HBASE-7544) for additional assurance against accidental leakage of data at rest; at this time it is an HBase-only feature.
    ● Auto-manages partitioning
    ● Storage machinery in the RS
    ● I like the Latency/Throughput/Read/Write axis in Nick