Hadoop World 2011: Apache HBase Road Map - Jonathan Gray - Facebook

This technical session will provide a quick review of the Apache HBase project, looking at it from the past to the future. It will cover the imminent HBase 0.92 release as well as what is slated for 0.94 and beyond. A number of companies and use cases will be used as examples to describe the overall direction of the HBase community and project.

  1. Apache HBase Road Map
     A short history of nearly everything HBase. Past, Present, and Future.
     Jonathan Gray
     November , Hadoop World NYC
  2. Agenda
     Past (<= . )
     Present (== . )
     Future (>= . )
  3. Apache HBase
     A Friendly Open Source Project
     Disclaimer: These are the personal opinions of Jonathan Gray and do not necessarily reflect the opinions of Facebook Inc., Apache HBase, the Apache HBase community, or any other person or organization. I also apologize in advance to any individuals or companies that were left out of slides or discussion. This was not done purposefully and I love you all.
  4. Apache HBase
     ▪ A dynamic and pragmatic community
       ▪ HBase committers scattered around many companies
       ▪ A culture of acceptance (contributions please!)
       ▪ Perhaps, occasionally, to a fault
       ▪ Many HBase committers have moved companies
     ▪ “Road Map” driven by sponsoring companies
       ▪ Bugs fixed and features developed decided by them
       ▪ HBase has no Enterprise Software Company behind it
  5. The Ghost of HBase Past
     Early days through . and .
  6. HBase History
     ▪ Started in as Bigtable clone for Hadoop
       ▪ First code released in as part of Hadoop .
     ▪ Six major releases (three versioning schemes)
       ▪ . . in March
       ▪ . . in August
       ▪ . . in September
       ▪ . . in January
       ▪ . . in September
       ▪ . . in January
  7. Random read/write access for offline processes
  8. HBase History
     ▪ Early users focused on offline, crawl data storage
       ▪ Powerset was primary user
       ▪ Others like WorldLingo, OpenPlaces
     ▪ Augmenting Offline MapReduce
       ▪ Needed random writes for web crawl storage
       ▪ Also use random writes to store links and images
       ▪ The road map was easy... Bigtable
  9. OLTP database for web startups
  10. Online HBase
     ▪ Next generation of HBasers wanted OLTP
       ▪ Streamy.com (my previous startup)
       ▪ StumbleUpon and others
     ▪ HBase Goes Realtime
       ▪ Gave this talk at Hadoop Summit w/ JD Cryans
       ▪ “HBase . ... First ever Performance Release”
     “As a random-access store, we are well suited for the storing and serving of Web applications, but high latency and variability (100s of ms to seconds) has reduced the usefulness of HBase and required the use of external caching in the past”
  11. HBase 0.20
     ▪ Performance Release (aka the Unjavafy release)
       ▪ Rewrite of entire read and write paths
       ▪ Introduction of KeyValue and zero-copy reads
       ▪ New block-based HFile format and LRU block cache
       ▪ New client APIs: Put, Get, Scan, Delete, Result
     ▪ ZooKeeper Integration
       ▪ Remove all dependencies on master for reads/writes
       ▪ Leader election, fault detection, remove SPOF
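The LRU block cache mentioned on this slide can be illustrated with a small sketch. This is a toy model only, assuming blocks are keyed by a (file name, block offset) pair; the class name `ToyLruBlockCache` and its methods are hypothetical and are not HBase's implementation or API.

```python
from collections import OrderedDict


class ToyLruBlockCache:
    """Toy least-recently-used cache for file blocks (illustrative, not HBase code)."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of blocks held
        self._blocks = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        """Return the cached block, or None on a miss."""
        if key not in self._blocks:
            return None
        self._blocks.move_to_end(key)  # a hit makes the block most recently used
        return self._blocks[key]

    def put(self, key, block):
        """Cache a block, evicting the least recently used one when full."""
        self._blocks[key] = block
        self._blocks.move_to_end(key)
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # drop the oldest entry
```

With a capacity of two, reading one block and then caching a third evicts whichever block was touched least recently, so hot blocks stay in memory while cold ones fall out.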
  12. A highly available, scalable database for tech companies
  13. HBase 0.90
     ▪ Durability, Stability, Availability Release
       ▪ “Production Ready HBase”
       ▪ Zero data loss
       ▪ Rewrite of Master and ZooKeeper interactions
       ▪ Testing, debugging, monitoring improvements
       ▪ Random read and large row improvements
       ▪ Lots of awesome new features
  14. HBase 0.90: Production Ready
     ▪ Zero data loss
       ▪ HDFS Appends, HLog fixes, gremlin testing
     ▪ Master rewrite
       ▪ Remove from read/write path + failover, no SPOF
     ▪ Operational improvements
       ▪ HBCK (fsck for HBase), HFile/HLog command-line tools
       ▪ Rolling restarts for minor upgrades
       ▪ New testing framework and k new lines of tests
  15. HBase 0.90: New Features
     ▪ Cluster-to-cluster replication
     ▪ Read performance
       ▪ Bloom filters rewrite
       ▪ Efficient intra-row seeking for large row support
     ▪ Other stuff
       ▪ Mavenized
       ▪ Stargate REST server and AVRO server
       ▪ Shell improvements and EC scripts
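To show why Bloom filters help read performance, here is a minimal sketch of the general technique: each file carries a small bit set that can say "definitely not here", so a read can skip files that cannot contain the requested row. The names `ToyBloomFilter` and `files_to_read`, the SHA-256 based hashing, and the sizes are assumptions made for illustration, not HBase's actual design.

```python
import hashlib


class ToyBloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key was never added; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def files_to_read(row_key, files):
    """Return names of files whose filter cannot rule out row_key.

    files is a list of (file name, ToyBloomFilter) pairs.
    """
    return [name for name, bloom in files if bloom.might_contain(row_key)]
```

A file whose filter returns False for a row key is guaranteed not to hold that row, so the read path never has to open it; a True answer still requires checking the file itself.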
  16. HBase Today
  17. A large scale production-capable database system
  18. HBase 0.92
     ▪ Stability and feature release
       ▪ Lots of usability and stability improvements
       ▪ Coprocessors and security
       ▪ Multi-Master cluster replication
     ▪ . . RC sometime in November
       ▪ blockers and criticals as of this morning
       ▪ FB already deploying a -based branch in dev
  19. HBase 0.92: Big new features
     ▪ Coprocessors
       ▪ Triggers and Stored Procedures
       ▪ Pre/Post hooks to all client calls and server operations
       ▪ Dynamically add new RPC calls
       ▪ ACL security atop Coprocessors
     ▪ HFile v
       ▪ Support for very large regions / files
       ▪ Multi-level block index and inline blooms
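The pre/post hook idea behind coprocessors, and how an ACL check can sit on top of it, can be sketched as follows. This is a conceptual model only: `Observer`, `AccessControlObserver`, `AuditObserver`, and `ToyRegion` are hypothetical names, and HBase's real coprocessor interfaces are Java classes that are not reproduced here.

```python
class Observer:
    """Base observer: both hooks default to doing nothing."""

    def pre_put(self, user, row, value):
        pass

    def post_put(self, user, row, value):
        pass


class AccessControlObserver(Observer):
    """Rejects writes from users outside an allow list (ACL as a pre hook)."""

    def __init__(self, writers):
        self.writers = set(writers)

    def pre_put(self, user, row, value):
        if user not in self.writers:
            raise PermissionError(f"{user} may not write {row}")


class AuditObserver(Observer):
    """Records every completed write (trigger-like post hook)."""

    def __init__(self):
        self.log = []

    def post_put(self, user, row, value):
        self.log.append((user, row))


class ToyRegion:
    """In-memory stand-in for a server-side store with observer hooks."""

    def __init__(self, observers):
        self.rows = {}
        self.observers = list(observers)

    def put(self, user, row, value):
        for observer in self.observers:
            observer.pre_put(user, row, value)   # may veto by raising
        self.rows[row] = value
        for observer in self.observers:
            observer.post_put(user, row, value)  # runs only after success
```

Because the pre hook runs before the write is applied, a rejected request leaves the store untouched and never reaches the post hooks.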
  20. HBase 0.92: Performance
     ▪ Performance improvements
       ▪ More seeking and early-out hints
       ▪ Distributed log splitting
       ▪ CacheOnWrite, EvictOnClose
     ▪ Compaction improvements
       ▪ Multi-threaded compactions
       ▪ Vastly improved file selection algorithm
       ▪ Lots of metrics and highly configurable
  21. HBase 0.92: Improvements
     ▪ Operational improvements
       ▪ HBCK improvements, Web UI improvements
       ▪ Slow query log, running tasks and thread status
       ▪ Online schema modifications
     ▪ Usability and API improvements
       ▪ Increment client API
       ▪ String-based Filter language
       ▪ Multi-family bulk load
       ▪ The HBase Books!
  22. HBase 0.92: Documentation!
     ▪ The (O’Reilly) HBase Book
       ▪ HBase: The Definitive Guide released in September
       ▪ Massive effort by committer Lars George
       ▪ Lots of input and feedback from the community
     ▪ The (Apache) HBase Book
       ▪ Apache HBase now has a DocBook-format book
       ▪ Every HBase release will ship with a versioned book
       ▪ From installation to schema design and architecture
       ▪ Latest version @ http://hbase.apache.org/book.html
  23. HBase of the Future
     . and beyond
  24. ? You!
     A usable, large scale production database system
  25. HBase 0.94
     ▪ Stability and usability is the core focus
       ▪ Increase stability by decreasing complexity
       ▪ More work on UI, tools, monitoring, operability
       ▪ Table/family-level metrics
     ▪ But features will always continue...
       ▪ Fast backups w/ point-in-time recovery
       ▪ Multi-Slave Replication
       ▪ Constraints and other Coprocessor-based contribs
  26. HBase 0.94: New Stuff
     ▪ Thrift .
       ▪ New Thrift API to more closely match Java API
       ▪ Embedded Thrift w/ RS short-circuit
     ▪ Other Goodies
       ▪ TTL + minVersions
       ▪ Point-in-time snapshot scanners
       ▪ Atomic Append operation
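The "TTL + minVersions" item combines two retention rules. A minimal sketch of the idea, assuming the semantics are "expire cell versions older than the TTL, but always keep the newest minVersions of them": the function `retained_versions` and its parameters are hypothetical, timestamps are plain integers, and any maximum-versions limit is ignored.

```python
def retained_versions(timestamps, now, ttl, min_versions):
    """Return the version timestamps that survive, newest first.

    A version is kept if it is still within the TTL, or if it is among the
    newest min_versions versions (so expiry never removes every version
    when min_versions > 0).
    """
    newest_first = sorted(timestamps, reverse=True)
    kept = []
    for index, ts in enumerate(newest_first):
        expired = now - ts > ttl
        if not expired or index < min_versions:
            kept.append(ts)
    return kept
```

For example, with versions written at times 100, 200, and 300, a TTL of 50, and the current time at 1000, every version has expired; a minimum of two versions still preserves the two newest, while a minimum of zero would drop them all.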
  27. HBase 0.94: Performance
     ▪ Scaling for throughput vs. latency
       ▪ Early-lock-release to decrease row contention
       ▪ Early-thread-release to increase throughput
       ▪ Remove all global wait()/notify() on HLog
     ▪ Improved seeking and file selection
       ▪ “Lazy-seek” in-order file processing
       ▪ DeleteFamily bloom filter
  28. HBase 0.94: Project Management
     ▪ Renewed focus on fast release cycle
       ▪ HBase . branch cut immediately after . release
       ▪ Already close to . feature freeze, . dev release?
       ▪ blockers and criticals left
     ▪ Apache HBase: A slightly less accepting project
       ▪ Stability is really code stability
       ▪ Push towards iterative feature dev and branch dev
       ▪ Coprocessors and Service Interfaces go a long way
  29. flying nanobots, jetpacks, cars
     Holographic storage renders HBase obsolete
  30. Beyond HBase 0.94
     ▪ Stability and usability is still the core focus
       ▪ More tests, testing frameworks, integration tests
     ▪ But features will always continue...
       ▪ RPC redux
       ▪ Dynamic configuration
       ▪ Request, IO, and locality based load balancing
       ▪ Multi-Tenancy (QoS, ACL)
       ▪ Tighter coordination with rest of stack (HDFS, Linux)
  31. Conclusion
     ▪ Apache HBase has come a long way
       ▪ Use case driven development
     ▪ HBase . coming very soon
       ▪ Most stable release to date
     ▪ Contributors and committers drive development
       ▪ Consumers can’t dictate the road map
       ▪ Individuals and organizations solve their problems
         (They have their own users... and jobs to keep)
  32. Check out the HBase at Facebook Page: facebook.com/UsingHbase
     Thanks! Questions?
