Hadoop World 2011: Apache HBase Road Map - Jonathan Gray - Facebook


This technical session will provide a quick review of the Apache HBase project, looking at it from the past to the future. It will cover the imminent HBase 0.92 release as well as what is slated for 0.94 and beyond. A number of companies and use cases will be used as examples to describe the overall direction of the HBase community and project.

Published in: Technology

  1. Apache HBase Road Map: A short history of nearly everything HBase. Past, Present, and Future. Jonathan Gray, November 2011, Hadoop World NYC
  2. Agenda: Past (<= 0.90), Present (== 0.92), Future (>= 0.94)
  3. Apache HBase: A Friendly Open Source Project. Disclaimer: These are the personal opinions of Jonathan Gray and do not necessarily reflect the opinions of Facebook Inc., Apache HBase, the Apache HBase community, or any other person or organization. I also apologize in advance to any individuals or companies that were left out of slides or discussion. This was not done purposefully and I love you all.
  4. Apache HBase ▪ A dynamic and pragmatic community ▪ HBase committers scattered around many companies ▪ A culture of acceptance (contributions please!) ▪ Perhaps, occasionally, to a fault ▪ Many HBase committers have moved companies ▪ “Road Map” driven by sponsoring companies ▪ Bugs fixed and features developed decided by them ▪ HBase has no Enterprise Software Company behind it
  5. The Ghost of HBase Past: Early days through 0.20 and 0.90
  6. HBase History ▪ Started in as Bigtable clone for Hadoop ▪ First code released in as part of Hadoop . ▪ Six major releases (three versioning schemes) ▪ . . in March ▪ . . in August ▪ . . in September ▪ . . in January ▪ . . in September ▪ . . in January
  7. Random read/write access for offline processes
  8. HBase History ▪ Early users focused on offline, crawl data storage ▪ Powerset was primary user ▪ Others like WorldLingo, OpenPlaces ▪ Augmenting Offline MapReduce ▪ Needed random writes for web crawl storage ▪ Also use random writes to store links and images ▪ The road map was easy... Bigtable
  9. OLTP database for web startups
  10. Online HBase ▪ Next generation of HBasers wanted OLTP ▪ Streamy.com (my previous startup) ▪ StumbleUpon and others ▪ HBase Goes Realtime ▪ Gave this talk at Hadoop Summit w/ JD Cryans ▪ “HBase 0.20 ... First ever Performance Release” ▪ “As a random-access store, we are well suited for the storing and serving of Web applications, but high latency and variability (100s of ms to seconds) has reduced the usefulness of HBase and required the use of external caching in the past”
  11. HBase 0.20 ▪ Performance Release (aka the Unjavafy release) ▪ Rewrite of entire read and write paths ▪ Introduction of KeyValue and zero-copy reads ▪ New block-based HFile format and LRU block cache ▪ New client APIs: Put, Get, Scan, Delete, Result ▪ ZooKeeper Integration ▪ Remove all dependencies on master for reads/writes ▪ Leader election, fault detection, remove SPOF
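
The LRU block cache named on this slide can be illustrated with a toy version. This is a minimal sketch, not HBase's implementation: the class name `LruBlockCache` is reused only as a label, the `(file name, block offset)` key and the byte-based capacity are assumptions made for the example, and the real cache is considerably more involved.

```python
from collections import OrderedDict


class LruBlockCache:
    """Toy least-recently-used cache for file blocks (illustrative only)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._blocks = OrderedDict()  # least recently used entry comes first

    def get(self, key):
        block = self._blocks.get(key)
        if block is not None:
            self._blocks.move_to_end(key)  # a hit makes the block most recent
        return block

    def put(self, key, block):
        if key in self._blocks:
            # Replacing a block: release the old copy's bytes first.
            self.used_bytes -= len(self._blocks.pop(key))
        self._blocks[key] = block
        self.used_bytes += len(block)
        # Evict from the cold end until the cache fits its budget again.
        while self.used_bytes > self.capacity_bytes and self._blocks:
            _, evicted = self._blocks.popitem(last=False)
            self.used_bytes -= len(evicted)


cache = LruBlockCache(capacity_bytes=8)
cache.put(("hfile-a", 0), b"aaaa")
cache.put(("hfile-a", 4), b"bbbb")
cache.get(("hfile-a", 0))           # touch the first block so it stays hot
cache.put(("hfile-b", 0), b"cccc")  # over budget: the untouched block is evicted
```

Here the block at offset 4 is dropped because it was the least recently used entry when the third block arrived.
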
  12. A highly available, scalable database for tech companies
  13. HBase 0.90 ▪ Durability, Stability, Availability Release ▪ “Production Ready HBase” ▪ Zero data loss ▪ Rewrite of Master and ZooKeeper interactions ▪ Testing, debugging, monitoring improvements ▪ Random read and large row improvements ▪ Lots of awesome new features
  14. HBase 0.90: Production Ready ▪ Zero data loss ▪ HDFS Appends, HLog fixes, gremlin testing ▪ Master rewrite ▪ Remove from read/write path + failover, no SPOF ▪ Operational improvements ▪ HBCK (fsck for HBase), HFile/HLog command-line tools ▪ Rolling restarts for minor upgrades ▪ New testing framework and k new lines of tests
  15. HBase 0.90: New Features ▪ Cluster-to-cluster replication ▪ Read performance ▪ Bloom filters rewrite ▪ Efficient intra-row seeking for large row support ▪ Other stuff ▪ Mavenized ▪ Stargate REST server and Avro server ▪ Shell improvements and EC2 scripts
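
For readers unfamiliar with why Bloom filters help read performance: a filter can answer "this key is definitely not in this file" without touching the file, at the cost of occasional false positives. The following is a generic sketch of the data structure, not HBase's implementation; the SHA-256 double-hashing scheme and the sizes are assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, tunable false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from two 64-bit halves of one digest
        # (double hashing); forcing h2 odd avoids a zero stride.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


empty = BloomFilter(num_bits=1024, num_hashes=4)
row_filter = BloomFilter(num_bits=1024, num_hashes=4)
for row_key in (b"row-001", b"row-002", b"row-003"):
    row_filter.add(row_key)
```

A reader would skip any file whose filter returns `False` for the requested key, and only open files where `might_contain` returns `True`.
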
  16. HBase Today
  17. A large scale production-capable database system
  18. HBase 0.92 ▪ Stability and feature release ▪ Lots of usability and stability improvements ▪ Coprocessors and security ▪ Multi-Master cluster replication ▪ 0.92.0 RC sometime in November ▪ blockers and criticals as of this morning ▪ FB already deploying a 0.92-based branch in dev
  19. HBase 0.92: Big new features ▪ Coprocessors ▪ Triggers and Stored Procedures ▪ Pre/Post hooks to all client calls and server operations ▪ Dynamically add new RPC calls ▪ ACL security atop Coprocessors ▪ HFile v2 ▪ Support for very large regions / files ▪ Multi-level block index and inline blooms
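
The pre/post hook idea behind Coprocessors can be sketched in a few lines. This is an illustration of the pattern only: `Observer`, `Region`, `RejectSystemRows`, and `AuditLog` are hypothetical names invented for this example and are not the HBase Coprocessor API, and the `sys:` prefix rule is an assumed stand-in for an ACL check.

```python
class Observer:
    """Hypothetical hook set; subclasses override only what they need."""

    def pre_put(self, row, value):
        return value  # returning None vetoes the write

    def post_put(self, row, value):
        pass


class RejectSystemRows(Observer):
    """Pre-hook acting like a trivial access check."""

    def pre_put(self, row, value):
        return None if row.startswith("sys:") else value


class AuditLog(Observer):
    """Post-hook acting like a trigger that records completed writes."""

    def __init__(self):
        self.entries = []

    def post_put(self, row, value):
        self.entries.append((row, value))


class Region:
    """Toy store that runs every observer around each put."""

    def __init__(self, observers=()):
        self.store = {}
        self.observers = list(observers)

    def put(self, row, value):
        for observer in self.observers:
            value = observer.pre_put(row, value)
            if value is None:
                return False  # a pre-hook rejected the operation
        self.store[row] = value
        for observer in self.observers:
            observer.post_put(row, value)
        return True


audit = AuditLog()
region = Region([RejectSystemRows(), audit])
accepted = region.put("user:1", "alice")
rejected = region.put("sys:config", "oops")
```

The same shape explains the slide's "ACL security atop Coprocessors" bullet: an access check is just a pre-hook that can refuse the call before the server acts on it.
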
  20. HBase 0.92: Performance ▪ Performance improvements ▪ More seeking and early-out hints ▪ Distributed log splitting ▪ CacheOnWrite, EvictOnClose ▪ Compaction improvements ▪ Multi-threaded compactions ▪ Vastly improved file selection algorithm ▪ Lots of metrics and highly configurable
  21. HBase 0.92: Improvements ▪ Operational improvements ▪ HBCK improvements, Web UI improvements ▪ Slow query log, running tasks and thread status ▪ Online schema modifications ▪ Usability and API improvements ▪ Increment client API ▪ String-based Filter language ▪ Multi-family bulk load ▪ The HBase Books!
  22. HBase 0.92: Documentation! ▪ The (O’Reilly) HBase Book ▪ HBase: The Definitive Guide released in September ▪ Massive effort by committer Lars George ▪ Lots of input and feedback from the community ▪ The (Apache) HBase Book ▪ Apache HBase now has a DocBook-format book ▪ Every HBase release will ship with a versioned book ▪ From installation to schema design and architecture ▪ Latest version @ http://hbase.apache.org/book.html
  23. HBase of the Future: 0.94 and beyond
  24. ? You! A usable, large scale production database system
  25. HBase 0.94 ▪ Stability and usability is the core focus ▪ Increase stability by decreasing complexity ▪ More work on UI, tools, monitoring, operability ▪ Table/family-level metrics ▪ But features will always continue... ▪ Fast backups w/ point-in-time recovery ▪ Multi-Slave Replication ▪ Constraints and other Coprocessor-based contribs
  26. HBase 0.94: New Stuff ▪ Thrift . ▪ New Thrift API to more closely match Java API ▪ Embedded Thrift w/ RS short-circuit ▪ Other Goodies ▪ TTL + minVersions ▪ Point-in-time snapshot scanners ▪ Atomic Append operation
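
The "TTL + minVersions" bullet combines two retention rules. As a rough sketch of how such a combination could behave (an assumption about the intended semantics, since the slide gives no detail): versions older than the TTL are dropped, except that the newest `min_versions` versions are always kept. The function name and integer timestamps below are hypothetical.

```python
def retained_versions(timestamps, now, ttl, min_versions):
    """Return the cell timestamps that survive, newest first.

    A version is kept if it is still within the TTL, or if it is among
    the newest `min_versions` versions (kept even when expired).
    """
    newest_first = sorted(timestamps, reverse=True)
    kept = []
    for rank, ts in enumerate(newest_first):
        expired = now - ts > ttl
        if not expired or rank < min_versions:
            kept.append(ts)
    return kept


versions = [100, 200, 300]
all_expired_keep_one = retained_versions(versions, now=1000, ttl=50, min_versions=1)
all_expired_keep_none = retained_versions(versions, now=1000, ttl=50, min_versions=0)
partly_expired = retained_versions(versions, now=1000, ttl=850, min_versions=1)
```

With `min_versions=1`, a row that has not been written for longer than the TTL still keeps its latest value instead of disappearing entirely.
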
  27. HBase 0.94: Performance ▪ Scaling for throughput vs. latency ▪ Early-lock-release to decrease row contention ▪ Early-thread-release to increase throughput ▪ Remove all global wait()/notify() on HLog ▪ Improved seeking and file selection ▪ “Lazy-seek” in-order file processing ▪ DeleteFamily bloom filter
  28. HBase 0.94: Project Management ▪ Renewed focus on fast release cycle ▪ HBase . branch cut immediately after . release ▪ Already close to . feature freeze, . dev release? ▪ blockers and criticals left ▪ Apache HBase: A slightly less accepting project ▪ Stability is really code stability ▪ Push towards iterative feature dev and branch dev ▪ Coprocessors and Service Interfaces go a long way
  29. flying nanobots jetpacks cars ▪ Holographic storage renders HBase obsolete
  30. Beyond HBase 0.94 ▪ Stability and usability is still the core focus ▪ More tests, testing frameworks, integration tests ▪ But features will always continue... ▪ RPC redux ▪ Dynamic configuration ▪ Request, IO, and locality based load balancing ▪ Multi-Tenancy (QoS, ACL) ▪ Tighter coordination with rest of stack (HDFS, Linux)
  31. Conclusion ▪ Apache HBase has come a long way ▪ Use case driven development ▪ HBase 0.92 coming very soon ▪ Most stable release to date ▪ Contributors and committers drive development ▪ Consumers can’t dictate the road map ▪ Individuals and organizations solve their problems (They have their own users... and jobs to keep)
  32. Check out the HBase at Facebook Page: facebook.com/UsingHbase ▪ Thanks! Questions?