Hbase at


  1. 1. HBase @ Salesforce Lars Hofhansl Architect, Father, Meditator, Aikido Blackbelt
  2. 2. Safe Harbor. Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
  3. 3. Why HBase? • SAN • RDBMS • Transactions
  4. 4. Zookeeper? Commodity Hardware? HBase? HDFS? Unstructured Data?
  5. 5. A. Why HBase? B. Interacting with the open source community C. HBase at Salesforce
  6. 6. Size Matters* New Salesforce customer: • “How many rows do you have?” • We will turn folks away if they have too many! Data Storage is expensive: • SAN storage • Relational Database • Too many rows → Too expensive * In a relational world
  7. 7. What if in the future we: … and have cheaper storage? … and never need to ask again about the number of rows? … grow with the data by just adding more machines? (Disclaimer: no transactions, no joins, no 2nd’ary indexes, …)
  8. 8. (A quick note about) Relational Databases • We love them. They are core to our infrastructure. • SQL and NoSQL NoACID are complementary. • (Almost) everything we do is SQL based (see Phoenix – the SQL layer for HBase.)
  9. 9. The Search - Requirements • Consistent – “Eventually consistent stores are 100% consistent 99% of the time” – Ian Varley • Scalable – No “features” impeding horizontal scaling • Persistent – Duh...? • Key lookups • Range lookups • Open source (ASL great, GPLv2 OK, GPLv3/AGPL not acceptable)
  10. 10. Enter HBase “A Sparse, Consistent, Distributed, Multidimensional, Persistent, Sorted Map”
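The "sorted map" description on slide 10, together with the key-lookup and range-lookup requirements on slide 9, can be illustrated with a toy in-memory model. This is only a sketch of the data layout, not HBase's API: the class `SortedMapSketch`, its methods, and the sample row keys are all hypothetical, and distribution and persistence are left out entirely.

```python
from bisect import bisect_left, insort


class SortedMapSketch:
    """Toy model of a sparse, sorted, multidimensional map (hypothetical).

    Cells are addressed by (row key, column, timestamp). Rows only store
    the columns that were actually written, which is what "sparse" means.
    """

    def __init__(self):
        self._row_keys = []   # row keys, kept in sorted order
        self._rows = {}       # row key -> {column -> {timestamp -> value}}

    def put(self, row, column, timestamp, value):
        if row not in self._rows:
            insort(self._row_keys, row)
            self._rows[row] = {}
        self._rows[row].setdefault(column, {})[timestamp] = value

    def get(self, row, column):
        """Key lookup: newest version of one cell, or None if absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Range lookup: rows with start_row <= key < stop_row, in key order."""
        lo = bisect_left(self._row_keys, start_row)
        hi = bisect_left(self._row_keys, stop_row)
        for key in self._row_keys[lo:hi]:
            newest = {c: v[max(v)] for c, v in self._rows[key].items()}
            yield key, newest


table = SortedMapSketch()
table.put("acct#002", "name", 1, "Globex")
table.put("acct#001", "name", 1, "Acme")
table.put("acct#001", "name", 2, "Acme Corp")   # newer version of the same cell
table.put("acct#001", "phone", 1, "555-0100")   # column only this row has
table.put("lead#001", "name", 1, "Initech")

newest_name = table.get("acct#001", "name")
account_rows = list(table.scan("acct#", "acct$"))  # "$" sorts right after "#"
```

Because row keys are kept sorted, a range lookup is two binary searches plus a sequential read, which is why prefix-style row keys make range scans cheap in this kind of store.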
  11. 11. Salesforce and the HBase Community
  12. 12. To Fork or not to Fork – that is the question Fork - pros • Agility. No waiting for community review. Just get stuff done • Freedom. Patches that might not be acceptable to the community Fork - cons • Lose out on community work • Patches not useful to other parties There is no right or wrong. It’s a matter of choice, taste, and requirements.
  13. 13. HBase Development @ Salesforce • No fork of HBase. • No fork of HBase. • Internal HBase/HDFS branch for possible emergency fixes • All fixes are cleaned and contributed back • We switch to the next open source point release periodically
  14. 14. PMC member, 2 committers, release manager, contributors HBASE-11042 HBASE-11040 HBASE-11037 HBASE-11030 HBASE-11029 HBASE-11024 HBASE-11022 HBASE-11010 HBASE-10996 HBASE-10989 HBASE-10988 HBASE-10987 HBASE-10982 HBASE-10969 HBASE-10847 HBASE-10805 HBASE-10722 HBASE-10706 HBASE-10642 HBASE-10594 HBASE-10562 HBASE-10551 HBASE-10546 HBASE-10505 HBASE-10501 HBASE-10489 HBASE-10470 HBASE-10420 HBASE-10416 HBASE-10383 HBASE-10363 HBASE-10320 HBASE-10317 HBASE-10286 HBASE-10284 HBASE-10281 HBASE-10279 HBASE-10259 HBASE-10257 HBASE-10250 HBASE-10181 HBASE-10117 HBASE-10076 HBASE-10058 HBASE-10057 HBASE-10015 HBASE-9993 HBASE-9971 HBASE-9956 HBASE-9915 HBASE-9865 HBASE-9834 HBASE-9807 HBASE-9799 HBASE-9789 HBASE-9778 HBASE-9751 HBASE-9749 HBASE-9732 HBASE-9731 HBASE-9711 HBASE-9658 HBASE-9584 HBASE-9566 HBASE-9534 HBASE-9429 HBASE-9428 HBASE-9377 HBASE-9356 HBASE-9344 HBASE-9301 HBASE-9266 HBASE-9231 HBASE-9221 HBASE-9186 HBASE-9158 HBASE-9103 HBASE-9097 HBASE-9049 HBASE-8971 HBASE-8945 HBASE-8930 HBASE-8912 HBASE-8858 HBASE-8809 HBASE-8767 HBASE-8702 HBASE-8698 HBASE-8684 HBASE-8671 HBASE-8636 HBASE-8525 HBASE-8503 HBASE-8355 HBASE-8316 HBASE-8229 HBASE-8188 HBASE-8166 HBASE-8151 HBASE-8110 HBASE-8108 HBASE-8055 HBASE-8008 HBASE-7999 HBASE-7947 HBASE-7945 HBASE-7817 HBASE-7801 HBASE-7729 HBASE-7725 HBASE-7717 HBASE-7709 HBASE-7702 HBASE-7681 HBASE-7617 HBASE-7602 HBASE-7578 HBASE-7550 HBASE-7499 HBASE-7497 HBASE-7483 HBASE-7466 HBASE-7465 HBASE-7455 HBASE-7438 HBASE-7435 HBASE-7432 HBASE-7431 HBASE-7417 HBASE-7415 HBASE-7371 HBASE-7336 HBASE-7293 HBASE-7279 HBASE-7270 HBASE-7252 HBASE-7240 HBASE-7215 HBASE-7214 HBASE-7180 HBASE-7177 HBASE-7166 HBASE-7165 HBASE-7091 HBASE-7069 HBASE-7051 HBASE-7047 HBASE-7021 HBASE-7010 HBASE-6996 HBASE-6974
  15. 15. PMC member, 2 committers, release manager, contributors HBASE-6949 HBASE-6946 HBASE-6912 HBASE-6889 HBASE-6879 HBASE-6868 HBASE-6865 HBASE-6863 HBASE-6797 HBASE-6796 HBASE-6784 HBASE-6765 HBASE-6757 HBASE-6755 HBASE-6711 HBASE-6707 HBASE-6690 HBASE-6667 HBASE-6638 HBASE-6637 HBASE-6621 HBASE-6582 HBASE-6580 HBASE-6579 HBASE-6573 HBASE-6571 HBASE-6570 HBASE-6569 HBASE-6568 HBASE-6561 HBASE-6523 HBASE-6522 HBASE-6505 HBASE-6504 HBASE-6496 HBASE-6495 HBASE-6441 HBASE-6439 HBASE-6427 HBASE-6426 HBASE-6421 HBASE-6406 HBASE-6355 HBASE-6347 HBASE-6326 HBASE-6296 HBASE-6293 HBASE-6291 HBASE-6178 HBASE-6138 HBASE-6113 HBASE-6112 HBASE-6110 HBASE-6087 HBASE-5961 HBASE-5955 HBASE-5909 HBASE-5884 HBASE-5871 HBASE-5865 HBASE-5782 HBASE-5775 HBASE-5774 HBASE-5682 HBASE-5670 HBASE-5659 HBASE-5641 HBASE-5609 HBASE-5604 HBASE-5574 HBASE-5569 HBASE-5548 HBASE-5547 HBASE-5541 HBASE-5526 HBASE-5523 HBASE-5509 HBASE-5497 HBASE-5460 HBASE-5455 HBASE-5440 HBASE-5431 HBASE-5368 HBASE-5350 HBASE-5348 HBASE-5318 HBASE-5304 HBASE-5266 HBASE-5229 HBASE-5203 HBASE-5118 HBASE-5096 HBASE-5088 HBASE-5084 HBASE-5070 HBASE-5058 HBASE-5005 HBASE-5001 HBASE-4998 HBASE-4981 HBASE-4979 HBASE-4945 HBASE-4886 HBASE-4874 HBASE-4870 HBASE-4838 HBASE-4805 HBASE-4800 HBASE-4691 HBASE-4682 HBASE-4673 HBASE-4657 HBASE-4626 HBASE-4605 HBASE-4583 HBASE-4561 HBASE-4559 HBASE-4556 HBASE-4536 HBASE-4517 HBASE-4488 HBASE-4454 HBASE-4439 HBASE-4404 HBASE-4387 HBASE-4347 HBASE-4336 HBASE-4335 HBASE-4334 HBASE-4331 HBASE-4296 HBASE-4283 HBASE-4263 HBASE-4242 HBASE-4241 HBASE-4197 HBASE-4178 HBASE-4171 HBASE-4102 HBASE-4071 HBASE-3661 HBASE-3645 HBASE-3584 HBASE-3443 HBASE-3433 HBASE-3387 HBASE-2947 HBASE-2196 HBASE-2195 HDFS-3979 HDFS-744
  16. 16. Managing HBase 0.94
  17. 17. Established monthly release train for 0.94
  18. 18. Contributed >300 features, bug fixes, and perf improvements
  19. 19. Reviewed 1000’s of open source patches
  20. 20. Committed 100’s of patches
  21. 21. Open Sourced Apache Phoenix – SQL skin on HBase
  22. 22. Salesforce High-level Architecture
  23. 23. Salesforce *is* a database
  24. 24. Salesforce is a Database [Diagram labels: Query Parser, Query (SQL), Parsed Query, Query Optimizer, Plan Generator, Plan Cost Estimator, Evaluation Plan, Query Plan Evaluator, System Catalog, Database Stats, Tables, Columns, Indexes]
  25. 25. Salesforce is a Database [Diagram labels: Query Parser, Query (SOQL), Parsed Query, Query Optimizer, Plan Generator, Plan Cost Estimator, System Catalog, Oracle, Hinted Oracle SQL, Database Stats, Objects, Fields, Indexes]
  26. 26. Salesforce is multi tenant
  27. 27. [Diagram labels: … pod (Tenant A-D), pod (Tenant E-H), pod (Tenant I-O)]
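The pod picture on slide 27 amounts to a lookup from tenant to pod. The sketch below mirrors only the slide's alphabetical illustration; the pod names, the function `pod_for_tenant`, and the idea of routing by first letter are assumptions made for the example and say nothing about how Salesforce actually assigns tenants to pods.

```python
from bisect import bisect_right

# Hypothetical layout copied from the slide's illustration: each pod serves
# the tenants whose names start within one alphabetical range.
POD_START_LETTERS = ["A", "E", "I"]        # first letter served by each pod
POD_NAMES = ["pod-1", "pod-2", "pod-3"]    # invented names
LAST_LETTER = "O"                          # the slide's ranges end at O


def pod_for_tenant(tenant_name):
    """Return the pod serving a tenant, or None if no range covers it."""
    first = tenant_name[:1].upper()
    if not ("A" <= first <= LAST_LETTER):
        return None
    # Rightmost pod whose starting letter is <= the tenant's first letter.
    index = bisect_right(POD_START_LETTERS, first) - 1
    return POD_NAMES[index]
```

With this layout, `pod_for_tenant("Globex")` falls in the E-H range, and a tenant outside A-O has no pod in the sketch.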
  28. 28. pod = a database instance • Oracle RAC • AppServers • Blob store servers • Search servers • Shared SAN storage • SAN replication for DR [Diagram labels: App Servers; SQL/JDBC; Oracle RAC cluster (Oracle Nodes); SAN; Primary Site; Secondary Site; SAN replication]
  29. 29. Finally: HBase @ Salesforce
  30. 30. Where does HBase Fit? [Diagram labels: Oracle, Hinted Oracle SQL, Query Parser, Query (SOQL), Parsed Query, Query Optimizer, Plan Generator, Plan Cost Estimator, System Catalog, Database Stats, Objects, Fields, Indexes; 1. External Objects; 2. Phoenix SQL; HBase]
  31. 31. Where does HBase Fit? • Separate HBase per pod (close to 50 clusters) • Logically co-located with Oracle • Small clusters striped across five racks • Each cluster’s master service on a different rack • Identical cluster for DR [Diagram labels: App Servers; SQL/JDBC via Phoenix; Oracle Cluster (Oracle Nodes); HBase Cluster (HBase Nodes); SAN; Primary Site; Secondary Site; DR HBase Cluster; Decentralized HBase Replication]
  32. 32. Use Cases
  33. 33. 1. Audit Trails (Entity History) • Identity managed in RDBMS • Indexed in HBase (Phoenix indexes) • Historical, immutable data only • No need to reason about updates, split identities, and transactions
  34. 34. 2. Archiving (Data Lifecycle Management) • Objects (rows) moved to HBase • Identity managed in HBase after move • Data immutable in HBase • No Transactions
  35. 35. 3. Live data in HBase (BigObjects) • Mutable data (possibly) • Everything managed in HBase • Still no Transactions, yet • Platform for other team to use
  36. 36. Merrill Lynch Rationalization: Data Governance, Audit & Archive • First Salesforce Enterprise Customer • On-platform archival compelling versus on-premise solution from Informatica • Retention requirements for 7 years Merrill Lynch “Data Audit, Governance & Lifecycle management is critical for Merrill for the entire banking & financial industry has become a benchmark requirement”
  37. 37. Heating, ventilation, and air-conditioning in the EU • Top 10 Platform Users • Subject to highly variable data governance and retention requirements • Significant SAP footprint driving business rules – need to connect that to Salesforce data for archival and data retention needs • Massive service workforce generates significant data processing challenges “The Platform roadmap for Data Archive is critical for future data management needs” Michael Roehr, CTO Vaillant
  38. 38. BMW Enriches Their Customer Perspective • Sales Cloud available across all German Dealership Franchises • All customer data subject to stringent, government-mandated protection, audit & retention • Correlations with Car Builder App data enable more contextual customer interactions • Car Telemetry, used correctly, helps refine product evolution and customer needs alignment “Data driven customer engagement is a key driver for our enhanced customer experience”
  39. 39. System Of Record (SOR) SOR = HA + DR + Backup + M&M + Security
  40. 40. Highly Available, Disaster Recovery • Five peer Zookeeper Quorum • Five Quorum Journals (for fs edits) • Five HMasters • Three NameNodes (yes, three, we made a patch to run more than one standby) • HBase Replication to identical hot standby pod in a different data center – In the event of a disaster we fail a complete pod to the secondary site • Weekly automated, unattended rolling restarts
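The choice of five members for the ZooKeeper quorum and the quorum journals on slide 40 follows from majority arithmetic: both services need a strict majority of members to make progress. A small worked example (the function names are illustrative, not from any library):

```python
def majority(members):
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members):
    """Members that can fail while a strict majority is still reachable."""
    return members - majority(members)


# Compare a three-member and a five-member ensemble.
summary = {size: (majority(size), tolerated_failures(size)) for size in (3, 5)}
```

A five-member ensemble needs three live members and so survives two simultaneous failures, against one for a three-member ensemble. This arithmetic does not apply to the HMasters or NameNodes on the same slide, which are active/standby processes rather than voting members.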
  41. 41. Replication Backup High-level Architecture [Diagram labels: Primary pod (HBase, 48h, HDFS, Backup per tenant); DR pod (HBase, 48h, HDFS, Backup per tenant); Merkle Tree Verification]
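Slide 41 only names "Merkle Tree Verification" between the primary and DR pods, so the details are not in the source. The general technique can be sketched as follows: hash each chunk of data, hash pairs of hashes up to a single root, compare roots, and descend only into subtrees that differ. The chunking, the use of SHA-256, and all function names here are assumptions for illustration.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def merkle_levels(chunks):
    """Return the tree levels: leaf hashes first, the root level last."""
    level = [_h(c) for c in chunks] or [_h(b"")]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]   # pad odd levels by repeating the last hash
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_chunks(primary, replica):
    """Indexes of chunks that differ, found by descending from the root."""
    if len(primary) != len(replica):
        raise ValueError("sketch assumes both sides have the same chunk count")
    a, b = merkle_levels(primary), merkle_levels(replica)
    suspects = [0]                         # node indexes at the current level
    for depth in range(len(a) - 1, 0, -1):
        suspects = [i for i in suspects if a[depth][i] != b[depth][i]]
        below = len(a[depth - 1])
        # Children of node i live at indexes 2i and 2i+1 one level down.
        suspects = [c for i in suspects for c in (2 * i, 2 * i + 1) if c < below]
    return [i for i in suspects if a[0][i] != b[0][i]]


primary = [b"row-0", b"row-1", b"row-2", b"row-3", b"row-4"]
replica = list(primary)
replica[3] = b"row-3-corrupted"

in_sync = differing_chunks(primary, primary)
mismatched = differing_chunks(primary, replica)
```

When the two sides agree, only the root hashes need to be compared; when they differ, the descent touches a number of nodes proportional to the tree height per differing chunk rather than every chunk.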
  42. 42. Monitoring & Management (M&M) • Nagios alerts • Trending via OpenTSDB. Custom UI on top of the time series data. • Rolling upgrades – Eventually scheduled and unattended • Absolutely no unscheduled downtime. Not even during a rack failure.
  43. 43. A. Why HBase? B. Interacting with the open source community C. HBase at Salesforce
  44. 44. Lars Hofhansl

Editor's Notes

  • Spent time with StumbleUpon, Facebook, many others. This is a great community.
  • Salesforce is seeing an increasing change in the center of gravity of customer data. Driving this forward across verticals such as Banking & Finserv requires data audit, driven by post-2008 regulatory requirements and Sarbanes-Oxley (Sarbox) requirements. As this data is generated in a transactional environment, we use HBase as our historical and immutable storage.
  • Their use of the platform to drive their entire business helps keep their dynamic and highly mobile workforce in touch with their data. Given their operating environment in Germany, they are required to deliver a complete data audit and use Field History for this. They are also required to keep all customer data for at least 15 years, which is why Archive is so key for them.
  • Across Germany we've had a successful deployment in each franchise to establish new baselines in customer interactions with BMW customers, leases, and service interactions. Looking beyond this use case, the capability of marrying together the customer data generated for the BMW Car Builder application and cleansed and anonymized telemetrics data is pushing Salesforce to deliver the concepts and tools to allow BMW to absorb the full spectrum of their customer event data stream, and take business actions on it. Imagine how I would feel as a prospective customer if I walked into a dealership and they had a more informed knowledge of who I am and my likely preferences. We are using the notion of BigObjects to absorb, store, and act on the data that is behind the Internet of Customers.