HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second Generation and Beyond


Presented by: Doug Meil, Explorys

Published in: Technology, Business

  1. Evolving a 1st Generation HBase Deployment to 2nd and Beyond
     • Doug Meil, Chief Software Architect, HBase Committer
     • HBaseCon 2013
  2. Company Background
  3. The Explorys Value Based Care Big Network
     • Comprehensive view of care, including all venues of delivery, representative of all major diseases, treatments, and demographics
     • 14 integrated delivery networks with over 200 hospitals and 100,000 providers
     • $46 billion in care delivered annually by our network members
     • 24 million truly unique patients
  4. The Explorys Platform
     • Clinical: EMRs, claims, labs, registries, reported outcomes
     • Operational: Providers org charts, practices, locations, departments, physical assets, and care workflow
     • Financial: Private / payer claims, billing, patient accounting systems
     • Venues: PCP, Specialist, Hospital, Post acute, Long term, Home, Mobile
     • Full view of the continuum of care & cost
     • Secure | Cost Effective | Ready Now
     • Start with Data Completeness → Aggregation → Patient matching → Curation & attribution → Data governance → Profiling → Risk analytics → Prediction Insight
  5. Why HBase?
  6. HBase at Explorys
     • Transactional Store
     • General Store
  7. High Level HBase Usage Overview (diagram)
     • Inputs: Source 1, Source 2, Source 3, Source 4, via Extract & Load
     • HBase access paths, as numbered in the diagram: 1 Loads (Puts), 2 Read (Scan), 3 Bulk-Load, 4 Multi-Get, 5 Impala
     • M/R: “Late Binding” Transformation & Standardization; Generated Results / Indexes
     • Consumers: Explorys Apps (Explore, Measure, Registry, Engage), Power Search, Patient Chart, Queries
  8. Functional Examples
  9. Measure Calculations: NQF 0575 Example (Simple Example, Condensed)
     • Initial Population: Patients >= 17 and <= 74 before the start of the measurement period
     • Denominator: 2 encounters (non-acute and outpatient) and an active diagnosis of diabetes, or active meds indicative of diabetes, all within 2 years or during the measurement end-date
     • Exclusions: Things like an active diagnosis of gestational diabetes will exclude a patient from the denominator
     • Numerator: Most recent HbA1c test < 8%
     • Measures Generated in MapReduce
  10. Measure Generated Data
     • Measure Results Generated to HBase: results by Measure, Attributed Provider, Patient, Reporting Window, … generated to HBase
     • Lots of Generated Data: hundreds of measures generate hundreds of millions of measure results per day
  11. PowerSearch
     • Heart Failure Functional Example
         • No evidence of Myocardial Infarction
         • THEN a prescription for Angiotensin-converting enzyme (ACE) inhibitor agent
         • THEN Myocardial Infarction within one year
     • C. Diff. Infection Functional Example
         • Ambulatory Encounter
         • THEN an Inpatient Encounter
         • THEN evidence of C. Diff. infection within 10 days
         • THEN an Ambulatory Encounter within 30 days
     • Summary: NoSQL works well as the backend implementation for these kinds of “queries” because satisfying them takes complex logic.
  12. Technical Details
  13. Cluster Information
     • Distro: CDH4.2.1
     • Hadoop Knobs
         • HDFS Local read shortcut on
         • HDFS Drop behind reads, Read-ahead on
         • Snappy for MR temp files
         • Read-ahead for MR temp files
         • MR heartbeat on task finish
  14. Cluster Information
     • HBase Knobs
         • We pre-split our tables
         • We use KeyPrefixRegionSplitPolicy
         • Snappy CF compression
         • HLog compression on
         • RegionSize still 2-3 GB (we’ve tested bigger, but staying here for now)
     • HBase Knobs Under Consideration
         • HBase Checksumming: currently off, but will probably turn on
         • FAST_DIFF encoding: currently not in use, but will probably use for lookup tables
  15. What Have We Changed?
     • Compression (HDFS and HBase): LZO → Snappy
     • HBase Key Redesign
         • Our initial HBase RowKeys were too beefy and too Stringy. Refactored to be tighter.
         • Column names a bit too descriptive initially
         • Changes related to the new KeyPrefixRegionSplitPolicy
     • HBase Table Management: We have a layer of metadata around our MR jobs and apps and re-create our tables from time to time, which makes schema changes easier.
  16. What Have We Changed?
     • HBase Loading
         • Index tables loaded with bulk-loading
         • Experimented with WAL off and deferred log flushing, but bulk-loading is better.
     • HBase Gets
         • When we started, multi-Get didn’t even exist in HBase!
         • This feature was very much appreciated, and our DAO layer was modified to accept batch requests. Minimizing RPCs makes a difference.
     • SQL? Impala against HBase for internal data investigation
  17. Things We’ve Built
     • Data Browsers
         • We’ve built our own data browser for data inspection, and continue to add to it.
         • This isn’t going away any time soon and is highly used.
         • Also kind of necessary if you store complex objects in HBase
     • HBase Filters
         • We have some.
         • Didn’t initially, but they have proven quite useful.
  18. Questions? Thank You!
     • Doug Meil, Chief Software Architect
     • Doug.Meil@explorys.com
     • www.explorys.com
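
The NQF 0575 logic on slide 9 can be read as a small decision procedure. The sketch below is one possible reading in plain Python, not the MapReduce code from the talk. The `Patient` fields, the result labels, and the interpretation of "within 2 years" as the 730 days up to the period end are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Tuple


@dataclass
class Patient:
    # Hypothetical fields; the slides do not show the real record layout.
    birth_date: date
    encounters: List[date] = field(default_factory=list)      # non-acute / outpatient visits
    diabetes_dx: List[date] = field(default_factory=list)     # active diabetes diagnoses
    diabetes_meds: List[date] = field(default_factory=list)   # meds indicative of diabetes
    gestational_dx: List[date] = field(default_factory=list)  # exclusion evidence
    hba1c: List[Tuple[date, float]] = field(default_factory=list)  # (test date, percent)


def age_on(birth: date, day: date) -> int:
    """Whole years of age on the given day."""
    before_birthday = (day.month, day.day) < (birth.month, birth.day)
    return day.year - birth.year - (1 if before_birthday else 0)


def count_in(days: List[date], start: date, end: date) -> int:
    """Count the dates that fall inside [start, end]."""
    return sum(1 for d in days if start <= d <= end)


def evaluate(p: Patient, period_start: date, period_end: date) -> str:
    # Assumption: "within 2 years" means the 730 days up to the period end.
    window_start = period_end - timedelta(days=730)

    # Initial population: age 17 to 74 at the start of the measurement period.
    if not 17 <= age_on(p.birth_date, period_start) <= 74:
        return "not_in_population"

    # Denominator: (2 encounters and a diabetes diagnosis) or diabetes meds.
    by_encounters = (count_in(p.encounters, window_start, period_end) >= 2
                     and count_in(p.diabetes_dx, window_start, period_end) >= 1)
    by_meds = count_in(p.diabetes_meds, window_start, period_end) >= 1
    if not (by_encounters or by_meds):
        return "not_in_denominator"

    # Exclusion: gestational diabetes removes the patient from the denominator.
    if count_in(p.gestational_dx, window_start, period_end) >= 1:
        return "excluded"

    # Numerator: the most recent HbA1c result up to the period end is below 8%.
    tests = [t for t in p.hba1c if t[0] <= period_end]
    if tests and max(tests)[1] < 8.0:  # max() orders tuples by test date first
        return "numerator"
    return "denominator_only"
```

Each patient evaluates independently of every other patient, which is what makes this kind of calculation a natural fit for a map-side job over patient records.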
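
The "THEN ... within N days" patterns on slide 11 amount to matching an ordered sequence of events against a patient timeline. The following is a minimal sketch of that idea using backtracking search; it is not the PowerSearch implementation. The `Event` and `Step` types and the event labels are hypothetical, and the sketch handles only positive steps, so the "no evidence of" condition in the heart failure example is out of scope.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class Event:
    kind: str  # hypothetical labels such as "ambulatory", "inpatient", "cdiff"
    day: date


@dataclass(frozen=True)
class Step:
    kind: str
    within_days: Optional[int] = None  # max gap from the previous matched step


def matches(events: Sequence[Event], steps: Sequence[Step]) -> bool:
    """True if the steps occur in order, each inside its time window."""
    # Ties on the same day keep their input order (sorted() is stable).
    ordered = sorted(events, key=lambda e: e.day)

    def search(step_idx: int, start: int, prev_day: Optional[date]) -> bool:
        if step_idx == len(steps):
            return True  # every step has been matched
        step = steps[step_idx]
        for i in range(start, len(ordered)):
            ev = ordered[i]
            if ev.kind != step.kind:
                continue
            if prev_day is not None and step.within_days is not None:
                if (ev.day - prev_day).days > step.within_days:
                    break  # events are sorted, so later ones are further away
            # Try this candidate; fall through to the next one if it dead-ends.
            if search(step_idx + 1, i + 1, ev.day):
                return True
        return False

    return search(0, 0, None)


# The C. Diff. example from the slide, expressed as steps.
CDIFF_PATTERN = [
    Step("ambulatory"),
    Step("inpatient"),
    Step("cdiff", within_days=10),
    Step("ambulatory", within_days=30),
]
```

Backtracking matters here: a patient with two inpatient stays can match on the second stay even when the first stay has no infection evidence inside the 10-day window. A greedy matcher that commits to the first candidate would miss that case.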
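
Slide 15 says the original RowKeys were "too beefy and too Stringy" but does not show either layout. As an illustration only, the sketch below contrasts a verbose string key with a fixed-width binary key. Both layouts, the field names, and the 8-byte prefix length are invented for this example and are not the Explorys schema.

```python
import struct

# Hypothetical: the leading 8 bytes identify the patient, so rows for one
# patient share a fixed-length prefix that a prefix-based split policy can
# keep together in one region.
PREFIX_LENGTH = 8


def stringy_key(patient_id: int, record_type: str, seq: int) -> bytes:
    """A verbose, delimiter-separated key of the kind the slide warns about."""
    return f"patient-{patient_id:012d}|{record_type}|{seq:010d}".encode("utf-8")


def compact_key(patient_id: int, type_code: int, seq: int) -> bytes:
    """Big-endian fixed-width key: 8-byte id, 1-byte type code, 4-byte sequence."""
    return struct.pack(">QBI", patient_id, type_code, seq)


def region_prefix(key: bytes) -> bytes:
    """The part of the key that would group rows under a prefix split policy."""
    return key[:PREFIX_LENGTH]
```

Big-endian unsigned fields sort bytewise in the same order as their numeric values, so the compact key keeps scans in a sensible order while taking fewer bytes per cell. Because the row key is stored with every cell, shorter keys reduce storage across the whole table.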
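
Slide 16 notes that the DAO layer was changed to accept batch requests because minimizing RPCs makes a difference. The sketch below shows the general shape of that change against an in-memory stand-in. `FakeStore` and `BatchingDao` are hypothetical names and nothing here calls the HBase client, where a real multi-Get may still issue one call per region server.

```python
from typing import Dict, Iterable, Iterator, List, Optional


class FakeStore:
    """In-memory stand-in for a table; counts round trips instead of making RPCs."""

    def __init__(self, rows: Dict[bytes, bytes]) -> None:
        self.rows = rows
        self.round_trips = 0

    def get(self, key: bytes) -> Optional[bytes]:
        self.round_trips += 1
        return self.rows.get(key)

    def multi_get(self, keys: List[bytes]) -> List[Optional[bytes]]:
        self.round_trips += 1  # one simulated round trip for the whole batch
        return [self.rows.get(k) for k in keys]


def chunked(keys: List[bytes], size: int) -> Iterator[List[bytes]]:
    """Yield consecutive slices of at most `size` keys."""
    for i in range(0, len(keys), size):
        yield keys[i:i + size]


class BatchingDao:
    """DAO that accepts many keys at once and fetches them in batches."""

    def __init__(self, store: FakeStore, batch_size: int = 100) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.store = store
        self.batch_size = batch_size

    def fetch(self, keys: Iterable[bytes]) -> Dict[bytes, Optional[bytes]]:
        result: Dict[bytes, Optional[bytes]] = {}
        for batch in chunked(list(keys), self.batch_size):
            for key, value in zip(batch, self.store.multi_get(batch)):
                result[key] = value  # None marks a key with no row
        return result
```

With 250 keys and a batch size of 100, this store sees 3 round trips instead of 250. Capping the batch size keeps any single request and response bounded in size.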