HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second Generation and Beyond


Presented by: Doug Meil, Explorys


Published in: Technology, Business



  • 1. Evolving a 1st Generation HBase Deployment to 2nd and Beyond
      Doug Meil, Chief Software Architect, HBase Committer
      HBaseCon 2013
  • 2. Company Background
  • 3. The Explorys Value Based Care Big Network
      • Comprehensive view of care, including all venues of delivery, representative of all major diseases, treatments, and demographics
      • 14 integrated delivery networks with over 200 hospitals and 100,000 providers
      • $46 billion in care delivered annually by our network members
      • 24 million truly unique patients
  • 4. The Explorys Platform
      • Clinical: EMRs, claims, labs, registries, reported outcomes
      • Operational: Providers org charts, practices, locations, departments, physical assets, and care workflow
      • Financial: Private / payer claims, billing, patient accounting systems
      • PCP, Specialist, Hospital, Post acute, Long term, Home, Mobile
      • Full view of the continuum of care & cost
      • Secure | Cost Effective | Ready Now
      • Start with Data Completeness: Aggregation, Patient matching, Curation & attribution, Data governance, Profiling, Risk analytics, Prediction
      • Insight
  • 5. Why HBase?
  • 6. HBase at Explorys
      • Transactional Store
      • General Store
  • 7. High Level HBase Usage Overview (diagram)
      • Sources 1-4 feed Extract & Load
      • Numbered access paths: 1 Loads (Puts), 2 Read (Scan), 3 Bulk-Load, 4 Multi-Get, 5 Impala queries
      • M/R: “Late Binding” Transformation & Standardization; Generated Results / Indexes
      • Explorys Apps: Explore, Measure, Registry, Engage; Power Search; Patient Chart
  • 8. Functional Examples
  • 9. Measure Calculations
      NQF 0575 Example (Simple Example, Condensed)
      • Initial Population: Patients >= 17 and <= 74 before the start of the measurement period
      • Denominator: 2 encounters (non-acute and outpatient) and an active diagnosis of diabetes, or active meds indicative of diabetes. All within 2 years or during the measurement end-date
      • Exclusions: Things like active diagnosis of gestational diabetes will exclude patient from denominator
      • Numerator: Most recent HbA1c test < 8%
      Measures Generated in MapReduce
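The condensed measure above can be read as a per-patient decision function. The following is a minimal Python sketch of that reading, not the Explorys implementation: the `Patient` fields and the `evaluate` function are hypothetical names, the date windows are assumed to have been applied when the fields were populated, and the MapReduce plumbing is left out entirely.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass
class Patient:
    # All field names are hypothetical; windows (2 years, measurement period)
    # are assumed to be applied upstream when these values are computed.
    age_at_period_start: int
    qualifying_encounters: int = 0          # non-acute / outpatient encounters
    active_diabetes_dx: bool = False
    active_diabetes_meds: bool = False
    gestational_diabetes_dx: bool = False   # one example of an exclusion
    hba1c_results: List[Tuple[date, float]] = field(default_factory=list)


def evaluate(p: Patient) -> str:
    """Classify one patient against the condensed NQF 0575 logic."""
    # Initial population: age 17..74 inclusive.
    if not (17 <= p.age_at_period_start <= 74):
        return "not_in_population"
    # Denominator: (2 encounters AND active diagnosis) OR active meds.
    in_denominator = (
        p.qualifying_encounters >= 2 and p.active_diabetes_dx
    ) or p.active_diabetes_meds
    if not in_denominator:
        return "not_in_denominator"
    # Exclusions remove the patient from the denominator.
    if p.gestational_diabetes_dx:
        return "excluded"
    # Numerator: the most recent HbA1c result is below 8%.
    if not p.hba1c_results:
        return "denominator_only"
    _, latest_value = max(p.hba1c_results, key=lambda r: r[0])
    return "numerator" if latest_value < 8.0 else "denominator_only"
```

A function of this shape could plausibly run once per patient inside a map task, with one output record per measure, provider, patient, and reporting window; that mapping is an assumption, since the slides only say the measures are generated in MapReduce.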
  • 10. Measure Generated Data
      Measure Results Generated to HBase
      • Results by Measure, Attributed Provider, Patient, Reporting Window, … generated to HBase
      Lots of Generated Data
      • Hundreds of measures generate hundreds of millions of measure results per day
  • 11. PowerSearch
      Heart Failure Functional Example
      • No evidence of Myocardial Infarction
      • THEN a prescription for Angiotensin-converting enzyme (ACE) inhibitor agent
      • THEN Myocardial Infarction within one year
      C. Diff. Infection Functional Example
      • Ambulatory Encounter
      • THEN an Inpatient Encounter
      • THEN evidence of C. Diff. infection within 10 days
      • THEN an Ambulatory Encounter within 30 days
      Summary
      • NoSQL works well as the backend implementation for these kinds of “queries” because it takes complex logic to satisfy this result.
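To illustrate why these "queries" need procedural logic rather than a simple lookup, here is a small Python sketch of ordered, time-bounded event matching for the C. Diff. example. It is an illustration under assumptions, not the PowerSearch implementation: the event kinds, the `C_DIFF_PATTERN` encoding, and the `matches` function are hypothetical, and "within N days" is assumed to be measured from the previously matched step.

```python
from datetime import date

# Each step is (event kind, max days since the previous matched step).
# None means no time limit. This encoding is hypothetical.
C_DIFF_PATTERN = [
    ("ambulatory", None),
    ("inpatient", None),
    ("c_diff", 10),
    ("ambulatory", 30),
]


def matches(events, pattern, start=0, prev_date=None):
    """Return True if the pattern steps occur in order in `events`.

    `events` is a list of (date, kind) tuples sorted by date. The search
    backtracks, so an early candidate that leads nowhere does not hide a
    later candidate that would complete the pattern.
    """
    if not pattern:
        return True
    kind, max_days = pattern[0]
    for i in range(start, len(events)):
        when, event_kind = events[i]
        if event_kind != kind:
            continue
        if (prev_date is not None and max_days is not None
                and (when - prev_date).days > max_days):
            break  # events are sorted, so later ones are even further away
        if matches(events, pattern[1:], i + 1, when):
            return True
    return False
```

Calling `matches(patient_events, C_DIFF_PATTERN)` on one patient's date-sorted events answers the C. Diff. question for that patient. The Heart Failure example additionally needs a negated first step ("no evidence of"), which this sketch does not cover.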
  • 12. Technical Details
  • 13. Cluster Information
      Distro
      • CDH4.2.1
      Hadoop Knobs
      • HDFS local read shortcut on
      • HDFS drop-behind reads and read-ahead on
      • Snappy for MR temp files
      • Read-ahead for MR temp files
      • MR heartbeat on task finish
  • 14. Cluster Information
      HBase Knobs
      • We pre-split our tables
      • We use KeyPrefixRegionSplitPolicy
      • Snappy CF compression
      • HLog compression on
      • Region size still 2-3 GB (we’ve tested bigger, but staying here for now)
      HBase Knobs Under Consideration
      • HBase checksumming: currently off, but will probably turn on
      • FAST_DIFF encoding: currently not in use, but will probably use for lookup tables
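As background on the split policy named above: KeyPrefixRegionSplitPolicy, as I understand it, truncates a candidate split point to a configured prefix length so that rows sharing that prefix are never divided between two regions. The Python sketch below illustrates only that idea and is not HBase code; `prefix_split_point`, `daughter_region`, and the sample keys are hypothetical.

```python
def prefix_split_point(candidate: bytes, prefix_length: int) -> bytes:
    """Truncate a candidate split key to the configured prefix length.

    Mirrors the idea behind the policy: the split key keeps at most
    `prefix_length` leading bytes of the row key the region would
    otherwise have split on.
    """
    if prefix_length > 0 and candidate:
        return candidate[:prefix_length]
    return candidate


def daughter_region(row_key: bytes, split_point: bytes) -> str:
    """Rows below the split key go to the first daughter, the rest to the second."""
    return "first" if row_key < split_point else "second"


# Hypothetical keys: a 4-byte entity prefix followed by a date.
rows = [b"0041|2013-05-01", b"0042|2013-01-01", b"0042|2013-03-15", b"0042|2013-06-30"]
split = prefix_split_point(b"0042|2013-03-15", prefix_length=4)
placement = {row: daughter_region(row, split) for row in rows}
```

Without the truncation, a split at `0042|2013-03-15` would leave the `0042` rows in two different regions; with it, the split key becomes `0042` and every row with that prefix lands in the same daughter region.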
  • 15. What Have We Changed?
      Compression (HDFS and HBase)
      • LZO → Snappy
      HBase Key Redesign
      • Our initial HBase RowKeys were too beefy and too Stringy.
          • Refactored to be tighter.
      • Column names a bit too descriptive initially
      • Changes related to the new KeyPrefixRegionSplitPolicy.
      HBase Table Management
      • We have a layer of metadata around our MR jobs and apps and re-create our tables from time to time, which makes schema changes easier.
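The slides do not show the old or new key layouts, so the following Python sketch uses an invented layout purely to illustrate what "too Stringy" versus "tighter" can mean. The field choices (`patient_id`, an event type, a timestamp) and both function names are hypothetical. Because HBase stores the row key and column qualifier with every cell, shorter keys and column names reduce the bytes stored per cell.

```python
import struct


def stringy_key(patient_id: int, event_type: str, epoch_seconds: int) -> bytes:
    """A verbose, delimiter-separated text key (hypothetical 'before' layout)."""
    return f"patient-{patient_id}|{event_type}|{epoch_seconds}".encode("utf-8")


def packed_key(patient_id: int, type_code: int, epoch_seconds: int) -> bytes:
    """A fixed-width binary key (hypothetical 'after' layout).

    Big-endian unsigned fields: 8-byte id, 2-byte type code, 4-byte timestamp.
    Big-endian packing keeps byte-wise sort order equal to numeric order
    for non-negative values.
    """
    return struct.pack(">QHI", patient_id, type_code, epoch_seconds)


before = stringy_key(123456789, "ENCOUNTER", 1370000000)
after = packed_key(123456789, 7, 1370000000)
```

Besides being shorter (the packed key is always 14 bytes), the binary form sorts numerically: as text, the key for patient 10 sorts before the key for patient 9, while the packed keys sort in id order.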
  • 16. What Have We Changed?
      HBase Loading
      • Index tables loaded with bulk-loading
      • Experimented with WAL off and deferred log flushing, but bulk-loading is better.
      HBase Gets
      • When we started, multi-Get didn’t even exist in HBase!
      • This feature was very much appreciated; our DAO layer was modified to accept batch requests.
          • Minimizing RPCs makes a difference.
      SQL?
      • Impala against HBase for internal data investigation
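The effect of batching gets in a DAO layer can be sketched without a cluster. The Python below is an illustration only: `FakeTable` is a hypothetical in-memory stand-in that counts calls as a proxy for round trips, and `fetch_batched` is a hypothetical DAO helper. In the HBase Java client, the corresponding batch call takes a list of `Get` objects, and the real number of RPCs also depends on how the requested rows are spread across region servers.

```python
class FakeTable:
    """In-memory stand-in for a table client; counts calls as round trips."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.rows.get(key)

    def get_many(self, keys):
        self.round_trips += 1
        return [self.rows.get(k) for k in keys]


def fetch_one_by_one(table, keys):
    """Naive DAO: one call per key."""
    return [table.get(k) for k in keys]


def fetch_batched(table, keys, batch_size=100):
    """Batch-aware DAO: one call per chunk of up to `batch_size` keys."""
    results = []
    for i in range(0, len(keys), batch_size):
        results.extend(table.get_many(keys[i:i + batch_size]))
    return results
```

For 250 keys and a batch size of 100, the naive helper makes 250 calls while the batched helper makes 3, and both return results in key order.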
  • 17. Things We’ve Built
      Data Browsers
      • We’ve built our own data browser for data inspection, and continue to add to it.
      • This isn’t going away any time soon and is highly used.
      • Also kind of necessary if you store complex objects in HBase
      HBase Filters
      • We have some.
      • Didn’t initially, but they have proven quite useful.
  • 18. Questions? Thank You!
      Doug Meil, Chief Software Architect