HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second Generation and Beyond

Presented by: Doug Meil, Explorys

Presentation Transcript

    • page 1 | Evolving a 1st Generation HBase Deployment to 2nd and Beyond. Doug Meil, Chief Software Architect, HBase Committer. HBaseCon 2013
    • page 2 | Company Background
    • page 3 | The Explorys Value-Based Care Big Network
        • Comprehensive view of care, including all venues of delivery, representative of all major diseases, treatments, and demographics
        • 14 integrated delivery networks with over 200 hospitals and 100,000 providers
        • $46 billion in care delivered annually by our network members
        • 24 million truly unique patients
    • page 4 | The Explorys Platform
        • Clinical: EMRs, claims, labs, registries, reported outcomes
        • Operational: providers, org charts, practices, locations, departments, physical assets, and care workflow
        • Financial: private / payer claims, billing, patient accounting systems
        • Care venues: PCP, specialist, hospital, post-acute, long-term, home, mobile
        • Full view of the continuum of care & cost
        • Secure | Cost Effective | Ready Now
        • Start with data and build to insight: completeness, aggregation, patient matching, curation & attribution, data governance, profiling, risk analytics, prediction
    • page 5 | Why HBase?
    • page 6 | HBase at Explorys: Transactional Store, General Store
    • page 7 | High-Level HBase Usage Overview (diagram)
        • Sources 1 to 4 are extracted and loaded into HBase
        • HBase access paths: (1) loads (Puts), (2) reads (Scan), (3) bulk-load, (4) multi-Get, (5) Impala queries
        • MapReduce jobs perform “Late Binding” transformation & standardization, producing generated results / indexes
        • Consumers: Power Search, Patient Chart, and the Explorys apps (Explore, Measure, Registry, Engage)
    • page 8 | Functional Examples
    • page 9 | Measure Calculations: NQF 0575 Example (Simple Example, Condensed)
        • Initial population: patients >= 17 and <= 74 before the start of the measurement period
        • Denominator: 2 encounters (non-acute and outpatient) and an active diagnosis of diabetes, or active meds indicative of diabetes, all within 2 years or during the measurement end-date
        • Exclusions: things like an active diagnosis of gestational diabetes will exclude the patient from the denominator
        • Numerator: most recent HbA1c test < 8%
        • Measures are generated in MapReduce
    • page 10 | Measure Generated Data
        • Measure results are generated to HBase, by measure, attributed provider, patient, reporting window, …
        • Lots of generated data: hundreds of measures generate hundreds of millions of measure results per day
    • page 11 | PowerSearch
        • Heart failure functional example: no evidence of myocardial infarction, THEN a prescription for an angiotensin-converting enzyme (ACE) inhibitor agent, THEN myocardial infarction within one year
        • C. diff. infection functional example: ambulatory encounter, THEN an inpatient encounter, THEN evidence of C. diff. infection within 10 days, THEN an ambulatory encounter within 30 days
        • Summary: NoSQL works well as the backend implementation for these kinds of “queries” because satisfying them takes complex logic
    • page 12 | Technical Details
    • page 13 | Cluster Information
        • Distro: CDH4.2.1
        • Hadoop knobs: HDFS local read shortcut on; HDFS drop-behind reads and read-ahead on; Snappy for MR temp files; read-ahead for MR temp files; MR heartbeat on task finish
    • page 14 | HBase Knobs  We pre-split our tables  We Use KeyPrefixRegionSplitPolicy  Snappy CF compression  HLog compression on  RegionSize still 2-3 Gb (we’ve tested bigger, but staying here for now) HBase Knobs Under Consideration  HBase Checksumming - currently off, but will probably turn on  FAST_DIFF encoding – currently not in use, but will probably use for lookup tables Cluster Information
    • page 15 | What Have We Changed?
        • Compression (HDFS and HBase): LZO to Snappy
        • HBase key redesign: our initial HBase row keys were too beefy and too stringy, so we refactored them to be tighter; column names were a bit too descriptive initially; changes related to the new KeyPrefixRegionSplitPolicy
        • HBase table management: we have a layer of metadata around our MR jobs and apps and re-create our tables from time to time, which makes schema changes easier
    • page 16 | HBase Loading  Index tables loaded with bulk-loading  Experimented with WAL off and deferred log flushing, but bulk-loading is better. HBase Gets  When we started multi-Get didn’t even exist in HBase!  This feature was very much appreciated, our DAO layer was modified to accept batch requests. • Minimizing RPCs makes a difference. SQL? Impala against HBase for internal data investigation What Have We Changed?
    • page 17 | Data Browsers  We’ve built our own data browser for data inspection, and continue to add to it.  This isn’t going away any time soon and is highly used.  Also kind of necessary if you store complex objects in HBase HBase Filters  We have some.  Didn’t initially, but they have proven quite useful. Things We’ve Built
    • page 18 | Questions? Thank You! Doug Meil, Chief Software Architect, Doug.Meil@explorys.com, www.explorys.com
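The condensed NQF 0575 logic on page 9 can be sketched as a per-patient function. This is a minimal illustration under stated assumptions, not the MapReduce implementation the talk describes: the `Patient` record, the encounter and medication strings, and the 730-day approximation of "2 years" are hypothetical, and the denominator is read as (2 qualifying encounters and an active diabetes diagnosis) or (active diabetes meds).

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional, Tuple

# Hypothetical value sets; real measures use coded value sets, not plain strings.
QUALIFYING_ENCOUNTERS = {"non-acute", "outpatient"}
DIABETES_MEDS = {"metformin", "insulin"}


@dataclass
class Patient:
    """Hypothetical, flattened patient record used only for this sketch."""
    birth_date: date
    encounters: List[Tuple[date, str]] = field(default_factory=list)  # (date, encounter type)
    active_diagnoses: List[str] = field(default_factory=list)
    active_meds: List[str] = field(default_factory=list)
    hba1c: List[Tuple[date, float]] = field(default_factory=list)     # (date, HbA1c %)


def age_on(born: date, on: date) -> int:
    """Age in whole years on a given date."""
    return on.year - born.year - ((on.month, on.day) < (born.month, born.day))


def evaluate(p: Patient, period_start: date, period_end: date) -> Optional[str]:
    """Return None if the patient is outside the denominator, otherwise
    'excluded', 'numerator', or 'denominator'."""
    # Initial population: 17..74 years old at the start of the measurement period.
    if not 17 <= age_on(p.birth_date, period_start) <= 74:
        return None

    # Denominator: (2+ qualifying encounters AND an active diabetes diagnosis)
    # OR active diabetes meds, looking back roughly 2 years from the period end.
    lookback_start = period_end - timedelta(days=730)
    visits = [d for d, kind in p.encounters
              if kind in QUALIFYING_ENCOUNTERS and lookback_start <= d <= period_end]
    has_dx = "diabetes" in p.active_diagnoses
    has_meds = any(m in DIABETES_MEDS for m in p.active_meds)
    if not ((len(visits) >= 2 and has_dx) or has_meds):
        return None

    # Exclusions, e.g. gestational diabetes.
    if "gestational diabetes" in p.active_diagnoses:
        return "excluded"

    # Numerator: the most recent HbA1c on or before the period end is below 8%.
    tests = [t for t in p.hba1c if t[0] <= period_end]
    if tests and max(tests, key=lambda t: t[0])[1] < 8.0:
        return "numerator"
    return "denominator"
```

Because each patient is evaluated independently, this kind of function maps naturally onto a per-patient map step; how Explorys actually partitions the work is not described in the slides.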
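The "THEN ... within N days" patterns on page 11 amount to matching an ordered sequence of events with optional time windows. The sketch below does this with a backtracking search over one patient's sorted events. It is an illustration only: the event type strings are hypothetical, each window is assumed to be measured from the previously matched event, and negated steps such as "no evidence of myocardial infarction" are not handled.

```python
from datetime import date
from typing import List, Optional, Sequence, Tuple

Event = Tuple[date, str]           # (event date, event type)
Step = Tuple[str, Optional[int]]   # (event type, max days after previous step, None = no limit)


def matches(events: Sequence[Event], steps: Sequence[Step]) -> bool:
    """True if the steps occur in order, each strictly after the previously matched
    event and within that step's day limit (measured from the previous match)."""
    ordered: List[Event] = sorted(events)

    def search(step_idx: int, start: int, prev: Optional[date]) -> bool:
        if step_idx == len(steps):
            return True  # every step matched
        kind, max_days = steps[step_idx]
        for i in range(start, len(ordered)):
            when, etype = ordered[i]
            if prev is not None and when <= prev:
                continue  # "THEN" requires a later date
            if prev is not None and max_days is not None and (when - prev).days > max_days:
                break  # events are sorted, so every later one is also out of range
            # Try this event for the current step; backtrack if the rest cannot match.
            if etype == kind and search(step_idx + 1, i + 1, when):
                return True
        return False

    return search(0, 0, None)


# The C. diff. example from page 11 (event type strings are hypothetical).
C_DIFF_PATTERN: List[Step] = [
    ("ambulatory", None),
    ("inpatient", None),
    ("c-diff", 10),
    ("ambulatory", 30),
]
```

Backtracking matters because the first ambulatory or inpatient encounter in a record is not necessarily the one that completes the pattern; a greedy first-match scan can miss valid sequences.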
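Page 14 mentions pre-splitting tables and KeyPrefixRegionSplitPolicy. HBase's `KeyPrefixRegionSplitPolicy` truncates the split point it would otherwise choose to a configured prefix length, so rows sharing that prefix never straddle two regions; pre-split keys are supplied at table creation (for example `HBaseAdmin.createTable(desc, splitKeys)` in the 0.94-era Java client). The Python sketch below models only the key arithmetic. The uniformly distributed prefix, the `daughter` helper, and the example row keys are hypothetical, not Explorys' actual key layout.

```python
from typing import List


def presplit_keys(num_regions: int, prefix_len: int) -> List[bytes]:
    """Evenly spaced split keys over a fixed-length, big-endian key prefix.
    Assumes (hypothetically) that row keys begin with a uniformly distributed
    prefix such as a hash bucket; otherwise even spacing gives uneven regions."""
    space = 256 ** prefix_len
    return [(space * i // num_regions).to_bytes(prefix_len, "big")
            for i in range(1, num_regions)]


def prefix_split_point(candidate: bytes, prefix_len: int) -> bytes:
    """Mimic KeyPrefixRegionSplitPolicy: truncate a split point to the
    configured prefix length so rows sharing that prefix stay in one region."""
    return candidate[:prefix_len]


def daughter(row: bytes, split_point: bytes) -> str:
    """Hypothetical helper: rows sorting before the split point stay in region 'A'."""
    return "A" if row < split_point else "B"
```

For example, a split proposed at `b"PAT0042|2013-06-13|lab"` with a 7-byte prefix becomes `b"PAT0042"`, so every `PAT0042` row lands in the same daughter region instead of being divided between the two.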
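The "too beefy and too Stringy" row keys on page 15 matter because HBase stores the row key (along with the column family and qualifier) in every cell, so key bytes are repeated for each column of a row; the same reasoning applies to overly descriptive column names. The sketch below compares a hypothetical human-readable key with a packed binary encoding of the same fields. The field choice and the `EVENT_TYPES` codes are illustrative assumptions.

```python
import struct
from datetime import datetime, timezone

# Hypothetical dictionary mapping verbose type names to compact codes.
EVENT_TYPES = {"LAB_RESULT": 1, "ENCOUNTER": 2}


def string_key(patient_id: int, ts: datetime, event_type: str) -> bytes:
    """A 'beefy, stringy' key: readable, but every field is spelled out."""
    return f"patient:{patient_id:015d}|{ts:%Y-%m-%dT%H:%M:%S}|{event_type}".encode()


def packed_key(patient_id: int, ts: datetime, event_type: str) -> bytes:
    """A tighter key: 8-byte id + 8-byte epoch millis + 2-byte type code.
    Big-endian unsigned fields make byte-wise order match numeric order."""
    millis = int(ts.timestamp() * 1000)
    return struct.pack(">QQH", patient_id, millis, EVENT_TYPES[event_type])


ts = datetime(2013, 6, 13, 10, 15, tzinfo=timezone.utc)
print(len(string_key(12345678, ts, "LAB_RESULT")))  # 54
print(len(packed_key(12345678, ts, "LAB_RESULT")))  # 18
```

HBase sorts rows by unsigned byte-wise comparison, so big-endian unsigned fields keep rows ordered by patient and then by time. The string key only sorts correctly because of its zero-padding and ISO-8601 timestamp, at three times the size.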
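Page 16 credits multi-Get and a batch-aware DAO layer with cutting RPCs. In the Java client, this corresponds to passing a `List<Get>` to the table's `get` method, which groups the requests by region server. The sketch below models only the round-trip count, using an in-memory stand-in table; `FakeTable`, `PatientDao`, and the batch size of 100 are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


class FakeTable:
    """In-memory stand-in for an HBase table that counts client round trips."""

    def __init__(self, rows: Dict[bytes, dict]):
        self.rows = rows
        self.rpcs = 0

    def get(self, row: bytes) -> Optional[dict]:
        self.rpcs += 1  # one round trip per single Get
        return self.rows.get(row)

    def get_many(self, rows: List[bytes]) -> List[Optional[dict]]:
        self.rpcs += 1  # one batched request (a real client fans out per region server)
        return [self.rows.get(r) for r in rows]


class PatientDao:
    """Hypothetical DAO that accepts batch requests instead of single keys."""

    def __init__(self, table: FakeTable, batch_size: int = 100):
        self.table = table
        self.batch_size = batch_size

    def fetch_one_by_one(self, keys: Iterable[bytes]) -> List[Optional[dict]]:
        return [self.table.get(k) for k in keys]

    def fetch_batched(self, keys: Iterable[bytes]) -> List[Optional[dict]]:
        keys = list(keys)
        out: List[Optional[dict]] = []
        # Chunk the request so no single batch grows without bound.
        for i in range(0, len(keys), self.batch_size):
            out.extend(self.table.get_many(keys[i:i + self.batch_size]))
        return out
```

With 250 keys, the one-by-one path costs 250 round trips and the batched path costs 3, while returning the same results in the same order. That difference is the point of "minimizing RPCs makes a difference."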