1. page 1 |
Evolving a 1st Generation HBase Deployment to 2nd and Beyond
Chief Software Architect
3. page 3 |
The Explorys Value Based Care Big Network
Comprehensive view of care, including all venues of delivery, representative of all major diseases, treatments, and demographics
14 integrated delivery networks with over 200 hospitals and $46 billion in care delivered annually by our network members
24 million truly unique patients
4. page 4 |
Start with Data Completeness
Full view of the continuum of care & cost: PCP | Specialist | Hospital
EMRs, claims, labs, registries, rep charts, practices, locations, departments, physical assets, and care
Private / payer claims, billing, patient
Curation & attribution
Secure | Cost Effective | Ready Now
6. page 6 |
HBase at Explorys
Transactional Store | General Store
9. page 9 |
Measures Generated in MapReduce
NQF 0575 Example (Simple Example, Condensed)
Patients >= 17 and <= 74 before the start of the measurement period
2 encounters (non-acute and outpatient) and an active diagnosis of diabetes
Active meds indicative of diabetes
All within 2 years of the measurement end date
Exclusions such as an active diagnosis of gestational diabetes remove the patient from the measure
Most recent HbA1c test < 8%
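The condensed NQF 0575 logic above can be sketched as a per-patient predicate, the kind of function a map task might apply to each patient record. This is a minimal sketch under stated assumptions, not Explorys' implementation or the full measure specification. The `Patient`/`Encounter` shapes, the encounter labels, and the 730-day reading of "within 2 years" are hypothetical. Treating the encounter-plus-diagnosis path and the medication path as alternatives is also an assumption.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional, Tuple

# Assumed reading of "within 2 years": a 730-day lookback from the period end.
LOOKBACK = timedelta(days=730)
QUALIFYING_VISITS = {"outpatient", "non-acute"}  # hypothetical encounter labels


@dataclass
class Encounter:
    day: date
    kind: str


@dataclass
class Patient:  # hypothetical record shape
    birth_date: date
    encounters: List[Encounter] = field(default_factory=list)
    diabetes_dx_dates: List[date] = field(default_factory=list)
    diabetes_med_dates: List[date] = field(default_factory=list)
    gestational_diabetes: bool = False
    hba1c: List[Tuple[date, float]] = field(default_factory=list)  # (day, percent)


def age_on(birth: date, day: date) -> int:
    """Whole years of age on `day`."""
    before_birthday = (day.month, day.day) < (birth.month, birth.day)
    return day.year - birth.year - int(before_birthday)


def evaluate_0575(p: Patient, start: date, end: date) -> Optional[bool]:
    """None = not in the denominator; True/False = HbA1c control met or not."""
    if not 17 <= age_on(p.birth_date, start) <= 74:
        return None
    lo = end - LOOKBACK

    def recent(d: date) -> bool:
        return lo <= d <= end

    visits = sum(1 for e in p.encounters
                 if e.kind in QUALIFYING_VISITS and recent(e.day))
    has_dx = any(recent(d) for d in p.diabetes_dx_dates)
    has_meds = any(recent(d) for d in p.diabetes_med_dates)
    # Assumption: encounters + diagnosis, or medication evidence, qualifies.
    if not ((visits >= 2 and has_dx) or has_meds):
        return None
    if p.gestational_diabetes:  # exclusion removes the patient from the measure
        return None
    labs = [t for t in p.hba1c if start <= t[0] <= end]
    if not labs:
        return False  # no test in the period counts as not controlled
    latest = max(labs, key=lambda t: t[0])
    return latest[1] < 8.0
```

Returning `None` for "not in the denominator" keeps it distinct from `False` ("in the denominator, measure not met"). Results at scale would need that distinction to compute a rate.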
10. page 10 |
Measure Generated Data
Measure Results Generated to HBase
… generated to HBase
Lots of Generated Data
Hundreds of Measures Generate Hundreds of Millions of Measure Results Per Day
11. page 11 |
Heart Failure Functional Example
No evidence of Myocardial Infarction
THEN a prescription for Angiotensin-converting enzyme (ACE) inhibitor agent
THEN Myocardial Infarction within one year
C. Diff. Infection Functional Example
THEN an Inpatient Encounter
THEN evidence of C. Diff. infection within 10 days
THEN an Ambulatory Encounter within 30 days
NoSQL works well as the backend for these kinds of “queries” because satisfying them takes complex logic: ordered sequences of events with time windows between them.
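The "THEN ... within N days" structure above is sequential logic over a patient's ordered event history. The sketch below is a minimal illustration under stated assumptions. It assumes each patient's events are available as `(date, code)` pairs; the codes such as `"inpatient"` and `"c_diff"` are hypothetical. It also assumes each window is measured from the previously matched step, and that "No evidence of ..." means the named event does not occur before the first matched step. It backtracks, because taking the earliest candidate for a step can miss a later chain that does satisfy every window.

```python
from datetime import date
from typing import List, Optional, Sequence, Tuple

Event = Tuple[date, str]          # (day, hypothetical event code)
Step = Tuple[str, Optional[int]]  # (event code, max days after previous step)


def find_sequence(events: Sequence[Event], steps: Sequence[Step],
                  absent_before: Optional[str] = None) -> Optional[List[date]]:
    """Return the dates matching `steps` in order, or None if no chain exists.

    The first step's gap is ignored. Each later step must come later in the
    date-sorted timeline and, if a gap is given, at most that many days after
    the previous matched step. `absent_before` names an event code that must
    not occur before the first matched step.
    """
    timeline = sorted(events)

    def search(i: int, start: int, prev: Optional[date]) -> Optional[List[date]]:
        if i == len(steps):
            return []
        code, max_gap = steps[i]
        for j in range(start, len(timeline)):
            day, c = timeline[j]
            if c != code:
                continue
            if prev is None:
                if absent_before and any(c2 == absent_before and d2 < day
                                         for d2, c2 in timeline):
                    continue
            elif max_gap is not None and (day - prev).days > max_gap:
                break  # timeline is date-sorted, so later candidates are even further away
            rest = search(i + 1, j + 1, day)
            if rest is not None:
                return [day] + rest
        return None

    return search(0, 0, None)
```

For example, the visible steps of the C. Diff. example become `[("inpatient", None), ("c_diff", 10), ("ambulatory", 30)]`. The slide's first step is not shown, so it is omitted here. The Heart Failure example becomes `[("ace_inhibitor", None), ("mi", 365)]` with `absent_before="mi"`.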
13. page 13 |
HDFS short-circuit local reads on
HDFS drop-behind reads and read-ahead on
Snappy compression for MR temp files
Read-ahead for MR temp files
MR heartbeat on task finish
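For reference, the settings above roughly correspond to the properties below. This is a sketch under several assumptions:

- The names are the Hadoop 2-era spellings as we understand them; MRv1 used `mapred.*` spellings for some.
- The 4 MB read-ahead value is only an example.
- Newer releases also require `dfs.domain.socket.path` for short-circuit reads.
- The out-of-band heartbeat flag applies only to the MRv1 TaskTracker.

Check each against your distribution's `*-default.xml` files. The helper just renders a dict in Hadoop's `<configuration>` XML layout.

```python
import xml.etree.ElementTree as ET

# Assumed property names; verify against your Hadoop version's *-default.xml.
HDFS_SITE = {
    "dfs.client.read.shortcircuit": "true",          # local read shortcut
    "dfs.datanode.drop.cache.behind.reads": "true",  # drop-behind reads
    "dfs.datanode.readahead.bytes": "4194304",       # read-ahead (example: 4 MB)
}
MAPRED_SITE = {
    "mapreduce.map.output.compress": "true",         # compress MR temp (map output) files
    "mapreduce.map.output.compress.codec": "org.apache.hadoop.io.compress.SnappyCodec",
    "mapreduce.ifile.readahead": "true",             # read-ahead for MR temp files
    "mapreduce.tasktracker.outofband.heartbeat": "true",  # MRv1 TaskTracker only
}


def to_hadoop_xml(props: dict) -> str:
    """Render name/value pairs as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```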
14. page 14 |
We pre-split our tables
We use KeyPrefixRegionSplitPolicy
Snappy column family (CF) compression
HLog compression on
Region size still 2-3 GB (we’ve tested bigger, but we’re staying here for now)
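Pre-splitting and KeyPrefixRegionSplitPolicy both come down to choosing split keys. The sketch below is a rough pure-Python simulation, not HBase code:

- `presplit_keys` generates evenly spaced one-byte split points, assuming leading key bytes are roughly uniform (for example, hashed).
- `prefix_split_point` mimics what we understand KeyPrefixRegionSplitPolicy to do. It truncates the region's natural midpoint key to a fixed prefix length, so rows sharing that prefix never end up in different daughter regions.

The key layout and the prefix length of 4 are hypothetical.

```python
from typing import List, Tuple


def presplit_keys(num_regions: int) -> List[bytes]:
    """Evenly spaced single-byte split points (num_regions - 1 of them)."""
    step = 256 / num_regions
    return [bytes([int(i * step)]) for i in range(1, num_regions)]


def prefix_split_point(sorted_rows: List[bytes], prefix_length: int) -> bytes:
    """Midpoint row truncated to its first prefix_length bytes."""
    mid = sorted_rows[len(sorted_rows) // 2]
    return mid[:prefix_length]


def daughters(sorted_rows: List[bytes], split: bytes) -> Tuple[List[bytes], List[bytes]]:
    """Rows below the split key go to the first daughter, the rest to the second."""
    return [r for r in sorted_rows if r < split], [r for r in sorted_rows if r >= split]
```

In the usage below, the natural midpoint would be `b"P002-b"`, which would split the `P002` group. Truncating it to `b"P002"` keeps that group whole in the second daughter.

```python
rows = sorted([b"P001-a", b"P002-a", b"P002-b", b"P002-c", b"P003-a"])
low, high = daughters(rows, prefix_split_point(rows, 4))
```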
HBase Knobs Under Consideration
HBase checksums: currently off, but we will probably turn them on
FAST_DIFF encoding: not currently in use, but we will probably use it for lookup tables
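FAST_DIFF builds on the idea of storing each key in a block as a delta against the previous key. The sketch below covers only the shared-prefix part of that idea. It is closer to HBase's PREFIX encoding than to FAST_DIFF, which also diffs other key fields. It shows why sorted lookup-table keys with long common prefixes shrink well. The `icd9:` keys are hypothetical.

```python
from typing import List, Tuple


def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_encode(sorted_keys: List[bytes]) -> List[Tuple[int, bytes]]:
    """Encode each key as (bytes shared with the previous key, remaining suffix)."""
    out, prev = [], b""
    for key in sorted_keys:
        shared = shared_prefix_len(prev, key)
        out.append((shared, key[shared:]))
        prev = key
    return out


def prefix_decode(encoded: List[Tuple[int, bytes]]) -> List[bytes]:
    """Rebuild the keys by re-attaching each suffix to the previous key's prefix."""
    keys, prev = [], b""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys
```

With keys like `icd9:250.00`, `icd9:250.01`, and `icd9:250.02`, every key after the first stores a one-byte suffix plus a length. Decoding must walk the block sequentially from its start, which is the read-side cost these encodings trade for space.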
15. page 15 |
What Have We Changed?
Compression (HDFS and HBase)
HBase Key Redesign
Our initial HBase row keys were too beefy and too stringy.
• Refactored to be tighter.
Column names were initially a bit too descriptive.
Changes related to the new KeyPrefixRegionSplitPolicy.
HBase Table Management
We have a layer of metadata around our MR jobs and apps, and we re-create our tables from time to time, which makes schema changes easier.
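To illustrate "too beefy and too stringy" versus "tighter", consider a hypothetical row key built from a patient id, a measure id, and a date. It is shown first as a delimited string and then packed into fixed-width big-endian binary with `struct`. The field names and widths are assumptions. HBase sorts row keys as raw bytes, and fixed-width big-endian integers keep byte order equal to numeric order. The string form needs zero-padding to sort correctly and costs several times the bytes.

```python
import struct
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
# Hypothetical layout: uint32 patient id, uint16 measure id, uint16 days since 1970
# (the uint16 day count covers dates through 2149; struct.error is raised beyond that).
PACKED = struct.Struct(">IHH")


def string_key(patient_id: int, measure_id: int, day: date) -> bytes:
    """Verbose, delimited key of the kind a first design might use."""
    return f"patient:{patient_id:010d}|measure:{measure_id:06d}|date:{day.isoformat()}".encode()


def packed_key(patient_id: int, measure_id: int, day: date) -> bytes:
    """Compact 8-byte key whose byte order matches numeric order."""
    return PACKED.pack(patient_id, measure_id, (day - EPOCH).days)


def unpack_key(key: bytes):
    """Inverse of packed_key, for debugging and data browsing."""
    patient_id, measure_id, days = PACKED.unpack(key)
    return patient_id, measure_id, EPOCH + timedelta(days=days)
```

Here the string key is 49 bytes against 8 for the packed key. That saving repeats in every cell, because HBase stores the row key with each KeyValue.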
16. page 16 |
What Have We Changed?
Index tables loaded with bulk-loading
Experimented with WAL off and deferred log flushing, but bulk-loading is better.
When we started, multi-Get didn’t even exist in HBase!
This feature was very much appreciated; our DAO layer was modified to accept
• Minimizing RPCs makes a difference.
Impala against HBase for internal data investigation
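Bulk-loading and multi-Get both depend on knowing which region a row key belongs to. In this toy model, region boundaries are a sorted list of start keys, and the keys are hypothetical.

- `partition_for_bulk_load` sorts cells and groups them by region. This is conceptually what HBase's HFileOutputFormat sets up with a total-order partitioner over the table's region start keys, so each reducer writes HFiles for one region.
- `batch_gets` groups single-row lookups by region, so round trips fall from one per row to one per region. A real client groups by region server, which can mean fewer still.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple


def region_index(start_keys: List[bytes], row: bytes) -> int:
    """Index of the region holding `row`; start_keys must be sorted and begin with b""."""
    return bisect_right(start_keys, row) - 1


def partition_for_bulk_load(start_keys: List[bytes],
                            cells: List[Tuple[bytes, bytes]]) -> Dict[int, List[Tuple[bytes, bytes]]]:
    """Group (row, value) cells by region, each group in sorted row order."""
    parts: Dict[int, List[Tuple[bytes, bytes]]] = defaultdict(list)
    for row, value in sorted(cells):
        parts[region_index(start_keys, row)].append((row, value))
    return dict(parts)


def batch_gets(start_keys: List[bytes], rows: List[bytes]) -> Dict[int, List[bytes]]:
    """Group row lookups by region: one batched request per region instead of per row."""
    batches: Dict[int, List[bytes]] = defaultdict(list)
    for row in rows:
        batches[region_index(start_keys, row)].append(row)
    return dict(batches)
```

With three regions starting at `b""`, `b"g"`, and `b"p"`, five lookups spread across all three regions become three batched requests instead of five single-row ones.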
17. page 17 |
Things We’ve Built
We’ve built our own data browser for data inspection and continue to add to it.
This isn’t going away any time soon and is heavily used.
It’s also kind of necessary if you store complex objects in HBase.
We have some.
Didn’t initially, but they have proven quite useful.
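Complex objects stored as serialized bytes are opaque in the HBase shell, which is part of why an inspection tool earns its keep. As a hypothetical illustration, suppose a cell held a zlib-compressed JSON document. That format is an assumption; the slides don't say how objects are serialized. A browser's job for such a cell then reduces to decode-and-pretty-print, with a raw-bytes fallback for anything it doesn't recognize.

```python
import json
import zlib


def encode_object(obj: dict) -> bytes:
    """Hypothetical storage format: compact, key-sorted JSON, then zlib."""
    text = json.dumps(obj, separators=(",", ":"), sort_keys=True)
    return zlib.compress(text.encode("utf-8"))


def render_cell(raw: bytes) -> str:
    """What a data browser might show instead of escaped bytes."""
    try:
        return json.dumps(json.loads(zlib.decompress(raw)), indent=2, sort_keys=True)
    except (zlib.error, ValueError):
        return repr(raw)  # unknown format: fall back to the raw bytes
```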