HBaseCon 2012 | Real-Time and Batch HBase for Healthcare at Explorys



Explorys leverages HBase and the Hadoop stack to power the next generation of Enterprise Performance Management for Healthcare. The Explorys team will present an overview in 3 parts: Explorys functional and technical overview, approaches in MapReduce performance tuning, and system operations for HBase and Hadoop.

  • Performance Management: monitoring and reporting. Configuration Management: automation. Release Management: upgrades and tuning. Teamwork: you’re in this together. Customer Service: understand who you work for.
  • Step 1: Monitor, monitor, MONITOR! Hadoop and HBase ship with native Ganglia reporting. It is reasonably easy to set up, though Ganglia can be finicky. Everyone uses some sort of NMS (Nagios, Zenoss, etc.), so choose your poison. OpenTSDB is great for those who want everything in one place, forever.
    Step 2: Understand what you’re monitoring. If you don’t know what a metric means, look it up! Always be learning. It may take you 20 minutes to figure out what something means, but you’ll know it for next time. Work with customers to understand what’s important to them, too. This doesn’t always mean paying customers, although they are important; it also means other teams in your company.
    Step 3: Act on the data. State-based alerting is easy: any NMS can give you up/down alerts. Data-driven alerts are harder: we have a script that reports when individual TaskTrackers are more than two standard deviations from the mean for the grid. Behavioral monitoring is the goal: “listen for the silence” and report when expected tasks run for too long or don’t run at all. We’re still working on this.
  • Do this when you’re small! No, really. Don’t wait; do it now.
    Consistency is essential. You must trust your platform: you have to know that everything is working. Your customers must trust your platform, too; they’ll try to work around you if you can’t provide stability. Use version control. Manually editing configs will only take you so far, and it breaks down quickly. It’s not about blame, it’s about consistency.
    Choose a methodology and implement it. For parallel execution/distribution, we’ve managed to strong-arm our way using SVN and parallel-ssh, and our arms are tired. With configuration management tools, you change something once and it goes everywhere. That is the difference between a date night and a date with your computer.
  • Upgrade early and often. Test, test, and re-test! The logistics of upgrading can be tough, but it’s worth it.
    Get involved with the community. HBase is constantly evolving, and your feature request might help someone else, too. hbase-user and hbase-dev are very active mailing lists. The HBase developers don’t bite (hard). Case studies and documentation are always welcome.
  • It takes a village… to raise an HBase. Inter-team communication is essential. We’re all part of the effort: administrators, engineers, developers, managers, and end users.
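
Step 3 of the notes above goes beyond up/down checks in two ways. It flags individual TaskTrackers that sit more than two standard deviations from the grid mean, and it aims to "listen for the silence" when expected jobs overrun or never run. Below is a minimal Python sketch of both checks. It is not the Explorys script. It assumes the per-tracker metric and the job timings have already been pulled from your NMS (Ganglia, OpenTSDB, and so on) into plain dictionaries. The function names, tracker names, job names, and limits are all hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev
from typing import List, Mapping, Optional


def find_outlier_trackers(metric_by_tracker: Mapping[str, float],
                          threshold: float = 2.0) -> List[str]:
    """Data-driven alert: trackers whose metric lies more than `threshold`
    standard deviations from the grid-wide mean."""
    values = list(metric_by_tracker.values())
    if len(values) < 2:
        return []  # no meaningful spread with fewer than two trackers
    mu = mean(values)
    sigma = pstdev(values)  # population std dev over the whole grid
    if sigma == 0:
        return []  # every tracker reports the same value
    return sorted(host for host, value in metric_by_tracker.items()
                  if abs(value - mu) > threshold * sigma)


def find_silent_jobs(last_success: Mapping[str, datetime],
                     max_interval: Mapping[str, timedelta],
                     now: datetime) -> List[str]:
    """Behavioral alert ("listen for the silence"): expected jobs that have
    not completed within their allowed interval, or have never completed."""
    silent = []
    for job, interval in max_interval.items():
        finished: Optional[datetime] = last_success.get(job)
        if finished is None or now - finished > interval:
            silent.append(job)
    return sorted(silent)


def find_overrunning_jobs(running_since: Mapping[str, datetime],
                          max_runtime: Mapping[str, timedelta],
                          now: datetime) -> List[str]:
    """Behavioral alert: running jobs that have exceeded their expected
    runtime. Jobs with no configured limit are ignored."""
    return sorted(job for job, started in running_since.items()
                  if job in max_runtime and now - started > max_runtime[job])
```

Suppose nine TaskTrackers report 100.0 for a metric and a tenth reports 400.0. In that case `find_outlier_trackers` returns `['tt10']`. One caveat applies to small grids. With the population standard deviation, no single value among n trackers can lie more than sqrt(n - 1) deviations from the mean. On a grid of five or fewer trackers, a strict two-sigma rule therefore never fires, and small grids need a lower threshold or a different test.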

    1. Mixing Real Time and Batch with HBase. HBaseCon 2012. Doug Meil, Casey Stella, Dan Washburn
    2. Explorys Technical Overview: Doug Meil, Chief Software Architect, HBase Committer
    3. Healthcare organizations that leverage BIG DATA and take action on it will survive and thrive.
    4. Healthcare’s Data Overload: The volume of data… plus the variety of systems and sources of data… is piling up at a velocity… that traditional data approaches were not designed to support.
    5. Explorys Provides:
        • A platform to leverage data across systems, venues, and partners to drive care quality, cost efficiency, and risk mitigation.
        • Rapidly deployable Software-as-a-Service apps for leadership and providers.
        • Extensible Data-as-a-Service functions to support healthcare IT and business intelligence.
    6. Explorys’ Customers and Patient Span by ZIP Code: 80 hospitals, hundreds of ambulatory practices, and thousands of providers caring for 14 million patients.
    8. 44 billion curated clinical, operational, and financial data points: 44,000,131,117 and counting.
    9. What Explorys Does: Platform and Apps. Platform: DataGrid. The applications:
        • Explore: High-speed search and population exploration.
        • Measure: Provider and group level performance metrics and benchmarks.
        • Registry: Automated care and disease registries.
        • Engage: Rule-based patient and provider workflow management and outreach.
    10. What Explorys Does: Platform and Apps (video demo): DataGrid
    11. HBase and MR at Explorys: Casey Stella, Senior Software Engineer
    12. MapReduce Strategies
        • HBase at Explorys: HBase is our transactional data store, and keys group data from a given patient together.
        • MR jobs process data from HBase: they transform and report data, sample data, and emit data into a form that applications can access efficiently.
        • Naïve MR jobs cause much, much stress.
    13. Local Aggregation
        • Locally aggregate processing of a patient in an individual mapper.
        • Fewer keys and chunkier values.
        • Sorting is cheaper.
        • Careful: patient data can span tasks, and there are potential scalability issues.
        • Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer covers this technique very well.
        [Diagram: Map Task 1 holds Patient 1’s Encounter, Observation, Observation, and Diagnosis records; Map Task 2 holds Patient 1’s Drug record followed by Patient 2’s Encounter, Observation, and Observation records.]
    14. MapReduce and Junior Engineers
        • MapReduce is distributed computing for the masses, but the masses still do stupid things, and they still have to write MR jobs to do their job.
        • Safety at Explorys: most of our engineers start without prior experience in Hadoop or HBase. Giving them a book only goes so far. We need a combination of process and technology, and it is still an uphill battle.
    15. MapReduce and Junior Engineers
        • Process: jobs are tested in the development grid with real data, and most MapReduce jobs are pushed into teams where MR and HBase education are very important.
        • Technology: we constructed an API wrapping the Hadoop mapreduce package. It provides an alternate job builder interface with added type safety and adds the ability to swap out different contexts at launch time.
    16. Building a Solid Foundation: Daniel Washburn, Systems Engineer
    17. Key Components: Performance Management, Configuration Management, Release Management, Teamwork
    18. Performance Management
        • Collect as much as you can: Ganglia, OpenTSDB, Nagios, Zenoss.
        • Understand what you’re monitoring: if you don’t know what a metric means, look it up! Work with customers to understand what’s important to them.
        • Act on it: state-based alerting is where many people stop; a data-driven, predictive approach is the goal.
        • Create dashboards.
    19. Configuration Management
        • Consistency is essential. Do this while you’re still small!
        • Choose a methodology: parallel execution/distribution, or a configuration management engine.
        • Implement it: parallel-ssh or MCollective for parallel execution, Puppet for configuration management.
    20. Release Management
        • Upgrade early and often: become comfortable with the process. The logistics of upgrading can be tough, but it’s worth it.
        • Get involved with the community: HBase is constantly evolving, the mailing lists and IRC channel are very active, and your contribution might help someone else.
    21. Teamwork
        • It takes a village… to raise an HBase.
        • Effective communication is essential. We’re all part of the effort: administrators, engineers, developers, and end users.
    22. Thank You! Questions?
        • Doug Meil, Chief Software Architect, Doug.Meil@explorys.com
        • Casey Stella, Senior Software Engineer, Casey.Stella@explorys.com
        • Daniel Washburn, Systems Engineer, Daniel.Washburn@explorys.com
        • www.explorys.com
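
Slide 13 describes local aggregation, which Lin and Dyer's book calls in-mapper combining. Because Explorys row keys keep a patient's records together, a mapper can fold a run of one patient's records into a single output pair instead of emitting one pair per record. The sketch below simulates this in plain Python rather than through the Hadoop or HBase APIs. The row layout, record types, and function names are hypothetical and modeled on the slide's diagram. The reduce step still merges partial results, because, as the slide warns, one patient's data can span two map tasks.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[str, str]  # (patient_id, record_type); hypothetical row layout

# The two map tasks from the slide's diagram. Rows arrive ordered by row key,
# so each patient's records are contiguous within a task.
MAP_TASK_1 = [("patient1", "Encounter"), ("patient1", "Observation"),
              ("patient1", "Observation"), ("patient1", "Diagnosis")]
MAP_TASK_2 = [("patient1", "Drug"), ("patient2", "Encounter"),
              ("patient2", "Observation"), ("patient2", "Observation")]


def naive_map(rows: Iterable[Row]) -> Iterator[Tuple[str, Counter]]:
    """Emit one pair per record: many small keys to shuffle and sort."""
    for patient, record_type in rows:
        yield patient, Counter({record_type: 1})


def locally_aggregating_map(rows: Iterable[Row]) -> Iterator[Tuple[str, Counter]]:
    """Emit one pair per run of a patient's records: fewer keys, chunkier values."""
    for patient, group in groupby(rows, key=itemgetter(0)):
        yield patient, Counter(record_type for _, record_type in group)


def reduce_partials(pairs: Iterable[Tuple[str, Counter]]) -> Dict[str, Counter]:
    """Merge partial aggregates per patient; still required because one
    patient's records can be split across map tasks."""
    totals: Dict[str, Counter] = {}
    for patient, partial in pairs:
        totals.setdefault(patient, Counter()).update(partial)
    return totals
```

Across the two map tasks from the diagram, the naive mapper emits eight pairs and the aggregating mapper emits three. Both yield the same per-patient totals after the reduce step. The design relies on rows arriving sorted by patient, because `itertools.groupby` only merges adjacent records. If the per-patient value held raw records instead of counts, a single very large patient could exhaust a mapper's memory. That is one plausible reading of the scalability caution on the slide.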