HBaseCon 2012 | Real-Time and Batch HBase for Healthcare at Explorys

Explorys leverages HBase and the Hadoop stack to power the next generation of Enterprise Performance Management for Healthcare. The Explorys team will present an overview in 3 parts: Explorys functional and technical overview, approaches in MapReduce performance tuning, and system operations for HBase and Hadoop.

  1. Mixing Real-Time and Batch with HBase: HBaseCon 2012, Doug Meil, Casey Stella, Dan Washburn
  2. Explorys Technical Overview: Doug Meil, Chief Software Architect, HBase Committer
  3. Healthcare organizations that leverage BIG DATA and take action on it will survive and thrive.
  4. Healthcare's Data Overload: the volume of data… plus the variety of systems and sources of data… is piling up at a velocity… that traditional data approaches were not designed to support.
  5. Explorys provides:
      • A platform to leverage data across systems, venues, and partners to drive care quality, cost efficiency, and risk mitigation.
      • Rapidly deployable Software-as-a-Service apps for leadership and providers.
      • Extensible Data-as-a-Service functions to support healthcare IT and business intelligence.
  6. Explorys' Customers and Patient Span by ZIP Code: 80 hospitals, hundreds of ambulatory practices, and thousands of providers caring for 14 million patients.
  7. [Image-only slide]
  8. 44 billion curated clinical, operational, and financial data points: 44,000,131,117 and counting.
  9. What Explorys Does: Platform and Apps
      • Platform: DataGrid
      • The Applications
          • Explore: High-speed search and population exploration.
          • Measure: Provider- and group-level performance metrics and benchmarks.
          • Registry: Automated disease management registries.
          • Engage: Rule-based care and patient & provider workflow and outreach.
  10. What Explorys Does: Platform and Apps (video demo of DataGrid)
  11. HBase and MR at Explorys: Casey Stella, Senior Software Engineer
  12. Map Reduce Strategies
      • HBase at Explorys
          • HBase is our transactional data store
          • Keys group data from a given patient together
      • MR jobs process data from HBase
          • Transform and report data
          • Sample data
          • Emit data into a form that applications can access efficiently
      • Naïve MR jobs cause much, much stress
  13. Local Aggregation
      • Locally aggregate processing of a patient in an individual mapper
          • Fewer keys and chunkier values
          • Sorting is cheaper
      • Careful
          • Patient data can span tasks
          • Potential scalability issues
      • Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer covers this technique very well
      [Diagram: Map Task 1 reads Patient 1's Encounter, two Observations, and Diagnosis; Map Task 2 reads Patient 1's Drug record, then Patient 2's Encounter and two Observations.]
  14. Map Reduce and Junior Engineers
      • Map Reduce is distributed computing for the masses
          • Masses still do stupid things
          • Masses still have to write MR jobs to do their job
      • Safety at Explorys
          • Most of our engineers start without prior experience in Hadoop or HBase
          • Giving them a book only goes so far
          • Need a combination of process and technology
          • Still an uphill battle
  15. Map Reduce and Junior Engineers
      • Process
          • Jobs are tested in a development grid with real data
          • Most MapReduce jobs are pushed into teams where MR and HBase education are very important
      • Technology
          • Constructed an API wrapping the Hadoop mapreduce package
          • Alternate job builder interface with added type safety
          • Adds the ability to swap out different contexts at launch time
  16. Building a Solid Foundation: Daniel Washburn, Systems Engineer
  17. Key Components: Performance Management, Configuration Management, Release Management, Teamwork
  18. Performance Management
      • Collect as much as you can
          • Ganglia, OpenTSDB
          • Nagios, Zenoss
      • Understand what you're monitoring
          • If you don't know what a metric means, look it up!
          • Work with customers to understand what's important to them
      • Act on it
          • State-based alerting is where many people stop
          • A data-driven, predictive approach is the goal
          • Create dashboards
  19. Configuration Management
      • Consistency is essential
          • Do this while you're still small!
      • Choose a methodology
          • Parallel execution/distribution
          • Configuration management engine
      • Implement it
          • Parallel-ssh, mcollective
          • Puppet
  20. Release Management
      • Upgrade early and often
          • Become comfortable with the process
          • The logistics of upgrading can be tough, but it's worth it
      • Get involved with the community
          • HBase is constantly evolving
          • The mailing lists and IRC channel are very active
          • Your contribution might help someone else
  21. Teamwork
      • It takes a village…
          • … to raise an HBase
      • Effective communication is essential
      • We're all part of the effort
          • Administrators
          • Engineers
          • Developers
          • End users
  22. Thank You! Questions?
      • Doug Meil, Chief Software Architect, Doug.Meil@explorys.com
      • Casey Stella, Senior Software Engineer, Casey.Stella@explorys.com
      • Daniel Washburn, Systems Engineer, Daniel.Washburn@explorys.com
      • www.explorys.com
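Slide 12 says only that HBase keys group a given patient's data together. A minimal sketch of one such key layout follows; the field order, the field widths, and the `make_row_key`/`patient_of` helpers are hypothetical, not Explorys' actual schema. It relies on one documented HBase property: rows are sorted lexicographically by their row-key bytes. That is why the patient ID is packed as a fixed-width big-endian integer; as decimal strings, `"10"` would sort before `"2"`.

```python
import struct

def make_row_key(patient_id: int, record_type: str, event_ts: int) -> bytes:
    """Build a hypothetical row key: patient ID, then record type, then timestamp.

    HBase sorts rows by comparing row-key bytes, so a fixed-width big-endian
    patient prefix keeps all of one patient's rows next to each other and
    orders patients numerically.
    """
    if patient_id < 0 or event_ts < 0:
        raise ValueError("patient_id and event_ts must be non-negative")
    return (struct.pack(">Q", patient_id)             # 8-byte patient prefix
            + record_type.encode("utf-8") + b"\x00"   # record type, NUL-terminated
            + struct.pack(">Q", event_ts))            # 8-byte timestamp

def patient_of(row_key: bytes) -> int:
    """Recover the patient ID from the 8-byte prefix."""
    return struct.unpack(">Q", row_key[:8])[0]

rows = [
    make_row_key(10, "Encounter", 1_000),
    make_row_key(2, "Observation", 5_000),
    make_row_key(10, "Diagnosis", 2_000),
    make_row_key(2, "Encounter", 4_000),
]
print([patient_of(k) for k in sorted(rows)])  # [2, 2, 10, 10]
```

Because one patient's rows are contiguous, a scan over a key range hands a map task each patient's records together, which is what makes per-patient processing in a single mapper possible.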
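The local-aggregation technique on slide 13 is what Lin and Dyer call in-mapper combining. The simulation below runs without Hadoop, assuming each map task sees its rows sorted by patient (as a scan over a contiguous row-key range would deliver them) and counting record types as a stand-in for real per-patient processing; `map_task` and `reduce_all` are hypothetical names. The reduce side still merges partial results because, as the slide warns, one patient's data can span two tasks.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[int, str]  # (patient_id, record_type), in row-key order

def map_task(rows: Iterable[Row]) -> Iterator[Tuple[int, Counter]]:
    """Emit one aggregated value per patient instead of one pair per row.

    Assumes the task's rows arrive sorted by patient, as a scan over a
    contiguous row-key range would deliver them.
    """
    current, counts = None, Counter()
    for patient_id, record_type in rows:
        if patient_id != current:
            if current is not None:
                yield current, counts      # patient changed: flush its aggregate
            current, counts = patient_id, Counter()
        counts[record_type] += 1
    if current is not None:
        yield current, counts              # end of task: flush the last patient

def reduce_all(emitted: Iterable[Tuple[int, Counter]]) -> Dict[int, Counter]:
    """Merge partial aggregates, since one patient's rows can span two tasks."""
    merged: Dict[int, Counter] = {}
    for patient_id, counts in emitted:
        merged.setdefault(patient_id, Counter()).update(counts)
    return merged

# The two map tasks from the slide's diagram: Patient 1 spans both.
task1 = [(1, "Encounter"), (1, "Observation"), (1, "Observation"), (1, "Diagnosis")]
task2 = [(1, "Drug"), (2, "Encounter"), (2, "Observation"), (2, "Observation")]
emitted = list(map_task(task1)) + list(map_task(task2))
print(len(emitted))                                  # 3 (instead of 8 per-row pairs)
merged = reduce_all(emitted)
print(merged[1]["Drug"], merged[2]["Observation"])   # 1 2
```

In a real Hadoop job the final flush would belong in the mapper's `cleanup` method, which the framework calls once after the last `map` call for the task.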
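Slide 15 mentions an Explorys API that wraps Hadoop's mapreduce package with a more type-safe job builder and the ability to swap contexts at launch time. That API is not described further, so this sketch is entirely hypothetical: `JobBuilder`, `Job`, `Stage`, and `InMemoryContext` are illustrative names. It shows the two ideas in miniature. The builder refuses to produce a job whose mapper output types differ from the reducer input types, and the same job can be launched against whichever context is passed in. A Java implementation would more likely express these checks with generics; here they are runtime checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

@dataclass(frozen=True)
class Stage:
    fn: Callable[..., Iterable[tuple]]
    in_types: Tuple[type, type]   # (key type, value type) consumed
    out_types: Tuple[type, type]  # (key type, value type) emitted

def _check(types: Tuple[type, type], key, value) -> None:
    """Runtime guard: reject a pair whose types differ from the declaration."""
    if not (isinstance(key, types[0]) and isinstance(value, types[1])):
        raise TypeError(f"expected {types}, got ({type(key)}, {type(value)})")

@dataclass(frozen=True)
class Job:
    name: str
    mapper: Stage
    reducer: Stage

    def launch(self, context, records):
        # The execution context is chosen at launch time, not at build time.
        return context.run(self, records)

class JobBuilder:
    def __init__(self, name: str):
        self.name = name
        self._mapper = None
        self._reducer = None

    def mapper(self, fn, in_types, out_types) -> "JobBuilder":
        self._mapper = Stage(fn, in_types, out_types)
        return self

    def reducer(self, fn, in_types, out_types) -> "JobBuilder":
        self._reducer = Stage(fn, in_types, out_types)
        return self

    def build(self) -> Job:
        if self._mapper is None or self._reducer is None:
            raise ValueError(f"{self.name}: both mapper and reducer are required")
        # Build-time check: the mapper must emit what the reducer consumes.
        if self._mapper.out_types != self._reducer.in_types:
            raise TypeError(f"{self.name}: mapper emits {self._mapper.out_types}, "
                            f"reducer expects {self._reducer.in_types}")
        return Job(self.name, self._mapper, self._reducer)

class InMemoryContext:
    """Runs a job in-process; a cluster context would expose the same run()."""
    def run(self, job: Job, records: Iterable[tuple]) -> List[tuple]:
        groups: Dict = {}
        for key, value in records:
            _check(job.mapper.in_types, key, value)
            for out_key, out_value in job.mapper.fn(key, value):
                _check(job.mapper.out_types, out_key, out_value)
                groups.setdefault(out_key, []).append(out_value)
        results = []
        for key in sorted(groups):  # mimic the shuffle's sort by key
            for out_key, out_value in job.reducer.fn(key, groups[key]):
                _check(job.reducer.out_types, out_key, out_value)
                results.append((out_key, out_value))
        return results

job = (JobBuilder("records-per-patient")
       .mapper(lambda patient, rtype: [(patient, 1)], (int, str), (int, int))
       .reducer(lambda patient, ones: [(patient, sum(ones))], (int, int), (int, int))
       .build())
print(job.launch(InMemoryContext(), [(2, "Drug"), (1, "Encounter"), (2, "Observation")]))
# [(1, 1), (2, 2)]
```

A cluster-backed context exposing the same `run(job, records)` method could be swapped in without touching the job definition; one is not shown here.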
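Slide 18 contrasts state-based alerting with a data-driven, predictive approach but does not say which model Explorys used. As one simple illustration, assuming a steadily growing metric such as a hypothetical disk-usage percentage, a least-squares trend line can estimate how long remains before a threshold is crossed, while a state-based check stays silent until the threshold is already breached. The function names and sample values are hypothetical.

```python
from typing import Optional, Sequence

def state_alert(latest: float, threshold: float) -> bool:
    """State-based: fires only once the metric is already past the threshold."""
    return latest >= threshold

def hours_until_threshold(hours: Sequence[float], values: Sequence[float],
                          threshold: float) -> Optional[float]:
    """Predictive: fit a least-squares line and extrapolate to the threshold.

    Returns None when the trend is flat or falling (no crossing predicted),
    and 0.0 when the fitted line has already crossed it.
    """
    n = len(hours)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_t = sum(hours) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in hours)
    if var_t == 0:
        raise ValueError("sample times must differ")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(hours, values)) / var_t
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    crossing = (threshold - intercept) / slope   # time at which the line hits the threshold
    return max(0.0, crossing - hours[-1])

usage = [70.0, 72.0, 74.0, 76.0]   # hypothetical disk usage (%), sampled hourly
hours = [0.0, 1.0, 2.0, 3.0]
print(state_alert(usage[-1], 90.0))               # False: nothing fires yet
print(hours_until_threshold(hours, usage, 90.0))  # 7.0
```

Linear extrapolation is only a starting point; metrics with daily cycles or bursts would need a model that fits their shape.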
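Slide 19 stresses configuration consistency. Puppet enforces a declared state; a complementary check is to compare what is actually deployed. This sketch assumes the contents of one config file have already been collected from each host (for example with parallel-ssh, not shown) and reports the hosts whose SHA-256 digest differs from the most common version. The host names and the `config_digest`/`find_drift` helpers are hypothetical; `hbase.regionserver.handler.count` is a real HBase setting used only as sample content.

```python
import hashlib
from collections import Counter
from typing import Dict, List

def config_digest(contents: str) -> str:
    """Fingerprint a config file so hosts can be compared cheaply."""
    return hashlib.sha256(contents.encode("utf-8")).hexdigest()

def find_drift(configs_by_host: Dict[str, str]) -> List[str]:
    """Return hosts whose config differs from the majority (most common) version."""
    digests = {host: config_digest(text) for host, text in configs_by_host.items()}
    if not digests:
        return []
    baseline, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(host for host, d in digests.items() if d != baseline)

good = ("<property><name>hbase.regionserver.handler.count</name>"
        "<value>30</value></property>")
stale = good.replace("30", "10")   # one host missed the last rollout
print(find_drift({"rs1": good, "rs2": good, "rs3": stale}))  # ['rs3']
```

Treating the majority version as the baseline is a simplification; with a configuration management engine in place, the digest of the version it declares is the better reference.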
