HBaseCon 2012 | Real-Time and Batch HBase for Healthcare at Explorys

  1. Mixing Real Time and Batch with HBase. HBaseCon 2012. Doug Meil, Casey Stella, Dan Washburn
  2. Explorys Technical Overview. Doug Meil, Chief Software Architect, HBase Committer
  3. Healthcare organizations that leverage BIG DATA and take action on it will survive and thrive.
  4. Healthcare’s Data Overload: The volume of data… plus the variety of systems and sources of data… is piling up at a velocity… that traditional data approaches were not designed to support.
  5. Explorys Provides...
      - A platform to leverage data across systems, venues, and partners to drive care quality, cost efficiency, and risk mitigation.
      - Rapidly deployable Software-as-a-Service apps for leadership and providers.
      - Extensible Data-as-a-Service functions to support healthcare IT and business intelligence.
  6. Explorys’ Customers and Patient Span by ZIP Code: 80 hospitals, hundreds of ambulatory practices, and thousands of providers caring for 14 million patients.
  8. 44 billion curated clinical, operational, and financial data points: 44,000,131,117 and counting.
  9. What Explorys Does: Platform and Apps
      - Platform: DataGrid
      - The Applications
          - Explore: High-speed search and population exploration.
          - Measure: Provider and group level performance metrics and benchmarks.
          - Registry: Automated care and disease management registries.
          - Engage: Rule-based patient & provider workflow and outreach.
  10. What Explorys Does: Platform and Apps (video demo of DataGrid)
  11. HBase and MR at Explorys. Casey Stella, Senior Software Engineer
  12. Map Reduce Strategies
      - HBase at Explorys
          - HBase is our transactional data store
          - Keys group data from a given patient together
      - MR jobs process data from HBase
          - Transform data and report data
          - Sample data
          - Emit data into a form which can be accessed efficiently from applications
      - Naïve MR jobs cause much, much stress
  13. Local Aggregation
      - Locally aggregate processing of a patient in an individual mapper
          - Fewer keys and chunkier values
          - Sorting is cheaper
      - Careful
          - Patient data can span tasks
          - Potential scalability issues
      - Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer covers this technique very well
      [Diagram: Map Task 1 holds Patient 1: Encounter, Observation, Observation, Diagnosis; Map Task 2 holds Patient 1: Drug, then Patient 2: Encounter, Observation, Observation.]
  14. Map Reduce and Junior Engineers
      - Map Reduce is distributed computing for the masses
          - Masses still do stupid things
          - Masses still have to write MR jobs to do their job
      - Safety at Explorys
          - Most of our engineers start without prior experience in Hadoop or HBase
          - Giving them a book only goes so far
          - Need a combination of process and technology
          - Still an uphill battle
  15. Map Reduce and Junior Engineers
      - Process
          - Jobs are tested in a development grid with real data
          - Most MapReduce jobs are pushed into teams where MR and HBase education is very important
      - Technology
          - Constructed an API wrapping the Hadoop mapreduce package
          - Alternate job builder interface with added type safety
          - Adds the ability to swap out different contexts at launch time
  16. Building a Solid Foundation. Daniel Washburn, Systems Engineer
  17. Key Components: Performance Management, Configuration Management, Release Management, Teamwork
  18. Performance Management
      - Collect as much as you can
          - Ganglia, OpenTSDB
          - Nagios, Zenoss
      - Understand what you’re monitoring
          - If you don’t know what a metric means, look it up!
          - Work with customers to understand what’s important to them
      - Act on it
          - State-based alerting is where many people stop
          - A data-driven, predictive approach is the goal
          - Create dashboards
  19. Configuration Management
      - Consistency is essential
          - Do this while you’re still small!
      - Choose a methodology
          - Parallel execution/distribution
          - Configuration management engine
      - Implement it
          - parallel-ssh, mcollective
          - Puppet
  20. Release Management
      - Upgrade early and often
          - Become comfortable with the process
          - The logistics of upgrading can be tough, but it’s worth it
      - Get involved with the community
          - HBase is constantly evolving
          - The mailing lists and IRC channel are very active
          - Your contribution might help someone else
  21. Teamwork
      - It takes a village… to raise an HBase
      - Effective communication is essential
      - We’re all part of the effort
          - Administrators
          - Engineers
          - Developers
          - End users
  22. Thank You! Questions? Doug Meil, Chief Software Architect; Casey Stella, Senior Software Engineer; Daniel Washburn, Systems Engineer
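Slides 12 and 13 describe row keys that keep a patient's data together and mappers that aggregate each patient locally before emitting (the technique Lin and Dyer call in-mapper combining). The following is a minimal Python sketch of that idea under stated assumptions, not Explorys' code: the row-key layout (zero-padded patient id, then timestamp, then record type), the function names, and counting records per type are all hypothetical, and plain lists stand in for an HBase scan and Hadoop's shuffle. The demo reproduces the slide 13 diagram, where Patient 1's rows are split across two map tasks.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def make_row_key(patient_id: int, ts: int, record_type: str) -> str:
    """Hypothetical row key: patient id first so one patient's rows are
    contiguous; zero-padding keeps byte order equal to numeric order."""
    return f"{patient_id:010d}|{ts:013d}|{record_type}"


def parse_row_key(row_key: str) -> Tuple[int, str]:
    patient, _ts, record_type = row_key.split("|")
    return int(patient), record_type


def map_task(row_keys: Iterable[str]) -> Iterator[Tuple[int, Counter]]:
    """Local aggregation: emit one (patient, counts) pair per run of a patient's
    rows instead of one pair per row. Assumes rows arrive in key order."""
    current: Optional[int] = None
    counts: Counter = Counter()
    for key in row_keys:
        patient, record_type = parse_row_key(key)
        if patient != current:
            if current is not None:
                yield current, counts  # flush the previous patient
            current, counts = patient, Counter()
        counts[record_type] += 1
    if current is not None:
        yield current, counts  # final flush, as Mapper.cleanup() would do


def reduce_phase(pairs: Iterable[Tuple[int, Counter]]) -> Dict[int, Counter]:
    """A patient can span map tasks, so partial counts must still be merged."""
    merged: Dict[int, Counter] = defaultdict(Counter)
    for patient, counts in pairs:
        merged[patient].update(counts)
    return dict(merged)


if __name__ == "__main__":
    # Same split as the slide 13 diagram: Patient 1's Drug row lands in task 2.
    rows: List[str] = [
        make_row_key(1, 100, "Encounter"), make_row_key(1, 101, "Observation"),
        make_row_key(1, 102, "Observation"), make_row_key(1, 103, "Diagnosis"),
        make_row_key(1, 104, "Drug"), make_row_key(2, 200, "Encounter"),
        make_row_key(2, 201, "Observation"), make_row_key(2, 202, "Observation"),
    ]
    partials = list(map_task(rows[:4])) + list(map_task(rows[4:]))
    print(len(partials))  # 3 emitted pairs instead of 8
    print(reduce_phase(partials)[1])
```

Task 1 emits a single pair and task 2 emits two (Patient 1's leftover Drug row, then Patient 2), so the shuffle carries three chunky values instead of eight single-row ones, yet the reducer must still merge Patient 1's two partials. This is the "patient data can span tasks" caveat from the slide. The zero-padding matters because HBase orders row keys by their raw bytes, so an unpadded key starting with `10` would sort before one starting with `2`.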

Editor's Notes

  1. Performance Management: monitoring and reporting.
     Configuration Management: automation.
     Release Management: upgrades and tuning.
     Teamwork: you’re in this together.
     Customer Service: understand who you work for.
  2. Step 1: monitor, Monitor, MONITOR!
      - Hadoop and HBase ship with native Ganglia reporting. It is reasonably easy to set up, but Ganglia can be finicky.
      - Nagios, Zenoss, etc.: everyone uses some sort of NMS (network management system). Choose your poison.
      - OpenTSDB is great for those who want everything in one place, forever.
     Step 2: Understand what you’re monitoring
      - If you don’t know what a metric means, look it up! Always be learning. It may take you 20 minutes to figure out what something means, but you’ll know it for next time.
      - Work with customers to understand what’s important to them, too. This doesn’t always mean paying customers, although they are important; it also means other teams in your company.
     Step 3: Act on the data
      - State-based alerting is easy: any NMS can give you up/down alerts.
      - Data-driven alerts are harder. We have a script that reports when individual task trackers are more than 2 standard deviations from the mean for the grid.
      - Behavioral monitoring is the goal: “listen for the silence” by reporting when expected tasks run for too long, or don’t run at all. We’re still working on this.
  3. Do this when you’re small!
      - No, really. Don’t wait. Do it now.
     Consistency is essential
      - You must trust your platform. You have to know that everything is working.
      - Your customers must trust your platform. They’ll try to work around you if you can’t provide stability.
      - Use version control. Manually editing configs will only take you so far; it breaks down quickly. It’s not about blame, it’s about consistency.
     Choose a methodology and implement it
      - Parallel execution/distribution: we’ve managed to strong-arm our way using SVN and parallel-ssh. Our arms are tired.
      - Configuration management: configuration management tools mean you change it once and it goes everywhere. That means the difference between a date night and a date with your computer.
  4. Upgrade early and often
      - Test, test, and re-test!
      - The logistics of upgrading can be tough, but it’s worth it.
     Get involved with the community
      - HBase is constantly evolving.
      - Your feature request might help someone else, too.
      - hbase-user and hbase-dev are very active mailing lists.
      - The HBase developers don’t bite (hard).
      - Case studies and documentation are always welcome.
  5. It takes a village… to raise an HBase
      - Inter-team communication is essential
      - We’re all part of the effort: administrators, engineers, developers, managers, end users
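Editor's note 2 describes two data-driven checks: a script that reports task trackers more than two standard deviations from the grid mean, and behavioral "listen for the silence" monitoring for expected tasks that run too long or not at all. Below is a hypothetical Python sketch of both. Which per-tracker metric to compare, the use of the population standard deviation, the `JobState` record, and the job names and schedules are all assumptions, not details from the talk.

```python
import statistics
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def outlier_trackers(metric_by_tracker: Dict[str, float], k: float = 2.0) -> List[str]:
    """Return trackers whose metric is more than k population standard
    deviations away from the grid-wide mean (the metric itself is up to you)."""
    values = list(metric_by_tracker.values())
    if len(values) < 2:
        return []  # nothing to compare against
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # every tracker reports the same value
    return sorted(name for name, value in metric_by_tracker.items()
                  if abs(value - mean) > k * stdev)


@dataclass
class JobState:
    last_start: Optional[datetime]  # None if the job has never started
    running: bool                   # True while the latest run is in progress


def silence_alerts(states: Dict[str, JobState],
                   expected_every: Dict[str, timedelta],
                   max_runtime: Dict[str, timedelta],
                   now: datetime) -> Dict[str, str]:
    """'Listen for the silence': alert on expected jobs that never ran,
    missed their schedule, or have been running too long."""
    alerts: Dict[str, str] = {}
    for job, every in expected_every.items():
        state = states.get(job)
        if state is None or state.last_start is None:
            alerts[job] = "never ran"
        elif state.running and now - state.last_start > max_runtime.get(job, every):
            alerts[job] = "running too long"
        elif not state.running and now - state.last_start > every:
            alerts[job] = "missed schedule"
    return alerts


if __name__ == "__main__":
    # Hypothetical per-tracker metric, e.g. average map-task seconds.
    grid = {f"tt{i:02d}": 10.0 for i in range(1, 10)}
    grid["tt10"] = 100.0
    print(outlier_trackers(grid))  # ['tt10']

    now = datetime(2012, 6, 1, 12, 0)
    states = {
        "nightly-etl": JobState(datetime(2012, 5, 30, 1, 0), running=False),
        "hourly-rollup": JobState(datetime(2012, 6, 1, 9, 0), running=True),
        "hourly-report": JobState(datetime(2012, 6, 1, 11, 30), running=False),
    }
    expected = {"nightly-etl": timedelta(days=1), "hourly-rollup": timedelta(hours=1),
                "hourly-report": timedelta(hours=1), "weekly-audit": timedelta(weeks=1)}
    print(silence_alerts(states, expected, {"hourly-rollup": timedelta(hours=1)}, now))
```

One design caveat: because the outlier is included in the mean and standard deviation it is tested against, a single bad tracker can score at most √(n−1) standard deviations on an n-tracker grid. A strict 2σ rule therefore cannot fire on grids of five or fewer trackers. Comparing each tracker only against the others avoids this.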