Hadoop in Healthcare - A No-Nonsense Q & A

Many organizations, especially those that handle huge amounts of data, such as Facebook, use Hadoop for their processing needs. So what exactly are Big Data and Hadoop, and what are their implications for healthcare? Hadoop is a distributed processing and storage platform. Its use is rare in the healthcare industry, but healthcare analytics hasn’t necessarily been stalled because of this. In fact, the quality of data healthcare produces doesn’t justify a Hadoop level of processing power. This article answers questions such as: What is Hadoop? What is driving its adoption in other industries? How might it affect healthcare analytics? How would clinicians use data sources outside their environment? What drawbacks currently stand in the way of further adoption?

Published in: Healthcare

  1. Hadoop in Healthcare – A No-nonsense Q & A. By Jared Crapo. © 2014 Health Catalyst www.healthcatalyst.com Proprietary. Feel free to share but we would appreciate a Health Catalyst citation.
  2. Hadoop in Healthcare. Companies such as Facebook and LinkedIn use Hadoop in all kinds of applications. The potential for Big Data and Hadoop in healthcare and in managing healthcare data is exciting, but it has not yet been fully realized.
  3. Hadoop in Healthcare. Although healthcare analytics hasn’t yet been hampered by hospital systems not using Hadoop, it never hurts to look forward and consider the possibilities. Hadoop is an indispensable tool for efficiently storing and processing large quantities of data. Its capabilities will offer new ways of thinking about how we use healthcare data and analytics to provide improved patient care at reduced costs. What follows is a Q & A on Hadoop and its implications for the future of healthcare.
  4. Question 1: What is Hadoop? Hadoop is an open-source distributed data storage and analysis platform. Much of its early development took place at Yahoo!, based on research papers published by Google. Hadoop implements Google’s MapReduce programming model: it divides a large job into many parts, sends those parts to many different processing nodes, and then combines the results from each node.
  5. Question 1: What is Hadoop? (continued) Hadoop also refers to the tools and software that work with and enhance Hadoop’s core storage and processing components. Hive: a SQL-like query language for Hadoop. Pig: a high-level data-flow language that runs on MapReduce. HBase: a column-oriented data store that runs on top of the Hadoop distributed file system. Spark: a general-purpose cluster computing framework.
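
The split, process, and combine pattern described in Question 1 can be sketched in a few lines of Python. This is a toy illustration only, not Hadoop's API: the admission records are made up, threads stand in for separate processing nodes, and the function names (`split`, `map_chunk`, `reduce_counts`, `count_diagnoses`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical admission records; the diagnosis labels are invented for illustration.
ADMISSIONS = ["asthma", "diabetes", "asthma", "copd",
              "asthma", "diabetes", "copd", "asthma"]

def split(records, parts):
    """Divide the input into roughly equal chunks, one per worker 'node'."""
    size = max(1, -(-len(records) // parts))  # ceiling division, at least 1
    return [records[i:i + size] for i in range(0, len(records), size)]

def map_chunk(chunk):
    """Map step: each worker counts the diagnoses in its own chunk only."""
    return Counter(chunk)

def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into a single result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def count_diagnoses(records, workers=3):
    chunks = split(records, workers)
    # Threads stand in for separate processing nodes in this toy version.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)

result = count_diagnoses(ADMISSIONS)
print(result)
```

Because each chunk is counted independently, adding more workers changes only how the input is divided, not the combined answer. A real Hadoop cluster applies the same idea across machines and also handles data placement and node failures, which this sketch does not attempt.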
  6. Question 2: What are some key reasons to adopt Hadoop? Large companies are moving to Hadoop for generally two reasons: (1) enormous data sets and (2) costs. For example, Yahoo! implemented 42,000 nodes in several different Hadoop clusters with a combined capacity of about 200 petabytes (200,000 terabytes).
  7. Question 2: What are some key reasons to adopt Hadoop? (continued) Even if existing database applications could accommodate these large data sets, the cost of typical enterprise hardware and disk storage becomes prohibitive. Hadoop was designed from the beginning to run on commodity hardware, which substantially reduces the need for expensive hardware infrastructure. Because Hadoop is open source, there are no software licensing fees either, which is another substantial saving.
  8. Question 3: How will Hadoop impact and/or change healthcare analytics? Hadoop has been called the most significant data processing platform for big data analytics in healthcare. Using Hadoop, researchers can now work with data sets that were previously impractical to handle. A team in Colorado is correlating air quality data with asthma admissions. Life sciences companies use genomic and proteomic data to speed drug development.
  9. Question 3: How will Hadoop impact and/or change healthcare analytics? (continued) Healthcare analytics is generally not held back by the capability of the data processing platforms, with a few exceptions in life sciences. For most healthcare providers, the limiting factor is the willingness and ability to let data inform and change the way care is delivered. Today, it takes more than a decade for compelling clinical evidence to become common clinical practice. It’s not how much data you have that matters, but how you use it.
  10. Question 4: How will clinicians use outside data sources? Data from other clinical providers in your geography can be very useful. Claims data gives a broad picture but not a deep one. Data from non-traditional sources can also be surprisingly relevant; in some cases, it is a better predictor than clinical data. For example: EPA data on geographical toxic chemical load adds insight into cancer rates for long-term residents. The CMS-HCC risk adjustment model can help providers understand why patients in their area seem to have higher or lower risk for certain disease conditions. A household size of one increases the risk of readmission because there is no other caregiver in the home.
  11. Question 5: What are the drawbacks of Hadoop? What do CTOs, CIOs and other IT leaders need to consider? Hadoop is a very young technology, and its capabilities and tools are relatively immature. The pool of people with Hadoop experience is correspondingly small, and those people are in high demand. You will be competing for them with large technology and financial services companies.
  12. Question 5: What are the drawbacks of Hadoop? (continued) You should also consider alternative hardware maintenance schemes. Hadoop was designed for commodity hardware, which generally experiences higher failure rates. Instead of purchasing hardware maintenance, you should plan to have spare nodes on standby. The good news is that commercial database vendors, including Microsoft, Oracle, and Teradata, are all racing to integrate Hadoop into their offerings.
  13. Question 6: Where is Hadoop headed and how will it impact big data? Fifteen years ago, we didn’t capture data unless we knew we needed it. The cost to capture and store it was just too high. Fifteen years from now, reductions in the cost to capture and store data will likely mean that we will capture and store everything. Hadoop is a huge leap forward in our ability to efficiently store and process large quantities of data, and it allows creative thinking about how to apply the resulting answers in a meaningful and useful way.
  14. More about this topic: Five Reasons Healthcare Data Is Different (Dan LeSueur, Vice President, Technical Operations); Big Data in Healthcare: Separating the Hype from Reality (Jared Crapo, Vice President); In Healthcare Predictive Analytics, Sometimes Big Data Is a Big Mess (David Crockett, Senior Director, Research and Predictive Analytics); Data Alone Is Not Enough: A Clinical Perspective, a free, on-demand webinar with transcript and slides (Dale Sanders, Senior Vice President, Strategy, and John Kenagy, MD); Using Healthcare Data: Healthcare Analytics Adoption Model, a white paper (Dale Sanders, Senior Vice President, Strategy).
  15. For more information: www.healthcatalyst.com
  16. Other Clinical Quality Improvement Resources. Click to read additional information at www.healthcatalyst.com. Jared Crapo joined Health Catalyst in February 2013 as a Vice President. Prior to coming to Catalyst, he worked for Medicity as the Chief of Staff to the CEO. During his tenure at Medicity, he was also the Director of Product Management and the Director of Product Strategy. Jared co-founded Allviant, a spin-out of Medicity that created consumer health management tools. In his early career, he developed physician accounting systems and health claims payment systems.