The document provides an overview of big data analytics for healthcare. It begins with motivating examples that demonstrate how big data can help improve healthcare outcomes and lower costs. It then discusses the main sources of healthcare data, including structured EHR data like billing codes, labs, and medications, as well as unstructured clinical notes. The document outlines challenges in analyzing these different types of complex healthcare data. It also introduces a healthcare analytics platform that can extract and select features from various data sources to build predictive models. Finally, it discusses techniques for clinical text mining, including named entity recognition and negation analysis.
Welcome to the age of cognitive computing: where intelligent machines have
moved from the realms of science fiction to the present day. This groundbreaking
technology is driving advanced discoveries and allowing improved decision-making –
resulting in better patient care
IBM Watson Health: How cognitive technologies have begun transforming clinica...Maged N. Kamel Boulos
Cite as: Kamel Boulos MN. IBM Watson Health: how cognitive technologies have begun transforming clinical medicine and healthcare (Oral session IV – Patient safety tools, Thursday 19 May 2016, 15:45-16:45, Hotel Puijonsarvi, Kuopio). In: Proceedings of the 4th Nordic Conference on Research in Patient Safety and Quality in Healthcare (NSQH2016), Kuopio, Finland, 18-20 May 2016 (organised by University of Eastern Finland), p.29. URL: http://www.uef.fi/NSQH2016 (In: Nykanen I (ed.). The 4th Nordic Conference on Research in Patient Safety and Quality in Healthcare. Kuopio, Finland, May 18-20, 2016. Program and Abstracts. Publications of the University of Eastern Finland. Report and Studies in Health Sciences 21. 2016, p.29 (of 119 p.). ISBN: 978-952-61-2130-7 (nid.), ISSNL: 1798-5722, ISSN: 1798-5730.)
IBM Watson health: how cognitive technologies have begun transforming clinical medicine and healthcare
Maged N Kamel Boulos
ABSTRACT
Background: IBM Watson Health (http://www.ibm.com/smarterplanet/us/en/ibmwatson/health/) belongs to a new generation of smart cognitive computing technologies (a type of artificial intelligence) that are poised to transform the way healthcare is delivered, and to vastly improve clinical outcomes, quality of care and patient safety.
Objectives: Our goal was to collect and document the huge potential of a range of emerging and exemplary uses of IBM Watson in healthcare in both developed and developing country settings.
Methods: A survey of current peer reviewed and grey literature has been conducted, looking for reports and case studies involving the use of IBM Watson in different health and healthcare applications.
Results, conclusions and clinical implications: With its ability to make sense of unstructured medical information by analysing the meaning and context of natural language, and uncovering important knowledge buried within large volumes of data and information, including medical images, IBM Watson is exceptionally well suited for clinical and healthcare decision support, where there are often elements of ambiguity and uncertainty. It has been (or is currently being) successfully deployed in many developed countries in the West, as well as in developing countries, such as India and South Africa. IBM Watson unlocks a complex case by acquiring information from multiple sources, e.g., accessing the electronic patient record, then parsing all related medical evidence at up to 60 million pages per second. After processing all of this information, Watson offers relevant and prioritised suggestions to the decision-maker, e.g., helping clinicians identify the best diagnosis and treatment options in complex oncology cases, and providing hospital managers with new operational insights. The ultimate goals are to reduce cost, medical errors, mortality rates, and help improve patients' quality of life.
Mie2014 workshop: Gap Analysis of Personalized Health Services through Patien...Pei-Yun Sabrina Hsueh
Gap Analysis of Insight-Driven Personalized Health Services through Patient-Controlled Devices
Pei-Yun Sabrina HSUEH, , Michael MARSCHOLLEK, Yardena PERES, Stefan von CAVALLAR and Fernando J. MARTIN-SANCHEZ
IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Hannover Medical School, Germany
IBM Research Lab in Haifa, Israel
IBM Research Lab in Melbourne, Australia
Melbourne Medical School, Australia
Mobile computing, wearable and embedded tech entail new and different styles of healthcare data processing, clinical and wellness decision support, and patient engagement schemes. This is especially important to the preventive and disease management scenarios that require better understanding of disease progression previously unable to achieve due to the lack of reliable means to capture granular patient-generated data in non-clinical settings. The new sources of data, when coupled with a framework to integrate analytical insights with feasible service models, enable reliable detection of inflection points, habit formation cycles and assessments of treatment efficacy. Research into data collection, recording, management and analysis of behavioral manisfestations and triggers will help address these challenges in areas spanning from simple fall detection to situations requiring complicated, multi-modal health monitoring such as Alzheimer’s progression and other adherence management cases. Leveraging recent advance in health devices and sensors as well as expertise in healthcare practice and informatics, the proposed workshop will help form a deeper understanding of requirements on patient-controlled devices to address unique healthcare challenges, identify care flow gaps and translate these findings to the design of platforms for patient-controlled devices and a portfolio of potential service models for preventive care and disease management.
Welcome to the age of cognitive computing: where intelligent machines have
moved from the realms of science fiction to the present day. This groundbreaking
technology is driving advanced discoveries and allowing improved decision-making –
resulting in better patient care
IBM Watson Health: How cognitive technologies have begun transforming clinica...Maged N. Kamel Boulos
Cite as: Kamel Boulos MN. IBM Watson Health: how cognitive technologies have begun transforming clinical medicine and healthcare (Oral session IV – Patient safety tools, Thursday 19 May 2016, 15:45-16:45, Hotel Puijonsarvi, Kuopio). In: Proceedings of the 4th Nordic Conference on Research in Patient Safety and Quality in Healthcare (NSQH2016), Kuopio, Finland, 18-20 May 2016 (organised by University of Eastern Finland), p.29. URL: http://www.uef.fi/NSQH2016 (In: Nykanen I (ed.). The 4th Nordic Conference on Research in Patient Safety and Quality in Healthcare. Kuopio, Finland, May 18-20, 2016. Program and Abstracts. Publications of the University of Eastern Finland. Report and Studies in Health Sciences 21. 2016, p.29 (of 119 p.). ISBN: 978-952-61-2130-7 (nid.), ISSNL: 1798-5722, ISSN: 1798-5730.)
IBM Watson health: how cognitive technologies have begun transforming clinical medicine and healthcare
Maged N Kamel Boulos
ABSTRACT
Background: IBM Watson Health (http://www.ibm.com/smarterplanet/us/en/ibmwatson/health/) belongs to a new generation of smart cognitive computing technologies (a type of artificial intelligence) that are poised to transform the way healthcare is delivered, and to vastly improve clinical outcomes, quality of care and patient safety.
Objectives: Our goal was to collect and document the huge potential of a range of emerging and exemplary uses of IBM Watson in healthcare in both developed and developing country settings.
Methods: A survey of current peer reviewed and grey literature has been conducted, looking for reports and case studies involving the use of IBM Watson in different health and healthcare applications.
Results, conclusions and clinical implications: With its ability to make sense of unstructured medical information by analysing the meaning and context of natural language, and uncovering important knowledge buried within large volumes of data and information, including medical images, IBM Watson is exceptionally well suited for clinical and healthcare decision support, where there are often elements of ambiguity and uncertainty. It has been (or is currently being) successfully deployed in many developed countries in the West, as well as in developing countries, such as India and South Africa. IBM Watson unlocks a complex case by acquiring information from multiple sources, e.g., accessing the electronic patient record, then parsing all related medical evidence at up to 60 million pages per second. After processing all of this information, Watson offers relevant and prioritised suggestions to the decision-maker, e.g., helping clinicians identify the best diagnosis and treatment options in complex oncology cases, and providing hospital managers with new operational insights. The ultimate goals are to reduce cost, medical errors, mortality rates, and help improve patients' quality of life.
Mie2014 workshop: Gap Analysis of Personalized Health Services through Patien...Pei-Yun Sabrina Hsueh
Gap Analysis of Insight-Driven Personalized Health Services through Patient-Controlled Devices
Pei-Yun Sabrina HSUEH, , Michael MARSCHOLLEK, Yardena PERES, Stefan von CAVALLAR and Fernando J. MARTIN-SANCHEZ
IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Hannover Medical School, Germany
IBM Research Lab in Haifa, Israel
IBM Research Lab in Melbourne, Australia
Melbourne Medical School, Australia
Mobile computing, wearable and embedded tech entail new and different styles of healthcare data processing, clinical and wellness decision support, and patient engagement schemes. This is especially important to the preventive and disease management scenarios that require better understanding of disease progression previously unable to achieve due to the lack of reliable means to capture granular patient-generated data in non-clinical settings. The new sources of data, when coupled with a framework to integrate analytical insights with feasible service models, enable reliable detection of inflection points, habit formation cycles and assessments of treatment efficacy. Research into data collection, recording, management and analysis of behavioral manisfestations and triggers will help address these challenges in areas spanning from simple fall detection to situations requiring complicated, multi-modal health monitoring such as Alzheimer’s progression and other adherence management cases. Leveraging recent advance in health devices and sensors as well as expertise in healthcare practice and informatics, the proposed workshop will help form a deeper understanding of requirements on patient-controlled devices to address unique healthcare challenges, identify care flow gaps and translate these findings to the design of platforms for patient-controlled devices and a portfolio of potential service models for preventive care and disease management.
The Future of Medical Education From Dreams to Reality (VR, AR, AI)SeriousGamesAssoc
With three decades of e-learning experience, Dr. Levy will present innovations in technology-enhanced education from the past, present, and into the future. He will highlight some of his medical education inventions and advances including some of the first laser discs, CD-ROMs, online case-based education, 3-D anatomical and procedural animations, robotic-assisted surgery, and virtual reality surgical simulation. He will describe the role of artificial intelligence and machine learning in medical education and clinical decision support and some future work in augmented reality. It is true that what were once dreams are now reality, but there are certainly more dreams to come.
Augmented Personalized Health: an explicit knowledge enhanced neurosymbolic d...Amit Sheth
Keynote at the SWAT4HCLS (Semantic Web Applications and Tools for Healthcare and Life Sciences), 12 Jan 2022. Event info:
http://www.swat4ls.org/workshops/leiden2022/keynotes/
Video: https://youtu.be/nwGAv9q2wsY
Healthcare as we know it is in the process of going through a massive change – from episodic to continuous, from disease-focused to wellness and quality of life focused, from clinic centric to anywhere a patient is, from clinician controlled to patient empowered, and from being driven by limited data to 360-degree, multimodal personal-public-population physical-cyber-social big data-driven. While the ability to create and capture data is already here, the upcoming innovations will be in converting this big data into smart data through contextual and personalized processing such that patients and clinicians can make better decisions and take timely actions. The exploitation of all relevant data, relevant medical knowledge, and explainable AI techniques will also support better communications between patients, clinicians, and virtual health assistants with higher-level abstractions (rather than low-level data) representing health choices, decisions and actions.
Augmented Personalized Healthcare (APH) strategy we are developing empowers patients with self-monitoring (collecting relevant data), self-appraisal (interpreting data in the patient’s context), self-management (assisting the patient in following personalized care plan to maintain health), to intervention (when the clinical help is needed) and disease progression tracking and prediction (http://bit.ly/AI-APH, http://bit.ly/APH-TED). We currently apply APH using mobile Apps and virtual health assistants for patients managing pediatric asthma (http://bit.ly/kAsthma), mental health, carbohydrate management for type 1 diabetes, hypertension, etc. In this talk, I will describe some of the technical components that incorporate context, personalization, and abstraction for supporting advanced capabilities such as patient engagement through meaningful question generation, chatbot safety, and explainable decision-making using knowledge-infused learning, a neurosymbolic AI strategy that utilizes many types and levels of explicit knowledge.
Insights from the Organization of International Challenges on Artificial Inte...Asma Ben Abacha
"Insights from the Organization of International Challenges on Artificial Intelligence in Medical Question Answering". Invited talk at the SciNLP (Natural Language Processing and Data Mining for Scientific Text) Workshop.
Dr. Asma Ben Abacha.
June 24, 2020.
Precision medicine and AI: problems aheadNeil Raden
The promise of personalized medicine has sparked a proliferation of AI hype. But the obstacles AI faces in the healthcare industry are daunting. Look no further than data silos - and the factors that spawned them.
Medical Question Answering: Dealing with the complexity and specificity of co...Asma Ben Abacha
"Medical Question Answering: Dealing with the complexity and specificity of consumer health questions and visual questions". Invited talk at the Allen Institute for AI (AI2), Seattle, Washington.
Dr. Asma Ben Abacha.
November 12, 2019.
Multimodal Question Answering in the Medical Domain (CMU/LTI 2020) | Dr. Asma...Asma Ben Abacha
"Multimodal Question Answering in the Medical Domain". Invited talk at the Language Technologies Institute (LTI), Carnegie Mellon University (CMU).
Dr. Asma Ben Abacha.
April 24, 2020.
The Hive Think Tank: Unpacking AI for Healthcare The Hive
In this The Hive Think Tank talk, Ash Damle, CEO of Lumiata takes a deep dive into Lumiata’s core technological engine - the Lumiata Medical Graph, which applies graph-based machine learning to compute the complex relationships between health data in the same way that a physician would, and how this medical AI engine powers personalization and automation within risk and care management.
Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games – tasks which would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in healthcare. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades – and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. Tarun Jaiswal | Sushma Jaiswal ""Deep Learning in Medicine"" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-4 , June 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23641.pdf
Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/23641/deep-learning-in-medicine/tarun-jaiswal
New Normal, New Future - Free Download E bookkevin brown
Healthcare is shifting from the traditional provider-centric,
in-patient setting to patient-centric, virtual consultations
with increased remote care monitoring. This transition
has prompted the need for MedTech industry to relook
at the products they develop and enhance value in care
delivery.
The COVID-19 pandemic has increased the use of
digital health technologies, and the need to develop
innovative devices or systems that support virtual
health. The last couple of years have seen increased
use of wearables, mobile and app-based technologies
along with data and analytics have been transforming
healthcare delivery.
Advancements in healthcare technologies like
Artificial Intelligence (AI), Virtual Reality and Augmented
Reality 3D-printing, robotics and nanotechnology are
shaping the future of healthcare. This technology boom
is helping address disease and medical conditions
through provision of cheaper, faster and more effective
solutions for diseases.
HEC 2016 Panel: Putting User-Generated Data in Action: Improving Interpretabi...Pei-Yun Sabrina Hsueh
Chair/Moderator: Pei-Yun Sabrina HSUEH, PhD (IBM T.J. Watson Research Center)
Panelists: XinXin ZHU, Bian YANG, Ying-Kuen CHEUNG , Thomas WETTER, and Sanjoy DEY
a IBM T.J. Watson Research Center, USA
b Norwegian University of Science and Technology, Norway
c Mailman School of Public health, Columbia University, USA
d, Department of Biomedical Informatics, University of Washington, USA
e Department of Medical Informatics, University of Heidelberg, Germany
The rise of consumer health awareness and the recent advent of personal health management tools (including mobile and health wearable devices) have contributed to another shift transforming the healthcare landscape. Despite the rise of health consumers, the impact of user-generated health data remains to be validated. In fact, many applications are hinged on the interpretability issues of this sort of data. The aim of this panel is two-fold. First, this panel aims to review the key dimensions in the interpretability, spanning from quality and reliability to information security and trust management. Secondly, since similar issues and methodologies have been proposed in different application areas ranging from clinical decision support to behavioral interventions and clinical trials, the panelists will also discuss both the success stories and the areas that fall short. The opportunities and barriers identified can then serve as guidelines or action items individuals can bring to their organizations to further improve the interpretability of user-generated data.
Healthcare Innovation and Transformation - Dr. Ken YaleKen Yale
We are living in the greatest time in human history! People are living their lives on smartphones and apps, measuring themselves with wearable devices like the Apple Watch, and improving their health and care with advanced analytic algorithms. Healthcare is adopting AI, Machine Learning, and Deep Learning at an accelerated pace. “Healthcare is very important for people. We are democratizing it. We are taking what has been with the institutions, and empowering the individual to manage their health.
And we’re just getting started!” - Apple CEO Tim Cook, Jan 2019
Healthcare is changing rapidly. It is clear that humans need mechanisms to automate some parts of data processing and help humans in decision making. This talk will concentrate on how to improve the machine understanding of unstructured data.
The Future of Medical Education From Dreams to Reality (VR, AR, AI)SeriousGamesAssoc
With three decades of e-learning experience, Dr. Levy will present innovations in technology-enhanced education from the past, present, and into the future. He will highlight some of his medical education inventions and advances including some of the first laser discs, CD-ROMs, online case-based education, 3-D anatomical and procedural animations, robotic-assisted surgery, and virtual reality surgical simulation. He will describe the role of artificial intelligence and machine learning in medical education and clinical decision support and some future work in augmented reality. It is true that what were once dreams are now reality, but there are certainly more dreams to come.
Augmented Personalized Health: an explicit knowledge enhanced neurosymbolic d...Amit Sheth
Keynote at the SWAT4HCLS (Semantic Web Applications and Tools for Healthcare and Life Sciences), 12 Jan 2022. Event info:
http://www.swat4ls.org/workshops/leiden2022/keynotes/
Video: https://youtu.be/nwGAv9q2wsY
Healthcare as we know it is in the process of going through a massive change – from episodic to continuous, from disease-focused to wellness and quality of life focused, from clinic centric to anywhere a patient is, from clinician controlled to patient empowered, and from being driven by limited data to 360-degree, multimodal personal-public-population physical-cyber-social big data-driven. While the ability to create and capture data is already here, the upcoming innovations will be in converting this big data into smart data through contextual and personalized processing such that patients and clinicians can make better decisions and take timely actions. The exploitation of all relevant data, relevant medical knowledge, and explainable AI techniques will also support better communications between patients, clinicians, and virtual health assistants with higher-level abstractions (rather than low-level data) representing health choices, decisions and actions.
Augmented Personalized Healthcare (APH) strategy we are developing empowers patients with self-monitoring (collecting relevant data), self-appraisal (interpreting data in the patient’s context), self-management (assisting the patient in following personalized care plan to maintain health), to intervention (when the clinical help is needed) and disease progression tracking and prediction (http://bit.ly/AI-APH, http://bit.ly/APH-TED). We currently apply APH using mobile Apps and virtual health assistants for patients managing pediatric asthma (http://bit.ly/kAsthma), mental health, carbohydrate management for type 1 diabetes, hypertension, etc. In this talk, I will describe some of the technical components that incorporate context, personalization, and abstraction for supporting advanced capabilities such as patient engagement through meaningful question generation, chatbot safety, and explainable decision-making using knowledge-infused learning, a neurosymbolic AI strategy that utilizes many types and levels of explicit knowledge.
Insights from the Organization of International Challenges on Artificial Inte...Asma Ben Abacha
"Insights from the Organization of International Challenges on Artificial Intelligence in Medical Question Answering". Invited talk at the SciNLP (Natural Language Processing and Data Mining for Scientific Text) Workshop.
Dr. Asma Ben Abacha.
June 24, 2020.
Precision medicine and AI: problems aheadNeil Raden
The promise of personalized medicine has sparked a proliferation of AI hype. But the obstacles AI faces in the healthcare industry are daunting. Look no further than data silos - and the factors that spawned them.
Medical Question Answering: Dealing with the complexity and specificity of co...Asma Ben Abacha
"Medical Question Answering: Dealing with the complexity and specificity of consumer health questions and visual questions". Invited talk at the Allen Institute for AI (AI2), Seattle, Washington.
Dr. Asma Ben Abacha.
November 12, 2019.
Multimodal Question Answering in the Medical Domain (CMU/LTI 2020) | Dr. Asma...Asma Ben Abacha
"Multimodal Question Answering in the Medical Domain". Invited talk at the Language Technologies Institute (LTI), Carnegie Mellon University (CMU).
Dr. Asma Ben Abacha.
April 24, 2020.
The Hive Think Tank: Unpacking AI for Healthcare The Hive
In this The Hive Think Tank talk, Ash Damle, CEO of Lumiata takes a deep dive into Lumiata’s core technological engine - the Lumiata Medical Graph, which applies graph-based machine learning to compute the complex relationships between health data in the same way that a physician would, and how this medical AI engine powers personalization and automation within risk and care management.
Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games – tasks which would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in healthcare. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades – and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. Tarun Jaiswal | Sushma Jaiswal ""Deep Learning in Medicine"" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-4 , June 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23641.pdf
Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/23641/deep-learning-in-medicine/tarun-jaiswal
New Normal, New Future - Free Download E bookkevin brown
Healthcare is shifting from the traditional provider-centric,
in-patient setting to patient-centric, virtual consultations
with increased remote care monitoring. This transition
has prompted the need for MedTech industry to relook
at the products they develop and enhance value in care
delivery.
The COVID-19 pandemic has increased the use of
digital health technologies, and the need to develop
innovative devices or systems that support virtual
health. The last couple of years have seen increased
use of wearables, mobile and app-based technologies
along with data and analytics have been transforming
healthcare delivery.
Advancements in healthcare technologies like
Artificial Intelligence (AI), Virtual Reality and Augmented
Reality 3D-printing, robotics and nanotechnology are
shaping the future of healthcare. This technology boom
is helping address disease and medical conditions
through provision of cheaper, faster and more effective
solutions for diseases.
HEC 2016 Panel: Putting User-Generated Data in Action: Improving Interpretabi...Pei-Yun Sabrina Hsueh
Chair/Moderator: Pei-Yun Sabrina HSUEH, PhD (IBM T.J. Watson Research Center)
Panelists: XinXin ZHU, Bian YANG, Ying-Kuen CHEUNG , Thomas WETTER, and Sanjoy DEY
a IBM T.J. Watson Research Center, USA
b Norwegian University of Science and Technology, Norway
c Mailman School of Public health, Columbia University, USA
d, Department of Biomedical Informatics, University of Washington, USA
e Department of Medical Informatics, University of Heidelberg, Germany
The rise of consumer health awareness and the recent advent of personal health management tools (including mobile and health wearable devices) have contributed to another shift transforming the healthcare landscape. Despite the rise of health consumers, the impact of user-generated health data remains to be validated. In fact, many applications are hinged on the interpretability issues of this sort of data. The aim of this panel is two-fold. First, this panel aims to review the key dimensions in the interpretability, spanning from quality and reliability to information security and trust management. Secondly, since similar issues and methodologies have been proposed in different application areas ranging from clinical decision support to behavioral interventions and clinical trials, the panelists will also discuss both the success stories and the areas that fall short. The opportunities and barriers identified can then serve as guidelines or action items individuals can bring to their organizations to further improve the interpretability of user-generated data.
Healthcare Innovation and Transformation - Dr. Ken YaleKen Yale
We are living in the greatest time in human history! People are living their lives on smartphones and apps, measuring themselves with wearable devices like the Apple Watch, and improving their health and care with advanced analytic algorithms. Healthcare is adopting AI, Machine Learning, and Deep Learning at an accelerated pace. “Healthcare is very important for people. We are democratizing it. We are taking what has been with the institutions, and empowering the individual to manage their health.
And we’re just getting started!” - Apple CEO Tim Cook, Jan 2019
Healthcare is changing rapidly. It is clear that humans need mechanisms to automate some parts of data processing and help humans in decision making. This talk will concentrate on how to improve the machine understanding of unstructured data.
Data Science in Healthcare" by authors Sergio Consoli, Diego Reforgiato Recupero, and Milan Petkovic is an insightful guide that delves into the intersection of data science and healthcare. As a first-year student in Pharmaceutical Management, I found this book to be a valuable resource for understanding how data-driven approaches are transforming the healthcare industry, offering fresh perspectives and practical insights for future professionals like myself.
CORD Rare Drug Conference, June 8 - 9, 2022
Opportunities and Challenges for Data Management Real-World Data and Real-World Evidence
• Patient support programs: Sandra Anderson, Innomar Strategies
• AI for Data Management and Enhancement: Aaron Leibtag, Pentavere
• Patient Support and RWE: Laurie Lambert, CADTH
Leveraging Data Analysis for Advancements in Healthcare and Medical Research.pdfSoumodeep Nanee Kundu
Data analysis in healthcare encompasses a wide range of applications, all geared toward improving patient care and well-being. It begins with the collection of diverse healthcare data, which includes electronic health records, medical imaging, genomic data, wearable device data, and more. These data sources provide a rich tapestry of information that can be analysed to unlock valuable insights and drive healthcare advancements.
One of the primary areas where data analysis is a game-changer is in clinical decision-making. Through the utilization of data-driven algorithms, healthcare professionals are empowered to make informed decisions regarding patient diagnosis, treatment plans, and prognosis. Clinical Decision Support Systems (CDSS), powered by data analysis, provide real-time guidance based on evidence-based medical knowledge, assisting physicians in choosing the most appropriate treatments and interventions. This not only enhances patient care but also reduces medical errors and ensures that treatment decisions are aligned with the most current medical research.
Data analysis is also instrumental in early disease identification and monitoring. Machine learning models, for example, can predict the onset of diseases like diabetes, Alzheimer's, and cardiovascular conditions by analysing patient data. This early detection capability enables healthcare providers to intervene proactively, potentially preventing or mitigating the severity of these conditions. This aspect of data analysis significantly contributes to the shift from reactive to proactive healthcare, improving patient outcomes and reducing healthcare costs.
Epidemiology and public health are areas where data analysis plays a vital role. The analysis of healthcare data is essential for tracking and predicting disease outbreaks, which is especially critical in the context of infectious diseases and bioterrorism preparedness. Real-time analysis of health data can offer early warning signs of emerging epidemics, allowing authorities to take timely preventive measures and allocate resources efficiently.
Augmented Personalized Health: using AI techniques on semantically integrated...Amit Sheth
Keynote @ 2018 AAAI Joint Workshop on Health Intelligence (W3PHIAI 2018), 2 February 2018, New Orleans, LA [Video: https://youtu.be/GujvoWRa0O8]
Related article: https://ieeexplore.ieee.org/document/8355891/
Abstract
Healthcare as we know it is in the process of going through a massive change - from episodic to continuous, from disease-focused to wellness and quality of life focused, from clinic centric to anywhere a patient is, from clinician controlled to patient empowered, and from being driven by limited data to 360-degree, multimodal personal-public-population physical-cyber-social big data-driven. While the ability to create and capture data is already here, the upcoming innovations will be in converting this big data into smart data through contextual and personalized processing such that patients and clinicians can make better decisions and take timely actions for augmented personalized health. In this talk, we will discuss how use of AI techniques on semantically integrated patient-generated health data (PGHD), environmental data, clinical data, and public social data is exploited to achieve a range of augmented health management strategies that include self-monitoring, self-appraisal, self-management, intervention, and Disease Progression Tracking and Prediction. We will review examples and outcomes from a number of applications, some involving patient evaluations, including asthma in children, bariatric surgery/obesity, mental health/depression, that are part of the Kno.e.sis kHealth personalized digital health initiative.
Background: Background: http://bit.ly/k-APH, http://bit.ly/kAsthma, http://j.mp/PARCtalk
This white paper offers a detailed perspective on how big data is impacting the healthcare industry and its underlying implication on the industry as a whole. It outlines the role of big data in healthcare, its benefits, core components and challenges faced by the healthcare sector towards full-fledged adoption & implementation.
From personal health data to a personalized adviceWessel Kraaij
Invited talk at the health track of ICT.OPEN 2018, 20-3-2018
1. Related Data science challenges to Digital Health trends
2. Designing an infrastructure to support secure learning from distributed health data repositories, for personalized health advice
3. Supporting patients with rare diseases with patient driven research and the generation of new hypotheses based on patient experiences.
Improving health care outcomes with responsible data scienceWessel Kraaij
Keynote presentation by Wessel Kraaij at the Dutch pattern recognition and impage processing society (NVPBV) 29/5/2018, Eindhoven.
This talk discusses
1. trends in health care and respondible data science and their intersection
2. Secure federated analytics on distributed data repositories
3. Generating clinically relevant hypotheses from patient forum discussions.
Overview of Health Informatics: survey of fundamentals of health information technology, Identify the forces behind health informatics, educational and career opportunities in health informatics.
The Power and Promise of Unstructured Patient Data Healthline
Unstructured search capabilities, superior natural language processing, and healthcare ontology capabilities will help distinguish the leading products in the category (information and data-driven decision making).
A BIG DATA REVOLUTION IN HEALTH CARE SECTOR: OPPORTUNITIES, CHALLENGES AND TE...ijistjournal
Health care sector grows tremendously in last few decades. The health care sector has generated huge amounts of data that has huge volume, enormous velocity and vast variety. Also it comes from a variety of new sources as hospitals are now tend to implemented electronic health record (EHR) systems. These sources have strained the existing capabilities of existing conventional relational database management systems. In such scenario, Big data solutions offer to harness these massive, heterogeneous and complex data sets to obtain more meaningful and knowledgeable information.
This paper basically studies the impact of implementing the big data solutions on the healthcare sector, the potential opportunities, challenges and available platform and tools to implement Big data analytics in health care sector.
Big data for healthcare analytics final -v0.3 mizYusuf Brima
Sources of Big Data in Health (a comparative description of national and international data sources and identification of new/emerging sources of data)
The main aim of this paper is to provide a deep analysis on the research field of healthcare data analytics., as well as highlighting some of guidelines and gaps in previous studies. This study has focused on searching relevant papers about healthcare analytics by searching in seven popular databases such as google scholar and springer using specific keywords, in order to understand the healthcare topic and conduct our literature review. The paper has listed some data analytics tools and techniques that have been used to improve healthcare performance in many areas such as medical operations, reports, decision making, and prediction and prevention system. Moreover, the systematic review has showed an interesting demographic of fields of publication, research approaches, as well as outlined some of the possible reasons and issues associated with healthcare data analytics, based on geographical distribution theme. Snober Jon | Shafqat Manzoor | Beenish Bashir | Monisa Nazir "Data Science in Healthcare" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-6 | Issue-1 , December 2021, URL: https://www.ijtsrd.com/papers/ijtsrd47870.pdf Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/47870/data-science-in-healthcare/snober-jon
Benefits of Big Data in Health Care A Revolutionijtsrd
Lifespan of a normal human is increasing with the world population and it produces new challenge in health care. big data change the method of data management ,leverage data and analyzing data.with the help of big data we can reduces the costs of treatment, reducing medication and provide better treatment with predictive analytics. Health related data collected from various sources like electronic health record EHR ,medical imaging system, genomic sequencing, pay of records, pharmaceutical research , and medical devices, etc. are refers to as big data in healthcare. Dr. Ritushree Narayan ""Benefits of Big Data in Health Care: A Revolution"" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3 , April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd22974.pdf
Paper URL: https://www.ijtsrd.com/computer-science/data-miining/22974/benefits-of-big-data-in-health-care-a-revolution/dr-ritushree-narayan
Gain insights from data analytics and take action! Learn why everyone is making a big deal about big data in healthcare and how data analytics creates action.
We all have good and bad thoughts from time to time and situation to situation. We are bombarded daily with spiraling thoughts(both negative and positive) creating all-consuming feel , making us difficult to manage with associated suffering. Good thoughts are like our Mob Signal (Positive thought) amidst noise(negative thought) in the atmosphere. Negative thoughts like noise outweigh positive thoughts. These thoughts often create unwanted confusion, trouble, stress and frustration in our mind as well as chaos in our physical world. Negative thoughts are also known as “distorted thinking”.
Unit 8 - Information and Communication Technology (Paper I).pdfThiyagu K
This slides describes the basic concepts of ICT, basics of Email, Emerging Technology and Digital Initiatives in Education. This presentations aligns with the UGC Paper I syllabus.
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
Operation “Blue Star” is the only event in the history of Independent India where the state went into war with its own people. Even after about 40 years it is not clear if it was culmination of states anger over people of the region, a political game of power or start of dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from main stream due to denial of their just demands during a long democratic struggle since independence. As it happen all over the word, it led to militant struggle with great loss of lives of military, police and civilian personnel. Killing of Indira Gandhi and massacre of innocent Sikhs in Delhi and other India cities was also associated with this movement.
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting various aspects of daily life, industry, and the environment. ynthetic fibers are integral to modern life, offering a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
Ethnobotany and Ethnopharmacology:
Ethnobotany in herbal drug evaluation,
Impact of Ethnobotany in traditional medicine,
New development in herbals,
Bio-prospecting tools for drug discovery,
Role of Ethnopharmacology in drug evaluation,
Reverse Pharmacology.
Instructions for Submissions thorugh G- Classroom.pptxJheel Barad
This presentation provides a briefing on how to upload submissions and documents in Google Classroom. It was prepared as part of an orientation for new Sainik School in-service teacher trainees. As a training officer, my goal is to ensure that you are comfortable and proficient with this essential tool for managing assignments and fostering student engagement.
2024.06.01 Introducing a competency framework for languag learning materials ...Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
2024.06.01 Introducing a competency framework for languag learning materials ...
Sun==big data analytics for health care
1. 1
Big Data Analytics for
Healthcare
Chandan K. Reddy
Department of Computer Science
Wayne State University
Tutorial presentation at the SIAM International
Conference on Data Mining, Austin, TX, 2013.
The updated tutorial slides are available at http://dmkd.cs.wayne.edu/TUTORIAL/Healthcare/
Jimeng Sun
Healthcare Analytics Department
IBM TJ Watson Research Center
2. 2
Motivation
Hersh, W., Jacko, J. A., Greenes, R., Tan, J., Janies, D., Embi, P. J., & Payne, P. R. (2011). Health-care hit or miss? Nature, 470(7334), 327.
In 2012, worldwide digital healthcare
data was estimated to be equal to 500
petabytes and is expected to reach
25,000 petabytes in 2020.
Can we learn from the past to
become better in the future ??
Healthcare Data is
becoming more complex !!
3. 3
Organization of this Tutorial
Introduction
Motivating Examples
Sources and Techniques for Big Data in Healthcare
– Structured EHR Data
– Unstructured Clinical Notes
– Medical Imaging Data
– Genetic Data
– Other Data (Epidemiology & Behavioral)
Final Thoughts and Conclusion
5. 5
Definition of Big Data
A collection of large and complex data sets which are difficult to process
using common database management tools or traditional data
processing applications.
The challenges include capturing, storing,
searching, sharing & analyzing.
The four dimensions (V’s) of Big Data
Big data is not just about size.
• Finds insights from complex, noisy,
heterogeneous, longitudinal, and
voluminous data.
• It aims to answer questions that
were previously unanswered.
BIG
DATA
Velocity
Veracity
Variety
Volume
“Big data refers to the tools, processes
and procedures allowing an organization
to create, manipulate, and manage very
large data sets and storage facilities”
– according to zdnet.com
6. 6
Reasons for Growing Complexity/Abundance of Healthcare Data
Standard medical practice is moving from relatively ad-hoc and subjective
decision making to evidence-based healthcare.
More incentives to professionals/hospitals to use EHR technology.
Additional Data Sources
Development of new technologies such as capturing devices, sensors, and
mobile applications.
Collection of genomic information became cheaper.
Patient social communications in digital forms are increasing.
More medical knowledge/discoveries are being accumulated.
7. 7
Big Data Challenges in Healthcare
Inferring knowledge from complex heterogeneous patient sources.
Leveraging the patient/data correlations in longitudinal records.
Understanding unstructured clinical notes in the right context.
Efficiently handling large volumes of medical imaging data and extracting
potentially useful information and biomarkers.
Analyzing genomic data is a computationally intensive task and combining
with standard clinical data adds additional layers of complexity.
Capturing the patient’s behavioral data through several sensors; their
various social interactions and communications.
8. 8
Take advantage of the massive amounts of data and provide
right intervention to the right patient at the right time.
Personalized care to the patient.
Potentially benefit all the components of a healthcare system
i.e., provider, payer, patient, and management.
Electronic
Health Records
Evidence
+ Insights
Improved outcomes
through smarter decisions
Lower costs
Big Data
Analytics
Overall Goals of Big Data Analytics in Healthcare
BehavioralGenomic
Public Health
9. 9
Purpose of this Tutorial
Two-fold objectives:
Introduce the data mining researchers to the sources available and the
possible challenges and techniques associated with using big data in
healthcare domain.
Introduce Healthcare analysts and practitioners to the advancements in the
computing field to effectively handle and make inferences from voluminous
and heterogeneous healthcare data.
The ultimate goal is to bridge data mining and medical informatics
communities to foster interdisciplinary works between the two communities.
PS: Due to the broad nature of the topic, the primary emphasis will be on
introducing healthcare data repositories, challenges, and concepts to data
scientists. Not much focus will be on describing the details of any particular
techniques and/or solutions.
10. 10
Disclaimers
Being a recent and growing topic, there might be several other
resources that might not be covered here.
Presentation here is more biased towards the data scientists’
perspective and may be less towards the healthcare
management or healthcare provider’s perspective.
Some of the website links provided might become obsolete in
the future. This tutorial is prepared in early 2013.
Since this topic contains a wide varieties of problems, there
might be some aspects of healthcare that might not be
covered in the tutorial.
12. 12
EXAMPLE 1: Heritage Health Prize
Over $30 billion was spent on unnecessary hospital admissions.
Goals:
Identify patients at high-risk and ensure they get the treatment they need.
Develop algorithms to predict the number of days a patient will spend in a
hospital in the next year.
Outcomes:
Health care providers can develop new strategies to care for patients before
its too late reduces the number of unnecessary hospitalizations.
Improving the health of patients while decreasing the costs of care.
Winning solutions use a combination of several predictive models.
http://www.heritagehealthprize.com
13. 13
EXAMPLE 2: Penalties for Poor Care - 30-Day Readmissions
Hospitalizations account for more than 30% of the 2
trillion annual cost of healthcare in the United States.
Around 20% of all hospital admissions occur within 30
days of a previous discharge.
– not only expensive but are also potentially harmful,
and most importantly, they are often preventable.
Medicare penalizes hospitals that have high rates of readmissions among
patients with heart failure, heart attack, and pneumonia.
Identifying patients at risk of readmission can guide efficient resource
utilization and can potentially save millions of healthcare dollars each year.
Effectively making predictions from such complex hospitalization data will
require the development of novel advanced analytical models.
14. 14
EXAMPE 3: White House unveils BRAIN Initiative
The US President unveiled a new bold $100 million
research initiative designed to revolutionize our
understanding of the human brain. BRAIN (Brain Research
through Advancing Innovative Neurotechnologies) Initiative.
Find new ways to treat, cure, and even prevent brain
disorders, such as Alzheimer’s disease, epilepsy, and
traumatic brain injury.
“Every dollar we invested to map the human genome returned $140 to our
economy... Today, our scientists are mapping the human brain to unlock the
answers to Alzheimer’s.”
-- President Barack Obama, 2013 State of the Union.
“advances in "Big Data" that are necessary to analyze the huge amounts of
information that will be generated; and increased understanding of how
thoughts, emotions, actions and memories are represented in the brain.” : NSF
Joint effort by NSF, NIH, DARPA, and other private partners.
http://www.whitehouse.gov/infographics/brain-initiative
15. 15
EXAMPLE 4: GE Head Health Challenge
Challenge 1: Methods for Diagnosis and Prognosis of Mild
Traumatic Brain Injuries.
Challenge 2: The Mechanics of Injury: Innovative
Approaches For Preventing And Identifying Brain Injuries.
In Challenge 1, GE and the NFL will award up to $10M for
two types of solutions: Algorithms and Analytical Tools, and
Biomarkers and other technologies. A total of $60M in
funding over a period of 4 years.
17. 17
Data Collection and Analysis
Jensen, Peter B., Lars J. Jensen, and Søren Brunak. "Mining electronic health records: towards better
research applications and clinical care." Nature Reviews Genetics (2012).
Effectively integrating and efficiently analyzing various forms of healthcare data over
a period of time can answer many of the impending healthcare problems.
18. 18
Organization of this Tutorial
Introduction
Motivating Examples
Sources and Techniques for Big Data in Healthcare
– Structured EHR Data
– Unstructured Clinical Notes
– Medical Imaging Data
– Genetic Data
– Other Data (Epidemiology & Behavioral)
Final Thoughts and Conclusion
22. 22
Data
Health
data
Genomic data
• DNA sequences
Clinical data
• Structured EHR
• Unstructured
EHR
• Medical Images Behavior data
• Social network
data
• Mobility sensor
data
23. 23
Billing data - ICD codes
ICD stands for International Classification of Diseases
ICD is a hierarchical terminology of diseases, signs, symptoms, and
procedure codes maintained by the World Health Organization
(WHO)
In US, most people use ICD-9, and the rest of world use ICD-10
Pros: Universally available
Cons: medium recall and medium precision for characterizing
patients
• (250) Diabetes mellitus
• (250.0) Diabetes mellitus without mention of complication
• (250.1) Diabetes with ketoacidosis
• (250.2) Diabetes with hyperosmolarity
• (250.3) Diabetes with other coma
• (250.4) Diabetes with renal manifestations
• (250.5) Diabetes with ophthalmic manifestations
• (250.6) Diabetes with neurological manifestations
• (250.7) Diabetes with peripheral circulatory disorders
• (250.8) Diabetes with other specified manifestations
• (250.9) Diabetes with unspecified complication
24. 24
Billing data – CPT codes
CPT stands for Current Procedural Terminology created by the
American Medical Association
CPT is used for billing purposes for clinical services
Pros: High precision
Cons: Low recall
Codes for Evaluation and Management: 99201-99499
(99201 - 99215) office/other outpatient services
(99217 - 99220) hospital observation services
(99221 - 99239) hospital inpatient services
(99241 - 99255) consultations
(99281 - 99288) emergency dept services
(99291 - 99292) critical care services
…
25. 25
Lab results
The standard code for lab is Logical Observation Identifiers Names and Codes
(LOINC®)
Challenges for lab
– Many lab systems still use local dictionaries to encode labs
– Diverse numeric scales on different labs
• Often need to map to normal, low or high ranges in order to be useful for
analytics
– Missing data
• not all patients have all labs
• The order of a lab test can be predictive, for example, BNP indicates high
likelihood of heart failure
Time Lab Value
1996-03-15 12:50:00.0 CO2 29.0
1996-03-15 12:50:00.0 BUN 16.0
1996-03-15 12:50:00.0 HDL-C 37.0
1996-03-15 12:50:00.0 K 4.5
1996-03-15 12:50:00.0 Cl 102.0
1996-03-15 12:50:00.0 Gluc 86.0
26. 26
Medication
Standard code is National Drug Code (NDC) by Food and Drug
Administration (FDA), which gives a unique identifier for each drug
– Not used universally by EHR systems
– Too specific, drugs with the same ingredients but different brands
have different NDC
RxNorm: a normalized naming system for generic and branded drugs
by National Library of Medicine
Medication data can vary in EHR systems
– can be in both structured or unstructured forms
Availability and completeness of medication data vary
– Inpatient medication data are complete, but outpatient medication
data are not
– Medication usually only store prescriptions but we are not sure
whether patients actually filled those prescriptions
27. 27
Clinical notes
Clinical notes contain rich and diverse source of information
Challenges for handling clinical notes
– Ungrammatical, short phrases
– Abbreviations
– Misspellings
– Semi-structured information
• Copy-paste from other structure source
– Lab results, vital signs
• Structured template:
– SOAP notes: Subjective,
Objective, Assessment,
Plan
28. 28
Summary of common EHR data
ICD CPT Lab Medication Clinical notes
Availability High High High Medium Medium
Recall Medium Poor Medium Inpatient: High
Outpatient: Variable
Medium
Precision Medium High High Inpatient: High
Outpatient: Variable
Medium high
Format Structured Structured Mostly
structured
Structured and
unstructured
Unstructured
Pros Easy to work with, a
good approximation
of disease status
Easy to work
with, high
precision
High data
validity
High data validity More details
about doctors’
thoughts
Cons Disease code often
used for screening,
therefore disease
might not be there
Missing data Data
normalization
and ranges
Prescribed not
necessary taken
Difficult to
process
Joshua C. Denny Chapter 13: Mining Electronic Health Records in the Genomics Era. PLoS Comput Biol. 2012 December; 8(12):
34. 34
Text Mining in Healthcare
Text mining
– Information Extraction
• Name Entity Recognition
– Information Retrieval
Clinical text vs. Biomedical text
– Biomedical text: medical literatures (well-written medical text)
– Clinical text is written by clinicians in the clinical settings
• Meystre et al. Extracting Information from Textual Documents in the Electronic Health Record: A Review of Recent
Research. IMIA 2008
• Zweigenbaum et al. Frontiers of biomedical text mining: current progress, BRIEFINGS IN BIOINFORMATICS. VOL 8. NO 5.
358-375
• Cohen and Hersh, A survey of current work in biomedical text mining. BRIEFINGS IN BIOINFORMATICS. VOL 6. NO 1.
57–71.
35. 35
Auto-Coding: Extracting Codes from Clinical Text
Problem
– Automatically assign diagnosis codes to clinical text
Significance
– The cost is approximately $25 billion per year in the US
Available Data
– Medical NLP Challenges from 2007
• Subsections from radiology reports: clinical history and impression
Potential Evaluation Metric:
– F-measure = 2P*R/(P+R), where P is precision, and R is recall.
Example References
• Aronson et al. From indexing the biomedical literature to coding clinical text: experience with MTI
and machine learning approaches. BioNLP 2007
• Crammer et al. Automatic Code Assignment to Medical Text. BioNLP 2007
• Friedman C, Shagina L, Lussier Y, Hripcsak G. Automated encoding of clinical documents based
on natural language processing. JAMIA. 2004:392-402
• Pakhomov SV, Buntrock JD, Chute CG. Automating the assignment of diagnosis codes to patient
encounters using example-based and machine learning techniques. JAMIA 2006:516-25.
36. 36
Context Analysis - Negation
Negation: e.g., ...denies chest pain…
– NegExpander [1] achieves 93% precision on mammographic
reports
– NegEx [2] uses regular expression and achieves 94.5% specificity
and 77.8% sensitivity
– NegFinder [3] uses UMLS and regular expression, and achieves
97.7 specificity and 95.3% sensitivity when analyzing surgical
notes and discharge summaries
– A hybrid approach [4] uses regular expression and grammatical
parsing and achieves 92.6% sensitivity and 99.8% specificity
1. Aronow DB, Fangfang F, Croft WB. Ad hoc classification of radiology reports. JAMIA 1999:393-411
2. Chapman et al. A simple algorithm for identifying negated findings and diseases in discharge summaries. JBI 2001:301-10.
3. Mutalik PG, et al. Use of general-purpose negation detection to augment concept indexing of medical documents: a
quantitative study using the UMLS. JAMIA 2001:598-609.
4. Huang Y, Lowe HJ. A novel hybrid approach to automated negation detection in clinical radiology reports. JAMIA 2007
37. 37
Context Analysis - Temporality
Temporality: e.g., …fracture of the tibia 2 years ago
– TimeText [1] can detect temporal relations with 93.2% recall and
96.9% precision on 14 discharge summaries
– Context [2] is an extension of NegEx, which identifies
• negations (negated, affirmed),
• temporality (historical, recent, hypothetical)
• experiencer (patient, other)
1. Zhou et al. The Evaluation of a Temporal Reasoning System in Processing Clinical Discharge Summaries. JAMIA 2007.
2. Chapman W, Chu D, Dowling JN. ConText: An Algorithm for Identifying Contextual Features from Clinical Text. BioNLP 2007
Trigger TerminationClinical concepts
Scope of the context
v v v v
38. 38
CASE 1: CASE BASED
RETRIEVAL
Sondhi P, Sun J, Zhai C, Sorrentino R, Kohn MS. Leveraging medical thesauri and physician feedback for improving medical
literature retrieval for case queries. JAMIA. 2012
39. 39
Input: Case Query
Patient with smoking habit and weight loss. The
frontal and lateral chest X rays show a mass in the
posterior segment of the right upper lobe as well as a
right hilar enlargement and obliteration of the right
paratracheal stripe. On the chest CT the contours of
the mass are lobulated with heterogeneous
enhancement.Enlarged mediastinal and hilar lymph
nodes are present.
40. 40
Goal: Find Relevant Research Articles to a Query
Additional Related Information
Disease MeSH
41. 41
Challenge 1: Query Weighing
Patient with smoking habit and weight
loss. The frontal and lateral chest X
rays show a mass in the posterior
segment of the right upper lobe as well as
a right hilar enlargement and obliteration
of the right paratracheal stripe. On the
chest CT the contours of the mass are
lobulated with heterogeneous
enhancement. Enlarged mediastinal and
hilar lymph nodes are present.
Queries are long
Not all words useful
IDF does not reflect
importance
Semantics decide weight
42. 42
Method: Semantic Query Weighing
Identify important UMLS Semantic Types
based on their definition
Assign higher weights to query words
under these types
Disease or syndrome, Body part organ or organ
component, Sign or symptom, Finding, Acquired
abnormality, Congenital abnormality, Mental or
behavioral dysfunction, Neoplasm, Pharmacologic
substance, Individual Behavior
Included UMLS semantic types
43. 43
Matching variants
“x ray”, “x-rays”,
“x rays”
Matching synonyms
“CT” or “x rays”
Knowledge gap
Challenge 2: Vocabulary Gap
Patient with smoking habit
and weight loss. The frontal and
lateral chest X rays show a mass in
the posterior segment of
the right upper lobe as well as a right
hilar enlargement and obliteration of
the right paratracheal stripe. On the
chest CT the contours of the mass are
lobulated with heterogeneous
enhancement.Enlarged mediastinal
and hilar lymph nodes are present.
44. 44
Method: Additional Query Keywords
Asked physicians to provide additional keywords
Adding them with low weight helps
Any potential diagnosis keywords help greatly
Gives us insights into better query formulation
Female patient, 25 years old, with fatigue and a swallowing disorder
(dysphagia worsening during a meal). The frontal chest X-ray shows
opacity with clear contours in contact with the right heart border. Right
hilar structures are visible through the mass. The lateral X-ray confirms
the presence of a mass in the anterior mediastinum. On CT images, the
mass has a relatively homogeneous tissue density.
Additional keywords: Thymoma, Lymphoma, Dysphagia, Esophageal
obstruction, Myasthenia gravis, Fatiguability, Ptosis
45. 45
Challenge 3: Pseudo-Feedback
General vocabulary gap solution:
– Apply Pseudo-Relevance Feedback
What if very few of the top N are
relevant?
No idea which keywords to pick up
46. 46
Method: Medical Subject Heading (MeSH) Feedback
Any case related query usually relates only to
handful of conditions
How to guess the condition of the query?
–Select MeSH terms from top N=10 ranked
documents
–Select MeSH terms covering most query
keywords
–Use them for feedback
52. 52
CASE 2: HEART FAILURE SIGNS
AND SYMPTOMS
Roy J. Byrd, Steven R. Steinhubl, Jimeng Sun, Shahram Ebadollahi, Walter F. Stewart. Automatic identification of heart failure
diagnostic criteria, using text analysis of clinical notes from electronic health records. International Journal of Medical Informatics 2013
53. 53
Framingham HF Signs and Symptoms
Framingham criteria for HF* are signs and symptoms that
are documented even at primary care visits
* McKee PA, Castelli WP, McNamara PM, Kannel WB. The natural history of congestive heart failure: the Framingham study. N Engl J Med. 1971;285(26):1441-6.
54. 54
Natural Language Processing (NLP) Pipeline
Criteria extraction comes from sentence level.
Encounter label comes from the entire note.
Roy J. Byrd, Steven R. Steinhubl, Jimeng Sun, Shahram Ebadollahi, Walter F. Stewart. Automatic identification of heart failure
diagnostic criteria, using text analysis of clinical notes from electronic health records. International Journal of Medical Informatics 2013
55. 55
Performance on Encounter Level on Test Set
Machine learning method: decision tree
Rule-based method is to construct grammars by computational
linguists
Manually constructed rules are more accurate but more effort to
construct than automatic rules from learning a decision tree
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Recall Precision F-Score Recall Precision F-Score
Machine-learning method Rule-based method
Overall
Affir
m
e d
Denied
56. 56
Potential Impact on Evidence-based Therapies
0.00%
10.00%
20.00%
30.00%
40.00%
50.00%
60.00%
70.00%
Preceding Framingham
diagnosis
After Framingham
diagnosis
After clinical diagnosis
Applying text mining to extract Framingham symptoms
can help trigger early intervention
3,168 patients eventually all
diagnosed with HF
Vhavakrishnan R, Steinhubl SR, Sun J, et al. Potential impact of predictive models for early detection of heart
failure on the initiation of evidence-based therapies. J Am Coll Cardiol. 2012;59(13s1):E949-E949.
No symptoms
Framingham
symptoms
Clinical
diagnosis
Opportunity for early intervention
58. 58
KNOWLEDGE+DATA
FEATURE SELECTION
Jimeng Sun, Jianying Hu, Dijun Luo, Marianthi Markatou, Fei Wang, Shahram Ebadollahi, Steven E. Steinhubl,
Zahra Daar, Walter F. Stewart. Combining Knowledge and Data Driven Insights for Identifying Risk Factors using Electronic Health
Records. AMIA (2012).
59. 59
Combining Knowledge- and Data-driven Risk Factors
Knowledge
Data
Combination
Risk factor
augmentation
Risk factor
augmentation Combined
risk factors
Combined
risk factors
Knowledge
base
Knowledge
base
Risk factor
gathering
Risk factor
gathering Knowledge
risk factors
Knowledge
risk factors
Clinical dataClinical data
Data
processing
Data
processing
Potential
risk
factors
Potential
risk
factors
Target
condition
Target
condition
Risk factor
augmentation
Risk factor
augmentation
60. 60
Risk Factor Augmentation
Model Accuracy:
– The selected risk factors are highly predictive of the target
condition
– Sparse feature selection through L1 regularization
Minimal Correlations:
– Between data driven risk factors and knowledge driven risk factors
– Among the data driven risk factors
Model error Correlation among
data-driven features
Correlation between
data- and knowledge-
driven features
Sparse
Penalty
Dijun Luo, Fei Wang, Jimeng Sun, Marianthi Markatou, Jianying Hu,Shahram Ebadollahi, SOR:
Scalable Orthogonal Regression for Low-Redundancy Feature Selection and its Healthcare Applications.
SDM’12
61. 61
Prediction Results using Selected Features
AUC significantly improves as complementary data driven risk factors
are added into existing knowledge based risk factors.
A significant AUC increase occurs when we add first 50 data driven
features
+200+150
+100
+50
all knowledge
features+diabetes
+Hypertension
CAD
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0 100 200 300 400 500 600
AUC
Number of features
Jimeng Sun, Jianying Hu, Dijun Luo, Marianthi Markatou, Fei Wang, Shahram Ebadollahi, Steven E. Steinhubl, Zahra Daar, Walter
F. Stewart. Combining Knowledge and Data Driven Insights for Identifying Risk Factors using Electronic Health Records. AMIA
(2012).
62. 62
Clinical Validation of Data-driven Features
9 out of 10 are considered relevant to HF
The data driven features are complementary to the
existing knowledge-driven features
71. 71
Motivations for Early Detection of Heart Failure
Heart failure (HF) is a complex disease
Huge Societal Burden
For payers
– Reduce cost and hospitalization
– Improve the existing clinical guidance of HF prevention
For providers
– Slow or potentially reverse disease progress
– Improve quality of life, reduce mortality
72. 72
Predictive Modeling Study Design
Goal: Classify HF cases against control patients
Population
– 50,625 Patients (Geisinger Clinic PCPs)
– Cases: 4,644 case patients
– Controls 45,981 matched on age, gender and clinic
Cases Controls
73. 73
Predictive Modeling Setup
We define
– Diagnosis date and index date
– Prediction and observation windows
Features are constructed from the observation window and predict HF
onset after the prediction window
Index date Diagnosis date
Observation Window
Prediction Window
74. 74
Features
We construct over 20K features of different types
Through feature selection and generalization, we result in the following predictive
features
Feature type Cardinality Predictive Features
DIAGNOSIS 17,322
Diabetes, CHD, hypertensions, valvular disease, left
ventricular hypertrophy, angina, atrial fibrillation, MI, COPD
Demographics 11 Age, race, gender, smoking status
Framingham 15
rales, cardiomegaly, acute pulmonary edema, HJReflex,
ankle edema, nocturnal cough, DOExertion,
hepatomegaly, pleural effusion
Lab 1,264
eGFR, LVEF, albumin, glucose, cholesterol, creatinine,
cardiomegaly, heart rate, hemoglobin
Medication 3,922
antihypertensive, lipid-lowering, CCB, ACEI, ARB, beta
blocker, diuretic, digitalis, antiarrhythmic
Vital 6 blood pressure and heart rate
75. 75
Prediction Performance on Different Prediction Windows
Setting: observation window = 720 days, classifiers = random forest, evaluation
mechanism = 10-fold cross-validation for 10 times
Observation:
AUC slowly decreases as the prediction window increases
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95
1
0 90 180 270 360 450 540 630 720
AUC
Prediction Window (days)
HF Onset
(ObservationWindow=720, PredictionWindow=variable)
76. 76
Prediction Performance on Different Observation Windows
Setting: prediction window= 180 days, classifiers= random forest, evaluation
mechanism =10-fold cross-validation
Observation:
AUC increases as the observation window increases. i.e., more data for a longer
period of time will lead to better performance of the predictive model
Combined features performed the best at observation window = 720 days
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95
1
30 90 180 270 360 450 540 630 720 810 900
AUC
Observation Window (days)
HF Onset
(ObservationWindow=variable, PredictionWindow=180)
79. 79
Unstructured Clinical Data
Dataset Link Description
i2b2 Informatics for
Integrating Biology & the
Bedside https://www.i2b2.org/NLP/Dat
aSets/Main.php
Clinical notes used for clinical NLP challenges
• 2006 Deidentification and Smoking Challenge
• 2008 Obesity Challenge
• 2009 Medication Challenge
• 2010 Relations Challenge
• 2011 Co-reference Challenge
Computational Medicine
center
http://computationalmedicine.
org/challenge/previous
Classifying Clinical Free Text Using Natural
Language Processing
80. 80
Structured EHR
Dataset Link Description
Texas Hospital Inpatient
Discharge
http://www.dshs.state.tx.us/thcic/ho
spitals/Inpatientpudf.shtm
Patient: hospital location, admission
type/source, claims, admit day, age, icd9
codes + surgical codes
Framingham Health Care
Data Set
http://www.framinghamheartstudy.o
rg/share/index.html
Genetic dataset for cardiovascular disease
Medicare Basic Stand Alone
Claim Public Use Files
http://resdac.advantagelabs.com/c
ms-data/files/bsa-puf
Inpatient, skilled nursing facility, outpatient,
home health agency, hospice, carrier, durable
medical equipment, prescription drug event,
and chronic conditions on an aggregate level
VHA Medical SAS Datasets
http://www.virec.research.va.gov/M
edSAS/Overview.htm
Patient care encounters primarily for
Veterans: inpatient/outpatient data from VHA
facilities
Nationwide Inpatient
Sample
http://www.hcup-
us.ahrq.gov/nisoverview.jsp
Discharge data from 1051 hospitals in 45
states with diagnosis, procedures, status,
demographics, cost, length of stay
CA Patient Discharge Data
http://www.oshpd.ca.gov/HID/Produ
cts/PatDischargeData/PublicDataS
et/index.html
Discharge data for licensed general acute
hospital in CA with demographic, diagnostic
and treatment information, disposition, total
charges
MIMIC II Clinical Database
http://mimic.physionet.org/database
.html
ICU data including demographics, diagnosis,
clinical measurements, lab results,
interventions, notes
Thanks to Prof. Joydeep Ghosh from UT Austin for providing this information
81. 81
Software
MetaMap maps biomedical text to UMLS metathesaurus
– Developed by NLM for parsing medical article not clinical notes
– http://metamap.nlm.nih.gov/
cTAKES: clinical Text Analysis and Knowledge Extraction System
– Using Unstructured Information Management Architecture (UIMA)
framework and OpenNLP toolkit
– http://ctakes.apache.org/
82. 82
Organization of this Tutorial
Introduction
Motivating Examples
Sources and Techniques for Big Data in Healthcare
– Structured EHR Data
– Unstructured Clinical Notes
– Medical Imaging Data
– Genetic Data
– Other Data (Epidemiology & Behavioral)
Final Thoughts and Conclusion
84. 84
Image Data is Big !!!
By 2015, the average hospital
will have two-thirds of a
petabyte (665 terabytes) of
patient data, 80% of which will
be unstructured image data
like CT scans and X-rays.
Medical Imaging archives are
increasing by 20%-40%
PACS (Picture Archival &
Communication Systems)
system is used for storage and
retrieval of the images.
Image Source: http://medcitynews.com/2013/03/the-body-in-bytes-medical-images-as-a-source-of-healthcare-big-data-infographic/
85. 85
Popular Imaging Modalities in Healthcare Domain
Computed
Tomography (CT)
The main challenge with the image data is that it is not only huge, but
is also high-dimensional and complex.
Extraction of the important and relevant features is a daunting task.
Many research works applied image features to extract the most
relevant images for a given query.
Positron Emission
Tomography (PET)
Magnetic Resonance
Imaging (MRI)
Image Source: Wikipedia
86. 86
Medical Image Retrieval System
Biomedical
Image
Database
Testing Phase
Feature
Extraction
Algorithms for
learning or
similarity
computations
Final Trained
Models
Query Image
Retrieval
System
Query Results Performance
Evaluation
(Precision-Recall)
Recall
Precision
Training Phase
87. 87
Content-based Image Retrieval
Two components
– Image features/descriptors - bridging the gap between the
visual content and its numerical representation.
– These representations are designed to encode color and texture
properties of the image, the spatial layout of objects, and various
geometric shape characteristics of perceptually coherent structures.
– Assessment of similarities between image features based on
mathematical analyses, which compare descriptors across different
images.
– Vector affinity measures such as Euclidean distance, Mahalanobis
distance, KL divergence, Earth Mover’s distance are amongst the
widely used ones.
88. 88
Medical Image Features
Photo-metric features exploit
color and texture cues and they
are derived directly from raw
pixel intensities.
Geometric features: cues such as
edges, contours, joints, polylines,
and polygonal regions.
Akgül, Ceyhun Burak, et al. "Content-based image retrieval in radiology: current status and future directions." Journal of Digital Imaging
24.2 (2011): 208-222.
Müller, Henning, et al. "A review of content-based image retrieval systems in medical applications-clinical benefits and future directions."
International journal of medical informatics 73.1 (2004): 1-24.
• A suitable shape representation
should be extracted from the
pixel intensity information by
region-of interest detection,
segmentation, and grouping. Due
to these difficulties, geometric
features are not widely used.
89. 89
Image CLEF Data
ImageCLEF aims to provide an evaluation forum for the cross–
language annotation and retrieval of images (launched in 2003)
Statistics of this database :
– With more than 300,000 (in .JPEG format), the total size of the database > 300 GB
– contains PET, CT, MRI, and Ultrasound images
Three Tasks
– Modality classification
– Image–based retrieval
– Case–based retrieval
Medical Image Database available at http://www.imageclef.org/2013/medical
91. 91
Image based and Case-based Querying
Image-based retrieval :
This is the classic medical retrieval task.
Similar to Query by Image Example.
Given the query image, find the most similar images.
Case-based retrieval:
This is a more complex task; is closer to the clinical workflow.
A case description, with patient demographics, limited symptoms and test
results including imaging studies, is provided (but not the final diagnosis).
The goal is to retrieve cases including images that might best suit the
provided case description.
92. 92
Challenges with Image Data
Extracting informative features.
Selection of relevant features.
Sparse methods* and dimensionality reducing techniques
Integration of Image data with other data available
Early Fusion
Vector-based Integration
Intermediate Fusion
Multiple Kernel Learning
Late Fusion
Ensembling results from individual modalities
*Jieping Ye’s SDM 2010 Tutorial on Sparse methods http://www.public.asu.edu/~jye02/Tutorial/Sparse-SDM10.pdf
93. 93
Publicly Available Medical Image Repositories
Image
database
Name
Moda
lities
No. Of
patients
No. Of
Images
Size Of
Data
Notes/Applications Download Link
Cancer
Imaging
Archive
Database
CT
DX
CR
1010 244,527 241 GB Lesion Detection and
classification, Accelerated
Diagnostic Image Decision,
Quantitative image
assessment of drug response
https://public.cancerimagingarchive.net/
ncia/dataBasketDisplay.jsf
Digital
Mammog
raphy
database
DX 2620 9,428 211 GB Research in Development of
Computer Algorithm to aid in
screening
http://marathon.csee.usf.edu/Mammogr
aphy/Database.html
Public
Lung
Image
Database
CT 119 28,227 28 GB Identifying Lung Cancer by
Screening Images
https://eddie.via.cornell.edu/crpf.html
Image
CLEF
Database
PET
CT
MRI
US
unknown 306,549 316 GB Modality Classification , Visual
Image Annotation , Scientific
Multimedia Data Management
http://www.imageclef.org/2013/medical
MS
Lesion
Segment
ation
MRI 41 145 36 GB Develop and Compare 3D MS
Lesion Segmentation
Techniques
http://www.ia.unc.edu/MSseg/download
.php
ADNI
Database
MRI
PET
2851 67,871 16GB Define the progression of
Alzheimer’s disease
http://adni.loni.ucla.edu/data-
samples/acscess-data/
95. 95
Genetic Data
The human genome is made up of DNA which consists of four different
chemical building blocks (called bases and abbreviated A, T, C, and G).
It contains 3 billion pairs of bases and the particular order of As, Ts, Cs,
and Gs is extremely important.
Size of a single human genome is about 3GB.
Thanks to the Human Genome Project (1990-2003)
– The goal was to determine the complete sequence of the 3 billion
DNA subunits (bases).
– The total cost was around $3 billion.
96. 96
Genetic Data
The whole genome sequencing data is currently being annotated and not
many analytics have been applied so far since the data is relatively new.
Several publicly available genome repositories.
http://aws.amazon.com/1000genomes/
It costs around $5000 to get a complete genome. It is still in the research
phase. Heavily used in the cancer biology.
In this tutorial, we will focus on Genome-Wide Association Studies (GWAS).
– It is more relevant to healthcare practice. Some clinical trials have
already started using GWAS.
– Most of the computing literature (in terms of analytics) is available for the
GWAS. It is still in rudimentary stage for whole genome sequences.
97. 97
Genome-Wide Association Studies (GWAS)
Genome-wide association studies (GWAS) are
used to identify common genetic factors that
influence health and disease.
These studies normally compare the DNA of
two groups of participants: people with the
disease (cases) and similar people without
(controls). (One million Loci)
Single nucleotide polymorphisms (SNPs) are
DNA sequence variations that occur when a
single nucleotide (A,T,C,or G) in the genome
sequence differs between individuals.
SNPs occur every 100 to 300 bases along the
3-billion-base human genome.
98. 98
Epistasis Modeling
For simple Mendelian diseases, single SNPs can explain phenotype
very well.
The complex relationship between genotype and phenotype is
inadequately described by marginal effects of individual SNPs.
Increasing empirical evidence suggests that interactions among loci
contribute broadly to complex traits.
The difficulty in the problem of detecting SNP pair interactions is the
heavy computational burden.
– To detect pairwise interactions from 500,000 SNPs genotyped in
thousands of samples, a total of 1.25 X 10 statistical tests are
needed.
99. 99
Epistasis Detection Methods
Exhaustive
– Enumerates all K-locus
interactions among SNPs.
– Efficient implementations
mostly aiming at reducing
computations by eliminating
unnecessary calculations.
Non-Exhaustive
– Stochastic: randomized
search. Performance lowers
when the # SNPs increase.
– Heuristic: greedy methods
that do not guarantee optimal
solution.
Shang, Junliang, et al. "Performance analysis of novel methods for detecting epistasis." BMC
bioinformatics 12.1 (2011): 475.
100. 100
Sparse Methods for SNP data analysis
Successful identification of SNPs strongly predictive of disease promises a
better understanding of the biological mechanisms underlying the disease.
Sparse linear methods have been used to fit the genotype data and obtain
a selected set of SNPs.
Minimizing the squared loss function (L) of N individuals and p variables
(SNPs) is used for linear regression and is defined as
where xi ∈ ℝp are inputs for the ith sample, y ∈ ℝN is the N vector of outputs, β0 ∈
ℝ is the intercept, β ∈ ℝp is a p-vector of model weights, and λ is user penalty.
Efficient implementations that scale to genome-wide data are available.
SparSNP package http://bioinformatics.research.nicta.com.au/software/sparsnp/
Wu, Tong Tong, et al. "Genome-wide association analysis by lasso penalized logistic regression."
Bioinformatics 25.6 (2009): 714-721.
N
i
p
j
j
T
ii xyL
1 1
2
00 )(
2
1
),(
101. 101
Public Resources for Genetic (SNP) Data
The Wellcome Trust Case Control Consortium (WTCCC) is a group
of 50 research groups across the UK which was established in 2005.
Available at http://www.wtccc.org.uk/
Seven different diseases: bipolar disorder (1868 individuals),
coronary heart disease (1926 individuals), Crohn's disease (1748
individuals), hypertension (1952 individuals), rheumatoid arthritis
(1860 individuals), type I diabetes (1963 individuals) or type II
diabetes (1924 individuals).
Around 3,000 healthy controls common for these disorders. The
individuals were genotyped using Affymetrix chip and obtained
approximately 500K SNPs.
The database of Genotypes and Phenotypes (dbGaP) maintained by
National Center of Biotechnology Information (NCBT) at NIH.
Available at http://www.ncbi.nlm.nih.gov/gap
103. 103
Epidemiology Data
The Surveillance Epidemiology and End Results Program (SEER) at NIH.
Publishes cancer incidence and survival data from population-based cancer
registries covering approximately 28% of the population of the US.
Collected over the past 40 years (starting from January 1973 until now).
Contains a total of 7.7M cases and >350,000 cases are added each year.
Collect data on patient demographics, tumor site, tumor morphology and
stage at diagnosis, first course of treatment, and follow-up for vital status.
Usage:
Widely used for understanding disparities related to race, age, and gender.
Can be used to overlay information with other sources of data (such as
water/air pollution, climate, socio-economic) to identify any correlations.
Can not be used for predictive analysis, but mostly used for studying trends.
SEER database is available at http://seer.cancer.gov/
104. 104
Social Media can Sense Public Health !!
Chunara, Rumi, Jason R. Andrews, and John S.
Brownstein. "Social and news media enable estimation of
epidemiological patterns early in the 2010 Haitian cholera
outbreak." American Journal of Tropical Medicine and
Hygiene 86.1 (2012): 39.
During infectious disease outbreaks, data collected through health institutions
and official reporting structures may not be available for weeks, hindering early
epidemiologic assessment. Social media can get it in near real-time.
Dugas, Andrea Freyer, et al. "Google Flu Trends:
correlation with emergency department influenza rates
and crowding metrics." Clinical infectious diseases 54.4
(2012): 463-469.
Google Flu Trends correlated with Influenza outbreak
Twitter messaging correlated with
cholera outbreak
105. 105
Social Networks for Patients
PatientsLikeMe1 is a patient network is an online data
sharing platform started in 2006; now has more than
200,000 patients and is tracking 1,500 diseases.
1 http://www.patientslikeme.com/
OBJECTIVE: “Given my status, what is the best outcome I can hope to
achieve, and how do I get there?”
People connect with others who have the same disease or condition,
track and share their own experiences, see what treatments have helped
other patients like them, gain insights and identify any patterns.
Patient provides the data on their conditions, treatment history, side
effects, hospitalizations, symptoms, disease-specific functional scores,
weight, mood, quality of life and more on an ongoing basis.
Gaining access to the patients for future clinical trials.
106. 106
Home Monitoring and Sensing Technologies
Advancements in sensing technology are critical for developing
effective and efficient home-monitoring systems
• Sensing devices can provide several types of data in real-time.
• Activity Recognition using Cell Phone Accelerometers
Kwapisz, Jennifer R., Gary M. Weiss, and Samuel A. Moore. "Activity recognition using cell phone accelerometers." ACM
SIGKDD Explorations Newsletter 12.2 (2011): 74-82.
Rashidi, Parisa, et al. "Discovering activities to recognize and track in a smart environment." Knowledge and Data
Engineering, IEEE Transactions on 23.4 (2011): 527-539.
107. 107
Public Health and Behavior Data Repositories
Dataset Link Description
Behavioral Risk Factor
Surveillance System (BRFSS)
http://www.cdc.gov/brfss/technical_
infodata/index.htm
Healthcare survey data: smoking,
alcohol, lifestyle (diet, exercise),
major diseases (diabetes, cancer),
mental illness
Ohio Hospital Inpatient/Outpatient
Data
http://publicapps.odh.ohio.gov/pwh
/PWHMain.aspx?q=021813114232
Hospital: number of discharges,
transfers, length of stay,
admissions, transfers, number of
patients with specific procedure
codes
US Mortality Data
http://www.cdc.gov/nchs/data_acce
ss/cmf.htm
Mortality information on county-
level
Human Mortality Database http://www.mortality.org/
Birth, death, population size by
country
Utah Public Health Database http://ibis.health.utah.gov/query
Summary statistics for mortality,
charges, discharges, length of stay
on a county-level basis
Dartmouth Atlas of Health Care
http://www.dartmouthatlas.org/tools
/downloads.aspx
Post discharge events, chronically
ill care, surgical discharge rate
Thanks to Prof. Joydeep Ghosh from UT Austin for providing this information.
109. 109
Final Thoughts
Big data could save the health care industry up to $450 billion, but other things
are important too.
Right living: Patients should take more active steps to improve their health.
Right care: Developing a coordinated approach to care in which all
caregivers have access to the same information.
Right provider: Any professionals who treat patients must have strong
performance records and be capable of achieving the best outcomes.
Right value: Improving value while simultaneously improving care quality.
Right innovation: Identifying new approaches to health-care delivery.
“Stakeholders will only benefit from big data if they take a more holistic,
patient-centered approach to value, one that focuses equally on health-care
spending and treatment outcomes,”
McKinsey report available at: http://www.mckinsey.com/insights/health_systems/the_big-data_revolution_in_us_health_care
110. 110
Conclusion
Big data analytics is a promising right direction which is in its infancy for
the healthcare domain.
Healthcare is a data-rich domain. As more and more data is being
collected, there will be increasing demand for big data analytics.
Unraveling the “Big Data” related complexities can provide many insights
about making the right decisions at the right time for the patients.
Efficiently utilizing the colossal healthcare data repositories can yield some
immediate returns in terms of patient outcomes and lowering care costs.
Data with more complexities keep evolving in healthcare thus leading to
more opportunities for big data analytics.
111. 111
Acknowledgements
• Funding Sources
– National Science foundation
– National Institutes of Health
– Susan G. Komen for the Cure
– Delphinus Medical Technologies
– IBM Research
112. 112
Thank You
Questions and Comments
Feel free to email questions or suggestions to
jimeng@cs.cmu.edu
reddy@cs.wayne.edu