The document proposes a platform that matches patients from online health communities to relevant medical research projects, by developing rich semantic profiles of both patients and projects. It analyzes patient conversations to extract medical conditions, medications, and demographics to create patient profiles. It also analyzes research project descriptions to create profiles. These profiles are then matched using semantic similarity algorithms to find relevant patients for projects. The platform was prototyped and shown to accurately match patients to projects with similar medical conditions.
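The matching step described above can be illustrated with a minimal sketch. The summary does not say which similarity algorithm the platform uses, so cosine similarity over bags of extracted concepts is an assumption here, and the profile contents, patient identifiers, and function names are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two concept-count profiles."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_patients(project: Counter, patients: dict[str, Counter]) -> list[tuple[str, float]]:
    """Return (patient_id, score) pairs, most similar first."""
    scored = [(pid, cosine_similarity(project, profile)) for pid, profile in patients.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical profiles: concept -> number of mentions.
project = Counter({"type 2 diabetes": 2, "metformin": 1})
patients = {
    "patient_a": Counter({"type 2 diabetes": 3, "metformin": 1, "hypertension": 1}),
    "patient_b": Counter({"asthma": 2, "albuterol": 1}),
}

ranking = rank_patients(project, patients)
```

A profile that shares no concepts with the project scores 0.0, so it falls to the bottom of the ranking. A real system would likely map surface terms to ontology concepts before counting, which this sketch leaves out.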
The document provides an overview of Lean Six Sigma (LSS) and its key principles and tools. It discusses how Lean focuses on eliminating waste and ensuring smooth workflow, while Six Sigma aims to reduce variation and improve quality. The combination of Lean and Six Sigma in LSS provides a balanced approach that can drive process improvements in any organization. Case studies like Toyota demonstrate how LSS principles like standard work and data-driven decision making can significantly enhance production quality and efficiency. Key takeaways emphasize measuring results from LSS projects and implementing them as a team through defined roles and documented progress.
Homer wants to drive from Athens, GA to NYC and have his favorite burger along the way. He downloads an app to find restaurants but realizes it is too complicated. The document then discusses creating a simpler app to help Homer find McDonald's restaurants on his route. It outlines a 4-step process: 1) finding relevant services, 2) integrating the services, 3) ensuring they work together, and 4) identifying issues when used. The rest of the document describes semantic techniques like RDF that could be used to build such an app.
The document discusses semantic web services and proposes approaches to describe, discover, compose, and mediate between services semantically. It presents technologies such as OWL-S and WSMO that model service semantics, and proposes semantic templates that declaratively describe service aspects such as inputs, outputs, and properties. It also discusses facilitating service discovery, composition, and mediation through semantic annotations and techniques like faceted search and semantic association querying. The document argues that a semantic approach helps address challenges in service interoperability, discovery, and mediation.
The document discusses dynamic and agile service-oriented architectures using semantic web services. It describes how semantic web services can enable description, discovery, and data mediation of web services to support more dynamic integration. Semantic annotation of web services using ontologies is proposed to achieve agility through increased reuse, easier integration, and the ability to span domains.
The document discusses the importance of attitude and how much of it is visible to others. It states that just like an iceberg where only 10% is visible above water, only a small part of a person's attitudes, knowledge, and skills are visible to others. The rest, including their values, motives, and beliefs, lie below the surface unknown to others. It emphasizes that attitude is everything and determines a person's success more than their aptitude. It provides several quotes about the power of positive thinking and having a positive attitude.
The document summarizes a development plan for the Eastern Urban Center at Millenia, which will include up to 3,000 multifamily homes and 300,000 square feet of retail across 80 city blocks. The development is designed as a walkable, mixed-use district centered around sustainability with public parks, trails, and transit connections. It aims to achieve LEED for Neighborhood Development certification by integrating energy efficient buildings, transportation, and urban design.
The document introduces the 797 mining truck, which has a 360 ton nominal payload capacity. It is larger than the 793 model, with dimensions of 30 feet wide, 24 feet high, and 48 feet long. The 797 has improvements like a cast frame, updated powertrain components like a new torque converter and transmission, a larger operator station, and purpose-built components for longer life, with the goal of improving haulage cost per ton.
Leverage machine learning and new technologies to enhance rwe generation and ...
Athula Herath
My personal activities on automating evidence synthesis and real world data derived evidence for automated treatment guidelines compilation for precision medicine.
This document discusses the potential for artificial intelligence and machine learning in medicine. It notes that while 80% of healthcare data remains unstructured, machine learning could help analyze this data by mapping and validating data fields for modeling. However, significant preprocessing is required due to limitations in available data sets and variables. The document also discusses challenges including different classifications for patients, diseases, and representations in records. It provides an example of a study using clinical notes to predict acute kidney injury. Overall, the document outlines both the promise and challenges of applying artificial intelligence and machine learning to healthcare data.
This document summarizes a presentation on using data mining to analyze characteristics of high-cost diabetics in the Arkansas Medicaid population. It provides an overview of the Arkansas Foundation for Medical Care (AFMC), describes how a data mining project was initiated to examine costs for diabetics, and outlines the study design, which used decision trees to analyze Medicaid claims data from 2004. The results showed that diabetics receiving home health services had costs over 3 times those of other diabetics, and that the top diagnoses for these high-cost patients included diabetes, congestive heart failure, chronic renal failure, and hypertension.
The document discusses standards and coding systems used in biomedical and health informatics. It provides background on the speaker and their qualifications in the fields of medicine and health informatics. It then discusses why healthcare information standards are needed, providing examples of different types of standards including unique identifiers, standard data sets, vocabularies and terminologies, and exchange standards for messages and documents.
AI and Big Data in Psychiatry: An Introduction and Overview
Carlo Carandang
Dr. Carlo Carandang, a psychiatrist and data scientist, talks about how Big Data can be implemented into clinical psychiatric practice to improve patient care and reduce costs. Dr. Carandang introduces Big Data topics, Big Data systems, machine learning algorithms, and AI psychiatry applications. Dr. Carandang presented this talk at the 2019 Presidential Symposium in Washington, DC, sponsored by the Washington Psychiatric Society.
An overview of the i2b2 clinical research platform, and the implications of connecting Indivo to i2b2 as a source of patient-reported outcomes. Presented at the 2012 Indivo X Users' Conference.
By Shawn Murphy MD, Ph.D., Partners Healthcare.
Stephen Friend Dana Farber Cancer Institute 2011-10-24
Sage Base
The document discusses building disease models using data intensive science and open medical information systems, with the goal of better understanding disease biology before testing drugs. It describes the Sage Bionetworks non-profit organization, which aims to create a commons for shared disease maps and models through several pilot projects including clinical trial data sharing and identifying cancer patients who do not respond to approved drug regimens.
Stephen Friend Institute of Development, Aging and Cancer 2011-11-29
Sage Base
The document proposes a new approach called Arch2POCM for drug development that moves from disease targets to clinical validation. It discusses issues with the current drug discovery process, noting $200 billion is spent annually but only a handful of new medicines are approved each year while productivity is declining. Arch2POCM would require a more data-driven and collaborative approach involving scientists, clinicians, and citizens to better link knowledge and accelerate eliminating human disease. It presents the mission of Sage Bionetworks to create a commons for evolving integrative networks to map diseases and enable discovery.
E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...
journal ijrtem
A wealth of data has been collected in public health care systems, and many new technological improvements have considerable influence on the current data pool. Nevertheless, important obstacles make it challenging to utilize existing clinical data. Technological improvements lead patients to search for their symptoms and the corresponding diagnoses in online resources. This study aims to develop a machine learning model suited to the differing availability of users. Most current systems let people choose related symptoms in web interfaces or Q&A forums. In addition to these applications, the study aims to implement a new technique that extracts text-based symptoms and their related parameters, such as severity, duration, location, cause, and any accompanying indicators. The approach applies to patients' everyday language statements as well as to medical expressions of the corresponding symptoms. Extracted terms are used as input to the model and analyzed for a matching diagnosis, achieving an accuracy of 72.5%.
Our classification technique uses a deep CNN to classify skin lesions. An image is warped through the CNN architecture into a probability distribution over clinical skin disease classes. The CNN was pretrained on a large generic image dataset and fine-tuned on a dataset of over 129,000 skin lesions spanning 2,032 diseases. Data integration from multiple sources is key to future digital medicine, but challenges include data quality, availability, and privacy. Techniques like distributed learning models and homomorphic encryption can help address privacy concerns while enabling large-scale data sharing and analysis.
Application of Data Analytics to Improve Patient Care: A Systematic Review
IRJET Journal
This document summarizes a systematic review of research on applying data analytics to improve patient care. The review found that data analytics has significantly impacted the healthcare sector by improving patient care. Data analytics involves using scientific and mathematical methods to derive meaning from data to gain better insights. It can reduce costs, enable faster decision making, and minimize risks in healthcare. The review identified theories like the Magical Thinking Theory and Lightweight Theory that provide a framework for understanding the relationship between data analytics and patient care. The findings suggest data analytics plays an important role in improving patient health and experiences.
The document discusses the need for standardization and interoperability in electronic patient record (EPR) development. It notes that healthcare currently falls short in several quality metrics like safety, effectiveness, efficiency, and being patient-centered. The document outlines meaningful EHR functionalities that can help enable high quality healthcare like structured documentation, clinical decision support, and healthcare information exchange. It also discusses the challenges of language and terminology standards for EPRs to allow effective information sharing and translation across systems.
Visual Analytics for Healthcare - Panel at AMIA 2012 in Chicago
Adam Perer
AMIA 2012 Panel on Visual Analytics for Healthcare
Organizer:
Adam Perer, PhD
Research Scientist
IBM T.J. Watson Research Center, Hawthorne, NY
Panelists:
Ben Shneiderman, PhD
Professor, Computer Science
University of Maryland, College Park, MD
Yuval Shahar, PhD
Professor, Head of the Medical Informatics Research Center
Ben Gurion University, Beer Sheva, Israel
Jeffrey Heer, PhD
Assistant Professor, Computer Science
Stanford University, Stanford, CA
David Gotz, PhD
Research Scientist
IBM T.J. Watson Research Center, Hawthorne, NY
Abstract
With the proliferation of medical information technology, users at all levels of the healthcare system have access to more data than ever before [6]. This data can be of tremendous value but is often difficult to access and interpret. For example, clinicians are often faced with the challenging task of analyzing large amounts of unstructured, multi-modal, and longitudinal data to effectively diagnose and monitor the progression of a patient’s disease [4,5]. Similarly, patients are confronted with the difficult task of understanding the trends and correlations within data related to their own health. At the institutional level, healthcare organizations are faced with the desire to use data to improve overall operational efficiency and performance, while continuing to maintain the quality of patient care and safety.
Recent advances in visualization and visual analytics have the potential to help each of the user groups listed above do more with the often overwhelming amount of data available to them [1,3,7,8]. However, to be successful, visualization designers and clinicians must work together closely to ensure that the right technologies are used to help address the meaningful problems. Unfortunately, despite the continuous use of scientific visualization and visual analytics in medical applications, the lack of communication between engineers and physicians has meant that only basic visualization and analytics techniques are currently employed in clinical practice [2,9].
The goal of this panel is to present state-of-the-art visualization applications for healthcare and engage the leading physicians and clinical researchers at AMIA to discuss the areas in healthcare where additional visualization techniques are most needed.
A location comparison of three health care centers in Sfax-city
IJERD Editor
The problem of locating health facilities is explored using a mathematical optimization approach. Several models are developed for locating a generalized health facility system so that the selected criteria are optimized. From the literature, the paper adopts two criteria: efficiency and availability of the service. The optimal locations satisfy two objectives: one minimizes the distance between health care centers and patients, and the other captures as many patients as possible within a pre-specified time or distance. The results indicate that the existing locations provide near-optimal geographic access to health care centers.
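The two objectives named in this summary can be sketched on a toy instance. This is not the paper's model: the brute-force search, the Euclidean distance, the coordinates, and the function names are all assumptions chosen to keep the example small.

```python
import math
from itertools import combinations


def nearest_distance(patient, sites):
    """Distance from a patient to the closest open site."""
    return min(math.dist(patient, site) for site in sites)


def total_distance(patients, sites):
    """Objective 1: sum of patient-to-nearest-center distances (lower is better)."""
    return sum(nearest_distance(p, sites) for p in patients)


def covered_count(patients, sites, radius):
    """Objective 2: number of patients within `radius` of some center (higher is better)."""
    return sum(1 for p in patients if nearest_distance(p, sites) <= radius)


def best_sites(patients, candidates, k, radius):
    """Enumerate every choice of k candidate sites and return the best set per objective."""
    options = list(combinations(candidates, k))
    by_distance = min(options, key=lambda s: total_distance(patients, s))
    by_coverage = max(options, key=lambda s: covered_count(patients, s, radius))
    return by_distance, by_coverage


# Hypothetical planar coordinates for patients and candidate sites.
patients = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
candidates = [(0, 0), (5, 5), (3, 3)]

by_distance, by_coverage = best_sites(patients, candidates, k=2, radius=1.5)
```

On this instance both objectives select the same pair of sites, which mirrors the kind of comparison the summary reports between existing and optimal locations. Exhaustive enumeration only works for small candidate sets; larger instances would need an integer programming solver or a heuristic.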
Big data analytics in health care by data mining and classification techniques
ssuserc491ef2
This document summarizes a research article that proposes using big data analytics and data mining techniques like association rule mining and classification to analyze medical data from diabetes patients. The researchers apply the Apriori algorithm within a MapReduce framework to discover associations between diseases and symptoms using a diabetes dataset from the UCI machine learning repository containing 50 attributes. The results of their proposed method are evaluated based on metrics like precision, accuracy, recall, and F-score.
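For reference, the evaluation metrics listed above follow directly from confusion-matrix counts. The sketch below uses the standard binary definitions; the counts are made up for illustration and are not results from the article, and the function name is hypothetical.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute precision, recall, accuracy, and F-score (F1) from confusion counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_score": f_score,
    }


# Illustrative counts only.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```

For a multiclass problem these values would typically be computed per class and then averaged, which this sketch does not cover.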
Big data analytics in health care by data mining and classification techniques
ssuserc491ef2
This document discusses using big data analytics and data mining techniques like hierarchical decision tree networks, association rule mining, and multiclass outlier classification to analyze medical data from diabetes patients. The proposed approach first uses MapReduce to process a large diabetes dataset from the UCI machine learning repository containing 50 attributes. It then applies a hierarchical decision tree network that uses decision trees and a hierarchical attention network. Next, it implements an association rule mining algorithm called Apriori to discover relationships between diseases and symptoms. Finally, it performs multiclass outlier classification based on the predictions from association rule mining. The goal is to accurately classify diabetes patient data and predict insulin needs based on attributes like medication and past patient records.
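The Apriori step mentioned above can be sketched in a few lines. This is a single-machine illustration on hypothetical records, not the article's MapReduce implementation, and the function names are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every distinct single item.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rule_confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent, from stored supports."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return frequent[both] / frequent[a]


# Hypothetical patient records: each is a set of observed conditions and symptoms.
records = [
    {"diabetes", "thirst", "fatigue"},
    {"diabetes", "thirst"},
    {"diabetes", "fatigue"},
    {"hypertension", "headache"},
]

frequent = apriori(records, min_support=0.5)
confidence = rule_confidence(frequent, {"thirst"}, {"diabetes"})
```

The prune step is what distinguishes Apriori from naive enumeration: a candidate is counted only if all of its smaller subsets were already found frequent. In a MapReduce setting the counting pass would be distributed across mappers, which this sketch does not attempt.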
Part of "The Path Forward for Academic Medical Centers: Innovation, Economics and Better Health", an Economic Studies and Engelberg Center for Health Care Reform event at the Brookings Institution.
1) Medicine is increasingly becoming a data-intensive field due to the digitization of health records, research data, and patient self-tracking data.
2) The volume and diversity of biomedical data, known as "Big Data", provides opportunities to gain insights and improve patient outcomes but also poses challenges around data integration and analysis due to issues like heterogeneity and noise.
3) Techniques like data mining, machine learning, and knowledge discovery in databases are used to extract meaningful information and discover patterns in large and complex biomedical data to support areas like predictive analytics and personalized medicine.
Leverage machine learning and new technologies to enhance rwe generation and ...Athula Herath
My personal activities on automating evidence synthesis and real world data derived evidence for automated treatment guidelines compilation for precision medicine.
This document discusses the potential for artificial intelligence and machine learning in medicine. It notes that while 80% of healthcare data remains unstructured, machine learning could help analyze this data by mapping and validating data fields for modeling. However, significant preprocessing is required due to limitations in available data sets and variables. The document also discusses challenges including different classifications for patients, diseases, and representations in records. It provides an example of a study using clinical notes to predict acute kidney injury. Overall, the document outlines both the promise and challenges of applying artificial intelligence and machine learning to healthcare data.
This document summarizes a presentation on using data mining to analyze characteristics of high-cost diabetics in the Arkansas Medicaid population. It provides an overview of the Arkansas Foundation for Medical Care (AFMC), describes how a data mining project was initiated to examine costs for diabetics, and outlines the study design which used decision trees to analyze Medicaid claims data from 2004. The results identified that diabetics receiving home health services had costs over 3 times other diabetics, and the top diagnoses for these high-cost patients included diabetes, congestive heart failure, chronic renal failure, and hypertension.
The document discusses standards and coding systems used in biomedical and health informatics. It provides background on the speaker and their qualifications in the fields of medicine and health informatics. It then discusses why healthcare information standards are needed, providing examples of different types of standards including unique identifiers, standard data sets, vocabularies and terminologies, and exchange standards for messages and documents.
AI and Big Data in Psychiatry: An Introduction and OverviewCarlo Carandang
Dr. Carlo Carandang, a psychiatrist and data scientist, talks about how Big Data can be implemented into clinical psychiatric practice to improve patient care and reduce costs. Dr. Carandang introduces Big Data topics, Big Data systems, machine learning algorithms, and AI psychiatry applications. Dr. Carandang presented this talk at the 2019 Presidential Symposium in Washington, DC, sponsored by the Washington Psychiatric Society.
An overview of the i2b2 clinical research platform, and the implications of connecting Indivo to i2b2 as a source of patient-reported outcomes. Presented at the 2012 Indivo X Users' Conference.
By Shawn Murphy MD, Ph.D., Partners Healthcare.
Stephen Friend Dana Farber Cancer Institute 2011-10-24Sage Base
The document discusses building disease models using data intensive science and open medical information systems, with the goal of better understanding disease biology before testing drugs. It describes the Sage Bionetworks non-profit organization, which aims to create a commons for shared disease maps and models through several pilot projects including clinical trial data sharing and identifying cancer patients who do not respond to approved drug regimens.
Stephen Friend Institute of Development, Aging and Cancer 2011-11-29Sage Base
The document proposes a new approach called Arch2POCM for drug development that moves from disease targets to clinical validation. It discusses issues with the current drug discovery process, noting $200 billion is spent annually but only a handful of new medicines are approved each year while productivity is declining. Arch2POCM would require a more data-driven and collaborative approach involving scientists, clinicians, and citizens to better link knowledge and accelerate eliminating human disease. It presents the mission of Sage Bionetworks to create a commons for evolving integrative networks to map diseases and enable discovery.
E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...journal ijrtem
: A wealth of data in public health care systems has been collected and meanwhile there are plenty
of new technological improvements which have considerable influence on current data pool. Nevertheless,
important obstacles are challenging to utilize existing clinical data. Enhanced technological improvements lead
patients to search their symptoms and corresponding diagnosis on online resources. In this study, it is aimed to
develop a machine learning model to suit in different availability of users. Most of the current systems allow
people to choose related symptom in web interfaces or Q&A forums. In addition to these applications it is aimed
to implement a new technique which extracts the text-based symptoms and its related parameters such as, severity,
duration, location, cause, accompanied by any other indicators. This study is applicable for patient`s everyday
language statements besides medical expression of symptoms for corresponding symptoms. Extracted terms are
used as an input of the model and analyzed for matching diagnosis where an accuracy of 72.5% has been
accomplished.
E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...IJRTEMJOURNAL
A wealth of data in public health care systems has been collected and meanwhile there are plenty
of new technological improvements which have considerable influence on current data pool. Nevertheless,
important obstacles are challenging to utilize existing clinical data. Enhanced technological improvements lead
patients to search their symptoms and corresponding diagnosis on online resources. In this study, it is aimed to
develop a machine learning model to suit in different availability of users. Most of the current systems allow
people to choose related symptom in web interfaces or Q&A forums. In addition to these applications it is aimed
to implement a new technique which extracts the text-based symptoms and its related parameters such as, severity,
duration, location, cause, accompanied by any other indicators. This study is applicable for patient`s everyday
language statements besides medical expression of symptoms for corresponding symptoms. Extracted terms are
used as an input of the model and analyzed for matching diagnosis where an accuracy of 72.5% has been
accomplished.
Our classification technique uses a deep CNN to classify skin lesions. An image is warped through the CNN architecture into a probability distribution over clinical skin disease classes. The CNN was pretrained on a large generic image dataset and fine-tuned on a dataset of over 129,000 skin lesions spanning 2,032 diseases. Data integration from multiple sources is key to future digital medicine, but challenges include data quality, availability, and privacy. Techniques like distributed learning models and homomorphic encryption can help address privacy concerns while enabling large-scale data sharing and analysis.
Application of Data Analytics to Improve Patient Care: A Systematic ReviewIRJET Journal
This document summarizes a systematic review of research on applying data analytics to improve patient care. The review found that data analytics has significantly impacted the healthcare sector by improving patient care. Data analytics involves using scientific and mathematical methods to derive meaning from data to gain better insights. It can reduce costs, enable faster decision making, and minimize risks in healthcare. The review identified theories like the Magical Thinking Theory and Lightweight Theory that provide a framework for understanding the relationship between data analytics and patient care. The findings suggest data analytics plays an important role in improving patient health and experiences.
The document discusses the need for standardization and interoperability in electronic patient record (EPR) development. It notes that healthcare currently falls short in several quality metrics like safety, effectiveness, efficiency, and being patient-centered. The document outlines meaningful EHR functionalities that can help enable high quality healthcare like structured documentation, clinical decision support, and healthcare information exchange. It also discusses the challenges of language and terminology standards for EPRs to allow effective information sharing and translation across systems.
Visual Analytics for Healthcare - Panel at AMIA 2012 in ChicagoAdam Perer
AMIA 2012 Panel on Visual Analytics for Healthcare
Organizer:
Adam Perer, PhD
Research Scientist
IBM T.J. Watson Research Center, Hawthorne, NY
Panelists:
Ben Shneiderman, PhD
Professor, Computer Science
University of Maryland, College Park, MD
Yuval Shahar, PhD
Professor, Head of the Medical Informatics Research Center
Ben Gurion University, Beer Sheva, Israel
Jeffrey Heer, PhD
Assistant Professor, Computer Science
Stanford University, Stanford, CA
David Gotz, PhD
Research Scientist
IBM T.J. Watson Research Center, Hawthorne, NY
Abstract
With the proliferation of medical information technology, users at all levels of the healthcare system have access to more data than ever before [6]. This data can be of tremendous value but is often difficult to access and interpret. For example, clinicians are often faced with the challenging task of analyzing large amounts of unstructured, multi-modal, and longitudinal data to effectively diagnose and monitor the progression of a patient’s disease [4,5]. Similarly, patients are confronted with the difficult task of understanding the trends and correlations within data related to their own health. At the institutional level, healthcare organizations are faced with the desire to use data to improve overall operational efficiency and performance, while continuing to maintain the quality of patient care and safety.
Recent advances in visualization and visual analytics have the potential to help each of the user groups listed above do more with the often overwhelming amount of data available to them [1,3,7,8]. However, to be successful, visualization designers and clinicians must work together closely to ensure that the right technologies are used to help address the meaningful problems. Unfortunately, despite the continuous use of scientific visualization and visual analytics in medical applications, the lack of communication between engineers and physicians has meant that only basic visualization and analytics techniques are currently employed in clinical practice [2,9].
The goal of this panel is to present state-of-the-art visualization applications for healthcare and engage the leading physicians and clinical researchers at AMIA to discuss the areas in healthcare where additional visualization techniques are most needed.
A location comparison of three health care centers in Sfax-city
IJERD Editor
The problem of health facility location is explored under a mathematical optimization approach. Several models are developed for locating a generalized health facility system so that the selected criteria are optimized. From the literature, we adopt the criteria of efficiency and availability of the service. The optimal locations satisfy two objectives: one minimizes the distance between health care centers and patients, and the other captures as many patients as possible within a pre-specified time or distance. The results indicate that the existing locations provide near-optimal geographic access to health care centers.
Big data analytics in health care by data mining and classification techniques
ssuserc491ef2
This document summarizes a research article that proposes using big data analytics and data mining techniques like association rule mining and classification to analyze medical data from diabetes patients. The researchers apply an apriori algorithm within a MapReduce framework to discover associations between diseases and symptoms using a diabetes dataset from the UCI machine learning repository containing 50 attributes. The results of their proposed method are evaluated based on metrics like precision, accuracy, recall, and F-score.
Big data analytics in health care by data mining and classification techniques
ssuserc491ef2
This document discusses using big data analytics and data mining techniques like hierarchical decision tree networks, association rule mining, and multiclass outlier classification to analyze medical data from diabetes patients. The proposed approach first uses MapReduce to process a large diabetes dataset from the UCI machine learning repository containing 50 attributes. It then applies a hierarchical decision tree network that uses decision trees and a hierarchical attention network. Next, it implements an association rule mining algorithm called Apriori to discover relationships between diseases and symptoms. Finally, it performs multiclass outlier classification based on the predictions from association rule mining. The goal is to accurately classify diabetes patient data and predict insulin needs based on attributes like medication and past patient records.
Part of "The Path Forward for Academic Medical Centers: Innovation, Economics and Better Health", an Economic Studies and Engelberg Center for Health Care Reform event at the Brookings Institution
1) Medicine is increasingly becoming a data-intensive field due to the digitization of health records, research data, and patient self-tracking data.
2) The volume and diversity of biomedical data, known as "Big Data", provides opportunities to gain insights and improve patient outcomes but also poses challenges around data integration and analysis due to issues like heterogeneity and noise.
3) Techniques like data mining, machine learning, and knowledge discovery in databases are used to extract meaningful information and discover patterns in large and complex biomedical data to support areas like predictive analytics and personalized medicine.
Platform for Patient Centric Collaborative Research
Dadong Wan, Sophia Cao, Karthik Gomadam,
Accenture Technology Labs, San Jose, CA.
{ dadong.wan, sophia.cao, karthik.gomadam }@accenture.com
1 Abstract
The Affordable Care Act is perhaps the most significant “face-lift” in the U.S.
healthcare system since the introduction of Medicare and Medicaid. Key focus
areas of ACA include evidence based care and pay for performance. Patient
engagement is at the heart of both of these focus areas. However, finding relevant
patients to engage with medical providers is an important challenge. In this
paper, we describe our solution to alleviate this problem that leverages patient
data available in online health communities and seeks to match the patients in
these communities to relevant projects. Our solution can be applied to data
from any patient community and patients can engage with researchers from
within the communities they are already a part of. We believe that this approach
will help researchers find highly relevant patients and will enable patient centric,
dynamic and responsive research.
2 Introduction
The Affordable Care Act is perhaps the most significant “face-lift” in the U.S.
healthcare system since the introduction of Medicare and Medicaid. Key focus
areas of ACA include evidence based care and pay for performance. Patient
engagement is at the heart of both of these focus areas. For example, researchers
who want to study the effectiveness of levetiracetam, lamotrigine, or
oxcarbazepine on pediatric epilepsy patients should engage with the patients and
their caregivers. Such engagement would allow them to validate their care plan
process for helping patients manage their conditions, as well as their
treatment plans. Having these validations will help providers analyze and optimize
their performance in the pay for performance age. However, finding relevant
patients to engage with medical providers is a non-trivial problem. In the above
example, providers will need to recruit patients who are children, have epilepsy,
and are prescribed levetiracetam, lamotrigine, or oxcarbazepine.
In this paper, we propose a solution to address this problem using the data
from online health communities. Our past experience developing
applications to match patients and clinical trial investigators showed us
that patients will not flock to recruitment platforms and that any meaningful solution
should fish where the fish are. We realized that patient communities
such as PatientsLikeMe and Medhelp have millions of patients who are sharing
information about their medical conditions, medications, and their experience
in managing their conditions. We developed a solution that takes advantage of
this patient data, allowing researchers to find patients from these communities.
We apply semantic and text mining algorithms to analyze patient conversations
in these communities to build rich patient profiles that capture their medical
conditions, medications, and demographic information. We build similar profiles
for research projects (listed at PCORI.org). We then match and rank the project
and the patient profiles to find the most relevant patients.
One challenge in matching projects with patients based on patient conver-
sations is the difference in the ways in which different participants (researchers,
patients, caregivers) describe the same thing. For example, a researcher will use
diabetes mellitus while a patient might say type 2. Using semantic Web tech-
nologies (UMLS ontologies, OpenCalais entity extractor, semantic type match-
ing) allows us to overcome this problem.
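The vocabulary gap described above can be illustrated with a minimal sketch. It assumes a small hand-written synonym table (hypothetical; a real system would draw these mappings from UMLS rather than hard-code them) and treats two terms as equivalent when they normalize to the same preferred concept name. It is not the platform's actual implementation.

```python
# Hypothetical synonym table for illustration only; a real system would
# obtain these mappings from an ontology such as UMLS.
SYNONYMS = {
    "type 2": "diabetes mellitus",
    "t2d": "diabetes mellitus",
    "diabetes mellitus": "diabetes mellitus",
    "seizure disorder": "epilepsy",
    "epilepsy": "epilepsy",
}


def normalize(term):
    """Map a surface term to its preferred concept name, or None if unknown."""
    return SYNONYMS.get(term.strip().lower())


def same_concept(a, b):
    """True when both terms are known and normalize to the same concept."""
    concept_a, concept_b = normalize(a), normalize(b)
    return concept_a is not None and concept_a == concept_b
```

With this table, `same_concept("Diabetes Mellitus", "type 2")` holds even though the researcher and the patient used different words.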
We have built a prototype (available at: http://bit.ly/pccr_acn)
that demonstrates the effectiveness of our approach in finding patients. Due
to privacy concerns, we were not able to integrate with existing online commu-
nities. We have developed a sample online community, MeMed (available at:
http://bit.ly/me_med_), and created posts similar to those found in existing
communities. Our prototype allows users to add PCORI projects and find
matching patients in MeMed.
3 Overview of the PCCR Platform
In this section we briefly describe the PCCR platform. We begin by describing
the main models in the system. These are illustrated in Figure 1.
1. Investigators: captures the information about the investigators who are
seeking participants for their projects. We model the institution and the
areas of interest for an investigator. The areas of interest of an investigator
are automatically created by analyzing their projects.
2. Projects: Each investigator can have multiple projects. Each project has
a title, description, goals, project type that captures the nature of the
project, the medical conditions and medications of interest described in
the project, and the expected outcomes. Our matching algorithm matches
participants across these different dimensions and calculates a match score.
The patients are ranked based on this match score.
3. Patients: We extract patient profiles based on their conversations / partic-
ipation in existing online health communities. We identify and use their
demographic, socio-economic, and medical information in creating their
profile.
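The three models above could be sketched as plain data classes. This is only an illustration: the field names are assumptions based on the descriptions in the list, and deriving an investigator's areas of interest as the union of the conditions and medications in their projects is one simple, assumed reading of "automatically created by analyzing their projects".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Project:
    title: str
    description: str
    goals: str
    project_type: str  # nature of the project, e.g. "Therapeutic"
    conditions: set[str] = field(default_factory=set)
    medications: set[str] = field(default_factory=set)
    expected_outcomes: list[str] = field(default_factory=list)


@dataclass
class Investigator:
    name: str
    institution: str
    projects: list[Project] = field(default_factory=list)

    @property
    def areas_of_interest(self) -> set[str]:
        # Assumption: areas of interest are derived from the projects,
        # here simply as the union of their conditions and medications.
        areas: set[str] = set()
        for project in self.projects:
            areas |= project.conditions | project.medications
        return areas


@dataclass
class Patient:
    name: str
    age: int | None = None
    gender: str | None = None
    location: str | None = None
    economic_status: str | None = None
    conditions: set[str] = field(default_factory=set)
    medications: set[str] = field(default_factory=set)
```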
[Figure 1 shows the data definitions used by the platform.
Investigators: Name, Organization, Areas of interest, Project History.
Patients: Name, Age, Gender, Location, Economic status, Race, Areas of interest, Medical Conditions / stage, Activity.
Projects: Project 1, Project 2, ..., Project k.
Project definition: Statement; Goals; Type (Preventive, Diagnostic, Therapeutic, Palliative, Health Delivery); Medical Conditions (Medical condition, Condition stage, Medication); Demographics (Age, Gender, Economic, Region, Race); Outcome (Trial, Tests, Studies, Surveys).
Legend: PR = Patient response, UC = User comment.]
Figure 1: PCCR Matching Platform - Data Definitions
[Figure 2 shows two parallel pipelines. Project titles and descriptions pass through Big Data & Multidimensional Semantic Analysis to produce a Rich Project Profile. Patient communities pass through the same analysis to produce a Rich Patient Profile. Both profiles feed the Multidimensional Semantic Match Engine, which outputs Matched Participants Across Online Patient Communities.]
Figure 2: PCCR Matching Platform - Data Flow
The researcher and patient profiles are used by our matching engine to
identify relevant patients for a project. At the heart of the PCCR platform is
our matching engine. Figure 2 illustrates the data flow of our matching engine.
The two main components of the matching engine are the researcher profile
generator and the patient profile generator.
The researcher profile generator takes as input the textual description of a
research project. For the purposes of this challenge, we use the descriptions of
funded PCORI projects. This description is passed through a semantic analyzer.
The semantic analyzer is built using concepts in RxNorm and SNOMED and
identifies medical terminologies and concepts in the description, along with their
semantic types. In addition to the semantic analyzer, the description is also sent
to the OpenCalais Web API for entity identification. A final list of entities and types
is created by combining the output of the semantic analyzer and OpenCalais.
The demographic analyzer module extracts demographic information (such as
age group of target population, gender, and location information). We use
textual cues to identify expected outcomes. Figure 3 illustrates the entities
identified from the description of a PCORI project on Epilepsy.
Figure 3: Example output of semantic analysis
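Combining the two extractor outputs into one list of entities and types might look like the sketch below. It assumes each extractor returns (entity text, type) pairs and that entities are merged by case-insensitive name; the function name, the sample pairs, and the type labels are illustrative placeholders, not actual output of the semantic analyzer or OpenCalais.

```python
def merge_entities(*extractor_outputs):
    """Merge (entity, type) pairs from several extractors.

    Entities are keyed by lower-cased text, and every type reported for
    the same entity is kept in a set.
    """
    merged = {}
    for output in extractor_outputs:
        for text, semantic_type in output:
            key = text.strip().lower()
            merged.setdefault(key, set()).add(semantic_type)
    return merged


# Illustrative placeholder outputs for a project description on epilepsy.
analyzer_out = [
    ("Epilepsy", "Disease or Syndrome"),
    ("levetiracetam", "Pharmacologic Substance"),
]
calais_out = [
    ("epilepsy", "MedicalCondition"),
    ("seizures", "MedicalCondition"),
]

entities = merge_entities(analyzer_out, calais_out)
```

Keeping every reported type per entity, instead of picking one, leaves the choice of which type to trust to the later matching step.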
The patient profile generator uses the semantic analyzer and the demo-
graphic analyzer. However, given the volume of patient data, we needed to
adopt a more scalable approach as semantic analysis can be expensive. We use
a Map-Reduce based solution, where we have a series of map and reduce jobs.
The first map job takes user profiles as input and uses the semantic analyzer
to identify entities and types. In parallel, we have another map job that
extracts entities and types using OpenCalais. The respective reduce jobs combine
all the identified entities for a patient. We merge these lists to create a
semantic signature of the patient consisting of a collection of entities and their
types. Similarly, the demographic and socio-economic information is identified.
Combining all of the above information yields a rich patient profile. We store
the profile as a structured object in MongoDB.
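The map and reduce steps can be sketched in plain Python, without a Map-Reduce framework. The sketch assumes a tiny term lexicon as a stand-in for the semantic analyzer (hypothetical, for illustration only), emits one (patient, entity) pair per recognized term in the map step, and unions the pairs per patient in the reduce step to form the semantic signature.

```python
from collections import defaultdict

# Hypothetical stand-in for the semantic analyzer: a tiny term -> type lexicon.
LEXICON = {
    "epilepsy": "condition",
    "seizures": "condition",
    "lamotrigine": "medication",
}


def map_post(patient_id, post_text):
    """Map step: emit (patient_id, (entity, type)) for each known term in a post."""
    cleaned = post_text.lower().replace(",", " ").replace(".", " ")
    for token in cleaned.split():
        if token in LEXICON:
            yield patient_id, (token, LEXICON[token])


def reduce_patient(patient_id, entity_pairs):
    """Reduce step: combine all entities found for one patient into a signature."""
    return patient_id, sorted(set(entity_pairs))


def run_job(posts):
    """Run map, group by key (the shuffle phase), then reduce."""
    grouped = defaultdict(list)
    for patient_id, text in posts:
        for key, value in map_post(patient_id, text):
            grouped[key].append(value)
    return dict(reduce_patient(pid, pairs) for pid, pairs in grouped.items())


posts = [
    ("p1", "My son has epilepsy and started lamotrigine."),
    ("p1", "Fewer seizures since the lamotrigine dose changed."),
    ("p2", "Looking for advice about seizures."),
]
signatures = run_job(posts)
```

Because each post is mapped independently and the reduce step only needs the pairs for one patient, both steps can be distributed across workers, which is what makes the approach scale with the volume of patient data.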
The matching algorithm takes as input a rich project profile. For each of
the facets (medical condition, medication, and demographics), the matching
algorithm first finds the relevant patients using set containment operators.
We also use MongoDB's geo querying to filter users by location, if the
project description mentions such a restriction. Further, we apply a semantic
similarity measure (based on Ted Pedersen's UMLS Similarity project, available at
http://umls-similarity.sourceforge.net/) to compute the semantic similarity
of a patient profile to that of a project. All of these are then combined
to create a match score that is used in selecting and ranking patients.
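A minimal sketch of filtering, scoring, and ranking follows. It makes several assumptions that are not from the paper: profiles are plain dictionaries, the containment filter is applied only to medical conditions, Jaccard set overlap is swapped in for the UMLS-based semantic similarity, and the facet weights are arbitrary illustrative values.

```python
def jaccard(a, b):
    """Set-overlap similarity; a simple stand-in for a UMLS-based measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_score(project, patient, weights=(0.5, 0.3, 0.2)):
    """Return a combined score, or None if the patient fails the filter."""
    # Containment filter: the patient must have every required condition.
    if not project["conditions"] <= patient["conditions"]:
        return None
    w_cond, w_med, w_demo = weights  # hypothetical weights
    low, high = project["age_range"]
    demo_hit = 1.0 if low <= patient["age"] <= high else 0.0
    return (
        w_cond * jaccard(project["conditions"], patient["conditions"])
        + w_med * jaccard(project["medications"], patient["medications"])
        + w_demo * demo_hit
    )


def rank_patients(project, patients):
    """Score every patient, drop filtered ones, and sort best first."""
    scored = [(match_score(project, p), p["id"]) for p in patients]
    return sorted(((s, pid) for s, pid in scored if s is not None), reverse=True)


project = {
    "conditions": {"epilepsy"},
    "medications": {"levetiracetam", "lamotrigine", "oxcarbazepine"},
    "age_range": (0, 17),
}
patients = [
    {"id": "a", "conditions": {"epilepsy"}, "medications": {"lamotrigine"}, "age": 9},
    {"id": "b", "conditions": {"epilepsy", "asthma"}, "medications": {"levetiracetam"}, "age": 34},
    {"id": "c", "conditions": {"diabetes"}, "medications": set(), "age": 12},
]
ranked = rank_patients(project, patients)
```

In this example the child with epilepsy on lamotrigine ranks above the adult, and the patient without epilepsy is excluded by the containment filter before any similarity is computed.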
4 Related Work
The techniques we have used in this paper are built upon prior research in the
areas of semantic Web, hierarchical object matching, and entity extraction. In
the context of patient matching for healthcare, the TrialX system [4] is very
relevant to our work. We also use our prior work in the area of faceted matching
and searching of unstructured documents [3] for facet extraction. We model our
similarity measurement technique on that approach. We also applied
the principles of hierarchical object matching discussed by Ganesan et al. in [2]
and Doan et al. in [1]. We also use the OpenCalais Web service [5] to semantically
enrich patient conversations and project descriptions and to extract relevant
entities.
5 Conclusions
In this paper, we describe our solution to the PCORI Healthcare 2.0 chal-
lenge. Our solution leverages existing patient data available in online health
communities and creates a rich semantic profile of the patients. We have also
developed techniques for creating multi-dimensional project profiles from their
textual descriptions. We have developed a semantic matching algorithm that
finds matching patients for research projects. The PCCR platform we have de-
veloped works for any patient community. Due to privacy concerns, we have
not used any online community data in our development or demonstration. In-
stead, we use data from a patient community that we prototyped and seeded
with posts. We evaluated our system and found that our approach has over
90% accuracy in finding patients who have the same or similar medical conditions.
The match rate when using demographics goes down to about 80%. We are
currently improving our demographic profiling and extraction technique. Our
approach builds on current ways patients share and interact on the Web today
and we believe that it can help researchers find very relevant patients leading
to more meaningful and productive engagements and outcomes.
References
[1] Anhai Doan, Pedro Domingos, and Alon Halevy. Learning to match the
schemas of data sources: A multistrategy approach. Machine Learning,
50(3):279–301, 2003.
[2] Prasanna Ganesan, Hector Garcia-Molina, and Jennifer Widom. Exploiting
hierarchical domain structure to compute similarity. ACM Transactions on
Information Systems (TOIS), 21(1):64–93, 2003.
[3] Karthik Gomadam, Ajith Ranabahu, Meenakshi Nagarajan, Amit P Sheth,
and Kunal Verma. A faceted classification based approach to search and rank
Web APIs. In Web Services, 2008. ICWS’08. IEEE International Conference
on, pages 177–184. IEEE, 2008.
[4] Chintan Patel, Sharib Khan, and Karthik Gomadam. TrialX: Using semantic
technologies to match patients to relevant clinical trials based on their per-
sonal health records. Proc. of the International Semantic Web Conference
(ISWC), 2009.
[5] Thomson Reuters. OpenCalais, 2009.