INTRODUCTION
The pharma industry is one of the few industries where
the level of risk increases as a company gets closer to a
product launch.
When sponsors design clinical trial protocols, they look for
data that can be used to calculate a trial’s feasibility.
Data mining is defined as the process of selecting,
exploring, and modelling large amounts of data to uncover
previously unknown patterns.
A clinical data warehouse (CDW) is regarded as the best
approach to achieve this goal.
DATA MINING
Clinical data mining is a process that involves the
conceptualization, extraction, analysis, and
interpretation of available clinical data for practice
knowledge-building, clinical decision-making, and
practitioner reflection.
APPROACHES TO DATA MINING
Data Collection
• Clinical data for a patient is stored in two different formats: the medical
transcript file (contains 25 to 30% of the information) and the EMR
(electronic medical record, contains 75 to 80% of the information).
Pre-Processing
• The input document needs to be in Clinical Document Architecture (CDA) format.
Data Parsing
• Pre-processed data is parsed into a single structured format (SNOMED codes,
RxNorm codes).
Application of Knowledge
• Using this knowledge, we can create a new database; querying that database
can be useful in medical research and in improving patient healthcare.
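The four steps above can be sketched end to end. This is only a minimal illustration, not the method of any particular system: the record layout, the TERM_TO_CODE lookup and its placeholder code values are hypothetical (real SNOMED and RxNorm codes would come from the terminologies themselves), and an in-memory SQLite database stands in for the new database.

```python
import sqlite3

# Hypothetical lookup from free-text terms to (code system, code).
# The code values are placeholders, NOT real SNOMED or RxNorm codes.
TERM_TO_CODE = {
    "hypertension": ("SNOMED", "PLACEHOLDER-001"),
    "metformin": ("RXNORM", "PLACEHOLDER-002"),
}

def preprocess(text):
    """Pre-processing: strip punctuation, lower-case, split into tokens."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    return cleaned.lower().split()

def parse(patient_id, text):
    """Data parsing: map recognised terms to one structured row format."""
    rows = []
    for token in preprocess(text):
        if token in TERM_TO_CODE:
            system, code = TERM_TO_CODE[token]
            rows.append((patient_id, system, code, token))
    return rows

def build_database(records):
    """Application of knowledge: load parsed rows into a queryable table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE findings (patient_id TEXT, system TEXT, code TEXT, term TEXT)"
    )
    for patient_id, text in records:
        conn.executemany(
            "INSERT INTO findings VALUES (?, ?, ?, ?)", parse(patient_id, text)
        )
    conn.commit()
    return conn

# Data collection: illustrative notes standing in for transcript and EMR text.
records = [
    ("P1", "History of hypertension; started Metformin."),
    ("P2", "Hypertension, well controlled."),
]
conn = build_database(records)
patients_on_metformin = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT patient_id FROM findings WHERE term = 'metformin'"
    )
]
```

Exact token matching is used here only to keep the sketch short; it ignores negation ("no hypertension"), synonyms and multi-word terms, all of which a real parser would have to handle.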
ONLINE CLINICAL DATA MINING
OCDM (Online Clinical Data Mining) is a tool for statistical analysis of clinical
data.
Example: OCDM provides a simple way to analyze tumor data to get an individual
evaluation of tumor response.
Statistical evaluations include:
• Modern interactive graphics
• Subgroup analysis
• Kaplan-Meier analysis
• Key figures and crosstabs
• Hazard curves
• Cox regression
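To make one of the listed evaluations concrete, the Kaplan-Meier estimate can be sketched in a few lines. This is a generic textbook calculation, not OCDM's implementation; the function name kaplan_meier and the follow-up data are hypothetical.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. progression) was observed, 0 if censored
    """
    observed = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(observed):
        # Patients still under observation at time t (censored ties included).
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - observed[t] / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up data in months; 0 marks a censored patient.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

With these five patients the estimated survival steps down to 0.8 at month 2, 0.6 at month 3 and 0.3 at month 5; censored patients lower the number at risk without producing a step of their own.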
ADVANTAGES:
• No local installation required (server-based)
• Operation is via a modern web browser
• Customizable access rights
• Guaranteed data security
• Export of graphics to PNG, PDF and TIFF formats
• Table and graphic export to MS Excel / Word
• Modular structure of the software
• Customer-specific extensions possible
METHODS IN DATA MINING
• Tracking patterns: One of the simplest techniques; it identifies and
follows recurring patterns in the dataset.
• Classification: A more complex technique that groups items into defined
categories on the basis of their attributes; the categories can then be used
to draw conclusions.
• Association: Related to tracking patterns, but specific to dependently
related variables. It looks for events or attributes that are strongly
linked with another event.
• Outlier detection: Looks not only at the patterns but also at the data
points that deviate from them (the outliers).
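The association technique above is commonly quantified with two measures, support and confidence. The sketch below assumes each patient record is a set of recorded attributes; the records and attribute names are hypothetical.

```python
def support(transactions, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated share of records with `antecedent` that also have `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical patient records, each a set of recorded attributes.
records = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension", "obesity"},
    {"diabetes"},
    {"hypertension"},
]
s = support(records, {"diabetes", "hypertension"})
c = confidence(records, {"diabetes"}, {"hypertension"})
```

Here half of the records contain both attributes (support 0.5), and two of the three diabetes records also list hypertension (confidence about 0.67).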
• Clustering: Similar to classification, but it groups portions of the data
on the basis of their similarities rather than into predefined categories.
• Regression: Models the relationship between variables, estimating the
likely value of one variable given the values of others.
• Prediction: One of the most valuable data mining techniques. It projects
the kind of data that will be seen in the future from patterns recognised in
past data, which are often enough to chart a useful forecast.
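Regression and prediction can be illustrated together with an ordinary least-squares line fitted to past observations and then used to forecast a new one. This is a minimal sketch; the function names and the dose/response figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict(model, x):
    """Prediction: apply the fitted line to a value not seen before."""
    intercept, slope = model
    return intercept + slope * x

# Hypothetical past data: dose (mg) against measured response.
doses = [10, 20, 30, 40]
responses = [12.0, 22.0, 32.0, 42.0]
model = fit_line(doses, responses)
forecast = predict(model, 50)
```

Because the example data lie exactly on a line, the fit recovers an intercept of 2 and a slope of 1, and the forecast for a dose of 50 is 52; real clinical data would scatter around the line and the forecast would carry uncertainty.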
APPLICATIONS OF DATA MINING
Creating Data Warehouse
• Compiling data such as clinical and demographic information within a single system
supports better decisions, quicker responses to problems and easier access to data.
Creating Electronic Patient Files
• The complete details of the patient are compiled in a single time-indexed record, which
is of great importance in the evaluation of data and in the provision of services.
Data Compliance Solution
• Helps to minimize problems and errors, yielding better organised results with fewer
errors.
Detection of Early Warning Signs for Chronic Diseases
• Can analyze and compare the social, economic, demographic, geographical and other
variables for each chronic disease, as well as the variables that affect the
occurrence of the disease.
APPLICATIONS OF DATA MINING cont…
Error and Abuse Detection for Laboratory Tests
• Abuses in health payments, in particular payments made by social security
institutions and insurance companies, can be detected and eliminated.
Clinical Decision Support Systems
• Systems can be designed that work on electronic patient records according to the
needs of physicians and respond to their requests without requiring them to deal
with the underlying theory.
Improving the Quality by Patient-Focused Health Service
• Principal component analysis, factor analysis, logistic regression and decision tree
algorithms can be applied to the combined variables obtained from a
questionnaire of patient and management opinions.
• All variables affecting quality can be handled together, and the quality variables
for each focus group can be determined individually.
APPLICATIONS OF DATA MINING cont…
Abuse Detection and Billing Corruption
• Developments in information technologies have led to the replacement of classical
human-based inspection methods by automation-based surveillance and control
systems.
• Data mining brings new capabilities to healthcare: with descriptive statistics
alone, outliers are identified subjectively, on an observation basis.
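One way to replace subjective, observation-based outlier spotting with a repeatable rule is the modified z-score based on the median absolute deviation (MAD). The sketch below is an assumption about how such a check could look, not a method taken from the slides; the constant 0.6745 and the threshold 3.5 are the conventional values for this score, and the claim amounts are hypothetical.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure deviations against
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical billed amounts for the same laboratory test.
claims = [120, 135, 128, 140, 131, 950]
flagged = mad_outliers(claims)
```

Only the 950 claim is flagged here. A flagged value is a candidate for review, not proof of abuse; the threshold should be tuned to the billing data at hand.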
DATA WAREHOUSING
A clinical data warehouse (CDW) is an important solution
that is used to achieve clinical stakeholders’ goals by
merging heterogeneous data sources in a central repository
and using this repository to find answers related to the
strategic clinical domain, thereby supporting clinical
decisions.
PROPERTIES OF A DATA WAREHOUSE
• Subject-oriented: only the relevant data are collected and stored, so that
the warehouse presents useful information related to the subject.
• Integrated: all data types, naming conventions, encodings, data domains
and measurements are unified in a standard form.
• Nonvolatile: the data stored in the DW should not change after any
operational process executes.
• Time-variant: the DW should hold both historical and current data.
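The nonvolatile and time-variant properties can be illustrated with an append-only store of dated rows. This is a minimal sketch under stated assumptions: the Observation and Warehouse classes, their fields and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)  # frozen: a stored row cannot be modified (nonvolatile)
class Observation:
    patient_id: str
    measure: str
    value: float
    recorded_on: date  # every row carries its date (time-variant)

class Warehouse:
    """Append-only store: rows are added, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, row):
        self._rows.append(row)

    def history(self, patient_id, measure):
        """All rows for one patient and measure, oldest first."""
        rows = [r for r in self._rows
                if r.patient_id == patient_id and r.measure == measure]
        return sorted(rows, key=lambda r: r.recorded_on)

    def as_of(self, patient_id, measure, day):
        """Latest row recorded on or before `day`, or None."""
        past = [r for r in self.history(patient_id, measure)
                if r.recorded_on <= day]
        return past[-1] if past else None

dw = Warehouse()
dw.load(Observation("P1", "hba1c", 7.9, date(2020, 1, 10)))
dw.load(Observation("P1", "hba1c", 7.1, date(2021, 1, 12)))
latest_2020 = dw.as_of("P1", "hba1c", date(2020, 12, 31))
```

A new measurement is stored as a new row instead of overwriting the old one, so the warehouse can answer both "what is the current value?" and "what was the value at the end of 2020?".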
ARCHITECTURE OF CDW
DATA SOURCES
ETL (Extraction, Transformation and Loading)
DATA STORAGE
DATA ANALYSIS TECHNOLOGIES
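The four layers can be sketched as a small pipeline. Everything in this example is an assumption made for illustration: the two source systems, their field names, date conventions and values are hypothetical, and a plain Python list stands in for the data storage layer.

```python
from datetime import datetime

LB_TO_KG = 0.45359237  # exact definition of the international pound

def extract():
    """Data sources: two hypothetical systems with different conventions."""
    clinic_a = [{"id": "A-1", "weight_lb": 154.0, "visit": "03/02/2021"}]  # dd/mm/yyyy
    clinic_b = [{"patient": "B-7", "weight_kg": 82.5, "date": "2021-02-04"}]
    return clinic_a, clinic_b

def transform(clinic_a, clinic_b):
    """Transformation: unify field names, units and date formats."""
    rows = []
    for r in clinic_a:
        rows.append({
            "patient_id": r["id"],
            "weight_kg": round(r["weight_lb"] * LB_TO_KG, 1),
            "visit_date": datetime.strptime(r["visit"], "%d/%m/%Y").date().isoformat(),
        })
    for r in clinic_b:
        rows.append({
            "patient_id": r["patient"],
            "weight_kg": r["weight_kg"],
            "visit_date": r["date"],
        })
    return rows

def load(storage, rows):
    """Loading: append the unified rows to the data storage layer."""
    storage.extend(rows)
    return storage

storage = load([], transform(*extract()))

# Data analysis: a simple aggregate over the unified rows.
mean_weight = sum(r["weight_kg"] for r in storage) / len(storage)
```

The analysis step only becomes possible after transformation: averaging pounds with kilograms, or sorting two different date formats, would silently give wrong answers.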
ADVANTAGES
• Helps determine the relationships between clinical data attributes,
discover disease behavior, evaluate treatment procedures and improve
patient outcomes
• Provides users with various information related to management and
research fields
• Enhances the quality of care provided for patients
• Uses the DW platform to enhance data quality and quantity, and improve
query performance and business intelligence
• Uses a knowledge-based platform to make the right decisions on critical
issues
• Reduces the time spent on data collection and enhances data quality
• Provides a platform for timely analysis and online decision-making
in administration, research, clinical and management systems
CHALLENGES
Data availability
• The availability of data across different sources depends on the completeness and
design of those sources.
• Older operational systems may apply differing policies and obligations to data
entry and data types, which may affect overall data accessibility.
Data format
• The format of clinical data ranges from text and images to videos and signals.
• Clinical data also come in numeric, qualitative, quantitative, image, ultrasound,
time-sequence, signal, protein and microarray forms.
Data collection methods
• The two types of data collection methods are manual and automated.
• Long-term clinical data related to specific diseases, such as continuous diagnosis, need
a different approach from short-term medical data.
CHALLENGES contd…
Data integration tools
• One of the most important challenges in a CDW is implementing data
integration tools.
ETL issues
• Handling old clinical data and transforming them into the specific forms
to be loaded into CDW tables requires tools, scenarios and plans for
merging them with new data.
• In practice, this has been found to be difficult.
Legacy systems
• Considerable time and effort have to be spent on collecting data
from legacy clinical systems.

Data mining and data warehousing
