Data Analytics
Topics to be covered
• Introducing Big Data
• Big Data in healthcare
• Database structure and management
• Database structure
• How to manage your data
• Statistical analysis in population health management
• Introduction to statistics
• Statistical analysis in healthcare
Introducing Big Data
• Information that can’t be processed or analyzed using traditional
processes or tools
• There are four dimensions to Big Data: Volume, Velocity, Variety,
Veracity
• Challenges with Big Data: Capturing, Storing, Searching, Sharing &
Analyzing
Introducing Big Data
• Volume
• The amount of data being collected is unprecedented
• The volume of data available is on the rise, while the percent that can be
analyzed is on the decline. This is known as the data blind zone.
• Velocity
• The rate at which the data is being generated needs to be handled
• How quickly is the data arriving and stored?
• How quickly can you process the data?
Introducing Big Data
• Variety
• With an increase in quantity comes an increase in the variety of data types
• Issues with storing complex data
• Analyzing all different types of data
• Veracity
• The accuracy of data becomes more important as we use more of it
• Garbage in, garbage out
Introducing Big Data
• Big Data challenges:
• Capturing
• Data is initially pulled from all sorts of different places
• Storing
• Data is kept in different locations (virtual or otherwise)
• Security concerns
• Searching
• Having a database capable of handling searches
• Optimizing a database for searches
Introducing Big Data
• Big Data challenges:
• Sharing
• There are valid security concerns
• The data variety poses a problem when sharing
• Analyzing
• Extracting the data isn’t easy
• Data variety poses a significant problem
• Sheer volume of data makes it difficult to focus
Big Data in Healthcare
• Incentives for big data use are rising
• Movement to evidence-based care
• Increase in available technologies for data collection, analysis and
communication
• The ultimate goal is improving patient health while reducing costs
Big Data in Healthcare
• Volume
• Healthcare data is more plentiful than ever
• Velocity
• Data flows in real time and is processed in real time
• Variety
• Billing information and clinical information
• Veracity
• Data accuracy is vital to an organization
Big Data in Healthcare
• Challenges
• Mixing healthcare with IT
• The availability of data has exploded
• How do you handle the influx of data?
• Finding the relevant data to mine
Database Structure and Management
Database Structure
• A database is a structured set of data held in a computer, especially one that is
accessible in various ways (or not so accessible in some cases).
• Data are organized in database tables, which consist of rows and
columns.
• Each row is called a record, object or entity. Each column is called a
field or attribute.
• Each column should hold a single data type, but the columns within a row
can have different data types
Database Structure
• Two types of keys, primary and foreign
• A primary key makes each row of data unique; it can be made up of
multiple columns
• A foreign key is a column or group of columns in a relational database
table that provides a link between data in two tables
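As a minimal sketch of primary and foreign keys, the following uses Python's built-in sqlite3 module. The table names, column names and sample rows are hypothetical and are not taken from any particular EMR.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

conn.execute("""
    CREATE TABLE provider (
        provider_id INTEGER PRIMARY KEY,   -- primary key: unique for every row
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE patient (
        patient_id  INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        pcp_id      INTEGER REFERENCES provider(provider_id)  -- foreign key
    )
""")

conn.execute("INSERT INTO provider VALUES (1, 'Dr. A')")
conn.execute("INSERT INTO patient VALUES (5465, 'Patient X', 1)")

# The foreign key links a patient row to its provider row.
row = conn.execute("""
    SELECT patient.name, provider.name
    FROM patient JOIN provider ON patient.pcp_id = provider.provider_id
""").fetchone()
print(row)

# A patient that points at a provider that does not exist is rejected.
try:
    conn.execute("INSERT INTO patient VALUES (5466, 'Patient Y', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print("Invalid foreign key rejected:", fk_rejected)
```

The rejected insert shows what the foreign key buys you: the database itself refuses a link to a row that is not there.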
Database Structure
• Database relationships can be of three different types:
• One-to-one
• One-to-many
• Many-to-many
Database Structure
• One-to-One Relationships
• A key will appear only once in a related table.
• Example: A patient can only be assigned one primary care provider
Database Structure
• One-to-Many Relationships
• Keys from one table will appear multiple times in a related table
• Example: One provider can be assigned multiple patients in paneling
Database Structure
• Many-to-Many relationships
• The key value of one table can appear many times in a related table, but the
opposite also holds true!
• Example: A patient can see multiple different providers and a provider can see
multiple different patients
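One common way to store a many-to-many relationship is a third "junction" table that holds one row per pairing. The sketch below assumes hypothetical patient, provider and visit tables and uses Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient  (patient_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE provider (provider_id INTEGER PRIMARY KEY, name TEXT);
    -- Junction table: one row per (patient, provider) pairing.
    CREATE TABLE visit (
        patient_id  INTEGER REFERENCES patient(patient_id),
        provider_id INTEGER REFERENCES provider(provider_id),
        PRIMARY KEY (patient_id, provider_id)   -- primary key made of two columns
    );
    INSERT INTO patient  VALUES (1, 'Patient X'), (2, 'Patient Y');
    INSERT INTO provider VALUES (10, 'Dr. A'), (20, 'Dr. B');
    INSERT INTO visit    VALUES (1, 10), (1, 20), (2, 10);
""")

# One patient key appears many times in the junction table...
providers_seen_by_patient_1 = [r[0] for r in conn.execute(
    "SELECT provider_id FROM visit WHERE patient_id = 1 ORDER BY provider_id")]

# ...and so does one provider key.
patients_seen_by_provider_10 = [r[0] for r in conn.execute(
    "SELECT patient_id FROM visit WHERE provider_id = 10 ORDER BY patient_id")]

print(providers_seen_by_patient_1, patients_seen_by_provider_10)
```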
How to Manage Your Data
• The importance of managing your database
• Your database is composed of data and is built by the software companies.
You can effectively manage what goes INTO your database.
• It plays an important role in improving the performance of an organization’s
health care systems.
• Collecting, analyzing, interpreting, and acting on data for specific
performance measures allows health care professionals to identify where
systems are falling short, to make corrective adjustments, and to track
outcomes.
How to Manage Your Data
• Developing an EMR data roadmap
• First determine what you need to collect
• Next, identify where the data is able to be entered
• Find out who is entering it
• Develop a roadmap of your data using a spreadsheet
• Rows would correspond to the data being collected
• Columns would contain the where and who
How to Manage Your Data
• Data roadmap example:
Measure Name | Data Item | Field Name | Employee
Colorectal Cancer | Colonoscopy Result | healthmaintenance.table | MD
Colorectal Cancer | Colonoscopy Date | diagnostichistory.table | MA
Colorectal Cancer | Colonoscopy Document | referralorder.table | RN
Colorectal Cancer | FIT Outside lab result | outsidelabs.table | MA
Colorectal Cancer | FIT Quest lab result | emrlabs.table | MD
Hypertension | Systolic BP | vitalssys.table | MA
Hypertension | Diastolic BP | vitalsdys.table | MA
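A roadmap like this can also be kept as plain rows and summarized with a few lines of code. The sketch below is one possible way to do it, not a required tool; the variable names are hypothetical and the rows are the ones from the example.

```python
from collections import defaultdict

# (measure name, data item, field name, employee) rows from the roadmap example.
roadmap = [
    ("Colorectal Cancer", "Colonoscopy Result", "healthmaintenance.table", "MD"),
    ("Colorectal Cancer", "Colonoscopy Date", "diagnostichistory.table", "MA"),
    ("Colorectal Cancer", "Colonoscopy Document", "referralorder.table", "RN"),
    ("Colorectal Cancer", "FIT Outside lab result", "outsidelabs.table", "MA"),
    ("Colorectal Cancer", "FIT Quest lab result", "emrlabs.table", "MD"),
    ("Hypertension", "Systolic BP", "vitalssys.table", "MA"),
    ("Hypertension", "Diastolic BP", "vitalsdys.table", "MA"),
]

# Who enters what: group the data items by the role that documents them.
items_by_role = defaultdict(list)
for measure, data_item, field_name, employee in roadmap:
    items_by_role[employee].append(data_item)

for role in sorted(items_by_role):
    print(role, "->", ", ".join(items_by_role[role]))
```

Grouping by role makes it easy to see how much of the data entry falls on each role, which is useful input for the workflow discussion.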
How to Manage Your Data
• Data Health Checks
• They are periodic reviews of your EMR data's integrity
• Establish timelines for the data health checks; yearly is recommended.
• Get your data health check team together; members from different departments are
recommended
• Document your data health checks, and don’t delete roadmap columns. Simply add
another tab in your spreadsheet.
How to Manage Your Data
• Creating data workflows
• Use the data roadmap to streamline workflows
• Duplicate data entry
• Redundant data workflows
• Too many places to document
• Too many variations in your data types
• Standardize the process
• Involve the end-users in the process
• Use a diverse team; the same team that does the Data Health Checks works
well
Statistical Analysis in PHM
• Statistical analysis involves using the scientific method to answer
questions and make decisions
• It involves designing the studies, collecting good data, describing the
data with numbers and graphs, analyzing the data, and then making
conclusions.
Introduction to Statistics
• Statistics are everywhere, from healthcare to marketing.
• Usually statistics deals with two different sets of data:
• Population:
• The set of individual persons or objects in which an investigator is primarily
interested during his or her research problem
• Sample:
• That part of the population from which information is collected
Introduction to Statistics
• There are two major types of statistics
• Descriptive: methods for organizing and summarizing information
• Inferential: methods for drawing and measuring the reliability of conclusions
about a population
• Descriptive statistics involves graphs, charts, tables, etc.
• Inferential statistics is predictive and includes methods like point
estimation, interval estimation and hypothesis testing
Introduction to Statistics
• Descriptive Statistics Examples:
PatientID Tobacco Cessation
5465 Yes
5466 No
5467 Yes
5468 Yes
5469 No
5470 Yes
5471 Yes
5472 Yes
5473 Yes
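As a small illustration of a descriptive statistic, the table above can be summarized as counts and a percentage. This is a sketch using Python's standard library; the variable names are hypothetical.

```python
from collections import Counter

# PatientID -> tobacco cessation status, copied from the table above.
cessation = {
    5465: "Yes", 5466: "No", 5467: "Yes", 5468: "Yes", 5469: "No",
    5470: "Yes", 5471: "Yes", 5472: "Yes", 5473: "Yes",
}

counts = Counter(cessation.values())
percent_yes = 100 * counts["Yes"] / len(cessation)

print(counts["Yes"], "Yes,", counts["No"], "No")
print(f"{percent_yes:.1f}% of patients answered Yes")
```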
Introduction to Statistics
• Independent and Dependent Variables
• Independent variables are manipulated by an experimenter
• Example: A provider wants to know which medication is best for depression
and has four antidepressants to choose from. The medication given to each
patient is the independent variable.
• Dependent variables are the results of the experiment
• Example: After a period of time, the provider interviews the patients to see
what their PHQ score is. The PHQ score is the dependent variable.
Introduction to Statistics
• Distribution
• Distribution has to do with the frequency of the data
• Example: You purchase a bag of Skittles. Skittles come in different colors, how
many of each type of color is found in the bag?
• This is known as a frequency table, which describes the Skittles color
frequencies
Color Count
Green 15
Blue 8
Yellow 10
Purple 6
Red 12
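A frequency table is often easier to read once each count is turned into a share of the total. The sketch below does that for the Skittles counts above; the variable names are hypothetical.

```python
# Color counts copied from the frequency table above.
color_counts = {"Green": 15, "Blue": 8, "Yellow": 10, "Purple": 6, "Red": 12}

total = sum(color_counts.values())

# Relative frequency: the share of the bag that each color makes up.
relative_frequency = {color: count / total for color, count in color_counts.items()}
most_common_color = max(color_counts, key=color_counts.get)

for color, share in relative_frequency.items():
    print(f"{color:<7}{color_counts[color]:>3}  {share:.1%}")
```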
Introduction to Statistics
• Continuous Variables
• Sometimes data is always changing, and you never have a black and white
data set like in our Skittles example
• When your data is varied, you can do a grouped frequency distribution and
look at your data in histogram form
• Example:
• We’re much better off looking at the data in grouped frequency rather than
looking at each HgbA1c result
HgbA1c Values Count
<7 253
>7<8 700
>8<9 740
>9 141
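A grouped frequency table like this one can be built by assigning each result to a range and counting. The sketch below uses a handful of made-up HgbA1c values, since the deck does not list the raw results. It also assumes that a value exactly on a boundary (for example 7.0) belongs to the higher group, which the table labels above do not specify.

```python
from bisect import bisect_right
from collections import Counter

# Group boundaries and labels; boundary handling is an assumption (see lead-in).
edges = [7.0, 8.0, 9.0]
labels = ["<7", "7 to <8", "8 to <9", "9 or more"]

def a1c_group(value):
    """Return the label of the group that an HgbA1c value falls into."""
    return labels[bisect_right(edges, value)]

# Hypothetical sample values, for illustration only.
sample = [6.4, 7.0, 7.9, 8.2, 8.8, 9.5, 10.1, 6.9]

grouped = Counter(a1c_group(v) for v in sample)
for label in labels:
    print(f"{label:<10}{grouped[label]}")
```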
Introduction to Statistics
• Probability Distributions: Discrete vs Continuous
• Depends on whether they define probabilities associated with discrete
variables or continuous variables.
• Discrete vs. Continuous Variables
• If a variable can take on any value between two specified values, it is called
a continuous variable; otherwise, it is called a discrete variable.
• Example:
• To be eligible for a particular program, your income must be between x and y amount.
Income is an example of a continuous variable, because it can take any value
between x and y.
• The weight of a patient is also a continuous variable. The number of visits a patient
makes in a year is an example of a discrete variable, because it can only be a whole number.
Introduction to Statistics
• Probability Densities
• A probability density describes how likely a continuous variable is to fall
within a given range of values. The resulting curve is called a continuous distribution.
• Normal (Bell) distribution, a type of continuous distribution, explains many
natural phenomena
Introduction to Statistics
• Distribution shapes
• If you fold the figure in the previous slide in half you would get equal halves.
However, not all distributions are symmetrical.
• A distribution with a longer “tail” in the positive direction is said to have a
“positive skew”; it is also described as “skewed to the right”:
Introduction to Statistics
• Although less common, some distributions have a “negative skew”.
Introduction to Statistics
• All the distributions so far have had one distinct high point or peak.
When distributions have two peaks in the data, this is called a
bimodal distribution:
Introduction to Statistics
• Some statistical definitions
• Mean – add up all the numbers and divide by the number of numbers
• Median – middle value in the list of numbers; the numbers have to be listed
in numerical order
• Mode – the value that occurs most often
• Range – the difference between the largest and smallest values
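The four definitions can be checked on a small list with Python's statistics module. The five values below are made up for illustration.

```python
import statistics

# Hypothetical values, already in numerical order.
values = [7, 8, 8, 10, 12]

mean_value = statistics.mean(values)      # sum of the values divided by how many there are
median_value = statistics.median(values)  # middle value of the sorted list
mode_value = statistics.mode(values)      # value that occurs most often
range_value = max(values) - min(values)   # largest minus smallest

print(mean_value, median_value, mode_value, range_value)
```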
Introduction to Statistics
• Properties of the Normal (Bell) Distribution Curve
• Suppose that the total area under the curve is defined to be 1. You can
multiply that number by 100 and say there is a 100% chance that any value
you can name will be somewhere in the distribution. (Remember: The
distribution extends to infinity in both directions.)
• Similarly, because half the area of the curve is below the mean and half is
above it, you can say that there is a 50 percent chance that a randomly
chosen value will be above the mean and the same chance that it will be
below it.
Introduction to Statistics
• A normal curve also has an equal mean, median and mode.
• When looking at data points, the mean of a population is written as “mu” (μ).
“Sigma” (σ) is the standard deviation of a population.
Introduction to Statistics
• In a normal distribution, 68% of the data are between one standard
deviation below the mean and one standard deviation above the
mean. 95% are within two standard deviations of the mean and
99.7% are within three standard deviations of the mean.
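These percentages can be checked numerically. For a normal distribution, the share of values within k standard deviations of the mean equals erf(k / √2). The function name below is hypothetical.

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The printed values are roughly 68.3%, 95.4% and 99.7%, which is where the rounded figures in the rule come from.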
Statistical Models
Introduction to Statistics
• Descriptive Statistic Models
• Graphing data from frequency tables in:
• Pie charts
• Bar Charts
HgbA1c Values Count
<7 253
>7<8 700
>8<9 740
>9 141
Introduction to Statistics
• Descriptive Statistic Models
• Graphing data from linear date tables
• Line Graphs: line graphs are meant to show data over time
Introduction to Statistics
• Histograms
• It’s a graphical method for displaying the shape of a distribution, really useful
when looking at large amounts of data.
• Example: We analyzed 10 patients, and we recorded their most recent LDL
values. The values ranged from 65 to 221. We would first create a frequency
table that breaks the values into intervals or parameters.
Introduction to Statistics
• Histogram Data set
LDL Intervals: 70, 100, 130, 160, 190
LDL Values: 65, 138, 102, 221, 155, 99, 144, 113, 166, 159
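The frequency table behind the histogram can be sketched in a few lines. The code below assumes each interval value is the highest value counted in its bin, with a final "More" bin for anything above the last interval, which is how Excel's Histogram tool reports its output.

```python
from bisect import bisect_left

# Intervals (bin upper limits) and LDL values from the data set above.
bin_limits = [70, 100, 130, 160, 190]
ldl_values = [65, 138, 102, 221, 155, 99, 144, 113, 166, 159]

# One counter per bin, plus a final "More" bin for values above the last limit.
counts = [0] * (len(bin_limits) + 1)
for value in ldl_values:
    # bisect_left finds the first limit that is >= value, so a value equal
    # to a limit is counted in that limit's bin.
    counts[bisect_left(bin_limits, value)] += 1

labels = [str(limit) for limit in bin_limits] + ["More"]
frequency_table = dict(zip(labels, counts))

# Text version of the histogram.
for label, count in frequency_table.items():
    print(f"{label:>5} {count} {'#' * count}")
```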
Introduction to Statistics
• Things to note about frequency tables:
• Intervals or parameters are also known as bins
• Each bin value in the column is the highest value included in that bin
• To set up your bins, use the Rice rule. Set the number of intervals to twice the
cube root of the number of observations.
• In the case of 1000 observations, the Rice rule yields 20 intervals. In our previous
example, we got the data for 10 patients. The cube root of 10 is about 2.15, and twice that
is about 4.3, or 4 intervals. We settled on 5 to have more uniform bins. The rule is more of a guideline
and you can experiment with the bin numbers to get different distribution curves.
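The Rice rule is simple enough to write as a one-line function. The sketch below rounds to the nearest whole number, which is an assumption; some references round up instead, and either way the result is only a starting point.

```python
def rice_bins(n_observations):
    """Suggested number of histogram intervals: twice the cube root of n."""
    return round(2 * n_observations ** (1 / 3))

print(rice_bins(1000))  # 1000 observations
print(rice_bins(10))    # the 10-patient LDL example
```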
Introduction to Statistics
• Creating a Histogram using Excel:
• First, make sure the Analysis ToolPak is enabled.
• Go to File, Options:
Introduction to Statistics
• Creating a Histogram using Excel:
• Then, select Add-ins
• At the bottom of the view, select Excel Add-ins, then select Go…
Introduction to Statistics
• Creating a Histogram using Excel:
• Afterwards, select the Analysis ToolPak and click OK
• The Data Analysis button now appears under the Data tab in Excel
Introduction to Statistics
• Creating a Histogram using Excel:
• Select your data set, then click on the Data Analysis button. A list pops up.
Select Histogram from the list
• It will ask you to select the Input Range and the Bin Range. The input range
is the actual values; the bin range is the set of intervals
• If you have included the column labels, check the Labels box.
• Then select where you would like your histogram to go (the default is fine),
then check Chart Output at the bottom
Introduction to Statistics
• If you followed the instructions, you should get a spreadsheet that
looks like this (reduce the gap width to zero to get the columns all
bunched up):
Introduction to Statistics
• Histogram applications in healthcare
• Large data sets
• Pareto charts to correctly identify vulnerable populations
• The 80/20 rule can help identify the areas to focus on
• Best when data ranges can vary, as averages are not a good measuring tool
• Examples: Cycle time, lab values, etc. Really any population with continuous variables
Introduction to Statistics
• Regression Analysis
• Linear Regression: At the center of regression is the relationship between two
variables called the dependent and independent variables
• You want to compare two data sets to see what a change in the independent
variable causes in the dependent variable
• Example:
• You notice that the Behavioral Health department is swamped with referrals from
primary care during the winter months. You wonder if there’s some correlation between
the average PHQ-9 scores of the patients, the months of the year and the number of
referrals BH is getting.
Introduction to Statistics
• You extract some data from your system, and obtain the following
data set
Date Average PHQ-9 Score Average Referrals to BH
January 19 60
February 18 57
March 14 48
April 10 35
May 10 22
June 8 20
July 8 15
August 7 15
September 8 14
October 12 15
November 15 35
December 20 53
Introduction to Statistics
• Let’s regress.
• Choose Data Analysis again from the Data tab, then choose
Regression from the list
• Put the dependent variable in the Input Y Range and the independent
variable in the Input X Range
• Check Line Fit Plots to get a scatter plot with a fitted line that shows how
tight the relationship between the PHQ score and number of referrals
really is
Introduction to Statistics
• You should get the following (there’s more data but it gets
complicated):
Regression Statistics
Multiple R 0.90733138
R Square 0.823250233
Adjusted R Square 0.805575256
Standard Error 7.928960791
Observations 12
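The same regression statistics can be reproduced outside Excel. The sketch below fits a simple linear regression by hand with only Python's standard library, assuming the average PHQ-9 score is the independent (x) variable and the average referrals to BH is the dependent (y) variable. The variable names are hypothetical.

```python
import math

# Monthly data from the table above, January through December.
phq = [19, 18, 14, 10, 10, 8, 8, 7, 8, 12, 15, 20]
referrals = [60, 57, 48, 35, 22, 20, 15, 15, 14, 15, 35, 53]

n = len(phq)
mean_x = sum(phq) / n
mean_y = sum(referrals) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in phq)
syy = sum((y - mean_y) ** 2 for y in referrals)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(phq, referrals))

# Least-squares line: referrals = intercept + slope * phq
slope = sxy / sxx
intercept = mean_y - slope * mean_x

multiple_r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
r_square = multiple_r ** 2                # share of the variation in y explained by x
# Adjusted for one x variable: 1 - (1 - R^2) * (n - 1) / (n - 2)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - 2)

# Standard error of the regression: typical size of a residual.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(phq, referrals))
standard_error = math.sqrt(sse / (n - 2))

print(f"Multiple R        {multiple_r:.6f}")
print(f"R Square          {r_square:.6f}")
print(f"Adjusted R Square {adjusted_r_square:.6f}")
print(f"Standard Error    {standard_error:.6f}")
print(f"Observations      {n}")
```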
Introduction to Statistics
• Our model tells us the following important information:
• Multiple R. This is the correlation coefficient. It tells you how strong the linear
relationship is. For example, a value of 1 means a perfect positive relationship and a
value of zero means no linear relationship at all. It is the square root of R squared (see the next item)
• R squared. This is r², the Coefficient of Determination. It tells you how much of the
variation in the y-values is explained by the model. For example, 80% means that 80% of the
variation of y-values around the mean is explained by the x-values. It does not mean that
80% of the points fall on the regression line
• Adjusted R square. The adjusted R-square adjusts for the number of terms in a
model. You’ll want to use this instead of R squared if you have more than one x variable
Introduction to Statistics
• How is this useful?
• First of all, your data now support your theory. In the summer months, when
the average PHQ scores are lower, there are fewer referrals than in the winter
months, when the average PHQ scores are higher
• Use this information to request extra staffing, longer hours, etc. It’s not
conjecture anymore; you have hard data to back it up
• Maybe you can use this information to mount a depression campaign during
the winter months in your clinic; the uses for the data are endless
Statistical Analysis in Healthcare
Statistical Analysis In Healthcare
• Currently, there is an abundance of data. There is a real need for
people who can analyze and interpret clinical, operational and
financial data in healthcare
• Statistical analysis can use regression cost models to see if
particular diagnoses or services increase or decrease costs
• Combining operational and clinical data will yield maximum
knowledge to create better clinical workflows and increase patient
satisfaction
Statistical Analysis In Healthcare
• Currently, not many healthcare centers or hospitals use analytics
software on a daily basis
• Statistical analysis of a patient population can help determine where
to focus efforts for maximum impact
• Using social determinants of health as data points, you can also
determine if there are correlations between them and patient
outcomes

 
statistics introduction.ppt
statistics introduction.pptstatistics introduction.ppt
statistics introduction.ppt
 

More from Jorge A. Gaspar

PCMH 2014 recognition changes
PCMH 2014 recognition changesPCMH 2014 recognition changes
PCMH 2014 recognition changes
Jorge A. Gaspar
 
CSUSM-Team Precision-The Economist Business Case Competition 2015
CSUSM-Team Precision-The Economist Business Case Competition 2015CSUSM-Team Precision-The Economist Business Case Competition 2015
CSUSM-Team Precision-The Economist Business Case Competition 2015
Jorge A. Gaspar
 
Lean in healthcare (ugm)
Lean in healthcare (ugm)Lean in healthcare (ugm)
Lean in healthcare (ugm)
Jorge A. Gaspar
 
VCC HEDIS TEMPLATE
VCC HEDIS TEMPLATEVCC HEDIS TEMPLATE
VCC HEDIS TEMPLATE
Jorge A. Gaspar
 
Lean in healthcare %28final%29
Lean in healthcare %28final%29Lean in healthcare %28final%29
Lean in healthcare %28final%29
Jorge A. Gaspar
 

More from Jorge A. Gaspar (6)

PCMH 2014 recognition changes
PCMH 2014 recognition changesPCMH 2014 recognition changes
PCMH 2014 recognition changes
 
CSUSM-Team Precision-The Economist Business Case Competition 2015
CSUSM-Team Precision-The Economist Business Case Competition 2015CSUSM-Team Precision-The Economist Business Case Competition 2015
CSUSM-Team Precision-The Economist Business Case Competition 2015
 
Lean in healthcare (ugm)
Lean in healthcare (ugm)Lean in healthcare (ugm)
Lean in healthcare (ugm)
 
VCC HEDIS TEMPLATE
VCC HEDIS TEMPLATEVCC HEDIS TEMPLATE
VCC HEDIS TEMPLATE
 
Lean in healthcare %28final%29
Lean in healthcare %28final%29Lean in healthcare %28final%29
Lean in healthcare %28final%29
 
Abstract IPS Boston
Abstract IPS BostonAbstract IPS Boston
Abstract IPS Boston
 

Recently uploaded

Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 

Recently uploaded (20)

Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 

Data analytics in Healthcare

  • 2. Topics to be covered • Introducing Big Data • Big Data in healthcare • Database structure and management • Database structure • How to manage your data • Statistical analysis in population health management • Introduction to statistics • Statistical analysis in healthcare
  • 3. Introducing Big Data • Information that can’t be processed or analyzed using traditional processes or tools • There are four dimensions to Big Data: Volume, Velocity, Variety, Veracity • Challenges with Big Data: Capturing, Storing, Searching, Sharing & Analyzing
  • 4. Introducing Big Data • Volume • The amount of data being collected is unprecedented • The volume of data available is on the rise, while the percent that can be analyzed is on the decline. This is known as the data blind zone. • Velocity • The rate at which the data is being generated needs to be handled • How quickly is the data arriving and stored? • How quickly can you process the data?
  • 5. Introducing Big Data • Variety • With an increase in quantity comes an increase in variety • Issues with storing complex data • Analyzing all different types of data • Veracity • The accuracy of data becomes more important as we use more of it • Garbage in, garbage out
  • 6. Introducing Big Data • Big Data challenges: • Capturing • Data is initially pulled from all sorts of different places • Storing • Data is kept in different locations (virtual or otherwise) • Security concerns • Searching • Having a database capable of handling searches • Optimizing a database for searches
  • 7. Introducing Big Data • Big Data challenges: • Sharing • There are valid security concerns • The data variety poses a problem when sharing • Analyzing • Extracting the data isn’t easy • Data variety poses a significant problem • Sheer volume of data makes it difficult to focus
  • 8. Big Data in Healthcare • Incentives for big data use are rising • Movement to evidence-based care • Increase in available technologies for data collection, analysis and communication • The ultimate goal is improving patient health while reducing costs
  • 9. Big Data in Healthcare • Volume • Healthcare data is more plentiful than ever • Velocity • Data flows real time and is processed real time • Variety • Billing information and clinical information • Veracity • Data accuracy is vital to an organization
  • 10. Big Data in Healthcare • Challenges • Mixing healthcare with IT • The availability of data has exploded • How do you handle the influx of data? • Finding the relevant data to mine
  • 12. Database Structure • A structured set of data held in a computer, especially one that is accessible in various ways (or not so accessible in some cases). • Data are organized in database tables, which consist of rows and columns. • Each row is called a record, object or entity. Each column is called a field or attribute. • Each column should hold a single data type, but the columns within one row can hold different data types
  • 14. Database Structure • Two types of keys, primary and foreign • A primary key makes a row of data unique; it can be made up of multiple columns • A foreign key is a column or group of columns in a relational database table that provides a link between data in two tables
  • 16. Database Structure • Database relationships can be of three different types: • One-to-one • One-to-many • Many-to-many
  • 17. Database Structure • One-to-One Relationships • A key will appear only once in a related table. • Example: A patient can only be assigned one primary care provider
  • 18. Database Structure • One-to-Many Relationships • Keys from one table will appear multiple times in a related table • Example: One provider can be assigned multiple patients in paneling
  • 19. Database Structure • Many-to-Many relationships • The key value of one table can appear many times in a related table, but the opposite also holds true! • Example: A patient can see multiple different providers and a provider can see multiple different patients
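The key and relationship ideas on the Database Structure slides can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table names (`provider`, `patient`, `visit`), the column names and the sample rows are assumptions made up for the example, not a real EMR schema.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE provider (
    provider_id INTEGER PRIMARY KEY,      -- primary key: makes each row unique
    name        TEXT NOT NULL
);
CREATE TABLE patient (
    patient_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    pcp_id      INTEGER REFERENCES provider(provider_id)  -- foreign key (one provider, many patients)
);
CREATE TABLE visit (                      -- link table for the many-to-many relationship
    patient_id  INTEGER REFERENCES patient(patient_id),
    provider_id INTEGER REFERENCES provider(provider_id),
    visit_date  TEXT,
    PRIMARY KEY (patient_id, provider_id, visit_date)     -- primary key made of multiple columns
);
""")

conn.executemany("INSERT INTO provider VALUES (?, ?)", [(1, "Provider A"), (2, "Provider B")])
conn.executemany("INSERT INTO patient VALUES (?, ?, ?)",
                 [(5465, "Patient X", 1), (5466, "Patient Y", 1)])
conn.executemany("INSERT INTO visit VALUES (?, ?, ?)",
                 [(5465, 1, "2024-01-05"), (5465, 2, "2024-02-10"), (5466, 2, "2024-02-11")])

# One-to-many: how many patients are paneled to each provider?
panel_sizes = dict(conn.execute(
    "SELECT pcp_id, COUNT(*) FROM patient GROUP BY pcp_id").fetchall())

# Many-to-many: one patient sees several providers, one provider sees several patients.
providers_seen = conn.execute(
    "SELECT COUNT(DISTINCT provider_id) FROM visit WHERE patient_id = 5465").fetchone()[0]
patients_seen = conn.execute(
    "SELECT COUNT(DISTINCT patient_id) FROM visit WHERE provider_id = 2").fetchone()[0]

print(panel_sizes, providers_seen, patients_seen)
```

The many-to-many case needs the separate `visit` table because neither the patient row nor the provider row can hold a variable number of links on its own.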
  • 20. How to Manage Your Data • The importance of managing your database • Your database is composed of data and is built by the software companies. You can effectively manage what goes INTO your database. • It plays an important role in improving the performance of an organization’s health care systems. • Collecting, analyzing, interpreting, and acting on data for specific performance measures allows health care professionals to identify where systems are falling short, to make corrective adjustments, and to track outcomes.
  • 21. How to Manage Your Data • Developing an EMR data roadmap • First determine what you need to collect • Next, identify where the data is able to be entered • Find out who is entering it • Develop a roadmap of your data using a spreadsheet • Rows would correspond to the data being collected • Columns would contain the where and who
  • 22. How to Manage Your Data • Data roadmap example:
      Measure Name | Data Item | Field Name | Employee
      Colorectal Cancer | Colonoscopy Result | healthmaintenance.table | MD
      Colorectal Cancer | Colonoscopy Date | diagnostichistory.table | MA
      Colorectal Cancer | Colonoscopy Document | referralorder.table | RN
      Colorectal Cancer | FIT Outside lab result | outsidelabs.table | MA
      Colorectal Cancer | FIT Quest lab result | emrlabs.table | MD
      Hypertension | Systolic BP | vitalssys.table | MA
      Hypertension | Diastolic BP | vitalsdys.table | MA
  • 23. How to Manage Your Data • Data Health Checks • They are periodic reviews of your EMR data's integrity • Establish timelines for the data health checks; yearly is recommended. • Get your data health check team together; members from different departments are recommended • Document your data health checks, and don’t delete roadmap columns. Simply add another tab in your spreadsheet.
  • 24. How to Manage Your Data • Creating data workflows • Use the data roadmap to streamline workflows • Duplicate data entry • Redundant data workflows • Too many places to document • Too many variations in your data types • Standardize the process • Involve the end-users in the process • Use a diverse team, the same team that does the Data Health Checks works well
  • 25. Statistical Analysis in PHM • Statistical analysis involves using the scientific method to answer questions and make decisions • It involves designing the studies, collecting good data, describing the data with numbers and graphs, analyzing the data, and then drawing conclusions.
  • 26. Introduction to Statistics • Statistics are everywhere, from healthcare to marketing. • Usually statistics deals with two different sets of data: • Population: • The set of individual persons or objects in which an investigator is primarily interested during his or her research problem • Sample: • That part of the population from which information is collected
  • 27. Introduction to Statistics • There are two major types of statistics • Descriptive: methods for organizing and summarizing information • Inferential: methods for drawing and measuring the reliability of conclusions about a population • Descriptive statistics involves graphs, charts, tables, etc. • Inferential statistics is predictive and includes methods like point estimation, interval estimation and hypothesis testing
  • 28. Introduction to Statistics • Descriptive Statistics Examples:
      PatientID | Tobacco Cessation
      5465 | Yes
      5466 | No
      5467 | Yes
      5468 | Yes
      5469 | No
      5470 | Yes
      5471 | Yes
      5472 | Yes
      5473 | Yes
  • 29. Introduction to Statistics • Independent and Dependent Variables • Independent variables are manipulated by an experimenter • Example: A provider wants to know which medication is best for depression and has four antidepressants to choose from. Which medication the provider gives out is the independent variable. • Dependent variables are the results of the experiment • Example: After a period of time, the provider interviews the patients to see what their PHQ score is. The PHQ score is the dependent variable.
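As a small descriptive-statistics sketch, the tobacco cessation table above can be summarized as counts and a percentage. The data are the nine rows from the slide; the variable names are chosen for the example only.

```python
from collections import Counter

# PatientID -> tobacco cessation status, copied from the slide's table.
records = {
    5465: "Yes", 5466: "No", 5467: "Yes", 5468: "Yes", 5469: "No",
    5470: "Yes", 5471: "Yes", 5472: "Yes", 5473: "Yes",
}

counts = Counter(records.values())       # how often each answer occurs
total = len(records)
pct_yes = 100 * counts["Yes"] / total    # share of patients with a "Yes"

print(counts, f"{pct_yes:.1f}% Yes")
```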
  • 30. Introduction to Statistics • Distribution • Distribution has to do with the frequency of the data • Example: You purchase a bag of Skittles. Skittles come in different colors. How many of each color are found in the bag? • This is known as a frequency table, which describes the Skittles color frequencies
      Color | Count
      Green | 15
      Blue | 8
      Yellow | 10
      Purple | 6
      Red | 12
  • 31. Introduction to Statistics • Continuous Variables • Sometimes data is always changing, and you never have a black and white data set like in our Skittles example • When your data is varied, you can do a grouped frequency distribution and look at your data in histogram form • Example: • We’re much better off looking at the data in grouped frequency rather than looking at each HgbA1c result
      HgbA1c Values | Count
      <7 | 253
      7 to 8 | 700
      8 to 9 | 740
      >9 | 141
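A grouped frequency distribution like the HgbA1c table can be built by assigning each result to a range. The sketch below uses made-up HgbA1c values, and it assumes that a result exactly on a boundary (for example 7.0) falls into the higher group, since the slide does not say how boundaries are handled.

```python
from bisect import bisect_right

# Group boundaries from the slide; the labels spell out the boundary assumption.
edges = [7.0, 8.0, 9.0]
labels = ["<7", "7 to <8", "8 to <9", "9 and above"]

def grouped_frequency(values):
    """Count how many values fall into each HgbA1c group."""
    counts = [0] * len(labels)
    for v in values:
        counts[bisect_right(edges, v)] += 1   # index of the group that v belongs to
    return dict(zip(labels, counts))

# Hypothetical HgbA1c results, not taken from the slide.
sample = [6.1, 6.8, 7.0, 7.4, 7.9, 8.2, 8.8, 9.5, 10.1, 6.5]
table = grouped_frequency(sample)
print(table)
```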
  • 32. Introduction to Statistics • Probability Distributions: Discrete vs Continuous • Depends on whether they define probabilities associated with discrete variables or continuous variables. • Discrete vs. Continuous Variables • If a variable can take on any value between two specified values, it is called a continuous variable; otherwise, it is called a discrete variable. • Example: • To be eligible for a particular program, your income must be between x and y amount. Income is an example of a continuous variable, because it can take any value between x and y. • The weight of the patients in a population is also a continuous variable; a count, such as the number of visits per patient, is an example of a discrete variable
  • 33. Introduction to Statistics • Probability Densities • A probability density describes how likely the values of a continuous variable are across their whole range, rather than at a single point. The resulting distribution is called a continuous distribution. • Normal (Bell) distribution, a type of continuous distribution, explains many natural phenomena
  • 34. Introduction to Statistics • Distribution shapes • If you fold the figure in the previous slide in half you would get equal halves. However, not all distributions are symmetrical. • A distribution with a longer “tail” in the positive direction is said to have a “positive skew”; it is also known as “skewed to the right”:
  • 35. Introduction to Statistics • Although less common, some distributions have a “negative skew”.
  • 36. Introduction to Statistics • All the distributions so far have had one distinct high point or peak. When distributions have two peaks in the data, this is called a bimodal distribution:
  • 37. Introduction to Statistics • Some statistic definitions • Mean – add up all the numbers and divide by the number of numbers • Median – middle value in the list of numbers; the numbers have to be listed in numerical order • Mode – the value that occurs most often • Range – the difference between the largest and smallest values
  • 38. Introduction to Statistics • Properties of the Normal (Bell) Distribution Curve • Suppose that the total area under the curve is defined to be 1. You can multiply that number by 100 and say there is a 100% chance that any value you can name will be somewhere in the distribution. (Remember: The distribution extends to infinity in both directions.) • Similarly, because half the area of the curve is below the mean and half is above it, you can say that there is a 50 percent chance that a randomly chosen value will be above the mean and the same chance that it will be below it.
  • 39. Introduction to Statistics • A normal curve also has an equal mean, median and mode. • When looking at data points, the mean of a population is known as “mu” (μ). “Sigma” (σ) is the standard deviation of a population.
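The four definitions above can be checked with Python's standard `statistics` module. The systolic blood pressure readings are hypothetical values invented for the example.

```python
import statistics

# Hypothetical systolic BP readings (mmHg), not taken from the slides.
readings = [118, 122, 130, 122, 141, 135, 122, 128, 150]

mean_bp = statistics.mean(readings)        # sum of the values divided by how many there are
median_bp = statistics.median(readings)    # middle value once the list is sorted
mode_bp = statistics.mode(readings)        # most frequent value
range_bp = max(readings) - min(readings)   # largest minus smallest

print(round(mean_bp, 2), median_bp, mode_bp, range_bp)
```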
  • 40. Introduction to Statistics • In a normal distribution, 68% of the data are between one standard deviation below the mean and one standard deviation above the mean. 95% are within two standard deviations of the mean and 99.7% are within three standard deviations of the mean.
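The 68%, 95% and 99.7% figures can be reproduced from the standard normal curve with `statistics.NormalDist` (Python 3.8 or later). The helper name `share_within` is chosen for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)  # mean 0, standard deviation 1

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within = {k: share_within(k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

Because any normal distribution is just a shifted and rescaled standard normal, the same shares apply whatever the mean and standard deviation are.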
  • 42. Introduction to Statistics • Descriptive Statistic Models • Graphing data from frequency tables in: • Pie charts • Bar Charts
      HgbA1c Values | Count
      <7 | 253
      7 to 8 | 700
      8 to 9 | 740
      >9 | 141
  • 43. Introduction to Statistics • Descriptive Statistic Models • Graphing data from linear date tables • Line Graphs: line graphs are meant to show data over time
  • 44. Introduction to Statistics • Histograms • It’s a graphical method for displaying the shape of a distribution, really useful when looking at large amounts of data. • Example: We analyzed 10 patients, and we recorded their most recent LDL values. The values ranged from 57 to 221. We would first create a frequency table that breaks the values into intervals or parameters.
  • 45. Introduction to Statistics • Histogram Data set
      LDL Intervals: 70, 100, 130, 160, 190
      LDL Values: 65, 138, 102, 221, 155, 99, 144, 113, 166, 159
  • 46. Introduction to Statistics • Things to note about frequency tables: • Intervals or parameters are also known as bins • Each bin value in the column is the highest value that can fall in that bin • To set up your bins, use the Rice rule. Set the number of intervals to twice the cube root of the number of observations. • In the case of 1000 observations, the Rice rule yields 20 intervals. In our previous example, we got the data for 10 patients. The cube root of 10 is about 2.15, and twice that is about 4.3, or roughly 4 intervals. We settled on 5 to have more uniform bins. The rule is more of a guideline and you can experiment with the bin numbers to get different distribution curves.
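The Rice rule and the bin counts for the LDL example can be sketched in a few lines. Two assumptions are made here: the rule's result is rounded to the nearest whole number to match the slide's arithmetic (many references round up instead), and each bin value is treated as an inclusive upper limit with an extra "More" bin for anything above the last limit, as Excel's Histogram tool does.

```python
from bisect import bisect_left

def rice_bins(n_observations):
    """Rice rule: twice the cube root of the number of observations (rounded)."""
    return round(2 * n_observations ** (1 / 3))

# Data from the histogram data set slide.
bin_limits = [70, 100, 130, 160, 190]
ldl_values = [65, 138, 102, 221, 155, 99, 144, 113, 166, 159]

# One counter per bin limit plus a final "More" bin.
frequencies = [0] * (len(bin_limits) + 1)
for value in ldl_values:
    # bisect_left returns the first bin whose upper limit is >= value.
    frequencies[bisect_left(bin_limits, value)] += 1

print(rice_bins(1000), rice_bins(10))
for label, count in zip(bin_limits + ["More"], frequencies):
    print(label, count)
```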
  • 47. Introduction to Statistics • Creating a Histogram using excel: • First, make sure the Analysis ToolPak is enabled. • Go to File, Options:
  • 48. Introduction to Statistics • Creating a Histogram using excel: • Then, select Add-ins • At the bottom of the view, select Excel Add-ins, then select Go…
  • 49. Introduction to Statistics • Creating a Histogram using excel: • Afterwards, select the Analysis ToolPak and click OK • The Data Analysis button now appears under the Data tab in the Excel home menu
  • 50. Introduction to Statistics • Creating a Histogram using excel: • Select your data set, then click on the Data Analysis button. A list pops up. Select Histogram from the list • It will ask you to select the Input Range and the Bin Range. The input range holds the actual values; the bin range holds the set intervals • If you have included the column labels, click on the labels box. • Then select where you would like your histogram to go (the default is fine), then click on chart output at the bottom
  • 51. Introduction to Statistics • If you followed the instructions, you should get a spreadsheet that looks like this (reduce the gap width to zero to get the columns all bunched up):
  • 52. Introduction to Statistics • Histogram applications in healthcare • Large data sets • Pareto charts to correctly identify vulnerable populations • The 80/20 rule can help identify the areas to focus on • Best when data ranges can vary, as averages are not a good measuring tool • Examples: Cycle time, lab values, etc. Really any population with continuous variables
  • 53. Introduction to Statistics • Regression Analysis • Linear Regression: At the center of regression is the relationship between two variables called the dependent and independent variables • You want to compare two data sets to see how a change in the independent variable relates to a change in the dependent variable • Example: • You notice that the Behavioral Health department is swamped with referrals from primary care during the winter months. You wonder if there’s some correlation between the average PHQ-9 scores of the patients, the months of the year and the number of referrals BH is getting.
  • 54. Introduction to Statistics • You extract some data from your system, and obtain the following data set
      Date | Average PHQ-9 Score | Average Referrals to BH
      January | 19 | 60
      February | 18 | 57
      March | 14 | 48
      April | 10 | 35
      May | 10 | 22
      June | 8 | 20
      July | 8 | 15
      August | 7 | 15
      September | 8 | 14
      October | 12 | 15
      November | 15 | 35
      December | 20 | 53
  • 55. Introduction to Statistics • Let’s regress. • Choose Data Analysis again from the Data menu, then choose Regression from the list • Put the dependent variable in the Input Y Range and the independent variable in the Input X Range • Click on Line Fit Plots to get a scatter plot with the fitted line, which shows how tight the relationship between the PHQ score and number of referrals really is
  • 56. Introduction to Statistics • You should get the following (there’s more data but it gets complicated):
      Regression Statistics
      Multiple R | 0.90733138
      R Square | 0.823250233
      Adjusted R Square | 0.805575256
      Standard Error | 7.928960791
      Observations | 12
  • 57. Introduction to Statistics • Our model tells us the following important information: • Multiple R. This is the correlation coefficient. It tells you how strong the linear relationship is. For example, a value of 1 means a perfect positive relationship and a value of zero means no linear relationship at all. It is the square root of R squared • R squared. This is r², the Coefficient of Determination. It tells you how much of the variation in the y-values is explained by the model. For example, 80% means that 80% of the variation of the y-values around their mean is explained by the x-values. It does not mean that 80% of the points fall on the regression line • Adjusted R square. The adjusted R-square adjusts for the number of terms in a model. You’ll want to use this instead of R squared if you have more than one x variable
  • 58. Introduction to Statistics • How is this useful? • First of all, you now have evidence that supports your theory. In the summer months, when the average PHQ scores are lower, there are fewer referrals than in the winter months, when the average PHQ scores are higher • Use this information to request extra staffing, longer hours, etc. It’s not conjecture anymore; you have hard data that supports it • Maybe you can use this information to mount a depression campaign during the winter months in your clinic. The uses for the data are endless
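The regression statistics reported by Excel can be reproduced by hand from the twelve monthly data points. This sketch assumes that the average referrals to BH are the dependent (y) variable and the average PHQ-9 score is the independent (x) variable, which the slides do not state explicitly; the variable names are chosen for the example.

```python
import math

# Monthly data from the slide, January through December.
phq9 = [19, 18, 14, 10, 10, 8, 8, 7, 8, 12, 15, 20]            # x: average PHQ-9 score
referrals = [60, 57, 48, 35, 22, 20, 15, 15, 14, 15, 35, 53]   # y: average referrals to BH

n = len(phq9)
mean_x = sum(phq9) / n
mean_y = sum(referrals) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in phq9)
syy = sum((y - mean_y) ** 2 for y in referrals)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(phq9, referrals))

slope = sxy / sxx                      # change in referrals per one-point change in PHQ-9
intercept = mean_y - slope * mean_x

multiple_r = sxy / math.sqrt(sxx * syy)                     # correlation coefficient
r_square = multiple_r ** 2                                  # share of variation in y explained
k = 1                                                       # number of x variables
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
residual_ss = syy * (1 - r_square)                          # unexplained variation
standard_error = math.sqrt(residual_ss / (n - k - 1))

print(f"Multiple R        {multiple_r:.6f}")
print(f"R Square          {r_square:.6f}")
print(f"Adjusted R Square {adjusted_r_square:.6f}")
print(f"Standard Error    {standard_error:.6f}")
```

With a single x variable, Multiple R is simply the absolute value of the Pearson correlation between the two columns, which is why it can be computed without fitting the line first.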
  • 60. Statistical Analysis In Healthcare • Currently, there is an abundance of data. There is a real need for people who can analyze and interpret clinical, operational and financial data in healthcare • Statistical analysis can use regression cost models to see whether particular diagnoses or services increase or decrease costs • Combining operational and clinical data will yield maximum knowledge to create better clinical workflows and increase patient satisfaction
  • 61. Statistical Analysis In Healthcare • Currently, not many healthcare centers or hospitals use analytics software on a daily basis • Statistical analysis of a patient population can help determine where to focus efforts for maximum impact • Using social determinants of health as data points, you can also determine if there are correlations between them and patient outcomes