DATA ANALYTICS FOR SOLVING BUSINESS
PROBLEMS:
SHIFTING FOCUS FROM THE TECHNOLOGY
DEPLOYMENT TO THE ANALYTICS METHODOLOGY
Alexander Kolker, PhD
March 7, 2017
Alexander Kolker. All rights reserved 1
"Making better business decisions using data"
a co-hosted event with Accelerate Madison and Big
Data Madison Meetup
Date: February 13, 2017
Time: 5:30 PM - 7:30 PM CST
Key point:
Focusing on business outcomes, rather than on data and
technology per se, is gaining momentum.
Some professional highlights…
• 4 business consulting projects: US Bank, Boston Consulting Group,
Children’s Hospital of Wisconsin, Ohio Hospital Association
• 12 years at GE (General Electric) Healthcare: Data Scientist
• 3 years at Froedtert Hospital: Process Simulation Leader
• 5 years at Children’s Hospital of Wisconsin: Simulation and Data
Analytics
• UW-Milwaukee Lubar School of Business-Adjunct Faculty: A graduate
course Healthcare Delivery Systems-Data Analytics
• Lead Editor and Author of 2 books, 8 book chapters, 10 reviewed papers,
and 18 conference presentations in the areas of operations management,
process modeling and simulation, and business analytics
BIG DATA AND ANALYTICS
BACKGROUND
A bold statement to start with:
Big data without actionable analytics and business
decision-making is a ‘sleeping giant’
Big Data is a 2-part deal
1. Technology for storing and managing large
amounts of data of various kinds (the current
trend)
2. Methodology for supporting business decision-making
using modeling and data.
This is called Analytics, and it is gaining momentum.
It is the focus of this presentation.
Key points:
• Analytics must help in developing:
  - New products
  - Operational efficiency
  - Business decision support
$$$
WHAT WILL BE COVERED NEXT…
1. The concept of simulation analytics for studying systemic
complex business problems
Use case 1:
Analysis of the optimal staffing of a team of medical providers
using simulation methodology (with a live demonstration)
2. Analytics methodology for identifying a few contributing
variables to the organization’s financial outcome:
Use case 2:
Principal components decomposition of the large
observational dataset and regression with principal
components
3. Appendix: Food for thought… from Pierre Laplace, 1795
SIMULATE!
• In general, simulation is a process of studying complex
systems using their mathematical representation called a
model or a digital twin, e.g.
• Flight simulator-the aircraft response to the cockpit input
controls
• Nuclear plant operators simulators-reactor output
response to the various operator inputs
• Surgical and physiology procedures simulators on
mannequins
•Our focus here is simulation of business operations
Key Point:
The most powerful and versatile simulation methodology for
analyzing manufacturing, finance, healthcare, military and
other business operations is Discrete Event Simulation
Taken from a LinkedIn post on Data Science Central
Discrete Event Simulation (DES) Methodology.
What is it?
•A discrete event simulation (DES) model mimics a
system’s dynamic behavior as the system transitions
from state to state
(compare to Data Science approach: map an output to the input through
a black box model or algorithm)
The validated model is used for predicting various
scenarios of the future system’s responses to the
random inputs in a virtual reality
Key points:
•The simulation model is not a ‘black box’. It is a scalable
digital twin of the reality
•The model reflects what’s actually happening in the system
• This capability gives a sense of the expected system’s
output before incurring the cost and risk of the
business solution implementation
(compare to Data Science validation and cross-validation of a ‘black box’ model for
predicting the future outcomes…)
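To make the state-to-state idea concrete, here is a minimal sketch of a discrete event simulation in plain Python. It is not the model used in this talk: it assumes a single provider serving patients in arrival order, exponential inter-arrival times and a triangular service time, and every name in it (simulate_single_server, mean_interarrival, service_params) is hypothetical.

```python
import heapq
import random
from collections import deque


def simulate_single_server(n_patients, mean_interarrival, service_params, seed=0):
    """Tiny discrete event simulation: one provider, first-come first-served.

    service_params is (min, mode, max) of a triangular service time in minutes.
    Returns the waiting time of every patient, in the order they were served.
    """
    rng = random.Random(seed)
    low, mode, high = service_params

    # Future event list: (event time, tie-breaker, kind, patient id).
    events = []
    t = 0.0
    for pid in range(n_patients):
        t += rng.expovariate(1.0 / mean_interarrival)
        heapq.heappush(events, (t, pid, "arrival", pid))

    waiting = deque()      # patients waiting for the provider
    arrived_at = {}        # patient id -> arrival time
    busy = False           # system state: is the provider occupied?
    waits = []
    seq = n_patients       # unique tie-breaker for departure events

    while events:
        now, _, kind, pid = heapq.heappop(events)   # jump to the next event
        if kind == "arrival":
            arrived_at[pid] = now
            waiting.append(pid)
        else:                                       # departure frees the provider
            busy = False
        if not busy and waiting:
            nxt = waiting.popleft()
            waits.append(now - arrived_at[nxt])
            busy = True
            service = rng.triangular(low, high, mode)
            heapq.heappush(events, (now + service, seq, "departure", nxt))
            seq += 1
    return waits


waits = simulate_single_server(50, mean_interarrival=10.0, service_params=(5, 8, 15), seed=1)
average_wait = sum(waits) / len(waits)
```

The clock only moves from one event to the next, and the state (queue, busy flag) changes only at those events. Changing the arrival rate or adding a second provider and re-running is the kind of scenario testing discussed in the following slides.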
Use case 1
Analysis of the performance and the
optimal staffing
in an Endoscopy Unit using
Discrete Event Simulation
Presented at the:
5th International Conference on Healthcare Systems, October, 2008;
and
IEEE Workshop on HealthCare Modeling and Simulation, February 18-20, 2010,
Venice, Italy
Problem Description
• The inevitable variability of the admission, recovery and
procedure times due to unforeseen medical complications
and delays results in some unit performance issues:
  - a long patient wait time to schedule procedures
  - not meeting daily patient demand for procedures
  - underutilization of the available capacity and staff
    overtime
  - dissatisfaction of patients and medical staff
• There has also been a lower than anticipated revenue
stream
The objectives of this work were:
(i) to analyze the main factors that contribute to the
inefficient patient flow and process bottlenecks,
and
(ii) to propose a more efficient patient scheduling and
staffing allocation aimed at increasing the number of
served patients, reducing procedure delays, and staff
overtime
Business Problem - Project Goal
The Endoscopy Unit High Level Process
Patients arrive at the
admission area
Patients are seen by the
admission nurse
Patients are attended by the
procedure nurse
Assigned doctors perform
procedures
Patients move to the
recovery area where they
are attended by the
recovery nurse
Admitting Area
Recovery area
High Level Model Outline
• Admission, procedure and the patient recovery
duration are random variables
• These variables are represented as the best fit
statistical distributions built into the simulation model
• Each patient is assigned his/her attributes:
  - scheduled arrival time
  - procedure type
  - assigned doctor’s name
Baseline Simulation Model Layout
What happens in the Exam Rooms?
if Proc_Type=col AND Doc_name=Bajaj AND Wk_Day=Fri Then
{
jointlyget (RN_WF and Tech_TF and D_Bajaj) OR (2 RN_WF and D_Bajaj)
Time (T(30,40,40) min)
Free all
}
else
if Proc_Type=egd AND Doc_name=Bajaj AND Wk_Day=Fri Then
{
jointlyget (RN_WF and Tech_TF and D_Bajaj) OR (2 RN_WF and D_Bajaj)
Time (T(10,20,20) min)
Free all
}
else
if Proc_Type=ERCP AND Doc_name=Dua AND Wk_Day=Fri Then
{
jointlyget (RN_WF and Tech_TF and D_Dua) OR (2 RN_WF and D_Dua)
Time (T(70,80,80) min)
Free all
}
Key Point:
Capturing multiple resources with different time distributions for different
procedures requires some coding…
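The same branching can be pictured as a lookup table rather than a chain of if/else blocks. The Python sketch below is only an illustration of the logic shown above, not ProcessModel syntax. It assumes that T(a, b, c) denotes a triangular distribution with minimum a, mode b and maximum c; the names PROCEDURE_TIMES, resource_options and sample_procedure_minutes are hypothetical.

```python
import random

# (procedure type, doctor, weekday) -> triangular time parameters in minutes.
# Values are copied from the action logic above; reading T(a, b, c) as
# (min, mode, max) is an assumption.
PROCEDURE_TIMES = {
    ("col", "Bajaj", "Fri"): (30, 40, 40),
    ("egd", "Bajaj", "Fri"): (10, 20, 20),
    ("ERCP", "Dua", "Fri"): (70, 80, 80),
}


def resource_options(doctor):
    """The two alternative staff bundles that can be captured jointly."""
    return [
        {"RN_WF": 1, "Tech_TF": 1, f"D_{doctor}": 1},  # nurse + tech + doctor
        {"RN_WF": 2, f"D_{doctor}": 1},                # two nurses + doctor
    ]


def sample_procedure_minutes(proc_type, doctor, weekday, rng=random):
    """Draw one procedure duration for the given patient attributes."""
    low, mode, high = PROCEDURE_TIMES[(proc_type, doctor, weekday)]
    # random.triangular takes (low, high, mode), in that order.
    return rng.triangular(low, high, mode)


duration = sample_procedure_minutes("ERCP", "Dua", "Fri", random.Random(0))
```

Keeping the parameters in one table means a new doctor or procedure type is a new row rather than a new branch.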
Typical Input Data Format
Annual patient volume is ~10,000 patients
Key  Source  Destination  Name  Action Logic  Week  Weekday  Time  Quan
10  patient  Late_Pt_arrival_adjustment_  Proc_Type=col Doc_name=bajaj Wk_day=Mon  1  Mon  7:00 AM  1
10  patient  Late_Pt_arrival_adjustment_  Proc_Type=egd Doc_name=massey Wk_day=Mon  1  Mon  7:00 AM  1
10  patient  Late_Pt_arrival_adjustment_  Proc_Type=col Doc_name=johnson Wk_day=Mon  1  Mon  7:00 AM  1
10  patient  Late_Pt_arrival_adjustment_  Proc_Type=egd Doc_name=massey Wk_day=Mon  1  Mon  7:20 AM  1
10  patient  Late_Pt_arrival_adjustment_  Proc_Type=col Doc_name=bajaj Wk_day=Mon  1  Mon  7:40 AM  1
10  patient  Late_Pt_arrival_adjustment_  Proc_Type=col Doc_name=johnson Wk_day=Mon  1  Mon  7:40 AM  1
10  patient  Late_Pt_arrival_adjustment_  Proc_Type=col Doc_name=massey Wk_day=Mon  1  Mon  7:40 AM  1
10  patient  Late_Pt_arrival_adjustment_  Proc_Type=egd Doc_name=bajaj Wk_day=Mon  1  Mon  8:20 AM  1
Typical information (data) usually required to populate a
DES model:
• Arrival pattern and quantities: periodic, random,
scheduled, daily pattern, etc.
• The time that the entities spend in the activities, i.e.
service time.
This is usually not a fixed time but a statistical distribution.
• Capacity of each activity, i.e. the max number of entities
that can be processed concurrently in the activity
• Routing types that connect structural elements: %,
conditional, alternate, create, renege, etc.
•Resource assignments: quantity and scheduled shifts
A live simulation demonstration is included here, using the
ProcessModel simulation package: patient arrivals, shifts for nurses,
technicians and doctors, and stat-fit distributions
Some Simulation Scenarios
Scenario 1- The Original Model –Baseline-used for model
validation and testing
Scenario 2 - One additional doctor scheduled part time for
11 hours per week
Scenario 3 - Change in the patient arrival schedule with
10% reduction in inter-arrival time with one additional
doctor
Scenario 4 - Cross-training of the admission and recovery
nurses
Scenario 5 - Adding a part-time nurse
Scenario 6 - Adding a part-time scope-cleaning tech
Scenario 7 – ladder nurse shifts, change breaks and lunch
time
Scenario 8 – combined Scenarios 2, 3 and 4, and all together
Simulation outcome example:
Scenario 1 vs. Scenario 2 + Scenario 4 (additional part-time
doctor for 11 hours/week + cross-trained nurses), labeled
Scenario I and Scenario II respectively:

Number of patients by day of the week
Day           Scenario I   Scenario II
Monday        39           44
Tuesday       34           40
Wednesday     29           30
Thursday      35           40
Friday        23           23
Weekly total  160          177
Increase in the number of patients: 17

Overtime, hours
Scenario I    28.2
Scenario II   20.9
Reduced doctors’ overtime: 7.3 hrs
Financial Cost-Benefit Estimate
Typical average colonoscopy patient charge is about $2,500
(colonoscopy is a major GI procedure)
Nurse overtime rate is 1.5 times the regular pay (about $30/hr)
Typical GI doctor’s annual pay is about $360,000, i.e. ~$360/hr
Weekly revenue from the additional 17 patients is 17 * $2,500 = $42,500
Reduced overtime cost for nurses and doctors is
7.3 hrs * ($30 * 1.5 + $360) = $2,956
Cost of the additional doctor (working 11 hrs): $360 * 11 = $3,960
Net additional revenue that the additional doctor brings in is about
$42,500 + $2,956 - $3,960 = $41,496 per week
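The arithmetic of this estimate can be re-run for other scenarios with a few lines of Python. This is only a sketch: it takes the rates and volumes exactly as stated on the slide, and the variable names are hypothetical.

```python
# Inputs as stated in the estimate above.
patient_charge = 2500        # $ per colonoscopy patient
extra_patients = 17          # additional patients per week
nurse_rate = 30              # $ per hour, regular pay
overtime_multiplier = 1.5    # nurse overtime is paid at 1.5x
doctor_rate = 360            # $ per hour, as quoted on the slide
overtime_saved_hours = 7.3   # weekly overtime reduction
extra_doctor_hours = 11      # weekly hours of the additional doctor

extra_revenue = extra_patients * patient_charge
overtime_savings = overtime_saved_hours * (nurse_rate * overtime_multiplier + doctor_rate)
extra_doctor_cost = doctor_rate * extra_doctor_hours
net_weekly_gain = extra_revenue + overtime_savings - extra_doctor_cost
```

The unrounded overtime saving is $2,956.50, which the slide shows as $2,956, so the unrounded net gain is $41,496.50 per week.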
Concluding Key Points:
So how can you tell if simulation is right for you?
• It is the methodology of choice for analyzing the dynamic behavior
of complex systems/processes with random components
• There is a big decision to make with high potential for failure or
reward
• It provides a framework for experimenting with the system
and testing various business scenarios
• It reveals unintended consequences of business solutions
• There is a commitment to use the findings and recommendations, even if
they are not what you want to hear
Use case 2
Analytics methodology for identifying a few
contributing variables to the organization’s financial
outcome:
Principal components decomposition of the large
observational dataset and regression with Principal
components
Reference:
A. Kolker. Management Engineering for Effective Healthcare Delivery: Principles and
Applications, IGI-Global, 2011, Chapter 1.
A. Kolker. Healthcare Management Engineering. What Does this Fancy Term Really
Mean? Chapter 5. Springer-Briefs in Healthcare Management & Economics, NY, 2012
• The large local hospital plans a major market share
expansion to improve its long-term financial viability
Business Problem - Project Goal
• The management wants to know which population
demographic factors and population disease prevalence
specific to the local area zip codes are the most important
contributors to the financial contribution margin (CM $).
Note: Contribution margin is defined as the difference between all
payments collected from patients and the patient variable costs.
Plan of attack
• Step 1
The demographics data matrix (38 variables in total) is analyzed
for the top 10 ZIPs using principal component decomposition.
• Step 2
Regression analysis is performed that relates CM $ to the
principal components of the original data matrix.
• Step 3
By analyzing the eigenvectors of only the statistically significant principal
components, conclusions are drawn about which demographic variables
are the biggest contributors for the top 10 ZIPs
Description of Data
A set of population demographic data was collected for
local area zip codes and the corresponding median
contribution margin for each zip code (CM $).
The following groups of demographic variables and
disease prevalence data were collected for each zip
code as percentage of the total zip code population:
• 4 Age categories:
  - 18-34
  - 35-54
  - 55-64
  - 65+
• 4 Educational categories:
  - BS/BA degree and higher
  - Associate/Professional degree
  - high school diploma
  - no high school diploma
• 4 Income categories:
  - less than $50K
  - $50K - $75K
  - $75K - $100K
  - $100K+
• 5 occupational categories:
  - Healthcare
  - Labor
  - Professional/Administrative
  - Public Service
  - Service industry
• Gender: male, female
• 5 Race categories: African American, Native American, Asian,
White, Other
• 14 disease categories:
  - BMT
  - Medical Oncology
  - Surgical Oncology
  - Cardiology
  - Cardiothoracic surgical
  - Vascular surgical
  - Digestive
  - Medicine/Primary care
  - Musculoskeletal
  - Neurology
  - Transplant
  - Trauma
  - Unassigned
  - Women Health
• There are 38 data variables in total in the database.
Issues with direct use of data for regression:
• In large observational data sets with dozens of variables,
some of them are inevitably correlated
• Correlation means that some information is redundant
• This redundant information in the data makes it difficult
to attribute the contributions of each variable to the
output
This issue is called multicollinearity!
Illustration of some pairwise correlations:
Correlation coefficient of the variables
‘No high school’ and ‘Annual income less than $50K’: 0.93
vs.
Correlation coefficient of the variables
‘Professional Degree’ and ‘Annual income less than $50K’: -0.87
Illustration of the regression disaster with all original
data (38 variables)
CM $ = 4130333 + 41195*(18-24 years) - 39029*(25-34 years)
 + 11836*(35-44 years) + 2894*(45-54 years) + 5507*(55-59 years)
 + 209919*(60-64 years) - 142258*(65-74 years) + 53373*(75 years+)
 - 2665632*AD - 2662185*BD - 2620383*PhD - 2649374*HS
 - 2648440*(Less HS) - 2687756*MD - 2717506*ProD
 - 2665190*(Some Coll) - 2692213*(Some HS)
 - 2398380*(Less $15K) - 2386133*($15K to $25K)
 - 2493006*($25K to $35K) - 2413833*($35K to $50K)
 - 2398657*($50K to $75K) - 2455023*($75K to $100K)
 - 2434483*($100K to $150K) - 2404935*($150K to $250K)
 - 2414342*($250K to $500K) - 2393024*($500K+)
 + 947225*(Health Care) + 954055*Labor
 + 966787*(Professional/Administrative) + 954355*(Public Service)
 + 960649*(Service Industry) + ...
Regression diagnostics:
R-Sq = 67.1% R-Sq(adj) = 8.6%
Huge variance inflation factors (VIF):
Predictor Coef SE Coef T P VIF
Constant 4130333 4378828 0.94 0.358
18--24 years 41195 32885 1.25 0.226 13.820
25--34 years -39029 24759 -1.58 0.132 23.274
35--44 years 11836 30294 0.39 0.701 9.458
45--54 years 2894 44603 0.06 0.949 25.180
55--59 years 5507 162937 0.03 0.973 89.682
60--64 years 209919 157301 1.33 0.199 65.101
65--74 years -142258 66336 -2.14 0.046 43.529
75 years+ 53373 36529 1.46 0.161 26.059
AD -2665632 3334182 -0.80 0.434 90827.662
BD -2662185 3342475 -0.80 0.436 2400778.419
PhD -2620383 3375609 -0.78 0.448 20953.952
HS -2649374 3333923 -0.79 0.437 1711185.583
Less HS -2648440 3329576 -0.80 0.437 575442.669
MD -2687756 3321036 -0.81 0.429 389134.963
ProD -2717506 3320805 -0.82 0.424 161574.141
Some Coll -2665190 3325834 -0.80 0.433 256129.161
Some HS -2692213 3334397 -0.81 0.430 1402053.683
Less $15K -2398380 2972893 -0.81 0.430 1398310.925
$15K to $25K -2386133 2983525 -0.80 0.434 429011.942
$25K to $35K -2493006 2994782 -0.83 0.416 281665.965
$35K to $50K -2413833 2973178 -0.81 0.427 253783.866
$50K to $75K -2398657 2980453 -0.80 0.431 371553.358
$75K to $100K -2455023 2994758 -0.82 0.423 541397.221
$100K to $150K -2434483 2980581 -0.82 0.425 953779.541
$150K to $250K -2404935 2982679 -0.81 0.431 330537.600
$250K to $500K -2414342 2994755 -0.81 0.431 71152.055
$500K+ -2393024 2989787 -0.80 0.434 36401.343
Health Care 947225 1810961 0.52 0.607 32674.125
Labor 954055 1801535 0.53 0.603 727911.597
Professional/Administrative 966787 1801311 0.54 0.598 501480.184
Public Service 954355 1807843 0.53 0.604 42387.891
Service Industry 960649 1803238 0.53 0.601 19069.682
VIF = 1/(1 - Corr^2), where Corr is the multiple correlation of the
variable with the remaining independent variables
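The VIF definition above can be sketched directly with NumPy: regress each variable on the remaining ones, take the resulting R-squared (the squared multiple correlation) and invert 1 - R-squared. The data below are synthetic and only illustrate the effect of a near-duplicate variable; the function name variance_inflation_factors is hypothetical and this is not the statistical package output shown in the table.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for every column j of X.

    R_j^2 is obtained by least-squares regression (with intercept)
    of column j on all the other columns.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Synthetic illustration: column c nearly duplicates column a.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)
vifs = variance_inflation_factors(np.column_stack([a, b, c]))
```

In this toy data set the redundant pair (a, c) gets very large VIFs while the independent column b stays close to 1, which is the same pattern, on a much smaller scale, as in the regression table above.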
Key Points:
• Paired correlation analysis for all 38 variables (703
pairs!) is impractical.
• Knowing the paired linear correlation coefficients does not
help in reducing redundant information or in extracting
meaningful information about separate contributing
factors.
• Regression analysis with dozens of the original
variables from observational data sets usually
fails.
Why Principal components
decomposition?
• It allows removing the redundant
variables that carry little or no information
while retaining only a few mutually
uncorrelated principal variables.
The main idea of PCD
The purpose of PCD is determining r new variables
PCr that can best approximate variation in the p
original X variables as linear combinations
The principle of information
conservation
• The total amount of information in the original data
set is not changed because of its PC decomposition
• Rather, it is rearranged in the form of a few linear
combinations of the original variables as main
information holders (PCs)
• This significantly reduces the number of
independent variables but retains the same amount
of information that is contained in the original data
matrix
What’s the eigen value?
• The eigen value λj is a measure of how much
information is retained by the corresponding PC.
• A large value of λj (compared to 1) means that
a substantial amount of information is retained
by the corresponding PC
• A small value means that little
information is retained by the corresponding PC
Reminder:
If the product of a square matrix A (here, the correlation matrix of the
data) and a vector p can be presented as
A * p = λj * p
then λj is an eigen value and the vector p is an eigen vector of the matrix A.
Eigen value analysis of the demographic
data correlation matrix
Eigen value  16.44 11.19 4.63 2.73 1.15 0.853 0.63 0.307 0.067
Proportion   0.433 0.295 0.122 0.072 0.03 0.022 0.017 0.008 0.002
Cumulative   0.433 0.727 0.849 0.921 0.951 0.974 0.990 0.998 1.000
Key Point:
Only 9 principal components (9 linear combinations of the
original variables) are required to account for all 38 original
variables.
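The Proportion and Cumulative rows follow from the eigen values alone, because the eigen values of a correlation matrix add up to the number of variables (38 here). A short Python sketch of that arithmetic, using the values listed above (variable names are hypothetical):

```python
# Eigen values as listed in the table above.
eigenvalues = [16.44, 11.19, 4.63, 2.73, 1.15, 0.853, 0.63, 0.307, 0.067]
n_variables = 38  # trace of the 38 x 38 correlation matrix

# Share of the total variation carried by each principal component.
proportions = [ev / n_variables for ev in eigenvalues]

# Running total of those shares.
cumulative = []
running = 0.0
for share in proportions:
    running += share
    cumulative.append(running)
```

The listed eigen values sum to about 37.997, i.e. 38 up to rounding, which is why nine components already account for essentially all of the variation.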
Why Regression with Principal components?
• Because PCs are mutually uncorrelated, the
variation of the dependent variable (CM $) is accounted
for by each PC independently of the other PCs
• Contribution of each PC is directly defined by the
coefficients of the regression equation
Key Point:
Regression with totally uncorrelated PC is one of the
most powerful methodologies for identifying significant
contributing variables (factors).
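A minimal NumPy sketch of regression on principal components is given below. It runs on synthetic data (four correlated variables built from two hidden factors), not on the hospital data set, and all variable names are hypothetical. It illustrates the two properties used above: the PC scores are mutually uncorrelated, and the regression coefficients on the PCs can be mapped back to the original variables through the eigen vectors.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 60
hidden = rng.normal(size=(n, 2))

# Four observed variables built from two hidden factors: strongly correlated pairs.
X = np.column_stack([
    hidden[:, 0],
    hidden[:, 0] + 0.1 * rng.normal(size=n),
    hidden[:, 1],
    hidden[:, 1] + 0.1 * rng.normal(size=n),
])
y = 3.0 * hidden[:, 0] - 2.0 * hidden[:, 1] + 0.2 * rng.normal(size=n)

# Standardize, then decompose the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
corr = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
order = np.argsort(eigvals)[::-1]            # re-sort, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Principal component scores: uncorrelated linear combinations of Z.
scores = Z @ eigvecs

# Regress y on the first k PCs (plus intercept).
k = 2
A = np.column_stack([np.ones(n), scores[:, :k]])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Contribution of each standardized original variable:
# PC regression coefficients weighted by the eigen vector entries.
variable_contributions = eigvecs[:, :k] @ coef[1:]
```

Because the score covariance matrix is diagonal, dropping or adding a PC does not change the coefficients of the others, which is what makes the attribution unambiguous.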
The Best Subset Regression
• Best subsets regression identifies the best-fitting
regression models that can be constructed with as
few predictor variables as possible
• All possible subsets of the predictors are examined,
beginning with all models containing one predictor,
and then all models containing two predictors, and so
on.
• The two best models for each number of predictors
are displayed
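The search described above can be sketched as an exhaustive loop over predictor subsets. The example below is an assumption-laden illustration, not the statistical package routine used for the slides: it ranks subsets by adjusted R-squared only (no Mallows Cp), uses synthetic predictors, and the names adjusted_r2 and best_subsets are hypothetical.

```python
import itertools
import numpy as np


def adjusted_r2(X, y):
    """Adjusted R-squared of an ordinary least-squares fit with intercept."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def best_subsets(X, y, names, keep=2):
    """For each subset size, return the `keep` best models as (adj R2, names)."""
    results = {}
    p = X.shape[1]
    for size in range(1, p + 1):
        scored = []
        for cols in itertools.combinations(range(p), size):
            score = adjusted_r2(X[:, list(cols)], y)
            scored.append((score, [names[c] for c in cols]))
        scored.sort(key=lambda item: item[0], reverse=True)
        results[size] = scored[:keep]
    return results


# Synthetic illustration: only the 1st and 3rd predictors drive the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + 0.1 * rng.normal(size=80)
results = best_subsets(X, y, ["PC1", "PC2", "PC3", "PC4"])
```

With p predictors there are 2^p - 1 subsets, so the exhaustive search is practical for a handful of principal components but not for dozens of raw variables.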
Best subsets regression with PCs
Variables  R-sq(adj)  Mallows Cp  PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9
3 87.0 128 X X X
3 64.9 349 X X X
4 90.1 83.0 X X X X
4 88.1 99.6 X X X X
5 92.2 54.4 X X X X X
5 91.3 60.3 X X X X X
6 94.5 31.7 X X X X X X
6 93.4 37.2 X X X X X X
7 97.4 14.5 X X X X X X X
7 94.0 26.2 X X X X X X X
8 99.4 9.0 X X X X X X X X
Final regression equation with PC
CM $ = 12.8 + 0.201*PC2 - 0.387*PC3 + 1.95*PC8
(compare to the original regression…)
Key Points:
• This equation accounts for R-sq(adj) = 99.4% of the response
function (CM $) variability.
• It contains only statistically significant terms (at the 5% significance
level)
Conclusion from the regression equation
• Eigen vector coefficients for PC2, PC3 and PC8
combined with PC coefficients represent the
contribution of each variable into the $CM output
Note:
In general, for non-normalized variables the relative contribution of Xi is
called the elasticity coefficient: Ei = (∂Y/Y)/(∂Xi/Xi) = ai*Xi/Y
Variable PC2 PC3 PC8
Age 18-34 0.26 0.037 -0.034
Age 35-54 -0.084 0.331 0.037
Age 55-64 -0.229 -0.173 0.236
Age 65+ -0.058 -0.185 0.015
BS/BA+ degree -0.269 -0.137 0.049
Assoc/Prof degree -0.237 0.081 -0.18
High school 0.097 0.332 0.101
No high school 0.286 -0.084 -0.078
Income < $50K 0.275 -0.105 0.025
Income $50K-$75K -0.059 -0.013 0.256
Income $75-$100K -0.27 0.125 -0.183
Income $100K+ -0.259 0.097 -0.012
Occupation: Health -0.21 -0.176 -0.206
Labor 0.265 0.116 -0.133
Professional/Adm -0.275 -0.059 -0.104
Public Service 0.029 -0.328 0.463
Service Industry -0.125 0.264 0.542
% male 0.059 0.210 0.017
% female -0.059 -0.210 -0.017
Race: African American 0.235 -0.123 0.007
Asian 0.157 0.142 -0.337
Native American -0.033 -0.339 -0.253
Other 0.263 -0.114 0.158
White -0.252 0.128 -0.087
Disease: Cancer-BMT 0.012 0.108 0.002
Med Oncology 0.012 0.107 0.01
Surgical Oncology 0.011 0.108 0.012
Cardiology 0.014 0.103 0.012
Cardiothoracic Surgery 0.014 0.103 0.011
Vascular surgery 0.018 0.104 -0.001
Digestive disease 0.014 0.103 0.005
Medicine/Primary Care 0.015 0.103 0.01
Musculoskeletal 0.014 0.105 0.012
Neurology 0.014 0.104 0.013
Transplant 0.016 0.106 0.008
Trauma 0.015 0.104 0.006
Unassigned 0.014 0.103 0.000
Women Health 0.015 0.103 -0.002
Eigen vector coefficients
for PC2, PC3 and PC8
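One way to combine the eigen vector coefficients with the regression coefficients is a weighted sum per variable: 0.201 times the PC2 entry, minus 0.387 times the PC3 entry, plus 1.95 times the PC8 entry. The Python sketch below applies this to a few rows copied from the table above. Treating this weighted sum as the variable's contribution is an assumption about how the two sets of coefficients are combined, and the names are hypothetical.

```python
# Regression coefficients of the final equation: CM $ = 12.8 + 0.201*PC2 - 0.387*PC3 + 1.95*PC8
pc_coefficients = (0.201, -0.387, 1.95)

# (PC2, PC3, PC8) eigen vector entries for a subset of variables, copied from the table.
eigenvector_entries = {
    "Age 55-64": (-0.229, -0.173, 0.236),
    "Income $50K-$75K": (-0.059, -0.013, 0.256),
    "Public Service": (0.029, -0.328, 0.463),
    "Service Industry": (-0.125, 0.264, 0.542),
    "Race: Other": (0.263, -0.114, 0.158),
    "White": (-0.252, 0.128, -0.087),
}

# Weighted sum of eigen vector entries for each variable.
contributions = {
    name: sum(b * v for b, v in zip(pc_coefficients, entries))
    for name, entries in eigenvector_entries.items()
}

# Variables ordered from the largest to the smallest contribution.
ranked = sorted(contributions, key=contributions.get, reverse=True)
```

For these rows the largest values come from Public Service and Service Industry (about 1.04 and 0.93), followed by Income $50K-$75K, Age 55-64 and Race: Other (about 0.49, 0.48 and 0.41), driven mostly by the large PC8 coefficient.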
Conclusion from the regression with PC
The primary contributing variables (factors) to CM $ are:
  - Age 55-64
  - Annual income $50K - $75K
  - Occupations: Public Service and Service Industry
  - Race: Other
  - Relative contributions of diseases are:
    neurology, cardiology and musculoskeletal
Concluding Remarks and Reflections
• As analytics professionals we are rewarded for help in solving
business problems
• Building analytics that influences business decision-making
requires attention to the non-technical side of the project
(organization’s internal politics and power-sharing)
• Analytics has no practical value for the organization if it does
not affect business decision-making, regardless of how much
a new trendy technology is used
So, how much of your work is about understanding and
addressing real business problems vs. the technology
deployment, coding and finding insights in the data?
Appendix
“We may regard the present state of the universe as the effect of its past
and the cause of its future (Predictive analytics?!)
An intellect which at a certain moment would know all forces that set
nature in motion, and all positions of all items of which nature is
composed, if this intellect were also vast enough to submit these data to
analysis, it would embrace in a single formula (algorithm?) the
movements of the greatest bodies of the universe and those of the
tiniest atom.
For such an intellect nothing would be uncertain and the future
(predictive analytics?) just like the past would be present before its
eyes.”
- Pierre Simon Laplace, A Philosophical Essay on Probabilities, 1795
Food for Thought:
Can the contemporary Big Data Technology function as that ‘intellect’
capable of analyzing all data and getting a single formula for the future?

More Related Content

What's hot

Three Big Data Case Studies
Three Big Data Case StudiesThree Big Data Case Studies
Three Big Data Case Studies
Atidan Technologies Pvt Ltd (India)
 
kinds of analytics
kinds of analyticskinds of analytics
kinds of analytics
Benila Paul
 
Data science
Data scienceData science
Data science
GitanshuSharma1
 
Business Analytics and Decision Making
Business Analytics and Decision MakingBusiness Analytics and Decision Making
Business Analytics and Decision Making
Excel Strategies LLC
 
Introduction To Predictive Analytics Part I
Introduction To Predictive Analytics   Part IIntroduction To Predictive Analytics   Part I
Introduction To Predictive Analytics Part I
jayroy
 
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Simplilearn
 
Data visualization
Data visualizationData visualization
Data visualization
Jan Willem Tulp
 
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
Utkarsh Sharma
 
1. F A Using S P S S1 (Saq.Sav) Q Ti A
1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A
1. F A Using S P S S1 (Saq.Sav) Q Ti A
Zoha Qureshi
 
Introduction To Analytics
Introduction To AnalyticsIntroduction To Analytics
Introduction To Analytics
Alex Meadows
 
SPSS- Output and Interpretation
SPSS- Output and InterpretationSPSS- Output and Interpretation
SPSS- Output and Interpretation
Harshvardhan Pal
 
Business View of IT Applications.pdf
Business View of IT Applications.pdfBusiness View of IT Applications.pdf
Business View of IT Applications.pdf
EverlastingSong
 
Case study neelkanth drugs pvt
Case study neelkanth  drugs pvtCase study neelkanth  drugs pvt
Case study neelkanth drugs pvt
Mrudula Swamy
 
Data Science Training | Data Science Tutorial | Data Science Certification | ...
Data Science Training | Data Science Tutorial | Data Science Certification | ...Data Science Training | Data Science Tutorial | Data Science Certification | ...
Data Science Training | Data Science Tutorial | Data Science Certification | ...
Edureka!
 
Data Analytics
Data AnalyticsData Analytics
Data Analytics
Srinimf-Slides
 
Loan default prediction with machine language
Loan  default  prediction with  machine  language Loan  default  prediction with  machine  language
Loan default prediction with machine language
Aayush Kumar
 
Customer churn prediction in banking
Customer churn prediction in bankingCustomer churn prediction in banking
Customer churn prediction in banking
BU - PG Master Computing Conference
 
Data quality and data profiling
Data quality and data profilingData quality and data profiling
Data quality and data profiling
Shailja Khurana
 
Data Visualization Tools
Data Visualization ToolsData Visualization Tools
Essential Excel for Business Analysts and Consultants
Essential Excel for Business Analysts and ConsultantsEssential Excel for Business Analysts and Consultants
Essential Excel for Business Analysts and Consultants
Asen Gyczew
 

What's hot (20)

Three Big Data Case Studies
Three Big Data Case StudiesThree Big Data Case Studies
Three Big Data Case Studies
 
kinds of analytics
kinds of analyticskinds of analytics
kinds of analytics
 
Data science
Data scienceData science
Data science
 
Business Analytics and Decision Making
Business Analytics and Decision MakingBusiness Analytics and Decision Making
Business Analytics and Decision Making
 
Introduction To Predictive Analytics Part I
Introduction To Predictive Analytics   Part IIntroduction To Predictive Analytics   Part I
Introduction To Predictive Analytics Part I
 
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
 
Data visualization
Data visualizationData visualization
Data visualization
 
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
 
1. F A Using S P S S1 (Saq.Sav) Q Ti A
1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A
1. F A Using S P S S1 (Saq.Sav) Q Ti A
 
Introduction To Analytics
Introduction To AnalyticsIntroduction To Analytics
Introduction To Analytics
 
SPSS- Output and Interpretation
SPSS- Output and InterpretationSPSS- Output and Interpretation
SPSS- Output and Interpretation
 
Business View of IT Applications.pdf
Business View of IT Applications.pdfBusiness View of IT Applications.pdf
Business View of IT Applications.pdf
 
Case study neelkanth drugs pvt
Case study neelkanth  drugs pvtCase study neelkanth  drugs pvt
Case study neelkanth drugs pvt
 
Data Science Training | Data Science Tutorial | Data Science Certification | ...
Data Science Training | Data Science Tutorial | Data Science Certification | ...Data Science Training | Data Science Tutorial | Data Science Certification | ...
Data Science Training | Data Science Tutorial | Data Science Certification | ...
 
Data Analytics
Data AnalyticsData Analytics
Data Analytics
 
Loan default prediction with machine language
Loan  default  prediction with  machine  language Loan  default  prediction with  machine  language
Loan default prediction with machine language
 
Customer churn prediction in banking
Customer churn prediction in bankingCustomer churn prediction in banking
Customer churn prediction in banking
 
Data quality and data profiling
Data quality and data profilingData quality and data profiling
Data quality and data profiling
 
Data Visualization Tools
Data Visualization ToolsData Visualization Tools
Data Visualization Tools
 
Essential Excel for Business Analysts and Consultants
Essential Excel for Business Analysts and ConsultantsEssential Excel for Business Analysts and Consultants
Essential Excel for Business Analysts and Consultants
 

Data Analytics for Real-World Business Problems

  • 1. DATA ANALYTICS FOR SOLVING BUSINESS PROBLEMS: SHIFTING FOCUS FROM THE TECHNOLOGY DEPLOYMENT TO THE ANALYTICS METHODOLOGY. Alexander Kolker, PhD. March 7, 2017. Alexander Kolker. All rights reserved.
  • 2. "Making better business decisions using data", a co-hosted event with Accelerate Madison and Big Data Madison Meetup. Date: February 13, 2017. Time: 5:30 PM - 7:30 PM CST. Key point: Focusing on business outcomes rather than on data and technology per se is gaining momentum…
  • 3. Some professional highlights… • 4 business consulting projects: US Bank, Boston Consulting Group, Children’s Hospital of Wisconsin, Ohio Hospital Association • 12 years at GE (General Electric) Healthcare: Data Scientist • 3 years at Froedtert Hospital: Process Simulation Leader • 5 years at Children’s Hospital of Wisconsin: Simulation and Data Analytics • UW-Milwaukee Lubar School of Business, Adjunct Faculty: a graduate course, Healthcare Delivery Systems-Data Analytics • Lead Editor and Author of 2 books, 8 book chapters, 10 reviewed papers, and 18 conference presentations in the areas of operations management, process modeling and simulation, and business analytics
  • 4. BIG DATA AND ANALYTICS BACKGROUND
  • 5. A bold statement to start with: Big data without actionable analytics and business decision-making is a ‘sleeping giant’. Big Data is a 2-part deal: 1. Technology for storing and managing large amounts of data of various kinds (the current trend). 2. Methodology for supporting business decision-making using modeling and data. This is called Analytics; it is gaining momentum, and it is the focus of this presentation.
  • 6. Key points: • Analytics must help in developing: new products; operational efficiency; business decision support
  • 7. WHAT WILL BE COVERED NEXT… 1. The concept of simulation analytics for studying systemic complex business problems. Use case 1: Analysis of the optimal staffing of a team of medical providers using simulation methodology (with a live demonstration). 2. Analytics methodology for identifying a few contributing variables to the organization’s financial outcome. Use case 2: Principal components decomposition of a large observational dataset and regression with principal components. 3. Appendix: Food for thought… from Pierre Laplace, 1795
  • 8. SIMULATE! • In general, simulation is the process of studying complex systems using their mathematical representation, called a model or a digital twin, e.g.: • Flight simulators: the aircraft's response to the cockpit control inputs • Nuclear plant operator simulators: the reactor output response to various operator inputs • Surgical and physiological procedure simulators using mannequins • Our focus here is the simulation of business operations
  • 9. Key Point: The most powerful and versatile simulation methodology for analyzing manufacturing, finance, healthcare, military and other business operations is Discrete Event Simulation. (Taken from a LinkedIn post on Data Science Central.)
  • 10. Discrete Event Simulation (DES) Methodology. What is it? • A discrete event simulation (DES) model mimics a system’s dynamic behavior as the system transitions from state to state (compare with the Data Science approach, which maps an input to an output through a black box model or algorithm)
  • 11. The validated model is used in a virtual environment to predict the system’s future responses to random inputs under various scenarios. Key points: • The simulation model is not a ‘black box’. It is a scalable digital twin of reality • The model reflects what is actually happening in the system • This capability gives a sense of the system’s expected output before incurring the cost and risk of implementing the business solution (compare with Data Science validation and cross-validation of a ‘black box’ model for predicting future outcomes…)
  • 12. Use case 1: Analysis of the performance and the optimal staffing in an Endoscopy Unit using Discrete Event Simulation. Presented at the 5th International Conference on Healthcare Systems, October 2008, and the IEEE Workshop on HealthCare Modeling and Simulation, February 18-20, 2010, Venice, Italy
  • 13. Problem Description • The inevitable variability of the admission, recovery and procedure times, due to unforeseen medical complications and delays, results in several unit performance issues: a long patient wait time to schedule procedures; not meeting daily patient demand for procedures; underutilization of the available capacity, together with staff overtime; dissatisfaction of patients and medical staff. There has also been a lower than anticipated revenue stream
  • 14. Business Problem - Project Goal. The objectives of this work were: (i) to analyze the main factors that contribute to the inefficient patient flow and process bottlenecks, and (ii) to propose more efficient patient scheduling and staff allocation aimed at increasing the number of served patients and reducing procedure delays and staff overtime
  • 15. The Endoscopy Unit High Level Process: 1. Patients arrive at the admission area. 2. Patients are seen by the admission nurse. 3. Patients are attended by the procedure nurse. 4. Assigned doctors perform procedures. 5. Patients move to the recovery area, where they are attended by the recovery nurse
  • 18. High Level Model Outline • Admission, procedure and patient recovery durations are random variables • These variables are represented by best-fit statistical distributions built into the simulation model • Each patient is assigned his/her attributes: scheduled arrival time; procedure type; assigned doctor’s name
  • 20. What happens in the Exam Rooms?
      if Proc_Type=col AND Doc_name=Bajaj AND Wk_Day=Fri Then {
        jointlyget (RN_WF and Tech_TF and D_Bajaj) OR (2 RN_WF and D_Bajaj)
        Time (T(30,40,40) min)
        Free all
      }
      else if Proc_Type=egd AND Doc_name=Bajaj AND Wk_Day=Fri Then {
        jointlyget (RN_WF and Tech_TF and D_Bajaj) OR (2 RN_WF and D_Bajaj)
        Time (T(10,20,20) min)
        Free all
      }
      else if Proc_Type=ERCP AND Doc_name=Dua AND Wk_Day=Fri Then {
        jointlyget (RN_WF and Tech_TF and D_Dua) OR (2 RN_WF and D_Dua)
        Time (T(70,80,80) min)
        Free all
      }
    Key Point: Capturing multiple resources with different time distributions for different procedures requires some coding…
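The procedure-time part of the action logic above can be sketched in Python. This is only an illustration, not the author's model: it assumes that `T(a,b,c)` denotes a triangular distribution with minimum `a`, most likely value `b` and maximum `c` (in minutes), and the names `PROCEDURE_TIME_MIN` and `sample_procedure_minutes` are hypothetical. The joint capture of nurses, technicians and doctors is not modeled here.

```python
import random

# Assumption: T(a, b, c) on the slide is read as triangular(min=a, mode=b, max=c), in minutes.
# Keys are (procedure type, doctor name, weekday), copied from the slide's three branches.
PROCEDURE_TIME_MIN = {
    ("col", "Bajaj", "Fri"): (30, 40, 40),
    ("egd", "Bajaj", "Fri"): (10, 20, 20),
    ("ERCP", "Dua", "Fri"): (70, 80, 80),
}


def sample_procedure_minutes(proc_type, doc_name, wk_day, rng=random):
    """Draw one random procedure duration (minutes) for the given patient attributes."""
    low, mode, high = PROCEDURE_TIME_MIN[(proc_type, doc_name, wk_day)]
    # Note the argument order of random.triangular: (low, high, mode).
    return rng.triangular(low, high, mode)


if __name__ == "__main__":
    rng = random.Random(2017)  # fixed seed so repeated runs give the same draws
    for key in PROCEDURE_TIME_MIN:
        print(key, round(sample_procedure_minutes(*key, rng=rng), 1))
```

A lookup table keyed by the patient attributes replaces the chain of `if / else if` branches, so adding a doctor or procedure type means adding one entry rather than one more branch.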
  • 21. Typical Input Data Format. Annual patient volume is ~10,000 patients.
      Columns: Key | Source | Destination | Name | Action Logic | Week | Weekday | Time | Quan
      10 patient Late_Pt_arrival_adjustment_ Proc_Type=col Doc_name=bajaj Wk_day=Mon 1 Mon 7:00 AM 1
      10 patient Late_Pt_arrival_adjustment_ Proc_Type=egd Doc_name=massey Wk_day=Mon 1 Mon 7:00 AM 1
      10 patient Late_Pt_arrival_adjustment_ Proc_Type=col Doc_name=johnson Wk_day=Mon 1 Mon 7:00 AM 1
      10 patient Late_Pt_arrival_adjustment_ Proc_Type=egd Doc_name=massey Wk_day=Mon 1 Mon 7:20 AM 1
      10 patient Late_Pt_arrival_adjustment_ Proc_Type=col Doc_name=bajaj Wk_day=Mon 1 Mon 7:40 AM 1
      10 patient Late_Pt_arrival_adjustment_ Proc_Type=col Doc_name=johnson Wk_day=Mon 1 Mon 7:40 AM 1
      10 patient Late_Pt_arrival_adjustment_ Proc_Type=col Doc_name=massey Wk_day=Mon 1 Mon 7:40 AM 1
      10 patient Late_Pt_arrival_adjustment_ Proc_Type=egd Doc_name=bajaj Wk_day=Mon 1 Mon 8:20 AM 1
  • 22. Typical information (data) usually required to populate a DES model: • Arrival pattern and quantities: periodic, random, scheduled, daily pattern, etc. • The time that entities spend in each activity, i.e. the service time. This is usually not a fixed time but a statistical distribution • Capacity of each activity, i.e. the maximum number of entities that can be processed concurrently in the activity • Routing types that connect structural elements: %, conditional, alternate, create, renege, etc. • Resource assignments: quantity and scheduled shifts
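To make these inputs concrete, here is a minimal discrete event simulation sketch in plain Python. It is not the ProcessModel model from the talk: it assumes a single activity with a fixed capacity, a first-come-first-served queue, scheduled arrivals and a triangular service time, and it leaves out routing, shifts and multiple resource types. The function name `simulate` and all parameter values are hypothetical.

```python
import heapq
import random
from collections import deque

ARRIVAL, DEPARTURE = 0, 1  # event kinds; arrivals sort first when times tie


def simulate(arrival_times, capacity, service_tri, seed=0):
    """Run one activity with `capacity` parallel servers and a FIFO queue.

    arrival_times: scheduled arrival time of each entity (minutes)
    service_tri:   (min, mode, max) of a triangular service time (minutes)
    Returns {entity id: minutes spent waiting before service started}.
    """
    rng = random.Random(seed)
    low, mode, high = service_tri
    # The event list drives the clock: the model jumps from event to event.
    events = [(t, ARRIVAL, i) for i, t in enumerate(arrival_times)]
    heapq.heapify(events)
    waiting = deque()  # (entity id, arrival time) of entities in the queue
    busy = 0           # number of servers currently occupied
    waits = {}
    while events:
        now, kind, entity = heapq.heappop(events)
        if kind == ARRIVAL:
            waiting.append((entity, now))
        else:
            busy -= 1  # a server was released
        # State change: start service for as many queued entities as capacity allows.
        while waiting and busy < capacity:
            queued, arrived = waiting.popleft()
            waits[queued] = now - arrived
            busy += 1
            service = rng.triangular(low, high, mode)
            heapq.heappush(events, (now + service, DEPARTURE, queued))
    return waits


if __name__ == "__main__":
    # Hypothetical schedule: one patient every 20 minutes, two exam rooms.
    schedule = [20 * i for i in range(12)]
    waits = simulate(schedule, capacity=2, service_tri=(30, 40, 40))
    print("average wait, min:", round(sum(waits.values()) / len(waits), 1))
```

Changing `capacity`, the schedule or the service-time parameters and rerunning is the code-level analogue of comparing staffing scenarios.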
  • 23. Live simulation demonstration is included here: call simulation ProcessModel patient arrivals, shifts for nurses, technicians and doctors, stat-fit distribution
  • 24. Some Simulation Scenarios
      Scenario 1: The original model (baseline), used for model validation and testing
      Scenario 2: One additional doctor scheduled part time for 11 hours per week
      Scenario 3: Change in the patient arrival schedule with a 10% reduction in inter-arrival time, with one additional doctor
      Scenario 4: Cross-training of the admission and recovery nurses
      Scenario 5: Adding a part-time nurse
      Scenario 6: Adding a part-time scope-cleaning tech
      Scenario 7: Ladder nurse shifts, change breaks and lunch time
      Scenario 8: Combined Scenarios 2, 3 and 4, and all together
  • 25. Simulation outcome example: Scenario 1 vs. Scenario 2 + Scenario 4 (additional part-time doctor for 11 hours/week + cross-trained nurses)
      [Bar chart: number of patients by day of the week, Scenario I vs. Scenario II]
      Day             Scenario I   Scenario II
      Monday          39           44
      Tuesday         34           40
      Wednesday       29           30
      Thursday        35           40
      Friday          23           23
      Weekly total    160          177
      Increase in the number of patients: 17
      Overtime, hours: Scenario I 28.2; Scenario II 20.9. Reduced doctors’ overtime: 7.3 hrs
  • 26. Financial Cost-Benefit Estimate
      Typical average colonoscopy patient charge is about $2,500 (colonoscopy is a major GI procedure)
      Nurse overtime rate is 1.5 times the regular pay (about $30/hr)
      Typical GI doctor’s annual pay is about $360,000, i.e. ~$360/hr
      Weekly revenue from the additional 17 patients is 17 * $2,500 = $42,500
      Reduced overtime cost for nurses and doctors is 7.3 hrs * ($30 * 1.5 + $360) = $2,956
      Cost of the additional doctor (working 11 hrs): $360 * 11 = $3,960
      Net additional weekly revenue that the additional doctor brings in is about $42,500 + $2,956 - $3,960 = $41,496 per week
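The arithmetic on this slide can be reproduced in a few lines. All inputs are the figures stated on the slide; the variable names are illustrative only. The unrounded result differs from the slide's figure by about fifty cents because the slide rounds the overtime savings to whole dollars.

```python
# Weekly inputs as stated on the slide.
extra_patients = 17            # additional patients per week
charge_per_patient = 2_500     # average colonoscopy charge, $
overtime_hours_saved = 7.3     # reduction in overtime, hours per week
nurse_rate = 30                # regular nurse pay, $/hr
overtime_multiplier = 1.5      # nurse overtime premium
doctor_rate = 360              # doctor pay, $/hr
extra_doctor_hours = 11        # hours per week for the added part-time doctor

added_revenue = extra_patients * charge_per_patient
overtime_savings = overtime_hours_saved * (nurse_rate * overtime_multiplier + doctor_rate)
added_doctor_cost = doctor_rate * extra_doctor_hours
net_weekly_benefit = added_revenue + overtime_savings - added_doctor_cost

print(f"Added revenue:      ${added_revenue:,.0f}")
print(f"Overtime savings:   ${overtime_savings:,.1f}")
print(f"Added doctor cost:  ${added_doctor_cost:,.0f}")
print(f"Net weekly benefit: ${net_weekly_benefit:,.1f}")
```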
  • 27. Concluding Key Points: So how can you tell if simulation is right for you? • It is the methodology of choice for analyzing the dynamic behavior of complex systems/processes with random components • There is a big decision to make, with high potential for failure or reward • It provides a framework for experimenting with the system and testing various business scenarios • It reveals unintended consequences of business solutions • There is a commitment to use the findings and recommendations, even if they are not what you want to hear
  • 28. Use case 2: Analytics methodology for identifying a few contributing variables to the organization’s financial outcome: Principal components decomposition of the large observational dataset and regression with Principal components. References: A. Kolker. Management Engineering for Effective Healthcare Delivery: Principles and Applications, IGI-Global, 2011, Chapter 1. A. Kolker. Healthcare Management Engineering. What Does this Fancy Term Really Mean? Chapter 5. Springer-Briefs in Healthcare Management & Economics, NY, 2012
  • 29. Business Problem - Project Goal • A large local hospital plans a major market share expansion to improve its long-term financial viability • The management wants to know which population demographic factors and disease prevalence data specific to the local area zip codes are the most important contributors to the financial contribution margin (CM $). Note: Contribution margin is defined as the difference between all payments collected from patients and the patient variable costs.
  • 30. Plan of attack • Step 1: The demographics data matrix (38 variables in total) is analyzed for the top 10 ZIPs using principal component decomposition. • Step 2: Regression analysis relates CM $ to the principal components of the original data matrix. • Step 3: By analyzing the eigenvectors of only the statistically significant principal components, conclusions are drawn about which demographic variables are the biggest contributors for the top 10 ZIPs.
  • 31. Description of Data: A set of population demographic data was collected for local area zip codes and the corresponding median contribution margin for each zip code (CM $). The following groups of demographic variables and disease prevalence data were collected for each zip code as percentage of the total zip code population:
  • 32. • 4 Age categories: 18-34, 35-54, 55-64, 65+ • 4 Educational categories: BS/BA degree and higher, Associate/Professional degree, high school diploma, no high school diploma
  • 33. • 4 Income categories: less than $50K, $50K - $75K, $75K - $100K, $100K+ • 5 Occupational categories: Healthcare, Labor, Professional/Administrative, Public Service, Service industry • Gender: male, female • 5 Race categories: African American, Native American, Asian, White, Other
  • 34. • 14 disease categories: BMT, Medical Oncology, Surgical Oncology, Cardiology, Cardiothoracic surgical, Vascular surgical, Digestive, Medicine/Primary care, Musculoskeletal, Neurology, Transplant, Trauma, Unassigned, Women Health • There are 38 data variables in total in the database.
  • 35. Issues with direct use of data for regression: • In large observational data sets with dozens of variables, some of them are inevitably correlated • Correlation means that some information is redundant • This redundant information in the data makes it difficult to attribute the contribution of each variable to the output. This issue is called multicollinearity!
  • 36. Illustration of some pairwise correlations: The correlation coefficient of the variables ‘No high school’ and ‘Annual income less than $50K’ is 0.93, vs. the correlation coefficient of the variables ‘Professional Degree’ and ‘Annual income less than $50K’, which is -0.87
  • 37. Illustration of the regression disaster with all original data (38 variables):
      CM $ = 4130333
        + 41195*(18-24 years) - 39029*(25-34 years) + 11836*(35-44 years) + 2894*(45-54 years)
        + 5507*(55-59 years) + 209919*(60-64 years) - 142258*(65-74 years) + 53373*(75 years+)
        - 2665632*AD - 2662185*BD - 2620383*PhD - 2649374*HS - 2648440*(Less HS) - 2687756*MD - 2717506*ProD - 2665190*(Some Coll) - 2692213*(Some HS)
        - 2398380*(Less $15K) - 2386133*($15K to $25K) - 2493006*($25K to $35K) - 2413833*($35K to $50K) - 2398657*($50K to $75K)
        - 2455023*($75K to $100K) - 2434483*($100K to $150K) - 2404935*($150K to $250K) - 2414342*($250K to $500K) - 2393024*($500K+)
        + 947225*(Health Care) + 954055*Labor + 966787*(Professional/Administrative) + 954355*(Public Service) + 960649*(Service Industry) + …
      Regression diagnostics: R-Sq = 67.1%, R-Sq(adj) = 8.6%. Huge variance inflation factors (VIF):
  • 38. Regression output for all original variables:
      Predictor                      Coef       SE Coef    T       P       VIF
      Constant                       4130333    4378828    0.94    0.358
      18-24 years                    41195      32885      1.25    0.226   13.820
      25-34 years                    -39029     24759      -1.58   0.132   23.274
      35-44 years                    11836      30294      0.39    0.701   9.458
      45-54 years                    2894       44603      0.06    0.949   25.180
      55-59 years                    5507       162937     0.03    0.973   89.682
      60-64 years                    209919     157301     1.33    0.199   65.101
      65-74 years                    -142258    66336      -2.14   0.046   43.529
      75 years+                      53373      36529      1.46    0.161   26.059
      AD                             -2665632   3334182    -0.80   0.434   90827.662
      BD                             -2662185   3342475    -0.80   0.436   2400778.419
      PhD                            -2620383   3375609    -0.78   0.448   20953.952
      HS                             -2649374   3333923    -0.79   0.437   1711185.583
      Less HS                        -2648440   3329576    -0.80   0.437   575442.669
      MD                             -2687756   3321036    -0.81   0.429   389134.963
      ProD                           -2717506   3320805    -0.82   0.424   161574.141
      Some Coll                      -2665190   3325834    -0.80   0.433   256129.161
      Some HS                        -2692213   3334397    -0.81   0.430   1402053.683
      Less $15K                      -2398380   2972893    -0.81   0.430   1398310.925
      $15K to $25K                   -2386133   2983525    -0.80   0.434   429011.942
      $25K to $35K                   -2493006   2994782    -0.83   0.416   281665.965
      $35K to $50K                   -2413833   2973178    -0.81   0.427   253783.866
      $50K to $75K                   -2398657   2980453    -0.80   0.431   371553.358
      $75K to $100K                  -2455023   2994758    -0.82   0.423   541397.221
      $100K to $150K                 -2434483   2980581    -0.82   0.425   953779.541
      $150K to $250K                 -2404935   2982679    -0.81   0.431   330537.600
      $250K to $500K                 -2414342   2994755    -0.81   0.431   71152.055
      $500K+                         -2393024   2989787    -0.80   0.434   36401.343
      Health Care                    947225     1810961    0.52    0.607   32674.125
      Labor                          954055     1801535    0.53    0.603   727911.597
      Professional/Administrative    966787     1801311    0.54    0.598   501480.184
      Public Service                 954355     1807843    0.53    0.604   42387.891
      Service Industry               960649     1803238    0.53    0.601   19069.682
      VIF = 1/(1 - corr^2), where corr is the multiple correlation of the variable with the remaining independent variables
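To make the VIF formula concrete, here is a minimal sketch that computes VIF = 1/(1 - R^2) for each column by regressing it on the remaining columns. The data are synthetic (not the hospital dataset), and the function name `vif` is an illustrative choice.

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2), with R^2 from regressing that column on all others."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic example: column c is almost a copy of column a, column b is unrelated.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)
vifs = vif(np.column_stack([a, b, c]))
print(np.round(vifs, 1))
```

In this sketch the two nearly redundant columns get VIFs in the hundreds while the unrelated column stays close to 1, which is the same pattern, at a smaller scale, as the table above.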
  • 39. Key Points: • Paired correlation analysis for all 38 variables (703 pairs!) is impractical. • Knowing the paired linear correlation coefficients does not help in reducing redundant information or in extracting meaningful information about separate contributing factors. • Regression analysis with dozens of the original variables from observational data sets usually fails.
  • 40. Why Principal components decomposition? • It allows removing the redundant variables that carry little or no information while retaining only a few mutually uncorrelated principal variables.
  • 41. The main idea of PCD: The purpose of PCD is to determine r new variables PC1…PCr, each a linear combination of the p original X variables, that best approximate the variation in those original variables
  • 42. The principle of information conservation • The total amount of information in the original data set is not changed by its PC decomposition • Rather, it is rearranged in the form of a few linear combinations of the original variables as main information holders (PCs) • This significantly reduces the number of independent variables but retains the same amount of information that is contained in the original data matrix
  • 43. What’s the eigenvalue? • The eigenvalue λj is a measure of how much information is retained by the corresponding PC. • A large value of λj (compared to 1) means that a substantial amount of information is retained by the corresponding PC • A small value means that little information is retained by the corresponding PC. Reminder: If the product of a matrix A (here, the correlation matrix of the data) and a vector p can be written as A * p = λj * p, then λj is an eigenvalue and p is the corresponding eigenvector of the matrix A.
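The defining relation A * p = λ * p can be checked numerically. The sketch below uses a small synthetic dataset (not the hospital data) and NumPy's `eigh`, which is intended for symmetric matrices such as a correlation matrix; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(50, 4))
data[:, 1] = data[:, 0] + 0.1 * rng.normal(size=50)   # make two columns nearly redundant

R = np.corrcoef(data, rowvar=False)        # 4 x 4 correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]          # reorder so the largest eigenvalue comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

lam = eigvals[0]       # largest eigenvalue
p = eigvecs[:, 0]      # its eigenvector

print("R @ p   :", np.round(R @ p, 4))
print("lam * p :", np.round(lam * p, 4))
print("sum of eigenvalues:", round(float(eigvals.sum()), 4))
```

The eigenvalues of a correlation matrix sum to the number of variables (4 here), which is why an eigenvalue is judged as large or small "compared to 1".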
  • 44. Eigenvalue analysis of the demographic data correlation matrix
      Eigenvalue   16.44   11.19   4.63    2.73    1.15    0.853   0.63    0.307   0.067
      Proportion   0.433   0.295   0.122   0.072   0.03    0.022   0.017   0.008   0.002
      Cumulative   0.433   0.727   0.849   0.921   0.951   0.974   0.990   0.998   1.000
      Key Point: Only 9 principal components (9 linear combinations of the original variables) are required to account for all 38 original variables.
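The Proportion row appears to be each eigenvalue divided by the number of variables (38), since the eigenvalues of a correlation matrix sum to the number of variables. A short check under that assumption, using the eigenvalues exactly as printed on the slide:

```python
# Eigenvalues as printed on the slide (rounded), for 38 standardized variables.
eigenvalues = [16.44, 11.19, 4.63, 2.73, 1.15, 0.853, 0.63, 0.307, 0.067]
n_variables = 38

proportions = [ev / n_variables for ev in eigenvalues]

cumulative = []
running = 0.0
for share in proportions:
    running += share
    cumulative.append(running)

for ev, share, cum in zip(eigenvalues, proportions, cumulative):
    print(f"{ev:7.3f}  {share:6.3f}  {cum:6.3f}")
```

Because the printed eigenvalues are themselves rounded, the recomputed values can differ from the slide in the third decimal place.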
  • 45. Why Regression with Principal components? • Because PCs are mutually uncorrelated, the variation of the dependent variable (CM $) is accounted for by each PC independently of the other PCs • The contribution of each PC is directly defined by the coefficients of the regression equation. Key Point: Regression with totally uncorrelated PCs is one of the most powerful methodologies for identifying significant contributing variables (factors).
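A minimal sketch of why uncorrelated PCs make the regression coefficients independent of one another. It uses synthetic data, not the hospital dataset, and the helper `ols` is hypothetical: the PC scores come out mutually uncorrelated, and the coefficient of the first PC is the same whether or not other PCs are in the model.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
base = rng.normal(size=(n, 3))
# Six correlated predictors built from three hidden factors (synthetic data).
X = np.column_stack([base, base + 0.1 * rng.normal(size=(n, 3))])
y = 2.0 * base[:, 0] - 1.0 * base[:, 2] + 0.1 * rng.normal(size=n)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)          # standardized variables
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigvals)[::-1]                          # largest eigenvalue first
eigvecs = eigvecs[:, order]
scores = Z @ eigvecs                                       # PC scores, one column per PC

def ols(design, response):
    """Least-squares slopes (intercept fitted, then dropped)."""
    A = np.column_stack([np.ones(len(response)), design])
    coef, *_ = np.linalg.lstsq(A, response, rcond=None)
    return coef[1:]

coef_three_pcs = ols(scores[:, :3], y)    # model with PC1, PC2, PC3
coef_one_pc = ols(scores[:, [0]], y)      # model with PC1 only

print(np.round(np.corrcoef(scores[:, :3], rowvar=False), 6))
print(coef_three_pcs[0], coef_one_pc[0])
```

With the original correlated predictors, by contrast, adding or dropping a variable generally changes the other coefficients.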
  • 46. The Best Subset Regression • Best subsets regression identifies the best-fitting regression models that can be constructed with as few predictor variables as possible • All possible subsets of the predictors are examined, beginning with all models containing one predictor, and then all models containing two predictors, and so on. • The two best models for each number of predictors are displayed
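The procedure described above can be sketched as an exhaustive search. This is a simplified illustration on synthetic data: it ranks subsets by adjusted R-squared only (Mallows' Cp is not computed), and the names `adjusted_r2`, `best_subsets` and the predictor labels are hypothetical.

```python
from itertools import combinations

import numpy as np

def adjusted_r2(X, y):
    """Adjusted R^2 of an ordinary least-squares fit with an intercept."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid / (n - k - 1)) / (centered @ centered / (n - 1))

def best_subsets(X, y, names, keep=2):
    """For each subset size, return the `keep` best subsets by adjusted R^2."""
    results = {}
    for size in range(1, X.shape[1] + 1):
        scored = [
            (adjusted_r2(X[:, list(idx)], y), tuple(names[i] for i in idx))
            for idx in combinations(range(X.shape[1]), size)
        ]
        results[size] = sorted(scored, reverse=True)[:keep]
    return results

# Synthetic example: only the first and third predictors drive the response.
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=30)
table = best_subsets(X, y, ["PC_a", "PC_b", "PC_c", "PC_d"])

for size, rows in table.items():
    for score, subset in rows:
        print(size, round(100 * score, 1), subset)
```

The number of candidate models doubles with every added predictor, so an exhaustive search like this is practical only for a handful of predictors, such as a few principal components.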
  • 47. Best subsets regression with PCs
      Candidate predictors: PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 (X marks a predictor included in the model)
      Variables   R-sq (adj)   Mallows Cp   Included
      3           87.0         128          X X X
      3           64.9         349          X X X
      4           90.1         83.0         X X X X
      4           88.1         99.6         X X X X
      5           92.2         54.4         X X X X X
      5           91.3         60.3         X X X X X
      6           94.5         31.7         X X X X X X
      6           93.4         37.2         X X X X X X
      7           97.4         14.5         X X X X X X X
      7           94.0         26.2         X X X X X X X
      8           99.4         9.0          X X X X X X X X
  • 48. Final regression equation with PC: CM $ = 12.8 + 0.201*PC2 - 0.387*PC3 + 1.95*PC8 (compare to the original regression…) Key Points: • This equation accounts for R-sq(adj) = 99.4% of the response function (CM $) variability. • It contains only statistically significant terms (at the 5% significance level)
  • 49. Conclusion from the regression equation • Eigenvector coefficients for PC2, PC3 and PC8 combined with the PC regression coefficients represent the contribution of each variable to the CM $ output. Note: In general, for non-normalized variables the relative contribution of Xi is called the elasticity coefficient: Ei = (∂Y/Y)/(∂Xi/Xi) = ai*Xi/Y, where ai is the regression coefficient of Xi
  • 50. Eigenvector coefficients for PC2, PC3 and PC8
      Variable                    PC2      PC3      PC8
      Age 18-34                   0.26     0.037    -0.034
      Age 35-54                   -0.084   0.331    0.037
      Age 55-64                   -0.229   -0.173   0.236
      Age 65+                     -0.058   -0.185   0.015
      BS/BA+ degree               -0.269   -0.137   0.049
      Assoc/Prof degree           -0.237   0.081    -0.18
      High school                 0.097    0.332    0.101
      No high school              0.286    -0.084   -0.078
      Income < $50K               0.275    -0.105   0.025
      Income $50K-$75K            -0.059   -0.013   0.256
      Income $75-$100K            -0.27    0.125    -0.183
      Income $100K+               -0.259   0.097    -0.012
      Occupation: Health          -0.21    -0.176   -0.206
      Labor                       0.265    0.116    -0.133
      Professional/Adm            -0.275   -0.059   -0.104
      Public Service              0.029    -0.328   0.463
      Service Industry            -0.125   0.264    0.542
      % male                      0.059    0.210    0.017
      % female                    -0.059   -0.210   -0.017
      Race: African American      0.235    -0.123   0.007
      Asian                       0.157    0.142    -0.337
      Native American             -0.033   -0.339   -0.253
      Other                       0.263    -0.114   0.158
      White                       -0.252   0.128    -0.087
      Disease: Cancer-BMT         0.012    0.108    0.002
      Med Oncology                0.012    0.107    0.01
      Surgical Oncology           0.011    0.108    0.012
      Cardiology                  0.014    0.103    0.012
      Cardiothoracic Surgery      0.014    0.103    0.011
      Vascular surgery            0.018    0.104    -0.001
      Digestive disease           0.014    0.103    0.005
      Medicine/Primary Care       0.015    0.103    0.01
      Musculoskeletal             0.014    0.105    0.012
      Neurology                   0.014    0.104    0.013
      Transplant                  0.016    0.106    0.008
      Trauma                      0.015    0.104    0.006
      Unassigned                  0.014    0.103    0.000
      Women Health                0.015    0.103    -0.002
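One possible reading of "eigenvector coefficients combined with PC coefficients" is a weighted sum: each variable's loading on PC2, PC3 and PC8 multiplied by that PC's regression coefficient from slide 48. That reading is an assumption, not something the slides spell out. The sketch below applies it to a few rows copied from the slide 50 table.

```python
# Regression coefficients from slide 48: CM $ = 12.8 + 0.201*PC2 - 0.387*PC3 + 1.95*PC8
pc_coef = {"PC2": 0.201, "PC3": -0.387, "PC8": 1.95}

# A few rows copied from the slide 50 table: loadings on (PC2, PC3, PC8).
loadings = {
    "Age 55-64":        (-0.229, -0.173, 0.236),
    "Income $50K-$75K": (-0.059, -0.013, 0.256),
    "Public Service":   (0.029, -0.328, 0.463),
    "Service Industry": (-0.125, 0.264, 0.542),
    "Race: Other":      (0.263, -0.114, 0.158),
    "Age 18-34":        (0.26, 0.037, -0.034),
}

def contribution(row):
    """Assumed combination rule: sum over PCs of (regression coefficient * loading)."""
    return sum(b * v for b, v in zip(pc_coef.values(), row))

ranked = sorted(
    ((contribution(row), name) for name, row in loadings.items()),
    reverse=True,
)
for value, name in ranked:
    print(f"{name:18s} {value:7.3f}")
```

Under this reading, the rows with the largest positive values among those included here are Public Service, Service Industry, Income $50K-$75K, Age 55-64 and Race: Other, which are the factors named in the conclusion on slide 51.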
  • 51. Conclusion from the regression with PC. The primary contributing variables (factors) to CM $ are: • Age 55-64 • Annual income $50K - $75K • Occupations: Public Service and Service Industry • Race: Other • Relative contributions of diseases are: neurology, cardiology and musculoskeletal
  • 52. Concluding Remarks and Reflections • As analytics professionals we are rewarded for help in solving business problems • Building analytics that influences business decision-making requires attention to the non-technical side of the project (organization’s internal politics and power-sharing) • Analytics has no practical value for the organization if it does not affect business decision-making, regardless of how much a new trendy technology is used. So, how much of your work is about understanding and addressing real business problems vs. the technology deployment, coding and finding insights in the data?
  • 53. Appendix “We may regard the present state of the universe as the effect of its past and the cause of its future (Predictive analytics?!) An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula (algorithm?) the movements of the greatest bodies of the universe and those of the tiniest atom. For such an intellect nothing would be uncertain and the future (predictive analytics?) just like the past would be present before its eyes.” - Pierre Simon Laplace, A Philosophical Essay on Probabilities, 1795 Food for Thought: Can the contemporary Big Data Technology function as that ‘intellect’ capable of analyzing all data and getting a single formula for the future?