What to Optimize?
The Heart of Every Analytics Problem	
Predictive Analytics World
May, 2017
John F. Elder, Ph.D.
elder@elderresearch.com
@johnelder4
Charlottesville, VA
Washington, DC
Baltimore, MD
Raleigh, NC
434-973-7673
www.elderresearch.com
Outline
•  Squared error is convenient for the computer
but not for the client
•  Lift (cumulative response) charts are great,
but never optimize AUC (area under the curve)
•  You may need to design a custom metric
•  That may require a global search algorithm
•  Brainstorm about the Project goal
•  And what project to tackle in the first place
4 Series: (X,Y1) (X,Y2) (X,Y3) (X4,Y4)
r_xy = 0.82
y_LS = 3 + 0.5x
MSE = 1.25
R² = 0.67
 X     Y1     Y2     Y3    X4     Y4
10   8.04   9.14   7.46     8   6.58
 8   6.95   8.14   6.77     8   5.76
13   7.58   8.74  12.74     8   7.71
 9   8.81   8.77   7.11     8   8.84
11   8.33   9.26   7.81     8   8.47
14   9.96   8.10   8.84     8   7.04
 6   7.24   6.13   6.08     8   5.25
 4   4.26   3.10   5.39    19  12.50
12  10.84   9.13   8.15     8   5.56
 7   4.82   7.26   6.42     8   7.91
 5   5.68   4.74   5.73     8   6.89
Anscombe’s Quartet (1973, American Statistician)
[Figure: four scatter plots, Y1, Y2 and Y3 against X and Y4 against X4; horizontal axes run from 2 to 20, vertical axes from 2 to 14]
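The shared summary statistics can be checked directly from the table. The sketch below is illustrative only: the function name `summarize` is an assumption, and MSE is taken here as the mean of the squared residuals over all 11 points. Under those assumptions every series comes out near intercept 3, slope 0.5, r of about 0.816 (so R² of about 0.67) and MSE of about 1.25, even though the four plots look nothing alike.

```python
import math

# Anscombe's quartet, as tabulated on the slide.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
Y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
Y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]


def summarize(xs, ys):
    """Least-squares line plus r, R^2 and MSE (mean of squared residuals)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    mse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / n
    return {"intercept": intercept, "slope": slope, "r": r, "r2": r * r, "mse": mse}


SERIES = {"Y1": (X, Y1), "Y2": (X, Y2), "Y3": (X, Y3), "Y4": (X4, Y4)}
STATS = {name: summarize(xs, ys) for name, (xs, ys) in SERIES.items()}

if __name__ == "__main__":
    for name, s in STATS.items():
        print(name, {k: round(v, 2) for k, v in s.items()})
```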
Datasaurus Dozen (David Smith, 5/2/17)
Carl Friedrich Gauss
1)  If your model is linear and your error is squared, then
there is a closed-form solution (regression)
2)  If your model is linear and your error is absolute, then
there is an iterative solution (linear programming)
3)  Otherwise, you are groping in the dark: you need
to perform global search (which has no guarantees)
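The first two cases can be compared on the third Anscombe series, which is ten nearly collinear points plus one outlier. This is a sketch under stated assumptions, not the method from the talk: case 1 uses the textbook closed form, and case 2 swaps the linear-programming solver for a brute-force stand-in that tries every line through two data points. That shortcut works for a small straight-line fit because an optimal least-absolute-error line passes through at least two of the points. The names `fit_squared`, `fit_absolute` and `abs_error` are hypothetical.

```python
from itertools import combinations

# Third Anscombe series from the slides: ten points on a line plus one outlier.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def fit_squared(xs, ys):
    """Case 1: closed-form least squares for a straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope  # (intercept, slope)


def abs_error(params, xs, ys):
    """Sum of absolute residuals for the line intercept + slope * x."""
    intercept, slope = params
    return sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))


def fit_absolute(xs, ys):
    """Case 2 stand-in: try every line through two data points, keep the best."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line is not a valid model
        slope = (y2 - y1) / (x2 - x1)
        candidate = (y1 - slope * x1, slope)
        if best is None or abs_error(candidate, xs, ys) < abs_error(best, xs, ys):
            best = candidate
    return best


ols = fit_squared(X, Y3)   # pulled toward the outlier
lad = fit_absolute(X, Y3)  # follows the ten collinear points

if __name__ == "__main__":
    print("squared error  (intercept, slope):", ols)
    print("absolute error (intercept, slope):", lad)
```

The squared-error line is dragged toward the single outlier, while the absolute-error line stays with the other ten points, which is the practical difference between the two metrics.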
Simulated Annealing search path
Nelder-Mead (Amoeba) search path
Global R^d Optimization when Probes are
Expensive (GROPE)
•  Class of problems where the goal is to get to the answer
with the fewest probes (function evaluations)
•  Best algorithms are
–  SDO (Sequential Design for Optimization) by Cox & John
(1992, 1997)
–  GROPE-Canopy by Elder (1992, 1993)
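Neither SDO nor GROPE-Canopy is reproduced here. As a baseline for what "counting probes" means, the following is a minimal sketch of pure random search with a fixed probe budget. The objective `rastrigin_1d` is a hypothetical test function chosen because it has many local minima and a known global minimum of 0 at x = 0; the function names, bounds, and budget are all assumptions for illustration.

```python
import math
import random


def rastrigin_1d(x):
    """Hypothetical test objective: many local minima, global minimum 0 at x = 0."""
    return x * x - 10.0 * math.cos(2.0 * math.pi * x) + 10.0


def random_search(objective, low, high, budget, seed=0):
    """Spend `budget` probes uniformly at random and keep the best one seen."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(budget):
        x = rng.uniform(low, high)
        f = objective(x)  # one probe (function evaluation)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f


best_x, best_f = random_search(rastrigin_1d, -5.12, 5.12, budget=5000)

if __name__ == "__main__":
    print("best x:", best_x, "objective:", best_f)
```

Random search offers no guarantees and wastes most of its probes, which is the motivation for methods designed to reach the answer in far fewer evaluations when each probe is expensive.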
Stock Market Prediction Thought Experiment
•  Say your model predicted a 10% price rise, from
$10 to $11 over the next quarter.
•  But the price later actually rises to $14.
•  How do you feel about it?
•  How does the model (under squared error) “feel”
about it?
•  14-11=3; 3*3=9. Had it instead lost 10% to $9,
the error of 2 would’ve led to a squared error of
less than half as much (4).
•  So the model would have been “twice as happy”
if you’d lost 10% instead of won 40%.
•  Something is wrong with that metric!
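The arithmetic in the thought experiment can be written out next to a metric the client would recognize. The sketch below is illustrative: `trade_profit` is a hypothetical custom metric (profit per share from acting on the forecast), not one taken from the talk.

```python
def squared_error(predicted, actual):
    """What the computer is asked to minimize."""
    return (actual - predicted) ** 2


def trade_profit(entry_price, predicted, actual):
    """Hypothetical client metric: profit per share from acting on the forecast."""
    if predicted > entry_price:   # forecast says "up": buy one share
        return actual - entry_price
    if predicted < entry_price:   # forecast says "down": short one share
        return entry_price - actual
    return 0.0                    # no predicted move, no trade


ENTRY, PREDICTED = 10.0, 11.0

if __name__ == "__main__":
    for actual in (14.0, 9.0):
        print(
            "actual:", actual,
            "squared error:", squared_error(PREDICTED, actual),
            "profit:", trade_profit(ENTRY, PREDICTED, actual),
        )
```

Squared error ranks the $9 outcome as the better one (4 against 9), while the profit metric ranks the $14 outcome as better (a gain of 4 against a loss of 1).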
Trading System Example
Gas Production Saved
Using Lift Charts
[Figure: lift (cumulative response) chart; both axes run from 0% to 100%; horizontal axis: Prospects Ordered by Response Probability]
1a.  Set investigation limit
1b.  Note expected response
2a.  Or, set desired response
2b.  And note work requirements
[293-295]
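Both ways of reading the chart can be sketched in a few lines. The scores and outcomes below are hypothetical, and the function names are assumptions; limits and work requirements are expressed as a number of cases rather than a percentage to keep the example simple.

```python
def gains_curve(scores, responded):
    """Cumulative share of responders captured after each prospect, best score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total = sum(responded)
    captured, curve = 0, []
    for i in order:
        captured += responded[i]
        curve.append(captured / total)
    return curve


def response_at_limit(curve, max_cases):
    """1a/1b: fix how many prospects can be worked, read off the expected response."""
    max_cases = min(max_cases, len(curve))
    return curve[max_cases - 1] if max_cases > 0 else 0.0


def work_for_response(curve, desired_response):
    """2a/2b: fix the desired response, read off how many prospects are needed."""
    for n_cases, captured in enumerate(curve, start=1):
        if captured >= desired_response:
            return n_cases
    return len(curve)


# Hypothetical scores and outcomes for ten prospects (1 = responded).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
responded = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
curve = gains_curve(scores, responded)

if __name__ == "__main__":
    print("response when working 3 prospects:", response_at_limit(curve, 3))
    print("prospects needed for 75% response:", work_for_response(curve, 0.75))
```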
Bound by Random and Perfect Models
A random model (no predictive power) would
be a diagonal line.

A perfect model (right prediction every time)
shoots up as fast as possible to 100%. The
slope depends on event frequency.
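The two bounds can be stated as formulas. In this small sketch, `depth` is the fraction of the ordered list examined and `event_rate` is the fraction of cases that are events; both names are assumptions for illustration.

```python
def random_curve(depth):
    """No predictive power: examining a share of the list captures the same share of events."""
    return depth


def perfect_curve(depth, event_rate):
    """Every event ranked first: slope 1/event_rate until 100% is reached."""
    return min(1.0, depth / event_rate)


if __name__ == "__main__":
    # With a hypothetical 25% event rate, the perfect model tops out at 25% depth.
    for depth in (0.10, 0.25, 0.50):
        print(depth, random_curve(depth), perfect_curve(depth, event_rate=0.25))
```

The rarer the event, the steeper the perfect curve, so the room between the two bounds depends on the problem and not only on the model.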
Never Use AUC (Area Under the Curve)
•  The area between the lift curve and the random
line (or the baseline) is often maximized.
•  This is never the best thing to do
•  Instead, figure out how deep into the list you
want to, or can, go.
•  You are either constrained by resources (#cases
you can investigate, for instance), or there is a
problem-dependent cost tradeoff between false
alarms and false dismissals (false positives and
negatives)
Truth Table (confusion matrix)
with 25% Threshold

                Actual OK   Actual BAD
Predicted OK        1,352          136
Predicted BAD         237          260
Truth table depends on threshold
Same model, different cutoff threshold
results in different truth table (confusion matrix)

                Actual OK   Actual BAD
Predicted OK        1,540          246
Predicted BAD          49          150

                Actual OK   Actual BAD
Predicted OK          846           47
Predicted BAD         743          349
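The three truth tables can be turned into a decision once costs are attached to each kind of error. The counts below are taken from the slides; the cost values are hypothetical, and since only the 25% cutoff is stated, the other two tables are labeled by how many cases they flag as BAD.

```python
# Counts from the three truth tables, keyed by (predicted, actual).
TABLES = {
    "25% threshold": {("OK", "OK"): 1352, ("OK", "BAD"): 136,
                      ("BAD", "OK"): 237, ("BAD", "BAD"): 260},
    "fewer flagged": {("OK", "OK"): 1540, ("OK", "BAD"): 246,
                      ("BAD", "OK"): 49, ("BAD", "BAD"): 150},
    "more flagged": {("OK", "OK"): 846, ("OK", "BAD"): 47,
                     ("BAD", "OK"): 743, ("BAD", "BAD"): 349},
}


def total_cost(table, false_alarm_cost, false_dismissal_cost):
    """Cost of the errors in one table under hypothetical per-error costs."""
    false_alarms = table[("BAD", "OK")]      # predicted BAD, actually OK
    false_dismissals = table[("OK", "BAD")]  # predicted OK, actually BAD
    return false_alarms * false_alarm_cost + false_dismissals * false_dismissal_cost


def cheapest_cutoff(false_alarm_cost, false_dismissal_cost):
    """Name of the table with the lowest total error cost."""
    return min(
        TABLES,
        key=lambda name: total_cost(TABLES[name], false_alarm_cost, false_dismissal_cost),
    )


if __name__ == "__main__":
    for costs in ((1, 5), (1, 10), (5, 1)):
        print("costs (false alarm, false dismissal):", costs, "->", cheapest_cutoff(*costs))
```

Each of the three cost ratios selects a different table, which is why the cutoff should follow from the problem's cost tradeoff and not from a threshold-free summary of the model.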
[Figure: cumulative gains chart, HMEQ "Bads" Regression Model; horizontal axis: Percentile (0 to 100); vertical axis: Cumulative % Captured Response (0 to 100); curves: Baseline, Model, Best]
[Figure legend: Gain, Cost, Predicted Return, Predicted Profit]
“Multiple Myeloma I have been diagnosed with
Multiple Myeloma (cancer of the bone marrow) and
am currently undergoing treatment to prepare me for
an autologous stem cell transplant. There has been a
brain tumor associated with this, for which I have
had....”
Social Security Administration
Disability Approval Prediction
Text information in “Allegation Field” proved most valuable
•  Draw from Bayesian statistics and smooth the raw count with an
empirical prior
–  Use baseline probability of the most probable classification
•  For SSA, roughly 33% of applications approved
–  Counts for each word are initialized with the baseline probability
•  Similar to Shrinkage, James-Stein Estimator, Ridge Regression, etc.
•  Hypothetical Example: Multiple Myeloma
–  Appears 5 times, approved 4 times = 80% predicted “yes”
–  Prior (given all data) is 33%. If we use an “initial mass” of 3 (2 “no” +
1 “yes”), then the total “yes” is 5/8 = 62.5%
•  With no data, results in prior
•  With lots of data, measurement provides probability
•  In between, compromises between measured and prior %
Using a Prior: “non-zero initialization”
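The non-zero initialization amounts to adding pseudo-counts before taking the ratio. A minimal sketch, using the slide's initial mass of 1 “yes” and 2 “no” as defaults; the function name and argument names are assumptions.

```python
def smoothed_probability(yes, no, prior_yes=1, prior_no=2):
    """Approval rate after seeding the counts with pseudo-observations at the prior."""
    return (yes + prior_yes) / (yes + no + prior_yes + prior_no)


if __name__ == "__main__":
    print("no data:        ", smoothed_probability(0, 0))      # falls back to the prior
    print("4 of 5 approved:", smoothed_probability(4, 1))      # the Multiple Myeloma example
    print("400 of 500:     ", smoothed_probability(400, 100))  # data dominates the prior
```

With no observations the result is the prior of one third, with 4 approvals in 5 it is 5/8, and with hundreds of observations it approaches the measured 80%.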
•  Common aggregations don’t match medical
domain requirements
– SUM: many symptoms increase the probability of
predicting approval
– MAX: ignores multiple serious symptoms
– AVG: minor symptoms water down major
symptoms
Combining Weights
Business Understanding:
Desired properties for joining evidence
•  Applicants with multiple severe diseases should be more
likely to be approved
•  A large number of mild ailments should not add up to a
high score that gets an applicant approved
•  Mild ailments should not detract from severe ones
•  Rare diseases should be included, but not with the same
confidence as those with more evidence
•  Calculation of disease severity must be self-adapting to
accommodate rapid changes in the medical field
We designed a joint probability function meeting these constraints
If (no data), then use prior
Else If (max(probability) < 0.5) then use that max.
Else:
i.  Ignore concepts with probability < 0.5
ii.  Combine the remaining ones with a log-likelihood
formula and use the resulting joint probability.
Our approach to combine evidence (SSA)
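The three-branch rule can be sketched as follows. The slides do not give the log-likelihood formula, so the combination step here is an assumption: it sums each strong concept's log-odds shift away from the prior, in the style of naive Bayes. The function names and the clamping constant are also hypothetical.

```python
import math

PRIOR = 1 / 3  # roughly 33% of applications approved, per the slides


def logit(p):
    """Log-odds of p, clamped away from 0 and 1 to avoid division by zero."""
    p = min(max(p, 1e-9), 1 - 1e-9)
    return math.log(p / (1 - p))


def combine(probabilities, prior=PRIOR):
    """Join per-concept approval probabilities following the three-branch rule."""
    if not probabilities:
        return prior                      # no data: use the prior
    strongest = max(probabilities)
    if strongest < 0.5:
        return strongest                  # only mild evidence: use the max
    strong = [p for p in probabilities if p >= 0.5]  # ignore concepts below 0.5
    # ASSUMPTION: the source does not specify the log-likelihood formula.
    # This version adds each concept's log-odds shift relative to the prior.
    log_odds = logit(prior) + sum(logit(p) - logit(prior) for p in strong)
    return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    print("no concepts:       ", combine([]))
    print("mild only:         ", combine([0.2, 0.4]))
    print("one severe + mild: ", combine([0.8, 0.1, 0.3]))
    print("two severe:        ", combine([0.8, 0.8]))
```

Under this assumed formula, mild concepts neither add up nor detract from a severe one, and two severe concepts score higher than one.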
Higher Level Optimization Issue:
What is the Goal of the Project?
Aim at the right target
Example: Fraud Detection for international phone calls 
Daryl Pregibon and colleagues at Bell (Shannon) Labs: 
The normal approach would have been to attempt to
classify fraud/nonfraud for general calls
Instead they characterized normal behavior for each
account (phone), then flagged outliers.
Model had features like top 5 countries called, durations
of calls, times of day, days of week, “faxicity” of call, etc. 
All features slowly adapted if changes occurred.

-> A brilliant success.
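The idea of profiling each account and flagging departures from its own normal can be sketched with a single feature. This is not the Bell Labs system: it is a minimal illustration that tracks one hypothetical feature (call duration in minutes) with an exponentially weighted mean and variance, so the profile adapts slowly, and flags values far outside the account's own history. The class name, smoothing factor, threshold, and warm-up length are all assumptions.

```python
import math


class AccountProfile:
    """Slowly adapting profile of one feature for one account (hypothetical sketch)."""

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # how quickly the profile adapts
        self.threshold = threshold  # flag beyond this many standard deviations
        self.warmup = warmup        # observations needed before flagging
        self.n = 0
        self.mean = 0.0
        self.var = 0.0

    def observe(self, x):
        """Return True if x is an outlier for this account, then update the profile."""
        flagged = False
        if self.n >= self.warmup:
            std = math.sqrt(self.var)
            flagged = std > 0 and abs(x - self.mean) > self.threshold * std
        if self.n == 0:
            self.mean = x
        else:
            delta = x - self.mean
            self.mean += self.alpha * delta
            # Exponentially weighted variance update.
            self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        self.n += 1
        return flagged


# Hypothetical call durations (minutes) for one account, ending in an unusual call.
durations = [5, 6, 4, 5, 6, 5, 4, 6, 5, 5, 60]
profile = AccountProfile()
flags = [profile.observe(d) for d in durations]

if __name__ == "__main__":
    print(flags)
```

Because the question asked of each call is "is this unusual for this account?" and not "is this fraud in general?", no labeled fraud examples are needed to start flagging.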
Even Higher-Level Optimization Issue:
What Project Should you Choose?
ROI
Cost
(Disruption, Technical Effort)
Cost factors include:
•  Time required
•  Disruption effect
•  Data availability
•  Data quality

Phantom inventory
Summary
•  Squared error gives undue power to outliers and is
symmetric, but is very hard to escape.
•  You can always do better than to optimize AUC (but it’s
correlated with success, so don’t throw away its results).
•  Think about what you’re asking the computer to search
for: to solve the hardest problems, you’ll need to design
a custom metric.
•  Get at least a random global search capability ready.
•  Work closely with the client and creative folk to
brainstorm project goals and priorities.
•  If your work isn’t implemented, you failed.
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computation
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 

920 plenary elder

  • 1. What to Optimize? The Heart of Every Analytics Problem Predictive Analytics World May, 2017 John F. Elder, Ph.D. elder@elderresearch.com @johnelder4 Charlottesville, VA Washington, DC Baltimore, MD Raleigh, NC 434-973-7673 www.elderresearch.com
  • 2. Outline •  Squared error is convenient for the computer but not for the client •  Lift (cumulative response) charts are great, but never optimize AUC (area under the curve) •  You may need to design a custom metric •  That may require a global search algorithm •  Brainstorm about the project goal •  And what project to tackle in the first place
  • 3. 4 Series: (X,Y1) (X,Y2) (X,Y3) (X4,Y4). All four share rxy = 0.82, yLS = 3 + 0.5x, MSE = 1.25, R2 = 0.67
       X    Y1      Y2     Y3      X4   Y4
       10   8.04    9.14   7.46    8    6.58
       8    6.95    8.14   6.77    8    5.76
       13   7.58    8.74   12.74   8    7.71
       9    8.81    8.77   7.11    8    8.84
       11   8.33    9.26   7.81    8    8.47
       14   9.96    8.10   8.84    8    7.04
       6    7.24    6.13   6.08    8    5.25
       4    4.26    3.10   5.39    19   12.50
       12   10.84   9.13   8.15    8    5.56
       7    4.82    7.26   6.42    8    7.91
       5    5.68    4.74   5.73    8    6.89
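The shared summary statistics can be checked directly from the table. The following is a minimal sketch in plain Python; the helper name `summarize` is an illustrative assumption, not something from the slides. It fits an ordinary least-squares line to each of the four series and reports slope, intercept, and correlation.

```python
from math import sqrt

# Data as listed on the slide (Anscombe's quartet).
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
Y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]


def summarize(xs, ys):
    """Return (slope, intercept, correlation) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


series = {"Y1": (X, Y1), "Y2": (X, Y2), "Y3": (X, Y3), "Y4": (X4, Y4)}
for name, (xs, ys) in series.items():
    slope, intercept, r = summarize(xs, ys)
    print(f"{name}: y = {intercept:.2f} + {slope:.2f}x, r = {r:.2f}")
```

The four fits come out nearly identical even though the scatter plots look nothing alike, which is the point of the example: a single squared-error summary hides the shape of the data.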
  • 4. Anscombe’s Quartet (1973, American Statistician) [Figure: four scatter plots, Y1 vs. X, Y2 vs. X, Y3 vs. X, and Y4 vs. X4; horizontal axes run from 2 to 20, vertical axes from 2 to 14]
  • 10.
  • 11.
  • 15. Stock Market Prediction Thought Experiment •  Say your model predicted a 10% price rise, from $10 to $11, over the next quarter. •  But the price actually rises to $14. •  How do you feel about it? •  How does the model (under squared error) “feel” about it? •  The error is 14 - 11 = 3, so the squared error is 3 * 3 = 9. Had the price instead fallen 10% to $9, the error of 2 would have led to a squared error of less than half as much (4). •  So the model would have been “twice as happy” if you had lost 10% instead of gaining 40%. •  Something is wrong with that metric!
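The arithmetic of the thought experiment can be reproduced in a few lines. This is a minimal sketch: `squared_error` follows the slide, while `trading_loss` is a hypothetical alternative metric, introduced here only for contrast, that charges the model for money actually lost when you trade in the predicted direction.

```python
def squared_error(predicted, actual):
    """Squared difference between the predicted and the realized price."""
    return (actual - predicted) ** 2


def trading_loss(start, predicted, actual):
    """Hypothetical metric: money lost per share when trading on the forecast.

    Assumes you buy if a rise is predicted and sell short if a fall is
    predicted. Gains count as zero loss.
    """
    if predicted > start:
        position = 1
    elif predicted < start:
        position = -1
    else:
        position = 0
    profit = position * (actual - start)
    return max(0, -profit)


start, predicted = 10, 11
for actual in (14, 9):
    print(
        f"actual=${actual}: squared error={squared_error(predicted, actual)}, "
        f"trading loss={trading_loss(start, predicted, actual)}"
    )
```

Under squared error the $14 outcome scores worse (9) than the $9 outcome (4). Under the assumed trading loss the ordering flips: the rise costs nothing and the fall costs $1 per share, which matches how the investor would feel.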
  • 17.
  • 19. Using Lift Charts: 1a. Set investigation limit 1b. Note expected response 2a. Or, set desired response 2b. And note work requirements [Figure: lift chart; horizontal axis: Prospects Ordered by Response Probability, 0% to 100%; vertical axis: 0% to 100%] [293-295]
  • 20. Bound by Random and Perfect Models: A random model (no predictive power) would be a diagonal line. A perfect model (right prediction every time) shoots up as fast as possible to 100%. The slope depends on event frequency.
  • 21. Never Use AUC (Area Under the Curve) •  The area between the lift curve and the random line (or the baseline) is often maximized. •  This is never the best thing to do. •  Instead, figure out how deep into the list you want to, or can, go. •  You are either constrained by resources (the number of cases you can investigate, for instance), or there is a problem-dependent cost tradeoff between false alarms and false dismissals (false positives and false negatives).
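Scoring a model at one chosen depth, rather than by area, can be sketched as follows. The function name `captured_at_depth` and the toy scores and labels are illustrative assumptions, not data from the talk.

```python
def captured_at_depth(scores, labels, depth):
    """Fraction of all positive cases found in the top `depth` share of the list.

    scores: model scores, higher means more likely to respond
    labels: 1 for an actual response, 0 otherwise
    depth:  fraction of the ranked list you can afford to work (0 to 1)
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = int(round(depth * len(ranked)))
    hits = sum(label for _, label in ranked[:n_top])
    total = sum(labels)
    return hits / total if total else 0.0


# Toy example (made-up numbers): ten prospects, four actual responders.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

for depth in (0.2, 0.4, 1.0):
    print(f"depth {depth:.0%}: captured {captured_at_depth(scores, labels, depth):.0%}")
```

If resources allow only the top 20% of cases to be investigated, the number to compare across models is the capture rate at that depth; what the curve does beyond it does not affect the decision.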
  • 22.
  • 23. Truth Table (confusion matrix) with 25% Threshold
                       Actual OK   Actual BAD
       Predicted OK    1,352       136
       Predicted BAD   237         260
  • 24. Truth table depends on threshold: Same model, different cutoff threshold results in different truth table (confusion matrix)
                       Actual OK   Actual BAD
       Predicted OK    1,540       246
       Predicted BAD   49          150

                       Actual OK   Actual BAD
       Predicted OK    846         47
       Predicted BAD   743         349
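The cost tradeoff between false alarms and false dismissals can be made concrete with the three truth tables from slides 23 and 24. In this minimal sketch the counts come from the slides, but the per-error costs, the labels "cutoff A" and "cutoff B" (the slides do not state those two thresholds), and the reading of BAD as the event being detected are all assumptions.

```python
# Error counts taken from the truth tables on slides 23 and 24.
# false_alarm     = predicted BAD, actually OK
# false_dismissal = predicted OK,  actually BAD
tables = {
    "25% threshold": {"false_alarm": 237, "false_dismissal": 136},
    "cutoff A": {"false_alarm": 49, "false_dismissal": 246},
    "cutoff B": {"false_alarm": 743, "false_dismissal": 47},
}


def total_cost(table, alarm_cost, dismissal_cost):
    """Total cost of the errors in one truth table under assumed unit costs."""
    return (table["false_alarm"] * alarm_cost
            + table["false_dismissal"] * dismissal_cost)


def best_cutoff(alarm_cost, dismissal_cost):
    """Name of the truth table with the lowest total error cost."""
    return min(tables, key=lambda name: total_cost(tables[name], alarm_cost, dismissal_cost))


# Hypothetical cost ratios: a missed BAD costs 5x, then 10x, a false alarm.
for dismissal_cost in (5, 10):
    print(f"dismissal cost {dismissal_cost}: best is {best_cutoff(1, dismissal_cost)}")
```

With a 5-to-1 cost ratio the 25% threshold is cheapest; at 10-to-1 the more aggressive cutoff B wins. The model is the same in every case, so the preferred operating point depends entirely on the assumed costs.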
  • 25. [Figure: HMEQ "Bads"; Cumulative % Captured Response (0 to 100) against Percentile (0 to 100). Series: Regression Model, Baseline Model, Best Gain, Cost, Predicted Return, Predicted Profit]
  • 26. Social Security Administration Disability Approval Prediction: Text information in the “Allegation Field” proved most valuable. Example: “Multiple Myeloma I have been diagnosed with Multiple Myeloma (cancer of the bone marrow) and am currently undergoing treatment to prepare me for an autologous stem cell transplant. There has been a brain tumor associated with this, for which I have had....”
  • 27. Using a Prior: “non-zero initialization” •  Draw from Bayesian statistics and smooth the raw count with an empirical prior –  Use baseline probability of the most probable classification •  For SSA, roughly 33% of applications approved –  Counts for each word are initialized with the baseline probability •  Similar to Shrinkage, James-Stein Estimator, Ridge Regression, etc. •  Hypothetical Example: Multiple Myeloma –  Appears 5 times, 4 times was approved = 80% predicted “yes” –  Prior (given all data) is 33%. If we use an “initial mass” of 3 (2 “no” + 1 “yes”), then the total “yes” is 5/8 = 62.5% •  With no data, results in prior •  With lots of data, measurement provides probability •  In between, compromises between measured and prior %
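The "non-zero initialization" arithmetic can be sketched in a few lines. The function name `smoothed_yes_rate` is an illustrative assumption; the default initial mass of 1 "yes" and 2 "no" is the one used in the Multiple Myeloma example.

```python
def smoothed_yes_rate(yes_count, total_count, prior_yes=1.0, prior_no=2.0):
    """Approval rate for a term after seeding its counts with a prior.

    prior_yes and prior_no are the pseudo-counts added before any data is
    seen; 1 and 2 give the roughly 33% baseline from the slide.
    """
    return (yes_count + prior_yes) / (total_count + prior_yes + prior_no)


print(smoothed_yes_rate(4, 5))       # slide example: (4 + 1) / (5 + 3)
print(smoothed_yes_rate(0, 0))       # no data: falls back to the prior
print(smoothed_yes_rate(800, 1000))  # lots of data: close to the measured 80%
```

The three calls correspond to the three regimes listed above: with no data the estimate equals the prior, with many observations it approaches the measured rate, and in between it is a compromise.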
  • 28. Combining Weights •  Common aggregations don’t match medical domain requirements – SUM: many symptoms increase the probability of predicting approval – MAX: ignores multiple serious symptoms – AVG: minor symptoms water down major symptoms
  • 29. Business Understanding: Desired properties for joining evidence •  Applicants with multiple severe diseases should be more likely to be approved •  A large number of mild ailments should not add up to a high score that gets an applicant approved •  Mild ailments should not detract from severe ones •  Rare diseases should be included, but not with the same confidence as those with more evidence •  Calculation of disease severity must be self-adapting to accommodate rapid changes in the medical field. We designed a joint probability function meeting these constraints.
  • 30. Our approach to combine evidence (SSA): If (no data), then use the prior. Else if (max(probability) < 0.5), then use that max. Else: i.  Ignore concepts with probability < 0.5 ii.  Combine the remaining ones with a log-likelihood formula and use the resulting joint probability.
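The decision rule above can be sketched as follows. The slide does not give the log-likelihood formula, so this sketch substitutes an assumption: it sums the log-odds of the retained concepts, as if they were independent pieces of evidence, and converts the sum back to a probability. The function name `combine_evidence`, the clipping constant, and the default prior of 0.33 (the SSA baseline quoted on slide 27) are likewise illustrative choices.

```python
import math


def combine_evidence(probabilities, prior=0.33, eps=1e-6):
    """Combine per-concept approval probabilities into one score.

    ASSUMPTION: step ii uses a sum of log-odds; the source only says
    "a log-likelihood formula" and does not specify it.
    """
    if not probabilities:          # no data: use the prior
        return prior
    top = max(probabilities)
    if top < 0.5:                  # only mild evidence: use the strongest
        return top
    log_odds = 0.0
    for p in probabilities:
        if p < 0.5:                # i. ignore mild concepts
            continue
        p = min(max(p, eps), 1 - eps)   # avoid division by zero at p = 1
        log_odds += math.log(p / (1 - p))
    return 1 / (1 + math.exp(-log_odds))  # ii. back to a probability


print(combine_evidence([]))               # prior
print(combine_evidence([0.2, 0.4, 0.3]))  # many mild ailments: stays at 0.4
print(combine_evidence([0.8, 0.3]))       # mild ailment does not detract
print(combine_evidence([0.8, 0.8]))       # two severe: higher than either
```

Under this assumed formula the rule satisfies the desired properties from slide 29: mild ailments never accumulate into a high score, they never dilute a severe one, and multiple severe conditions reinforce each other.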
  • 31. Higher-Level Optimization Issue: What is the Goal of the Project? Aim at the right target. Example: Fraud Detection for international phone calls. Daryl Pregibon and colleagues at Bell (Shannon) Labs: The normal approach would have been to attempt to classify fraud/nonfraud for general calls. Instead they characterized normal behavior for each account (phone), then flagged outliers. The model had features like top 5 countries called, durations of calls, times of day, days of week, “faxicity” of call, etc. All features slowly adapted if changes occurred. Result: a brilliant success.
  • 32. Even Higher-Level Optimization Issue: What Project Should You Choose? [Figure: ROI against Cost (Disruption, Technical Effort); label shown: Phantom inventory] Cost factors include: •  Time required •  Disruption effect •  Data availability •  Data quality
  • 33. Summary •  Squared error gives undue power to outliers and is symmetric, but is very hard to escape. •  You can always do better than to optimize AUC (but it’s correlated with success, so don’t throw away its results). •  Think about what you’re asking the computer to search for: to solve the hardest problems, you’ll need to design a custom metric. •  Get at least a random global search capability ready. •  Work closely with the client and creative folk to brainstorm project goals and priorities. •  If your work isn’t implemented, you failed.