1. What to Optimize?
The Heart of Every Analytics Problem
Predictive Analytics World
May, 2017
John F. Elder, Ph.D.
elder@elderresearch.com
@johnelder4
Charlottesville, VA
Washington, DC
Baltimore, MD
Raleigh, NC
434-973-7673
www.elderresearch.com
2. Outline
• Squared error is convenient for the computer, but not for the client
• Lift (cumulative response) charts are great, but never optimize AUC (area under the curve)
• You may need to design a custom metric
• That may require a global search algorithm
• Brainstorm about the project goal
• And what project to tackle in the first place
20. Bound by Random and Perfect Models
A random model (no predictive power) would be a diagonal line.
A perfect model (right prediction every time) shoots up as fast as possible to 100%. The slope depends on event frequency.
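The two bounds above can be sketched numerically. This is a minimal illustration, not code from the talk; the 20% event rate is a hypothetical value chosen for the example.

```python
# Bounds of a cumulative-response (lift) chart, as described above.
def random_gain(frac_contacted):
    """A random model captures events in proportion to cases contacted (the diagonal)."""
    return frac_contacted

def perfect_gain(frac_contacted, event_rate):
    """A perfect model captures all events after contacting only the event fraction;
    the initial slope is 1/event_rate, so it depends on event frequency."""
    return min(frac_contacted / event_rate, 1.0)

event_rate = 0.20  # hypothetical: 20% of cases are events
for frac in (0.1, 0.2, 0.5, 1.0):
    print(frac, random_gain(frac), perfect_gain(frac, event_rate))
```

Any real model's gains curve must lie between these two bounds.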
23. Truth Table (Confusion Matrix) with 25% Threshold

                    Actual
                    OK       BAD
Predicted  OK       1,352    136
           BAD      237      260
24. Truth table depends on threshold
The same model with a different cutoff threshold results in a different truth table (confusion matrix):

                    Actual
                    OK       BAD
Predicted  OK       1,540    246
           BAD      49       150

                    Actual
                    OK       BAD
Predicted  OK       846      47
           BAD      743      349
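The effect of the threshold can be shown with a small sketch. The scores and labels below are synthetic, not the slide's data, and the convention that "BAD" is predicted when the score meets the threshold is an assumption for illustration.

```python
# How one set of model scores yields different confusion matrices
# at different cutoff thresholds.
def confusion(scores, labels, threshold):
    """Return (TN, FP, FN, TP), predicting 'BAD' when score >= threshold."""
    tn = fp = fn = tp = 0
    for s, y in zip(scores, labels):
        pred_bad = s >= threshold
        if y == "BAD":
            tp += pred_bad
            fn += not pred_bad
        else:
            fp += pred_bad
            tn += not pred_bad
    return tn, fp, fn, tp

scores = [0.1, 0.3, 0.6, 0.8, 0.2]
labels = ["OK", "BAD", "BAD", "BAD", "OK"]
print(confusion(scores, labels, 0.25))  # -> (2, 0, 0, 3): low cutoff flags more BAD
print(confusion(scores, labels, 0.70))  # -> (2, 0, 2, 1): high cutoff misses two
```

Lowering the threshold trades false negatives for false positives; the right trade-off depends on the client's costs, not the computer's convenience.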
26. Social Security Administration: Disability Approval Prediction
Text information in the “Allegation Field” proved most valuable. Example:
“Multiple Myeloma I have been diagnosed with Multiple Myeloma (cancer of the bone marrow) and am currently undergoing treatment to prepare me for an autologous stem cell transplant. There has been a brain tumor associated with this, for which I have had....”
27. Using a Prior: “non-zero initialization”
• Draw from Bayesian statistics and smooth the raw count with an empirical prior
  – Use the baseline probability of the most probable classification
    • For SSA, roughly 33% of applications are approved
  – Counts for each word are initialized with the baseline probability
• Similar to shrinkage, the James-Stein estimator, ridge regression, etc.
• Hypothetical example: Multiple Myeloma
  – Appears 5 times; 4 of those were approved = 80% raw predicted “yes”
  – The prior (given all data) is 33%. If we use an initial mass of 3 (2 “no” + 1 “yes”), then the total “yes” is (4 + 1) / (5 + 3) = 5/8 = 62.5%
• With no data, the result is the prior
• With lots of data, the measurement provides the probability
• In between, a compromise between the measured % and the prior %
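The smoothing above reduces to one line of arithmetic. This is a sketch of the idea, not code from the SSA project; the function name and defaults are illustrative.

```python
# "Non-zero initialization": blend each word's observed approval rate with the
# baseline prior, using a pseudo-count mass (here 3, as in the slide's example).
def smoothed_prob(approved, total, prior=0.33, prior_mass=3):
    """Shrink the raw rate toward the prior; more evidence overrides the prior."""
    return (approved + prior * prior_mass) / (total + prior_mass)

print(smoothed_prob(4, 5))        # Multiple Myeloma example: ~0.62, not the raw 0.80
print(smoothed_prob(0, 0))        # no data -> the prior, 0.33
print(smoothed_prob(800, 1000))   # lots of data -> close to the measured 0.80
```

With prior_mass = 3 and a 33% prior, the Multiple Myeloma example gives (4 + 1) / (5 + 3) ≈ 62.5%, matching the slide.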
28. Combining Weights
• Common aggregations don’t match medical domain requirements
  – SUM: many mild symptoms would add up to a high probability of approval
  – MAX: ignores multiple serious symptoms
  – AVG: minor symptoms water down major symptoms
29. Business Understanding: Desired properties for joining evidence
• Applicants with multiple severe diseases should be more likely to be approved
• A large number of mild ailments should not add up to a high score that gets an applicant approved
• Mild ailments should not detract from severe ones
• Rare diseases should be included, but not with the same confidence as those with more evidence
• Calculation of disease severity must be self-adapting to accommodate rapid changes in the medical field
We designed a joint probability function meeting these constraints.
30. Our approach to combining evidence (SSA)
If (no data), then use the prior.
Else if (max(probability) < 0.5), then use that max.
Else:
  i. Ignore concepts with probability < 0.5
  ii. Combine the remaining ones with a log-likelihood formula and use the resulting joint probability.
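The rule above can be sketched in a few lines. The slide does not give the exact log-likelihood formula, so summing log-odds is used here as one plausible stand-in, labeled as an assumption; the prior of 0.33 comes from the SSA baseline mentioned earlier.

```python
import math

# Sketch of the SSA evidence-combining rule (the log-odds sum is an assumed
# form of the unspecified log-likelihood formula, not the project's actual one).
def combine_evidence(probs, prior=0.33):
    if not probs:                       # no data -> fall back to the prior
        return prior
    strongest = max(probs)
    if strongest < 0.5:                 # only mild evidence -> use the max alone
        return strongest
    severe = [p for p in probs if p >= 0.5]          # ignore mild concepts
    logit = sum(math.log(p / (1 - p)) for p in severe)
    return 1 / (1 + math.exp(-logit))   # joint probability from summed log-odds

print(combine_evidence([]))             # 0.33: the prior
print(combine_evidence([0.2, 0.4]))     # 0.4: max of mild evidence
print(combine_evidence([0.8, 0.7]))     # multiple severe diseases reinforce
print(combine_evidence([0.8, 0.3]))     # a mild ailment does not detract
```

Note how this meets the desired properties: severe concepts reinforce each other, while mild ones neither inflate nor dilute the score.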
31. Higher-Level Optimization Issue: What Is the Goal of the Project?
Aim at the right target.
Example: fraud detection for international phone calls.
Daryl Pregibon and colleagues at Bell (Shannon) Labs: the normal approach would have been to attempt to classify fraud/non-fraud for general calls. Instead, they characterized normal behavior for each account (phone), then flagged outliers. The model had features like the top 5 countries called, durations of calls, times of day, days of week, “faxicity” of the call, etc. All features adapted slowly as behavior changed.
-> A brilliant success.
32. Even Higher-Level Optimization Issue: What Project Should You Choose?
[Chart: ROI vs. cost (disruption, technical effort); example project: phantom inventory]
Cost factors include:
• Time required
• Disruption effect
• Data availability
• Data quality
33. Summary
• Squared error gives undue power to outliers and is symmetric, but is very hard to escape.
• You can always do better than optimizing AUC (but it’s correlated with success, so don’t throw away its results).
• Think about what you’re asking the computer to search for: to solve the hardest problems, you’ll need to design a custom metric.
• Get at least a random global search capability ready.
• Work closely with the client and creative folks to brainstorm project goals and priorities.
• If your work isn’t implemented, you failed.