General overview of analytical models. The presentation covers the following topics:
• What is an analytical model?
• What are the business requirements for the model?
• How to fulfill those requirements?
2. Presentation Scope
➢ What is a model
➢ What are the requirements for the model
➢ How to fulfill those requirements
3. A. Model to increase company profit
Inputs:
• Customer application data
• Customer behavior
• Operational activity
• External 3rd-party services
Outputs:
• Reject potentially risky clients
• Propose bonuses to improve customer experience
• Recommend a product / service
• Conduct retention campaigns
4. B. Model has specific requirements
Requirements:
1. Model relevancy
2. Maximum impact
3. Consistent inference
4. Implementable solution
5. Minimum time to market
[Chart: profit/loss KPI vs. model score, marking the cut-off point]
5. C. How to fulfill those requirements
1. Model relevancy
2. Maximum impact
3. Consistent inference
4. Implementable solution
5. Minimum time to market
6. 1. Score correlates with KPI
The KPI depends on the score. Factors that make the score relevant:
• KPI calculation
• Target definition
• Analysis sample
7. 1.1 Know how to calculate the KPI
Assume you don't know the KPI. Then it is:
1. Difficult to explain results
2. Easy to harm the business
3. Easy to miss predictors
The outcome is project failure, and you are fired.
8. 1.2 Good definition of target
A good target:
• Differentiates profitable and non-profitable customers
• Is a binary variable
• Is fast to track
• Is easy to understand
Good examples:
• First payment default
• 3 missed payments
• Debt not paid off within 30 days
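The rules above can be sketched as a toy binary target definition (Python for illustration; the customer data and the 3-missed-payments threshold are invented examples, not from the deck):

```python
# A minimal sketch of a binary target: a customer is "bad" (target = 1)
# once they reach the missed-payments threshold, else "good" (target = 0).
def define_target(missed_payments: int, threshold: int = 3) -> int:
    """Return 1 (bad) when missed payments reach the threshold, else 0 (good)."""
    return 1 if missed_payments >= threshold else 0

# Hypothetical customers: id -> number of missed payments
customers = {"A": 0, "B": 3, "C": 5}
targets = {cid: define_target(n) for cid, n in customers.items()}
# targets: A is good, B and C are bad
```

A binary, rule-based definition like this is fast to track and easy to explain, which is exactly what the slide asks of a good target.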
9. 1.3 Get the right sample for analysis
[Chart: target rate over time, marking "too old" and "too young" periods around the right sample]
10. C. How to fulfill those requirements
1. Model relevancy
2. Maximum impact
3. Consistent inference
4. Implementable solution
5. Minimum time to market
11. 2. Generate as much money as possible
The higher the predictive power, the more money we earn. Drivers of predictive power:
• More observations
• More features
• Automated workflow
• Analyst intervention
12. 2.1. The more data, the more creative you can get
Sample size drives decision making: with little data you are limited to simple rules such as "If A, then approve, else reject"; a larger sample supports a full scoring model.
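The cut-off idea behind such a rule can be sketched as a minimal threshold decision (the scores and the 0.5 cut-off are made up for illustration):

```python
# With a score model, the approve/reject rule becomes a threshold on the
# score rather than a single hand-written "If A then approve" condition.
def decide(score: float, cutoff: float = 0.5) -> str:
    """Approve when the score reaches the cut-off, otherwise reject."""
    return "approve" if score >= cutoff else "reject"

decisions = [decide(s) for s in (0.7, 0.3, 0.5)]
```

In practice the cut-off is tuned against the profit/loss KPI rather than fixed in advance.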
13. 2.2. Where to get features
Feature sources:
• Application data
• Historical performance
• Synthetic generation
• External sources
14. 2.3. There is no gut feeling, just grind
From data to model:
1. Retrieve as much data as possible
2. Generate features
3. Conduct feature selection
4. Use algorithms to build models and sieve them
5. Choose the final version from only a few candidates
6. Deliver results
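The feature-selection step of this grind can be sketched in pure Python as a simple correlation prescreen (the toy features and target are invented; a real pipeline would use a proper library):

```python
# Score every candidate feature by absolute Pearson correlation with the
# target and keep only the strongest few before any modelling.
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def prescreen(features, target, keep=2):
    """Rank features by |correlation| with the target, keep the top `keep`."""
    ranked = sorted(features,
                    key=lambda name: abs(pearson(features[name], target)),
                    reverse=True)
    return ranked[:keep]

target = [0, 0, 1, 1, 1, 0]
features = {
    "age":    [25, 30, 45, 50, 48, 28],  # related to the target
    "noise":  [1, 7, 2, 9, 3, 8],        # unrelated
    "income": [10, 12, 30, 35, 33, 11],  # related to the target
}
selected = prescreen(features, target, keep=2)  # drops "noise"
```

Prescreening like this thins out obviously useless features cheaply, leaving the expensive model-building step to sieve the survivors.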
15. 2.4. There would be no need for an analyst if the ultimate algorithm existed
Reasons why algorithms fail:
1. Erroneous data
2. Missing values
3. Outliers
4. Wrong assumptions
5. High correlation
6. Too big a volume to handle
16. C. How to fulfill those requirements
1. Model relevancy
2. Maximum impact
3. Consistent inference
4. Implementable solution
5. Minimum time to market
17. 3. To ensure consistency, monitor the model
Monitoring tools:
• Test vs. train comparison
• "Elbow" method
• Score stability report
• Feature impact on score
[Chart: train vs. test performance]
18. 3.1. Divide the dataset into train & test by time
Time-based split:
✓ Sensitive to operational changes
✓ Enables us to point at spurious variables
✓ Easy to interpret
Random split:
× Not sensitive to operational changes
× Does not show anything in the common situation
× Hard to explain to higher management
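A minimal sketch of the time-based split (the dates and the 80/20 share are assumptions for illustration):

```python
# Sort observations chronologically and cut so the oldest share becomes
# the train set and the newest becomes the test set.
from datetime import date

def time_split(rows, train_share=0.8):
    """Split (date, payload) rows chronologically: old -> train, new -> test."""
    ordered = sorted(rows, key=lambda r: r[0])
    cut = int(len(ordered) * train_share)
    return ordered[:cut], ordered[cut:]

# Hypothetical monthly observations
rows = [(date(2020, m, 1), f"obs{m}") for m in range(1, 11)]
train, test = time_split(rows)  # 8 oldest in train, 2 newest in test
```

Because the test period lies strictly after the train period, a drop in test performance surfaces operational changes that a random split would hide.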
19. 3.2. Apply the "elbow" method for cross-validation
Over-fitting symptoms:
• Higher error on the testing dataset
• Large difference between train and test performance
Over-fitting prescription:
• Use simpler models
• Get more data
• Conduct feature prescreening
Under-fitting symptoms:
• High error on both train and test samples
• Small difference in performance
Under-fitting prescription:
• Use more complex models
• Get more features
[Chart: model error vs. model complexity, with train and test curves marking the under-fitting and over-fitting regions]
20. 3.3. Measure score stability
Procedure:
1. Score the whole dataset, both train and test samples
2. Divide observations into 5-20 score bins
3. Group observations by bin and time period
4. Calculate the Kullback-Leibler divergence between periods
5. Check whether the stability index is within appropriate limits
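The procedure above can be sketched via the Population Stability Index, a symmetric Kullback-Leibler-style divergence commonly used for this check (the bin count, score samples, and the 0.1/0.25 limits are conventional rules of thumb, not from the deck):

```python
# Bin scores from two periods into shared edges and compute the PSI.
# Rule of thumb: PSI < 0.1 stable, 0.1-0.25 needs attention, > 0.25 drift.
import math

def psi(expected, actual, bins=5):
    """PSI between score samples from a reference and a recent period."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[-1] = hi + 1e-9  # include the maximum in the last bin

    def share(sample):
        counts = [0] * bins
        for s in sample:
            for i in range(bins):
                if edges[i] <= s < edges[i + 1]:
                    counts[i] += 1
                    break
        # small floor avoids log(0) for empty bins
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = share(expected), share(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

reference  = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]    # train-period scores
recent_ok  = [0.11, 0.21, 0.31, 0.41, 0.51, 0.61, 0.71, 0.79]  # same shape
recent_bad = [0.7, 0.8, 0.9, 1.0, 0.7, 0.8, 0.9, 1.0]    # shifted upward
```

Running `psi(reference, recent_ok)` stays near zero, while the shifted sample pushes the index well past the drift threshold.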
21. 3.4. Locate the troublesome variable
Procedure:
1. Calculate AVG(coefficient * variable value)
2. Group observations by time period
3. Subtract each period's value from the most recent period's value
4. Check which variable causes the drift
5. Exclude that variable from the model if the score is not stable
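For a linear scorecard, the drift-attribution procedure can be sketched as follows (the coefficients and the two toy periods are invented for illustration):

```python
# Per period, average each variable's contribution (coefficient * value),
# then compare the most recent period against the previous one to see
# which variable moves the score.
from statistics import mean

def contribution_drift(coeffs, periods):
    """Return {variable: recent mean contribution - previous mean contribution}."""
    def contrib(rows):
        return {v: mean(coeffs[v] * row[v] for row in rows) for v in coeffs}
    prev, recent = contrib(periods[-2]), contrib(periods[-1])
    return {v: recent[v] - prev[v] for v in coeffs}

coeffs = {"age": 0.02, "income": -0.001}
jan = [{"age": 30, "income": 1000}, {"age": 40, "income": 1200}]
feb = [{"age": 31, "income": 1100}, {"age": 39, "income": 2900}]  # income shifts
drift = contribution_drift(coeffs, [jan, feb])
worst = max(drift, key=lambda v: abs(drift[v]))  # the variable to investigate
```

Here the age contribution is flat while income's mean contribution moves, so income is the candidate for exclusion if the score proves unstable.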
22. C. How to fulfill those requirements
1. Model relevancy
2. Maximum impact
3. Consistent inference
4. Implementable solution
5. Minimum time to market
23. 4. Until implemented, a model is worth nothing
Implementation needs:
• Specification
• Data provider
• Test coverage
24. C. How to fulfill those requirements
1. Model relevancy
2. Maximum impact
3. Consistent inference
4. Implementable solution
5. Minimum time to market
25. 5. Minimize time-to-market
Workflow: Receive Task → Retrieve Data → Explore Data → Try Models → Make Reports → Build Solution → Deliver Results
What makes it fast:
• Proper tools
• Structured thoughts
26. 5.1. Structure your code to free your brain
Common repository (cross-model routines):
• R
• man
• tests
Yet another model (its own project):
• .Rprofile
• cache
• config
• data
• munge
• reports
• src
• target
27. Presentation Scope
✓ What is a model
✓ What are the requirements for the model
✓ How to fulfill those requirements