Machine Learning vs Decision Optimization
Concepts comparison
Alain Chabrier/Spain/IBM
achabrier@es.ibm.com
Oct. 2017
Why these slides?

• Data science is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured.
• The data science community is made of people coming from different areas who do not always understand each other. Everyone uses their own concepts and does not always understand how these map when applied to other techniques.
• In particular, Machine Learning experts do not always understand how Decision Optimization concepts map to, or differ from, their own.
Machine Learning
Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed.
In practice, we want to support the training, validation, debugging and deployment of models which use ML/statistics techniques to score some set of values and get the most probable set of outcomes based on training.

Decision Optimization
Decision optimization is the application of one or more rigorous analytical techniques to a well-defined model to generate the absolute best decision from a multitude of possible alternatives in a rigorous, repeatable, and provable fashion.
In practice, we want to support the development, validation, debugging and deployment of models which use Mathematical and Constraint Programming techniques to solve a given problem and get a proven optimal solution.
Using Models
Deployed usage

ML Scoring: 1 to N input values (features) → Trained model → 1 to M output values (target). Many times there is only 1 output, but there can be several.
DO Solving: 1 to N input tables → Programmed model → 1 to M output tables.
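To make the ML side of this flow concrete, here is a minimal scoring sketch. It assumes scikit-learn; the model, the encoded features and the product labels are hypothetical, not from the deck.

```python
# Minimal ML scoring sketch (scikit-learn assumed; features and
# product labels are hypothetical, categorical values pre-encoded).
from sklearn.ensemble import RandomForestClassifier

# Suppose a model was trained offline on historical customer rows.
X_train = [[25, 0, 1], [40, 1, 0], [33, 1, 1], [58, 0, 0]]  # N input values per row
y_train = ["product1", "product2", "product1", "product3"]  # 1 output value (target)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Deployed usage: score one instance, get the most probable outcomes.
instance = [[25, 0, 1]]  # e.g. age, gender, job (encoded)
probabilities = model.predict_proba(instance)
print(dict(zip(model.classes_, probabilities[0])))
```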
Example usage

ML Scoring
• Input: 1 to N values (the features), e.g. age, gender, job, … — "25, M, Student, NY, Single"
• Trained model
• Output: "82% 10% 8%" — an 82% chance to buy product1, 10% for product2, etc.

DO Solving
• Input: 1 to N tables, e.g. the list of activities to schedule, the available workers, etc.

id | duration | req. skills | latest start | predecessors
A  | 20       | 1,2         | 10           |
B  | 30       | 2           | 20           | A
C  | 20       |             |              | B
D  | 80       | 3           | 30           | B,C
E  | 100      | 2,3         | 10           | D

• Programmed model
• Output: the activities schedule and the assignments to workers.

id | worker | start | end
A  | John   | 10    | 30
B  | Jack   | 20    | 40
C  | Joe    | 15    | 60
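A minimal sketch of the DO side, assuming IBM's docplex.mp Python API (a CPLEX engine is needed to actually solve). It models only the durations and precedences of the activity table above; the skill and latest-start columns are omitted to keep the example short.

```python
# Minimal DO solving sketch (docplex.mp assumed; skills and latest
# starts from the slide's table are omitted for brevity).
from docplex.mp.model import Model

durations = {"A": 20, "B": 30, "C": 20, "D": 80, "E": 100}
predecessors = {"B": ["A"], "C": ["B"], "D": ["B", "C"], "E": ["D"]}

mdl = Model(name="scheduling_sketch")
start = {t: mdl.continuous_var(lb=0, name="start_%s" % t) for t in durations}
makespan = mdl.continuous_var(name="makespan")

# Business rules become constraints: a task starts after its predecessors end.
for t, preds in predecessors.items():
    for p in preds:
        mdl.add_constraint(start[t] >= start[p] + durations[p])
# The makespan must cover the end of every task.
for t in durations:
    mdl.add_constraint(makespan >= start[t] + durations[t])

mdl.minimize(makespan)  # objective: finish the schedule as early as possible
solution = mdl.solve()
if solution:
    for t in sorted(durations):
        s = solution.get_value(start[t])
        print(t, "start:", s, "end:", s + durations[t])
```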
Model

ML Model
• An input data schema (in general N values) and an output data schema.
• A reference to the training set and the algorithm that has been used.
• Possibly a preview of some trained characteristics (e.g. a decision tree).

DO Model
• An input data schema (in general 10-20 tables) and an output data schema.
• Sometimes part of the data ("master data") is the same for all instances and is deployed with the model.
• Possibly a preview of a "program" which contains the definition of variables, constraints and objectives.
• The program can be in Java, Python, … or natural language. (*)

* In the past, experts have been using matrix representations directly.
Model Integration

Scoring an ML Model
• Scoring one instance takes a fraction of a second.
• Predictable scoring time.
• Synchronous call.
• In general, deployed models allow batch scoring of a set of instances in one call.

Solving a DO Model
• Solving an instance takes from seconds to hours.
• Quite unpredictable solving time (even for the same model).
• Asynchronous call.
• Instance-by-instance solving.
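The contrast can be sketched as two client patterns. The REST endpoints and payloads below are entirely hypothetical (not a real product API); the point is the single synchronous call versus the submit-and-poll loop.

```python
# Integration pattern sketch; endpoints are hypothetical placeholders.
import time
import requests

# ML scoring: one synchronous request, answered in a fraction of a second.
response = requests.post("https://example.com/ml/score",
                         json={"fields": ["age", "job"],
                               "values": [[25, "Student"]]})
print(response.json())

# DO solving: submit a job, then poll, since run time is unpredictable.
job = requests.post("https://example.com/do/jobs",
                    json={"tables": {"activities": [], "workers": []}}).json()
while True:
    status = requests.get("https://example.com/do/jobs/%s" % job["id"]).json()
    if status["state"] in ("done", "failed"):
        break
    time.sleep(5)  # seconds to hours may pass before the job finishes
print(status.get("solution"))
```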
Developing Models
Model Creation

ML Model Training
Many rows of input + known output values → Trained model.

DO Model Development
1 to N input tables
+ business expertise on the problem (rules and objectives)
+ operations research skills (how to write the rules)
→ Programmed model.
Model Validation – Machine Learning

Some rows of input → Trained model → Compare the calculated output with the known output.
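A minimal validation sketch, assuming scikit-learn and using a stock dataset in place of real business rows:

```python
# Hold out some rows with known outputs, score them with the trained
# model, and compare (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Compare calculated output with known output on the held-out rows.
predicted = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, predicted))
```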
Model Validation – Decision Optimization

1. An expert analyzes the solution of the programmed model on a dashboard.
2. Test the programmed model with different scenarios (Scenario 1, Scenario 2, …) built from the 1 to N input tables.
3. Test with different formulations of the programmed model.
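A minimal sketch of scenario testing (step 2), assuming docplex.mp; the model and the scenario data are hypothetical:

```python
# Re-solve the same programmed model on different scenario inputs
# (docplex.mp assumed; data hypothetical).
from docplex.mp.model import Model

def build_and_solve(scenario):
    """Rebuild the programmed model on one scenario's input tables."""
    mdl = Model(name=scenario["name"])
    make = mdl.continuous_var_dict(scenario["products"], name="make")
    mdl.add_constraint(
        mdl.sum(make[p] for p in scenario["products"]) <= scenario["capacity"])
    mdl.maximize(
        mdl.sum(scenario["profit"][p] * make[p] for p in scenario["products"]))
    sol = mdl.solve()
    return sol.objective_value if sol else None

scenarios = [
    {"name": "scenario1", "products": ["p1", "p2"], "capacity": 100,
     "profit": {"p1": 3, "p2": 5}},
    {"name": "scenario2", "products": ["p1", "p2"], "capacity": 80,
     "profit": {"p1": 4, "p2": 4}},
]
for s in scenarios:
    print(s["name"], build_and_solve(s))  # the expert then reviews each solution
```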
Model bugs

ML Model bugs
As ML models are not "programmed", it is hard to define the notion of a "bug":
• The trained model has a very poor score on some evaluation set.
 ‒ Underfitting: not enough training?
 ‒ Overfitting: too much training, and the two data sets do not correspond to the same "logic"?
• An application using the deployed model (with a good evaluation score) is unsuccessful.
 ‒ Is the trained logic wrong?

DO Model bugs
DO models are programmed, and solutions correspond to the problem formulation:
• Some solutions are wrong with respect to the business rules.
 ‒ Modeling error: some constraint is missing in the model.
• Some solution may appear to be better than the one the model proves optimal.
 ‒ Modeling error: some constraint in the model is too strong, or the objective is wrong.
ML Model bugs – technical details

• Underfitting: too few points in the training set.
• Overfitting: training takes into account the "noise"/"errors" of the training set.

[Figure: the real function to be learnt vs. the result of learning, shown underfitted and correctly trained; one point is "noise" in the training set, i.e. a wrong training item.]
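The effect can be reproduced numerically. This sketch, assuming only numpy, fits noisy samples of a known function with polynomials of increasing degree:

```python
# Under/overfitting sketch: fit noisy samples of a known function and
# measure the error on fresh points (numpy assumed).
import numpy as np

def true_fn(t):
    return np.sin(2 * np.pi * t)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 12)
y = true_fn(x) + rng.normal(scale=0.2, size=x.size)  # noise = "wrong" items

x_new = np.linspace(0, 1, 50)
for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)
    error = np.mean((np.polyval(coeffs, x_new) - true_fn(x_new)) ** 2)
    print("degree %d: error on fresh points = %.3f" % (degree, error))
# degree 1 underfits; degree 9 chases the noise of the training set.
```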
DO Model bugs – technical details

A missing constraint is detected when the solution for some scenario appears to be invalid.
A wrong objective is detected when some better solution is found for some scenario.
Model Debug

ML Model Debug – what to do?
• Experiment with different learning sets, different methods, etc.
• Experiment with additional features.*
• Capture more training data (rows) from deployment.
•  Scenarios and dashboards!

* It is possible that the selected target and features in the training set are uncorrelated!

DO Model Debug – what to do?
• Experiment with additional constraints and objectives.
• Experiment with different data.
• Feed in a candidate solution and analyze the infeasible constraint set (see the sketch after this list).
•  Scenarios and dashboards!
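One concrete tool for the infeasibility-analysis point is docplex's conflict refiner, which reports a minimal set of mutually conflicting constraints. A minimal sketch with a deliberately broken model (it shows the infeasible-model case rather than feeding a candidate solution):

```python
# Infeasibility debugging sketch (docplex.mp assumed).
from docplex.mp.model import Model
from docplex.mp.conflict_refiner import ConflictRefiner

mdl = Model(name="debug_sketch")
x = mdl.continuous_var(name="x")
mdl.add_constraint(x >= 10, ctname="at_least_10")
mdl.add_constraint(x <= 5, ctname="at_most_5")  # deliberately conflicting

if mdl.solve() is None:
    # A minimal subset of constraints that cannot hold together,
    # pointing at what to fix in the formulation.
    ConflictRefiner().refine_conflict(mdl).display()
```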
Possible use of Dashboards and Scenarios for ML model Validation and Debug (1/3)

• Analyze one "model" with charts:
 ‒ to better select features,
 ‒ for input data preparation/transformation.
Possible use of Dashboards and Scenarios for ML model Validation and Debug (2/3)

• Compare the outcomes of several "models" with charts:
 ‒ using different features,
 ‒ using different algorithms,
 ‒ using different training sets, as a trained model reproduces the behavior of its training set.

Example: we want to automate some manual decision process with ML. We have historical records of all inputs and decisions, for different periods and different decision makers. Should we train on the overall set? Should we train only on the best-performing decision makers? In the example shown, the model trained on dataset 2 performs better overall than the model trained on the aggregated datasets.
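That dataset question can be explored with a small experiment. This sketch assumes scikit-learn and stands in a stock dataset for the decision makers' historical records:

```python
# Train the same algorithm on different historical subsets and compare
# on one common evaluation set (scikit-learn assumed; iris stands in
# for real decision records).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_pool, X_eval, y_pool, y_eval = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Stand-ins for the records of two different decision makers.
half = len(X_pool) // 2
datasets = {"dataset1": (X_pool[:half], y_pool[:half]),
            "dataset2": (X_pool[half:], y_pool[half:]),
            "aggregated": (X_pool, y_pool)}

for name, (Xd, yd) in datasets.items():
    model = LogisticRegression(max_iter=1000).fit(Xd, yd)
    print(name, "score:", model.score(X_eval, y_eval))
```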
Possible use of Dashboards and Scenarios for ML model Validation and Debug (3/3)

• Compare the feature spans between the training set and the deployment set:
 ‒ to detect missing training data (e.g. we train on a population with age < 30 and the deployed model is used with ages up to 70).
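A minimal feature-span check, assuming pandas; the age/income data is made up:

```python
# Compare the range of each feature between the training set and the
# deployment set (pandas assumed; data hypothetical).
import pandas as pd

train = pd.DataFrame({"age": [18, 22, 25, 29],
                      "income": [1000, 1500, 1800, 2100]})
deploy = pd.DataFrame({"age": [24, 45, 70],
                       "income": [1600, 3000, 2500]})

span = pd.DataFrame({"train_min": train.min(), "train_max": train.max(),
                     "deploy_min": deploy.min(), "deploy_max": deploy.max()})
# Deployment values outside the training span signal missing training data.
span["outside_train_span"] = ((span["deploy_min"] < span["train_min"]) |
                              (span["deploy_max"] > span["train_max"]))
print(span)
```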
Machine Learning + Decision Optimization
When ML and DO work together

• The question is not which one is best, but when to use each one and when to use both.
• Decision Optimization almost always uses predictions and forecasts as input.
• It is also frequent that Machine Learning outcomes are best exploited with Decision Optimization to take optimal decisions.
Ex1: Sales and Operations Planning

A company builds, distributes and sells goods. Decisions have to be taken on operations for the coming months (which products to build and where, what to stock and what to sell, etc.). For that, predictions of sales for the different products and markets are required.

Demand forecast by period and product (obtained with ML) → Production plan by period and product (obtained with DO).
Ex2: Marketing Campaign Optimization

A bank proposes 3 different products: Savings, Mortgage or Pension. Using historical data, it is simple to see how age or income impact the expected return from a customer when products are proposed. Now, what to do with the predictions on new customers if we have operational constraints like a limited budget?

See https://github.com/IBMDecisionOptimization/tutorials/blob/master/jupyter/MachineLearning_and_CPLEX.ipynb
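A condensed sketch of the idea behind the linked notebook, assuming docplex.mp; the expected-return numbers stand in for real ML predictions and all data is hypothetical:

```python
# ML + DO sketch: predicted expected returns feed an optimization that
# picks offers under a budget (docplex.mp assumed; data hypothetical).
from docplex.mp.model import Model

customers = ["c1", "c2", "c3"]
products = ["Savings", "Mortgage", "Pension"]
cost_per_offer = 10
budget = 20

# Step 1 (ML): expected return per customer/product. Hard-coded here;
# in practice these come from a model trained on historical data.
expected = {("c1", "Savings"): 12, ("c1", "Mortgage"): 30, ("c1", "Pension"): 5,
            ("c2", "Savings"): 8,  ("c2", "Mortgage"): 25, ("c2", "Pension"): 15,
            ("c3", "Savings"): 20, ("c3", "Mortgage"): 10, ("c3", "Pension"): 18}

# Step 2 (DO): choose at most one offer per customer within the budget.
mdl = Model(name="campaign")
offer = mdl.binary_var_matrix(customers, products, name="offer")
for c in customers:
    mdl.add_constraint(mdl.sum(offer[c, p] for p in products) <= 1)
mdl.add_constraint(
    cost_per_offer * mdl.sum(offer[c, p] for c in customers
                             for p in products) <= budget)
mdl.maximize(mdl.sum(expected[c, p] * offer[c, p]
                     for c in customers for p in products))

sol = mdl.solve()
if sol:
    print([(c, p) for c in customers for p in products
           if sol.get_value(offer[c, p]) > 0.5])
```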
