1
Machine Learning vs Decision Optimization: concepts comparison
Alain Chabrier/Spain/IBM
achabrier@es.ibm.com
Oct. 2017
2
Why these slides?
• Data science is an interdisciplinary field that uses scientific methods, processes, and systems to
extract knowledge or insights from data in various forms, whether structured or unstructured.
• The data science community is made up of people from different backgrounds who do not always
understand each other. Everyone uses their own concepts and does not always see how
these map onto other techniques.
• In particular, Machine Learning experts do not always understand how Decision Optimization
concepts map to, or differ from, their own.
3
Machine Learning
Machine learning is a field of computer science that
gives computers the ability to learn without being
explicitly programmed.
In practice, we want to support the training, validation,
debugging and deployment of models which use
ML/statistics techniques to score a set of values and
return the most probable set of outcomes based on
training.
Decision Optimization
Decision optimization is the application of one or
more rigorous analytical techniques to a well-defined
model to generate the absolute best
decision from a multitude of possible alternatives
in a rigorous, repeatable, and provable fashion.
In practice, we want to support the development,
validation, debugging and deployment of models which
use Mathematical Programming and Constraint Programming
techniques to solve a given problem and obtain a
proven optimal solution.
4
Using Models
5
Deployed usage
ML Scoring: 1 to N input values (features) → Trained model → 1 to M output values (targets).
Often there is only 1 output, but there can be several.
DO Solving: 1 to N input tables → Programmed model → 1 to M output tables.
6
Example usage
ML Scoring
Input: 1 to N input values (features) such as age, gender, job, e.g. 25, M, Student, NY, Single
Trained model
Output: 82%, 10%, 8% (82% chance to buy product1, 10% product2, etc.)
DO Solving
Input: list of activities to schedule, available workers, etc.
id  duration  req. skills  latest start  predecessors
A   20        1,2          10
B   30        2            20            A
C   20                                   B
D   80        3            30            B,C
E   100       2,3          10            D
Programmed model
Output: activities schedule and assignments to workers
id  workers  start  end
A   John     10     30
B   Jack     20     40
C   Joe      15     60
7
Model
ML Model
An input data schema (in general N values) and an output data
schema.
A reference to the training set and the algorithm
that were used.
Possibly a preview of some trained
characteristics (e.g. a decision tree).
DO Model
An input data schema (in general 10-20 tables) and an output
data schema.
Sometimes part of the data (“master data”) is the
same for all instances and is deployed with the
model.
Possibly a preview of a “program” which contains
the definition of variables, constraints and
objectives.
The program can be written in Java, Python, … or natural
language. (*)
* In the past, experts used matrix representations directly.
8
Model Integration
Scoring ML Model
Scoring one instance takes a fraction of a
second.
Scoring time is predictable.
Synchronous call.
In general, deployed models allow batch
scoring of a set of instances in one call.
Solving DO Model
Solving an instance takes from seconds to
hours.
Solving time is quite unpredictable (even for the
same model).
Asynchronous call.
Instances are solved one by one.
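The integration difference can be sketched with the Python standard library only. This is a minimal illustration, not a real deployment API: `score_batch` and `solve_instance` are hypothetical stand-ins, the scoring rule is invented, and the solve time is simulated with a short sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def score_batch(rows):
    """Stand-in for a deployed ML model: fast, one call for many rows."""
    # Hypothetical rule for illustration only: the score grows with age.
    return [min(1.0, row["age"] / 100) for row in rows]

def solve_instance(instance):
    """Stand-in for a DO solve: run time depends on the instance."""
    time.sleep(instance["work_seconds"])  # simulates a long, variable solve
    return {"status": "optimal", "objective": sum(instance["costs"])}

# Synchronous batch scoring: the caller simply waits for the answer.
scores = score_batch([{"age": 25}, {"age": 70}])

# Asynchronous solving: submit the job, then poll or wait for the result.
with ThreadPoolExecutor(max_workers=1) as pool:
    job = pool.submit(solve_instance, {"work_seconds": 0.2, "costs": [3, 4]})
    try:
        early = job.result(timeout=0.01)  # asking too early: still solving
    except TimeoutError:
        early = None
    solution = job.result()               # blocks until the solve finishes

print(scores, early, solution)
```

In a real application the solve job would typically run on a separate service, and the caller would poll a job status instead of holding a thread.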
9
Developing Models
10
Model Creation
ML Model Training
Many rows of input + known output values → Trained model
DO Model Development
1 to N input tables
+ business expertise on the problem (rules and objectives)
+ operations research skills (how to write the rules)
→ Programmed model
11
Model Validation – Machine Learning
Some rows of input → Trained model → Compare calculated output with known output
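A minimal sketch of this validation step: score a few held-out rows and compare the calculated output with the known output. The threshold model and the rows are hypothetical and only stand in for a real trained model and validation set.

```python
def accuracy(model, rows, known_outputs):
    """Share of rows where the calculated output equals the known output."""
    hits = sum(1 for row, known in zip(rows, known_outputs) if model(row) == known)
    return hits / len(rows)

def trained_model(row):
    """Hypothetical trained model: a single age threshold learnt elsewhere."""
    return "buy" if row["age"] < 30 else "no_buy"

# Held-out rows that were not used for training, with their known outputs.
validation_rows = [{"age": 22}, {"age": 41}, {"age": 28}, {"age": 35}]
known_outputs = ["buy", "no_buy", "no_buy", "no_buy"]

validation_accuracy = accuracy(trained_model, validation_rows, known_outputs)
print(validation_accuracy)  # 3 of the 4 rows match
```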
12
Model Validation – Decision Optimization
1. An expert analyzes the solution of the programmed model on a dashboard.
2. Test the programmed model with different scenarios (1 to N input tables each: Scenario 1, Scenario 2).
3. Test with different model formulations.
13
Model bugs
ML Model bugs
As ML models are not “programmed”, it is
hard to define the notion of a “bug”:
• The trained model has a very poor score on
some evaluation set.
  → Underfitting: not enough training?
  → Overfitting: too much training, and the two
  data sets do not follow the same “logic”?
• The application using the deployed model (with a
good evaluation score) is unsuccessful.
  → Is the trained logic wrong?
DO Model bugs
DO models are programmed, and solutions
correspond to the problem formulation:
• Some solutions are wrong with respect to
the business rules.
  → Modeling error: some constraint is missing
  in the model.
• Some solution may appear to be better than
the one the model proves optimal.
  → Modeling error: some constraint in the
  model is too strong, or the objective is wrong.
14
ML Model bugs – technical details
• Overfitting: training takes into account the
“noise”/“errors” of the training set.
• Underfitting: too few points in the training
set.
[Figure: the real function to be learnt versus the result of learning, with panels labelled
“Underfitted” and “Correctly trained”; one point is marked as “noise” in the training set,
i.e. a wrong training item.]
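The effect of a noisy training item can be reproduced with a toy data set. This is a sketch under stated assumptions: the real function is taken to be y = 2x, the point at x = 3 is deliberately wrong, and underfitting is imitated here by a model that is too simple (a constant), which differs from the “too few points” case named on the slide. The three models are illustrative choices, not the author's.

```python
def mse(model, points):
    """Mean squared error of a model over (x, y) points."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

# Training set sampled from y = 2x, except (3, 12), which is "noise".
train = [(0, 0), (1, 2), (2, 4), (3, 12), (4, 8), (5, 10)]
# Fresh points taken from the real function y = 2x.
test = [(x, 2 * x) for x in (0.9, 1.9, 2.9, 3.9, 4.9)]

# Ordinary least-squares line fitted on the training set.
mean_x = sum(x for x, _ in train) / len(train)
mean_y = sum(y for _, y in train) / len(train)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x

def constant_model(x):
    """Too simple: always predicts the training mean."""
    return mean_y

def line_model(x):
    """Follows the trend without reproducing the noisy point."""
    return intercept + slope * x

def nearest_point_model(x):
    """Memorises the training set, noise included."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

models = {"constant": constant_model, "line": line_model,
          "nearest point": nearest_point_model}
errors = {name: (mse(model, train), mse(model, test))
          for name, model in models.items()}
for name, (train_error, test_error) in errors.items():
    print(f"{name:14s} train MSE {train_error:6.2f}   test MSE {test_error:6.2f}")
```

The memorising model has zero error on the training set but a larger test error than the fitted line, because it reproduces the noisy item.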
15
DO Model bugs – technical details
A missing constraint is detected
when the solution for some scenario
appears to be invalid.
A wrong objective is detected
when some better solution is
found for some scenario.
[Figure: three diagrams, each labelled “objective”.]
16
Model Debug
ML Model Debug
What to do?
• Experiment with different learning sets,
different methods, etc.
• Experiment with additional features.*
• Capture more training data (rows) from
deployment.
• Use scenarios and dashboards!
* It is possible that the selected target and features in the training set are
uncorrelated!
DO Model Debug
What to do?
• Experiment with additional constraints and
objectives.
• Experiment with different data.
• Feed the model with a candidate solution and analyze
the set of constraints that make it infeasible.
• Use scenarios and dashboards!
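Feeding a candidate solution to a DO model can be sketched as a plain constraint check. This is a minimal illustration, not a solver feature: `violated_constraints` is a hypothetical helper, and the data reuses activities A and B from the scheduling example under an assumed reading of the columns (duration, latest start, predecessors).

```python
def violated_constraints(candidate, durations, predecessors, latest_start):
    """Return the names of the constraints that the candidate breaks."""
    violations = []
    for activity, plan in candidate.items():
        # The planned interval must match the activity duration.
        if plan["end"] - plan["start"] != durations[activity]:
            violations.append(f"duration({activity})")
        # The activity must not start after its latest start.
        if activity in latest_start and plan["start"] > latest_start[activity]:
            violations.append(f"latest_start({activity})")
        # The activity must start after all its predecessors have ended.
        for pred in predecessors.get(activity, []):
            if plan["start"] < candidate[pred]["end"]:
                violations.append(f"precedence({pred}->{activity})")
    return violations

durations = {"A": 20, "B": 30}
predecessors = {"B": ["A"]}
latest_start = {"A": 10, "B": 20}

# Candidate proposed by a planner: B starts while A is still running.
candidate = {"A": {"start": 10, "end": 30}, "B": {"start": 20, "end": 50}}
conflicts = violated_constraints(candidate, durations, predecessors, latest_start)
print(conflicts)
```

Here the only violated constraint is the precedence between A and B, which tells the modeler where the candidate and the formulation disagree.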
17
Possible use of Dashboard and Scenarios for
ML model Validation and Debug (1/3)
• Analyze one “model” with charts:
‒ to better select features
‒ to prepare/transform input data
18
Possible use of Dashboard and Scenarios for
ML model Validation and Debug (2/3)
• Compare the outcomes of several “models”
with charts:
‒ using different features,
‒ using different algorithms,
‒ using different training sets, as a trained
model reproduces the behavior of its training set.
We want to automate some manual decision process with ML.
We have historical records of all inputs and decisions, for different
periods and different decision makers.
Should we train on the overall set? Should we train only on the
best-performing decision makers?
Ex: The model trained on dataset 2 performs better
overall than the model trained on the aggregated datasets.
19
Possible use of Dashboard and Scenarios for
ML model Validation and Debug (3/3)
• Compare the span of each feature between the training set
and the deployment set
‒ to detect missing training (e.g. we train on a
population with age < 30 and the deployed model is used with ages up
to 70).
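The span comparison can be sketched in a few lines. The helper names and the rows are hypothetical; only the age example (trained below 30, deployed up to 70) comes from the slide.

```python
def feature_span(rows, feature):
    """Smallest and largest value of a feature in a set of rows."""
    values = [row[feature] for row in rows]
    return min(values), max(values)

def out_of_span(train_rows, deploy_rows, feature):
    """Deployment rows whose feature value was never covered in training."""
    low, high = feature_span(train_rows, feature)
    return [row for row in deploy_rows if not low <= row[feature] <= high]

train_rows = [{"age": 19}, {"age": 24}, {"age": 29}]
deploy_rows = [{"age": 22}, {"age": 45}, {"age": 70}]

uncovered = out_of_span(train_rows, deploy_rows, "age")
print(feature_span(train_rows, "age"), uncovered)
```

A non-empty result indicates that the deployed model is being asked about values it has never seen during training.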
20
Machine Learning + Decision Optimization
21
When ML and DO work together
• The question is not which one is best, but when to use each one and when to use both.
• Decision Optimization almost always uses predictions and forecasts as input.
• It is also frequent that Machine Learning outcomes are best exploited with Decision Optimization
to take optimal decisions.
22
Ex1: Sales and Operation Planning
A company builds, distributes and sells
goods.
Decisions have to be taken on operations
for the coming months (which products to
build and where, what to stock and what
to sell, etc.).
For that, predictions of sales for the different
products and markets are required.
Demand forecast by period and product (obtained with ML)
Production plan by period and product (obtained with DO)
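The hand-over from forecast to plan can be sketched for a single product. All numbers are hypothetical, the forecast list stands in for the output of an ML model, and the planning step is a simple backward pass (produce as late as possible within capacity), not a call to an optimization engine.

```python
def plan_production(forecast, capacity):
    """Cover the forecast demand of each period, producing as late as possible."""
    plan = [0] * len(forecast)
    carry = 0  # demand that later periods could not produce themselves
    for period in reversed(range(len(forecast))):
        needed = forecast[period] + carry
        plan[period] = min(needed, capacity)
        carry = needed - plan[period]
    if carry > 0:
        raise ValueError("capacity is too low to cover the forecast demand")
    return plan

# Hypothetical demand forecast per period, as an ML model might provide it.
demand_forecast = [80, 120, 150]
production_plan = plan_production(demand_forecast, capacity=120)
print(production_plan)
```

Producing late keeps stock low in this toy setting; a real plan would also weigh several products, plants, costs and markets, which is where a DO model is needed.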
23
Ex2: Marketing Campaign Optimization
A bank offers 3 different products: Savings, Mortgage or Pension.
Using historical data, it is simple to see how age or income impacts the expected return from
a customer when a product is proposed.
Now, what do we do with the predictions for new customers if we have operational constraints such as a limited
budget?
See https://github.com/IBMDecisionOptimization/tutorials/blob/master/jupyter/MachineLearning_and_CPLEX.ipynb
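A toy version of this question can be answered by exhaustive search. This is a sketch, not the approach of the linked notebook: the expected returns stand in for ML predictions, the offer costs and budget are invented, and brute-force enumeration replaces an optimization engine, which only works for a handful of customers.

```python
from itertools import product

# Hypothetical ML predictions: expected return per customer and product.
expected_return = {
    "c1": {"Savings": 50, "Mortgage": 120, "Pension": 30},
    "c2": {"Savings": 40, "Mortgage": 20, "Pension": 90},
    "c3": {"Savings": 60, "Mortgage": 70, "Pension": 10},
}
offer_cost = {"Savings": 10, "Mortgage": 30, "Pension": 20}  # assumed costs
budget = 50                                                   # assumed budget

def best_campaign(expected_return, offer_cost, budget):
    """Try every assignment of at most one offer per customer."""
    customers = list(expected_return)
    options = [None] + list(offer_cost)  # None means "no offer"
    best_value, best_plan = 0, {c: None for c in customers}
    for choice in product(options, repeat=len(customers)):
        cost = sum(offer_cost[p] for p in choice if p is not None)
        if cost > budget:
            continue  # violates the budget constraint
        value = sum(expected_return[c][p]
                    for c, p in zip(customers, choice) if p is not None)
        if value > best_value:
            best_value, best_plan = value, dict(zip(customers, choice))
    return best_value, best_plan

best_value, best_plan = best_campaign(expected_return, offer_cost, budget)
print(best_value, best_plan)
```

With these numbers, giving every customer their individually best product would exceed the budget, so the best plan trades one customer's top offer for cheaper offers to the others.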
