SlideShare a Scribd company logo
1 of 55
Download to read offline
Optimization for Data Science:
Introduction to the course
Antonio Frangioni
Department of Computer Science
University of Pisa
https://www.di.unipi.it/~frangio
mailto:frangio@di.unipi.it
Optimization Methods for Data Science
Master in Data Science and Business Informatics
University of Pisa
A.Y. 2022/23
Outline
Logistic
Motivation I: Optimization Methods for Data Science
Motivation II: Data Science Techniques for Prescriptive Analytics
Mindset
Wrap up
Basic information 1
▶ Course Schedule
▶ Tue 11:00 – 13:00 (Fib M1)
▶ Fri 9:00 – 11:00 (Fib M1)
▶ Web page: https://elearning.di.unipi.it/course/view.php?id=301
▶ Team for lectures: https://teams.microsoft.com/l/team/19%
3aCSI78qkZgMUq5HknxX5LkyKP3pn8qGMs9CDyEZbd5U01%40thread.tacv2/
conversations?groupId=227a39ab-9e3b-419f-b7e9-de4fe8c59059&
tenantId=c7456b31-a220-47f5-be52-473828670aa1
▶ Grading in two possible ways:
▶ Written + oral exam (theory)
▶ Didactic project (groups of 2) + oral exam (discussion)
▶ Special projects available for C++ kamikaze C++
original image by Vectorportal.com
Outline
Logistic
Motivation I: Optimization Methods for Data Science
Motivation II: Data Science Techniques for Prescriptive Analytics
Mindset
Wrap up
Why this course: the view from the stratoshpere 2
▶ Huge amounts of data is generated and collected, but one has to
make sense of it in order to use it: that’s what data science is
▶ Take something big (data) and therefore unwieldy and produce
something small and nimble that can be used in its stead (“actionable”)
▶ That’s a (mathematical) model
▶ Word comes from “modulus”, diminutive from “modus” = “measure”:
“small measure”, “measure in the small” (small is good)
▶ Known uses in architecture: proving beforehand that the real building won’t
collapse (e.g., Filippo Brunelleschi for the Cupola of the Cathedral of Florence)
▶ Countless many physical models afterwards (planes, cars, . . . ), but
mathematics is cheaper than bricks / wood / iron . . .
▶ Yet, mathematical problems can be difficult, too, for various reasons
(and, of course, only truly viable after computers)
▶ And most of them remain (likely) difficult for quantum computers, too,
https://www.smbc-comics.com/comic/the-talk-3
That’s how (Data) Science works, a.k.a., “the decision process” 3
reality mathematics
fragment
situation
▶ The fundamental cycle of all science
That’s how (Data) Science works, a.k.a., “the decision process” 3
reality mathematics
fragment
situation
(instance of)
mathematical
model
modelling
data collection
▶ The fundamental cycle of all science
That’s how (Data) Science works, a.k.a., “the decision process” 3
reality mathematics
fragment
situation
(instance of)
mathematical
model
prediction
solution
modelling
data collection
formula
algorithm
▶ The fundamental cycle of all science
That’s how (Data) Science works, a.k.a., “the decision process” 3
reality mathematics
fragment
situation
(instance of)
mathematical
model
prediction
solution
law
decision
modelling
data collection
interpretation
formula
algorithm
▶ The fundamental cycle of all science
That’s how (Data) Science works, a.k.a., “the decision process” 3
reality mathematics
fragment
situation
(instance of)
mathematical
model
prediction
solution
law
decision
modelling
data collection
interpretation
formula
algorithm
verification
implementation
▶ The fundamental cycle of all science
That’s how (Data) Science works, a.k.a., “the decision process” 3
reality mathematics
fragment
situation
(instance of)
mathematical
model
prediction
solution
law
decision
modelling
data collection
interpretation
formula
algorithm
verification
implementation
program (sw)
computer (hw)
measure
forecast
implementation debug
execution
▶ The fundamental cycle of all science and its implementation
Choosing a mathematical model 4
▶ How a mathematical model should be:
Choosing a mathematical model 4
▶ How a mathematical model should be:
1. accurate (describes well the process at hand)
Choosing a mathematical model 4
▶ How a mathematical model should be:
1. accurate (describes well the process at hand)
2. computationally inexpensive (gives answers rapidly)
Choosing a mathematical model 4
▶ How a mathematical model should be:
1. accurate (describes well the process at hand)
2. computationally inexpensive (gives answers rapidly)
3. general (can be applied to many different processes)
Choosing a mathematical model 4
▶ How a mathematical model should be:
1. accurate (describes well the process at hand)
2. computationally inexpensive (gives answers rapidly)
3. general (can be applied to many different processes)
Typically impossible to have all three!
Choosing a mathematical model 4
▶ How a mathematical model should be:
1. accurate (describes well the process at hand)
2. computationally inexpensive (gives answers rapidly)
3. general (can be applied to many different processes)
Typically impossible to have all three!
▶ Choice of the model crucial, trade-offs between efficiency and effectiveness
▶ Two fundamentally different model building approaches:
1. analytic: model each component of the system separately + their interactions,
(≈)accurate but hard to construct (need system access + technical knowledge)
2. data-driven / synthetic: don’t expect the model to closely match the underlying
system, just to be simple and to (≈)accurately reproduce its observed behaviour
▶ All models are approximate (the map is not the world), but for different reasons
▶ Analytic models: flexible shape, (relatively) few “hand-chosen” parameters
▶ Synthetic models: rigid shape, (very) many automatically chosen parameters
▶ Fitting: find the parameters of the model that best represents the phenomenon,
clearly some sort of optimization problem (often a computational bottleneck)
Didactic example: sizing an Energy Community battery 5
▶ Energy Community: pool of prosumers with both generation and consumption
pooling resources to reduce (eliminate) reliance from the energy grid
▶ A serious application, don’t just take my word for it: https://ciress.it,
https://autens.unipi.it, https://unescochair.unipi.it
▶ Test problem: estimate daily energy production at 5-minutes resolution
▶ Too few measures of noisy process, large inherent random error, but
underlying physical process “constant and smooth”
▶ Need to estimate the “constant” part / average of the process
(yearly or multi-year planning, it’s the long-term average that counts)
▶ Analytic model possible but requires a lot of information on the system
(=⇒ has to be re-done if the system changes)
▶ Synthetic model just need to see the data, let’s try that
Estimate an Energy Community Production, the data 6
▶ Noisy measurement
Estimate an Energy Community Production, the data 6
▶ Noisy measurement but (hopefully) simple/smooth underlying physical process
Polynomial interpolation 7
▶ Available set of observations ( y0 , x0 ), . . . , ( ym , xm ), (m = 287)
▶ Possibly reasonable (??) assumption: the dependence is polynomial, i.e.,
y = pc ( x ) = c0 +
Pk
i=1 ci xk
for fixed k and k + 1 real parameters c = [ c0 , c+ = [ c1 , . . . , cn ] ]
▶ This would imply that yh = pc ( xh ) for all h, which is not really true for any c
▶ Find the c for which it is less untrue ≡ Linear Least Squares:
y =



y0
.
.
.
ym


 , X =



1 x0 x2
0 . . . xk
0
.
.
.
.
.
.
.
.
.
...
.
.
.
1 xm x2
m . . . xk
m


 , minc L( c ) = y − Xc
2
▶ Minimize loss function y − Xc
2
= empirical risk = how much the model
fails the predict the phenomenon on the available observations
▶ Why exactly that? Because it is simple to solve, in Matlab is just c = y / X
▶ Let’s see this in practice
But how about k? 8
▶ LSS choses “optimal” c for fixed k, but how to choose k?
▶ Jointly optimizing over c and k is hard, have to try all (∞-ly many) values
▶ Even worse: how do you measure how good a given k is?
▶ If you knew the ground truth you could check, let’s see what would happen
But how about k? 8
▶ LSS choses “optimal” c for fixed k, but how to choose k?
▶ Jointly optimizing over c and k is hard, have to try all (∞-ly many) values
▶ Even worse: how do you measure how good a given k is?
▶ If you knew the ground truth you could check, let’s see what would happen
▶ ∃ k ≡ optimal model complexity: less =⇒ model too coarse to map
the phenomenon, more =⇒ learning the noise, not the phenomenon
▶ Underfitting vs. overfitting, a.k.a. “the bias/variance dilemma”
▶ But we don’t know the ground truth, we need a proxy
▶ What we can measure: how good is the model at predicting data
it has not seen =⇒ k-fold cross-validation (not the same k)
▶ Let’s see what this gives:
But how about k? 8
▶ LSS choses “optimal” c for fixed k, but how to choose k?
▶ Jointly optimizing over c and k is hard, have to try all (∞-ly many) values
▶ Even worse: how do you measure how good a given k is?
▶ If you knew the ground truth you could check, let’s see what would happen
▶ ∃ k ≡ optimal model complexity: less =⇒ model too coarse to map
the phenomenon, more =⇒ learning the noise, not the phenomenon
▶ Underfitting vs. overfitting, a.k.a. “the bias/variance dilemma”
▶ But we don’t know the ground truth, we need a proxy
▶ What we can measure: how good is the model at predicting data
it has not seen =⇒ k-fold cross-validation (not the same k)
▶ Let’s see what this gives: roughly the same answer (by chance? or not?)
Wrap-up about synthetic models 9
▶ Available, hopefully large set of observations ( y0 , x0 ), . . . , ( ym , xm ):
xh typically ≫ a single real (a long vector, a tree, . . . ), often yh too
▶ Sometimes yh not even there (unsupervised learning)
▶ Want to learn from the data ≡ construct a model
▶ Choose the form among one of the not too many different options,
solve the fitting problem minc L( c ) to find the optimal parameters
▶ Choose the hyperparameters to find the best bias/variance compromise
(this often requires a grid search =⇒ costly)
▶ Very many other aspects to be considered: choice of the model (NN, SVR,
RBF, DT, Bayesian, clustering, ARMA/ARIMA, . . . ), regularization, feature
selection, sparsification/pruning, explainability, fairness, . . .
Wrap-up about synthetic models 9
▶ Available, hopefully large set of observations ( y0 , x0 ), . . . , ( ym , xm ):
xh typically ≫ a single real (a long vector, a tree, . . . ), often yh too
▶ Sometimes yh not even there (unsupervised learning)
▶ Want to learn from the data ≡ construct a model
▶ Choose the form among one of the not too many different options,
solve the fitting problem minc L( c ) to find the optimal parameters
▶ Choose the hyperparameters to find the best bias/variance compromise
(this often requires a grid search =⇒ costly)
▶ Very many other aspects to be considered: choice of the model (NN, SVR,
RBF, DT, Bayesian, clustering, ARMA/ARIMA, . . . ), regularization, feature
selection, sparsification/pruning, explainability, fairness, . . .
of which we don’t give a E here
Wrap-up about synthetic models 9
▶ Available, hopefully large set of observations ( y0 , x0 ), . . . , ( ym , xm ):
xh typically ≫ a single real (a long vector, a tree, . . . ), often yh too
▶ Sometimes yh not even there (unsupervised learning)
▶ Want to learn from the data ≡ construct a model
▶ Choose the form among one of the not too many different options,
solve the fitting problem minc L( c ) to find the optimal parameters
▶ Choose the hyperparameters to find the best bias/variance compromise
(this often requires a grid search =⇒ costly)
▶ Very many other aspects to be considered: choice of the model (NN, SVR,
RBF, DT, Bayesian, clustering, ARMA/ARIMA, . . . ), regularization, feature
selection, sparsification/pruning, explainability, fairness, . . .
of which we don’t give a E here
except insomuch as each of these things impact on optimization, which they do
▶ All that you will see in Machine Learning, Data Mining, Laboratory of Data
Science, Big Data Analytics, Text Analytics, Social Networks Analysis,
Distributed Data Analysis and Mining, Statistics for Data Science, . . .
What are we interested in here 10
▶ This course is about a different set of questions, such as:
1. how do you solve a fitting problem? how are optimal solutions characterised?
2. if I choose model XYZ, can the fitting problem be solved efficiently?
3. if I choose model XYZ, which algorithms exist to solve the fitting problem?
4. how the cost of solving the fitting problem changes when the characteristics
of x and y (number, size, sparsity, . . . ) change?
5. which algorithm is best to solve the fitting problem for model XYZ in case
x and y have characteristics ABC?
▶ The optimization backbone of data science
What are we interested in here 10
▶ This course is about a different set of questions, such as:
1. how do you solve a fitting problem? how are optimal solutions characterised?
2. if I choose model XYZ, can the fitting problem be solved efficiently?
3. if I choose model XYZ, which algorithms exist to solve the fitting problem?
4. how the cost of solving the fitting problem changes when the characteristics
of x and y (number, size, sparsity, . . . ) change?
5. which algorithm is best to solve the fitting problem for model XYZ in case
x and y have characteristics ABC?
▶ The optimization backbone of data science and much else beyond
▶ Typical of data science is to work with a small-ish set of models;
here you will understand (a part of) why they have been chosem
▶ Typical of optimization is to let you build your model (within reason)
▶ In fact, so far data science ≡ descriptive analytics ≡ how things are
▶ But optimization ≡ prescriptive analytics ≡ how things should be
(sometimes) is “the last step of data science”
▶ Let’s see an example
Outline
Logistic
Motivation I: Optimization Methods for Data Science
Motivation II: Data Science Techniques for Prescriptive Analytics
Mindset
Wrap up
Example, take II: optimal sizing of an Energy Community battery 11
▶ Why estimating the daily energy production at 5-minutes resolution?
To estimate how large a battery you should buy
▶ Battery ≡ selling the energy to the market when it’s more convenient
▶ Price changes hourly with significant swings
▶ But battery costly, have to precisely decide how it will be used
=⇒ have to precisely estimate how much energy will be produced
▶ Necessary but not sufficient input for optimal scheduling problem
considering technical constraints (max % discharge at every interval)
▶ Plus, battery only comes in discrete chunks
▶ All in all, optimal sizing requires optimal scheduling
Optimal sizing of an Energy Community battery, the data 12
▶ Real Italian e / MWh 08-08-2021
Optimal sizing of an Energy Community battery, the data 12
▶ Real Italian e / MWh 08-08-2021 . . . you don’t want to see 08-08-2022
Optimal sizing of an Energy Community: analytic model I 13
▶ Data for the analytic model:
1. T = { 0 , 1 , . . . , 287 } set of time periods, T− = T  { 287 }
2. for each t ∈ T, vt = energy production and pt = energy price
3. for each battery module, capacity (60 MHw), ammortised cost (80 e / MWh),
max % charge/discharge in one time period (10)
4. energy loss for charging/discharging the battery (1%)
▶ Variables of the analytic model:
1. bt = amount of energy in the battery at the beginning of period t
2. st / et : energy stored in / extracted from the battery within period t
3. mt : energy sold to the energy market within period t
4. z: number of battery modules bought
▶ Problem is periodic: bt must be equal at first and last period
Optimal sizing of an Energy Community: analytic model II 14
max
P
t∈T ptmt − 480z (1)
0 ≤ bt ≤ 60z t ∈ T (2)
bt + 0.99 ∗ st − et = bt+1 t ∈ T− (3)
b287 + 0.99 ∗ s287 − e287 = b0 (4)
mt = 0.99et − st + vt t ∈ T (5)
0 ≤ et ≤ 6z t ∈ T (6)
0 ≤ st ≤ 6z t ∈ T (7)
z ∈ N (8)
▶ May look complicated to the untrained eye, but in fact quite simple
▶ (2) battery capacity, (6)–(7) charging/discharging limit
▶ (3)–(4) battery energy flow, (5) defines mt (may be projected away)
▶ (8) integrality constraint: bad, but only one variable
▶ Easy to write and solve in practice, let’s see
Optimal sizing of an Energy Community: wrap up 15
▶ Synthetic / descriptive model is an input to the analytic / prescriptive model
▶ Better data leads to better decisions
▶ Don’t read too much in the example (artificial), but general concept holds
Optimal sizing of an Energy Community: wrap up 15
▶ Synthetic / descriptive model is an input to the analytic / prescriptive model
▶ Better data leads to better decisions
▶ Don’t read too much in the example (artificial), but general concept holds
▶ Need a prescriptive model be analytic? In principle no
▶ Could use reinforcement learning to learn the optimal decision rule
▶ Very good for Go, Atari games, protein folding
Optimal sizing of an Energy Community: wrap up 15
▶ Synthetic / descriptive model is an input to the analytic / prescriptive model
▶ Better data leads to better decisions
▶ Don’t read too much in the example (artificial), but general concept holds
▶ Need a prescriptive model be analytic? In principle no
▶ Could use reinforcement learning to learn the optimal decision rule
▶ Very good for Go, Atari games, protein folding . . . good and needed here?
▶ Data not there, so a simulator (“digital twin”) needed =⇒
no less information on the system than the analytical model needs (likely more)
▶ Analytical MILP model easy to write and solve with established tools,
guarantees optimal solutions, fast enough (in this case)
▶ Same tools can be used in zillions other cases
An aside (not really): synthetic vs. analytic, i.e., AI/ML vs. OR 16
▶ Are synthetic models better than analytic models?
An aside (not really): synthetic vs. analytic, i.e., AI/ML vs. OR 16
▶ Are synthetic models better than analytic models?
The question is ill-posed: better for what?
An aside (not really): synthetic vs. analytic, i.e., AI/ML vs. OR 16
▶ Are synthetic models better than analytic models?
The question is ill-posed: better for what?
▶ Analytic and synthetic models are just very very different,
the difference between reptilian brain and neocortex, intuition and proof
▶ Synthetic models ≡ AI = Artificial Intuition ≈ subconscious:
quickly takes hopefully good decisions, can fail, may not know why
(which is OK in many cases, in particular if you can’t wait longer)
▶ Analytic models ≡ mathemamical optimization ≈ logical brain:
slowly carves its way through an optimal decision, and can prove it
(which is OK if you really need to be sure and you can afford to wait)
An aside (not really): synthetic vs. analytic, i.e., AI/ML vs. OR 16
▶ Are synthetic models better than analytic models?
The question is ill-posed: better for what?
▶ Analytic and synthetic models are just very very different,
the difference between reptilian brain and neocortex, intuition and proof
▶ Synthetic models ≡ AI = Artificial Intuition ≈ subconscious:
quickly takes hopefully good decisions, can fail, may not know why
(which is OK in many cases, in particular if you can’t wait longer)
▶ Analytic models ≡ mathemamical optimization ≈ logical brain:
slowly carves its way through an optimal decision, and can prove it
(which is OK if you really need to be sure and you can afford to wait)
▶ Golden rule of all (computer) science: no tool is perfect for everything
1. would you write the Linux kernel in Javascript? a web page in assembler?
2. would you prove a theorem to decide how to run away from an hungry tiger?
3. would you only rely on intuition to design a nuclear reactor?
The case for pacific co-existence between analytic and synthetic 17
▶ They tried to use analytic models to develop general intelligence
https://en.wikipedia.org/wiki/Fifth_generation_computer
but it didn’t end up well: https://en.wikipedia.org/wiki/AI_winter
▶ Obvious in hindsight (which is 20/20): proving theorems is not how we work
The case for pacific co-existence between analytic and synthetic 17
▶ They tried to use analytic models to develop general intelligence
https://en.wikipedia.org/wiki/Fifth_generation_computer
but it didn’t end up well: https://en.wikipedia.org/wiki/AI_winter
▶ Obvious in hindsight (which is 20/20): proving theorems is not how we work
▶ Could it still work? In principle yes, but (likely) P ̸= NP: proving theorems
quickly enough will never be possible in general (QC will not help)
The case for pacific co-existence between analytic and synthetic 17
▶ They tried to use analytic models to develop general intelligence
https://en.wikipedia.org/wiki/Fifth_generation_computer
but it didn’t end up well: https://en.wikipedia.org/wiki/AI_winter
▶ Obvious in hindsight (which is 20/20): proving theorems is not how we work
▶ Could it still work? In principle yes, but (likely) P ̸= NP: proving theorems
quickly enough will never be possible in general (QC will not help)
▶ They are trying to replace all analytic models with synthetic ones,
with only partial success, and anyway always without a guarantee
The case for pacific co-existence between analytic and synthetic 17
▶ They tried to use analytic models to develop general intelligence
https://en.wikipedia.org/wiki/Fifth_generation_computer
but it didn’t end up well: https://en.wikipedia.org/wiki/AI_winter
▶ Obvious in hindsight (which is 20/20): proving theorems is not how we work
▶ Could it still work? In principle yes, but (likely) P ̸= NP: proving theorems
quickly enough will never be possible in general (QC will not help)
▶ They are trying to replace all analytic models with synthetic ones,
with only partial success, and anyway always without a guarantee
▶ Could it still work? In principle yes, but I’d wager not since (likely) P ̸= NP
=⇒ solving problems inherently requires exponentially many information,
if AI could do that it would be with little information =⇒ P = NP
(but then also analytic models would be “easy”)
▶ In other words: even if AI reproduces our intelligence, it will
not be good at proving theorems, since we are not (and nothing can be)
The case for pacific co-existence between analytic and synthetic 17
▶ They tried to use analytic models to develop general intelligence
https://en.wikipedia.org/wiki/Fifth_generation_computer
but it didn’t end up well: https://en.wikipedia.org/wiki/AI_winter
▶ Obvious in hindsight (which is 20/20): proving theorems is not how we work
▶ Could it still work? In principle yes, but (likely) P ̸= NP: proving theorems
quickly enough will never be possible in general (QC will not help)
▶ They are trying to replace all analytic models with synthetic ones,
with only partial success, and anyway always without a guarantee
▶ Could it still work? In principle yes, but I’d wager not since (likely) P ̸= NP
=⇒ solving problems inherently requires exponentially many information,
if AI could do that it would be with little information =⇒ P = NP
(but then also analytic models would be “easy”)
▶ In other words: even if AI reproduces our intelligence, it will
not be good at proving theorems, since we are not (and nothing can be)
▶ Wrap-up: do not expect AI to save you from theorems =⇒ this course
Outline
Logistic
Motivation I: Optimization Methods for Data Science
Motivation II: Data Science Techniques for Prescriptive Analytics
Mindset
Wrap up
Syllabus 18
▶ Linear algebra and calculus background (on a need-to-know basis)
▶ “Simple” unconstrained optimization, univariate and multivariate
▶ Univariate continuous unconstrained optimization
▶ Multivariate continuous unconstrained optimization (++/--gradient)
▶ Constrained continuous optimization theory
▶ Convexity and Duality (Lagrangian, linear, quadratic, conic, . . . )
▶ Some numerical methods for continuous constrained optimization
▶ Mixed-integer optimization models, sparse hints to software tools
▶ Sparse hints to Data Science applications throughout
Course material 19
▶ Slides prepared by the lecturer + recording of lectures
▶ Code (mostly, but not only, Matlab) + example data
▶ S. Boyd, L. Vandenberghe Convex optimization, Cambridge Un. Press, 2008
(http://web.stanford.edu/~boyd/cvxbook/)
▶ M.S. Bazaraa, H.D. Sherali, C.M. Shetty Nonlinear programming: theory and
algorithms, Wiley & Sons, 2006
▶ D.G. Luenberger, Y. Ye Linear and Nonlinear Programming, Springer
International Series in Operations Research & Management Science, 2008
▶ J. Nocedal, S. Wright Numerical Optimization, Springer Series in Operations
Research and Financial Engineering, 2006
▶ J. Lee A First Course in Linear Optimization Reex Press, 2013 (https:
//sites.google.com/site/jonleewebpage/home/publications/#book)
▶ F. Schoen Optimization Models, free e-book version, 2022 (https:
//webgol.dinfo.unifi.it/OptimizationModels/contents.html)
▶ Pointers to relevant web resources for specific themes
Where we want to go, how we get there 20
▶ Main aim of the course: get a general understanding of several different
classes of optimization problems and the corresponding solution algorithms
▶ Sought-after ability: quickly understand any new one you will see (or
invent yourself) and have the means to analyse it (is it better than others?)
▶ Optimization problems and algorithms are mathematical objects =⇒
reasoning about them is proving theorems (+ some hand-waving)
▶ But theorems often fail to paint the whole picture =⇒
implementation and practical tests are also crucial
▶ Aiming at a good balance between the theoretical and the practical side
▶ Nontrivial since it depends on your background, which is varied
▶ Self-contained course: little previous math/optimization background assumed
▶ Two different tracks for grading
Grading: two different tracks 21
Yes there are two paths you can go by, but in the long road
There is still time to change the road you’re on
Grading: two different tracks 21
▶ Grading on both theory and practice too heavy, have to choose one
▶ Best choice depends on your background/tastes, so choice is yours
▶ Two different grading tracks:
1. theoretical: individual, grading = written + oral exam
2. practical: 2-person groups, grading = didactic project + oral exam
▶ Free choice + changing track possible, but only one project per person per year
=⇒ once project done/renounced, theoretical track mandatory
▶ Special SMS++ projects for C++ kamikaze, few minor perks
▶ Theoretical track focuses on theory, but assumes some familiarity with
implementations/experiments seen during the course
▶ Practical track focuses on project (implementations/tests), but assumes
some familiarity with the theory seen during the course
▶ An experiment: you’ll vote with your feet and we’ll see how it goes
Outline
Logistic
Motivation I: Optimization Methods for Data Science
Motivation II: Data Science Techniques for Prescriptive Analytics
Mindset
Wrap up
Wrap up: what’s this course about 22
▶ Mathematical optimization and its two (main) different roles in Data Science:
1. as a tool for developing mathematical models and solving fitting problems
2. as prescriptive analytics, “the last step of the decision process”
▶ A strictly need-to-know review of the underlying mathematical concepts
(calculus, linear algebra, numerical analysis, . . . )
▶ Some hands-on experience with the pracicalities and pitfalls of implementations
▶ Focus on easy problems (linear, quadratic, conic, convex) or local optima,
since in Data Science size is huge (hard because large, not hard because hard),
and besides the global optimal solution is not (necessarily) what you want
▶ Understanding if/why a problem is easy, a bit about what to do if it it not
▶ Data Science the main source of models/problems, not our real focus:
Data Science is ̸= and ≫ than this, but you’ll see plenty of that elsewhere

More Related Content

Similar to 0-introduction.pdf

Useing PSO to optimize logit model with Tensorflow
Useing PSO to optimize logit model with TensorflowUseing PSO to optimize logit model with Tensorflow
Useing PSO to optimize logit model with TensorflowYi-Fan Liou
 
Machine Learning Approach.pptx
Machine Learning Approach.pptxMachine Learning Approach.pptx
Machine Learning Approach.pptxCYPatrickKwee
 
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Simplilearn
 
Optimization of power systems - old and new tools
Optimization of power systems - old and new toolsOptimization of power systems - old and new tools
Optimization of power systems - old and new toolsOlivier Teytaud
 
Tools for Discrete Time Control; Application to Power Systems
Tools for Discrete Time Control; Application to Power SystemsTools for Discrete Time Control; Application to Power Systems
Tools for Discrete Time Control; Application to Power SystemsOlivier Teytaud
 
G. Barcaroli, The use of machine learning in official statistics
G. Barcaroli, The use of machine learning in official statisticsG. Barcaroli, The use of machine learning in official statistics
G. Barcaroli, The use of machine learning in official statisticsIstituto nazionale di statistica
 
computer notes - Data Structures - 1
computer notes - Data Structures - 1computer notes - Data Structures - 1
computer notes - Data Structures - 1ecomputernotes
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 
CS3114_09212011.ppt
CS3114_09212011.pptCS3114_09212011.ppt
CS3114_09212011.pptArumugam90
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithmsArunangsu Sahu
 
THESLING-PETER-6019098-EFR-THESIS
THESLING-PETER-6019098-EFR-THESISTHESLING-PETER-6019098-EFR-THESIS
THESLING-PETER-6019098-EFR-THESISPeter Thesling
 
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...IJCSES Journal
 
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...ijcseit
 
Machine Learning Guide maXbox Starter62
Machine Learning Guide maXbox Starter62Machine Learning Guide maXbox Starter62
Machine Learning Guide maXbox Starter62Max Kleiner
 
Introduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesIntroduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesSơn Còm Nhom
 
Lecture_1_-_Course_Overview_(Inked).pdf
Lecture_1_-_Course_Overview_(Inked).pdfLecture_1_-_Course_Overview_(Inked).pdf
Lecture_1_-_Course_Overview_(Inked).pdfRTEFGDFGJU
 
Machine Learning basics
Machine Learning basicsMachine Learning basics
Machine Learning basicsNeeleEilers
 

Similar to 0-introduction.pdf (20)

Useing PSO to optimize logit model with Tensorflow
Useing PSO to optimize logit model with TensorflowUseing PSO to optimize logit model with Tensorflow
Useing PSO to optimize logit model with Tensorflow
 
ESE-Eyes-2023.pdf
ESE-Eyes-2023.pdfESE-Eyes-2023.pdf
ESE-Eyes-2023.pdf
 
Machine Learning Approach.pptx
Machine Learning Approach.pptxMachine Learning Approach.pptx
Machine Learning Approach.pptx
 
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
 
Optimization of power systems - old and new tools
Optimization of power systems - old and new toolsOptimization of power systems - old and new tools
Optimization of power systems - old and new tools
 
Tools for Discrete Time Control; Application to Power Systems
Tools for Discrete Time Control; Application to Power SystemsTools for Discrete Time Control; Application to Power Systems
Tools for Discrete Time Control; Application to Power Systems
 
G. Barcaroli, The use of machine learning in official statistics
G. Barcaroli, The use of machine learning in official statisticsG. Barcaroli, The use of machine learning in official statistics
G. Barcaroli, The use of machine learning in official statistics
 
computer notes - Data Structures - 1
computer notes - Data Structures - 1computer notes - Data Structures - 1
computer notes - Data Structures - 1
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
CS3114_09212011.ppt
CS3114_09212011.pptCS3114_09212011.ppt
CS3114_09212011.ppt
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithms
 
THESLING-PETER-6019098-EFR-THESIS
THESLING-PETER-6019098-EFR-THESISTHESLING-PETER-6019098-EFR-THESIS
THESLING-PETER-6019098-EFR-THESIS
 
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
 
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
 
2019 Fall Series: Postdoc Seminars - Special Guest Lecture, There is a Kernel...
2019 Fall Series: Postdoc Seminars - Special Guest Lecture, There is a Kernel...2019 Fall Series: Postdoc Seminars - Special Guest Lecture, There is a Kernel...
2019 Fall Series: Postdoc Seminars - Special Guest Lecture, There is a Kernel...
 
Machine Learning Guide maXbox Starter62
Machine Learning Guide maXbox Starter62Machine Learning Guide maXbox Starter62
Machine Learning Guide maXbox Starter62
 
Introduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesIntroduction to Datamining Concept and Techniques
Introduction to Datamining Concept and Techniques
 
Lecture_1_-_Course_Overview_(Inked).pdf
Lecture_1_-_Course_Overview_(Inked).pdfLecture_1_-_Course_Overview_(Inked).pdf
Lecture_1_-_Course_Overview_(Inked).pdf
 
Machine Learning basics
Machine Learning basicsMachine Learning basics
Machine Learning basics
 

Recently uploaded

Call Girl in Bur Dubai O5286O4116 Indian Call Girls in Bur Dubai By VIP Bur D...
Call Girl in Bur Dubai O5286O4116 Indian Call Girls in Bur Dubai By VIP Bur D...Call Girl in Bur Dubai O5286O4116 Indian Call Girls in Bur Dubai By VIP Bur D...
Call Girl in Bur Dubai O5286O4116 Indian Call Girls in Bur Dubai By VIP Bur D...dajasot375
 
Lucknow 💋 Virgin Call Girls Lucknow | Book 8923113531 Extreme Naughty Call Gi...
Lucknow 💋 Virgin Call Girls Lucknow | Book 8923113531 Extreme Naughty Call Gi...Lucknow 💋 Virgin Call Girls Lucknow | Book 8923113531 Extreme Naughty Call Gi...
Lucknow 💋 Virgin Call Girls Lucknow | Book 8923113531 Extreme Naughty Call Gi...anilsa9823
 
Bridge Fight Board by Daniel Johnson dtjohnsonart.com
Bridge Fight Board by Daniel Johnson dtjohnsonart.comBridge Fight Board by Daniel Johnson dtjohnsonart.com
Bridge Fight Board by Daniel Johnson dtjohnsonart.comthephillipta
 
Mandi House Call Girls : ☎ 8527673949, Low rate Call Girls
Mandi House Call Girls : ☎ 8527673949, Low rate Call GirlsMandi House Call Girls : ☎ 8527673949, Low rate Call Girls
Mandi House Call Girls : ☎ 8527673949, Low rate Call Girlsashishs7044
 
Roadrunner Lodge, Motel/Residence, Tucumcari NM
Roadrunner Lodge, Motel/Residence, Tucumcari NMRoadrunner Lodge, Motel/Residence, Tucumcari NM
Roadrunner Lodge, Motel/Residence, Tucumcari NMroute66connected
 
FULL ENJOY - 9953040155 Call Girls in Laxmi Nagar | Delhi
FULL ENJOY - 9953040155 Call Girls in Laxmi Nagar | DelhiFULL ENJOY - 9953040155 Call Girls in Laxmi Nagar | Delhi
FULL ENJOY - 9953040155 Call Girls in Laxmi Nagar | DelhiMalviyaNagarCallGirl
 
SHIVNA SAHITYIKI APRIL JUNE 2024 Magazine
SHIVNA SAHITYIKI APRIL JUNE 2024 MagazineSHIVNA SAHITYIKI APRIL JUNE 2024 Magazine
SHIVNA SAHITYIKI APRIL JUNE 2024 MagazineShivna Prakashan
 
Karachi Escorts | +923070433345 | Escort Service in Karachi
Karachi Escorts | +923070433345 | Escort Service in KarachiKarachi Escorts | +923070433345 | Escort Service in Karachi
Karachi Escorts | +923070433345 | Escort Service in KarachiAyesha Khan
 
FULL ENJOY - 9953040155 Call Girls in Moti Nagar | Delhi
FULL ENJOY - 9953040155 Call Girls in Moti Nagar | DelhiFULL ENJOY - 9953040155 Call Girls in Moti Nagar | Delhi
FULL ENJOY - 9953040155 Call Girls in Moti Nagar | DelhiMalviyaNagarCallGirl
 
Jagat Puri Call Girls : ☎ 8527673949, Low rate Call Girls
Jagat Puri Call Girls : ☎ 8527673949, Low rate Call GirlsJagat Puri Call Girls : ☎ 8527673949, Low rate Call Girls
Jagat Puri Call Girls : ☎ 8527673949, Low rate Call Girlsashishs7044
 
Retail Store Scavanger Hunt - Foundation College Park
Retail Store Scavanger Hunt - Foundation College ParkRetail Store Scavanger Hunt - Foundation College Park
Retail Store Scavanger Hunt - Foundation College Parkjosebenzaquen
 
Strip Zagor Extra 322 - Dva ortaka.pdf
Strip   Zagor Extra 322 - Dva ortaka.pdfStrip   Zagor Extra 322 - Dva ortaka.pdf
Strip Zagor Extra 322 - Dva ortaka.pdfStripovizijacom
 
Alex and Chloe by Daniel Johnson Storyboard
Alex and Chloe by Daniel Johnson StoryboardAlex and Chloe by Daniel Johnson Storyboard
Alex and Chloe by Daniel Johnson Storyboardthephillipta
 
9654467111 Call Girls In Noida Sector 62 Short 1500 Night 6000
9654467111 Call Girls In Noida Sector 62 Short 1500 Night 60009654467111 Call Girls In Noida Sector 62 Short 1500 Night 6000
9654467111 Call Girls In Noida Sector 62 Short 1500 Night 6000Sapana Sha
 
Kishangarh Call Girls : ☎ 8527673949, Low rate Call Girls
Kishangarh Call Girls : ☎ 8527673949, Low rate Call GirlsKishangarh Call Girls : ☎ 8527673949, Low rate Call Girls
Kishangarh Call Girls : ☎ 8527673949, Low rate Call Girlsashishs7044
 
Pragati Maidan Call Girls : ☎ 8527673949, Low rate Call Girls
Pragati Maidan Call Girls : ☎ 8527673949, Low rate Call GirlsPragati Maidan Call Girls : ☎ 8527673949, Low rate Call Girls
Pragati Maidan Call Girls : ☎ 8527673949, Low rate Call Girlsashishs7044
 
Patrakarpuram ) Cheap Call Girls In Lucknow (Adult Only) 🧈 8923113531 𓀓 Esco...
Patrakarpuram ) Cheap Call Girls In Lucknow  (Adult Only) 🧈 8923113531 𓀓 Esco...Patrakarpuram ) Cheap Call Girls In Lucknow  (Adult Only) 🧈 8923113531 𓀓 Esco...
Patrakarpuram ) Cheap Call Girls In Lucknow (Adult Only) 🧈 8923113531 𓀓 Esco...akbard9823
 
Olivia Cox. intertextual references.pptx
Olivia Cox. intertextual references.pptxOlivia Cox. intertextual references.pptx
Olivia Cox. intertextual references.pptxLauraFagan6
 
Turn Lock Take Key Storyboard Daniel Johnson
Turn Lock Take Key Storyboard Daniel JohnsonTurn Lock Take Key Storyboard Daniel Johnson
Turn Lock Take Key Storyboard Daniel Johnsonthephillipta
 

Recently uploaded (20)

Call Girl in Bur Dubai O5286O4116 Indian Call Girls in Bur Dubai By VIP Bur D...
Call Girl in Bur Dubai O5286O4116 Indian Call Girls in Bur Dubai By VIP Bur D...Call Girl in Bur Dubai O5286O4116 Indian Call Girls in Bur Dubai By VIP Bur D...
Call Girl in Bur Dubai O5286O4116 Indian Call Girls in Bur Dubai By VIP Bur D...
 
Lucknow 💋 Virgin Call Girls Lucknow | Book 8923113531 Extreme Naughty Call Gi...
Lucknow 💋 Virgin Call Girls Lucknow | Book 8923113531 Extreme Naughty Call Gi...Lucknow 💋 Virgin Call Girls Lucknow | Book 8923113531 Extreme Naughty Call Gi...
Lucknow 💋 Virgin Call Girls Lucknow | Book 8923113531 Extreme Naughty Call Gi...
 
Bridge Fight Board by Daniel Johnson dtjohnsonart.com
Bridge Fight Board by Daniel Johnson dtjohnsonart.comBridge Fight Board by Daniel Johnson dtjohnsonart.com
Bridge Fight Board by Daniel Johnson dtjohnsonart.com
 
Mandi House Call Girls : ☎ 8527673949, Low rate Call Girls
Mandi House Call Girls : ☎ 8527673949, Low rate Call GirlsMandi House Call Girls : ☎ 8527673949, Low rate Call Girls
Mandi House Call Girls : ☎ 8527673949, Low rate Call Girls
 
Bur Dubai Call Girls # 971504361175 # Call Girls In Bur Dubai || (UAE)
Bur Dubai Call Girls # 971504361175 # Call Girls In Bur Dubai || (UAE)Bur Dubai Call Girls # 971504361175 # Call Girls In Bur Dubai || (UAE)
Bur Dubai Call Girls # 971504361175 # Call Girls In Bur Dubai || (UAE)
 
Roadrunner Lodge, Motel/Residence, Tucumcari NM
Roadrunner Lodge, Motel/Residence, Tucumcari NMRoadrunner Lodge, Motel/Residence, Tucumcari NM
Roadrunner Lodge, Motel/Residence, Tucumcari NM
 
FULL ENJOY - 9953040155 Call Girls in Laxmi Nagar | Delhi
FULL ENJOY - 9953040155 Call Girls in Laxmi Nagar | DelhiFULL ENJOY - 9953040155 Call Girls in Laxmi Nagar | Delhi
FULL ENJOY - 9953040155 Call Girls in Laxmi Nagar | Delhi
 
SHIVNA SAHITYIKI APRIL JUNE 2024 Magazine
SHIVNA SAHITYIKI APRIL JUNE 2024 MagazineSHIVNA SAHITYIKI APRIL JUNE 2024 Magazine
SHIVNA SAHITYIKI APRIL JUNE 2024 Magazine
 
Karachi Escorts | +923070433345 | Escort Service in Karachi
Karachi Escorts | +923070433345 | Escort Service in KarachiKarachi Escorts | +923070433345 | Escort Service in Karachi
Karachi Escorts | +923070433345 | Escort Service in Karachi
 
FULL ENJOY - 9953040155 Call Girls in Moti Nagar | Delhi
FULL ENJOY - 9953040155 Call Girls in Moti Nagar | DelhiFULL ENJOY - 9953040155 Call Girls in Moti Nagar | Delhi
FULL ENJOY - 9953040155 Call Girls in Moti Nagar | Delhi
 
Jagat Puri Call Girls : ☎ 8527673949, Low rate Call Girls
Jagat Puri Call Girls : ☎ 8527673949, Low rate Call GirlsJagat Puri Call Girls : ☎ 8527673949, Low rate Call Girls
Jagat Puri Call Girls : ☎ 8527673949, Low rate Call Girls
 
Retail Store Scavanger Hunt - Foundation College Park
Retail Store Scavanger Hunt - Foundation College ParkRetail Store Scavanger Hunt - Foundation College Park
Retail Store Scavanger Hunt - Foundation College Park
 
Strip Zagor Extra 322 - Dva ortaka.pdf
Strip   Zagor Extra 322 - Dva ortaka.pdfStrip   Zagor Extra 322 - Dva ortaka.pdf
Strip Zagor Extra 322 - Dva ortaka.pdf
 
Alex and Chloe by Daniel Johnson Storyboard
Alex and Chloe by Daniel Johnson StoryboardAlex and Chloe by Daniel Johnson Storyboard
Alex and Chloe by Daniel Johnson Storyboard
 
9654467111 Call Girls In Noida Sector 62 Short 1500 Night 6000
9654467111 Call Girls In Noida Sector 62 Short 1500 Night 60009654467111 Call Girls In Noida Sector 62 Short 1500 Night 6000
9654467111 Call Girls In Noida Sector 62 Short 1500 Night 6000
 
Kishangarh Call Girls : ☎ 8527673949, Low rate Call Girls
Kishangarh Call Girls : ☎ 8527673949, Low rate Call GirlsKishangarh Call Girls : ☎ 8527673949, Low rate Call Girls
Kishangarh Call Girls : ☎ 8527673949, Low rate Call Girls
 
Pragati Maidan Call Girls : ☎ 8527673949, Low rate Call Girls
Pragati Maidan Call Girls : ☎ 8527673949, Low rate Call GirlsPragati Maidan Call Girls : ☎ 8527673949, Low rate Call Girls
Pragati Maidan Call Girls : ☎ 8527673949, Low rate Call Girls
 
Patrakarpuram ) Cheap Call Girls In Lucknow (Adult Only) 🧈 8923113531 𓀓 Esco...
Patrakarpuram ) Cheap Call Girls In Lucknow  (Adult Only) 🧈 8923113531 𓀓 Esco...Patrakarpuram ) Cheap Call Girls In Lucknow  (Adult Only) 🧈 8923113531 𓀓 Esco...
Patrakarpuram ) Cheap Call Girls In Lucknow (Adult Only) 🧈 8923113531 𓀓 Esco...
 
Olivia Cox. intertextual references.pptx
Olivia Cox. intertextual references.pptxOlivia Cox. intertextual references.pptx
Olivia Cox. intertextual references.pptx
 
Turn Lock Take Key Storyboard Daniel Johnson
Turn Lock Take Key Storyboard Daniel JohnsonTurn Lock Take Key Storyboard Daniel Johnson
Turn Lock Take Key Storyboard Daniel Johnson
 

0-introduction.pdf

  • 1. Optimization for Data Science: Introduction to the course Antonio Frangioni Department of Computer Science University of Pisa https://www.di.unipi.it/~frangio mailto:frangio@di.unipi.it Optimization Methods for Data Science Master in Data Science and Business Informatics University of Pisa A.Y. 2022/23
  • 2. Outline Logistic Motivation I: Optimization Methods for Data Science Motivation II: Data Science Techniques for Prescriptive Analytics Mindset Wrap up
  • 3. Basic information 1 ▶ Course Schedule ▶ Tue 11:00 – 13:00 (Fib M1) ▶ Fri 9:00 – 11:00 (Fib M1) ▶ Web page: https://elearning.di.unipi.it/course/view.php?id=301 ▶ Team for lectures: https://teams.microsoft.com/l/team/19% 3aCSI78qkZgMUq5HknxX5LkyKP3pn8qGMs9CDyEZbd5U01%40thread.tacv2/ conversations?groupId=227a39ab-9e3b-419f-b7e9-de4fe8c59059& tenantId=c7456b31-a220-47f5-be52-473828670aa1 ▶ Grading in two possible ways: ▶ Written + oral exam (theory) ▶ Didactic project (groups of 2) + oral exam (discussion) ▶ Special projects available for C++ kamikaze C++ original image by Vectorportal.com
  • 4. Outline Logistic Motivation I: Optimization Methods for Data Science Motivation II: Data Science Techniques for Prescriptive Analytics Mindset Wrap up
  • 5. Why this course: the view from the stratoshpere 2 ▶ Huge amounts of data is generated and collected, but one has to make sense of it in order to use it: that’s what data science is ▶ Take something big (data) and therefore unwieldy and produce something small and nimble that can be used in its stead (“actionable”) ▶ That’s a (mathematical) model ▶ Word comes from “modulus”, diminutive from “modus” = “measure”: “small measure”, “measure in the small” (small is good) ▶ Known uses in architecture: proving beforehand that the real building won’t collapse (e.g., Filippo Brunelleschi for the Cupola of the Cathedral of Florence) ▶ Countless many physical models afterwards (planes, cars, . . . ), but mathematics is cheaper than bricks / wood / iron . . . ▶ Yet, mathematical problems can be difficult, too, for various reasons (and, of course, only truly viable after computers) ▶ And most of them remain (likely) difficult for quantum computers, too, https://www.smbc-comics.com/comic/the-talk-3
  • 6. That’s how (Data) Science works, a.k.a., “the decision process” 3 reality mathematics fragment situation ▶ The fundamental cycle of all science
  • 7. That’s how (Data) Science works, a.k.a., “the decision process” 3 reality mathematics fragment situation (instance of) mathematical model modelling data collection ▶ The fundamental cycle of all science
  • 8. That’s how (Data) Science works, a.k.a., “the decision process” 3 reality mathematics fragment situation (instance of) mathematical model prediction solution modelling data collection formula algorithm ▶ The fundamental cycle of all science
  • 9. That’s how (Data) Science works, a.k.a., “the decision process” 3 reality mathematics fragment situation (instance of) mathematical model prediction solution law decision modelling data collection interpretation formula algorithm ▶ The fundamental cycle of all science
  • 10. That’s how (Data) Science works, a.k.a., “the decision process” 3 reality mathematics fragment situation (instance of) mathematical model prediction solution law decision modelling data collection interpretation formula algorithm verification implementation ▶ The fundamental cycle of all science
  • 11. That’s how (Data) Science works, a.k.a., “the decision process” 3 reality mathematics fragment situation (instance of) mathematical model prediction solution law decision modelling data collection interpretation formula algorithm verification implementation program (sw) computer (hw) measure forecast implementation debug execution ▶ The fundamental cycle of all science and its implementation
  • 12. Choosing a mathematical model 4 ▶ How a mathematical model should be:
  • 13. Choosing a mathematical model 4 ▶ How a mathematical model should be: 1. accurate (describes well the process at hand)
  • 14. Choosing a mathematical model 4 ▶ How a mathematical model should be: 1. accurate (describes well the process at hand) 2. computationally inexpensive (gives answers rapidly)
  • 15. Choosing a mathematical model 4 ▶ How a mathematical model should be: 1. accurate (describes well the process at hand) 2. computationally inexpensive (gives answers rapidly) 3. general (can be applied to many different processes)
  • 16. Choosing a mathematical model 4 ▶ How a mathematical model should be: 1. accurate (describes well the process at hand) 2. computationally inexpensive (gives answers rapidly) 3. general (can be applied to many different processes) Typically impossible to have all three!
  • 17. Choosing a mathematical model 4 ▶ How a mathematical model should be: 1. accurate (describes well the process at hand) 2. computationally inexpensive (gives answers rapidly) 3. general (can be applied to many different processes) Typically impossible to have all three! ▶ Choice of the model crucial, trade-offs between efficiency and effectiveness ▶ Two fundamentally different model building approaches: 1. analytic: model each component of the system separately + their interactions, (≈)accurate but hard to construct (need system access + technical knowledge) 2. data-driven / synthetic: don’t expect the model to closely match the underlying system, just to be simple and to (≈)accurately reproduce its observed behaviour ▶ All models are approximate (the map is not the world), but for different reasons ▶ Analytic models: flexible shape, (relatively) few “hand-chosen” parameters ▶ Synthetic models: rigid shape, (very) many automatically chosen parameters ▶ Fitting: find the parameters of the model that best represents the phenomenon, clearly some sort of optimization problem (often a computational bottleneck)
  • 18. Didactic example: sizing an Energy Community battery 5 ▶ Energy Community: pool of prosumers with both generation and consumption pooling resources to reduce (eliminate) reliance from the energy grid ▶ A serious application, don’t just take my word for it: https://ciress.it, https://autens.unipi.it, https://unescochair.unipi.it ▶ Test problem: estimate daily energy production at 5-minutes resolution ▶ Too few measures of noisy process, large inherent random error, but underlying physical process “constant and smooth” ▶ Need to estimate the “constant” part / average of the process (yearly or multi-year planning, it’s the long-term average that counts) ▶ Analytic model possible but requires a lot of information on the system (=⇒ has to be re-done if the system changes) ▶ Synthetic model just need to see the data, let’s try that
  • 19. Estimate an Energy Community Production, the data 6 ▶ Noisy measurement
  • 20. Estimate an Energy Community Production, the data 6 ▶ Noisy measurement but (hopefully) simple/smooth underlying physical process
  • 21. Polynomial interpolation 7 ▶ Available set of observations ( y0 , x0 ), . . . , ( ym , xm ), (m = 287) ▶ Possibly reasonable (??) assumption: the dependence is polynomial, i.e., y = pc ( x ) = c0 + Pk i=1 ci xk for fixed k and k + 1 real parameters c = [ c0 , c+ = [ c1 , . . . , cn ] ] ▶ This would imply that yh = pc ( xh ) for all h, which is not really true for any c ▶ Find the c for which it is less untrue ≡ Linear Least Squares: y =    y0 . . . ym    , X =    1 x0 x2 0 . . . xk 0 . . . . . . . . . ... . . . 1 xm x2 m . . . xk m    , minc L( c ) = y − Xc 2 ▶ Minimize loss function y − Xc 2 = empirical risk = how much the model fails the predict the phenomenon on the available observations ▶ Why exactly that? Because it is simple to solve, in Matlab is just c = y / X ▶ Let’s see this in practice
  • 22. But how about k? 8 ▶ LSS choses “optimal” c for fixed k, but how to choose k? ▶ Jointly optimizing over c and k is hard, have to try all (∞-ly many) values ▶ Even worse: how do you measure how good a given k is? ▶ If you knew the ground truth you could check, let’s see what would happen
  • 23. But how about k? 8 ▶ LSS choses “optimal” c for fixed k, but how to choose k? ▶ Jointly optimizing over c and k is hard, have to try all (∞-ly many) values ▶ Even worse: how do you measure how good a given k is? ▶ If you knew the ground truth you could check, let’s see what would happen ▶ ∃ k ≡ optimal model complexity: less =⇒ model too coarse to map the phenomenon, more =⇒ learning the noise, not the phenomenon ▶ Underfitting vs. overfitting, a.k.a. “the bias/variance dilemma” ▶ But we don’t know the ground truth, we need a proxy ▶ What we can measure: how good is the model at predicting data it has not seen =⇒ k-fold cross-validation (not the same k) ▶ Let’s see what this gives:
  • 24. But how about k? 8 ▶ LSS choses “optimal” c for fixed k, but how to choose k? ▶ Jointly optimizing over c and k is hard, have to try all (∞-ly many) values ▶ Even worse: how do you measure how good a given k is? ▶ If you knew the ground truth you could check, let’s see what would happen ▶ ∃ k ≡ optimal model complexity: less =⇒ model too coarse to map the phenomenon, more =⇒ learning the noise, not the phenomenon ▶ Underfitting vs. overfitting, a.k.a. “the bias/variance dilemma” ▶ But we don’t know the ground truth, we need a proxy ▶ What we can measure: how good is the model at predicting data it has not seen =⇒ k-fold cross-validation (not the same k) ▶ Let’s see what this gives: roughly the same answer (by chance? or not?)
  • 25. Wrap-up about synthetic models 9 ▶ Available, hopefully large set of observations ( y0 , x0 ), . . . , ( ym , xm ): xh typically ≫ a single real (a long vector, a tree, . . . ), often yh too ▶ Sometimes yh not even there (unsupervised learning) ▶ Want to learn from the data ≡ construct a model ▶ Choose the form among one of the not too many different options, solve the fitting problem minc L( c ) to find the optimal parameters ▶ Choose the hyperparameters to find the best bias/variance compromise (this often requires a grid search =⇒ costly) ▶ Very many other aspects to be considered: choice of the model (NN, SVR, RBF, DT, Bayesian, clustering, ARMA/ARIMA, . . . ), regularization, feature selection, sparsification/pruning, explainability, fairness, . . .
  • 26. Wrap-up about synthetic models 9 ▶ Available, hopefully large set of observations ( y0 , x0 ), . . . , ( ym , xm ): xh typically ≫ a single real (a long vector, a tree, . . . ), often yh too ▶ Sometimes yh not even there (unsupervised learning) ▶ Want to learn from the data ≡ construct a model ▶ Choose the form among one of the not too many different options, solve the fitting problem minc L( c ) to find the optimal parameters ▶ Choose the hyperparameters to find the best bias/variance compromise (this often requires a grid search =⇒ costly) ▶ Very many other aspects to be considered: choice of the model (NN, SVR, RBF, DT, Bayesian, clustering, ARMA/ARIMA, . . . ), regularization, feature selection, sparsification/pruning, explainability, fairness, . . . of which we don’t give a E here
  • 27. Wrap-up about synthetic models 9 ▶ Available, hopefully large set of observations ( y0 , x0 ), . . . , ( ym , xm ): xh typically ≫ a single real (a long vector, a tree, . . . ), often yh too ▶ Sometimes yh not even there (unsupervised learning) ▶ Want to learn from the data ≡ construct a model ▶ Choose the form among one of the not too many different options, solve the fitting problem minc L( c ) to find the optimal parameters ▶ Choose the hyperparameters to find the best bias/variance compromise (this often requires a grid search =⇒ costly) ▶ Very many other aspects to be considered: choice of the model (NN, SVR, RBF, DT, Bayesian, clustering, ARMA/ARIMA, . . . ), regularization, feature selection, sparsification/pruning, explainability, fairness, . . . of which we don’t give a E here except insomuch as each of these things impact on optimization, which they do ▶ All that you will see in Machine Learning, Data Mining, Laboratory of Data Science, Big Data Analytics, Text Analytics, Social Networks Analysis, Distributed Data Analysis and Mining, Statistics for Data Science, . . .
  • 28. What are we interested in here 10 ▶ This course is about a different set of questions, such as: 1. how do you solve a fitting problem? how are optimal solutions characterised? 2. if I choose model XYZ, can the fitting problem be solved efficiently? 3. if I choose model XYZ, which algorithms exist to solve the fitting problem? 4. how the cost of solving the fitting problem changes when the characteristics of x and y (number, size, sparsity, . . . ) change? 5. which algorithm is best to solve the fitting problem for model XYZ in case x and y have characteristics ABC? ▶ The optimization backbone of data science
  • 29. What are we interested in here 10 ▶ This course is about a different set of questions, such as: 1. how do you solve a fitting problem? how are optimal solutions characterised? 2. if I choose model XYZ, can the fitting problem be solved efficiently? 3. if I choose model XYZ, which algorithms exist to solve the fitting problem? 4. how the cost of solving the fitting problem changes when the characteristics of x and y (number, size, sparsity, . . . ) change? 5. which algorithm is best to solve the fitting problem for model XYZ in case x and y have characteristics ABC? ▶ The optimization backbone of data science and much else beyond ▶ Typical of data science is to work with a small-ish set of models; here you will understand (a part of) why they have been chosem ▶ Typical of optimization is to let you build your model (within reason) ▶ In fact, so far data science ≡ descriptive analytics ≡ how things are ▶ But optimization ≡ prescriptive analytics ≡ how things should be (sometimes) is “the last step of data science” ▶ Let’s see an example
  • 30. Outline Logistic Motivation I: Optimization Methods for Data Science Motivation II: Data Science Techniques for Prescriptive Analytics Mindset Wrap up
  • 31. Example, take II: optimal sizing of an Energy Community battery 11 ▶ Why estimating the daily energy production at 5-minutes resolution? To estimate how large a battery you should buy ▶ Battery ≡ selling the energy to the market when it’s more convenient ▶ Price changes hourly with significant swings ▶ But battery costly, have to precisely decide how it will be used =⇒ have to precisely estimate how much energy will be produced ▶ Necessary but not sufficient input for optimal scheduling problem considering technical constraints (max % discharge at every interval) ▶ Plus, battery only comes in discrete chunks ▶ All in all, optimal sizing requires optimal scheduling
  • 32. Optimal sizing of an Energy Community battery, the data 12 ▶ Real Italian e / MWh 08-08-2021
  • 33. Optimal sizing of an Energy Community battery, the data 12 ▶ Real Italian e / MWh 08-08-2021 . . . you don’t want to see 08-08-2022
  • 34. Optimal sizing of an Energy Community: analytic model I 13 ▶ Data for the analytic model: 1. T = { 0 , 1 , . . . , 287 } set of time periods, T− = T { 287 } 2. for each t ∈ T, vt = energy production and pt = energy price 3. for each battery module, capacity (60 MHw), ammortised cost (80 e / MWh), max % charge/discharge in one time period (10) 4. energy loss for charging/discharging the battery (1%) ▶ Variables of the analytic model: 1. bt = amount of energy in the battery at the beginning of period t 2. st / et : energy stored in / extracted from the battery within period t 3. mt : energy sold to the energy market within period t 4. z: number of battery modules bought ▶ Problem is periodic: bt must be equal at first and last period
  • 35. Optimal sizing of an Energy Community: analytic model II 14 max P t∈T ptmt − 480z (1) 0 ≤ bt ≤ 60z t ∈ T (2) bt + 0.99 ∗ st − et = bt+1 t ∈ T− (3) b287 + 0.99 ∗ s287 − e287 = b0 (4) mt = 0.99et − st + vt t ∈ T (5) 0 ≤ et ≤ 6z t ∈ T (6) 0 ≤ st ≤ 6z t ∈ T (7) z ∈ N (8) ▶ May look complicated to the untrained eye, but in fact quite simple ▶ (2) battery capacity, (6)–(7) charging/discharging limit ▶ (3)–(4) battery energy flow, (5) defines mt (may be projected away) ▶ (8) integrality constraint: bad, but only one variable ▶ Easy to write and solve in practice, let’s see
  • 36. Optimal sizing of an Energy Community: wrap up 15 ▶ Synthetic / descriptive model is an input to the analytic / prescriptive model ▶ Better data leads to better decisions ▶ Don’t read too much in the example (artificial), but general concept holds
  • 37. Optimal sizing of an Energy Community: wrap up 15 ▶ Synthetic / descriptive model is an input to the analytic / prescriptive model ▶ Better data leads to better decisions ▶ Don’t read too much in the example (artificial), but general concept holds ▶ Need a prescriptive model be analytic? In principle no ▶ Could use reinforcement learning to learn the optimal decision rule ▶ Very good for Go, Atari games, protein folding
  • 38. Optimal sizing of an Energy Community: wrap up 15 ▶ Synthetic / descriptive model is an input to the analytic / prescriptive model ▶ Better data leads to better decisions ▶ Don’t read too much in the example (artificial), but general concept holds ▶ Need a prescriptive model be analytic? In principle no ▶ Could use reinforcement learning to learn the optimal decision rule ▶ Very good for Go, Atari games, protein folding . . . good and needed here? ▶ Data not there, so a simulator (“digital twin”) needed =⇒ no less information on the system than the analytical model needs (likely more) ▶ Analytical MILP model easy to write and solve with established tools, guarantees optimal solutions, fast enough (in this case) ▶ Same tools can be used in zillions other cases
  • 39. An aside (not really): synthetic vs. analytic, i.e., AI/ML vs. OR 16 ▶ Are synthetic models better than analytic models?
  • 40. An aside (not really): synthetic vs. analytic, i.e., AI/ML vs. OR 16 ▶ Are synthetic models better than analytic models? The question is ill-posed: better for what?
  • 41. An aside (not really): synthetic vs. analytic, i.e., AI/ML vs. OR 16 ▶ Are synthetic models better than analytic models? The question is ill-posed: better for what? ▶ Analytic and synthetic models are just very very different, the difference between reptilian brain and neocortex, intuition and proof ▶ Synthetic models ≡ AI = Artificial Intuition ≈ subconscious: quickly takes hopefully good decisions, can fail, may not know why (which is OK in many cases, in particular if you can’t wait longer) ▶ Analytic models ≡ mathemamical optimization ≈ logical brain: slowly carves its way through an optimal decision, and can prove it (which is OK if you really need to be sure and you can afford to wait)
  • 42. An aside (not really): synthetic vs. analytic, i.e., AI/ML vs. OR 16 ▶ Are synthetic models better than analytic models? The question is ill-posed: better for what? ▶ Analytic and synthetic models are just very very different, the difference between reptilian brain and neocortex, intuition and proof ▶ Synthetic models ≡ AI = Artificial Intuition ≈ subconscious: quickly takes hopefully good decisions, can fail, may not know why (which is OK in many cases, in particular if you can’t wait longer) ▶ Analytic models ≡ mathemamical optimization ≈ logical brain: slowly carves its way through an optimal decision, and can prove it (which is OK if you really need to be sure and you can afford to wait) ▶ Golden rule of all (computer) science: no tool is perfect for everything 1. would you write the Linux kernel in Javascript? a web page in assembler? 2. would you prove a theorem to decide how to run away from an hungry tiger? 3. would you only rely on intuition to design a nuclear reactor?
  • 43. The case for pacific co-existence between analytic and synthetic 17 ▶ They tried to use analytic models to develop general intelligence https://en.wikipedia.org/wiki/Fifth_generation_computer but it didn’t end up well: https://en.wikipedia.org/wiki/AI_winter ▶ Obvious in hindsight (which is 20/20): proving theorems is not how we work
  • 44. The case for pacific co-existence between analytic and synthetic 17 ▶ They tried to use analytic models to develop general intelligence https://en.wikipedia.org/wiki/Fifth_generation_computer but it didn’t end up well: https://en.wikipedia.org/wiki/AI_winter ▶ Obvious in hindsight (which is 20/20): proving theorems is not how we work ▶ Could it still work? In principle yes, but (likely) P ̸= NP: proving theorems quickly enough will never be possible in general (QC will not help)
  • 45. The case for pacific co-existence between analytic and synthetic 17 ▶ They tried to use analytic models to develop general intelligence https://en.wikipedia.org/wiki/Fifth_generation_computer but it didn’t end up well: https://en.wikipedia.org/wiki/AI_winter ▶ Obvious in hindsight (which is 20/20): proving theorems is not how we work ▶ Could it still work? In principle yes, but (likely) P ̸= NP: proving theorems quickly enough will never be possible in general (QC will not help) ▶ They are trying to replace all analytic models with synthetic ones, with only partial success, and anyway always without a guarantee
  • 46. The case for pacific co-existence between analytic and synthetic 17 ▶ They tried to use analytic models to develop general intelligence https://en.wikipedia.org/wiki/Fifth_generation_computer but it didn’t end up well: https://en.wikipedia.org/wiki/AI_winter ▶ Obvious in hindsight (which is 20/20): proving theorems is not how we work ▶ Could it still work? In principle yes, but (likely) P ̸= NP: proving theorems quickly enough will never be possible in general (QC will not help) ▶ They are trying to replace all analytic models with synthetic ones, with only partial success, and anyway always without a guarantee ▶ Could it still work? In principle yes, but I’d wager not since (likely) P ̸= NP =⇒ solving problems inherently requires exponentially many information, if AI could do that it would be with little information =⇒ P = NP (but then also analytic models would be “easy”) ▶ In other words: even if AI reproduces our intelligence, it will not be good at proving theorems, since we are not (and nothing can be)
  • 47. The case for pacific co-existence between analytic and synthetic 17 ▶ They tried to use analytic models to develop general intelligence https://en.wikipedia.org/wiki/Fifth_generation_computer but it didn’t end up well: https://en.wikipedia.org/wiki/AI_winter ▶ Obvious in hindsight (which is 20/20): proving theorems is not how we work ▶ Could it still work? In principle yes, but (likely) P ̸= NP: proving theorems quickly enough will never be possible in general (QC will not help) ▶ They are trying to replace all analytic models with synthetic ones, with only partial success, and anyway always without a guarantee ▶ Could it still work? In principle yes, but I’d wager not since (likely) P ̸= NP =⇒ solving problems inherently requires exponentially many information, if AI could do that it would be with little information =⇒ P = NP (but then also analytic models would be “easy”) ▶ In other words: even if AI reproduces our intelligence, it will not be good at proving theorems, since we are not (and nothing can be) ▶ Wrap-up: do not expect AI to save you from theorems =⇒ this course
  • 48. Outline Logistic Motivation I: Optimization Methods for Data Science Motivation II: Data Science Techniques for Prescriptive Analytics Mindset Wrap up
  • 49. Syllabus 18 ▶ Linear algebra and calculus background (on a need-to-know basis) ▶ “Simple” unconstrained optimization, univariate and multivariate ▶ Univariate continuous unconstrained optimization ▶ Multivariate continuous unconstrained optimization (++/--gradient) ▶ Constrained continuous optimization theory ▶ Convexity and Duality (Lagrangian, linear, quadratic, conic, . . . ) ▶ Some numerical methods for continuous constrained optimization ▶ Mixed-integer optimization models, sparse hints to software tools ▶ Sparse hints to Data Science applications throughout
  • 50. Course material 19 ▶ Slides prepared by the lecturer + recording of lectures ▶ Code (mostly, but not only, Matlab) + example data ▶ S. Boyd, L. Vandenberghe Convex optimization, Cambridge Un. Press, 2008 (http://web.stanford.edu/~boyd/cvxbook/) ▶ M.S. Bazaraa, H.D. Sherali, C.M. Shetty Nonlinear programming: theory and algorithms, Wiley & Sons, 2006 ▶ D.G. Luenberger, Y. Ye Linear and Nonlinear Programming, Springer International Series in Operations Research & Management Science, 2008 ▶ J. Nocedal, S. Wright Numerical Optimization, Springer Series in Operations Research and Financial Engineering, 2006 ▶ J. Lee A First Course in Linear Optimization Reex Press, 2013 (https: //sites.google.com/site/jonleewebpage/home/publications/#book) ▶ F. Schoen Optimization Models, free e-book version, 2022 (https: //webgol.dinfo.unifi.it/OptimizationModels/contents.html) ▶ Pointers to relevant web resources for specific themes
  • 51. Where we want to go, how we get there 20 ▶ Main aim of the course: get a general understanding of several different classes of optimization problems and the corresponding solution algorithms ▶ Sought-after ability: quickly understand any new one you will see (or invent yourself) and have the means to analyse it (is it better than others?) ▶ Optimization problems and algorithms are mathematical objects =⇒ reasoning about them is proving theorems (+ some hand-waving) ▶ But theorems often fail to paint the whole picture =⇒ implementation and practical tests are also crucial ▶ Aiming at a good balance between the theoretical and the practical side ▶ Nontrivial since it depends on your background, which is varied ▶ Self-contained course: little previous math/optimization background assumed ▶ Two different tracks for grading
  • 52. Grading: two different tracks 21 Yes there are two paths you can go by, but in the long road There is still time to change the road you’re on
  • 53. Grading: two different tracks 21 ▶ Grading on both theory and practice too heavy, have to choose one ▶ Best choice depends on your background/tastes, so choice is yours ▶ Two different grading tracks: 1. theoretical: individual, grading = written + oral exam 2. practical: 2-person groups, grading = didactic project + oral exam ▶ Free choice + changing track possible, but only one project per person per year =⇒ once project done/renounced, theoretical track mandatory ▶ Special SMS++ projects for C++ kamikaze, few minor perks ▶ Theoretical track focuses on theory, but assumes some familiarity with implementations/experiments seen during the course ▶ Practical track focuses on project (implementations/tests), but assumes some familiarity with the theory seen during the course ▶ An experiment: you’ll vote with your feet and we’ll see how it goes
  • 54. Outline Logistic Motivation I: Optimization Methods for Data Science Motivation II: Data Science Techniques for Prescriptive Analytics Mindset Wrap up
  • 55. Wrap up: what’s this course about 22 ▶ Mathematical optimization and its two (main) different roles in Data Science: 1. as a tool for developing mathematical models and solving fitting problems 2. as prescriptive analytics, “the last step of the decision process” ▶ A strictly need-to-know review of the underlying mathematical concepts (calculus, linear algebra, numerical analysis, . . . ) ▶ Some hands-on experience with the pracicalities and pitfalls of implementations ▶ Focus on easy problems (linear, quadratic, conic, convex) or local optima, since in Data Science size is huge (hard because large, not hard because hard), and besides the global optimal solution is not (necessarily) what you want ▶ Understanding if/why a problem is easy, a bit about what to do if it it not ▶ Data Science the main source of models/problems, not our real focus: Data Science is ̸= and ≫ than this, but you’ll see plenty of that elsewhere