IT-Power Services GmbH
Beyond a Machine Learning Model
PyALE: Interpreting ML models with ALE plots.
Dana Jomar
Machine Learning
• What is the expected output of a Machine Learning project?
• Predicting future events
• Learnings about the data
• Insights about the business
• Learnings about current data collection processes
• A deeper understanding of a process or event
Image by rawpixel.com on Freepik
ALE Plots
Accumulated Local Effect Plots
• A machine learning interpretation tool
• Describes how a feature affects the prediction of a model
• Advantages:
• Can handle correlated features
• Faster to compute than alternatives (e.g., partial dependence plots, PDPs)
• Relatively easy to interpret
Photo by isaacmsmith on Unsplash
Source: Interpretable Machine Learning by Christoph Molnar
Photo by Lukas from Pexels
• For the given feature:
• Cut the value range into bins
• Within each bin, for each data point:
• Replace the value of the feature twice: once with the lower bound of the bin and once with the upper bound of the bin
• Predict the target for both virtual data points
• Get the local effect for each bin:
• Compute the average difference between both predictions
• Get the cumulative sum of the local effects
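The steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not PyALE's implementation: the function name `ale_1d`, the quantile-based choice of bin edges, the dict-per-row data layout and the toy linear model are all made up for the example.

```python
def ale_1d(predict, rows, feature, n_bins=4):
    """Accumulated local effects of one numeric feature (uncentered sketch).

    predict: callable that takes one row (a dict) and returns a number
    rows:    list of dicts, one per data point
    """
    values = sorted(r[feature] for r in rows)
    n = len(values)
    # Assumption: bin edges are taken at approximate quantiles of the data.
    edges = sorted({values[round(k * (n - 1) / n_bins)] for k in range(n_bins + 1)})

    accumulated = [0.0]
    for i, (lower, upper) in enumerate(zip(edges[:-1], edges[1:])):
        # Points in (lower, upper]; the first bin also includes its lower edge.
        in_bin = [
            r for r in rows
            if lower < r[feature] <= upper or (i == 0 and r[feature] == lower)
        ]
        # Predict twice per point: feature set to the upper and to the lower bound.
        diffs = [
            predict({**r, feature: upper}) - predict({**r, feature: lower})
            for r in in_bin
        ]
        local_effect = sum(diffs) / len(diffs) if diffs else 0.0
        # Cumulative sum of the local effects.
        accumulated.append(accumulated[-1] + local_effect)
    return edges, accumulated


def toy_model(row):
    # Hypothetical model: linear in x and in a second feature z.
    return 2.0 * row["x"] + 3.0 * row["z"]


# Toy data in which z is perfectly correlated with x.
rows = [{"x": float(i), "z": 0.5 * i} for i in range(9)]
edges, effects = ale_1d(toy_model, rows, "x")
print(edges)
print(effects)
```

Because only the feature of interest is replaced inside each bin, the correlated feature `z` does not leak into the result: the accumulated effect of `x` grows by 2 per unit of `x`. The description in Molnar's book additionally centers the curve so that the mean effect is zero; that step is left out here to stay close to the slide.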
• The effect of the area on real-estate price predictions
Use Case
Not long ago…
Photos by Oleg Magni on Pexels
Research Questions and Goals
• How effective are the measures?
• How long does it take for effects to unfold?
• Machine learning model:
• Not to make future predictions...
• ...but to understand what the model learns from the data
What should the model learn?
Photos by Eva Elijas from Pexels
• One country:
• Set of measures
• Effectiveness of measures is not separable (because they are implemented at the same time)
• Different countries:
• Different sets of measures
• Some measures intersect and some don't
• The effectiveness is "separable": individual measures can be examined
• The model can learn the individual effects from the many different combinations of measures
Data Sources
Photo by Sharon McCutcheon on Unsplash
Publicly available data
• COVID-19 cases
• Johns Hopkins University
• Cumulative number of cases, reported daily
• Per country (214 countries at the time of the project)
• Political measures (non-pharmaceutical interventions)
• CoronaNet Research Project
• 180+ scientists
• Standardised data (16 categories, plus subgroups)
• 198 countries at the time of the project
• Data used: 22 Jan to 17 Aug 2020
Photo by cottonbro from Pexels
Growth rate
Data preparation
• Growth rate within each country
• Relative increases of COVID-19 cases within one country
• Thus comparable between countries
• Exponential growth at the beginning of the pandemic
• Results in a constant growth rate (e.g., 1.05 corresponds to a 5% daily increase)
• The features:
• For each measure, the number of days since enforcement
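A minimal sketch of this preparation step follows. The slides do not give the exact formula, so the day-over-day ratio of cumulative counts is an assumption, as are the function names, the sample counts and the enforcement date.

```python
from datetime import date


def daily_growth_rates(cumulative_cases):
    """Ratio of each day's cumulative count to the previous day's count.

    Assumption: the growth rate is a simple day-over-day ratio. Days whose
    previous count is zero get None because the ratio is undefined.
    """
    return [
        curr / prev if prev > 0 else None
        for prev, curr in zip(cumulative_cases, cumulative_cases[1:])
    ]


def days_since_enforcement(day, enforced_on):
    """Signed number of days between `day` and the enforcement date."""
    return (day - enforced_on).days


# Hypothetical cumulative counts for one country on four consecutive days.
cases = [0, 100, 105, 126]
rates = daily_growth_rates(cases)

# Hypothetical enforcement date of one measure.
measure_start = date(2020, 3, 16)
feature_before = days_since_enforcement(date(2020, 3, 2), measure_start)
feature_after = days_since_enforcement(date(2020, 3, 30), measure_start)

print(rates)
print(feature_before, feature_after)
```

A ratio is scale-free, which is what makes the target comparable between countries of different size. The signed day count is negative before a measure takes effect, so one feature can cover the days before and after enforcement.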
Results
Photo by Lukas from Pexels
• ALE plots: changes in the predicted growth rate
• From 14 days before to 60 days after implementation
• An average over all countries in which the measure was implemented
• Gray lines: random fluctuations (bootstrap)
• The effect of a measure unfolds around 10 days after it has been implemented
Scientific Literature
Photo by Annie Spratt on Unsplash
At the time of the project
• Many simulation studies (based on assumptions)
• Few studies with empirical data
• Focus on the strength of the measures, not on the onset of effects
• Advantage of our approach: temporal patterns of effects can be visualized
Python Package: PyALE
Features
• 1D ALE plots for numerical features, including the option to plot a confidence interval of the accumulated effect
Photo by Lukas from Pexels
• 1D ALE plots for categorical features, including:
• Support for different encoding strategies
• Confidence interval of the accumulated effect
• The distribution of the data points is shown as a rug for numerical features and as bars for categorical features
• Customise the plot by passing a matplotlib figure and axes to the function ale
• 2D plots for numerical features
IT-Power Services GmbH
Anton Afritschgasse 23 . 2512 Tribuswinkel
www.it-ps.at
Dana Jomar
Data Scientist
Dana.jomar@it-ps.at
Thank You!

[DSC DACH 23] Beyond a Machine Learning Model - Dana Jomar
