Computational decision making

A brief lesson on what constitutes computational decision making, from simple regression via various classification methods to deep learning. No maths, only basic concepts to teach the lingo of machine learning to a lay audience.

  1. Computational decision making
     Dr. Boris Adryan @BorisAdryan
  2. What I aim to provide
     ✓ basic vocabulary for computational decision making
     ✓ fundamental concepts of computational decision making
     ✓ a phenomenological introduction to machine learning methods
     ✓ a rough idea of when and how to use these methods
  3. What this presentation isn't
     x hands-on tutorial
     x thorough summary
     x comprehensive guide
     x technical deep dive
     x statistics course
  4. Is this artificial intelligence?

        word = input('Enter a word:')
        for key in British_dictionary:        # check each dictionary headword
            if key.startswith(word):
                print('This is a British word.')
  5. Is this machine learning?

        temperature = float(input('What is the temperature?'))
        if temperature >= 1.0:
            print('Wear shorts.')
        else:
            print('Wear long underwear.')
  6. Definition
     Rule-based decision making on the basis of numeric thresholds, string patterns, etc. is not machine learning. And it is most definitely not artificial intelligence.
  7. But what if… the threshold is inferred at run time?
     "Write software that says how close to Euston you can move if you can afford to spend £650k."
     [chart: average property price (£450k to £1,050k) vs. number of stops from Euston (0 to 12) on the Northern Line]

        table = input_table('cost at station')       # pseudocode
        print(where_x_yields_a_low_enough_y)
  8. Linear regression
     … is probably the simplest "machine learning" method: y = m·x + b. It is an example of supervised learning, because we teach the computer the relation between an input variable ("feature") and an output variable ("label").
     [chart: average property price vs. number of stops from Euston on the Northern Line, with a fitted line]
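     A minimal sketch of fitting such a line with scikit-learn; the stop counts and price figures are invented for illustration, not real Northern Line data:

        import numpy as np
        from sklearn.linear_model import LinearRegression

        # Hypothetical training data: stops from Euston vs. price (in £k)
        stops = np.array([[0], [3], [6], [9], [12]])
        price = np.array([950, 820, 700, 590, 480])

        model = LinearRegression().fit(stops, price)  # learns m and b in y = m*x + b
        print(model.coef_[0], model.intercept_)       # slope m and intercept b
        print(model.predict([[7]]))                   # predicted price 7 stops out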
  9. Linear regression can become arbitrarily complicated.
     The difference between curve fitting in statistics and machine learning is mostly semantics.
     f(number of stops to Euston, square footage, bedrooms, bathrooms, …) → price
     [chart: price as a function of many features, from Euston to High Barnet and from small to large properties]
  10. Classification tasks
     Rather than projecting the feature vector onto a continuous variable, many supervised learning methods identify "class labels", e.g. the Iris species I. setosa, I. virginica and I. versicolor.
     (All images from https://en.wikipedia.org/wiki/Iris_flower_data_set)
  11. Classification tasks
     Supervised learning requires complete input matrices. Missing or nonsensical values have to be replaced or removed. Non-numerical features (think, e.g., "name of colour" or "smell") have to be encoded.

     class label | feature 1 | feature 2 | feature 3 | feature 4
     ----------- | --------- | --------- | --------- | ---------
     1           | 5.1       | 3.5       | 1.4       | 0.2
     1           | 4.9       | 3.0       | 1.4       | 0.2
     2           | 7.0       | 3.2       | 4.7       | 1.4
     3           | 6.3       | 3.3       | 6.0       | 2.5
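     A sketch of such clean-up and encoding with pandas; the column names and values are invented for illustration:

        import pandas as pd

        df = pd.DataFrame({
            'sepal_length': [5.1, 4.9, None, 6.3],          # one missing value
            'colour':       ['blue', 'violet', 'blue', 'violet'],
        })

        # Replace the missing numeric value with the column mean
        df['sepal_length'] = df['sepal_length'].fillna(df['sepal_length'].mean())

        # Encode the non-numerical feature as one-hot (dummy) columns
        df = pd.get_dummies(df, columns=['colour'])
        print(df)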
  12. Classification tasks
     To a first approximation, classification (by regression) aims to find a function that best separates the different class labels: f(sepal width, sepal length) → {1, 2}, here "1" for I. setosa and "2" for I. virginica.
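     One common way to realise such "classification by regression" is logistic regression; a minimal sketch on two Iris features, assuming scikit-learn:

        from sklearn.datasets import load_iris
        from sklearn.linear_model import LogisticRegression

        iris = load_iris()
        mask = iris.target != 1            # keep only I. setosa and I. virginica
        X = iris.data[mask][:, :2]         # sepal length and sepal width
        y = iris.target[mask]

        clf = LogisticRegression(max_iter=1000).fit(X, y)
        print(clf.predict([[5.0, 3.5]]))   # which side of the separating function?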
  13. Decision trees
     A decision tree can be understood as a series of linear separations of the data: e.g. split first on the ratio of sepal width to sepal length (separating off I. virginica), then on the ratio of petal width to sepal width (separating I. setosa from I. versicolor).
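     A minimal decision-tree sketch on the Iris data, assuming scikit-learn; export_text prints the learned thresholds as rules:

        from sklearn.datasets import load_iris
        from sklearn.tree import DecisionTreeClassifier, export_text

        iris = load_iris()
        tree = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)

        # Each node is one linear (single-feature, thresholded) separation
        print(export_text(tree, feature_names=iris.feature_names))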
  14. Random forests
     A collection of decision trees, each trained on a random subset of the data, can minimise the risk of overfitting. By contrast, a single big decision tree trained on all data can effectively describe a single data point.
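     The same idea as a sketch, assuming scikit-learn; each of the 100 trees sees a bootstrap sample of the data:

        from sklearn.datasets import load_iris
        from sklearn.ensemble import RandomForestClassifier

        iris = load_iris()
        forest = RandomForestClassifier(n_estimators=100).fit(iris.data, iris.target)
        print(forest.predict([[5.1, 3.5, 1.4, 0.2]]))   # -> class 0, I. setosa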
  15. Over-/underfitting
     Sloppy separation is called underfitting (high bias), greedy separation overfitting (high variance). Counteracting an overfit is called regularisation, which works by penalising too many features (L1) or too-strong feature weights (L2).
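     A sketch of both penalties on a logistic regression, assuming scikit-learn; C controls the penalty strength (smaller C = stronger regularisation):

        from sklearn.datasets import load_iris
        from sklearn.linear_model import LogisticRegression

        X, y = load_iris(return_X_y=True)

        # L1 penalises the number of non-zero weights (drives some to zero),
        # L2 penalises large weights (shrinks them all).
        l1 = LogisticRegression(penalty='l1', solver='liblinear', C=0.1).fit(X, y)
        l2 = LogisticRegression(penalty='l2', C=0.1).fit(X, y)
        print(l1.coef_)   # sparse: some weights exactly zero
        print(l2.coef_)   # shrunk, but typically all non-zero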
  16. Dimensionality reduction
     Dimensionality reduction aims to reduce the complexity of a dataset (with respect to the number of features). The first principal components are the dimensions that explain most of a dataset's variance.
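     Principal component analysis in a minimal sketch, assuming scikit-learn:

        from sklearn.datasets import load_iris
        from sklearn.decomposition import PCA

        X, _ = load_iris(return_X_y=True)
        pca = PCA(n_components=2)              # keep the first two components
        reduced = pca.fit_transform(X)         # 4 features -> 2 dimensions
        print(pca.explained_variance_ratio_)   # variance explained per component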
  17. Support vector machine
     The SVM aims to provide an ideal separation plane by supporting it with training data vectors (the "support vectors"). A classification margin protects the SVM against overfitting. The decision boundary is wᵀx = 0, flanked by the negative hyperplane wᵀx = -1 and the positive hyperplane wᵀx = 1.
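     A linear SVM sketch, assuming scikit-learn; C trades a wide margin off against misclassified training points:

        from sklearn.datasets import load_iris
        from sklearn.svm import SVC

        X, y = load_iris(return_X_y=True)
        svm = SVC(kernel='linear', C=1.0).fit(X, y)
        print(len(svm.support_vectors_))   # the training vectors supporting the plane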
  18. Kernel trick
     Input data can be projected (Φ) from the "2D" input space into a higher-dimensional "3D" feature space that allows linear separation of otherwise inseparable data. There are different kernels, such as the radial basis function (Gaussian) kernel or the polynomial kernel.
     (Illustration: http://scikit-learn.org/0.18/auto_examples/svm/plot_iris.html)
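     A sketch on a toy dataset that is not linearly separable in the input space, assuming scikit-learn:

        from sklearn.datasets import make_circles
        from sklearn.svm import SVC

        # Two concentric rings: no straight line separates them in 2D
        X, y = make_circles(n_samples=200, factor=0.3, noise=0.05)

        # The RBF (Gaussian) kernel implicitly projects the data into a
        # higher-dimensional space where a linear separation exists.
        svm = SVC(kernel='rbf').fit(X, y)
        print(svm.score(X, y))   # close to 1.0 on this toy problem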
  19. General approach
     An intuitive example from real life: features such as the weather forecast, airport location, number of gates, number of runways, number of snowploughs, airline and aircraft, together with flights cancelled in the past, feed into a "black box" for training. The trained classifier provides a ranked list of relevant features, feature weights and thresholds, and a performance metric, and makes predictions for new data.
  20. Classifier performance
     Not all machine learning behaves ideally, and performance metrics are important for quality checks and parameter tuning. The typical loop: train a classifier on the data and assess its performance; if it isn't good enough, add more data for training and try again; otherwise, success. A ROC curve plots sensitivity ("true positives") against 1 - specificity ("false positives") on a 0 to 1 scale; anything below the diagonal is worse than a random guess.
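     A sketch of one such metric, the area under the ROC curve, assuming scikit-learn; the dataset is just a convenient built-in example:

        from sklearn.datasets import load_breast_cancer
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        X, y = load_breast_cancer(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
        scores = clf.predict_proba(X_test)[:, 1]   # probability of the positive class
        print(roc_auc_score(y_test, scores))       # 0.5 = random guess, 1.0 = perfect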
  21. Metrics zoo
     There is a wide range of performance metrics, comprising combinations of true & false positives as well as true & false negatives (https://en.wikipedia.org/wiki/Precision_and_recall).

                         | positive class (P)  | negative class (N)
     predicted positive  | true positive (TP)  | false positive (FP)
     predicted negative  | false negative (FN) | true negative (TN)
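     Two of the most common combinations as a sketch; the counts are invented for illustration:

        # Sensitivity (recall): how many of the real positives did we find?
        def recall(tp, fn):
            return tp / (tp + fn)

        # Precision: how many of the predicted positives were real?
        def precision(tp, fp):
            return tp / (tp + fp)

        print(recall(tp=80, fn=20), precision(tp=80, fp=10))   # 0.8 0.888...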
  22. ML pipeline
     Development: data acquisition (raw data, clean-up, feature engineering; labour-intense), model building (model learning, model selection; compute-intense) and test (evaluation; brain-intense).
     Production: use in production and data recording on the production system.
  23. Choosing a method
     There is no 'one-size-fits-all' machine learning method. Most methods need to be carefully tuned to perform well. Often there are 'non-functional' constraints on choosing a method: runtime, interpretability, etc.
     (from: Olson et al., 2017, https://arxiv.org/abs/1708.05070)
  24. What about neural networks?
     Neural networks attempt to mimic the integrative properties of neurons. The perceptron is a single-layer network: each input (feature 1, 2, 3) is multiplied by a weight (weight 1, 2, 3), an input function sums the products, an activation function turns the sum into a class output, and the output error drives the weight updates.
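     A minimal perceptron sketch in plain Python/NumPy; the AND-gate data is invented for illustration:

        import numpy as np

        def perceptron_train(X, y, epochs=10, lr=0.1):
            """Single-layer perceptron; labels y must be -1 or +1."""
            w, b = np.zeros(X.shape[1]), 0.0
            for _ in range(epochs):
                for xi, yi in zip(X, y):
                    # input function: weighted sum; activation: sign threshold
                    pred = 1 if xi @ w + b >= 0 else -1
                    if pred != yi:            # the error drives the weight update
                        w += lr * yi * xi
                        b += lr * yi
            return w, b

        # Toy task: learn a logical AND from its four input/output pairs
        X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
        y = np.array([-1, -1, -1, 1])
        print(perceptron_train(X, y))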
  25. Deep neural networks
     While many artificial neural networks show great performance, which features exactly the classification is based on remains largely unknown.
     (See http://www.asimovinstitute.org/neural-network-zoo)
  26. Reinforcement learning
     In RL, the methods iteratively learn to optimise an output from an abstract representation of a system. In the Atari example, the unknown system is represented by a map (the 210 x 160 pixel, 8-bit RGB screen); the agent chooses an action (move left/right or shoot) on the basis of that map so as to optimise the actual score. (Mnih et al., Nature (2015))
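     Tabular Q-learning captures the idea in miniature; this sketch uses a hypothetical five-state corridor where only reaching the last state scores:

        import random

        # States 0..4, actions 0 (left) / 1 (right); reward 1 only in state 4
        Q = {(s, a): 0.0 for s in range(5) for a in (0, 1)}
        alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration

        for episode in range(500):
            s = 0
            while s != 4:
                # epsilon-greedy: mostly exploit the best known action, sometimes explore
                if random.random() < epsilon:
                    a = random.choice((0, 1))
                else:
                    a = max((0, 1), key=lambda act: Q[(s, act)])
                s_next = max(s - 1, 0) if a == 0 else s + 1
                reward = 1.0 if s_next == 4 else 0.0
                # move the action value towards reward plus discounted future value
                best_next = max(Q[(s_next, 0)], Q[(s_next, 1)])
                Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
                s = s_next

        # The learned policy: go right (action 1) in every state
        print([max((0, 1), key=lambda act: Q[(s, act)]) for s in range(4)])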
  27. Unsupervised learning
     Machine learning can also help to structure and explore an unknown dataset. These methods aid the identification of classes whose existence isn't known yet:
     • hierarchical clustering
     • k-means clustering
     • expectation maximisation
     • density-based clustering
     plus clever visualisation.
  28. Hierarchical clustering
     Derives hierarchical dependencies of individual rows and columns in the dataset on the basis of similarity (correlation) between their properties. Combined with a heatmap, it gives a good first impression of a dataset.
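     A minimal sketch with SciPy and matplotlib; the data matrix is random, purely for illustration:

        import numpy as np
        import matplotlib.pyplot as plt
        from scipy.cluster.hierarchy import dendrogram, linkage

        data = np.random.rand(6, 4)         # 6 samples (rows), 4 features

        # Agglomerative clustering on pairwise distances between the rows
        Z = linkage(data, method='average')
        dendrogram(Z)                       # the tree of hierarchical dependencies
        plt.show()

     For the heatmap view the slide mentions, seaborn's clustermap pairs the same linkage with a colour-coded data matrix.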
  29. k-means clustering
     Defines k different centroids to which data points are assigned by proximity. If the distance to the centroids doesn't get much smaller as k grows (plot k vs. centroid distance), that k is the number of clusters in the set. Density-based clustering and expectation maximisation are conceptually related, the latter giving a probability of membership in any group.
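     The same "elbow" heuristic as a sketch, assuming scikit-learn; inertia is the summed distance of points to their centroid:

        from sklearn.cluster import KMeans
        from sklearn.datasets import load_iris

        X, _ = load_iris(return_X_y=True)

        # Where the drop in inertia flattens out, k matches the cluster count
        for k in range(1, 7):
            km = KMeans(n_clusters=k, n_init=10).fit(X)
            print(k, round(km.inertia_, 1))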
  30. Conclusions
     • People on the Internet steal infographics.
     • ML methods have been around in the stats world for ages, but big data sets and compute power make them more widely known.
     • Understanding key principles behind ML should be part of the school curriculum.
     (https://badryan.github.io/2015/10/20/is-it-all-machine-learning.html)
