Introduction to Machine Learning
Sebastian E. Kwiatkowski
(sebastian@aisummary.com)
__
HIGH-LEVEL OVERVIEW
Artificial Intelligence
• Machine Learning (ML)
  • Basic models: linear regression, logistic regression
  • Biology-inspired: neural networks (feed-forward, convolutional, residual, recurrent, autoencoders, memory)
  • Other ML models: nearest neighbors, trees and forests, naive Bayes, support vector machines
• Everything else
  • Rule/expert-based systems
TO LIVE IS TO PREDICT
Biology
- Food: edible or poisonous?
- Fight/flight/freeze
- Position within the hierarchy
- Mating choice
Forecasting
- Financial markets
- Betting markets
- Economic forecasts
- Business plans
- Election results
- Sports betting
Modern life
- Career/job choice
- Moving
- Medical interventions
- How to compete with machines?
• Effective decision-making requires accurate predictions.
• Humans and other species have evolved adaptations to cope with uncertainty:
• Memory: data storage linked to the ability to generate predictions
• Mental time travel: the ability to project oneself into the future
• Automated processes associated with certain emotions
BIG TEXT DATA
• 16,000 words spoken per person per day
• 100 trillion words spoken by humanity per day
• 28 million papers (1980-2012)
• 130 million books indexed by Google
• 1 billion websites on the World Wide Web
• 500 million videos hosted on YouTube
COMPETITION AS A DISCOVERY PROCESS
• Machine learning is organized around competitions.
• A dataset is split up into two parts:
• a training set and a test set
• Competitors submit models trained on the first set.
• Open-source code makes results easy to replicate.
• Models are then evaluated on the second set.
• Competition winners tend to dominate the discourse …
• … until the next improvement is published.
• “State of the art”: best competition performances
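The competition protocol above (train on one part of the data, evaluate on the held-out part) can be sketched in a few lines. This is a minimal illustration, not any competition's actual tooling: the toy dataset, the 80/20 split, and the one-parameter threshold model are all hypothetical choices made for the example.

```python
import random

def train_test_split(dataset, test_fraction=0.2, seed=0):
    """Shuffle a list of (features, target) pairs and split it into two parts."""
    rng = random.Random(seed)            # fixed seed, so the split is reproducible
    shuffled = list(dataset)             # copy: the caller's list stays untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

def fit_threshold(training_set):
    """'Train' a hypothetical one-parameter model f(x) = [x > c]:
    pick the threshold c with the highest accuracy on the training set."""
    candidates = sorted({x for x, _ in training_set})
    best = max(candidates, key=lambda c: sum(int(x > c) == t for x, t in training_set))
    return lambda x: int(x > best)

def accuracy(model, pairs):
    """Fraction of (x, t) pairs for which the model predicts t."""
    return sum(model(x) == t for x, t in pairs) / len(pairs)

# Hypothetical toy dataset: x in 0..99, target 1 if x > 50.
data = [(x, int(x > 50)) for x in range(100)]
train, test = train_test_split(data)
model = fit_threshold(train)                  # only the training set is used here
print(len(train), len(test))                  # 80 20
print(accuracy(model, test))                  # the score a competition would report
```

The key design point is that `fit_threshold` never sees `test`; the test score is only an honest estimate of performance on new data if that separation holds.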
DOMAIN GENERALITY: COMPUTER VISION 1/2
• Domain-general learning strategies
• Example: convolutional neural networks
Rapid progress in image classification
• LeCun et al. (1998): a CNN trained on the MNIST dataset (60,000 small images of digits, 10 classes) achieves an error rate of 1-2% when tested on 10,000 images
• Krizhevsky et al. (2012): a deep CNN trained on 1.2 million high-resolution images and 1,000 classes achieves a 17% top-5 error rate when tested on 150,000 images
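A top-5 error rate counts a prediction as correct when the true class is among the five classes the model ranks highest. The sketch below computes a top-k error from per-class scores; the three-example, six-class data is hypothetical and only meant to show how top-1 and top-5 errors can differ for the same predictions.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is NOT among the k highest-scoring classes.

    scores: one list of per-class scores per example
    labels: the true class index of each example
    """
    misses = 0
    for class_scores, label in zip(scores, labels):
        # indices of the k largest scores
        ranked = sorted(range(len(class_scores)), key=lambda c: class_scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)

# Hypothetical scores for 3 examples over 6 classes.
scores = [
    [0.10, 0.30, 0.20, 0.05, 0.25, 0.10],  # true class 3 is ranked last: miss for k=1 and k=5
    [0.05, 0.60, 0.10, 0.10, 0.10, 0.05],  # true class 1 is ranked first: hit for both
    [0.40, 0.30, 0.20, 0.04, 0.03, 0.03],  # true class 2 is ranked third: hit only for k=5
]
labels = [3, 1, 2]
print(top_k_error(scores, labels, k=1))  # 2 of 3 examples missed
print(top_k_error(scores, labels, k=5))  # 1 of 3 examples missed
```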
Skin cancer classification
• Esteva et al. (2017): a deep CNN trained on 130,000 clinical images
• Human-level performance when tested against 21 dermatologists on two binary classification tasks:
• keratinocyte carcinomas: most common skin cancer
• melanomas: deadliest skin cancer
DOMAIN GENERALITY: COMPUTER VISION 2/2
• Is there something special about skin cancer classification?
• Litjens et al. (2017) survey the use of deep learning for medical image analysis:
• at least 300 papers, most published in 2016
• CNNs were already applied in the 1990s:
• Lo et al. (1995): CNN trained to recognize lung nodules in x-rays
• In most cases, the only input to the learning algorithms is a set of pairs.
• Each pair consists of an image and a label:
• Malignant vs. benign
• Stage 1/2/3/4
DOMAIN GENERALITY: NATURAL LANGUAGE PROCESSING
Text classification
• Kim (2014): a shallow CNN improves upon state-of-the-art (SOTA) results in several sentence classification tasks
• Conneau et al. (2017): very deep CNNs improve upon SOTA results in short text classification tasks
Sequence labeling
• Map a sequence of words to a sequence of tags:
Tim Cook is Chief Executive Officer of Apple .
B-Name I-Name O B-Title I-Title I-Title O B-Org O
• Strubell et al. (2017): a new CNN variant achieves nearly SOTA results while being 10-20 times faster
Machine translation
• Kalchbrenner et al. (2017): SOTA performance on an English-German translation benchmark
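The tag sequence in the sequence-labeling example follows the BIO scheme: B-X begins an entity of type X, I-X continues it, and O marks tokens outside any entity. A minimal decoder that turns such tags back into entity spans might look like this; the handling of a stray I- tag (treated like B-) is an assumption, since conventions differ between tools.

```python
def bio_to_spans(tokens, tags):
    """Group BIO tags into (entity type, text) spans."""
    spans = []
    entity_type, entity_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == entity_type:
            entity_tokens.append(token)              # continue the open entity
            continue
        if entity_type is not None:                  # any other tag closes it
            spans.append((entity_type, " ".join(entity_tokens)))
        if tag.startswith(("B-", "I-")):             # assumption: stray I- acts like B-
            entity_type, entity_tokens = tag[2:], [token]
        else:                                        # "O": outside any entity
            entity_type, entity_tokens = None, []
    if entity_type is not None:                      # close an entity at the end
        spans.append((entity_type, " ".join(entity_tokens)))
    return spans

tokens = "Tim Cook is Chief Executive Officer of Apple .".split()
tags = "B-Name I-Name O B-Title I-Title I-Title O B-Org O".split()
print(bio_to_spans(tokens, tags))
# [('Name', 'Tim Cook'), ('Title', 'Chief Executive Officer'), ('Org', 'Apple')]
```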
Elements of a machine learning system
__
WORKFLOW
The basic workflow in a machine learning project:
1. PROBLEM FORMULATION: Can you describe the problem in terms of existing solutions?
2. DATA COLLECTION: How can you obtain a large, high-quality dataset?
3. MODEL TRAINING & SELECTION: What is a good model? How do you measure success?
4. IMPROVE (when needed): error analysis, more data, “better” models
PROBLEM FORMULATION
TYPE | OUTPUT | COMMENT | APPLICATIONS
Multi-class classification | probability distribution | can be thought of as a special case of sequence prediction | topic classification, ratings (very good to very bad), object classification, staging
Binary classification | probability | a special case of multi-class classification | yes/no, positive/negative, present/absent, similar/dissimilar
Sequence prediction | sequence of probability distributions | at the core of intelligence (artificial and biological) | sequence labeling, machine translation, speech synthesis, image segmentation
Clustering | cluster membership | another special case: the number of clusters is a hyperparameter | segmentation: customers, images; detection: communities, anomalies
Reinforcement learning | sequence of actions | sequence predictions in an active environment: actions affect observations | robotics, driverless cars, conversational agents, game playing
DATA COLLECTION
• A dataset D is a collection of n data points d_i.
• Each data point d_i = (x_i, t_i) in D is a pair consisting of features x_i and a target t_i.
• Easy problems require at least hundreds or thousands of data points.
• Harder problems require millions of data points.
• Occasionally, data is provided or already available in existing databases.
• Usually, a data collection strategy has to be devised and implemented.
Supervised
• Humans manually label the inputs with appropriate targets.
• “This is a cat. That’s a dog. This is another cat.”
Weakly supervised
• Minimal human intervention
• Download all images with the hashtags #cat, #dog
• Synthetic data
Unsupervised
• No human supervision
• Learn potentially relevant patterns from gigantic datasets
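The hashtag strategy for weak supervision can be sketched as a labeling function that turns posts into (input, target) pairs without a human looking at each image. The post records, file names, and the rule of skipping ambiguous or unlabeled posts are all hypothetical assumptions for illustration.

```python
def weak_labels(posts, hashtag_to_label):
    """Turn hypothetical (image_id, hashtags) posts into (image_id, label) pairs.

    Posts without a known hashtag, or whose hashtags point to different
    labels, are skipped because their label would be ambiguous.
    """
    pairs = []
    for image_id, hashtags in posts:
        labels = {hashtag_to_label[h] for h in hashtags if h in hashtag_to_label}
        if len(labels) == 1:
            pairs.append((image_id, labels.pop()))
    return pairs

posts = [
    ("img_001.jpg", ["#cat", "#cute"]),
    ("img_002.jpg", ["#dog"]),
    ("img_003.jpg", ["#cat", "#dog"]),   # ambiguous: skipped
    ("img_004.jpg", ["#sunset"]),        # no known hashtag: skipped
]
print(weak_labels(posts, {"#cat": "cat", "#dog": "dog"}))
# [('img_001.jpg', 'cat'), ('img_002.jpg', 'dog')]
```

The resulting labels are noisy (a #cat post may not show a cat), which is the trade-off weak supervision accepts in exchange for scale.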
WHAT IS A MODEL, ANYWAY?
• Using parameters θ, a model f generates a prediction y from an input x:
• f(x, θ) = y
• Parameters allow the model to “weight the evidence”.
• Example: a simple binary classification problem
• Does a given article from a news archive focus on politics? Yes or no?
• Consider the parameters (weights) for the following words: election, soccer, the
• Which of these parameters will be positive and which negative?
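To make f(x, θ) = y concrete for the politics question, here is a sketch with hand-picked weights. The weights are hypothetical (not learned), and squashing the weighted sum through a sigmoid to obtain a probability is an assumption borrowed from logistic regression: "election" gets a positive weight, "soccer" a negative one, and "the" a weight near zero because it appears in almost every article.

```python
import math

# Hypothetical parameters θ: one weight per word, chosen by hand for illustration.
weights = {"election": 2.0, "soccer": -2.0, "the": 0.0}

def predict_politics(article):
    """f(x, θ): probability that an article focuses on politics."""
    words = article.lower().split()
    score = sum(weights.get(word, 0.0) for word in words)  # weigh the evidence
    return 1 / (1 + math.exp(-score))                       # squash into (0, 1)

print(predict_politics("The election results are in"))  # above 0.5: politics
print(predict_politics("The soccer match ended"))       # below 0.5: not politics
```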
Loss function / logistic loss
__
LOSS FUNCTION: LOGISTIC LOSS
The target can be either 1 (“did occur”) or 0 (“did not occur”).
If the target equals 1: -log(prediction)
If the target equals 0: -log(1 - prediction)
$$\text{loss}(\text{prediction}, \text{target}) = -\left[\text{target} \cdot \log(\text{prediction}) + (1 - \text{target}) \cdot \log(1 - \text{prediction})\right]$$
Loss
- Some predictions are better than others.
- The deviation of the prediction p from the target t is referred to as loss.
- Synonyms: cost, error, empirical risk
- The loss is calculated through a loss function.
- Logistic loss is one of the most important loss functions.
LOGISTIC LOSS: EXAMPLES
$$\text{loss}(\text{prediction}, \text{target}) = -\left[\text{target} \cdot \log(\text{prediction}) + (1 - \text{target}) \cdot \log(1 - \text{prediction})\right]$$
(The examples use base-10 logarithms.)
Good prediction
Target: 1
Prediction: 90%
Loss: -log(0.9) ≈ 0.046
This is a good prediction. Consequently, the loss is small.
Bad prediction
Target: 1
Prediction: 10%
Loss: -log(0.1) = 1
This prediction is inaccurate and the loss, therefore, is high.
Mediocre prediction
Target: 0
Prediction: 40%
Loss: -log(1 - 0.4) ≈ 0.222
This loss is a function of the counter-probability of 60%.
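The three examples can be reproduced with a direct implementation of the formula. This is a sketch, not a library function: `base` is a parameter added here because the examples use base-10 logarithms, whereas ML libraries normally use the natural logarithm (changing the base only rescales the loss).

```python
import math

def logistic_loss(prediction, target, base=math.e):
    """-[t*log(p) + (1 - t)*log(1 - p)] for a target t in {0, 1}."""
    if not 0 < prediction < 1:
        raise ValueError("prediction must lie strictly between 0 and 1")
    return -(target * math.log(prediction, base)
             + (1 - target) * math.log(1 - prediction, base))

for prediction, target in [(0.9, 1), (0.1, 1), (0.4, 0)]:
    print(target, prediction, round(logistic_loss(prediction, target, base=10), 3))
# 1 0.9 0.046
# 1 0.1 1.0
# 0 0.4 0.222
```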
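WHY LOGISTIC LOSS?
• The likelihood function returns the probability of the data for given parameters. Assuming independent data points, it is a product:
Likelihood
$$L(\text{parameters} \mid \text{data}) = P(\text{data} \mid \text{parameters}) = \prod_{i=1}^{n} P(\text{data point}_i \mid \text{parameters})$$
• Coin flip example: L(p_h = 0.5 | HT) = P(HT | p_h = 0.5) = 0.25
• In practice, it is convenient to use the log likelihood:
Log likelihood
$$\log L(\text{parameters} \mid \text{data}) = \log \prod_{i=1}^{n} P(\text{data point}_i \mid \text{parameters}) = \sum_{i=1}^{n} \log P(\text{data point}_i \mid \text{parameters})$$
• Using the log likelihood helps prevent underflow problems: a product of many probabilities quickly becomes too small to represent as a floating-point number, while a sum of logs does not.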
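The coin flip example and the underflow argument can both be checked numerically. A minimal sketch, assuming independent flips written as a string of "H" and "T"; the 2,000-flip sequence is hypothetical and chosen only to be long enough to underflow a 64-bit float.

```python
import math

def likelihood(p_heads, flips):
    """P(data | p_heads) for independent coin flips, e.g. "HT"."""
    result = 1.0
    for flip in flips:
        result *= p_heads if flip == "H" else 1 - p_heads
    return result

def log_likelihood(p_heads, flips):
    """log P(data | p_heads): a sum of logs instead of a product."""
    return sum(math.log(p_heads if flip == "H" else 1 - p_heads) for flip in flips)

print(likelihood(0.5, "HT"))               # 0.25, as on the slide

many_flips = "HT" * 1000                   # 2,000 flips
print(likelihood(0.5, many_flips))         # 0.0: 0.5**2000 underflows
print(log_likelihood(0.5, many_flips))     # about -1386.29, no underflow
```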
MAXIMUM LIKELIHOOD PRINCIPLE
The maximum likelihood principle tells us to select the parameters θ∗ that maximize the probability of the data:
$$\theta^* = \arg\max_{\theta} \sum_{i=1}^{n} \log P(\text{data point}_i \mid \theta)$$
Maximizing a function f(x) is equivalent to minimizing -f(x):
$$\theta^* = \arg\min_{\theta} \left[ -\sum_{i=1}^{n} \log P(\text{data point}_i \mid \theta) \right]$$
For a random variable with two outcomes, P(target | prediction) = prediction^target · (1 - prediction)^(1 - target), so the logistic loss is exactly the negative log likelihood of a single data point.
Thus, minimizing the logistic loss is equivalent to the maximum likelihood approach.
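The equivalence can be seen on a coin: summing the logistic loss over all flips gives the negative log likelihood, and minimizing it recovers the maximum likelihood estimate, which is the observed frequency of heads. The data (7 heads, 3 tails) and the grid search over candidate parameters are hypothetical simplifications; real models use gradient-based optimization instead of a grid.

```python
import math

def negative_log_likelihood(p_heads, heads, tails):
    """Sum of logistic losses over all flips (target 1 = heads, 0 = tails)."""
    return -(heads * math.log(p_heads) + tails * math.log(1 - p_heads))

heads, tails = 7, 3                        # hypothetical data: 7 heads in 10 flips
grid = [i / 100 for i in range(1, 100)]    # candidate parameters 0.01 ... 0.99
p_hat = min(grid, key=lambda p: negative_log_likelihood(p, heads, tails))
print(p_hat)  # 0.7, the observed frequency of heads
```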
Brief digression:
Ockham’s Scotus’s Razor
__
NUMBER COMPLETION TASK
3, 9, 27, 81, ?
What is the next number in this sequence?
Simple solution
f(x) = 3^x
f(1) = 3, f(2) = 9, f(3) = 27, f(4) = 81
f(5) = 243
More complex solution
f(x) = -15 + 32x - 18x^2 + 4x^3
f(1) = 3, f(2) = 9, f(3) = 27, f(4) = 81
But: f(5) = 195
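Both candidate functions can be checked directly, and a small extension shows why there are infinitely many solutions: adding c·(x-1)(x-2)(x-3)(x-4) to 3^x leaves the first four values unchanged for any c, while f(5) can take any value. The `family` helper is a hypothetical construction for this illustration.

```python
def simple(x):
    return 3 ** x

def more_complex(x):
    return -15 + 32 * x - 18 * x ** 2 + 4 * x ** 3

print([simple(x) for x in range(1, 6)])        # [3, 9, 27, 81, 243]
print([more_complex(x) for x in range(1, 6)])  # [3, 9, 27, 81, 195]

def family(c):
    """3**x plus a term that vanishes at x = 1, 2, 3, 4: fits the data for ANY c."""
    return lambda x: 3 ** x + c * (x - 1) * (x - 2) * (x - 3) * (x - 4)

print([family(c)(5) for c in (0, -2, 100)])    # [243, 195, 2643]
```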
SCOTUS’S RAZOR
• Problem: There are infinitely many solutions to any sequence prediction problem.
• Most, if not all, machine learning problems are sequence prediction problems.
• One solution: Ockham’s Razor
• Prefer the simplest theory consistent with the data
• First clear formulation by the 13th-century theologian Duns Scotus
• Today: Don’t use a fancy machine learning model when a simple model works just fine.
WHY OCKHAM’S RAZOR?
• It works.
• Successful applications in ML, science, business, design and other fields
• Fast & cheap
• Simpler models tend to be faster and consume fewer resources.
• The Schmidhuber/Hutter argument:
• The Great Programmer implements all possible universes with program lengths from 1 to N.
• Program B is a functional copy of program A if both lead to the same result but with different code.
• Simpler programs have more functional copies than longer programs, because a short program leaves more room (up to length N) for functionally equivalent variants.
• Simple program: print("Hello world!")
• Functional copy: const message = "Hello world!"; print(message)
Neural networks
__
BUILDING BLOCKS: NEURONS, WEIGHTS AND LAYERS
• A neuron is the basic processing unit:
• Accepts input, processes input, sends output
• Neurons are connected to other neurons.
• Connections are weighted.
• A layer is a group of neurons.
• Every neural network has an input layer and an output layer.
• Hidden layer:
• Any layer between input and output
• Shallow: ~1-5 hidden layers
• Deep: dozens or hundreds of layers
Figure: an input layer with inputs x1, x2, ..., xn, connected by weights w1, w2, ..., wn to an output layer that computes f(w1x1 + ... + wnxn)
A FEED-FORWARD NETWORK WITH TWO HIDDEN LAYERS
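The building blocks can be composed into a feed-forward network with two hidden layers. This is a minimal pure-Python sketch of the forward pass only; the layer sizes (3 → 4 → 4 → 1), the tanh activation in the hidden layers, the sigmoid output, and the uniform random initialization are all assumptions, not details from the figure.

```python
import math
import random

def neuron(inputs, weights, bias, activation):
    """One neuron: weigh the inputs, add a bias, apply an activation function."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)

def layer(inputs, weight_rows, biases, activation):
    """A layer is a group of neurons that all read the same inputs."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]

def random_layer(n_in, n_out, rng):
    """Random initial weights (as before training) and zero biases."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

rng = random.Random(42)
h1_w, h1_b = random_layer(3, 4, rng)    # first hidden layer (assumed size 4)
h2_w, h2_b = random_layer(4, 4, rng)    # second hidden layer (assumed size 4)
out_w, out_b = random_layer(4, 1, rng)  # output layer: one neuron

def forward(x):
    h1 = layer(x, h1_w, h1_b, math.tanh)         # input layer -> hidden layer 1
    h2 = layer(h1, h2_w, h2_b, math.tanh)        # hidden layer 1 -> hidden layer 2
    return layer(h2, out_w, out_b, sigmoid)[0]   # hidden layer 2 -> output probability

print(forward([0.5, -1.0, 2.0]))
```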
LOGISTIC REGRESSION
• Baseline model for binary classification tasks:
• f(x) = s(w1x1 + ... + wnxn + b)
• Weigh the evidence
• Add a bias
• The sigmoid function s “squashes” the input to a number between 0 and 1.
• Can be formulated as a neural net:
• The inputs x1, ..., xn correspond to neurons in the first layer.
• The bias corresponds to an additional neuron with a connection weight of 1.
• The output neuron applies the sigmoid function.
Figure: an input layer with a bias neuron b (weight 1) and inputs x1, x2, ..., xn (weights w1, w2, ..., wn), connected to an output layer that computes s(w1x1 + ... + wnxn + b)
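Logistic regression as described is a single output neuron with a sigmoid activation. The sketch below implements f(x) = s(w1x1 + ... + wnxn + b) and checks the "bias as an extra neuron" view: feeding b as an additional input with connection weight 1 gives the same output. The parameter values are hypothetical.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def logistic_regression(x, w, b):
    """s(w1*x1 + ... + wn*xn + b): the predicted probability that the target is 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical parameters: here the weighted evidence cancels out, so s(0) = 0.5.
print(logistic_regression([1.0, 2.0], w=[0.5, -0.25], b=0.0))  # 0.5

# The bias as an extra input neuron with value b and connection weight 1.
with_bias = logistic_regression([1.0, 2.0], w=[0.5, -0.25], b=0.3)
as_neuron = logistic_regression([1.0, 2.0, 0.3], w=[0.5, -0.25, 1.0], b=0.0)
print(with_bias == as_neuron)  # True
```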
THE SIGMOID FUNCTION
The sigmoid function s(x) is one of the most frequently used functions in machine learning:
$$s(x) = \frac{1}{1 + e^{-x}}$$
Desirable properties:
• “squashes” any input into the range between 0 and 1
• The derivative is easy to compute:
$$\frac{ds}{dx} = s(x)\,\bigl(1 - s(x)\bigr)$$
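The closed-form derivative can be sanity-checked against a numerical estimate. This sketch uses a central finite difference (a standard gradient-checking trick, not something from the slides) at a few arbitrary points.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_derivative(x):
    """Closed form: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1 - s)

def numerical_derivative(f, x, h=1e-6):
    """Central finite difference, used only to check the closed form."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-3.0, 0.0, 2.5):
    print(x, sigmoid(x), sigmoid_derivative(x), numerical_derivative(sigmoid, x))
```

At x = 0 the sigmoid equals 0.5 and its derivative reaches its maximum of 0.25; far from 0 the derivative approaches 0.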
A GLIMPSE AT BACKPROPAGATION
• Model parameters are initialized randomly.
• The term “training” refers to the (iterative) optimization of parameters.
• Almost all neural nets are trained with the backpropagation algorithm.
Backpropagation algorithm (n repetitions)
Go through each instance:
1. Forward propagation: Compute the prediction and the loss.
2. Backward propagation: For each parameter, compute the derivative of the loss with respect to that parameter.
• Positive derivative: small increase in parameter => increase in loss
• Negative derivative: small increase in parameter => decrease in loss
3. Update: Use the derivative to apply an update rule.
• Simple rule: new value = old value - learning rate × derivative
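The three steps can be sketched for the simplest network on these slides: logistic regression trained with per-instance gradient descent. This is a deliberately small stand-in for full backpropagation; with one layer, the chain rule through the sigmoid and the (natural-log) logistic loss reduces to d loss / d w_i = (p - t)·x_i and d loss / d b = p - t. The toy data (logical OR), learning rate, and epoch count are assumptions.

```python
import math
import random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def loss(p, t):
    """Logistic loss with the natural logarithm."""
    return -math.log(p) if t == 1 else -math.log(1 - p)

def train(data, learning_rate=0.5, epochs=200, seed=0):
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(len(data[0][0]))]  # random initialization
    b = 0.0
    for _ in range(epochs):                   # n repetitions
        for x, t in data:                     # go through each instance
            # 1. Forward propagation: compute the prediction
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # 2. Backward propagation: derivatives of the loss w.r.t. each parameter
            grad_w = [(p - t) * xi for xi in x]
            grad_b = p - t
            # 3. Update: new value = old value - learning rate * derivative
            w = [wi - learning_rate * g for wi, g in zip(w, grad_w)]
            b -= learning_rate * grad_b
    return w, b

# Hypothetical toy data: logical OR of two binary features.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train(data)
for x, t in data:
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    print(x, t, round(p, 3), round(loss(p, t), 3))
```

In a network with hidden layers, step 2 would propagate these derivatives backwards layer by layer; the update rule in step 3 stays the same.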
A SIMPLE EXAMPLE
Source: hackernoon.com
INCREASED LEARNING RATE
Source: hackernoon.com
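The effect of the learning rate can be reproduced on a toy loss, assumed here to be loss(x) = x², whose derivative is 2x. With a small learning rate, each update moves x closer to the minimum at 0; with a learning rate that is too large, each update overshoots further than the last and the parameter diverges.

```python
def gradient_descent(learning_rate, start=1.0, steps=20):
    """Minimize the toy loss x**2 (derivative 2*x) with the simple update rule."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * 2 * x   # new value = old value - learning rate * derivative
    return x

print(gradient_descent(0.1))   # about 0.0115: x shrinks by a factor 0.8 per step
print(gradient_descent(1.1))   # about 38.3: x is multiplied by -1.2 per step
```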
A MORE REALISTIC EXAMPLE
Source: Analytics Vidhya
A small zoo of neural networks
__
STANDARD FEED-FORWARD NETWORK
HIGHWAY NETWORK
RECURRENT NEURAL NETWORK
BIDIRECTIONAL RECURRENT NEURAL NETWORK
MINIMAL GATED UNIT
Convolutional neural networks
__
CONVOLUTIONAL LAYER
• CNNs use a repeated sequence of layers:
• A convolutional layer, followed by a pooling layer
• A convolutional layer consists of filters:
• A window moves through the output of the
previous layer
• Similar to how we read: from left to right, and
then downwards
• The purpose of a filter is to detect the
presence of a particular feature:
• Basic geometric shapes
• Lines, circles, edges
• Characteristic colors
• Blue sky, green grass
96 low-level features learned by a convolution layer
Source: CS231n Convolutional Neural Networks for Visual Recognition
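A single filter sliding over its input can be sketched without any library. The 4x4 "image" and the 2x2 vertical-edge filter below are hypothetical and hand-made; a CNN would learn its filter values during training. As in common deep learning libraries, the kernel is not flipped, so strictly speaking this computes a cross-correlation, which is what CNN layers use in practice.

```python
def convolve2d(image, kernel):
    """'Valid' 2D filter response with stride 1 (no padding, kernel not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    output = []
    for i in range(out_h):                # move downwards...
        row = []
        for j in range(out_w):            # ...after moving from left to right
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        output.append(row)
    return output

# Hypothetical image: dark left half, bright right half (a vertical edge).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-made filter that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]
print(convolve2d(image, kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The strong response in the middle column marks where the edge is; elsewhere the filter outputs 0.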
MAX-POOLING LAYER
• A max-pooling layer performs a
reduction operation:
• A window moves through the subregions of
the previous output.
• For each subregion, the maximum value is
extracted.
• A max-pooling layer with a 2x2 window and a stride of 2 reduces a 4x4 matrix to a 2x2 matrix.
• Intuition: It doesn’t really matter where exactly a feature is located.
• Fewer entries => faster computation
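The 4x4 to 2x2 reduction can be shown with a short max-pooling sketch. The window size and stride default to 2, matching the example in the text; the input matrix is hypothetical.

```python
def max_pool(matrix, size=2, stride=2):
    """Keep the maximum of each size x size window, moving the window by `stride`."""
    out = []
    for i in range(0, len(matrix) - size + 1, stride):
        row = []
        for j in range(0, len(matrix[0]) - size + 1, stride):
            row.append(max(matrix[i + a][j + b] for a in range(size) for b in range(size)))
        out.append(row)
    return out

m = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]
print(max_pool(m))  # [[6, 5], [7, 9]]
```

Each output entry only records that a strong response occurred somewhere in its window, which is the "location doesn't matter much" intuition, and the output has a quarter of the entries.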
DEEP CONVOLUTIONAL NEURAL NETWORKS
• Deep neural nets are characterized by repeated blocks of layers.
• Ex.: a series of convolution/max-pooling operations
• Some ML fields (though not all) are dominated by deep nets.
• Theory lags behind applications.
• Intuition: hierarchical models for hierarchical data
• Simple example: traffic sign recognition
• Lines and circles form digits.
• Digits form numbers.
• A speed limit sign is composed of a red circle,
a white circle and a number.
Summary
__
SUMMARY
• The growth of machine learning is fueled by:
1. the importance of predictions
2. the low cost of data acquisition and processing
3. the domain generality of learning algorithms.
• The essential tasks in machine learning are to:
1. formulate the problem
2. collect a large and relevant dataset
3. train, test and improve appropriate models.
• Neural networks form a class of powerful models trained by backpropagation:
• Building blocks: neurons, connections, layers
• Convolutional neural networks:
• high predictive accuracy and computational efficiency
Thank you!
__

 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Introduction to Machine Learning

  • 1. Introduction to Machine Learning Sebastian E. Kwiatkowski (sebastian@aisummary.com)
  • 2. HIGH-LEVEL OVERVIEW • Artificial Intelligence divides into Machine Learning (ML) and everything else (rule/expert-based, biology-inspired) • Basic ML models: • linear regression • logistic regression • Other ML models: • nearest neighbors • trees, forests • naive Bayes • support vector machines • Neural networks: • feed-forward • convolutional • residual • recurrent • autoencoders • memory
  • 3. TO LIVE IS TO PREDICT • Biology: food (edible or poisonous?), fight/flight/freeze, position within the hierarchy, mating choice • Forecasting: financial markets, betting markets, economic forecasts, business plans, election results, sports betting • Modern life: career/job choice, moving, medical interventions, how to compete with machines? • Effective decision-making requires accurate predictions. • Humans and other species have evolved adaptations to cope with uncertainty: • Memory: data storage linked to the ability to generate predictions • Mental time travel: the ability to project oneself into the future • Automated processes associated with certain emotions
  • 4. BIG TEXT DATA • 16,000 words spoken per person per day • 100 trillion words spoken by humanity per day • 28 million papers (1980-2012) • 130 million books indexed by Google • 1 billion websites on the World Wide Web • 500 million videos hosted on YouTube
  • 5. COMPETITION AS A DISCOVERY PROCESS • Machine learning is organized around competitions. • A dataset is split into two parts: • a training set and a test set • Competitors submit models trained on the first set. • Open-source code makes results straightforward to replicate. • The models are then evaluated on the second set. • Competition winners tend to dominate the discourse … • … until the next improvement is published. • “State of the art”: the best competition performance
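The split-then-evaluate protocol above can be sketched in a few lines of Python. Everything here is hypothetical: the toy dataset, the 80/20 split ratio and the threshold "model" stand in for a real competition dataset and real submissions.

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(model, dataset):
    """Fraction of (x, t) pairs for which the model predicts the target t."""
    return sum(model(x) == t for x, t in dataset) / len(dataset)

def fit_threshold(train):
    """Toy 'model': pick the threshold that maximizes accuracy on the training set only."""
    candidates = sorted(x for x, _ in train)
    best = max(candidates, key=lambda c: accuracy(lambda x: int(x >= c), train))
    return lambda x: int(x >= best)

# Hypothetical data: the target is 1 whenever x >= 50.
data = [(x, int(x >= 50)) for x in range(100)]
train, test = train_test_split(data, test_fraction=0.2)
model = fit_threshold(train)
print(len(train), len(test), accuracy(model, train), accuracy(model, test))
```

The test set never influences the threshold, so the test accuracy can come out lower than the training accuracy when the true boundary falls between two training points.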
  • 6. DOMAIN-GENERAL LEARNING: COMPUTER VISION 1/2 • Domain-general learning strategies • Example: convolutional neural networks (CNNs) • Rapid progress in image classification: • LeCun et al. (1998): a CNN trained on the MNIST dataset (60,000 small images of handwritten digits, 10 classes) achieves an error rate of 1-2% when tested on 10,000 images • Krizhevsky et al. (2012): a deep CNN trained on 1.2 million high-resolution images and 1,000 classes achieves a 17% top-5 error rate when tested on 150,000 images • Skin cancer classification: • Esteva et al. (2016): a deep CNN trained on 130,000 clinical images • Human-level performance when tested against 21 dermatologists on two binary classification tasks: • keratinocyte carcinomas: the most common skin cancer • melanomas: the deadliest skin cancer
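The top-5 error rate counts an image as correctly classified if the true class is among the model's five highest-scoring classes. A minimal sketch of the metric, using hypothetical scores for three images and three classes, so that k = 1 and k = 2 stand in for the top-5 setting:

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is NOT among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by score, highest first; keep the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        misses += label not in top_k
    return misses / len(labels)

# Hypothetical model scores (one row per image) and true class indices.
scores = [[0.6, 0.3, 0.1],   # true class 1 is ranked second
          [0.2, 0.7, 0.1],   # true class 1 is ranked first
          [0.5, 0.3, 0.2]]   # true class 2 is ranked last
labels = [1, 1, 2]

print(top_k_error(scores, labels, 1), top_k_error(scores, labels, 2))
```

Increasing k can only lower the error, which is why top-5 figures are much smaller than top-1 figures on a 1,000-class benchmark.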
  • 7. DOMAIN GENERALITY: COMPUTER VISION 2/2 • Is there something special about skin cancer classification? • Litjens et al. (2017) summarize the use of deep learning for medical image analysis: • at least 300 papers, most of them published in 2016 • CNNs were already applied in the 1990s: • Lo et al. (1995): a CNN trained to recognize lung nodules in X-rays • In most cases, the only input to the learning algorithm is a set of pairs. • Each pair consists of an image and a label, for example: • malignant vs. benign • stage 1/2/3/4
  • 8. DOMAIN GENERALITY: NATURAL LANGUAGE PROCESSING • Text classification: • Kim (2014): a shallow CNN outperforms state-of-the-art (SOTA) results in sentence classification tasks • Conneau et al. (2017): very deep CNNs improve upon SOTA results in short text classification tasks • Sequence labeling: • Map a sequence of words to a sequence of tags, e.g. “Tim Cook is Chief Executive Officer of Apple .” → B-Name I-Name O B-Title I-Title I-Title O B-Org O • Strubell et al. (2017): a new CNN variant achieves nearly SOTA results, but is 10-20X faster • Machine translation: • Kalchbrenner et al. (2017): SOTA performance on an English-German translation benchmark
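The tag sequence in the sequence-labeling example uses the BIO scheme: B-X begins an entity of type X, I-X continues it, and O marks words outside any entity. A minimal sketch that turns such tags back into entity spans; the handling of an I-X tag that does not continue an open span (treated as a new span) is an assumption, not something the slide specifies.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans, label, words = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == label:
            words.append(token)          # continue the open span
            continue
        if label is not None:            # any other tag closes the open span
            spans.append((label, " ".join(words)))
        if tag == "O":
            label, words = None, []
        else:                            # B-X (or a stray I-X) starts a new span
            label, words = tag[2:], [token]
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans

tokens = "Tim Cook is Chief Executive Officer of Apple .".split()
tags = "B-Name I-Name O B-Title I-Title I-Title O B-Org O".split()
print(bio_to_spans(tokens, tags))
# [('Name', 'Tim Cook'), ('Title', 'Chief Executive Officer'), ('Org', 'Apple')]
```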
  • 9. Elements of a machine learning system
  • 10. WORKFLOW: The basic workflow in a machine learning project: • PROBLEM FORMULATION: Can you describe the problem in terms of existing solutions? • DATA COLLECTION: How can you obtain a large, high-quality dataset? • MODEL TRAINING & SELECTION: What is a good model? How do you measure success? • IMPROVE (when needed): error analysis, more data, “better” models
  • 11. PROBLEM FORMULATION • Binary classification. Output: a probability. Comment: a special case of multi-class classification. Applications: yes/no, positive/negative, present/absent, similar/dissimilar • Multi-class classification. Output: a probability distribution. Comment: can be thought of as a special case of sequence prediction. Applications: topic classification, very good to very bad, object classification, staging • Sequence prediction. Output: a sequence of probability distributions. Comment: at the core of intelligence (artificial and biological). Applications: sequence labeling, machine translation, speech synthesis, image segmentation • Clustering. Output: cluster membership. Comment: another special case; the number of clusters is a hyperparameter. Applications: segmentation (customers, images), detection (communities, anomalies) • Reinforcement learning. Output: a sequence of actions. Comment: sequence prediction in an active environment, where actions affect observations. Applications: robotics, driverless cars, conversational agents, game playing
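The output types above can be made concrete with two standard functions. The slides do not name them, so treat this as an assumption: softmax is a common way to turn real-valued scores into a probability distribution over classes, and the sigmoid turns one score into a probability. The sketch also shows one sense in which binary classification is a special case of multi-class classification: a softmax over the two scores [z, 0] gives exactly sigmoid(z) for the first class.

```python
import math

def sigmoid(z):
    """Binary classification output: a single probability."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Multi-class output: a probability distribution over all classes."""
    m = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

dist = softmax([2.0, 1.0, 0.1])        # hypothetical scores for three classes
print([round(p, 3) for p in dist], sum(dist))

z = 0.8                                # hypothetical score for the positive class
print(softmax([z, 0.0])[0], sigmoid(z))
```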
  • 12. DATA COLLECTION • A dataset $D$ is a collection of $n$ data points $d_i$. • Each data point $d_i = (x_i, t_i)$ in $D$ is a pair consisting of features $x_i$ and a target $t_i$. • Easy problems require at least hundreds or thousands of data points. • Harder problems require millions of data points. • Occasionally, the data will be provided or is already available in existing databases. • Usually, a data collection strategy has to be devised and implemented. • Supervision: • Humans manually label the inputs with appropriate targets. • “This is a cat. That’s a dog. This is another cat.” • Weakly supervised: • Minimal human intervention • Download all images with the hashtags #cat, #dog • Synthetic data • Unsupervised: • No human supervision • Learn potentially relevant patterns from gigantic datasets
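The hashtag strategy for weak supervision can be sketched as a labeling function that produces $(x_i, t_i)$ pairs without manual annotation. The function name, the post format (image id plus a set of hashtags) and the rule of skipping ambiguous posts are all hypothetical choices for illustration.

```python
def label_from_hashtags(posts, label_map):
    """Hypothetical weak-supervision helper: derive (x, t) pairs from hashtags.

    posts: list of (image_id, set_of_hashtags); label_map: hashtag -> target.
    Posts whose hashtags map to zero or to several different targets are skipped.
    """
    dataset = []
    for image_id, tags in posts:
        targets = {label_map[tag] for tag in tags if tag in label_map}
        if len(targets) == 1:
            dataset.append((image_id, targets.pop()))
    return dataset

posts = [("img1.jpg", {"#cat", "#cute"}), ("img2.jpg", {"#dog"}),
         ("img3.jpg", {"#cat", "#dog"}), ("img4.jpg", {"#sunset"})]
print(label_from_hashtags(posts, {"#cat": "cat", "#dog": "dog"}))
# [('img1.jpg', 'cat'), ('img2.jpg', 'dog')]
```

The resulting labels are noisy (a #cat post may not show a cat), which is the price paid for minimal human intervention.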
  • 13. WHAT IS A MODEL, ANYWAY? • Using parameters θ, a model f generates a prediction y from an input x: • f(x, θ) = y • Parameters allow the model to “weigh the evidence”. • Example: a simple binary classification problem • Does a given article from a news archive focus on politics? Yes or no? • Consider the parameters (weights) for the words “election”, “soccer” and “the”. • Which of these parameters will be positive, and which negative?
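A minimal sketch of "weighing the evidence" for the politics question, assuming a simple linear model over words. The weight values are hypothetical guesses at the answer to the slide's question ("election" positive, "soccer" negative, "the" near zero), not learned parameters.

```python
# Hypothetical parameters theta: one weight per word.
theta = {"election": 2.0, "soccer": -1.5, "the": 0.0}

def f(x, theta):
    """Linear model: add up the weight of every word occurrence in the article x.

    A positive total is evidence for 'politics', a negative total evidence against.
    Words without a weight contribute nothing.
    """
    return sum(theta.get(word, 0.0) for word in x.lower().split())

print(f("The election results are in", theta))     # 2.0
print(f("The soccer season starts today", theta))  # -1.5
```

A word like "the" appears in almost every article, so it carries no evidence either way and its weight should end up close to zero.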
  • 15. LOSS FUNCTION: LOGISTIC LOSS • Some predictions are better than others. • The deviation of the prediction p from the target t is referred to as the loss. • Synonyms: cost, error, empirical risk • The loss is calculated through a loss function. • Logistic loss is one of the most important loss functions. • The target can be either 1 (“did occur”) or 0 (“did not occur”): • If the target equals 1: $-\log(\text{prediction})$ • If the target equals 0: $-\log(1 - \text{prediction})$ • Both cases combined: $\text{loss}(\text{prediction}, \text{target}) = -[\text{target} \cdot \log(\text{prediction}) + (1 - \text{target}) \cdot \log(1 - \text{prediction})]$
  • 16. LOGISTIC LOSS: EXAMPLES • $\text{loss}(\text{prediction}, \text{target}) = -[\text{target} \cdot \log(\text{prediction}) + (1 - \text{target}) \cdot \log(1 - \text{prediction})]$, computed here with base-10 logarithms • Good prediction: target 1, prediction 90%. Loss: $-\log(0.9) \approx 0.046$. This is a good prediction; consequently, the loss is small. • Bad prediction: target 1, prediction 10%. Loss: $-\log(0.1) = 1$. This prediction is inaccurate and the loss, therefore, is high. • Mediocre prediction: target 0, prediction 40%. Loss: $-\log(1 - 0.4) \approx 0.222$. This loss is a function of the counter-probability of 60%.
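The loss values in these examples can be reproduced with a few lines of Python. A minimal sketch: with base-10 logarithms it yields about 0.046 for the 90% prediction, exactly 1 for the 10% prediction and about 0.222 for the 40% prediction with target 0. Most ML libraries use the natural logarithm instead, which rescales every loss by the same constant factor and so does not change which prediction is better.

```python
import math

def logistic_loss(prediction, target, base=10):
    """-[t * log(p) + (1 - t) * log(1 - p)] for a target t in {0, 1}.

    base=10 matches the worked examples; pass base=math.e for the natural log.
    """
    return -(target * math.log(prediction, base)
             + (1 - target) * math.log(1 - prediction, base))

for p, t in [(0.9, 1), (0.1, 1), (0.4, 0)]:
    print(f"target={t} prediction={p:.0%} loss={logistic_loss(p, t):.3f}")
```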
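  • 17. WHY LOGISTIC LOSS? • The likelihood function returns the probability of the data for a given parameter. • Assuming the data points are independent, this probability is a product: • Likelihood: $L(\text{parameters} \mid \text{data}) = P(\text{data} \mid \text{parameters}) = \prod_{i=1}^{n} P(\text{data point}_i \mid \text{parameters})$ • Coin flip example: $L(p_h = 0.5 \mid HT) = P(HT \mid p_h = 0.5) = 0.25$ • In practice, it is convenient to use the log likelihood, which turns the product into a sum: • Log likelihood: $\log L(\text{parameters} \mid \text{data}) = \log \prod_{i=1}^{n} P(\text{data point}_i \mid \text{parameters}) = \sum_{i=1}^{n} \log P(\text{data point}_i \mid \text{parameters})$ • Using the log likelihood helps prevent underflow problems: a product of many small probabilities can round to zero in floating-point arithmetic, while the sum of their logarithms remains representable.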
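A small demonstration of the coin flip example and of the underflow problem, assuming standard 64-bit floats. The dataset of 400 points, each assigned probability 0.1, is hypothetical; the true product, $10^{-400}$, is far below the smallest positive double, so the direct product rounds to zero while the log likelihood stays an ordinary number. `math.prod` requires Python 3.8 or later.

```python
import math

# Coin flip example: P(HT | p_h = 0.5) for two independent flips.
print(0.5 * 0.5)  # 0.25

# Hypothetical dataset: 400 independent data points, each with probability 0.1.
probs = [0.1] * 400

likelihood = math.prod(probs)                     # underflows to 0.0
log_likelihood = sum(math.log(p) for p in probs)  # 400 * ln(0.1), about -921.03

print(likelihood, log_likelihood)
```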
  • 18. MAXIMUM LIKELIHOOD PRINCIPLE • The maximum likelihood principle tells us to select the parameters $\theta^*$ that maximize the probability of the data (equivalently, its logarithm): • $\theta^* = \arg\max_\theta \sum_{i=1}^{n} \log P(\text{data point}_i \mid \theta)$ • Maximizing a function $f(x)$ is equivalent to minimizing $-f(x)$: • $\theta^* = \arg\min_\theta \left[ -\sum_{i=1}^{n} \log P(\text{data point}_i \mid \theta) \right]$ • For a random variable with two outcomes, the logistic loss is the negative log likelihood. • Thus, minimizing the logistic loss is equivalent to the maximum likelihood approach.
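A minimal sketch of the maximum likelihood principle for a coin, assuming hypothetical data of three heads and one tail. It minimizes the total logistic loss (the negative log likelihood) over a crude grid of candidate values for $p_h$; real models use gradient-based optimization rather than a grid, but the minimizer found here is the familiar answer, the observed fraction of heads.

```python
import math

def total_logistic_loss(p, flips):
    """Negative log likelihood of coin flips (1 = heads, 0 = tails) under p_h = p."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p) for t in flips)

flips = [1, 1, 1, 0]  # hypothetical data: three heads, one tail

grid = [i / 100 for i in range(1, 100)]  # candidate values 0.01, 0.02, ..., 0.99
p_star = min(grid, key=lambda p: total_logistic_loss(p, flips))
print(p_star)  # 0.75
```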
  • 20. NUMBER COMPLETION TASK • 3, 9, 27, 81, ? What is the next number in this sequence? • Simple solution: $f(x) = 3^x$ • $f(1) = 3$, $f(2) = 9$, $f(3) = 27$, $f(4) = 81$ • $f(5) = 243$ • More complex solution: $f(x) = -15 + 32x - 18x^2 + 4x^3$ • $f(1) = 3$, $f(2) = 9$, $f(3) = 27$, $f(4) = 81$ • But: $f(5) = 195$
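Both solutions can be checked directly. The sketch also uses Lagrange interpolation (a standard technique the slide does not mention) to show where the cubic comes from: it is the unique polynomial of degree at most 3 through the four given points, and it continues the sequence with 195 rather than 243. Exact fractions avoid rounding errors.

```python
from fractions import Fraction

def simple(x):
    return 3 ** x

def cubic(x):
    return -15 + 32 * x - 18 * x ** 2 + 4 * x ** 3

def lagrange(xs, ys, x):
    """Evaluate the unique polynomial of degree < len(xs) through the points (xs, ys) at x."""
    total = Fraction(0)
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        term = Fraction(yj)
        for m, xm in enumerate(xs):
            if m != j:
                term *= Fraction(x - xm, xj - xm)
        total += term
    return total

xs, ys = [1, 2, 3, 4], [3, 9, 27, 81]
print([simple(x) for x in xs], [cubic(x) for x in xs])  # both give [3, 9, 27, 81]
print(simple(5), cubic(5), lagrange(xs, ys, 5))          # 243 195 195
```

Both functions fit the data perfectly; nothing in the four observed numbers favors one over the other, which is exactly the problem the next slide addresses.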
  • 21. SCOTUS’S RAZOR • Problem: Infinitely many functions are consistent with any finite sequence, so every sequence prediction problem has infinitely many solutions. • Most, if not all, machine learning problems are sequence prediction problems. • One solution: Ockham’s Razor • Prefer the simplest theory consistent with the data. • First clear formulation by the 13th-century theologian Duns Scotus • Today: Don’t use a fancy machine learning model when a simple model works just fine.
  • 22. WHY OCKHAM’S RAZOR? • It works: • successful applications in ML, science, business, design and other fields • Fast & cheap: • simpler models tend to be faster and consume fewer resources • The Schmidhuber/Hutter argument: • The Great Programmer implements all possible universes with program lengths from 1 to N. • Program B is a functional copy of program A if both produce the same result with different code. • Simpler programs have more functional copies than longer programs. • Simple program: print("Hello world!") • Functional copy: message = "Hello world!"; print(message)
  • 24. BUILDING BLOCKS: NEURONS, WEIGHTS AND LAYERS • A neuron is the basic processing unit: • it accepts input, processes the input and sends output • Neurons are connected to other neurons. • Connections are weighted. • A layer is a group of neurons. • Every neural network has an input layer and an output layer. • Hidden layer: • any layer between input and output • Shallow: ~1-5 hidden layers • Deep: dozens or hundreds of layers • [Figure: an input layer with inputs $x_1, \dots, x_n$, connected with weights $w_1, \dots, w_n$ to an output layer that computes $f(w_1x_1 + \dots + w_nx_n)$]
  • 25. A FEED-FORWARD NETWORK WITH TWO HIDDEN LAYERS
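A minimal sketch of a forward pass through a feed-forward network with two hidden layers, written in plain Python. The layer sizes (3, 4, 4, 1), the uniform random initialization, ReLU in the hidden layers and a sigmoid output are all illustrative assumptions; the slides leave the activation function f unspecified.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Random weights (one row per neuron) and zero biases for a fully connected layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(inputs, weights, biases, activation):
    """Each neuron applies `activation` to its weighted input sum plus its bias."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

rng = random.Random(42)
sizes = [3, 4, 4, 1]  # input layer, two hidden layers, output layer
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

def forward(x):
    """Pass the input through every layer in order; only the last layer uses the sigmoid."""
    h = x
    for k, (weights, biases) in enumerate(layers):
        is_output = k == len(layers) - 1
        h = dense(h, weights, biases, sigmoid if is_output else relu)
    return h

print(forward([0.5, -1.0, 2.0]))  # one value between 0 and 1
```

Information flows strictly from the input layer to the output layer, with no loops; that is what "feed-forward" refers to.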
  • 26. LOGISTIC REGRESSION • Baseline model for binary classification tasks: • $f(x) = s(w_1x_1 + \dots + w_nx_n + b)$ • Weigh the evidence. • Add a bias. • The sigmoid function $s$ “squashes” its input into a number between 0 and 1. • Can be formulated as a neural net: • The inputs $x_1, \dots, x_n$ correspond to neurons in the first layer. • The bias corresponds to an additional neuron with a connection weight of 1. • The output neuron applies the sigmoid function. • [Figure: an input layer with a bias neuron $b$ (connection weight 1) and inputs $x_1, \dots, x_n$ (weights $w_1, \dots, w_n$), connected to an output neuron computing $s(w_1x_1 + \dots + w_nx_n + b)$]
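A minimal sketch showing that the two descriptions of logistic regression compute the same thing: the formula $s(w_1x_1 + \dots + w_nx_n + b)$, and a one-layer network in which the bias is an extra input neuron holding the value b with a connection weight of 1. The weights, bias and input are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression(x, w, b):
    """f(x) = s(w1*x1 + ... + wn*xn + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def as_neural_net(x, w, b):
    """Same model as a network: the bias is an extra input neuron (value b, weight 1)."""
    inputs = list(x) + [b]
    weights = list(w) + [1.0]
    return sigmoid(sum(wi * xi for wi, xi in zip(weights, inputs)))

w, b = [0.8, -1.2, 0.5], -0.3   # hypothetical parameters
x = [1.0, 0.5, 2.0]             # hypothetical input
print(logistic_regression(x, w, b), as_neural_net(x, w, b))
```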
  • 27. THE SIGMOID FUNCTION • The sigmoid function $s(x)$ is one of the most frequently used functions in machine learning: $s(x) = \frac{1}{1 + e^{-x}}$ • Desirable properties: • it “squashes” any input into the range between 0 and 1 • its derivative is easy to compute: $\frac{ds}{dx} = s(x)(1 - s(x))$
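The closed-form derivative can be sanity-checked numerically. A minimal sketch comparing $s(x)(1 - s(x))$ with a central finite difference at a few points; the step size h is an arbitrary small value chosen for the check.

```python
import math

def s(x):
    return 1.0 / (1.0 + math.exp(-x))

def ds_dx(x):
    """Closed-form derivative: s(x) * (1 - s(x))."""
    return s(x) * (1.0 - s(x))

def numerical_derivative(f, x, h=1e-6):
    """Central finite difference, used here only to check the formula."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    print(x, round(s(x), 4), round(ds_dx(x), 6), round(numerical_derivative(s, x), 6))
```

The derivative is largest at x = 0, where it equals 0.25, and shrinks toward zero for large positive or negative inputs.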
  • 28. A GLIMPSE AT BACKPROPAGATION • Model parameters are initialized randomly. • The term “training” refers to the (iterative) optimization of parameters. • Almost all neural nets are trained with the backpropagation algorithm. • Backpropagation algorithm (n repetitions): go through each instance: 1. Forward propagation: compute the prediction and the loss. 2. Backward propagation: for each parameter, compute the derivative of the loss with respect to that parameter. • Positive derivative: a small increase in the parameter => an increase in the loss • Negative derivative: a small increase in the parameter => a decrease in the loss 3. Update: use the derivative to apply an update rule. • Simple rule: new value = old value – learning rate × derivative
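The three steps can be traced on the smallest possible model, logistic regression with a single input. A minimal sketch under several assumptions: a hypothetical four-point dataset, parameters starting at zero instead of random values, a learning rate of 0.5, and the natural-log logistic loss, for which the chain rule gives d loss / dz = p − t.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(p, t):
    """Logistic loss with the natural logarithm."""
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))

# Hypothetical 1-D dataset: negative x -> class 0, positive x -> class 1.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b = 0.0, 0.0          # fixed starting point for reproducibility
learning_rate = 0.5

def total_loss():
    return sum(loss(sigmoid(w * x + b), t) for x, t in data)

initial = total_loss()
for epoch in range(100):                 # n repetitions
    for x, t in data:                    # go through each instance
        p = sigmoid(w * x + b)           # 1. forward propagation
        # 2. backward propagation: d loss / dz = p - t, so by the chain rule
        #    d loss / dw = (p - t) * x and d loss / db = p - t.
        grad_w, grad_b = (p - t) * x, p - t
        # 3. update: new value = old value - learning rate * derivative
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(initial, total_loss(), w, b)
```

In a deeper network the same chain rule is applied layer by layer, from the output back to the input; that backward sweep is what gives backpropagation its name.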
  • 29. A SIMPLE EXAMPLE Source: hackernoon.com
  • 31. A MORE REALISTIC EXAMPLE Source: Analytics Vidhya
  • 32. A small zoo of neural networks
  • 39. CONVOLUTIONAL LAYER • CNNs use a repeated sequence of layers: • A convolutional layer, followed by a pooling layer • A convolutional layer consists of filters: • A window moves through the output of the previous layer • Similar to how we read: from left to right, and then downwards • The purpose of a filter is to detect the presence of a particular feature: • Basic geometric shapes • Lines, circles, edges • Characteristic colors • Blue sky, green grass
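A minimal sketch of a single filter sliding over an input, left to right and then downwards, with stride 1 and no padding. The 4x4 image and the hand-made vertical-edge filter are hypothetical; in a CNN the filter weights are learned. Like most CNN libraries, the function computes a cross-correlation, i.e. the kernel is not flipped.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the filter responses."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]      # left to right ...
            for i in range(out_h)]       # ... then downwards

# Hypothetical 4x4 image: dark left half (0), bright right half (1).
image = [[0, 0, 1, 1]] * 4
# Hand-made vertical-edge filter: responds where brightness increases to the right.
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d(image, kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The strong responses in the middle column mark where the edge is; uniform regions produce zero.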
  • 40. 96 low-level features learned by a convolution layer Source: CS231n Convolutional Neural Networks for Visual Recognition
  • 41. MAX-POOLING LAYER • A max-pooling layer performs a reduction operation: • A window moves through the subregions of the previous output. • For each subregion, the maximum value is extracted. • A max-pooling layer with a 2x2 window and a stride of 2 reduces a 4x4 matrix to a 2x2 matrix. • Intuition: It doesn’t really matter where exactly a feature is located. • Fewer entries => faster computation
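A minimal sketch of max-pooling with a 2x2 window and a stride of 2, reducing a hypothetical 4x4 matrix to a 2x2 matrix:

```python
def max_pool(matrix, size=2, stride=2):
    """Take the maximum of each size x size window, moving the window by `stride`."""
    rows = range(0, len(matrix) - size + 1, stride)
    cols = range(0, len(matrix[0]) - size + 1, stride)
    return [[max(matrix[i + a][j + b] for a in range(size) for b in range(size))
             for j in cols]
            for i in rows]

m = [[1, 3, 2, 1],
     [4, 6, 5, 0],
     [7, 2, 9, 8],
     [1, 0, 3, 4]]
print(max_pool(m))  # [[6, 5], [7, 9]]
```

Each output entry only records that a strong response occurred somewhere in its window, not where, which matches the intuition that exact feature positions matter little.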
  • 42. DEEP CONVOLUTIONAL NEURAL NETWORKS • Deep neural nets are characterized by repeated blocks of layers. • Example: a series of convolution/max-pooling operations • Some ML fields (though not all) are dominated by deep nets. • Theory lags behind applications. • Intuition: hierarchical models for hierarchical data • Simple example: traffic sign recognition • Lines and circles form digits. • Digits form numbers. • A speed limit sign is composed of a red circle, a white circle and a number.
  • 44. SUMMARY • The growth of machine learning is fueled by: 1. the importance of predictions 2. the low cost of data acquisition and processing 3. the domain generality of learning algorithms • The essential tasks in machine learning are to: 1. formulate the problem 2. collect a large and relevant dataset 3. train, test and improve appropriate models • Neural networks form a class of powerful models trained by backpropagation: • Building blocks: neurons, connections, layers • Convolutional neural networks: • high predictive accuracy and computational efficiency