What is Machine Learning?
“Learning is any process by which a system improves
performance from experience.”
- Herbert Simon
Definition by Tom Mitchell (1998):
Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E.
A well-defined learning task is given by <P, T, E>.
Some examples of tasks that are best solved
by using a learning algorithm
• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
– Medical images
• Generating patterns:
– Generating images or motion sequences
• Recognizing anomalies:
– Unusual credit card transactions
– Unusual patterns of sensor readings in a nuclear power plant
• Prediction:
– Future stock prices or currency exchange rates
State of the Art Applications of
Machine Learning
Example (Spam Filter)
Traditional way of Programming
1. First you would look at what spam typically looks like. You might notice that
some words or phrases (such as “4U,” “credit card,” “free,” and “amazing”)
tend to come up a lot in the subject.
2. You would write a detection algorithm for each of the patterns that you
noticed, and your program would flag emails as spam if a number of these
patterns are detected.
3. You would test your program, and repeat steps 1 and 2 until it is good enough.
Example (Spam Filter)
Machine Learning way of Programming
In contrast, a spam filter based on Machine Learning techniques automatically learns
which words and phrases are good predictors of spam by detecting unusually frequent
patterns of words in the spam examples compared to the ham examples.
The program is much shorter, easier to maintain, and most likely more accurate.
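The idea of learning which words predict spam can be sketched with a tiny word-count model. Everything below — the mini corpus, the labels, and the choice of a Naive Bayes classifier — is an illustrative assumption; the slides do not prescribe a specific algorithm:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus: 1 = spam, 0 = ham
emails = [
    "free credit card offer 4U",
    "amazing free prize claim now",
    "meeting agenda for tomorrow",
    "lunch plans this week",
]
labels = [1, 1, 0, 0]

# Turn each email into word counts, then learn which words predict spam
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
model = MultinomialNB()
model.fit(X, labels)

# Words like "free" and "credit" were frequent in the spam examples,
# so a new email containing them gets flagged automatically
prediction = model.predict(vectorizer.transform(["claim your free credit card"]))[0]
```

Note that no hand-written rule mentions "free" or "credit card"; the word statistics were learned from the examples.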
Example (Spam Filter)
Machine Learning way of Programming: Automatically Adapting to Change (Classification)
If spammers notice that all their emails containing “4U” are blocked, they
might start writing “For U” instead. A spam filter using traditional programming
techniques would need to be updated to flag “For U” emails. If spammers keep
working around your spam filter, you will need to keep writing new rules forever.
In contrast, a spam filter based on Machine Learning techniques automatically notices
that “For U” has become unusually frequent in spam flagged by users, and it starts
flagging them without your intervention.
Example (Spam Filter)
A well-defined learning task is given by <P, T, E>.
T: Task is flagging spam in new emails.
E: Experience is the training data (example emails labeled spam/ham).
P: Performance, for example, could be the ratio of correctly classified emails.
Autonomous Cars
• Nevada made it legal for
autonomous cars to drive on
roads in June 2011
• As of 2013, four states (Nevada,
Florida, California, and
Michigan) have legalized
autonomous cars
Penn’s Autonomous Car
(Ben Franklin Racing Team)
Machine Learning in
Automatic Speech Recognition
A Typical Speech Recognition System
ML is used to predict phone states from the sound spectrogram
Deep learning has state-of-the-art results
# Hidden Layers 1 2 4 8 10 12
Word Error Rate % 16.0 12.8 11.4 10.9 11.0 11.1
Baseline GMM performance = 15.4%
[Zeiler et al. “On rectified linear units for speech
recognition” ICASSP 2013]
Types of Learning
• Supervised (inductive) learning
– Given: training data + desired outputs (labels)
• Unsupervised learning
– Given: training data (without desired outputs)
• Semi-supervised learning
– Given: training data + a few desired outputs
• Reinforcement learning
– Rewards from a sequence of actions
Supervised Learning
Supervised learning is the ML setting where the training
data you feed to the algorithm includes the desired
solutions, called labels. It is divided into:
Regression
Classification
Supervised Learning: Regression
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
– y is real-valued == regression
[Figure: scatter plot of September Arctic Sea Ice Extent (1,000,000 sq km, y-axis 0–9) versus year, 1970–2020]
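A minimal regression sketch in the spirit of the plot: fit a line f(x) = a·x + b to (year, extent) pairs. The numbers below are made up for illustration, not the real sea-ice data:

```python
import numpy as np

# Hypothetical (year, extent) pairs loosely in the spirit of the slide's plot
years = np.array([1980, 1990, 2000, 2010, 2020], dtype=float)
extent = np.array([7.8, 6.2, 6.3, 4.9, 3.9])  # million sq km, made up

# Least-squares fit of the line f(x) = a*x + b
a, b = np.polyfit(years, extent, 1)

# The learned function predicts a real value for an unseen x
predicted_2015 = a * 2015 + b
```

Because y is real-valued, this is regression: the output for a new year is a number on a continuous scale.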
Supervised Learning
Classification example:
The spam filter is a good example of this: it is trained with many example emails along
with their class (spam or ham), and it must learn how to classify new emails.
A labeled training set for supervised learning (e.g., spam classification)
Supervised Learning: Classification
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
– y is categorical == classification
[Figure: Breast Cancer (Malignant / Benign) — y = 1 (malignant) or 0 (benign) plotted against Tumor Size, with a threshold splitting the axis into "Predict Benign" and "Predict Malignant" regions]
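A minimal classification sketch for the tumor-size example; the sizes, the labels, and the choice of logistic regression are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up tumor sizes (cm) with labels: 0 = benign, 1 = malignant
sizes = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [5.0]])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Learn f(x): tumor size -> category
clf = LogisticRegression()
clf.fit(sizes, labels)

# The learned f(x) predicts a category, not a number
pred_small = clf.predict([[1.2]])[0]
pred_large = clf.predict([[4.5]])[0]
```

The model effectively learns a threshold on tumor size separating "predict benign" from "predict malignant", as in the figure.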
Supervised Learning
• x can be multi-dimensional
– Each dimension corresponds to an attribute
Example attributes: Tumor Size, Age, Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, …
Unsupervised Learning
In unsupervised learning, as you might guess, the
training data is unlabeled.
The model tries to learn without a teacher, so it must
find structure and patterns in the data on its own.
[Figure: an unlabeled training set for unsupervised learning — only the inputs are given; the output structure is hidden]
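A minimal unsupervised-learning sketch: k-means clustering (one common choice; the slide does not name an algorithm) finding two groups in made-up, unlabeled points:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: two made-up blobs; no labels are given to the model
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [7.8, 8.2], [8.1, 7.9]])

# The model must find the structure (two groups) on its own
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = kmeans.labels_
```

Without any teacher, the model assigns the three points near (1, 1) to one cluster and the three near (8, 8) to the other.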
Reinforcement Learning
• Given a sequence of states and actions with (delayed)
rewards, output a policy
– A policy is a mapping from states to actions that tells you what to
do in a given state
• Examples:
– Credit assignment problem
– Game playing
– Robot in a maze
– Balance a pole on your hand
Reinforcement Learning
Key components of Reinforcement Learning:
Agent: The learner or decision-maker.
Environment: The space in which the agent operates.
State: A representation of the current situation or condition in which the
agent is.
Actions: What the agent can do or the choices it can make.
Reward: Feedback from the environment in response to the agent’s
actions.
Policy: A strategy used by the agent to determine its actions based on
the current state.
The Agent-Environment Interface
Agent and environment interact at discrete time steps t = 0, 1, 2, …
– The agent observes the state at step t: s_t ∈ S
– It produces an action at step t: a_t ∈ A(s_t)
– It gets the resulting reward r_{t+1} and the resulting next state s_{t+1}
The interaction yields a trajectory:
… s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, a_{t+3}, …
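One concrete instance of this interaction loop is tabular Q-learning. The toy "corridor" environment below (states 0–4, reward only at the goal) is invented for illustration; the slide itself does not prescribe an algorithm:

```python
import random

# Tabular Q-learning on a tiny 1-D corridor: states 0..4, reward only at state 4.
# Actions: 0 = left, 1 = right. Everything here is a toy assumption.
n_states, n_actions, goal = 5, 2, 4
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1
random.seed(0)

for episode in range(200):
    s = 0
    while s != goal:
        # Epsilon-greedy policy: mostly exploit, sometimes explore
        if random.random() < epsilon:
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda x: Q[s][x])
        # Environment step: observe next state s' and reward r
        s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
        r = 1.0 if s_next == goal else 0.0
        # Q-learning update: move Q[s][a] toward r + gamma * max_a' Q[s'][a']
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

# The learned policy maps each state to an action; here it should be "go right"
policy = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(goal)]
```

The delayed reward at the goal propagates backward through the Q-values, which is exactly the credit-assignment problem mentioned above.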
Machine learning model maps from features to prediction: f(x) → y, where x is the features and y is the prediction.
Examples:
• Classification
– Is this a dog or a cat?
– Is this email spam or not?
• Regression
– What will the stock price be tomorrow?
– What will be the high temperature tomorrow?
Learning has three stages
Training: optimize model parameters
Validation: intermediate evaluations to design/select model
Test: final performance evaluation
Training: the model is fit to data to minimize a loss or maximize an objective function

θ* = argmin_θ Loss(f(X; θ), Y)

– θ*: the model parameters that minimize the loss
– X: the features of all training examples (1 row per example, 1 column per feature)
– Y: the “ground truth” predictions for all training examples

Example: learn to predict the next day’s temperature given the preceding days’ temperatures. An excerpt of X (each row holds one example’s preceding-day temperatures):

37.5 41.2 51.0 48.3 50.5
47.0 46.5 48.9 50.5 47.6
…
67.0 64.7 63.0 61.4 60.2

Loss: sum of squared errors, Loss(f(X; θ), Y) = Σ_i (f(X_i; θ) − y_i)²
Model: linear, f(X_i; θ) = A·X_i + b
Optimization: via ordinary least squares regression
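The training step can be sketched directly: ordinary least squares finds the θ that minimizes the sum of squared errors. The X rows echo the slide's excerpt; the target temperatures y are made up for illustration:

```python
import numpy as np

# Rows of X: 5 preceding days' temperatures; y: next day's temperature (targets made up)
X = np.array([[37.5, 41.2, 51.0, 48.3, 50.5],
              [47.0, 46.5, 48.9, 50.5, 47.6],
              [67.0, 64.7, 63.0, 61.4, 60.2]])
y = np.array([52.0, 48.0, 59.0])

# Linear model f(X_i; theta) = A.X_i + b; append a column of 1s so b is part of theta
X1 = np.hstack([X, np.ones((len(X), 1))])

# Ordinary least squares: theta* = argmin_theta sum_i (f(X_i; theta) - y_i)^2
theta, *_ = np.linalg.lstsq(X1, y, rcond=None)

# The minimized loss: sum of squared errors at theta*
loss = np.sum((X1 @ theta - y) ** 2)
```

With more parameters than examples the fit is exact here; with realistic data the loss would be positive but minimized.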
Model design and “hyperparameter” tuning is performed using a validation set:
– Select the model
– Set training parameters: feature selection; learning rate, regularization parameters, …
Sometimes there are clear “train”, “val”, and “test” sets. Other
times, you need to split “train” into a train set for learning
parameters and a val set for checking model performance.
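The split described above can be sketched as follows (the sizes and seed are arbitrary assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 made-up examples; carve a val set out of "train" when no official split exists
X = np.arange(100).reshape(-1, 1)
y = np.arange(100)

# First hold out the test set, then split the remainder into train and val
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)
# Result: 60% train, 20% val, 20% test
```

The val set is used for model/hyperparameter choices; the test set stays untouched until the final evaluation.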
Testing: the effectiveness of the model is evaluated on a held-out test set
“Held out”: not used in training; ideally not viewed by developers, e.g. kept on
a private test server.
Common performance measures:
– Classification error: (1/N) Σ_i [f(X_i) ≠ y_i] (for a classification model; y_i is the target/true label)
– Cross-entropy: −(1/N) Σ_i log f(y = y_i | X_i) (for a probabilistic model)
– RMSE: √((1/N) Σ_i (f(X_i) − y_i)²) (regression measure)
– R²: 1 − Σ_i (f(X_i) − y_i)² / Σ_i (y_i − ȳ)² (unitless regression measure; ȳ is the expectation/mean of y)
In machine learning research, data is usually collected once and then
randomly sampled into train and test partitions, so train and test samples are
“i.i.d.” (independent and identically distributed).
In many real-world applications, however, the input to the model in deployment comes from a
different distribution than training.
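The measures above can be computed directly from their formulas; the toy predictions and targets below are made up:

```python
import numpy as np

# Toy regression predictions vs. targets
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])

# RMSE: square root of the mean squared error
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))

# R^2: 1 - SS_res / SS_tot (unitless; 1.0 is a perfect fit)
ss_res = np.sum((y_pred - y_true) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# Toy classification: error = fraction of mismatched labels
c_true = np.array([0, 1, 1, 0])
c_pred = np.array([0, 1, 0, 0])
error_rate = np.mean(c_pred != c_true)
```

Here each prediction is off by 0.5, so the RMSE is 0.5 in the same units as y, while R² is unitless.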
Various Function Representations
• Numerical functions
– Linear regression
– Neural networks
– Support vector machines
• Symbolic functions
– Decision trees
• Instance-based functions
– Nearest-neighbor
• Probabilistic Graphical Models
– Naïve Bayes
– Bayesian networks
ML in Practice
• Understand domain, prior knowledge, and goals
• Data integration, selection, cleaning, pre-processing, etc.
• Learn models
• Interpret results
• Consolidate and deploy discovered knowledge
(Loop over these steps as needed.)
Complete example to understand steps of Machine Learning
Suppose you want to know if money makes people happy or not.
Step 1: download the Better Life Index data from the OECD’s website
(OECD Data Explorer - Archive • Better Life Index),
as well as stats about GDP (Gross Domestic Product) per capita
from the IMF’s website (World Economic Outlook Databases (imf.org)).
Step 2: join the tables and sort by GDP per capita.
The table shows an excerpt of what you get.
Complete example to understand steps of Machine Learning
It is observed that the data tends to be linear: as GDP per capita increases,
life satisfaction also increases, although there is some noisy data. So we will
choose a linear model.
This step is called model selection: you selected a linear model of life satisfaction
with just one attribute, GDP per capita.
Complete example to understand steps of Machine Learning
The linear model equation: life_satisfaction = θ0 + θ1 × GDP_per_capita.
The second step is to determine the model parameters, i.e. choose the values of
θ0 and θ1 that make the line best fit the scattered points. (This step is the training step.)
Complete example to understand steps of Machine Learning
How can you know which values of the parameters θ0 and θ1 will make
your model perform best?
You need to specify a performance measure: either a utility function (or fitness
function) that measures how good your model is,
or a cost function that measures how bad it is (the error).
For linear regression problems, people typically use a cost function that
measures the distance between the linear model’s predictions and the training
examples; the objective is to minimize this distance.
Complete example to understand steps of Machine Learning
The training algorithm finds the parameters that make the linear model fit best
to your data, for example with the least MSE.
This is called training the model. In our case the algorithm finds that the optimal
parameter values are θ0 = 4.85 and θ1 = 4.91 × 10–5.
Complete example to understand steps of Machine Learning
After training the model, we can now use it.
For example, suppose you want to know how happy Cypriots are, and the OECD data
does not have the answer.
Fortunately, you can use your model to make a good prediction: you look up
Cyprus’s GDP per capita, find $22,587, apply your model, and find that
life satisfaction is likely to be somewhere around 4.85 + 22,587 × 4.91 × 10-5 ≈ 5.96.
Complete example to understand steps of Machine Learning
Python code that loads the data, prepares it, creates a scatterplot for visualization, and
then trains a linear model and makes a prediction:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model
# Load the data
oecd_bli = pd.read_csv("oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("gdp_per_capita.csv", thousands=',', delimiter='\t',
                             encoding='latin1', na_values="n/a")
# Prepare the data (prepare_country_stats is a helper, defined elsewhere, that merges the two tables)
country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
X = np.c_[country_stats["GDP per capita"]]
y = np.c_[country_stats["Life satisfaction"]]
# Visualize the data
country_stats.plot(kind='scatter', x="GDP per capita", y='Life satisfaction')
plt.show()
# Select a linear model
model = sklearn.linear_model.LinearRegression()
# Train the model
model.fit(X, y)
# Make a prediction for Cyprus
X_new = [[22587]] # Cyprus' GDP per capita
print(model.predict(X_new)) # outputs [[ 5.96242338]]
Complete example to understand steps of Machine Learning
Replacing the Linear Regression model with k-Nearest Neighbors
regression in the previous code is as simple as replacing these two
lines:
import sklearn.linear_model
model = sklearn.linear_model.LinearRegression()
with these two:
import sklearn.neighbors
model = sklearn.neighbors.KNeighborsRegressor(n_neighbors=3)
Complete example to understand steps of Machine Learning
If all went well, your model will make good predictions.
If not, you may need to:
use more attributes (employment rate, health, air pollution, etc.),
get more or better-quality training data,
or perhaps select a more powerful model (e.g., a Polynomial Regression model).
In summary:
You studied the data.
You selected a model.
You trained it on the training data (i.e., the learning algorithm searched for the
model parameter values that minimize a cost function).
Finally, you applied the model to make predictions on new cases (this is called
inference), hoping that this model will generalize well.
Another Complete example to understand steps of Machine Learning (Heart attack)
Problem:
A heart attack may happen when the flow of blood to a part of the
heart muscle gets blocked.
Solution:
We want to use an ML model to predict whether a person may have a
heart attack or not.
So, we are going to follow these steps (workflow):
Step 1: Data collection and features.
Step 2: Data preprocessing.
Step 3: Feature selection.
Step 4: Model training.
Step 5: Model evaluation.
Step 6: Deployment and monitoring.
Another Complete example to understand steps of Machine Learning (Heart attack)
Step one: Data collection and features
Which may include:
Age
Gender
Blood pressure
Cholesterol levels
and others
Another Complete example to understand steps of Machine Learning (Heart attack)
Step two: Data preprocessing
Which includes:
Cleaning and normalizing the collected data.
Dealing with missing values.
Encoding categorical data appropriately into numerical values the computer can work with.
Another Complete example to understand steps of Machine Learning (Heart attack)
Step three: Feature selection
Which includes:
Identifying the most predictive features, which will help in predicting a heart attack.
This means excluding the irrelevant features.
Another Complete example to understand steps of Machine Learning (Heart attack)
Step four: Model training
Which includes:
Dividing the data into training and test sets, for example 80% training and 20% testing.
Using the processed data to train the ML model.
Common ML models include:
Logistic regression
Decision Tree
Random forest
Gradient Boosting models
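A sketch of this training step on synthetic stand-in data (the features, the risk rule generating labels, and the choice of logistic regression are all illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: [age, systolic BP, cholesterol]; label 1 = heart attack
rng = np.random.default_rng(0)
X = rng.normal([55, 130, 220], [10, 15, 30], size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 120).astype(int)  # made-up risk rule, not medical fact

# 80% training / 20% testing, as on the slide
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Logistic regression is the first of the common models listed above
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
```

The same two lines that fit the model could instead fit a decision tree or random forest from `sklearn` without changing the rest of the workflow.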
Another Complete example to understand steps of Machine Learning (Heart attack)
Step five: Model evaluation
After training the model, we need to evaluate its performance, so
we use different evaluation metrics for classification:
Accuracy
Precision
Recall
AUC-ROC curve
Note: other metrics are used for regression.
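These classification metrics can be computed with scikit-learn; the toy labels and scores below are made up:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

# Toy outputs of a binary classifier (1 = heart attack)
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3])  # predicted probabilities

acc = accuracy_score(y_true, y_pred)    # fraction of correct labels
prec = precision_score(y_true, y_pred)  # of predicted positives, how many are real
rec = recall_score(y_true, y_pred)      # of real positives, how many were found
auc = roc_auc_score(y_true, y_score)    # area under the ROC curve, from scores
```

Note that AUC-ROC is computed from the predicted probabilities, not the hard labels.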
Lessons Learned about Learning
• Learning can be viewed as using direct or indirect experience to
approximate a chosen target function.
• Function approximation can be viewed as a search through a space
of hypotheses (representations of functions) for one that best fits a
set of training data.
• Different learning methods assume different hypothesis spaces
(representation languages) and/or employ different search techniques.