1
Machine Learning by
using Python
Lesson One
By: Professor Lili Saghafi
proflilisaghafi@gmail.com
@Lili_PLS
2
Overview
• Machine learning is a kind of programming that gives
computers the capability to learn from data automatically,
without being explicitly programmed.
• In other words, these programs change their behavior by
learning from data.
• In this course we will cover various aspects of machine
learning.
• Of course, everything will be related to Python. So it is
Machine Learning by using Python.
• What is the best programming language for machine
learning?
• Python is clearly one of the top players!
3
Topics
• k-nearest Neighbor Classifier
• Neural networks
– Neural Networks from Scratch in Python
– Neural Network in Python using NumPy
– Dropout Neural Networks
– Neural Networks with Scikit
– Machine Learning with Scikit and Python
• Naive Bayes Classifier
• Introduction to Text Classification using Naive
Bayes and Python
4
Machine learning can be roughly
separated into three categories:
• Supervised learning
– The machine learning program is given both the input data and
the corresponding labels. This means that the training data has
to be labelled by a human being beforehand.
• Unsupervised learning
– No labels are provided to the learning algorithm. The algorithm
has to figure out a clustering of the input data.
• Reinforcement learning
– A computer program dynamically interacts with its environment.
This means that the program receives positive and/or negative
feedback to improve its performance.
5
6
Machine Learning Terminology
• Classifier: A program or a function which
maps from unlabeled instances to classes
is called a classifier.
7
• Confusion Matrix: A confusion matrix, also
called a contingency table or error matrix, is
used to visualize the performance of a classifier.
• The columns of the matrix represent the
instances of the predicted classes and the rows
represent the instances of the actual class.
(Note: It can be the other way around as well.)
• In the case of binary classification the table has
2 rows and 2 columns.
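As a small illustration, scikit-learn can compute such a matrix directly;
the label lists below are made up for this sketch:

from sklearn.metrics import confusion_matrix

y_actual    = ["male", "male", "female", "female", "male"]
y_predicted = ["male", "female", "female", "male", "male"]

# Rows follow the order given in `labels`; here rows are the actual
# classes and columns the predicted classes.
matrix = confusion_matrix(y_actual, y_predicted, labels=["male", "female"])
print(matrix)
# [[2 1]
#  [1 1]]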
8
• This means that the
classifier correctly
predicted a male person
in 42 cases and it wrongly
predicted 8 male
instances as female. It
correctly predicted 32
instances as female. 18
cases had been wrongly
predicted as male instead
of female.
Example:

Confusion Matrix            Predicted classes
                            male      female
Actual classes   male         42           8
                 female       18          32
9
Accuracy
• Accuracy is a statistical measure, defined as the number of
correct predictions made by a classifier divided by the total
number of predictions made. (The error rate is its
complement: 1 - accuracy.)
• The classifier in our previous example correctly predicted
42 male instances and 32 female instances.
• Therefore, the accuracy can be calculated by:
• accuracy = (42 + 32) / (42 + 8 + 18 + 32)
• which is 0.74
• Let's assume we have a classifier which always predicts
"female". Since our data contains 50 male and 50 female
instances, we have an accuracy of 50 % in this case:
10
Confusion Matrix            Predicted classes
                            male      female
Actual classes   male          0          50
                 female        0          50
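To make the computation concrete, here is a small sketch that checks both
accuracies with NumPy; the matrices are copied from the slides above:

import numpy as np

# Confusion matrix of the first classifier (rows: actual male/female,
# columns: predicted male/female).
confusion = np.array([[42,  8],
                      [18, 32]])
print(np.trace(confusion) / confusion.sum())   # 0.74

# The classifier that always predicts "female":
always_female = np.array([[0, 50],
                          [0, 50]])
print(np.trace(always_female) / always_female.sum())   # 0.5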
11
• We will demonstrate
the so-called
accuracy paradox.
• A spam recognition
classifier is described
by the following
confusion matrix:
Confusion Matrix            Predicted classes
                            spam      ham
Actual classes   spam          4        1
                 ham           4       91
12
• The accuracy of this
classifier is (4 + 91) /
100, i.e. 95 %.
• The following
classifier predicts
solely "ham" and has
the same accuracy.
• The accuracy of this
classifier is 95%,
even though it is not
capable of
recognizing any spam
at all.
Confusion Matrix            Predicted classes
                            spam      ham
Actual classes   spam          0        5
                 ham           0       95
13
Precision and Recall
• Accuracy: (TN + TP) / (TN + TP + FN + FP)
• Precision: TP / (TP + FP)
• Recall: TP / (TP + FN)

Confusion Matrix            Predicted classes
                            negative    positive
Actual classes   negative      TN           FP
                 positive      FN           TP
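Applied to the "always ham" classifier above, these formulas expose the
paradox; a minimal sketch, treating spam as the positive class (the counts
are read off the last confusion matrix):

TP, FN = 0, 5    # actual spam: 0 predicted as spam, 5 missed
FP, TN = 0, 95   # actual ham:  0 predicted as spam, 95 correct

accuracy = (TN + TP) / (TN + TP + FN + FP)   # 0.95
recall = TP / (TP + FN)                      # 0.0 -- no spam is ever found
# Precision is undefined here, since the classifier makes no
# positive predictions at all.
precision = TP / (TP + FP) if (TP + FP) else float("nan")
print(accuracy, recall, precision)           # 0.95 0.0 nan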
14
Supervised learning
• The machine learning program is given
both the input data and the corresponding
labels. This means that the training data
has to be labelled by a human being
beforehand.
15
Unsupervised learning
• No labels are provided to the learning
algorithm. The algorithm has to figure out
a clustering of the input data.
16
Reinforcement learning
• A computer program dynamically interacts
with its environment. This means that the
program receives positive and/or negative
feedback to improve its performance.
17
k-Nearest-Neighbor Classifier
18
k-Nearest-Neighbor Classifier
19
• "Show me who your friends are and I’ll tell you
who you are?"
• The concept of the k-nearest neighbor classifier
can hardly be simpler described. This is an old
saying, which can be found in many languages
and many cultures. It's also metnioned in other
words in the Bible: "He who walks with wise men
will be wise, but the companion of fools will
suffer harm" (Proverbs 13:20 )
20
k-nearest neighbor classifier
• This means that the concept of the k-
nearest neighbor classifier is part of our
everyday life and judgement: Imagine you
meet a group of people; they are all very
young, stylish and sporty.
• They talk about their friend Ben, who isn't
with them. So, how do you imagine Ben?
Right, you imagine him as being young,
stylish and sporty as well.
21
k-nearest neighbor classifier
• Now imagine you learn that Ben lives in a
neighborhood where people vote conservative and
the average income is above 200,000 dollars a year,
and that both his neighbors even make more than
300,000 dollars per year.
• What do you think of Ben?
• Most probably, you do not consider him to be an
underdog, and you may suspect him to be a
conservative as well.
22
k-nearest neighbor classifier
• The principle behind nearest neighbor
classification consists in finding a predefined
number, i.e. the 'k', of training samples closest
in distance to a new sample which has to be
classified. The label of the new sample will be
determined from these neighbors.
• k-nearest neighbor classifiers have a fixed, user-
defined constant for the number of neighbors
which have to be determined.
• There are also radius-based neighbor learning
algorithms, which have a varying number of
neighbors based on the local density of points:
all the samples inside a fixed radius (see the
sketch below).
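A minimal sketch of both flavors in scikit-learn; the parameter values
and toy data below are purely illustrative:

from sklearn.neighbors import KNeighborsClassifier, RadiusNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)      # fixed number of neighbors
rnn = RadiusNeighborsClassifier(radius=1.0)    # all neighbors within radius 1.0

X = [[0, 0], [1, 1], [2, 2], [3, 3]]           # toy training samples
y = [0, 0, 1, 1]                               # toy labels
knn.fit(X, y)
rnn.fit(X, y)
print(knn.predict([[1.2, 1.2]]), rnn.predict([[1.2, 1.2]]))   # [0] [0]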
23
Neighbors-based methods are
known as non-generalizing
machine learning methods
• The distance can, in general, be any
metric measure: standard Euclidean
distance is the most common choice.
• Neighbors-based methods are known as
non-generalizing machine learning
methods, since they simply "remember"
all of their training data.
• Classification can be computed by a
majority vote of the nearest neighbors of
the unknown sample.
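A small sketch of the standard Euclidean distance with NumPy; the helper
name distance() is our own choice and is reused in later sketches:

import numpy as np

def distance(instance1, instance2):
    """Euclidean distance between two feature vectors."""
    return np.linalg.norm(np.asarray(instance1) - np.asarray(instance2))

print(distance([3, 5], [1, 1]))   # 4.47...  (sqrt(2**2 + 4**2))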
24
k-NN algorithm
• The k-NN algorithm is among the simplest
of all machine learning algorithms, but
despite its simplicity, it has been quite
successful in a large number of
classification and regression problems, for
example character recognition or image
analysis.
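To illustrate that the same idea covers regression as well, here is a
hedged sketch using scikit-learn's KNeighborsRegressor; the data points
are made up:

from sklearn.neighbors import KNeighborsRegressor

# Toy 1-D regression data (purely illustrative).
X = [[0], [1], [2], [3], [4]]
y = [0.0, 0.8, 0.9, 0.1, -0.8]

reg = KNeighborsRegressor(n_neighbors=2)  # predict the mean of the 2 nearest targets
reg.fit(X, y)
print(reg.predict([[1.5]]))   # mean of the targets at x=1 and x=2 -> [0.85]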
25
• Now let's get a little bit more
mathematical:
• The k-Nearest-Neighbor Classifier (k-NN)
works directly on the learned samples,
instead of creating rules, in contrast to
other classification methods.
26
Nearest Neighbor Algorithm:
• Given a set of
categories {c1, c2, ..., cn}, also called
classes, e.g. {"male", "female"}.
• There is also a learnset LS consisting of labelled
instances.
• The task of classification consists in assigning a
category or class to an arbitrary instance. If the
instance o is an element of LS, the label of the
instance will be used.
• Now we will look at the case where o is not
in LS: o is compared with all instances of LS.
• A distance metric is used for comparison. We
determine the k closest neighbors of o, i.e. the
items with the smallest distances. k is a user-
defined constant and a positive integer, which is
usually small.
27
• The most common class among these k neighbors
will be assigned to the instance o. If k = 1, the
object is simply assigned to the class of that
single nearest neighbor.
• The algorithm for the k-nearest neighbor
classifier is among the simplest of all machine
learning algorithms. k-NN is a type of instance-
based learning, or lazy learning, where the
function is only approximated locally and all the
computations are performed only when we do the
actual classification.
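The neighbor search itself can be sketched in a few lines; this assumes
the distance() helper from the earlier sketch and a learnset like the one
we build from the iris data below:

def get_neighbors(learnset_data, learnset_labels, instance, k):
    """Return the k training items closest to `instance`,
    as (distance, sample, label) triples sorted by distance."""
    distances = [(distance(sample, instance), sample, label)
                 for sample, label in zip(learnset_data, learnset_labels)]
    distances.sort(key=lambda triple: triple[0])
    return distances[:k]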
28
k-nearest-neighbor from Scratch
• Preparing the Dataset
29
Preparing the Dataset
• Before we actually start writing a nearest neighbor
classifier, we need to think about the data, i.e. the
learnset.
• We will use the "iris" dataset provided by the datasets of
the sklearn module.
• The data set consists of 50 samples from each of three
species of Iris
– Iris setosa,
– Iris virginica and
– Iris versicolor.
• Four features were measured from each sample: the
length and the width of the sepals and petals, in
centimetres.
30
Preparing the Dataset
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
iris_data = iris.data
iris_labels = iris.target

print(iris_data[0], iris_data[79], iris_data[100])
print(iris_labels[0], iris_labels[79], iris_labels[100])
[5.1 3.5 1.4 0.2] [5.7 2.6 3.5 1. ] [6.3 3.3 6. 2.5] 0 1 2
We create a learnset from the data above. We use permutation from np.random
to split the data randomly.
31
np.random.seed(42)
indices = np.random.permutation(len(iris_data))
n_test_samples = 12   # the last 12 shuffled samples are held out for testing
learnset_data = iris_data[indices[:-n_test_samples]]
learnset_labels = iris_labels[indices[:-n_test_samples]]
testset_data = iris_data[indices[-n_test_samples:]]
testset_labels = iris_labels[indices[-n_test_samples:]]
print(learnset_data[:4], learnset_labels[:4])
print(testset_data[:4], testset_labels[:4])
[[6.1 2.8 4.7 1.2]
[5.7 3.8 1.7 0.3]
[7.7 2.6 6.9 2.3]
[6. 2.9 4.5 1.5]] [1 0 2 1]
[[5.7 2.8 4.1 1.3]
[6.5 3. 5.5 1.8]
[6.3 2.3 4.4 1.3]
[6.4 2.9 4.3 1.3]] [1 2 1 1]
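As a quick sanity check of the split (assuming the code above ran as
shown), the 150 iris samples end up as 138 training and 12 test items:

print(len(learnset_data), len(testset_data))   # 138 12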
32
Preparing the Dataset
• The following code is only necessary to
visualize the data of our learnset.
• Our data consists of four values per iris
item, so we will reduce the data to three
values by summing up the third and fourth
value.
• This way, we are capable of depicting the
data in 3-dimensional space:
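A sketch of what that visualization code might look like; the variable
names follow the split above, and the plotting choices (colours, one
scatter per class) are our own:

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection

colours = ("r", "g", "b")   # one colour per iris class
points = [[], [], []]
for sample, label in zip(learnset_data, learnset_labels):
    # keep sepal length and width, sum petal length and width
    points[label].append((sample[0], sample[1], sample[2] + sample[3]))

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
for cls, colour in enumerate(colours):
    xs, ys, zs = zip(*points[cls])
    ax.scatter(xs, ys, zs, c=colour)
plt.show()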
33
Preparing the Dataset
[figure: 3D scatter plot of the learnset]
34
Preparing the Dataset
[figure: 3D scatter plot of the learnset]
35
k-Nearest Neighbor
• The k-NN is an instance-based classifier.
The underlying idea is that the likelihood
that two instances of the instance space
belong to the same category or class
increases with their proximity. Proximity
or closeness can be defined with a distance
or similarity function.
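Putting the pieces together, a hedged sketch of the final step: a majority
vote over the k nearest neighbors, reusing the get_neighbors() and
distance() helpers sketched earlier together with the iris split above:

from collections import Counter

def vote(neighbors):
    """Majority vote over (distance, sample, label) triples."""
    class_counter = Counter(label for _, _, label in neighbors)
    return class_counter.most_common(1)[0][0]

# Classify the first five test samples.
for i in range(5):
    neighbors = get_neighbors(learnset_data, learnset_labels,
                              testset_data[i], k=3)
    print("predicted:", vote(neighbors), " actual:", testset_labels[i])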
36
Machine Learning by
using Python
Lesson One
By: Professor Lili Saghafi
