Gibrán Félix Zavala
Edwin Efraín Jiménez Lepe
SEMINAR - MACHINE LEARNING
2016-09-09
Machine learning
Machine learning - Arthur Samuel (1959)
 “Field of study that gives computers the ability
to learn without being explicitly programmed”.
Machine learning - Tom M. Mitchell (1997)
 “A computer program is said to learn from
experience E with respect to some class of
tasks T and performance measure P, if its
performance at tasks in T, as measured by P,
improves with experience E”.
Machine learning
 Tasks T
 Experience E
 Performance measure P
Tasks T
 “A piece of work to be done by someone”.
 What will the machine learning model do?
 What problem will the machine learning model solve?
Task types
 Classification
 Regression
 Clustering
Task types - Classification
 In this task, the algorithm assigns each input $x$ to one of $k$ discrete categories:
 $f: \mathbb{R}^n \to \{1, 2, 3, \dots, k\}$
 We build a model $y = f(x)$, where $y$ is called the data label: a discrete numerical encoding of the categories (see the sketch below).
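A minimal sketch of such a mapping (not from the slides; the nearest-centroid rule and all values here are illustrative assumptions):

```python
import numpy as np

def nearest_centroid_classifier(centroids):
    """Build f: R^n -> {0, ..., k-1} that assigns x to its closest centroid."""
    def f(x):
        # Distance from x to each of the k centroids; argmin picks the label.
        return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
    return f

centroids = np.array([[0.0, 0.0], [5.0, 5.0]])  # k = 2 categories
f = nearest_centroid_classifier(centroids)
print(f(np.array([4.2, 4.8])))  # -> 1, the closest category
```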
Task types - Regression
 If the desired output consists of one or more continuous variables, then the task is called regression:
 $f: \mathbb{R}^n \to \mathbb{R}^k$
 In this case our model $y = f(x)$ assigns to each input $x$ a real value or a real vector $y$. This is usually called prediction, because regression aims to predict a value from the available data (see the sketch below).
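A minimal sketch of a regression fit (illustrative only; the toy data and the closed-form least-squares solution are assumptions, not from the slides):

```python
import numpy as np

# Toy data: y is roughly 3x + 2 plus noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(50, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0.0, 0.5, size=50)

# Append a bias column and solve the least-squares problem in closed form.
Xb = np.hstack([X, np.ones((50, 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print(w)  # approximately [3.0, 2.0]
```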
Task types - Clustering
 An exploratory task that reveals the organization of the data.
 It discovers similar items and groups them into sets named clusters.
 A cluster is a set of objects such that objects in the same group are more similar to each other than to objects in other clusters.
Task types - Clustering
 The clustering task is similar to classification, but in clustering the $k$ groups are not known beforehand:
 $f: \mathbb{R}^{n \times m} \to \{C_1, C_2, C_3, \dots, C_k\}$
 where the cardinalities of the partitions $C_i$ sum to the cardinality of the input dataset ($m$). A sketch follows below.
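A minimal sketch with k-means (an assumed clustering algorithm; the slides do not prescribe one):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups; k-means partitions them into C1 and C2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(20, 2)), rng.normal(size=(20, 2)) + 5.0])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)

# The partition sizes sum to the dataset cardinality m = 40.
print(np.bincount(labels), labels.shape[0])
```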
Performance measure P
 It is a quantitative function that evaluates the efficiency of the machine learning model.
 It measures how well the model solves the machine learning problem.
 It represents the model’s accuracy.
 During the training phase it is known as the loss function $L(x)$.
 The choice of measure normally depends on the selected task.
0-1 loss function
 A performance measure also known as the error rate.
 It reflects how many samples were misclassified:
 $L = \frac{1}{m} \sum_{i=1}^{m} I\left(\hat{y}^{(i)} \neq y^{(i)}\right)$
 where $I$ is the indicator function, $\hat{y}^{(i)}$ is the prediction for the $i$-th dataset element, and $y^{(i)}$ is the desired output for that sample (see the sketch below).
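A minimal sketch of the error rate (the values are illustrative):

```python
import numpy as np

def zero_one_loss(y_pred, y_true):
    """Error rate: the mean of the indicator of misclassified samples."""
    return np.mean(y_pred != y_true)

y_pred = np.array([0, 1, 1, 2])
y_true = np.array([0, 1, 2, 2])
print(zero_one_loss(y_pred, y_true))  # 0.25 (one of four samples misclassified)
```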
Residual Sum of Squares (RSS)
 It is commonly used for regression.
 It can be defined as the sum of quadratic errors:
 $RSS = \sum_{i=1}^{m} \left(y^{(i)} - \hat{y}^{(i)}\right)^2$
 where $\hat{y}^{(i)}$ is the prediction for the $i$-th dataset element and $y^{(i)}$ is the desired output for that sample (see the sketch below).
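A minimal sketch (illustrative values):

```python
import numpy as np

def rss(y_pred, y_true):
    """Residual sum of squares between predictions and desired outputs."""
    return np.sum((y_true - y_pred) ** 2)

y_pred = np.array([1.5, 2.0, 2.0])
y_true = np.array([1.0, 2.0, 3.0])
print(rss(y_pred, y_true))  # 0.25 + 0.0 + 1.0 = 1.25
```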
Similarity function
 It is commonly used for clustering tasks.
 It does not need an expected value: a similarity function $s(x^{(i)}, x^{(j)})$ compares dataset elements directly,
 where $x^{(i)}$ and $x^{(j)}$ are two different elements of the dataset (see the sketch below).
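The slides do not specify the similarity function; one common assumption is negative Euclidean distance, sketched here:

```python
import numpy as np

def similarity(x_i, x_j):
    """Negative Euclidean distance: larger values mean more similar elements."""
    return -np.linalg.norm(x_i - x_j)

print(similarity(np.array([0.0, 0.0]), np.array([3.0, 4.0])))  # -5.0
```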
Experience E
 How the machine reacts to each stimulus during the learning phase.
 Some authors refer to this as the learning paradigm.
 It can be classified as:
 Supervised
 Unsupervised
Supervised learning
 Supervised learning is a paradigm known as learning with a teacher, by analogy with a teacher and a student.
 When the student makes a mistake, the teacher gives feedback.
 The algorithm makes a mistake when its output differs from the label, or target.
Supervised learning
 When this happens, the model uses the selected loss function to measure the error.
 Then it readjusts its free parameters, changing its knowledge so that it can reduce the output error in the future.
Unsupervised learning
 On the other hand, unsupervised learning implies that there is no teacher observing and giving feedback during the learning process.
 This experience paradigm is also known as learning without a teacher.
Unsupervised learning
 It is closely related to the notion of empirical evidence, because the knowledge is acquired through observation and experimentation.
 In the unsupervised approach, the model gains experience from each stimulus and learns according to the selected objective function.
 Then the model adjusts its free parameters to minimize the error.
Machine Learning Methodology
Data collection
Data pre-processing
Learning
Model evaluation
Learning phase
 We select a machine learning model:
 $y = f(x; \theta)$
 where $f(x; \theta)$ is the model, $\theta$ are the free parameters, and $y$ is the model output.
 We select a loss function:
 $L\left(\hat{y}^{(i)}, y^{(i)}\right) = L\left(f(x^{(i)}; \theta), y^{(i)}\right)$
Learning method
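The slides leave the learning method to a figure; a common choice (an assumption here, not stated in the text) is gradient descent, which iteratively moves $\theta$ to reduce the loss. A minimal sketch with a one-parameter linear model and squared loss:

```python
import numpy as np

# Toy data for the target y = 2x (illustrative only).
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x

theta = 0.0   # free parameter of the model f(x; theta) = theta * x
lr = 0.5      # learning rate (assumed hyperparameter)
for _ in range(100):
    y_hat = theta * x                       # model output
    grad = np.mean(2.0 * (y_hat - y) * x)   # dL/dtheta for the squared loss
    theta -= lr * grad                      # move theta against the gradient
print(theta)  # approaches 2.0
```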
Neural Networks
Hidden layers
 Input layer: this is the visible data, the network stimulus.
 Hidden layers: perform a non-linear operation using the output of the previous layer; we multiply the layer weights by the previous layer’s output and apply a sigmoid function (sketched below).
 Output layer: used to make a prediction $\hat{y}$; we represent the output layer as a simple linear regression.
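A minimal sketch of this forward pass, following the note that each hidden layer multiplies its weights by the previous layer’s output and applies a sigmoid (the layer sizes here are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Hidden layers: sigmoid(W a + b); output layer: linear, like a regression."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = sigmoid(W @ a + b)               # weights times previous output, then sigmoid
    return weights[-1] @ a + biases[-1]      # linear output layer

rng = np.random.default_rng(0)
sizes = [4, 8, 3]  # input, hidden, output sizes (hypothetical)
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]
print(forward(rng.normal(size=4), weights, biases))
```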
Machine learning – Example
MNIST
 The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples.
Objective
 Detect handwritten digits from an image.
Input
 A gray-scale image of 28x28 pixels.
Output
 Classification of the image in the range [0, 9].
Task
 Classification.
Learning paradigm
 Supervised, because we have labeled data.
Performance Measure
 Softmax (objective function)
 Negative log-likelihood (cost function)
Model
 Input (28×28).
 First hidden layer (1000).
 Second hidden layer (100).
 Output layer (10).
 Accuracy: 96.3%. (The architecture is sketched below.)
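A hedged sketch of the described 784-1000-100-10 architecture in Keras (not the authors’ code; the slides do not specify the hidden activations, so ReLU is an assumption; categorical cross-entropy is the negative log-likelihood of the softmax output):

```python
from tensorflow.keras import layers, models

# Sketch of the described 784-1000-100-10 network.
model = models.Sequential([
    layers.Dense(1000, activation='relu', input_shape=(28 * 28,)),  # first hidden layer
    layers.Dense(100, activation='relu'),                           # second hidden layer
    layers.Dense(10, activation='softmax'),                         # one unit per digit 0-9
])
# Categorical cross-entropy is the negative log-likelihood of the softmax output.
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```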
Machine learning – Workshop
Adult Data Set
Objective
 Predict whether income exceeds $50K/year based on census data.
Input
 A vector with data about an adult:
 age, work class, education, marital status, occupation, race, sex, hours per week, native country.
Output
 Classification: whether the person earns more or less than $50K/year (a baseline sketch follows).
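A hedged baseline for the workshop task (logistic regression and the preprocessing below are assumptions, not the workshop’s prescribed solution; the URL is the standard UCI location of the dataset):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# UCI Adult dataset; column names follow the dataset documentation.
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
cols = ["age", "workclass", "fnlwgt", "education", "education-num",
        "marital-status", "occupation", "relationship", "race", "sex",
        "capital-gain", "capital-loss", "hours-per-week",
        "native-country", "income"]
df = pd.read_csv(url, names=cols, skipinitialspace=True)

X = df.drop(columns="income")
y = (df["income"] == ">50K").astype(int)  # 1 if income exceeds $50K/year

# One-hot encode the categorical columns, pass numeric columns through.
categorical = X.select_dtypes(include="object").columns.tolist()
model = Pipeline([
    ("encode", ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough")),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
print(model.fit(X_tr, y_tr).score(X_te, y_te))  # test accuracy
```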
References
 Haykin, S. (1999). Neural Networks: A Comprehensive Foundation. New Jersey: Prentice Hall International.
 Laoudias, C., Kemppi, P., & Panayiotou, C. G. (2009). “Localization Using Radial Basis Function Networks and Signal Strength Fingerprints in WLAN,” IEEE Global Telecommunications Conference (GLOBECOM 2009), pp. 1-6.
 Ding, G., Zhang, J., Zhang, L., & Tan, Z. (2013). “Overview of received signal strength based fingerprinting localization in indoor wireless LAN environments,” 5th IEEE International Symposium on Microwave, Antenna, Propagation and EMC Technologies for Wireless Communications (MAPE), pp. 160-164.
 Liu, H., Darabi, H., Banerjee, P., & Liu, J. “Survey of Wireless Indoor Positioning Techniques and Systems,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews.
 Méndez, A. (2015). Neural Networks Class Slides.
 Amiot, N., Laaraiedh, M., & Uguen, B. (2013). “PyLayers: An open source dynamic simulator for indoor propagation and localization,” IEEE International Conference on Communications Workshops (ICC), Budapest, Hungary.
