Machine Learning
Nimrita Koul
Assistant Professor
School of Computing & IT
REVA University
Bangalore
 What is Machine Learning ( ML )
 Machine Intelligence Landscape
 Python Libraries for ML
 ML Algorithms
Agenda
 Machine learning is a branch of artificial intelligence
concerned with the construction and study of systems
that can learn from data.
What is machine learning?
Related Fields
Machine learning is primarily concerned with the
accuracy and effectiveness of the computer system.
[Figure: machine learning and its related fields: psychological models, data mining, cognitive science, decision theory, information theory, databases, neuroscience, statistics, evolutionary models, control theory]
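Traditional Programming vs. Machine Learning
[Figure: in traditional programming, the computer takes Data and a Program and produces Output; in machine learning, the computer takes Data and Output and produces a Program]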
Machine Learning Workflow
A machine learning project follows a number of well-known steps:
 Define Problem
 Acquire Data
 Prepare Data
 Choose Algorithm: weigh speed, interpretability, accuracy, memory use, and ease of implementation.
 Fit Your Model.
 Choose Validation Method and validate
 Predict using your model.
Why ML Is Hard
The Curse Of Dimensionality
• To generalize locally, you need representative examples from all relevant variations (and there are an exponential number of them)!
• Classical Solution: Hope for a smooth enough target function, or make it smooth by handcrafting good
(i). Space grows exponentially
(ii). Space is stretched, points become equidistant
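The claim that points become equidistant in high dimensions can be checked numerically. The following is a minimal sketch, not part of the original slides: it draws random points in the unit cube and measures how much the nearest and farthest distances from a query point differ. The function name `distance_contrast`, the point count, and the seed are illustrative assumptions.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Return (max - min) / min over distances from one random query point
    to n_points random points in the dim-dimensional unit cube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


# A small contrast means the nearest and farthest points are almost equally far away.
contrast_low = distance_contrast(dim=2)
contrast_high = distance_contrast(dim=1000)
print(f"contrast in 2 dimensions:    {contrast_low:.2f}")
print(f"contrast in 1000 dimensions: {contrast_high:.2f}")
```

In low dimensions the nearest point is much closer than the farthest one; in high dimensions the two distances are nearly the same, which is why "nearest neighbor" style reasoning gets harder.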
Training, Validation & Testing
[Figure labels: Training set (observed), Universal set (unobserved), Testing set (unobserved), Data acquisition, Practical usage]
 Training is the process of making the system able to
learn.
Training and Testing
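As a rough illustration of keeping training, validation, and testing data apart, the sketch below shuffles a dataset and cuts it into three disjoint parts. The helper `train_val_test_split`, the 60/20/20 proportions, and the seed are assumptions chosen for the example, not something prescribed by the slides.

```python
import random


def train_val_test_split(records, val_fraction=0.2, test_fraction=0.2, seed=42):
    """Shuffle the records and split them into disjoint train/validation/test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test_part = shuffled[:n_test]
    val_part = shuffled[n_test:n_test + n_val]
    train_part = shuffled[n_test + n_val:]
    return train_part, val_part, test_part


train_set, val_set, test_set = train_val_test_split(range(100))
print(len(train_set), len(val_set), len(test_set))  # 60 20 20
```

The model is fitted on the training part, tuned on the validation part, and evaluated once on the testing part, which stands in for unobserved data.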
 There are several factors affecting the performance:
 Types of training provided
 The form and extent of any initial background knowledge
 The type of feedback provided
 The learning algorithms used
 Two important factors:
 Modeling
 Optimization
Performance
 Supervised learning ( )
 Prediction
 Classification (discrete labels), Regression (real values)
 Unsupervised learning ( )
 Clustering
 Probability distribution estimation
 Finding association (in features)
 Dimension reduction
 Semi-supervised learning
 Reinforcement learning
 Decision making (robot, chess machine)
Types of ML Algorithms
Types of ML Algorithms
Supervised
learning
Unsupervised
learning
Semi-supervised
 Supervised learning
Machine learning structure
 Unsupervised learning
Machine learning structure
Python Libraries for DS/ML
Many popular Python toolboxes/libraries:
 NumPy
 SciPy
 Pandas
 SciKit-Learn
Visualization libraries
 matplotlib
 Seaborn
and many more …
Python Libraries for Data
Science
SciPy:
 collection of algorithms for linear algebra,
differential equations, numerical integration,
optimization, statistics and more
 built on NumPy
Link: https://www.scipy.org/scipylib/
Python Libraries for Data
Science
Pandas:
 adds data structures and tools designed to
work with table-like data
 provides tools for data manipulation:
reshaping, merging, sorting, slicing,
aggregation etc.
 allows handling missing data
Link: http://pandas.pydata.org/
matplotlib:
 python 2D plotting library which produces
publication quality figures in a variety of
hardcopy formats
 a set of functionalities similar to those of
MATLAB
 line plots, scatter plots, bar-charts, histograms, pie charts etc.
Link: https://matplotlib.org/
Python Libraries for Data
Science
Seaborn:
 based on matplotlib
 provides high level interface for drawing
attractive statistical graphics
Link: https://seaborn.pydata.org/
Python Libraries for Data
Science
Link: http://scikit-learn.org/
Python Libraries for Data
Science
SciKit-Learn:
 provides machine learning algorithms:
classification, regression, clustering, model
validation etc.
 built on NumPy, SciPy and matplotlib
Create a Google Colaboratory
1. Open Google Colab at https://colab.research.google.com/notebooks/welcome.ipynb
2. Click on ‘New Notebook’ and select Python 2 notebook or Python 3 notebook.
OR
1. Open Google Drive.
2. Create a new folder for the project.
3. Click on ‘New’ > ‘More’ > ‘Colaboratory’.
Hello World of Machine Learning
 The best small project to start with on a
new tool is the classification of iris flowers
(e.g. the iris dataset).
 Code in my Google colab notebook
Iris Dataset
 A multi-class classification problem
 4 attributes and 150 rows
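A minimal sketch of the iris "hello world" with scikit-learn is shown below. This is not the author's Colab notebook; the choice of a k-nearest-neighbors classifier, the 70/30 split, and the random seed are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the data: 150 rows, 4 numeric attributes, 3 flower classes.
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 30% of the rows for testing (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Fit the model on the training rows and score it on the held-out rows.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```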
Diabetes Data Set
Boston Housing Dataset
 The Boston Housing Dataset contains house prices for various areas of Boston. Alongside the price, the dataset also provides information such as the per capita crime rate (CRIM), the proportion of non-retail business acres in the town (INDUS), the proportion of owner-occupied units built prior to 1940 (AGE), and many other attributes.
Boston Housing Dataset
Attribute Information:
 1. CRIM per capita crime rate by town
 2. ZN proportion of residential land zoned for lots over 25,000 sq.ft.
 3. INDUS proportion of non-retail business acres per town
 4. CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
 5. NOX nitric oxides concentration (parts per 10 million)
 6. RM average number of rooms per dwelling
 7. AGE proportion of owner-occupied units built prior to 1940
 8. DIS weighted distances to five Boston employment centres
 9. RAD index of accessibility to radial highways
 10. TAX full-value property-tax rate per $10,000
 11. PTRATIO pupil-teacher ratio by town
 12. B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
 13. LSTAT % lower status of the population
 14. MEDV Median value of owner-occupied homes in $1000's
 Data Set Information:
 The dataset contains cases from a study that was conducted
between 1958 and 1970 at the University of Chicago's Billings
Hospital on the survival of patients who had undergone
surgery for breast cancer.

Attribute Information:
 1. Age of patient at time of operation (numerical)
2. Patient's year of operation (year - 1900, numerical)
3. Number of positive axillary nodes detected (numerical)
4. Survival status (class attribute)
-- 1 = the patient survived 5 years or longer
-- 2 = the patient died within 5 years
 Other Datasets - https://archive.ics.uci.edu/ml/datasets.html
Haberman's Survival Data Set
ML Algorithms 1 by 1
 Linear Regression
 Logistic Regression
 Decision Tree
 SVM
 Naive Bayes
 kNN
 K-Means
 Random Forest
Linear Regression
 Used to estimate real values (cost of
houses, number of calls, total sales etc.)
based on continuous variable(s).
 Here, we establish a relationship between the independent and dependent variables by fitting a best-fit line.
 This best-fit line is known as the regression line and is represented by the linear equation Y = a*X + b.
Linear Regression Model
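$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where $Y_i$ is the dependent variable, $X_i$ is the independent variable, $\beta_0$ is the intercept, $\beta_1$ is the slope, $\beta_0 + \beta_1 X_i$ is the linear component, and $\varepsilon_i$ is the random error component.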
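To make the slope and intercept concrete, here is a small sketch that fits a regression line by ordinary least squares in plain Python. The data points are made up for illustration, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the line passes through the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that roughly follows Y = 2*X + 1 with some noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
print(f"Y = {slope:.2f} * X + {intercept:.2f}")
print(f"prediction for X = 6: {slope * 6 + intercept:.2f}")
```

The gap between each observed Y and the value on the fitted line is the random error for that point.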
 Logistic Regression is a mathematical model to
estimate the probability of an event occurring
having been given some previous data.
 Logistic Regression works with binary data, where
either the event happens (1) or the event does not
happen (0).
 So given some feature x it tries to find out whether
some event y happens or not. In the case where
the event happens, y is given the value 1. If the
event does not happen, then y is given the value
of 0.
 For example, if y represents whether a sports
team wins a match, then y will be 1 if they win the
match or y will be 0 if they do not.
Logistic Regression
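The idea of estimating the probability that y = 1 from a feature x can be sketched with a one-feature logistic regression trained by gradient descent. The data are invented (x could be, say, shots on target and y whether the team won), and the learning rate and epoch count are assumptions.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical matches: x = feature value, y = 1 if the team won, 0 otherwise.
xs = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)
p_low = sigmoid(w * 0 + b)
p_high = sigmoid(w * 6 + b)
print(f"P(win | x = 0) = {p_low:.2f}")
print(f"P(win | x = 6) = {p_high:.2f}")
```

A probability above 0.5 is read as "the event happens" (y = 1), and below 0.5 as "it does not" (y = 0).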
 Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression.
 The goal is to create a model that predicts
the value of a target variable by learning
simple decision rules inferred from the
data features.
Decision Tree
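As an illustration of learning simple decision rules from features, the sketch below fits a small scikit-learn decision tree and prints the rule it learned. The toy data and the feature names `hours_studied` and `hours_slept` are hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data: [hours_studied, hours_slept] -> passed the exam (1) or not (0).
X = [[1, 4], [2, 8], [3, 5], [6, 7], [7, 4], [8, 8]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned rules as text, then classify two new students.
print(export_text(tree, feature_names=["hours_studied", "hours_slept"]))
predictions = tree.predict([[2, 6], [7, 6]])
print(predictions)
```

In this toy data only the first feature separates the classes, so the tree needs a single threshold on it.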
 A Support Vector Machine (SVM) is a supervised
machine learning algorithm that can be employed for
both classification and regression purposes.
 SVMs are based on the idea of finding a hyperplane that best divides a dataset into two classes. A hyperplane is a line or a surface that linearly separates and classifies a set of data.
 Support vectors are the data points nearest to the
hyperplane. These are points of a data set that, if
removed, would alter the position of the dividing
hyperplane. Because of this, they can be considered the
critical elements of a data set.
 The distance between the hyperplane and the nearest
data point from either set is known as the margin. The
goal is to choose a hyperplane with the greatest possible
margin between the hyperplane and any point within the
training set, giving a greater chance of new data being
SVM
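The hyperplane, support vectors, and margin can be inspected on a tiny example. This is a sketch using scikit-learn's `SVC` with a linear kernel; the six data points are made up, and the very large `C` is an assumption used to approximate a hard margin on separable data.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable groups of points (hypothetical data).
X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no tolerance for margin violations.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

# For a linear SVM the hyperplane is w . x + b = 0, and the distance from
# the hyperplane to the nearest point (the margin) is 1 / ||w||.
w = clf.coef_[0]
b = clf.intercept_[0]
margin = 1.0 / np.linalg.norm(w)

print("support vectors:")
print(clf.support_vectors_)
print(f"margin: {margin:.3f}")
svm_predictions = clf.predict([[0, 0], [6, 6]])
print(svm_predictions)
```

Only the points listed as support vectors determine the hyperplane; moving any of the other points slightly would leave it unchanged.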
 Naive Bayes methods are a set of
supervised learning algorithms based on
applying Bayes’ theorem with the “naive”
assumption of conditional independence
between every pair of features given the
value of the class variable.
Naive Bayes
Bayes Theorem
P(H|E) = (P(E|H) * P(H)) / P(E)
where
•P(H|E) is the probability of hypothesis H given the event E,
a posterior probability.
•P(E|H) is the probability of event E
given that the hypothesis H is true.
•P(H) is the probability of hypothesis H being true
(regardless of any related event), or prior probability of H.
•P(E) is the probability of the event occurring
(regardless of the hypothesis).
This is the Bayes Theorem.
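A short worked example of the formula, with invented numbers: let H be "the email is spam" and E be "the email contains the word 'offer'". The sketch expands P(E) over the two cases H and not-H; the function name and all probabilities are assumptions for illustration.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) = P(E|H) * P(H) / P(E), with P(E) expanded over H and not-H."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    return p_e_given_h * prior_h / p_e


# Hypothetical numbers: 20% of emails are spam; 'offer' appears in 60% of
# spam emails and in 5% of non-spam emails.
p_spam_given_offer = posterior(prior_h=0.2, p_e_given_h=0.6, p_e_given_not_h=0.05)
print(f"P(spam | 'offer') = {p_spam_given_offer:.2f}")  # 0.75
```

Seeing the word raises the probability of spam from the prior 0.20 to the posterior 0.75.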
 K Nearest Neighbors (KNN) is a simple, easy-to-understand, and versatile machine learning algorithm, and one of the most popular.
 KNN is used in a variety of applications such as finance, healthcare, political science, handwriting detection, image recognition and video recognition. In credit rating, financial institutions predict the credit rating of customers. In loan disbursement, banks predict whether a loan is safe or risky. In political science, potential voters are classified into two classes: will vote or won’t vote.
 The KNN algorithm is used for both classification and regression problems.
 It is based on a feature-similarity approach.
K - NN
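The feature-similarity idea can be sketched in a few lines of plain Python: a new record gets the majority label of its k closest training records. The loan data below are invented, and `knn_predict` is a hypothetical helper.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical applicants: (income, existing debt), both in thousands.
points = [(60, 5), (75, 10), (80, 2), (30, 20), (25, 15), (35, 30)]
labels = ["safe", "safe", "safe", "risky", "risky", "risky"]

print(knn_predict(points, labels, query=(70, 8)))   # safe
print(knn_predict(points, labels, query=(28, 18)))  # risky
```

For regression, the same neighbors would be used, but their target values would be averaged instead of counted.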
 K-means clustering is one of the most widely used
unsupervised machine learning algorithms that forms clusters
of data based on the similarity between data instances. For
this particular algorithm to work, the number of clusters has to
be defined beforehand. The K in the K-means refers to the
number of clusters.
 The K-means algorithm starts by randomly choosing a
centroid value for each cluster. After that the algorithm
iteratively performs three steps: (i) Find the Euclidean
distance between each data instance and centroids of all the
clusters; (ii) Assign the data instances to the cluster of the
centroid with nearest distance; (iii) Calculate new centroid
values based on the mean values of the coordinates of all the
data instances from the corresponding cluster.
K-Means
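The three iterative steps described above can be sketched directly in plain Python. This is a minimal illustration rather than a production implementation: the data are made up, initial centroids are drawn from the data points, and the fixed seed and iteration cap are assumptions.

```python
import math
import random


def k_means(points, k, n_iterations=20, seed=0):
    """Cluster points into k groups; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    for _ in range(n_iterations):
        # Steps (i) and (ii): assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Step (iii): move each centroid to the mean of its cluster.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                mean = tuple(sum(coord) / len(members) for coord in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return centroids, clusters


# Two well-separated groups of hypothetical 2-D points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(data, k=2)
print(sorted(centroids))
```

On real data the result depends on the random starting centroids, so the algorithm is usually run several times with different seeds.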
 Random forest is a type of supervised machine
learning algorithm based on ensemble learning.
 Ensemble learning is a type of learning where you join different types of algorithms, or the same algorithm multiple times, to form a more powerful prediction model.
 The random forest algorithm combines multiple algorithms of the same type, i.e. multiple decision trees, resulting in a forest of trees, hence the name "Random Forest". The random forest algorithm can be used for both regression and classification tasks.
Random Forest
1. Pick N random records from the dataset.
2. Build a decision tree based on these N records.
3. Choose the number of trees you want in your algorithm and repeat steps 1 and 2.
 In case of a regression problem, for a new record, each tree in the forest predicts a value for Y (output). The final value is calculated by taking the average of the values predicted by all the trees in the forest.
 Or, in case of a classification problem, each tree in the forest predicts the category to which the new record belongs. Finally, the new record is assigned to the category that wins the majority vote.
How the Random Forest Algorithm Works
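The procedure above can be sketched by hand with scikit-learn decision trees on the iris data. Two assumptions are made here: the N random records are drawn with replacement (a bootstrap sample), and 25 trees are used. scikit-learn's own `RandomForestClassifier` additionally randomizes the features considered at each split, which this sketch leaves out.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

n_trees = 25  # number of trees (an assumption)
forest = []
for _ in range(n_trees):
    # Step 1: pick N random records (here: sampled with replacement).
    idx = rng.integers(0, len(X), size=len(X))
    # Step 2: build a decision tree on those records.
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X[idx], y[idx])
    forest.append(tree)


def forest_predict(trees, samples):
    """Classify each sample by majority vote over all trees."""
    votes = np.array([t.predict(samples) for t in trees]).astype(int)
    # votes has shape (n_trees, n_samples); count the votes per sample.
    return np.array([np.bincount(column).argmax() for column in votes.T])


forest_accuracy = float(np.mean(forest_predict(forest, X) == y))
print(f"accuracy on the training data: {forest_accuracy:.2f}")
```

For a regression problem the vote would be replaced by the mean of the trees' predicted values.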
 Neural Networks are a machine learning
framework that attempts to mimic the learning
pattern of natural biological neural networks.
Biological neural networks have
interconnected neurons with dendrites that
receive inputs, then based on these inputs
they produce an output signal through an
axon to another neuron. We will try to mimic
this process through the use of Artificial
Neural Networks (ANN)
 The process of creating a neural network
begins with the most basic form, a single
perceptron.
Neural Networks
Perceptron – An Artificial
Neuron
$$y = f\left(b + \sum_{i=1}^{n-1} w_i x_i\right)$$
[Figure: a neuron with inputs x1, x2, x3, weights w1, w2, w3, bias b, and output y]
What is an Artificial Neuron?
 An Artificial Neuron (AN) is a non-linear
parameterized function with restricted
output range
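A single artificial neuron is easy to write down: a weighted sum of the inputs plus a bias, passed through a non-linear function with a restricted output range. The sketch below assumes the sigmoid as that function (the slides do not name one), and the input and weight values are made up.

```python
import math


def neuron(inputs, weights, bias):
    """Compute f(b + sum of w_i * x_i) with f = sigmoid, so the output lies in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))


# Three inputs x1..x3 with weights w1..w3 and bias b (hypothetical values).
output = neuron(inputs=[1.0, 0.5, -1.0], weights=[0.4, -0.2, 0.1], bias=0.1)
print(f"y = {output:.3f}")
```

The weights and the bias are the parameters that learning adjusts.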
Neural Network
Deep Feed Forward Neural Nets
So what then is learning?
[Figure: forward propagation maps each training example (x^(i), y^(i)) to the hypothesis h_θ(x^(i))]
Learning is the adjustment of the weights w_{i,j} such that the cost function J(θ) is minimized.
Simple learning procedure: Back Propagation (of the error signal)
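Forward propagation followed by back propagation of the error signal can be sketched with NumPy on a tiny two-layer network. Everything specific here is an assumption for illustration: the XOR data, four hidden units, sigmoid activations, a squared-error cost, the learning rate, and the number of steps. The sketch only shows that the cost goes down as the weights are adjusted; it does not claim that the network solves the task perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training examples (x, y) for XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Weights and biases: 2 inputs -> 4 hidden units -> 1 output.
W1 = rng.normal(scale=0.5, size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))
b2 = np.zeros((1, 1))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(inputs):
    """Forward propagation: return hidden activations and the network output."""
    hidden = sigmoid(inputs @ W1 + b1)
    out = sigmoid(hidden @ W2 + b2)
    return hidden, out


def cost(out):
    """Squared-error cost J averaged over the training examples."""
    return float(np.mean((out - y) ** 2) / 2)


_, out = forward(X)
initial_cost = cost(out)

lr = 0.5
for _ in range(5000):
    hidden, out = forward(X)
    # Error signal at the output, then propagated back to the hidden layer.
    delta_out = (out - y) * out * (1 - out)
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)
    # Gradient descent step on every weight and bias.
    W2 -= lr * hidden.T @ delta_out / len(X)
    b2 -= lr * delta_out.mean(axis=0, keepdims=True)
    W1 -= lr * X.T @ delta_hidden / len(X)
    b1 -= lr * delta_hidden.mean(axis=0, keepdims=True)

_, out = forward(X)
final_cost = cost(out)
print(f"cost before training: {initial_cost:.4f}")
print(f"cost after training:  {final_cost:.4f}")
```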
Applications
Applications
 Recognizing patterns:
 Facial identities or facial expressions
 Handwritten or spoken words
 Medical images
 Generating patterns:
 Generating images or motion sequences
 Recognizing anomalies:
 Unusual sequences of credit card transactions
 Unusual patterns of sensor readings in a nuclear
power plant or unusual sound in your car engine.
 Prediction:
 Future stock prices or currency exchange rates
Applications
 Spam filtering, fraud detection:
 Recommendation systems:
 Information retrieval:
 Find documents or images with similar content.
 Data Visualization:
 Display a huge database in a revealing way
 Facial recognition for Face ID, Facebook automatic tagging,
etc. (CNN)
 Scene and image description for low-sighted people. (CNN,
LSTM)
 Traffic sign classification for self driving cars. (CNN)
 Sentiment analysis to detect hateful speech on
Twitter/Instagram. (LSTM)
 Automated game playing to… play games. (Deep Q-Learning)
 Image style transfer for prismAI, image colorization for old
photographs. (CNN)
Hand Written Digit Recognition
Face Detection
Video Object Detection
Object Tracking in Video
Identifying Book Covers
Displaying the structure of a set of documents
using a deep neural network
When Would We Use Machine
Learning?
 When patterns exist in our data
 Especially when we don’t know what they are
 We cannot pin down the functional relationships mathematically
 Else we would just code up the algorithm
 When we have lots of (unlabeled) data
 Labeled training sets are harder to come by
 Data is of high-dimension
 High dimension “features”
 For example, sensor data
 Want to “discover” lower-dimension representations
 Dimension reduction
 Aside: Machine Learning is heavily focused on implementability
 Frequently using well-known numerical optimization techniques
 Lots of open source code available
 See e.g., libsvm (Support Vector Machines): http://www.csie.ntu.edu.tw/~cjlin/libsvm/
 Most of my code in python: http://scikit-learn.org/stable/ (many others)
 Languages (e.g., octave: https://www.gnu.org/software/octave/)
 Python Machine Learning by Example, Yuxi
Hayden Liu
 Applied Machine Learning, Lecture 10:
Introduction to unsupervised and semi-supervised
learning, Richard Johnson
 Building Machine Learning Systems with Python,
Luis Pedro Coelho
 deeplearning.ai
 https://www.coursera.org/learn/machine-learning#syllabus
 https://chrisalbon.com/#machine_learning
 https://medium.com/machine-learning-for-humans/how-to-learn-machine-learning-24d53bb64aa1
Further Learning Resources
We had a simple overview of some
techniques and algorithms in machine
learning. There are many more techniques
that apply machine learning as a solution.
Machine Learning is New ELECTRICITY.
Conclusion
Q&A
THANK YOU
