Nimrita koul Machine Learning

Machine Learning
Nimrita Koul
Assistant Professor
School of Computing & IT
REVA University
Bangalore

 What is Machine Learning ( ML )
 Machine Intelligence Landscape
 Python Libraries for ML
 ML Algorithms
Agenda

 Machine learning is a branch of artificial intelligence
concerned with the construction and study of systems
that can learn from data.
What is machine learning?

Related Fields
Machine learning is primarily concerned with the
accuracy and effectiveness of the computer system.
[Diagram: fields related to machine learning: psychological models, data mining, cognitive science, decision theory, information theory, databases, neuroscience, statistics, evolutionary models, control theory]

Traditional Programming vs. Machine Learning
Traditional programming: Data + Program → Computer → Output
Machine learning: Data + Output → Computer → Program

Machine Learning Workflow
A machine learning project has a number of well
known steps:
 Define Problem
 Acquire Data
 Prepare Data
 Choose Algorithm: consider speed, interpretability,
accuracy, memory use, and ease of implementation.
 Fit Your Model.
 Choose Validation Method and validate
 Predict using your model.

Why ML Is Hard
The Curse Of Dimensionality
• To generalize locally, you need representative examples from all
relevant variations (and there are an exponential number of them)!
• Classical solution: hope for a smooth enough target function,
or make it smooth by handcrafting good features.
(i) Space grows exponentially.
(ii) Space is stretched; points become equidistant.
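
Point (ii) can be checked numerically. The sketch below is an illustration added here, not part of the original slides: it draws random points in a unit cube and reports how much farther the farthest point is than the nearest one, relative to the nearest. The function name `relative_contrast`, the point count, and the seed are assumptions chosen for the demonstration.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


# The contrast shrinks as the dimension grows: points start to look equidistant.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

A small contrast means "nearest" and "farthest" neighbours are hard to tell apart, which is one reason distance-based methods need more data in high dimensions.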

Training, Validation & Testing
[Diagram: data acquisition draws the training set (observed) from the universal set (unobserved); practical usage draws the testing set (unobserved) from the same universal set]

 Training is the process of making the system able to
learn.
Training and Testing
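
A minimal sketch of splitting one dataset into training, validation, and testing parts. The 60/20/20 proportions, the function name `split_dataset`, and the seed are assumptions for illustration; the slides do not prescribe specific values.

```python
import random


def split_dataset(records, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle records and cut them into training, validation, and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held out for testing
    return train, validation, test


train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))
```

The model is fitted on the training part, tuned on the validation part, and the test part is touched only once, to estimate performance on unseen data.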

 There are several factors affecting the performance:
 Types of training provided
 The form and extent of any initial background knowledge
 The type of feedback provided
 The learning algorithms used
 Two important factors:
 Modeling
 Optimization
Performance

 Supervised learning (labeled data)
 Prediction
 Classification (discrete labels), Regression (real values)
 Unsupervised learning (unlabeled data)
 Clustering
 Probability distribution estimation
 Finding association (in features)
 Dimension reduction
 Semi-supervised learning
 Reinforcement learning
 Decision making (robot, chess machine)
Types of ML Algorithms

Types of ML Algorithms
[Figure: supervised, unsupervised, and semi-supervised learning]

 Supervised learning
Machine learning structure

 Unsupervised learning
Machine learning structure

Python Libraries for DS/ML
Many popular Python toolboxes/libraries:
 NumPy
 SciPy
 Pandas
 SciKit-Learn
Visualization libraries
 matplotlib
 Seaborn
and many more …

Python Libraries for Data Science
SciPy:
 collection of algorithms for linear algebra,
differential equations, numerical integration,
optimization, statistics and more
 built on NumPy
Link: https://www.scipy.org/scipylib/

Pandas:
 adds data structures and tools designed to
work with table-like data
 provides tools for data manipulation:
reshaping, merging, sorting, slicing,
aggregation etc.
 allows handling missing data
Link: http://pandas.pydata.org/

matplotlib:
 Python 2D plotting library which produces
publication quality figures in a variety of
hardcopy formats
 a set of functionalities similar to those of
MATLAB
 line plots, scatter plots, bar-charts,
histograms, pie charts etc.
Link: https://matplotlib.org/

Seaborn:
 based on matplotlib
 provides high level interface for drawing
attractive statistical graphics
Link: https://seaborn.pydata.org/

SciKit-Learn:
 provides machine learning algorithms:
classification, regression, clustering, model
validation etc.
 built on NumPy, SciPy and matplotlib
Link: http://scikit-learn.org/

Create a Google Colaboratory
1. Open Google Colab at
https://colab.research.google.com/notebooks/welcome.ipynb
2. Click on ‘New Notebook’ and select Python 2 notebook
or Python 3 notebook.
OR
1. Open Google Drive.
2. Create a new folder for the project.
3. Click on ‘New’ > ‘More’ > ‘Colaboratory’.

Hello World of Machine Learning
 The best small project to start with on a
new tool is the classification of iris flowers
(e.g. the iris dataset).
 Code in my Google colab notebook

Iris Dataset
 A multi-class classification problem
 4 attributes, 150 rows, 3 classes
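
A minimal sketch of the iris classification project using scikit-learn. This is not the code from the author's Colab notebook; the choice of a k-nearest-neighbours classifier, the 70/30 split, and `random_state=0` are assumptions made for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the 150 x 4 feature matrix and the class labels.
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 30% of the rows for testing, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Fit the model on the training rows and score it on the held-out rows.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(X.shape, accuracy)
```

Any other scikit-learn classifier can be swapped in for `KNeighborsClassifier` without changing the rest of the script.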

Boston Housing Dataset
 The Boston Housing Dataset consists of
house prices in various places in
Boston. Alongside price, the dataset
also provides information such as the crime
rate (CRIM), the proportion of non-retail
business acres in the town (INDUS), the
proportion of owner-occupied units built
before 1940 (AGE), and many other attributes.

Boston Housing Dataset
Attribute Information:
 1. CRIM per capita crime rate by town
 2. ZN proportion of residential land zoned for lots over 25,000 sq.ft.
 3. INDUS proportion of non-retail business acres per town
 4. CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
 5. NOX nitric oxides concentration (parts per 10 million)
 6. RM average number of rooms per dwelling
 7. AGE proportion of owner-occupied units built prior to 1940
 8. DIS weighted distances to five Boston employment centres
 9. RAD index of accessibility to radial highways
 10. TAX full-value property-tax rate per $10,000
 11. PTRATIO pupil-teacher ratio by town
 12. B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
 13. LSTAT % lower status of the population
 14. MEDV Median value of owner-occupied homes in $1000's

 Data Set Information:
 The dataset contains cases from a study that was conducted
between 1958 and 1970 at the University of Chicago's Billings
Hospital on the survival of patients who had undergone
surgery for breast cancer.

Attribute Information:
 1. Age of patient at time of operation (numerical)
2. Patient's year of operation (year - 1900, numerical)
3. Number of positive axillary nodes detected (numerical)
4. Survival status (class attribute)
-- 1 = the patient survived 5 years or longer
-- 2 = the patient died within 5 years
 Other Datasets - https://archive.ics.uci.edu/ml/datasets.html
Haberman's Survival Data Set

ML Algorithms 1 by 1
 Linear Regression
 Logistic Regression
 Decision Tree
 SVM
 Naive Bayes
 kNN
 K-Means
 Random Forest

Linear Regression
 Used to estimate real values (cost of
houses, number of calls, total sales etc.)
based on continuous variable(s).
 Here, we establish the relationship between
independent and dependent variables by
fitting a best-fit line.
 This best-fit line is known as the regression
line and is represented by the linear equation
Y = a*X + b, where a is the slope and b is the intercept.
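
A minimal sketch of fitting the line Y = a*X + b by ordinary least squares, in plain Python. The function name `fit_line` and the sample numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Return slope a and intercept b of the least-squares line Y = a*X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x  # the fitted line passes through the mean point
    return a, b


# Hypothetical data: X could be house size, Y the price.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = fit_line(xs, ys)
print(f"Y = {a:.2f} * X + {b:.2f}")
```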

Linear Regression Model
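$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where $Y_i$ is the dependent variable, $X_i$ is the independent variable, $\beta_0$ is the intercept, $\beta_1$ is the slope, $\beta_0 + \beta_1 X_i$ is the linear component, and $\varepsilon_i$ is the random error component. In the notation Y = a*X + b, $\beta_1$ plays the role of a and $\beta_0$ the role of b.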

 Logistic Regression is a mathematical model that
estimates the probability of an event occurring,
given some previous data.
 Logistic Regression works with binary data, where
either the event happens (1) or the event does not
happen (0).
 So given some feature x it tries to find out whether
some event y happens or not. In the case where
the event happens, y is given the value 1. If the
event does not happen, then y is given the value
of 0.
 For example, if y represents whether a sports
team wins a match, then y will be 1 if they win the
match or y will be 0 if they do not.
Logistic Regression
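
A minimal sketch of how a logistic regression model turns a feature x into a probability and then into a 0/1 decision. The slides do not state the formula; the standard logistic (sigmoid) function is used here. The weight and bias are made-up values rather than fitted ones, and the 0.5 threshold is an assumption.

```python
import math


def predict_probability(x, weight, bias):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(weight * x + bias)))."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


def predict_label(x, weight, bias, threshold=0.5):
    """Return 1 (event happens) if the probability reaches the threshold, else 0."""
    return 1 if predict_probability(x, weight, bias) >= threshold else 0


# Hypothetical feature: a team's average goal difference; y = 1 means a win.
weight, bias = 1.5, -0.5
for x in (-2.0, 0.0, 2.0):
    print(x, round(predict_probability(x, weight, bias), 3), predict_label(x, weight, bias))
```

Fitting a real model means choosing the weight and bias that make these probabilities agree with the observed outcomes as closely as possible.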

 Decision Trees (DTs) are a non-
parametric supervised learning method
used for classification and regression.
 The goal is to create a model that predicts
the value of a target variable by learning
simple decision rules inferred from the
data features.
Decision Tree
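
A minimal sketch of how one decision rule can be inferred from a single feature. It tries every threshold between neighbouring values and keeps the one whose two groups are purest. Gini impurity is used as the purity measure; this is a common choice but an assumption here, as are the function names and the sample data.

```python
def gini(labels):
    """Gini impurity: 0 for a pure group, larger for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Find the rule 'value <= t' whose two groups have the lowest weighted Gini impurity."""
    best_t, best_score = None, float("inf")
    candidates = sorted(set(values))
    for low, high in zip(candidates, candidates[1:]):
        t = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical data: petal lengths (cm) with two class labels.
petal_lengths = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9]
species = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]
print(best_threshold(petal_lengths, species))
```

A full decision tree repeats this search on each resulting group, and over all features, until the groups are pure enough.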

 A Support Vector Machine (SVM) is a supervised
machine learning algorithm that can be employed for
both classification and regression purposes.
 SVMs are based on the idea of finding a hyperplane that
best divides a dataset into two classes. A hyperplane is a
line or a surface that linearly separates and classifies a
set of data.
 Support vectors are the data points nearest to the
hyperplane. These are points of a data set that, if
removed, would alter the position of the dividing
hyperplane. Because of this, they can be considered the
critical elements of a data set.
 The distance between the hyperplane and the nearest
data point from either set is known as the margin. The
goal is to choose a hyperplane with the greatest possible
margin between the hyperplane and any point within the
training set, giving a greater chance of new data being
classified correctly.
SVM
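
A minimal sketch of the margin and support vectors for a given hyperplane w . x + b = 0. It only measures a hyperplane that is handed to it; it does not search for the best one, which is what an SVM solver does. The data, the hyperplane, and the function names are assumptions for illustration.

```python
import math


def signed_distance(point, w, b):
    """Signed distance from a point to the hyperplane w . x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, point))
    return (dot + b) / math.sqrt(sum(wi * wi for wi in w))


def margin_and_support_vectors(points, w, b, tol=1e-9):
    """Return the smallest point-to-hyperplane distance and the points that attain it."""
    distances = [abs(signed_distance(p, w, b)) for p in points]
    margin = min(distances)
    support = [p for p, d in zip(points, distances) if d - margin <= tol]
    return margin, support


# Hypothetical 2-D data and a hand-picked hyperplane x1 + x2 - 4 = 0.
points = [(1.0, 1.0), (0.0, 1.0), (3.0, 3.0), (4.0, 3.0)]
w, b = (1.0, 1.0), -4.0
margin, support = margin_and_support_vectors(points, w, b)
print(margin, support)
```

The sign of `signed_distance` tells which side of the hyperplane a point falls on, which is how a new point would be classified.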

 Naive Bayes methods are a set of
supervised learning algorithms based on
applying Bayes’ theorem with the “naive”
assumption of conditional independence
between every pair of features given the
value of the class variable.
Naive Bayes

Bayes Theorem
P(H|E) = (P(E|H) * P(H)) / P(E)
where
•P(H|E) is the probability of hypothesis H given the event E,
a posterior probability.
•P(E|H) is the probability of event E
given that the hypothesis H is true.
•P(H) is the probability of hypothesis H being true
(regardless of any related event), or prior probability of H.
•P(E) is the probability of the event occurring
(regardless of the hypothesis).
This is the Bayes Theorem.
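
A small worked example of the formula above. P(E) is expanded with the law of total probability, P(E) = P(E|H) P(H) + P(E|not H) P(not H). The spam scenario and all the probabilities are made-up numbers for illustration.

```python
def posterior(prior_h, likelihood_e_given_h, likelihood_e_given_not_h):
    """Return P(H|E) = P(E|H) * P(H) / P(E)."""
    evidence = (
        likelihood_e_given_h * prior_h
        + likelihood_e_given_not_h * (1.0 - prior_h)
    )
    return likelihood_e_given_h * prior_h / evidence


# Hypothetical: H = "email is spam", E = "email contains the word 'offer'".
p = posterior(prior_h=0.2, likelihood_e_given_h=0.6, likelihood_e_given_not_h=0.05)
print(round(p, 3))
```

Seeing the word raises the probability of spam from the prior of 0.2 to a much higher posterior, because the word is far more likely in spam than in other mail.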

 K Nearest Neighbors (KNN) is a simple, easy-to-understand,
and versatile machine learning algorithm.
 KNN is used in a variety of applications such as
finance, healthcare, political science, handwriting
detection, image recognition and video recognition. In
credit ratings, financial institutions predict the credit
rating of customers. In loan disbursement, banks
predict whether a loan is safe or risky.
In political science, potential voters are classified into two
classes: will vote or won’t vote.
 The KNN algorithm is used for both classification and
regression problems.
 It is based on a feature similarity approach: a new point is
labeled according to the training points closest to it.
K - NN
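
A minimal sketch of KNN classification in plain Python: find the k training points closest to the query and take a majority vote. The loan data, the feature choice, and k = 3 are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical loan data: (income, debt) in thousands -> label.
points = [(20, 15), (25, 18), (30, 20), (80, 5), (90, 10), (85, 8)]
labels = ["risky", "risky", "risky", "safe", "safe", "safe"]
print(knn_predict(points, labels, (28, 17)))  # risky
print(knn_predict(points, labels, (88, 7)))   # safe
```

For regression, the vote would be replaced by the average of the neighbours' target values.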

 K-means clustering is one of the most widely used
unsupervised machine learning algorithms that forms clusters
of data based on the similarity between data instances. For
this particular algorithm to work, the number of clusters has to
be defined beforehand. The K in the K-means refers to the
number of clusters.
 The K-means algorithm starts by randomly choosing a
centroid value for each cluster. After that the algorithm
iteratively performs three steps: (i) Find the Euclidean
distance between each data instance and centroids of all the
clusters; (ii) Assign the data instances to the cluster of the
centroid with nearest distance; (iii) Calculate new centroid
values based on the mean values of the coordinates of all the
data instances from the corresponding cluster.
K-Means
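
The three steps above can be sketched in plain Python as follows. The function name `kmeans`, the optional `initial` argument for fixing the starting centroids, the stopping rule, and the sample points are assumptions for illustration.

```python
import math
import random


def kmeans(points, k, initial=None, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length tuples; returns (centroids, assignments)."""
    # Start from k randomly chosen data points unless starting centroids are given.
    centroids = list(initial) if initial is not None else random.Random(seed).sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Steps (i) and (ii): measure Euclidean distances and pick the nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the centroids are stable
        assignments = new_assignments
        # Step (iii): move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignments


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
print(centroids, assignments)
```

Because the start is random, different seeds can lead to different final clusters; running the algorithm several times and keeping the best result is a common remedy.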

 Random forest is a type of supervised machine
learning algorithm based on ensemble learning.
 Ensemble learning is a type of learning where you
join different types of algorithms or same algorithm
multiple times to form a more powerful prediction
model.
 The random forest algorithm combines multiple
algorithms of the same type, i.e. multiple decision
trees, resulting in a forest of trees, hence the
name "Random Forest". The random forest
algorithm can be used for both regression and
classification tasks.
Random Forest

1. Pick N random records from the dataset.
2. Build a decision tree based on these N records.
3. Choose the number of trees you want in your
algorithm and repeat steps 1 and 2.
4. In case of a regression problem, for a new record,
each tree in the forest predicts a value for Y
(output). The final value can be calculated by
taking the average of all the values predicted by all
the trees in the forest.
Or, in case of a classification problem, each tree in
the forest predicts the category to which the new
record belongs. Finally, the new record is assigned
to the category that wins the majority vote.
How the Random Forest Algorithm Works
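
A simplified sketch of the classification case in plain Python. Two simplifications are assumptions of this example and not part of the slides: each "tree" is a one-rule stump on a single feature, and the N records are drawn with replacement (bootstrap sampling). Real random forests grow deeper trees and also sample features. All names and data are made up for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(records):
    """Fit a one-rule tree 'x <= t' on (x, label) records by minimizing training errors."""
    xs = sorted({x for x, _ in records})
    best = None
    for low, high in zip(xs, xs[1:]):
        t = (low + high) / 2
        left = [lab for x, lab in records if x <= t]
        right = [lab for x, lab in records if x > t]
        left_label, right_label = majority(left), majority(right)
        errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    if best is None:  # every sampled x was identical: always predict the majority label
        label = majority([lab for _, lab in records])
        return (float("inf"), label, label)
    return best[1:]


def predict_stump(stump, x):
    t, left_label, right_label = stump
    return left_label if x <= t else right_label


def train_forest(records, n_trees=25, seed=0):
    """Steps 1 to 3: sample N records, build a tree, repeat n_trees times."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(records) for _ in records]  # N records, with replacement
        forest.append(train_stump(sample))
    return forest


def predict_forest(forest, x):
    """Step 4 (classification): every tree votes and the majority wins."""
    return majority([predict_stump(stump, x) for stump in forest])


records = [(1.0, "A"), (1.5, "A"), (2.0, "A"), (2.5, "A"), (3.0, "A"),
           (7.0, "B"), (7.5, "B"), (8.0, "B"), (8.5, "B"), (9.0, "B")]
forest = train_forest(records)
print(predict_forest(forest, 0.5), predict_forest(forest, 9.5))
```

For regression, `predict_forest` would average the trees' numeric predictions instead of taking a vote.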

 Neural Networks are a machine learning
framework that attempts to mimic the learning
pattern of natural biological neural networks.
Biological neural networks have
interconnected neurons with dendrites that
receive inputs, then based on these inputs
they produce an output signal through an
axon to another neuron. We will try to mimic
this process through the use of Artificial
Neural Networks (ANN)
 The process of creating a neural network
begins with the most basic form, a single
perceptron.
Neural Networks

Perceptron – An Artificial
Neuron

$y = f\left(b + \sum_{i=1}^{n-1} w_i x_i\right)$
[Diagram: inputs x1, x2, x3 with weights w1, w2, w3 and bias b produce the output y]
What is an Artificial Neuron?
 An Artificial Neuron (AN) is a non-linear
parameterized function with restricted
output range
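
A minimal sketch of one artificial neuron: a weighted sum of the inputs plus a bias, passed through a non-linear function f. The sigmoid is used for f because its output is restricted to the range (0, 1); that choice, the function names, and the sample weights are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    """Compute y = f(b + sum_i w_i * x_i)."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(z)


# Hypothetical inputs x1..x3, weights w1..w3, and bias b.
x = [1.0, 0.5, -1.0]
w = [0.4, -0.2, 0.1]
b = 0.1
print(neuron(x, w, b))
```

The weights and the bias are the parameters that learning adjusts; the activation function stays fixed.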

Deep Feed Forward Neural Nets
So what then is learning?
Forward propagation: a training example $(x^{(i)}, y^{(i)})$ is passed through the network to produce the hypothesis $h_\theta(x^{(i)})$.
Learning is the adjustment of the weights $w_{i,j}$ such that
the cost function $J(\theta)$ is minimized.
Simple learning procedure: Back Propagation (of the error signal)
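
A minimal sketch of learning as cost minimization, using plain gradient descent on a model with a single weight. This is not back propagation through a multi-layer network; back propagation is the procedure that computes these gradients layer by layer in deeper networks. The squared-error cost, the learning rate, and the data are assumptions for illustration.

```python
def cost(w, data):
    """Squared-error cost J(w) of the one-weight model h(x) = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / (2 * len(data))


def gradient(w, data):
    """Derivative dJ/dw of the cost above."""
    return sum((w * x - y) * x for x, y in data) / len(data)


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
w = 0.0
learning_rate = 0.1
history = [cost(w, data)]
for _ in range(50):
    w -= learning_rate * gradient(w, data)  # step against the gradient
    history.append(cost(w, data))
print(round(w, 4), history[0], history[-1])
```

Each update moves the weight in the direction that lowers the cost, so the recorded cost values shrink as the weight approaches the value that generated the data.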

Applications
 Recognizing patterns:
 Facial identities or facial expressions
 Handwritten or spoken words
 Medical images
 Generating patterns:
 Generating images or motion sequences
 Recognizing anomalies:
 Unusual sequences of credit card transactions
 Unusual patterns of sensor readings in a nuclear
power plant or unusual sound in your car engine.
 Prediction:
 Future stock prices or currency exchange rates

Applications
 Spam filtering, fraud detection:
 Recommendation systems:
 Information retrieval:
 Find documents or images with similar content.
 Data Visualization:
 Display a huge database in a revealing way
 Facial recognition for Face ID, Facebook automatic tagging,
etc. (CNN)
 Scene and image description for people with low vision. (CNN,
LSTM)
 Traffic sign classification for self driving cars. (CNN)
 Sentiment analysis to detect hateful speech on
Twitter/Instagram. (LSTM)
 Automated game playing to… play games. (Deep Q-Learning)
 Image style transfer for prismAI, image colorization for old
photographs. (CNN)

Hand Written Digit Recognition

Displaying the structure of a set of documents
using a deep neural network

When Would We Use Machine
Learning?
 When patterns exist in our data
 Especially when we don’t know what they are
 We can not pin down the functional relationships mathematically
 Else we would just code up the algorithm
 When we have lots of (unlabeled) data
 Labeled training sets harder to come by
 Data is of high-dimension
 High dimension “features”
 For example, sensor data
 Want to “discover” lower-dimension representations
 Dimension reduction
 Aside: Machine Learning is heavily focused on implementability
 Frequently uses well-known numerical optimization techniques
 Lots of open source code available
 See e.g., libsvm (Support Vector Machines): http://www.csie.ntu.edu.tw/~cjlin/libsvm/
 Most of my code is in Python: http://scikit-learn.org/stable/ (many others)
 Languages (e.g., octave: https://www.gnu.org/software/octave/)

 Python Machine Learning by Example, Yuxi Hayden Liu
 Applied Machine Learning, Lecture 10: Introduction to unsupervised and semi-supervised learning, Richard Johnson
 Building Machine Learning Systems with Python, Luis Pedro Coelho
 deeplearning.ai
 https://www.coursera.org/learn/machine-learning#syllabus
 https://chrisalbon.com/#machine_learning
 https://medium.com/machine-learning-for-humans/how-to-learn-machine-learning-24d53bb64aa1
Further Learning Resources

This was a brief overview of some
techniques and algorithms in machine
learning. Many more techniques
apply machine learning as a solution.
Machine learning is the new electricity.
Conclusion
