Machine Learning by Rj

Machine Learning
Presented by Mr. Raviraj Solanki

Unit-1 Topics
 Introduction to Machine Learning, Model Preparation,
 Modelling and Evaluation
 Human learning versus machine learning,
 types of machine learning,
 applications of machine learning,
 tools for machine learning,
 Machine Learning Activities,
 Data structures for machine learning,
 Data Pre-processing, selecting a model, training a model,
 model representation and interpretability,
 evaluating performance of a model,
 improving performance of a model

Introduction of ML
 Machine learning is a growing technology which enables computers
to learn automatically from past data.
 Machine learning uses various algorithms for building
mathematical models and making predictions using
historical data or information.
 Currently, it is being used for various tasks such as image
recognition, speech recognition, email
filtering, Facebook auto-tagging, recommender system,
and many more.

Machine Learning Definitions
 Algorithm: A Machine Learning algorithm is a set of rules and
statistical techniques used to learn patterns from data and draw
significant information from it. It is the logic behind a Machine
Learning model. An example of a Machine Learning algorithm is the
Linear Regression algorithm.
 Model: A model is the main component of Machine Learning. A
model is trained by using a Machine Learning algorithm. An
algorithm maps all the decisions that a model is supposed to take
based on the given input, in order to get the correct output.
 Predictor Variable: It is a feature (or features) of the data that can be used to
predict the output.

 Response Variable: It is the feature or the output variable that
needs to be predicted by using the predictor variable(s).
 Training Data: The Machine Learning model is built using the
training data. The training data helps the model to identify key
trends and patterns essential to predict the output.
 Testing Data: After the model is trained, it must be tested to
evaluate how accurately it can predict an outcome. This is done with
the testing data set.
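
The terms above can be illustrated with a small sketch. The dataset, the column meanings (hours studied as the predictor variable, pass/fail as the response variable), and the helper name `train_test_split` are all hypothetical and chosen only for illustration; the 70/30 split is an assumption, not a rule.

```python
import random

# Hypothetical toy dataset: each row is (hours_studied, passed_exam).
# hours_studied is the predictor variable, passed_exam is the response variable.
rows = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1),
        (6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]

def train_test_split(data, test_fraction=0.3, seed=0):
    """Shuffle a copy of the data and split it into training and testing parts."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(rows)

# Separate predictor (X) and response (y) for the training data.
X_train = [x for x, _ in train]
y_train = [y for _, y in train]

print(len(train), "training rows,", len(test), "testing rows")
```

The model would be built only from `train`; `test` is held back so that accuracy is measured on rows the model has not seen.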

What is Machine Learning?
 Machine learning is a subset of artificial intelligence that is mainly
concerned with the development of algorithms which allow a computer to
learn from data and past experience on its own.
 The term machine learning was first introduced by Arthur Samuel in 1959.
 In simple words, ML is a type of artificial intelligence that extracts
patterns out of raw data by using an algorithm or method.
 The main focus of ML is to allow computer systems to learn from experience
without being explicitly programmed and without human intervention.
 Machine learning enables a machine to automatically learn from data,
improve performance from experience, and make predictions
without being explicitly programmed.

 With the help of sample historical data, which is known as training
data, machine learning algorithms build a mathematical model that
helps in making predictions or decisions without being explicitly
programmed.
 Machine learning brings computer science and statistics together for
creating predictive models.
 Machine learning constructs or uses the algorithms that learn from
historical data.
 In general, the more relevant data we provide, the better the
performance tends to be.
 A machine has the ability to learn if it can improve its
performance by gaining more data.

A machine is said to learn if it can:
Improve its performance (P)
At executing some task (T)
Over time with experience (E)

Human learning versus machine learning
 In traditional programming, a programmer codes all the rules in
consultation with an expert in the industry for which the software is
being developed.
 Each rule is based on a logical foundation; the machine produces
an output by following the logical statement.
 When the system grows complex, more rules need to be written, and
the rule set can quickly become unsustainable to maintain.

 Machine learning is supposed to overcome this issue.
 The machine learns how the input and output data are correlated and
it writes a rule.
 The programmers do not need to write new rules each time there is
new data.
 The algorithms adapt in response to new data and experiences to
improve efficacy over time.

How does Machine Learning work
 A Machine Learning system learns from historical data, builds
the prediction models, and whenever it receives new data,
predicts the output for it.
 The accuracy of the predicted output depends in part on the amount of data:
a larger amount of representative data generally helps to build a better model, which
predicts the output more accurately.
 Suppose we have a complex problem where we need to
make some predictions. Instead of writing code for it, we just
need to feed the data to generic algorithms; with the help of
these algorithms, the machine builds the logic from the data and predicts
the output.
 Machine learning has changed our way of thinking about the
problem.

 The life of Machine Learning programs is straightforward and can be
summarized in the following points:
 1. Define a question
 2. Collect data
 3. Visualize data
 4. Train algorithm
 5. Test the algorithm
 6. Collect feedback
 7. Refine the algorithm
 8. Loop 4-7 until the results are satisfying
 9. Use the model to make a prediction
 Once the algorithm gets good at drawing the right conclusions, it applies
that knowledge to new sets of data.

1) Supervised Learning
 Supervised learning is a type of machine learning method in which we
provide sample labeled data to the machine learning system in
order to train it, and on that basis, it predicts the output.
 The labeled data set is nothing but the training data set.
 The system creates a model using labeled data to understand the dataset and
learn from each example. Once the training and processing are done,
we test the model by providing sample data to check whether it
predicts the correct output.
 The goal of supervised learning is to map input data to the
output data.
 Supervised learning is based on supervision, in the same way as
a student learns things under the supervision of a teacher.

Example of Supervised Learning

 The machine is given images of Tom and Jerry, and the goal is for the
machine to identify and classify the images into two groups (Tom
images and Jerry images).
 The training data set that is fed to the model is labeled, as in, we’re
telling the machine, ‘this is how Tom looks and this is
Jerry’.
 By doing so you’re training the machine by using labeled data. In
Supervised Learning, there is a well-defined training phase done
with the help of labeled data.

 Supervised learning can be grouped further in two categories of
algorithms:

 Classification: It is a Supervised Learning task where the output has defined
labels (discrete values).
 Classification algorithms are used when the output variable is categorical,
which means it takes one of a fixed set of classes such as Yes-No, Male-Female, True-False,
etc.
 For example, in the figure above, the output "Purchased" has defined labels, i.e. 0 or
1;
 1 means the customer will purchase and 0 means the customer won’t
purchase.
 The goal here is to predict discrete values belonging to a particular class and evaluate on
the basis of accuracy.
 It can be either binary or multi-class classification.
 In binary classification, the model predicts either 0 or 1 (yes or no), but in multi-class
classification, the model predicts one of more than two classes.
 Example: Gmail classifies mails into more than two classes such as social, promotions, updates,
forum.
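
As a minimal sketch of binary classification and accuracy-based evaluation, the code below uses a nearest-centroid rule, which is not one of the algorithms named in these slides and is chosen only because it fits in a few lines. The ages, the 0/1 "purchased" labels, and all function names are hypothetical.

```python
# Hypothetical labeled data: (age, purchased) with purchased in {0, 1}.
train = [(22, 0), (25, 0), (28, 0), (30, 0), (41, 1), (45, 1), (50, 1), (52, 1)]
test = [(24, 0), (29, 0), (33, 0), (44, 1), (48, 1)]

def fit_centroids(data):
    """Training: compute the mean feature value of each class label."""
    sums, counts = {}, {}
    for x, label in data:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Prediction: return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def accuracy(centroids, data):
    """Evaluation: fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for x, label in data if predict(centroids, x) == label)
    return correct / len(data)

centroids = fit_centroids(train)
print("class centroids:", centroids)
print("test accuracy:", accuracy(centroids, test))
```

The same `accuracy` function would work for multi-class data, since it only compares predicted and true labels.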

 Below are some popular classification algorithms which come
under supervised learning:
 Random Forest
 Decision Trees
 Logistic Regression
 Support Vector Machines

 Regression: It is a Supervised Learning task where the output
has a continuous value.
 Regression algorithms are used when the output variable depends on the
input variable and takes continuous values.
 It is used for the prediction of continuous variables, such
as weather forecasting, market trends, etc.
 For example, in Figure B above, the output "Wind Speed" does not have any
discrete value but is continuous within a particular range.
 The goal here is to predict a value as close to the actual
output value as our model can, and then evaluation is
done by calculating the error value.
 The smaller the error, the greater the accuracy of our regression
model.
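
A minimal sketch of regression and its error-based evaluation follows. It fits a straight line with ordinary least squares and reports the mean squared error; the data points and function names are made up for illustration, and mean squared error is only one of several possible error measures.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_squared_error(xs, ys, slope, intercept):
    """Average squared difference between actual and predicted outputs."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical continuous data (input x, measured output y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(xs, ys)
mse = mean_squared_error(xs, ys, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} mse={mse:.4f}")
```

A smaller `mse` means the fitted line passes closer to the observed points.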

 Below are some popular Regression algorithms which come
under supervised learning:
 Linear Regression
 Regression Trees
 Non-Linear Regression
 Bayesian Linear Regression
 Polynomial Regression

Regression Algorithm | Classification Algorithm
In Regression, the output variable must be of continuous nature or real value. | In Classification, the output variable must be a discrete value.
The task of the regression algorithm is to map the input value (x) with the continuous output variable (y). | The task of the classification algorithm is to map the input value (x) with the discrete output variable (y).
Regression algorithms are used with continuous data. | Classification algorithms are used with discrete data.
In Regression, we try to find the best fit line, which can predict the output more accurately. | In Classification, we try to find the decision boundary, which can divide the dataset into different classes.
Regression algorithms can be used to solve regression problems such as weather prediction, house price prediction, etc. | Classification algorithms can be used to solve classification problems such as identification of spam emails, speech recognition, identification of cancer cells, etc.
Regression algorithms can be further divided into Linear and Non-linear Regression. | Classification algorithms can be divided into Binary Classifier and Multi-class Classifier.

Advantages of Supervised learning:
 With the help of supervised learning, the model can predict the
output on the basis of prior experiences.
 In supervised learning, we can have an exact idea about the classes of
objects.
 Supervised learning model helps us to solve various real-world
problems such as fraud detection, spam filtering, etc.

Disadvantages of supervised learning:
 Supervised learning models may struggle with very
complex tasks.
 Supervised learning may not predict the correct output if the
test data differs substantially from the training dataset.
 Training can require a lot of computation time.
 In supervised learning, we need enough knowledge about the classes
of objects.

2 ) Unsupervised Learning
 Unsupervised learning is a learning method in which a machine
learns without any supervision.
 The training is provided to the machine with the set of data
that has not been labeled, classified, or categorized, and
the algorithm needs to act on that data without any
supervision.
 The goal of unsupervised learning is to restructure the
input data into new features or a group of objects with
similar patterns.

 In unsupervised learning, we don't have a predetermined result.
The machine tries to find useful insights from the huge amount of
data.
 It can be further classified into two categories of algorithms:
 Clustering
 Association

EDA : Exploratory data analysis

 For example, it identifies prominent features of Tom such
as pointy ears, bigger size, etc., to understand that this image
is of type 1.
 Similarly, it finds such features in Jerry and knows that this
image is of type 2.
 Therefore, it classifies the images into two different classes
without knowing who Tom is or who Jerry is.

Why Unsupervised Learning?
 Unsupervised machine learning finds all kind of unknown
patterns in data.
 Unsupervised methods help you to find features which can be
useful for categorization.
 It can take place in real time, so all the input data can be analyzed and
labeled in the presence of learners.
 It is easier to get unlabeled data from a computer than
labeled data, which needs manual intervention.

 Clustering: Clustering is a method of grouping objects
into clusters such that objects with the most similarities
remain in one group and have few or no similarities with
the objects of another group.
 Cluster analysis finds the commonalities between the data objects
and categorizes them as per the presence and absence of those
commonalities.
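
The grouping idea can be sketched with a bare-bones k-means on one-dimensional points. This is a simplified illustration, not a production implementation: the points, the starting centroids, and the function name `k_means_1d` are assumptions made for the example, and real data usually has more than one feature.

```python
def k_means_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D points, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabeled data with two visible groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(points, centroids=[1.0, 2.0])
print(centroids)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge only from the distances between points.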

 Association: Association rule learning is an unsupervised learning
method which is used for finding the relationships between
variables in a large database.
 It determines the sets of items that occur together in the dataset.
 Association rules can make a marketing strategy more effective.
 For example, people who buy item X (suppose bread) also tend to
purchase item Y (butter or jam).
 A typical example of Association rule is Market Basket
Analysis.
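
Two standard measures for an association rule are support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both for the bread and butter example; the transactions are invented for illustration.

```python
# Hypothetical market-basket transactions, one set of items per purchase.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print("support(bread, butter):", support({"bread", "butter"}, transactions))
print("confidence(bread -> butter):", confidence({"bread"}, {"butter"}, transactions))
```

Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds; this sketch only evaluates one rule by hand.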

Unsupervised Learning algorithms:
 K-means clustering
 KNN (k-nearest neighbors), although it is usually treated as a supervised algorithm
 Hierarchical clustering
 Anomaly detection
 Neural Networks
 Principal Component Analysis
 Independent Component Analysis
 Apriori algorithm
 Singular value decomposition

Advantages of Unsupervised Learning
 Unsupervised learning is used for more complex tasks as compared
to supervised learning because, in unsupervised learning, we don't
have labeled input data.
 Unsupervised learning is preferable as it is easy to get unlabeled data
in comparison to labeled data.

Disadvantages of Unsupervised Learning
 Unsupervised learning is intrinsically more difficult than supervised
learning as it does not have corresponding output.
 The result of the unsupervised learning algorithm might be less
accurate as input data is not labeled, and algorithms do not know
the exact output in advance.

Supervised Learning | Unsupervised Learning
Supervised learning algorithms are trained using labeled data. | Unsupervised learning algorithms are trained using unlabeled data.
Supervised learning model takes direct feedback to check if it is predicting correct output or not. | Unsupervised learning model does not take any feedback.
Supervised learning model predicts the output. | Unsupervised learning model finds the hidden patterns in data.
In supervised learning, input data is provided to the model along with the output. | In unsupervised learning, only input data is provided to the model.
The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset.
Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
Supervised learning can be categorized into Classification and Regression problems. | Unsupervised learning can be classified into Clustering and Association problems.
Supervised learning can be used for those cases where we know the input as well as corresponding outputs. | Unsupervised learning can be used for those cases where we have only input data and no corresponding output data.
Supervised learning model generally produces a more accurate result. | Unsupervised learning model may give a less accurate result as compared to supervised learning.
Supervised learning is not close to true Artificial Intelligence, as we first train the model on the data, and only then can it predict the correct output. | Unsupervised learning is closer to true Artificial Intelligence, as it learns in the way a child learns daily routine things from experience.
It includes various algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision Tree, Bayesian Logic, etc. | It includes various algorithms such as Clustering and the Apriori algorithm (KNN is sometimes listed here, although it is usually treated as a supervised algorithm).

3) Reinforcement Learning
 Reinforcement learning is a feedback-based learning method, in
which a learning agent gets a reward for each right action and gets
a penalty for each wrong action.
 The agent learns automatically with these feedbacks and improves
its performance.
 In reinforcement learning, the agent interacts with the environment
and explores it.
 The goal of an agent is to get the most reward points, and hence,
it improves its performance.
 Reinforcement Learning is a part of Machine Learning where an agent is
put in an environment and learns to behave in this
environment by performing certain actions and observing the
rewards which it gets from those actions.

 For example, imagine you are stuck on an island. You will learn how to live there: you will explore the
environment, understand the climate conditions, the type of food that
grows there, the dangers of the island, etc. This is exactly how
Reinforcement Learning works: it involves an agent (you, stuck on
the island) that is put in an unknown environment (the island), where it
must learn by observing and performing actions that result in
rewards.
 Reinforcement Learning is mainly used in advanced Machine
Learning areas such as self-driving cars, AlphaGo, etc.
 A robotic dog which automatically learns the movement of its
limbs is an example of Reinforcement Learning.

Some important terms used in Reinforcement Learning
 Agent: An assumed entity which performs actions in an environment to gain
some reward.
 Environment (e): A scenario that an agent has to face.
 Reward (R): An immediate return given to an agent when it performs
a specific action or task.
 State (s): The current situation returned by the environment.
 Policy (π): The strategy applied by the agent to decide the next action
based on the current state.
 Value (V): The expected long-term return with discount, as compared to the
short-term reward.
 Value Function: It specifies the value of a state, that is, the total amount of
reward an agent should expect to accumulate beginning from that state.
 Model of the environment: This mimics the behavior of the environment. It
helps you to make inferences and to determine how the
environment will behave.
 Model-based methods: Methods for solving reinforcement learning
problems which use a model of the environment.
 Q value or action value (Q): Q value is quite similar to value. The only
difference between the two is that it takes the current
action as an additional parameter.

Reinforcement Learning Algorithms
 Value-Based:
 In a value-based Reinforcement Learning method, you should try to
maximize a value function V(s). In this method, the agent is
expecting a long-term return of the current states under policy π.
 Policy-based:
 In a policy-based RL method, you try to come up with such a policy
that the action performed in every state helps you to gain maximum
reward in the future.
 Two types of policy-based methods are:
 Deterministic: For any state, the same action is produced by the
policy π.
 Stochastic: Every action has a certain probability, which is
determined by the following equation. Stochastic Policy:
 \pi(a \mid s) = P[A_t = a \mid S_t = s]

 Model-Based:
 In this Reinforcement Learning method, you need to create a
virtual model for each environment.The agent learns to perform
in that specific environment.
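
Two of the ideas above can be sketched in a few lines: the discounted long-term return that value-based methods try to maximize, and the difference between a deterministic and a stochastic policy. The state names, action names, probabilities, and the discount factor `gamma` are hypothetical values chosen for illustration.

```python
import random

def discounted_return(rewards, gamma):
    """Long-term return: the reward at step t is weighted by gamma**t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Deterministic policy: each state maps to exactly one action.
deterministic_policy = {"s0": "right", "s1": "right"}

# Stochastic policy: each state maps to a probability for every action,
# i.e. a table of pi(a | s) values.
stochastic_policy = {
    "s0": {"left": 0.2, "right": 0.8},
    "s1": {"left": 0.5, "right": 0.5},
}

def sample_action(policy, state, rng):
    """Draw one action for the state according to the policy's probabilities."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(0)
print("return:", discounted_return([0, 0, 1], gamma=0.5))
print("deterministic action in s0:", deterministic_policy["s0"])
print("sampled action in s0:", sample_action(stochastic_policy, "s0", rng))
```

With `gamma` below 1, a reward received later counts for less than the same reward received immediately, which is what "return with discount" means.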

Reinforcement Learning | Supervised Learning
RL works by interacting with the environment. | Supervised learning works on the existing dataset.
The RL algorithm works like the human brain works when making some decisions. | Supervised learning works as when a human learns things under the supervision of a guide.
No labeled dataset is present. | The labeled dataset is present.
No previous training is provided to the learning agent. | Training is provided to the algorithm so that it can predict the output.
RL helps to take decisions sequentially. | In supervised learning, decisions are made when input is given.

Applications of machine learning

1. Image Recognition
 Image recognition is one of the most common applications of
machine learning.
 It is used to identify objects, persons, places, etc. in digital images. A
popular use case of image recognition and face detection
is automatic friend tagging suggestion:
 Facebook provides a feature of automatic friend tagging suggestion.
Whenever we upload a photo with our Facebook friends, we
automatically get a tagging suggestion with names, and the technology
behind this is machine learning's face detection and recognition
algorithm.

2. Speech Recognition
 While using Google, we get an option of "Search by voice," it
comes under speech recognition, and it's a popular application of
machine learning.
 Speech recognition is a process of converting voice instructions into
text, and it is also known as "Speech to text", or "Computer
speech recognition."
 At present, machine learning algorithms are widely used by various
applications of speech recognition.
 Google assistant, Siri, Cortana, and Alexa are using speech
recognition technology to follow the voice instructions.

3. Traffic prediction
 If we want to visit a new place, we take help of Google Maps, which
shows us the correct path with the shortest route and predicts the
traffic conditions.
 It predicts traffic conditions, such as whether traffic is clear,
slow-moving, or heavily congested, in two ways:
 Real-time location of the vehicle from the Google Maps app and sensors
 Average time taken on past days at the same time of day
 Everyone who uses Google Maps is helping the app to become
better.

4. Product recommendations:
 Machine learning is widely used by various e-commerce and
entertainment companies such as Amazon, Netflix, etc., for
product recommendation to the user.
 Whenever we search for some product on Amazon, we start
getting advertisements for the same product while
surfing the internet on the same browser, and this is because of machine learning.

5. Self-driving cars
 One of the most exciting applications of machine learning is self-
driving cars. Machine learning plays a significant role in self-driving
cars.
 Tesla, a well-known car manufacturing company, is working on
self-driving cars.
 According to these slides, it uses an unsupervised learning method to train the car models to
detect people and objects while driving.

6. Email Spam and Malware Filtering
 Whenever we receive a new email, it is filtered automatically as
important, normal, and spam.
 We always receive an important mail in our inbox with the important
symbol and spam emails in our spam box, and the technology behind this
is Machine learning.
 Below are some spam filters used by Gmail:
 Content Filter
 Header filter
 General blacklists filter
 Rules-based filters
 Permission filters
 Some machine learning algorithms such as Multi-Layer
Perceptron, Decision tree, and Naïve Bayes classifier are used for
email spam filtering and malware detection.

7. Virtual Personal Assistant
 We have various virtual personal assistants such as Google
assistant, Alexa, Cortana, Siri.
 As the name suggests, they help us in finding the information using
our voice instruction.
 These assistants can help us in various ways just by our voice
instructions such as Play music, call someone, Open an email,
Scheduling an appointment, etc.

8. Online Fraud Detection
 Machine learning is making our online transactions safer and more secure by
detecting fraudulent transactions.
 Whenever we perform an online transaction, there are
various ways in which fraud can take place, such
as fake accounts, fake IDs, and money stolen in the middle of a
transaction.
 To detect this, a feed-forward neural network can help by
checking whether a transaction is genuine or fraudulent.

9. Stock Market trading:
 Machine learning is widely used in stock market trading. In the
stock market, there is always a risk of ups and downs in share prices, so
a long short-term memory (LSTM) neural
network can be used for the prediction of stock market trends.

10. Medical Diagnosis:
 In medical science, machine learning is used for disease diagnosis.
With this, medical technology is growing very fast and is able to build
3D models that can predict the exact position of lesions in the brain.
 It helps in finding brain tumors and other brain-related diseases
more easily.

11. Automatic Language Translation
 Nowadays, if we visit a new place and do not know the
language, it is much less of a problem, because machine
learning helps us by converting the text into a language we know.
 Google's GNMT (Google Neural Machine Translation)
provides this feature. It is a neural machine learning system that
translates text into our familiar language, and this is called
automatic translation.

Tools for machine learning
Python
 Python is one of the most popular programming languages of recent
times.
 Python, created by Guido van Rossum in 1991, is an open-
source, high-level, general-purpose programming
language.
 Python is a dynamic programming language that supports object-
oriented, imperative, functional, and procedural
development paradigms.
 Python is very popular in machine learning programming.
 Python gained early and broad support for
machine learning via a variety of libraries and tools.
 scikit-learn and TensorFlow are two popular machine learning libraries
available to Python developers.

R
 R language is a dynamic, array-based, object-oriented, imperative,
functional, procedural, and reflective computer programming
language.
 The language first appeared in 1993 but has become popular in the past few
years among data scientists and machine learning developers for its
functional and statistical algorithm features.
 R language was created by Ross Ihaka and Robert Gentleman at the
University of Auckland, New Zealand.
 R is open-source and available on r-project.org and GitHub.
 Currently R is managed and developed under the R Foundation and the R
Development Core Team.
 When these slides were prepared, the current version of R was 3.5.2, released on Dec 20, 2018.
 R language is one of the most popular programming languages
among data scientists and statistical engineers.
 R supports Linux, OS X, and Windows operating systems.

Matlab
 MATLAB (Matrix Laboratory) is licensed commercial software with
robust support for a wide range of numerical computing.
 MATLAB has a huge user base across industry and academia.
 MATLAB is developed by MathWorks.
 MATLAB also provides extensive support of statistical functions and
has a huge number of machine learning algorithms in-built.
 It also has the ability to scale up for large dataset by parallel
processing on cluster and cloud.

SAS
 SAS (earlier known as ‘Statistical Analysis System’) is another
licensed commercial software package which provides strong
support for machine learning functionalities.
 Developed in C by SAS Institute, it had its first release in the year 1976.
 SAS is a software suite comprising different components.
 The basic data management functionalities are embedded in the Base
SAS component whereas the other components like SAS/INSIGHT,
Enterprise Miner, SAS/STAT, etc. help in specialized functions
related to data mining and statistical analysis.

Other languages/tools
 Owned by IBM, SPSS (originally named Statistical Package
for the Social Sciences) is a popular package supporting
specialized data mining and statistical analysis.
 Julia is an open-source, liberally licensed programming
language for numerical analysis and computational science,
which can also be used to implement high-performance machine
learning algorithms.

Activities
 Gathering Data
 Data preparation
 Data Wrangling
 Analyse Data
 Train the model
 Test the model
 Deployment

1. Gathering Data
 The goal of this step is to identify and obtain all the data related to the problem.
 In this step, we need to identify the different data sources, as data can be
collected from various sources such as files, databases, the internet,
or mobile devices.
 It is one of the most important steps of the life cycle.
 The quantity and quality of the collected data will determine the efficiency
of the output.
 In general, the more good-quality data we have, the more accurate the prediction will be.
 This step includes the below tasks:
 Identify various data sources
 Collect data
 Integrate the data obtained from different sources
 By performing the above task, we get a coherent set of data, also called as
a dataset. It will be used in further steps.

2. Data preparation
 After collecting the data, we need to prepare it for further steps.
 Data preparation is a step where we put our data into a
suitable place and prepare it to use in our machine
learning training.
 In this step, first, we put all data together, and then randomize the
ordering of data.
 This step can be further divided into two processes:
 Data exploration:
 It is used to understand the nature of data that we have to work with.
 We need to understand the characteristics, format, and quality of data.
 A better understanding of data leads to an effective outcome. In this, we
find Correlations, general trends, and outliers.
 Data pre-processing:
 Now the next step is preprocessing of data for its analysis.

3. Data Wrangling
 Data wrangling is the process of cleaning and converting raw data
into a useable format.
 It is the process of cleaning the data, selecting the variable to use,
and transforming the data in a proper format to make it more
suitable for analysis in the next step.
 It is one of the most important steps of the complete process.
 Cleaning of data is required to address the quality issues.
 Not all of the data we have collected is necessarily useful. In
real-world applications, collected data may
have various issues, including:
 Missing values
 Duplicate data
 Invalid data
 Noise
 So, we use various filtering techniques to clean the data.
 It is important to detect and remove the above issues because they
can negatively affect the quality of the outcome.
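
The cleaning described above can be sketched as a simple filter. The records, the field names, and the valid age range of 0 to 120 are hypothetical assumptions; a real project would define its own validity rules, and noise handling is not shown here.

```python
# Hypothetical raw records collected from different sources.
raw_rows = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": "Delhi"},   # missing value
    {"id": 1, "age": 34, "city": "Pune"},      # duplicate of the first row
    {"id": 3, "age": -5, "city": "Mumbai"},    # invalid value (negative age)
    {"id": 4, "age": 51, "city": "Surat"},
]

def clean(rows):
    """Drop rows with missing values, invalid ages, or exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # missing value
        if not 0 <= row["age"] <= 120:
            continue  # invalid value (assumed valid range: 0 to 120)
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # duplicate row
        seen.add(key)
        cleaned.append(row)
    return cleaned

cleaned_rows = clean(raw_rows)
print([row["id"] for row in cleaned_rows])
```

Dropping rows is the simplest policy; filling missing values with a default or an average is a common alternative when data is scarce.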

4. Data Analysis
 Now the cleaned and prepared data is passed on to the analysis step. This
step involves:
 Selection of analytical techniques
 Building models
 Review the result
 The aim of this step is to build a machine learning model to
analyze the data using various analytical techniques and review
the outcome.
 It starts with the determination of the type of the problems, where we
select the machine learning techniques such
as Classification, Regression, Cluster analysis, Association, etc.
then build the model using prepared data, and evaluate the model.
 Hence, in this step, we take the data and use machine learning
algorithms to build the model.

5. Train Model
 The next step is to train the model. In this step we train our
model to improve its performance for a better outcome on
the problem.
 We use datasets to train the model using various machine
learning algorithms.
 Training a model is required so that it can learn the
various patterns, rules, and features.

6. Test Model
 Once our machine learning model has been trained on a given
dataset, then we test the model.
 In this step, we check for the accuracy of our model by
providing a test dataset to it.
 Testing the model determines the percentage accuracy of the
model as per the requirement of project or problem.

7. Deployment
 The last step of machine learning life cycle is deployment, where
we deploy the model in the real-world system.
 If the above-prepared model is producing an accurate result as per
our requirement with acceptable speed, then we deploy the model in
the real system.
 But before deploying the project, we will check whether it
is improving its performance using available data or not.
 The deployment phase is similar to making the final report for a
project.

Basic Data types in ML
 Data can broadly be divided into following two types.
 Qualitative data
 Quantitative data
 Qualitative data
 It provides information about the quality of an object, or
information which cannot be measured.
 For example, if we consider the quality of performance of students in
terms of ‘Good’, ‘Average’, and ‘Poor’, it falls under the category
of qualitative data.
 Qualitative data is also called categorical data.
 Qualitative data is divided into 2 parts,
 Nominal data
 Ordinal data

 Nominal data
 Nominal data has no numeric value, only a named value.
 It is used for assigning named values to attributes.
 Nominal values cannot be quantified.
 For example,
 Blood group: A, B, O, AB, etc.
 Nationality: Indian, American, British, etc.
 Gender: Male, Female, Other
 We cannot perform mathematical operations such as mean or variance
on nominal data; only the mode (the most frequent value) can be identified.

Ordinal data
 Ordinal data, in addition to possessing the properties of nominal data,
can also be naturally ordered.
 This means that ordinal data also assigns named values to
attributes but, unlike nominal data, the values can be arranged in a
sequence of increasing or decreasing value, so that we can say
whether one value is better than or greater than another.
 For example,
 Customer satisfaction: ‘Very Happy’, ‘Happy’, ‘Unhappy’, etc.
 Grades: A, B, C, etc.
 The median and quartiles can be identified, but the mean still cannot
be calculated.

Quantitative Data
 It relates to information about the quantity of an object. Hence,
it can be measured.
 Continuous quantitative data represents measurements, so
its values cannot be counted but they can be measured.
 For example, if we consider the attribute ‘marks’, it can be
measured.
 Quantitative data is also termed as numeric data.
 There are two types of Quantitative Data,
 Interval Data
 Ratio Data

 Interval Data
 Interval data is numeric data for which not only the order is known,
but also the exact difference b/w values.
 For example, Celsius temperature.
 The difference b/w 12°C and 18°C is 6°C, the same as that b/w
15.5°C and 21.5°C.
 Other examples include date, time, etc.
 For interval data, we can compute measures such as the
mean, median, mode, variance, SD, etc.
 Interval data does not have a ‘true zero’ value.
 For example, 0°C does not mean ‘no temperature’, so ratios of interval
values (such as ‘twice as hot’) are not meaningful.

 Ratio Data
 It represents numeric data for which the exact value can be
measured.
 An absolute zero is available for ratio data.
 On these variables we can perform mathematical operations, including ratios.
 For example, height, weight, age, salary, etc.

S.N. | Character            | Quantitative Data | Qualitative Data
1.   | Definition           | These are data that deal with quantities, values, or numbers. | These data, on the other hand, deal with quality.
2.   | Measurability        | Measurable. | They are generally not measurable.
3.   | Nature of Data       | Expressed in numerical form. | They are descriptive rather than numerical in nature.
4.   | Research Methodology | Conclusive | Investigative
5.   | Quantities measured  | Measures quantities such as length, size, amount, price, and even duration. | Narratives often make use of adjectives and other descriptive words to refer to data on appearance, color, texture, and other qualities.
6.   | Data Structure       | Structured | Unstructured

Data structures for machine learning
Auto MPG data set

DATA REMEDIATION
 Data remediation is a part of data quality.
 Data remediation is an activity that’s focused on cleansing,
organizing and migrating data so it’s fit for purpose or use.
 The process typically involves detecting and correcting (or
removing) corrupt or inaccurate records by replacing, modifying
or deleting the “dirty” data.
 It can be performed manually, with cleansing tools, as a batch
process (script), through data migration or a combination
of these methods.

1. Handling outliers
 An outlier is a piece of data that is an abnormal distance from other
points.
 In other words, it’s data that lies outside the other values in the
set.
 Outliers can have many causes, such as:
 Measurement or input error.
 Data corruption.

 Common ways of handling outliers include:
 Remove outliers: if the number of records which are outliers is
not large, a simple approach may be to remove them.
 Imputation: another way is to impute (assign) the value with the
mean, median, or mode. The value of the most similar data
element may also be used for imputation.
 Capping: for values that lie outside the 1.5 × IQR
(interquartile range) limits, we can cap them by replacing those
observations below the lower limit with the value of the 5th percentile and
those that lie above the upper limit with the value of the 95th percentile.
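
The capping rule above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name cap_outliers, its parameters, and the sample values are hypothetical, and the fences are taken as Q1 - 1.5 × IQR and Q3 + 1.5 × IQR.

```python
import numpy as np

def cap_outliers(values, lower_pct=5, upper_pct=95, k=1.5):
    """Cap values outside the k * IQR fences at the given percentiles."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower_fence = q1 - k * iqr          # values below this are outliers
    upper_fence = q3 + k * iqr          # values above this are outliers
    low_value, high_value = np.percentile(x, [lower_pct, upper_pct])
    capped = x.copy()
    capped[x < lower_fence] = low_value   # replace low outliers
    capped[x > upper_fence] = high_value  # replace high outliers
    return capped

# Hypothetical sample: the last value is an obvious outlier.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 95]
capped = cap_outliers(data)
print(capped)
```

Note that the 95th percentile is itself computed from data that still contains the outlier, so on very small samples the capped value can remain fairly large.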

Dimensionality Reduction
 In machine learning classification problems, there are often too many
factors on the basis of which the final classification is done.
 These factors are basically variables called features.
 The higher the number of features, the harder it gets to
visualize the training set and then work on it.
 Sometimes, most of these features are correlated, and hence
redundant (unnecessary).
 This is where dimensionality reduction algorithms come into play.
 Dimensionality reduction refers to reducing the number of input variables
for a dataset, by obtaining a set of principal variables.
 It can be divided into feature selection and feature extraction.

 If your data is represented using rows and columns, such as in a spreadsheet,
then the input variables are the columns that are fed as input to a
model to predict the target variable. Input variables are also called
features.
 We can consider the columns of data as representing dimensions of an
n-dimensional feature space and the rows of data as points in that
space. This is a useful geometric interpretation of a dataset.
 It is often desirable to reduce the number of input features.This reduces the
number of dimensions of the feature space, hence the name “dimensionality
reduction.”
 An intuitive example of dimensionality reduction can be discussed
through a simple e-mail classification problem, where we need to classify
whether the e-mail is spam or not.
 This can involve a large number of features, such as whether or not the e-mail
has a generic title, the content of the e-mail, whether the e-mail uses a template,
etc.

 The most common approach to dimensionality reduction is
called principal component analysis, or PCA.
 It makes a large data set simpler and easier to explore and
visualize.
 Principal Component Analysis (PCA) is one of the most popular
linear dimension reduction techniques. Sometimes it is used alone and
sometimes as a starting solution for other dimension reduction
methods.
 PCA is a projection-based method which transforms the
data by projecting it onto a set of orthogonal (at right angles to each other)
axes.
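
As a rough illustration of PCA as a projection onto orthogonal axes, the sketch below uses scikit-learn's PCA class on made-up data. The toy data set (three features, one of which is nearly a copy of another) is an assumption chosen only to show a redundant feature being removed.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Toy data: 100 samples, 3 features; the third is almost a copy of the first.
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = x1 + 0.01 * rng.normal(size=100)
X = np.column_stack([x1, x2, x3])

# Project the 3-dimensional data onto 2 principal axes.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (100, 2)
print(pca.explained_variance_ratio_)   # share of variance kept by each axis

# The principal axes are orthogonal unit vectors.
gram = pca.components_ @ pca.components_.T
print(np.round(gram, 6))
```

Because the third feature is redundant, two components keep almost all of the variance here; on real data the number of components to keep is a judgment call.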

Feature subset selection
 Feature subset selection, or simply feature selection, for both
supervised and unsupervised learning, tries to find the
optimal subset of the entire feature set which significantly reduces
computational cost without any major impact on the learning
accuracy.
 Feature subset selection methods can be classified into three
broad categories:
 Filter Methods
 Wrapper Methods
 Embedded Methods

Filter Methods
 Filter methods select subsets of variables as a pre-processing step,
independently of the classifier used.
 Note that variable ranking (feature ranking) is a filter method.
 Filter methods are usually fast.
 Filter methods provide a generic selection of features, not tuned to a
given learner (universal).
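
A minimal sketch of variable ranking as a filter method: each feature is scored on its own, without training any classifier. The scoring rule (absolute Pearson correlation with the target), the function name, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def rank_features_by_correlation(X, y):
    """Return feature indices sorted by |Pearson correlation| with y, best first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    )
    order = np.argsort(scores)[::-1]   # highest score first
    return order, scores

rng = np.random.default_rng(0)
n = 200
informative = rng.normal(size=n)
noise_a = rng.normal(size=n)
noise_b = rng.normal(size=n)

# Column 1 is the only feature related to the target.
X = np.column_stack([noise_a, informative, noise_b])
y = (informative > 0).astype(int)

order, scores = rank_features_by_correlation(X, y)
print(order)    # feature indices, most relevant first
print(scores)
```

The ranking never consults a learner, which is why filter methods are fast and generic, but also why they can miss features that are only useful in combination.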

Wrapper Methods
 In wrapper methods, the learner is considered a black box.
 The interface of the black box is used to score subsets of variables
according to the predictive power of the learner when using those
subsets.
 Results vary for different learners.
 One needs to define:
– how to search the space of all possible variable subsets?
– how to assess the prediction performance of a learner?
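
One possible answer to the two questions above is greedy forward selection (the search strategy) scored by cross-validated accuracy (the assessment). The sketch below is an assumption-laden illustration: the helper forward_select is hypothetical, and logistic regression on synthetic data stands in for the black-box learner.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_select(learner, X, y, n_features, cv=5):
    """Greedily add the feature that most improves cross-validated accuracy."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_features:
        scored = []
        for j in remaining:
            cols = selected + [j]
            # The learner is treated as a black box: we only look at its score.
            score = cross_val_score(learner, X[:, cols], y, cv=cv).mean()
            scored.append((score, j))
        best_score, best_j = max(scored)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Synthetic data: 6 features, of which only 2 are informative.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=2, n_redundant=0, random_state=0
)
chosen = forward_select(LogisticRegression(max_iter=1000), X, y, n_features=2)
print(chosen)
```

Swapping in a different learner may select different features, which is the sense in which wrapper results vary between learners.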

Embedded Methods
 Embedded methods are specific to a given learning machine.
 They perform variable selection (implicitly) in the process of training.
 E.g. the WINNOW algorithm (a linear unit with multiplicative
updates).

SOME IMPORTANT POINTS
 The input variables can be denoted collectively by X, with individual input
variables represented as X1, X2, X3, ..., Xn, and the output variable by the symbol Y.
 The relationship b/w X and Y is represented in the general form
 Y = f(X) + e
 where f is the target function and ‘e’ is a random error term.
 A cost function (error function) tells how badly the model is
performing. A loss function is defined on a single data point, while the
cost function is for the entire training data set.
 An objective function takes in data and a model (along with its parameters) as
input and returns a value. The target is to find values of the model parameters
that maximize or minimize the return value.
 There is no one model that works best for every machine learning problem,
and that is what the ‘No Free Lunch’ theorem also states.
 Supervised learning is used for solving predictive problems and
unsupervised learning for solving descriptive problems.
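
The difference between a loss function (one data point) and a cost function (the whole training set) can be made concrete with a small sketch. Squared error is chosen here purely as an example; the function names and numbers are hypothetical.

```python
import numpy as np

def squared_loss(y_true, y_pred):
    """Loss: error measured separately on each data point."""
    return (y_true - y_pred) ** 2

def mse_cost(y_true, y_pred):
    """Cost: the per-point losses averaged over the whole data set."""
    return np.mean(squared_loss(y_true, y_pred))

y_true = np.array([3.0, 5.0, 7.0])   # actual values of Y
y_pred = np.array([2.0, 5.0, 9.0])   # model estimates of f(X)

losses = squared_loss(y_true, y_pred)
cost = mse_cost(y_true, y_pred)
print(losses)   # [1. 0. 4.]
print(cost)     # 5/3, about 1.667
```

Training a model then amounts to searching for the parameter values that minimize this cost, which is the objective function in this example.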

Predictive Models
 Models for supervised learning, or predictive models, as is
understandable from the name itself, try to predict a certain value
using the values in an input data set.
 The learning model attempts to establish a relation b/w the target
feature, i.e. the feature being predicted, and the predictor
features.
 The predictive models have a clear focus on what they want to
learn and how they want to learn.
 Predictive models may, in turn, need to predict the
category or class to which a data instance belongs.

 Below are examples of such predictive problems:
 Predicting win/loss in a cricket match.
 Predicting whether a transaction is fraudulent.
 Predicting whether a customer may move to another
product.
 The models which are used for prediction of target features
of categorical value are known as classification models.
 The target feature is known as a class, and the categories into
which classes are divided are called levels.
 Some of the popular classification models include k-Nearest
Neighbor (kNN), Naïve Bayes, and Decision Tree.

 Predictive models may also be used to predict numerical
values of the target feature based on the predictor
features. Some examples,
 Prediction of income growth in the succeeding year.
 Prediction of rainfall amount in the coming monsoon.
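 The models which are used for prediction of the numerical
value of the target feature of a data instance are known as
regression models. Examples include:
 Linear Regression
 Logistic Regression (despite its name, it predicts class probabilities
and is mainly used for classification)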

Descriptive Models
 Models for unsupervised learning or descriptive models are
used to describe a data set or gain insight from a data set.
 There is no target feature or single feature of interest in case of
unsupervised learning.
 Based on the value of all features, interesting patterns or
insight are derived about the data set.
 Descriptive models which group together similar data
instances, i.e. data instances having similar values of the different
features, are called clustering models.

 Examples of clustering include:
 Customer grouping or segmentation based on social,
demographic, national, etc. factors.
 Grouping of music based on different aspects like type,
language, time period, etc.
 Grouping of commodities in an inventory.
 The most popular model for clustering is k-Means.
 Descriptive models related to pattern discovery are used
for market basket analysis of transactional data.
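
A minimal clustering sketch with scikit-learn's KMeans, assuming a tiny hand-made data set of two well-separated groups. There is no target feature: the model only receives the feature values and returns a group label for each instance.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: two customers groups described by two features each.
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # first group
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],   # second group
])

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)

print(labels)               # cluster index assigned to each row
print(km.cluster_centers_)  # one centre per discovered group
```

The numeric cluster indices are arbitrary; only the grouping of rows is meaningful.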

Training A Model (For supervised Learning)
Holdout Method
 The hold-out method splits the data into training data and test
data.
 Typical ratios used for splitting the data set include 60:40, 80:20 etc.
 Then we build a classifier using the train data and test it using the test
data.
 The hold-out method is usually used when we have thousands of
instances, including several hundred instances from each class.
 This method is only used when we only have one model to evaluate.
[Figure: the data is split into a training set, used to build the classifier, and a test set, used to evaluate it.]

 Once evaluation is complete, all the data can be used to build
the final classifier.
 Generally, the larger the training data the better the
classifier (but with diminishing returns).
 The larger the test data the more accurate the error
estimate.
 When several models or parameter settings are compared, part of the
training data is held out as a validation set. The accuracy obtained on the
validation set is not considered final; another hold-out dataset, the test
dataset, is used to evaluate the final selected model, and the error found
there is considered the generalization error.
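
The hold-out procedure with a separate validation set might look like the sketch below. The 60:20:20 split, the Iris data set, and the use of a decision tree's max_depth as the setting being tuned are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 60% training, then the remaining 40% split evenly into validation and test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0
)

# Tune on the validation set only; the test set is not touched here.
best_depth, best_acc = None, -1.0
for depth in (1, 2, 3, 4):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    acc = clf.score(X_val, y_val)
    if acc > best_acc:
        best_depth, best_acc = depth, acc

# Evaluate the selected model once on the test set.
final = DecisionTreeClassifier(max_depth=best_depth, random_state=0)
final.fit(X_train, y_train)
test_acc = final.score(X_test, y_test)
print(best_depth, best_acc, test_acc)
```

The test accuracy is the estimate to report, since the validation accuracy was already used to pick the depth.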

Classification: Train, Validation, Test Split
[Figure: the data with known results is split into a training set, a validation set, and a final test set. The classifier builder uses the training set, its predictions are evaluated on the validation set, and the final model is evaluated once on the final test set.]
The test data can’t be used for parameter tuning!

What is Cross Validation?
 Cross-validation is a very useful technique for assessing the
performance of machine learning models.
 It helps in knowing how the machine learning model would
generalize to an independent data set.
 You want to use this technique to estimate how accurate your
model's predictions will be in practice.
 When you are given a machine learning problem, you will be given two
types of data sets: known data (training data set) and unknown data
(test data set).
 By using cross-validation, you would be “testing” your machine learning
model in the “training” phase to check for overfitting and to get an idea
of how your machine learning model will generalize to independent
data (the data set which was not used for training the machine learning
model), which is the test data set given in the problem.

K-fold Cross-validation method
 Usually, we split the data set into training and testing sets and use the
training set to train the model and testing set to test the model.
 We then evaluate the model performance based on an error metric to
determine the accuracy of the model.
 This method however, is not very reliable as the accuracy obtained for
one test set can be very different to the accuracy obtained for a
different test set.
 K-fold cross-validation (CV) provides a solution to this problem by
dividing the data into folds and ensuring that each fold is
used as a testing set at some point.

 K-fold CV is where a given data set is split into K
sections/folds, and each fold is used as a testing set at some
point.
 This is one of the best approaches if we have limited input
data.
 Let's take the scenario of 5-fold cross-validation (K=5).
 Here, the data set is split into 5 folds. In the first iteration, the first
fold is used to test the model and the rest are used to train the
model.
 In the second iteration, the 2nd fold is used as the testing set while the
rest serve as the training set.
 This process is repeated until each of the 5 folds has been
used as the testing set.
 Then take the average of your recorded scores. That will be the
performance metric for the model.

Steps to perform K-fold validation
 Split the entire data randomly into K folds (the value of K shouldn’t be
too small or too high; typically we choose 5 to 10, depending on the data size).
 A higher value of K leads to a less biased estimate of model performance,
whereas a lower value of K is similar to the train-test split approach we saw
before.
 Then fit the model using K - 1 (K minus 1) folds and validate
the model using the remaining fold. Note down the
scores/errors.
 Repeat this process until every fold has served as the test set.
 Then take the average of your recorded scores. That will be the
performance metric for the model.

>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> X = ["a", "b", "c", "d"]
>>> kf = KFold(n_splits=2)
>>> for train, test in kf.split(X):
...     print("%s %s" % (train, test))
[2 3] [0 1]
[0 1] [2 3]
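
The index splits above only show how the folds are formed. A sketch of the full procedure (fit on K - 1 folds, score on the remaining fold, then average) is given below; the Iris data set and the kNN classifier are assumptions used only to have something to score.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5 folds; shuffling matters here because the Iris rows are sorted by class.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# One accuracy score per fold: train on 4 folds, test on the remaining one.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=kf)
mean_score = scores.mean()

print(scores)      # the 5 recorded scores
print(mean_score)  # their average, used as the performance metric
```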

Approaches in k-fold cross validation
 Two approaches used..
10-fold cross-validation (10-fold CV)
Leave-one-out cross validation (LOOCV)

10-fold cross-validation (10-fold CV)
 With this method we have one data set which we divide
randomly into 10 parts.
 We use 9 of those parts for training and reserve one tenth
for testing.
 We repeat this procedure 10 times each time reserving a
different tenth for testing.
 Calculate the average of all the k test errors and display
the result.

Leave One Out Cross Validation (LOOCV)
 We can use LOOCV when data is limited and we want the
best possible error estimate for new data.
 Leave-one-out (LOO) is a simple form of cross-validation. Each learning
set is created by taking all the samples except one, the test
set being the sample left out.
 Thus, for n samples, we have n different training sets and n different
test sets. This cross-validation procedure does not waste much data, as
only one sample is removed from the training set.
 This approach leaves 1 data point out of the training data, i.e. if there
are n data points in the original sample, then n-1 samples
are used to train the model and the remaining 1 point is used as the
validation set.

 This is repeated for all combinations in which the original sample
can be separated this way, and then the error is averaged for
all trials, to give overall effectiveness.
 The number of possible combinations is equal to the number of
data points in the original sample or n.

>>> from sklearn.model_selection import LeaveOneOut
>>> X = [1, 2, 3, 4]
>>> loo = LeaveOneOut()
>>> for train, test in loo.split(X):
...     print("%s %s" % (train, test))
[1 2 3] [0]
[0 2 3] [1]
[0 1 3] [2]
[0 1 2] [3]

Bootstrap Sampling
 Bootstrap sampling or simply bootstrapping is a popular way
to identify training and test data from input data set.
 It uses the technique of Simple Random Sampling with
Replacement (SRSWR) , which is a well known technique in
sampling theory for drawing random samples.
 We have seen earlier that k-fold cross-validation divides the
data into separate partitions, say 10 partitions in the case of 10-fold
cross-validation. It then uses the data instances from one partition as test
data and the remaining partitions as training data.

 Bootstrapping randomly picks data instances from the input
data set, with the possibility of the same data instance being
picked multiple times.
 This means that from an input data set having ‘n’ data instances,
bootstrapping can create one or more training data sets having ‘n’
data instances, with some of the data instances repeated multiple
times.
 This technique is particularly useful in the case of input data
sets of small size, i.e. having very few data instances.

Example of Bootstrap sampling
 Let’s say we want to find the mean height of all the students in a
school (which has a total population of 1,000). So, how can we
perform this task?
 One approach is to measure the height of all the students and then
compute the mean height.

 Instead of measuring the heights of all the students, we can draw a
random sample of 5 students and measure their heights.
 We would repeat this process 20 times and then average the
collected height data of 100 students (5 x 20).
 This average height would be an estimate of the mean height of all the
students of the school.
 This is the basic idea of Bootstrap Sampling.

Code for bootstrap sampling
# scikit-learn bootstrap
from sklearn.utils import resample
# data sample
data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
# prepare bootstrap sample
boot = resample(data, replace=True, n_samples=4, random_state=1)
print('Bootstrap Sample: %s' % boot)
# out of bag observations
oob = [x for x in data if x not in boot]
print('OOB Sample: %s' % oob)
Output:
Bootstrap Sample: [0.6, 0.4, 0.5, 0.1]
OOB Sample: [0.2, 0.3]

Cross-Validation | Bootstrapping
It is a special variant of the holdout method, called repeated holdout. Hence it uses a stratified random sampling approach (without replacement). | It uses the technique of simple random sampling with replacement (SRSWR), so the same data instance may be picked up multiple times in a sample.
The number of possible training/test data samples that can be drawn using this technique is finite. | Since elements can be repeated in the sample, the possible number of training/test data samples is unlimited.

Lazy vs. Eager Learning
 Lazy learning: simply stores the training data (or does only minor
processing) and waits until it is given a test tuple.
 It just stores the data set without learning from it.
 It starts classifying data when it receives test data.
 So it takes less time learning and more time classifying data.
 e.g. K-Nearest Neighbour, Case-Based Reasoning

 Eager learning: given a training set, constructs a
classification model before receiving new (e.g., test) data to classify.
 When it receives the data set, it starts learning (building the model).
 It does not wait for test data in order to learn.
 It is fast at classification because the model has already been built.
 So it takes a long time learning and less time classifying data.
 e.g. Decision Tree, Naive Bayes, Artificial Neural Networks
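
To make the idea of lazy learning concrete, here is a toy 1-nearest-neighbour classifier. The class name LazyOneNN and the sample points are hypothetical; the point of the sketch is that fit only stores the data, and all the real work happens in predict.

```python
import numpy as np

class LazyOneNN:
    """Toy lazy learner: 1-nearest-neighbour with Euclidean distance."""

    def fit(self, X, y):
        # "Training" is just storing the data; nothing is learned yet.
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Distance from every query point to every stored training point.
        d = np.linalg.norm(X[:, None, :] - self.X_[None, :, :], axis=2)
        # Label of the closest stored point for each query.
        return self.y_[d.argmin(axis=1)]

model = LazyOneNN().fit([[0, 0], [1, 1], [5, 5]], ["a", "a", "b"])
predictions = model.predict([[0.2, 0.1], [4.0, 4.5]])
print(predictions)   # ['a' 'b']
```

An eager learner such as a decision tree would do the opposite: spend time in fit building a model, then answer each predict call cheaply.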

Model Representation and
Interpretability

 A model is said to be a good machine learning model if it
generalizes properly to any new input data from the problem domain.
 This helps us to make predictions on future data that the
model has never seen.
 Now, suppose we want to check how well our machine learning model
learns and generalizes to new data.
 For that we look at overfitting and underfitting, which are largely
responsible for the poor performance of machine learning
algorithms.
 Bias – Assumptions made by a model to make a function easier to learn.
 Variance – If you train your model on training data and obtain a very low
error, but on changing the data and training the same model again
you experience a high error, this is variance.

Underfitting
 A statistical model or a machine learning algorithm is said to have underfitting
when it cannot capture the underlying trend of the data. (It’s just like
trying to fit into undersized clothes!)
 The input features are not explanatory enough to describe the target well.
 Underfitting destroys the accuracy of our machine learning model.
 Its occurrence simply means that our model or the algorithm does not fit
the data well enough.
 It usually happens when we have too little data to build an accurate model, or
when we try to build a linear model with non-linear data.
 In such cases the rules of the machine learning model are too simple to
capture the structure of the data, and therefore the model will probably make a
lot of wrong predictions.
 Underfitting can be reduced by using more data and by giving the model
more informative features.

 Underfitting – High bias and low variance.
 Techniques to reduce underfitting :
1. Increase model complexity.
2. Increase number of features, performing feature engineering
3. Remove noise from the data.
4. Increase the number of epochs or increase the duration of
training to get better results.

Overfitting
 A statistical model is said to be overfitted when it fits the training data
too closely (just like fitting ourselves into oversized clothes!).
 When a model is more complex than the data warrants, or is trained for too
long, it starts learning from the noise and inaccurate data entries in our data set.
 Then the model does not categorize new data correctly, because it has
captured too many details and too much noise.
 Overfitting is more likely with non-parametric and non-linear
methods, because these types of machine learning algorithms have more
freedom in building the model based on the dataset and can therefore
build unrealistic models.
 A solution to avoid overfitting is to use a linear algorithm if we
have linear data, or to limit parameters such as the maximal depth
if we are using decision trees.

 Overfitting – High variance and low bias.
 Techniques to reduce overfitting :
1. Increase training data.
2. Reduce model complexity.
3. Early stopping during the training phase.
4. Use dropout for neural networks to tackle overfitting.
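
Underfitting and overfitting can be seen side by side by fitting polynomials of increasing degree to the same noisy data. Everything in this sketch is an assumption made for illustration: the sine-shaped true function, the noise level, and the degrees 1, 3, and 10.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    """Hypothetical underlying trend the model should recover."""
    return np.sin(3 * x)

# Small noisy training set and a separate noisy test set.
x_train = np.linspace(-1, 1, 15)
x_test = np.linspace(-0.95, 0.95, 50)
y_train = true_f(x_train) + 0.2 * rng.normal(size=x_train.size)
y_test = true_f(x_test) + 0.2 * rng.normal(size=x_test.size)

def errors(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

# Degree 1 is too simple, degree 10 is very flexible for 15 points.
results = {d: errors(d) for d in (1, 3, 10)}
for d, (tr, te) in results.items():
    print(d, round(tr, 4), round(te, 4))
```

The training error can only go down as the degree grows. A straight line typically leaves both errors high (underfitting), while a very flexible polynomial typically drives the training error well below the test error (overfitting).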

Bias-Variance Tradeoff
 Whenever we discuss model prediction, it’s important to understand
prediction errors (bias and variance).
 There is a tradeoff between a model’s ability to minimize bias and
variance.
 If our model is too simple and has very few parameters then it
may have high bias and low variance (Underfitting).
 On the other hand if our model has large number of parameters
then it’s going to have high variance and low bias (overfitting).
 So we need to find the right/good balance without overfitting
and underfitting the data.
 This tradeoff in complexity is why there is a tradeoff between bias and
variance.
 An algorithm can’t be more complex and less complex at the
same time.

 Error due to Bias: The error due to bias is taken as the
difference between the expected (or average) prediction of our
model and the correct value which we are trying to predict.
 Underfitting results in high bias.
 Error due to variance: The error due to variance is taken as
the variability of a model prediction for a given data point.
 Overfitting results in high variance.

 High Bias, Low Variance: Models are consistent but inaccurate on
average.
 High Bias, High Variance: Models are inaccurate and also
inconsistent on average.
 Low Bias, High Variance: Models are accurate on average but
inconsistent.
 Low Bias, Low Variance: Models are accurate and consistent
on average. We strive for this in our model.
 Ideally, the best solution is to have a model with low bias as well
as low variance. However, that may not be possible in reality.
 Hence, the goal of supervised ML is to achieve a balance b/w bias
and variance.
 For example, in the popular supervised learning algorithm k-Nearest
Neighbors (kNN), the user-configurable parameter ‘k’ can be
used to trade off b/w bias and variance.
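
The effect of ‘k’ in kNN can be explored with a short experiment. The synthetic data set, the 70:30 split, and the three values of k are assumptions; the sketch only records training and test accuracy for each setting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data with 10% of the labels flipped as noise.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0,
    flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

scores = {}
for k in (1, 15, 101):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # (training accuracy, test accuracy) for this value of k
    scores[k] = (knn.score(X_train, y_train), knn.score(X_test, y_test))
    print(k, scores[k])
```

With k = 1 every training point is its own nearest neighbour, so the training accuracy is perfect even though some labels are noise (low bias, high variance). Larger k averages over more neighbours, which smooths the decision boundary and moves the model towards higher bias and lower variance.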

Evaluating Performance of
a model

Supervised learning - classification
 The responsibility of the classification model is to assign a class
label to the target feature based on the values of the
predictor features.
 For example, in the problem of predicting the win/loss in a cricket
match, the classifier will assign a class value win/loss to the target
feature based on the values of other features like whether the team
won the toss, the number of spinners in the team, the number of wins in the
tournament, etc.
 To evaluate the performance of the model, the number of
correct classifications or predictions made by the model
has to be recorded.
 A classification is said to be correct if, for example in the given
problem, the model has predicted that the team
will win and it has actually won.

 Based on the number of correct and incorrect classifications
or predictions made by a model, the accuracy of the model
is calculated.
 There are 4 possibilities with regards to the cricket match win/loss
prediction:
 The model predicted win and the team won (TP = True Positive)
 The model predicted win and the team lost (FP = False Positive)
 The model predicted loss and the team won (FN = False Negative)
 The model predicted loss and the team lost (TN = True Negative)
 True positives (TP): Predicted positive and are actually positive.
 False positives (FP): Predicted positive and are actually negative.
 False negatives (FN): Predicted negative and are actually positive.
 True negatives (TN): Predicted negative and are actually negative.

Accuracy
 For any classification model, model accuracy is given by total
number of correct classifications (either as the class of interest, i.e.
True Positive or as not the class of interest, i.e.True Negative)
divided by total number of classification done.
$$\text{Model accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$

Confusion Matrix
 A matrix containing correct and incorrect predictions in
the form of TPs, FPs, FNs, and TNs is known as a confusion
matrix.
 The win/loss prediction of a cricket match has two classes of interest:
win and loss.
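
The four counts and the accuracy can be computed directly from lists of actual and predicted outcomes. The match results below are hypothetical, and ‘win’ is taken as the class of interest (the positive class).

```python
def confusion_counts(actual, predicted, positive="win"):
    """Count TP, FP, FN, TN, treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted win, team won
        elif p == positive:
            fp += 1   # predicted win, team lost
        elif a == positive:
            fn += 1   # predicted loss, team won
        else:
            tn += 1   # predicted loss, team lost
    return tp, fp, fn, tn

# Hypothetical results for 8 matches.
actual    = ["win", "win", "loss", "loss", "win", "loss", "win", "loss"]
predicted = ["win", "loss", "loss", "win", "win", "loss", "win", "loss"]

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(tp, fp, fn, tn)   # 3 1 1 3
print(accuracy)         # 0.75
```

Arranged as a 2 × 2 table (actual class by predicted class), these four counts form the confusion matrix for the win/loss problem.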

 https://towardsdatascience.com/various-ways-to-evaluate-a-machine-learning-models-performance-230449055f15
