An Introduction to Popular Tools,
Machine Learning, and Visualization
Replay
Introduction to Data Science Tools (R, SQL)
• SQL commands and basic hands-on with the command-line interface
• R installation and hands-on
Session 6: Machine Learning
Agenda:
What is Machine Learning
Categories of Machine Learning
Common Algorithms
  • Linear Regression
  • Naïve Bayes
  • SVM
  • Decision Tree
  • KNN
  • Random Forest
  • K-Means Clustering
Machine Learning Process
Real world use case
What is Machine Learning
Machine learning is a sub-field of Artificial Intelligence in which computers provide
predictions based on patterns learned directly from
data without being explicitly programmed to do so.
Categories of Machine Learning
What is an Algorithm?
Algorithms in machine learning are mathematical procedures and techniques that allow computers to learn from data, identify patterns, make predictions, or perform tasks without explicit programming.
Common Algorithms – Linear Regression
Linear regression models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name. It estimates how the value of the dependent variable changes as the value of each independent variable changes.
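As a minimal sketch of the idea, the following fits a straight line to a handful of made-up points using ordinary least squares for a single independent variable. The function names `fit_line` and `predict_y` and the data are illustrative assumptions, not part of the session material.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_y(x, slope, intercept):
    """Predict the dependent variable for a new x."""
    return slope * x + intercept


# Made-up data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, predict_y(5.0, slope, intercept))
```

The slope is the change in y for a one-unit change in x, which is the "how the dependent variable changes" part of the description above.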
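Common Algorithms – Naïve Bayes
Naïve Bayes algorithms calculate the probability that an event will occur based on the occurrence of related events. They apply Bayes' theorem under the "naïve" assumption that the features are independent of one another given the class.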
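A small sketch of how such a probability-based classifier could work on categorical data. The weather-style dataset, the function names `train_nb` and `predict_nb`, and the add-one smoothing (which here assumes each feature has two possible values) are illustrative assumptions.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> count
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
    return class_counts, feature_counts


def predict_nb(row, class_counts, feature_counts):
    """Pick the class with the largest P(class) * product of P(value | class)."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Add-one smoothing; the "+ 2" assumes two possible values per feature
            score *= (feature_counts[label][i][value] + 1) / (count + 2)
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up data: (outlook, windy) -> activity
rows = [("sunny", "no"), ("sunny", "no"), ("sunny", "yes"),
        ("rainy", "yes"), ("rainy", "yes"), ("rainy", "no")]
labels = ["play", "play", "play", "stay", "stay", "stay"]

class_counts, feature_counts = train_nb(rows, labels)
print(predict_nb(("sunny", "no"), class_counts, feature_counts))
```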
Common Algorithms – SVM
• The goal of the SVM algorithm is to find the best decision boundary that separates n-dimensional space into classes, so that new data points can be placed in the correct category. This best decision boundary is called a hyperplane.
• SVM selects the extreme points (vectors) closest to the boundary to define the hyperplane. These points are called support vectors, which gives the Support Vector Machine its name.
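The sketch below illustrates the vocabulary only: it does not train an SVM. It assumes a hand-picked hyperplane x + y - 5 = 0 for a made-up two-class dataset, then shows how a point is classified by which side of the hyperplane it falls on and how the support vectors are the points nearest to it. All names and values are hypothetical.

```python
import math


def signed_distance(point, w, b):
    """Signed distance from a point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm


def classify(point, w, b):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if signed_distance(point, w, b) >= 0 else -1


# Hypothetical, hand-picked hyperplane (not learned from the data)
w, b = (1.0, 1.0), -5.0
points = [(1, 1), (2, 2), (0, 2), (3, 3), (4, 4), (5, 3)]

distances = {p: signed_distance(p, w, b) for p in points}
# The margin is the distance from the hyperplane to the closest points
margin = min(abs(d) for d in distances.values())
support_vectors = [p for p, d in distances.items()
                   if math.isclose(abs(d), margin)]

print(support_vectors, classify((4, 2), w, b))
```

A real SVM would search for the `w` and `b` that make this margin as large as possible; here they are fixed by hand to keep the example short.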
Common Algorithms – Decision Tree
• It is a tree-structured classifier in which internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents an outcome.
• The decisions, or tests, are made on the basis of the features of the given dataset.
• It is a graphical representation of all the possible solutions to a problem or decision under the given conditions.
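To make the node, branch, and leaf structure concrete, here is a sketch of a hand-built tree and the loop that walks it. The tree is hypothetical and written by hand rather than learned from data; the feature names and outcomes are made up.

```python
# Hypothetical hand-built tree: internal nodes test a feature,
# branches are the possible answers, leaves hold the outcome.
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {
            "feature": "humidity",
            "branches": {"high": "stay in", "normal": "play"},
        },
        "overcast": "play",
        "rainy": {
            "feature": "windy",
            "branches": {"yes": "stay in", "no": "play"},
        },
    },
}


def decide(node, sample):
    """Walk from the root to a leaf, following the branch that matches the sample."""
    while isinstance(node, dict):
        value = sample[node["feature"]]
        node = node["branches"][value]
    return node


sample = {"outlook": "sunny", "humidity": "high", "windy": "no"}
print(decide(tree, sample))
```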
Common Algorithms – KNN
• The K-NN algorithm assumes that similar cases belong to the same category: it compares a new case with the available cases and puts it into the category of the cases it most resembles.
• K-NN stores all the available data and classifies a new data point based on similarity, so new data can be assigned to a well-suited category as soon as it appears.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and does its work at the time of classification.
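A minimal K-NN sketch, assuming Euclidean distance as the similarity measure and a majority vote among the k nearest stored points. The function name `knn_predict` and the data are illustrative.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest stored points."""
    # Nothing is learned up front; all the work happens at prediction time
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up stored cases in two groups
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (6, 5)))
```

Because the whole dataset is scanned for every query, prediction cost grows with the amount of stored data, which is the practical price of being a lazy learner.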
Common Algorithms – Random Forest
• Random Forest is a classifier that builds a number of decision trees on various subsets of the given dataset and combines their outputs (a majority vote for classification, an average for regression) to improve predictive accuracy.
• It is based on the concept of ensemble learning, the process of combining multiple classifiers to solve a complex problem and improve the performance of the model.
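The ensemble idea can be sketched as follows. This is a simplified stand-in, not a full Random Forest: each "tree" is a one-split decision stump trained on a bootstrap sample of the rows, and the forest predicts by majority vote. Random feature selection is omitted, and all names and data are assumptions made for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Find the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def stump_predict(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_forest(rows, labels, n_trees=15, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest


def forest_predict(forest, row):
    """Majority vote across all stumps."""
    votes = Counter(stump_predict(s, row) for s in forest)
    return votes.most_common(1)[0][0]


# Made-up, well-separated data
rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = ["low"] * 4 + ["high"] * 4
forest = fit_forest(rows, labels)
print(forest_predict(forest, (0, 0)), forest_predict(forest, (10, 10)))
```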
Common Algorithms – K-Means Clustering
• K-Means Clustering is an unsupervised learning algorithm that groups an unlabeled dataset into clusters. K is the number of clusters to create, chosen in advance: K=2 gives two clusters, K=3 gives three, and so on.
• It is a centroid-based algorithm in which each cluster is associated with a centroid. Its aim is to minimize the sum of squared distances between the data points and the centroid of the cluster they are assigned to.
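A short sketch of the usual iterative procedure (often called Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The starting centroids are passed in explicitly here to keep the example deterministic; the function name `kmeans` and the data are illustrative assumptions.

```python
import math


def kmeans(points, centroids, n_iters=10):
    """Alternate assignment and centroid-update steps for a fixed number of rounds."""
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Made-up unlabeled data with two visible groups; K = 2
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
final_centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
print(final_centroids)
```

In practice the starting centroids are usually chosen at random, and different starts can lead to different clusterings.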
Machine Learning Process
• Step 1: Collect and prepare the data
• Step 2: Train the model
• Step 3: Validate the model
• Step 4: Interpret the results
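The four steps can be sketched end to end on a toy problem. Everything here is an assumption made for illustration: the data are made up (y is roughly 3x), the model is a least-squares line through the origin, and validation uses a simple 70/30 hold-out split.

```python
import random

# Step 1: collect and prepare the data (made-up pairs where y is about 3x)
noise = [0.1, -0.2, 0.0, 0.2, -0.1, 0.1, 0.0, -0.1, 0.2, -0.2]
data = [(float(x), 3.0 * x + e) for x, e in zip(range(1, 11), noise)]
rng = random.Random(42)
rng.shuffle(data)
train_set, validation_set = data[:7], data[7:]

# Step 2: train the model (least-squares slope for a line through the origin)
slope = sum(x * y for x, y in train_set) / sum(x * x for x, _ in train_set)

# Step 3: validate the model on data it has not seen
mse = sum((y - slope * x) ** 2 for x, y in validation_set) / len(validation_set)

# Step 4: interpret the results
print(f"slope={slope:.2f}, validation MSE={mse:.3f}")
```

Keeping the validation data separate from the training data is what makes the error in Step 3 an honest estimate of how the model behaves on new inputs.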
Real-world use cases by category
Q What are the main differences between supervised and unsupervised learning?
Q How does a linear regression model make predictions?
Q In what scenarios would you prefer using a Decision Tree over a Random Forest?

Introduction to data visualization tools like Tableau and Power BI and Excel
