ML SFCSE.pptx

MACHINE LEARNING FOR DATA SCIENCE INTRODUCTION
By
NIKHIL GR
STUDENT
CSE
SJCIT

Contents
• Introduction to Data Science
• Applications of Data Science
• Foundations of Data Science
• Machine Learning
• Supervised Learning
• Classification
• Logistic Regression
• Decision Tree
• Random Forest
• K-Nearest Neighbor
• Support Vector Machine
• Regression
• Unsupervised Learning
• Cluster Analysis
• Principal Component Analysis

Introduction to Data Science
• Data science is a multidisciplinary field that uses scientific
methods, processes, algorithms and systems to extract knowledge
and insights from structured and unstructured data.
• It blends computer science, mathematics and
business/domain expertise.

Foundations of Data Science
• Statistics: Descriptive, Inferential.
• Linear Algebra: Matrices, Planes, Vectors, etc.
• Computer Science: Algorithms, Graph Theory, Data Structures,
DBMS, etc.
• Machine Learning: Supervised, Unsupervised, Reinforcement.
• Business Analytics: Predictive, Prescriptive, Descriptive,
Decision.
• Programming: R/Python, SQL, NoSQL.

Machine Learning
• Machine learning is a subfield of computer science that focuses on
developing algorithms which learn from examples and improve their
performance on a task.
• Machine learning algorithms use training data, which is a set
of past observations.
• There are three broad categories of machine learning:
 • Supervised Learning: learns from labeled examples.
 • Unsupervised Learning: learns from unlabeled examples.
 • Reinforcement Learning: learns from an environment through feedback.
• It is used to build predictive analytics models, which allow researchers
and data scientists to make predictions about the future based on past and
current data.

Supervised Learning
• It is a category of machine learning algorithms. As the name indicates,
the learning is supervised by the presence of the output in the training data.
• It learns from labeled data: inputs for which the output is known.
• It builds a mathematical model from a set of data that contains both the
inputs and the desired outputs.
• A supervised learning algorithm analyzes the training data and
produces an inferred function, which can be used to map new
examples to outputs.
• Generally, supervised learning problems are divided into
Classification and Regression problems.

Classification
• Classification in machine learning is a supervised learning
problem where the output variable is a category, such as “yes”
or “no”, or “disease” or “no disease”.
• In this problem, the dependent variable is categorical, and its
category is predicted from several independent variables.
• A classification model attempts to draw a conclusion from
observed values.
• Given one or more inputs, a classification model tries to predict
the value of one or more outcomes.
• There are a number of classification models.

Classification through machine learning algorithms
The following popular machine learning algorithms are
used in classification problems:
• Logistic Regression
• Decision Tree
• Random Forest
• K-Nearest Neighbor
• Support Vector Machine

Logistic Regression
• This regression model is used when the dependent variable is
categorical.
• In this case the output is binary: each observation falls into one of two categories.
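As a minimal sketch of the idea, assuming a single input feature and plain batch gradient descent (the toy data, learning rate and function names below are illustrative assumptions, not part of the original slides), a binary logistic regression model could look like this:

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Average gradient of the log loss with respect to w and b
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return class 1 if the estimated probability reaches the threshold, else 0."""
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Hypothetical toy data: hours studied -> passed (1) or failed (0)
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
passed = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
print(predict(w, b, 1.5), predict(w, b, 5.5))
```

The model outputs a probability, and the binary category comes from comparing that probability with a threshold (0.5 here, also an assumption).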

Decision Tree
• A decision tree is a flowchart-like tree structure, where each
internal node denotes a test on an attribute, each branch
represents an outcome of the test, and each leaf node holds a
class label.
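The structure described above can be sketched with nested dictionaries. The tree below is hand-built for illustration (the attributes "outlook", "humidity" and "windy" are hypothetical, and no tree-learning algorithm is shown); it only demonstrates how a sample is routed from the root to a leaf:

```python
# Internal nodes test one attribute; each branch is an outcome of that test;
# leaves (plain strings) hold the class label.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rainy": {
            "attribute": "windy",
            "branches": {True: "no", False: "yes"},
        },
    },
}

def classify(node, sample):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(node, dict):
        outcome = sample[node["attribute"]]
        node = node["branches"][outcome]
    return node

sample = {"outlook": "sunny", "humidity": "high", "windy": False}
print(classify(tree, sample))  # no
```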

Random Forest
• A random forest, or random decision forest, is an ensemble
learning method that consists of a large number of decision trees.
• Each individual tree in the random forest outputs a class
prediction, and the class with the most votes becomes the
model’s prediction.
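The voting step can be sketched as follows. This shows only how the individual predictions are combined; the three "trees" are hypothetical hand-written rules standing in for trained decision trees, and the training procedure (bootstrap samples and random feature subsets) is omitted:

```python
from collections import Counter

# Hypothetical stand-ins for trained trees: each maps a sample to a class label.
def tree_a(sample):
    return "spam" if sample["links"] > 3 else "ham"

def tree_b(sample):
    return "spam" if sample["caps_ratio"] > 0.5 else "ham"

def tree_c(sample):
    return "spam" if sample["links"] > 1 and sample["caps_ratio"] > 0.2 else "ham"

forest = [tree_a, tree_b, tree_c]

def forest_predict(trees, sample):
    """Collect one vote per tree and return the class with the most votes."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]

sample = {"links": 2, "caps_ratio": 0.6}
print(forest_predict(forest, sample))  # spam (two votes against one)
```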

K-Nearest Neighbor
• In k-NN classification, the output is the class membership of a
new observation.
• An object is classified by a plurality vote of its neighbors, with
the object being assigned to the class most common among its
k nearest neighbors.
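A minimal sketch of this plurality vote, assuming Euclidean distance and k = 3 (both are choices made for illustration, as is the toy data):

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train is a list of (point, label) pairs; points are tuples of numbers."""
    # Keep the k training points closest to the query
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Plurality vote among the labels of those neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-class toy data
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]

print(knn_classify(train, (2.0, 2.0)))  # A
print(knn_classify(train, (6.0, 7.0)))  # B
```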

Support Vector Machine
• In a Support Vector Machine (SVM), we plot each data item as a
point in n-dimensional space (where n is the number of
features), with the value of each feature being the value of a
particular coordinate.
• Then, we perform classification by finding the
hyperplane that best separates the two classes.
• To identify the hyperplane, we maximize the margin, that is, the
distance between the boundary elements of the two separated classes.
• A variety of kernel functions are used to separate observations,
depending on whether they are linearly separable or non-linearly
separable.
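The margin idea can be illustrated without a full SVM solver. The sketch below, under the assumption of two linearly separable classes labeled +1 and -1, measures the margin of two hand-picked candidate hyperplanes (both hypothetical) and keeps the one with the larger margin. A real SVM searches over all hyperplanes rather than a fixed list:

```python
import math

def decision(w, b, x):
    """Value of w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.
    It is positive only if every point lies on the side given by its label."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * decision(w, b, x) / norm for x, y in zip(points, labels))

# Hypothetical toy data with two features
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Two candidate separating hyperplanes, each given as (w, b)
candidates = {
    "diagonal": ((1.0, 1.0), -5.5),
    "vertical": ((1.0, 0.0), -3.0),
}

best = max(candidates, key=lambda name: margin(*candidates[name], points, labels))
print(best)  # diagonal
```

Both candidates separate the classes, but the diagonal one leaves more room between the hyperplane and the closest points, which is the property an SVM optimizes.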

Regression
• Regression in machine learning is a supervised learning problem
where the output variable is a real or continuous value, such as
“salary” or “weight”.
• Many different models can be used; the simplest is linear
regression.
• It tries to fit the data with the best hyperplane, the one that lies
closest to the points (a straight line when there is a single input).
• Various techniques are used for regression analysis, such as
Linear Regression, Decision Tree Regression and Random Forest
Regression.
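For the simplest case, linear regression with one input, the best-fitting line can be computed directly by ordinary least squares. This is a minimal sketch; the experience and salary figures are made-up values chosen to lie exactly on a line:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: years of experience -> salary in thousands
years = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]

slope, intercept = fit_line(years, salary)
print(slope, intercept)        # 5.0 30.0
print(slope * 6 + intercept)   # 60.0, the predicted salary for 6 years
```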

Unsupervised Learning
• Unsupervised learning is performed on unlabeled data:
no output labels (categories) are given in the
data.
• Here the task of the machine is to group unsorted information
according to similarities, patterns and differences without any
prior training on labeled data.
• Two of the main methods used in unsupervised learning are:
 • Principal Component Analysis, and
 • Cluster Analysis.

Principal Component Analysis
• Principal component analysis is a method of extracting
important variables from a large set of variables available in a
data set.
• It extracts a low-dimensional set of features from a
high-dimensional data set, with the aim of capturing as much
information as possible.
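As a rough sketch of this dimensionality reduction, the code below finds the first principal component of a small two-dimensional data set and projects the data onto it, reducing two features to one. Power iteration is used here as one simple way to find the dominant direction of the covariance matrix; the data values and function names are assumptions for illustration:

```python
import math

def first_principal_component(data, iterations=100):
    """Return the feature means and the direction of greatest variance."""
    n = len(data)
    dim = len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(dim)]
    centered = [[row[j] - means[j] for j in range(dim)] for row in data]
    # Sample covariance matrix of the centered data
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    # Power iteration: repeated multiplication by cov converges to the
    # eigenvector with the largest eigenvalue
    v = [1.0] * dim
    for _ in range(iterations):
        v = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return means, v

def project(data, means, component):
    """Reduce each row to a single coordinate along the component."""
    return [sum((row[j] - means[j]) * component[j] for j in range(len(means)))
            for row in data]

# Hypothetical data lying close to the line y = x
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]

means, component = first_principal_component(data)
reduced = project(data, means, component)
print(component)  # roughly equal weights on both features
print(reduced)    # one value per observation instead of two
```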

Cluster Analysis
Vaibhav Kumar@DIT
University
• Cluster analysis or clustering is the task of grouping a set of
objects in such a way that objects in the same group (called a
cluster) are more similar (in some sense) to each other than to
those in other groups (clusters).
• Cluster analysis can be achieved by various algorithms that
differ significantly in their understanding of what constitutes a
cluster and how to efficiently find them.
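One widely used clustering algorithm is k-means, chosen here purely for illustration since the text above does not name a specific algorithm. The sketch assumes the number of clusters and the starting centroids are supplied by the caller, and it uses made-up two-dimensional points:

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical data with two visible groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

# Starting centroids are an assumption; here one point from each end
centroids, clusters = kmeans(points, [points[0], points[3]])
print(centroids)  # [(1.5, 1.5), (8.5, 8.5)]
print(clusters)
```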
