Linear Regression
A linear approach to modelling the relationship between a scalar dependent variable y and one or more explanatory (independent) variables x. The best-fit line is found using least-squares regression.
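The least-squares fit can be sketched in a few lines of NumPy; the data here is made up for illustration:

```python
import numpy as np

# Toy data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones so the intercept is fitted too.
X = np.column_stack([x, np.ones_like(x)])

# Least-squares solution minimises ||X @ coeffs - y||^2.
slope, intercept = np.linalg.lstsq(X, y, rcond=None)[0]
```

With noiseless data the recovered slope and intercept match the generating line almost exactly.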
Decision Tree
Used in classification problems with a predefined target variable. A decision tree is a tree in which each branch node represents a choice among a number of alternatives and each leaf node represents a decision. Tree models where the target variable can take a discrete set of values are called classification trees. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels.
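How a branch node chooses between alternatives can be sketched as a single Gini-based threshold search over one feature (a toy depth-1 split, not a full tree learner; the data is invented):

```python
def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return 1.0 - sum((k / n) ** 2 for k in counts.values())

def best_split(xs, ys):
    # Try each midpoint between consecutive feature values; keep the
    # threshold giving the lowest weighted Gini impurity of the two leaves.
    best_t, best_score = None, float("inf")
    sx = sorted(set(xs))
    for a, b in zip(sx, sx[1:]):
        t = (a + b) / 2
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

xs = [1, 2, 3, 4]
ys = ["A", "A", "B", "B"]
t = best_split(xs, ys)  # 2.5: separates the two classes perfectly
```

A full tree learner applies this search recursively to each leaf until the leaves are pure enough.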
K Nearest Neighbours
An instance-based (lazy) learning method: the function is approximated only locally and all computation is deferred until classification. Used for both classification and regression.
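The "lazy" part is visible in a minimal sketch: there is no training step at all, only a distance sort at query time (1-D toy data, made up for illustration):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    # No model is built; all work happens here, at classification time.
    # train is a list of (point, label) pairs; distance is 1-D absolute.
    neighbours = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

train = [(1.0, "red"), (1.2, "red"), (5.0, "blue"), (5.3, "blue"), (0.8, "red")]
```

For regression one would average the neighbours' values instead of taking a majority vote.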
Logistic Regression
A classification model (the class variable is categorical). It applies a non-linear log transform to the predicted odds ratio, so the log-odds of the outcome are modelled as a linear function of the inputs. Used for binary classification problems such as pass/fail or fraud/genuine.
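A minimal sketch, fitting the model by gradient descent on a made-up 1-D pass/fail problem:

```python
import numpy as np

def sigmoid(z):
    # Maps the linear score w*x + b to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0, 0, 0, 1, 1, 1])  # fail/pass style binary target

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    p = sigmoid(w * x + b)          # predicted probability of class 1
    w -= lr * np.mean((p - y) * x)  # gradient of the log-loss w.r.t. w
    b -= lr * np.mean(p - y)        # gradient of the log-loss w.r.t. b

pred = (sigmoid(w * x + b) > 0.5).astype(int)
```

After training, the 0.5 probability threshold falls between x = 2 and x = 3, separating the two classes.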
Naïve Bayes
Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. Used for binary and multiclass classification problems.
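The independence assumption lets the class-conditional likelihood factor into a product over individual features. A tiny categorical sketch on an invented spam/ham corpus, with add-one (Laplace) smoothing:

```python
from collections import defaultdict
import math

# Toy training documents (words and labels are made up).
docs = [(["win", "cash", "now"], "spam"),
        (["cash", "prize"], "spam"),
        (["meeting", "tomorrow"], "ham"),
        (["project", "meeting"], "ham")]

class_counts = defaultdict(int)
word_counts = defaultdict(lambda: defaultdict(int))
vocab = set()
for words, label in docs:
    class_counts[label] += 1
    for w in words:
        word_counts[label][w] += 1
        vocab.add(w)

def predict(words):
    best, best_lp = None, -math.inf
    for label, cc in class_counts.items():
        lp = math.log(cc / len(docs))  # log prior P(class)
        total = sum(word_counts[label].values())
        for w in words:
            # Naive step: multiply (add in log space) per-word
            # likelihoods, smoothed so unseen words get nonzero mass.
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

Working in log space avoids underflow when many features are multiplied together.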
Principal Component Analysis (PCA)
A statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. PCA is mostly used as a tool in exploratory data analysis and in building predictive models.
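One common way to compute the orthogonal transformation is an SVD of the centred data matrix; a sketch on synthetic correlated data:

```python
import numpy as np

# Two correlated variables: the second roughly tracks the first.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, t + 0.1 * rng.normal(size=(200, 1))])

Xc = X - X.mean(axis=0)                       # centre each variable
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                            # the principal components

# The transformation decorrelates the data: the covariance of the
# scores is diagonal (off-diagonal entries are numerically zero).
cov = np.cov(scores, rowvar=False)
```

The singular values in S are sorted, so the first component captures the most variance; dropping the trailing components gives a lower-dimensional summary for exploratory analysis or downstream models.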
K-Means
A method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
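The standard iteration (Lloyd's algorithm) alternates an assignment step and a mean-update step; a 1-D toy sketch with made-up data and hand-picked starting centroids (it assumes no cluster ever empties):

```python
import numpy as np

def kmeans(X, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: each observation joins its nearest mean.
        labels = np.abs(X[:, None] - centroids[None, :]).argmin(axis=1)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = np.array([X[labels == k].mean()
                              for k in range(len(centroids))])
    return labels, centroids

X = np.array([1.0, 1.1, 0.9, 8.0, 8.2, 7.8])
labels, centroids = kmeans(X, centroids=np.array([0.0, 5.0]))
```

On these well-separated points the centroids converge to the two cluster means after a single iteration.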
Hierarchical Clustering
A method of cluster analysis which seeks to build a hierarchy of clusters. Merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram.
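The greedy bottom-up (agglomerative) variant repeatedly merges the two closest clusters; a pure-Python sketch with single linkage on invented 1-D points, recording the merge order that a dendrogram would draw:

```python
def single_linkage(points):
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Greedy step: find the pair of clusters whose closest
        # members are nearest (single linkage), and merge them.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i] + clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

merges = single_linkage([1.0, 1.2, 5.0, 5.1])
```

The merge distances (0.1, 0.2, then 3.8) are exactly the heights at which a dendrogram would join these clusters.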
Apriori Algorithm
Used for frequent-itemset mining and association-rule learning over transactional databases. It identifies the frequent individual items in the database and extends them to larger and larger itemsets as long as those itemsets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to derive association rules, which highlight general trends in the database. Used in applications such as market-basket analysis.
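The "extend frequent itemsets level by level" idea can be sketched directly on toy market baskets (the items and support threshold are made up):

```python
# Four toy transactions; an itemset is frequent if it appears
# in at least min_support of them.
baskets = [{"milk", "bread"}, {"milk", "bread", "butter"},
           {"bread", "butter"}, {"milk", "bread", "butter"}]
min_support = 3

def support(itemset):
    # Number of baskets containing every item of the itemset.
    return sum(itemset <= b for b in baskets)

items = sorted({i for b in baskets for i in b})
frequent = [frozenset([i]) for i in items if support({i}) >= min_support]
result = list(frequent)
k = 2
while frequent:
    # Grow candidates only from itemsets that survived the previous
    # level: any superset of an infrequent set cannot be frequent.
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    frequent = [c for c in candidates if support(c) >= min_support]
    result += frequent
    k += 1
```

Here {milk, bread} and {bread, butter} survive at level 2, while {milk, butter} is pruned, so no 3-itemset is frequent.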
FP-Tree
An FP-tree (frequent-pattern tree) is a compressed representation of the transactions used by the FP-Growth algorithm for association rule learning, a rule-based machine learning method for discovering interesting relations between variables in large databases. Unlike Apriori, FP-Growth mines frequent itemsets from the tree without generating candidate sets.
Random Forest
An ensemble learning method for classification and regression. A random forest grows many decision trees at training time; its output is the mode (for classification) or the mean (for regression) of the individual trees' predictions. Each tree "votes" for a class, and the forest chooses the classification having the most votes.
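The grow-many-trees-and-vote idea can be miniaturised with depth-1 "stump" trees on bootstrap resamples of invented 1-D data. This shows only the bagging-and-voting part; a real random forest also subsamples features at each split:

```python
import random
from collections import Counter

random.seed(0)

def train_stump(xs, ys):
    # Pick the threshold with fewest misclassifications for the rule
    # "A if x < t else B" (candidates: sample values plus a sentinel).
    candidates = sorted(set(xs)) + [max(xs) + 1]
    best_t, best_err = None, float("inf")
    for t in candidates:
        err = sum(("A" if x < t else "B") != y for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

xs = [1, 2, 3, 6, 7, 8]
ys = ["A", "A", "A", "B", "B", "B"]

# Grow 25 stumps, each on a bootstrap resample of the training data.
stumps = []
for _ in range(25):
    idx = [random.randrange(len(xs)) for _ in range(len(xs))]
    stumps.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))

def forest_predict(x):
    # Each stump "votes"; the forest returns the majority class.
    votes = Counter("A" if x < t else "B" for t in stumps)
    return votes.most_common(1)[0][0]
```

Individual stumps trained on odd resamples can be wrong, but the majority vote is far more stable than any single tree.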
Support Vector Machine (SVM)
SVMs are based on the idea of finding the hyperplane that best divides a dataset into two classes. They are most commonly used in classification problems, for example text-classification tasks such as category assignment, spam detection and sentiment analysis.
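In one dimension the geometric idea reduces to something very small: with linearly separable classes, the maximum-margin "hyperplane" is the midpoint between the closest pair of opposite-class points, and those two points are the support vectors. The numbers below are made up, and a real SVM finds this boundary by solving a quadratic program:

```python
neg = [1.0, 1.5, 2.0]  # class -1
pos = [4.0, 4.5, 5.0]  # class +1

sv_neg, sv_pos = max(neg), min(pos)  # the support vectors
boundary = (sv_neg + sv_pos) / 2     # equidistant from both classes
margin = sv_pos - sv_neg             # width of the separating gap

def classify(x):
    return 1 if x > boundary else -1
```

Only the support vectors determine the boundary; moving any of the other points (without crossing the margin) would not change it.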

Machine Learning (simplified)
