Linear Regression
A linear approach to modelling the relationship between a scalar dependent variable y and one or more explanatory (independent) variables x. The best-fit line is found using least-squares regression.
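The least-squares fit can be sketched in a few lines of NumPy; the data here is made up for illustration:

```python
import numpy as np

# Toy data lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones so the intercept is fitted too.
X = np.column_stack([x, np.ones_like(x)])

# Least-squares solution minimises ||X @ coeffs - y||^2.
slope, intercept = np.linalg.lstsq(X, y, rcond=None)[0]
```

With noiseless data the recovered slope and intercept match the generating line almost exactly.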
Decision Tree
Used in classification problems with a predefined target variable. A decision tree is a tree in which each branch node represents a choice among a number of alternatives and each leaf node represents a decision. Tree models where the target variable can take a discrete set of values are called classification trees. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels.
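How a branch node chooses between alternatives can be sketched as a single Gini-based threshold search over one feature (a toy depth-1 split, not a full tree learner; the data is invented):

```python
def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return 1.0 - sum((k / n) ** 2 for k in counts.values())

def best_split(xs, ys):
    # Try each midpoint between consecutive feature values; keep the
    # threshold giving the lowest weighted Gini impurity of the two leaves.
    best_t, best_score = None, float("inf")
    sx = sorted(set(xs))
    for a, b in zip(sx, sx[1:]):
        t = (a + b) / 2
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

xs = [1, 2, 3, 4]
ys = ["A", "A", "B", "B"]
t = best_split(xs, ys)  # 2.5: separates the two classes perfectly
```

A full tree learner applies this search recursively to each leaf until the leaves are pure enough.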
K Nearest Neighbours
An instance-based (lazy) learning method: the function is approximated only locally and all computation is deferred until classification. Used for both classification and regression.
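The "lazy" part is visible in a minimal sketch: there is no training step at all, only a distance sort at query time (1-D toy data, made up for illustration):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    # No model is built; all work happens here, at classification time.
    # train is a list of (point, label) pairs; distance is 1-D absolute.
    neighbours = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

train = [(1.0, "red"), (1.2, "red"), (5.0, "blue"), (5.3, "blue"), (0.8, "red")]
```

For regression one would average the neighbours' values instead of taking a majority vote.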
Logistic Regression
A classification model (the class variable is categorical). It applies a non-linear log transform to the predicted odds ratio, so the log-odds of the outcome are modelled as a linear function of the inputs. Used for binary classification problems such as pass/fail or fraud/genuine.
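A minimal sketch, fitting the model by gradient descent on a made-up 1-D pass/fail problem:

```python
import numpy as np

def sigmoid(z):
    # Maps the linear score w*x + b to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0, 0, 0, 1, 1, 1])  # fail/pass style binary target

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    p = sigmoid(w * x + b)          # predicted probability of class 1
    w -= lr * np.mean((p - y) * x)  # gradient of the log-loss w.r.t. w
    b -= lr * np.mean(p - y)        # gradient of the log-loss w.r.t. b

pred = (sigmoid(w * x + b) > 0.5).astype(int)
```

After training, the 0.5 probability threshold falls between x = 2 and x = 3, separating the two classes.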
Naïve Bayes
Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. Used for binary and multiclass classification problems.
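The independence assumption lets the class-conditional likelihood factor into a product over individual features. A tiny categorical sketch on an invented spam/ham corpus, with add-one (Laplace) smoothing:

```python
from collections import defaultdict
import math

# Toy training documents (words and labels are made up).
docs = [(["win", "cash", "now"], "spam"),
        (["cash", "prize"], "spam"),
        (["meeting", "tomorrow"], "ham"),
        (["project", "meeting"], "ham")]

class_counts = defaultdict(int)
word_counts = defaultdict(lambda: defaultdict(int))
vocab = set()
for words, label in docs:
    class_counts[label] += 1
    for w in words:
        word_counts[label][w] += 1
        vocab.add(w)

def predict(words):
    best, best_lp = None, -math.inf
    for label, cc in class_counts.items():
        lp = math.log(cc / len(docs))  # log prior P(class)
        total = sum(word_counts[label].values())
        for w in words:
            # Naive step: multiply (add in log space) per-word
            # likelihoods, smoothed so unseen words get nonzero mass.
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

Working in log space avoids underflow when many features are multiplied together.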
Principal Component Analysis (PCA)
A statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. PCA is mostly used as a tool in exploratory data analysis and in building predictive models.
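One common way to compute the orthogonal transformation is an SVD of the centred data matrix; a sketch on synthetic correlated data:

```python
import numpy as np

# Two correlated variables: the second roughly tracks the first.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, t + 0.1 * rng.normal(size=(200, 1))])

Xc = X - X.mean(axis=0)                       # centre each variable
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                            # the principal components

# The transformation decorrelates the data: the covariance of the
# scores is diagonal (off-diagonal entries are numerically zero).
cov = np.cov(scores, rowvar=False)
```

The singular values in S are sorted, so the first component captures the most variance; dropping the trailing components gives a lower-dimensional summary for exploratory analysis or downstream models.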
K-Means
A method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
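The standard iteration (Lloyd's algorithm) alternates an assignment step and a mean-update step; a 1-D toy sketch with made-up data and hand-picked starting centroids (it assumes no cluster ever empties):

```python
import numpy as np

def kmeans(X, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: each observation joins its nearest mean.
        labels = np.abs(X[:, None] - centroids[None, :]).argmin(axis=1)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = np.array([X[labels == k].mean()
                              for k in range(len(centroids))])
    return labels, centroids

X = np.array([1.0, 1.1, 0.9, 8.0, 8.2, 7.8])
labels, centroids = kmeans(X, centroids=np.array([0.0, 5.0]))
```

On these well-separated points the centroids converge to the two cluster means after a single iteration.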
Hierarchical Clustering
A method of cluster analysis which seeks to build a hierarchy of clusters. Merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram.
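The greedy bottom-up (agglomerative) variant repeatedly merges the two closest clusters; a pure-Python sketch with single linkage on invented 1-D points, recording the merge order that a dendrogram would draw:

```python
def single_linkage(points):
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Greedy step: find the pair of clusters whose closest
        # members are nearest (single linkage), and merge them.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i] + clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

merges = single_linkage([1.0, 1.2, 5.0, 5.1])
```

The merge distances (0.1, 0.2, then 3.8) are exactly the heights at which a dendrogram would join these clusters.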
Apriori Algorithm
Used for frequent-itemset mining and association-rule learning over transactional databases. It identifies the frequent individual items in the database and extends them to larger and larger itemsets as long as those itemsets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to derive association rules, which highlight general trends in the database. Used in applications such as market-basket analysis.
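The "extend frequent itemsets level by level" idea can be sketched directly on toy market baskets (the items and support threshold are made up):

```python
# Four toy transactions; an itemset is frequent if it appears
# in at least min_support of them.
baskets = [{"milk", "bread"}, {"milk", "bread", "butter"},
           {"bread", "butter"}, {"milk", "bread", "butter"}]
min_support = 3

def support(itemset):
    # Number of baskets containing every item of the itemset.
    return sum(itemset <= b for b in baskets)

items = sorted({i for b in baskets for i in b})
frequent = [frozenset([i]) for i in items if support({i}) >= min_support]
result = list(frequent)
k = 2
while frequent:
    # Grow candidates only from itemsets that survived the previous
    # level: any superset of an infrequent set cannot be frequent.
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    frequent = [c for c in candidates if support(c) >= min_support]
    result += frequent
    k += 1
```

Here {milk, bread} and {bread, butter} survive at level 2, while {milk, butter} is pruned, so no 3-itemset is frequent.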
FP-Tree
An FP-tree (frequent-pattern tree) is a compressed representation of the transactions used by the FP-Growth algorithm for association rule learning, a rule-based machine learning method for discovering interesting relations between variables in large databases. Unlike Apriori, FP-Growth mines frequent itemsets from the tree without generating candidate sets.
Random Forest
An ensemble learning method for classification and regression. A random forest grows many decision trees at training time; its output is the mode (for classification) or the mean (for regression) of the individual trees' predictions. Each tree "votes" for a class, and the forest chooses the classification having the most votes.
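The grow-many-trees-and-vote idea can be miniaturised with depth-1 "stump" trees on bootstrap resamples of invented 1-D data. This shows only the bagging-and-voting part; a real random forest also subsamples features at each split:

```python
import random
from collections import Counter

random.seed(0)

def train_stump(xs, ys):
    # Pick the threshold with fewest misclassifications for the rule
    # "A if x < t else B" (candidates: sample values plus a sentinel).
    candidates = sorted(set(xs)) + [max(xs) + 1]
    best_t, best_err = None, float("inf")
    for t in candidates:
        err = sum(("A" if x < t else "B") != y for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

xs = [1, 2, 3, 6, 7, 8]
ys = ["A", "A", "A", "B", "B", "B"]

# Grow 25 stumps, each on a bootstrap resample of the training data.
stumps = []
for _ in range(25):
    idx = [random.randrange(len(xs)) for _ in range(len(xs))]
    stumps.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))

def forest_predict(x):
    # Each stump "votes"; the forest returns the majority class.
    votes = Counter("A" if x < t else "B" for t in stumps)
    return votes.most_common(1)[0][0]
```

Individual stumps trained on odd resamples can be wrong, but the majority vote is far more stable than any single tree.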
Support Vector Machine (SVM)
SVMs are based on the idea of finding the hyperplane that best divides a dataset into two classes. They are most commonly used in classification problems, for example text-classification tasks such as category assignment, spam detection and sentiment analysis.
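In one dimension the geometric idea reduces to something very small: with linearly separable classes, the maximum-margin "hyperplane" is the midpoint between the closest pair of opposite-class points, and those two points are the support vectors. The numbers below are made up, and a real SVM finds this boundary by solving a quadratic program:

```python
neg = [1.0, 1.5, 2.0]  # class -1
pos = [4.0, 4.5, 5.0]  # class +1

sv_neg, sv_pos = max(neg), min(pos)  # the support vectors
boundary = (sv_neg + sv_pos) / 2     # equidistant from both classes
margin = sv_pos - sv_neg             # width of the separating gap

def classify(x):
    return 1 if x > boundary else -1
```

Only the support vectors determine the boundary; moving any of the other points (without crossing the margin) would not change it.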

Machine Learning (simplified)
