Supervised learning uses labeled training data to predict outcomes for new data. Unsupervised learning uses unlabeled data to discover patterns. Some key machine learning algorithms are described, including decision trees, naive Bayes classification, k-nearest neighbors, and support vector machines. Performance metrics for classification problems like accuracy, precision, recall, F1 score, and specificity are discussed.
This document appears to be lecture slides for a course on deriving knowledge from data at scale. It covers many topics related to building machine learning models, including data preparation, feature selection, classification algorithms such as decision trees and support vector machines, and model evaluation. It applies these techniques to a Titanic passenger dataset to predict survival, emphasizes the importance of data wrangling, and discusses various feature selection methods.
Welcome to Supervised Machine Learning and Data Science.
Algorithms for building models. Support Vector Machines.
Classification algorithm explanation and code in Python (SVM).
Machine Learning: why we should know and how it works (Kevin Lee)
This document provides an overview of machine learning, including:
- An introduction to machine learning and why it is important.
- The main types of machine learning algorithms: supervised learning, unsupervised learning, and deep neural networks.
- Examples of how machine learning algorithms work, such as logistic regression, support vector machines, and k-means clustering.
- How machine learning is being applied in various industries like healthcare, commerce, and more.
The document discusses the Support Vector Machine (SVM) algorithm. It begins by explaining that SVM is a supervised learning algorithm used for classification and regression. It then describes how SVM finds the optimal decision boundary or "hyperplane" that separates cases in different categories by the maximum margin. The extreme cases that define this margin are called "support vectors." The document provides an example of using SVM to classify images as cats or dogs. It explains the differences between linear and non-linear SVM models and provides code to implement SVM in Python.
- Support vector machines (SVMs) find a linear separator between classes that maximizes the margin between the separator and the nearest data points of each class. This maximum-margin separator generalizes better than other possible separators.
- SVMs can learn nonlinear decision boundaries by mapping data into a high-dimensional feature space and finding a linear separator in that space, which corresponds to a nonlinear separator in the original input space.
- The "kernel trick" allows SVMs to efficiently compute scalar products between points in the high-dimensional feature space without explicitly performing the mapping, making SVMs practical even with huge numbers of features.
- Support vector machines (SVMs) are a machine learning method for classification and regression. They find the optimal separating hyperplane between classes that maximizes the margin between the plane and the closest data points.
- SVMs use a "kernel trick" to efficiently perform computations in high-dimensional feature spaces without explicitly computing the coordinates of data in that space. Common kernels include polynomial and Gaussian radial basis function kernels.
- To classify new examples, SVMs use a decision function that depends on a subset of training samples called support vectors. The model is defined by these support vectors and weights learned during training.
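As a minimal sketch of these points, the snippet below (scikit-learn assumed; the dataset is a synthetic stand-in) fits an SVM and inspects the support vectors and learned weights that define the trained model.

# A minimal sketch showing that a trained SVM is defined by its support
# vectors and their learned weights (scikit-learn assumed).
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two separable blobs as toy training data.
X, y = make_blobs(n_samples=60, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.support_vectors_.shape)  # the subset of training points that define the boundary
print(clf.dual_coef_.shape)        # the learned weights attached to those support vectors
print(clf.predict(X[:5]))          # the decision function uses only the support vectors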
- Support vector machines (SVMs) find a linear separator between classes that maximizes the margin between the separator and the closest data points. This maximum margin separator generalizes better than other separators.
- SVMs can handle non-linear separations by projecting data into a higher-dimensional feature space and finding a linear separator there. The kernel trick allows efficient computation without explicitly using the high-dimensional feature space.
- SVMs solve a convex optimization problem to find the maximum margin separator. Only a subset of data points called support vectors are used to define the separator and classify new data.
Sentiment analysis using support vector machine (Shital Andhale)
SVM is a supervised machine learning algorithm that can be used for classification or regression. It works by finding the optimal hyperplane that separates classes by the largest margin, i.e., the hyperplane that maximizes the distance to the nearest data points of each class. It can perform nonlinear classification using the kernel trick to transform data into a higher-dimensional space. SVM is effective for high-dimensional data, uses only a subset of the training points, and works well when there is a clear margin of separation between classes, though it does not directly provide probability estimates. It has applications in text categorization, image classification, and other domains.
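A minimal sentiment-classification sketch along these lines, assuming scikit-learn; the four-document corpus is invented purely for illustration:

# A toy sentiment classifier: TF-IDF features fed into a linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["great product, loved it", "terrible, waste of money",
         "works well and arrived fast", "broke after one day"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

print(model.predict(["really great, works well"]))  # expect the positive class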
These are the slides from the workshop "Introduction to Machine Learning with R", which I gave at the University of Heidelberg, Germany, on June 28th, 2018.
The accompanying code to generate all plots in these slides (plus additional code) can be found on my blog: https://shirinsplayground.netlify.com/2018/06/intro_to_ml_workshop_heidelberg/
The workshop covered the basics of machine learning. With an example dataset I went through a standard machine learning workflow in R with the packages caret and h2o:
- reading in data
- exploratory data analysis
- missingness
- feature engineering
- training and test split
- model training with Random Forests, Gradient Boosting, Neural Nets, etc.
- hyperparameter tuning
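For readers who do not use R, a rough Python analogue of the listed steps might look like the sketch below; the synthetic data frame stands in for a real dataset read from disk.

# A rough Python analogue of the workshop workflow above.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=list("abcd"))  # "reading in data"
df["target"] = (df["a"] + df["b"] > 0).astype(int)

print(df.describe())                 # exploratory data analysis
print(df.isna().sum())               # missingness check
df["a_minus_b"] = df["a"] - df["b"]  # a toy engineered feature

X, y = df.drop(columns="target"), df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

for model in (RandomForestClassifier(), GradientBoostingClassifier()):
    model.fit(X_train, y_train)      # model training
    print(type(model).__name__, model.score(X_test, y_test))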
This document provides an overview of support vector machines and kernel methods for machine learning.
It discusses how preprocessing input data with nonlinear features can make classification problems linearly separable in high-dimensional space. However, directly using all possible features risks overfitting.
Support vector machines find a maximum-margin separating hyperplane in feature space to minimize overfitting. They use only a subset of training points, called support vectors, to define the decision boundary.
The kernel trick allows support vector machines to implicitly operate in very high-dimensional feature spaces without explicitly computing the feature vectors. All computations can be done using kernel functions that evaluate scalar products in feature space. This makes support vector machines computationally feasible even for huge feature spaces.
Predict Backorders on Supply Chain Data for an Organization (Piyush Srivastava)
The document discusses predicting backorders using supply chain data. It defines backorders as customer orders that cannot be filled immediately but the customer is willing to wait. The data analyzed consists of 23 attributes related to a garment supply chain, including inventory levels, forecast sales, and supplier performance metrics. Various machine learning algorithms are applied and evaluated on their ability to predict backorders, including naive Bayes, random forest, k-NN, neural networks, and support vector machines. Random forest achieved the best accuracy of 89.53% at predicting backorders. Feature selection and data balancing techniques are suggested to potentially further improve prediction performance.
Anomaly Detection and Localization Using GAN and One-Class Classifier (홍배 김)
1) The document proposes using a generative adversarial network (GAN) trained on normal images to extract features, and then using a one-class support vector machine (SVM) to determine if a query image's features are within the distribution of normal features.
2) The method involves using an autoencoder to extract features from image patches, training a GAN on the features to learn the distribution of normal patches, and classifying query patches as normal or anomalous using the one-class SVM.
3) The method is evaluated on its ability to detect and localize artificially added unfamiliar objects of different sizes in simulated satellite images.
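A minimal sketch of the one-class SVM step, assuming scikit-learn; random vectors stand in for the autoencoder/GAN features described above:

# Fit a one-class SVM on "normal" feature vectors, then flag queries
# that fall outside the learned distribution.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal_features = rng.normal(loc=0.0, scale=1.0, size=(200, 16))  # "normal" patches
query_features = rng.normal(loc=5.0, scale=1.0, size=(10, 16))    # shifted, anomalous patches

oc_svm = OneClassSVM(kernel="rbf", nu=0.05).fit(normal_features)

print(oc_svm.predict(query_features))  # +1 = within the normal distribution, -1 = anomalous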
In machine learning, model selection is a bit more nuanced than simply picking the 'right' or 'wrong' algorithm. In practice, the workflow includes (1) selecting and/or engineering the smallest and most predictive feature set, (2) choosing a set of algorithms from a model family, and (3) tuning the algorithm hyperparameters to optimize performance. Recently, much of this workflow has been automated through grid search methods, standardized APIs, and GUI-based applications. In practice, however, human intuition and guidance can more effectively home in on quality models than exhaustive search.
This talk presents a new open source Python library, Yellowbrick, which extends the Scikit-Learn API with a visual transformer (visualizer) that can incorporate visualizations of the model selection process into pipelines and modeling workflow. Visualizers enable machine learning practitioners to visually interpret the model selection process, steer workflows toward more predictive models, and avoid common pitfalls and traps. For users, Yellowbrick can help evaluate the performance, stability, and predictive value of machine learning models, and assist in diagnosing problems throughout the machine learning workflow.
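A minimal sketch of the visualizer pattern, assuming Yellowbrick is installed; the estimator and dataset are arbitrary stand-ins rather than the talk's examples:

# A Yellowbrick visualizer wraps an estimator and draws a diagnostic
# figure as part of the usual fit/score workflow.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from yellowbrick.classifier import ClassificationReport

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

viz = ClassificationReport(LinearSVC())  # wrap the estimator in a visualizer
viz.fit(X_train, y_train)                # fits like a normal scikit-learn model
viz.score(X_test, y_test)                # scoring also draws the report
viz.show()                               # render the figure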
SVM is a supervised machine learning algorithm that outputs an optimal hyperplane to categorize data points. It finds the hyperplane that maximizes the margin between the different categories. The data points closest to the hyperplane are the support vectors. There are different types of kernels that can be used to transform nonlinear data into a higher dimension to allow for linear separation. Key parameters that affect the SVM model are the kernel type, regularization parameter C, gamma value, and margin.
Machine learning is the hacker art of describing the features of instances that we want to make predictions about, then fitting the data that describes those instances to a model form. Applied machine learning has come a long way from its beginnings in academia, and with tools like Scikit-Learn, it's easier than ever to generate operational models for a wide variety of applications. Thanks to the ease and variety of the tools in Scikit-Learn, the primary job of the data scientist is model selection. Model selection involves performing feature engineering, hyperparameter tuning, and algorithm selection. These dimensions of machine learning often lead computer scientists towards automatic model selection via optimization (maximization) of a model's evaluation metric. However, the search space is large, and grid search approaches to machine learning can easily lead to failure and frustration. Human intuition is still essential to machine learning, and visual analysis in concert with automatic methods can allow data scientists to steer model selection towards better-fitted models, faster. In this talk, we will discuss interactive visual methods for better understanding, steering, and tuning machine learning models.
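As a contrast to the visual approach, a bare-bones grid search over SVM hyperparameters might look like the sketch below (scikit-learn assumed; the grid values are arbitrary examples):

# Exhaustive grid search over kernel, C, and gamma with cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

param_grid = {"kernel": ["linear", "rbf"],
              "C": [0.1, 1, 10],
              "gamma": ["scale", 0.01, 0.001]}

search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)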
Deep learning uses multilayered neural networks to process information in a robust, generalizable, and scalable way. It has various applications including image recognition, sentiment analysis, machine translation, and more. Deep learning concepts include computational graphs, artificial neural networks, and optimization techniques like gradient descent. Prominent deep learning architectures include convolutional neural networks, recurrent neural networks, autoencoders, and generative adversarial networks.
Marwan Mattar presented his PhD thesis defense on unsupervised joint alignment, clustering, and feature learning. His research goal was to develop an unsupervised data set-agnostic processing module that includes alignment, clustering, and feature learning. He developed techniques for joint alignment of data using transformations, clustering data in an unsupervised manner, and learning features from the data. His techniques were shown to outperform other methods on tasks involving time series classification, face verification, and clustering of handwritten digits and ECG heart data.
Alpine ML Talk: Vtreat: A Package for Automating Variable Treatment in R, by Nina Zumel (Chester Chen)
VTREAT: A Package for Automating Variable Treatment in R
Data characterization, treatment, and cleaning are necessary (though not always glamorous) components of machine learning and data science projects. While there is no substitute for getting your hands dirty in the data, there are many data issues that repeat from project to project. In particular, how do you deal with missing data values? How do you deal with previously unobserved categorical values?
In this talk, I will discuss some typical data problems, and describe VTREAT, our in-progress R package for automating the treatment of these common data issues.
Bio:
Nina Zumel is a Principal Consultant with Win-Vector LLC, a data science consulting firm based in San Francisco. Her technical interests include data science, statistics, statistical learning, and data visualization. She is also the co-author with John Mount of Practical Data Science with R. This book presents the process and principles of data science from a practitioner’s perspective, and complements existing texts on machine learning, statistics, big data, and R.
This document provides an overview of machine learning techniques that can be applied in finance, including exploratory data analysis, clustering, classification, and regression methods. It discusses statistical learning approaches like data mining and modeling. For clustering, it describes techniques like k-means clustering, hierarchical clustering, Gaussian mixture models, and self-organizing maps. For classification, it mentions discriminant analysis, decision trees, neural networks, and support vector machines. It also provides summaries of regression, ensemble methods, and working with big data and distributed learning.
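A minimal k-means clustering sketch, assuming scikit-learn; the random feature matrix is a stand-in for real financial data:

# Cluster 100 items described by 5 features into 3 groups.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
features = rng.normal(size=(100, 5))  # e.g., 100 assets described by 5 features

kmeans = KMeans(n_clusters=3, n_init=10, random_state=1).fit(features)
print(kmeans.labels_[:10])            # cluster assignment per item
print(kmeans.cluster_centers_.shape)  # one centroid per cluster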
This document provides an overview of neural networks in R. It begins with recapping logistic regression and decision boundaries. It then discusses how neural networks allow for non-linear decision boundaries through the use of intermediate outputs and multiple logistic regression models. Code examples are provided to demonstrate building neural networks with intermediate outputs to classify data with non-linear decision boundaries.
Malicious software is categorized into families based on static and dynamic characteristics, infection methods, and the nature of the threat. Visual exploration of malware instances and families in a low-dimensional space gives a first overview of dependencies and relationships among these instances, helps detect their groups, and isolates outliers. Furthermore, visual exploration of different feature sets is useful in assessing whether a set carries a valid abstract representation, which can later be used in classification and clustering algorithms to achieve high accuracy. We investigate one of the best dimensionality reduction techniques, known as t-SNE, to reduce the malware representation from a high-dimensional space consisting of thousands of features to a low-dimensional space. We experiment with different feature sets and depict malware clusters in 2-D.
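A minimal t-SNE sketch, assuming scikit-learn; random vectors stand in for the high-dimensional malware feature sets:

# Embed 1000-dimensional feature vectors into 2-D for visualization.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(2)
malware_features = rng.normal(size=(300, 1000))  # 300 samples, 1000 features

embedding = TSNE(n_components=2, perplexity=30, random_state=2).fit_transform(malware_features)
print(embedding.shape)  # (300, 2): coordinates for a 2-D scatter plot of clusters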
This document provides an overview of support vector machines (SVMs), a supervised machine learning algorithm used for both classification and regression problems. It explains that SVMs work by finding the optimal hyperplane that separates classes of data by the maximum margin. For non-linear classification, the data is first mapped to a higher dimensional space using kernel functions like polynomial or Gaussian kernels. The document discusses issues like overfitting and soft margins, and notes applications of SVMs in areas like face detection, text categorization, and bioinformatics.
Yellowbrick: Steering machine learning with visual transformers (Rebecca Bilbro)
In machine learning, model selection is a bit more nuanced than simply picking the 'right' or 'wrong' algorithm. In practice, the workflow includes (1) selecting and/or engineering the smallest and most predictive feature set, (2) choosing a set of algorithms from a model family, and (3) tuning the algorithm hyperparameters to optimize performance. Recently, much of this workflow has been automated through grid search methods, standardized APIs, and GUI-based applications. In practice, however, human intuition and guidance can more effectively home in on quality models than exhaustive search.
This talk presents a new Python library, Yellowbrick, which extends the Scikit-Learn API with a visual transformer (visualizer) that can incorporate visualizations of the model selection process into pipelines and modeling workflow. Yellowbrick is an open source, pure Python project that extends Scikit-Learn with visual analysis and diagnostic tools. The Yellowbrick API also wraps matplotlib to create publication-ready figures and interactive data explorations while still allowing developers fine-grained control of figures. For users, Yellowbrick can help evaluate the performance, stability, and predictive value of machine learning models, and assist in diagnosing problems throughout the machine learning workflow.
In this talk, we'll explore not only what you can do with Yellowbrick, but how it works under the hood (since we're always looking for new contributors!). We'll illustrate how Yellowbrick extends the Scikit-Learn and Matplotlib APIs with a new core object: the Visualizer. Visualizers allow visual models to be fit and transformed as part of the Scikit-Learn Pipeline process - providing iterative visual diagnostics throughout the transformation of high dimensional data.
Support Vector Machines Using Machine Learning: How It Works (rajalakshmi5921)
This document discusses support vector machines (SVM), a supervised machine learning algorithm used for classification and regression. It explains that SVM finds the optimal boundary, known as a hyperplane, that separates classes with the maximum margin. When data is not linearly separable, kernel functions can transform the data into a higher-dimensional space to make it separable. The document discusses SVM for both linearly separable and non-separable data, kernel functions, hyperparameters, and approaches for multiclass classification like one-vs-one and one-vs-all.
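A minimal sketch of the two multiclass strategies named above, using scikit-learn's generic wrappers around a binary SVM:

# One-vs-one trains a binary SVM per pair of classes;
# one-vs-rest (one-vs-all) trains one per class against all others.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)  # three classes

ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)
ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)

print(ovo.predict(X[:3]), ovr.predict(X[:3]))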
This document provides an overview of support vector machines (SVM). It explains that SVM is a supervised machine learning algorithm used for classification and regression. It works by finding the optimal separating hyperplane that maximizes the margin between different classes of data points. The document discusses key SVM concepts like slack variables, kernels, hyperparameters like C and gamma, and how the kernel trick allows SVMs to fit non-linear decision boundaries.
This document provides an overview of deep learning and convolutional neural networks (CNNs). It discusses topics like artificial neural networks, CNN architecture including convolution, ReLU, pooling and fully connected layers. It also explains how CNNs work by scanning images through these layers and detecting patterns. Code examples in Python are given to demonstrate preprocessing data, building a CNN model, training it and making predictions. Key concepts like softmax and cross-entropy functions used for classification are also overviewed.
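A minimal CNN sketch with the layer types mentioned, assuming TensorFlow/Keras; the input shape and class count follow an MNIST-like setup rather than the document's exact example:

# Convolution -> ReLU -> pooling -> fully connected -> softmax.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),  # convolution + ReLU
    layers.MaxPooling2D((2, 2)),                                            # pooling
    layers.Flatten(),
    layers.Dense(64, activation="relu"),                                    # fully connected
    layers.Dense(10, activation="softmax"),                                 # class probabilities
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()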
The document discusses reinforcement learning techniques. It describes reinforcement learning as a method for solving sequential decision problems by using past experience to determine the next action. Reinforcement learning is also used in artificial intelligence to train machines through reward and punishment in tasks like walking. The document outlines reinforcement learning models including Upper Confidence Bound (UCB) and Thompson Sampling.
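A minimal UCB sketch for a toy three-armed bandit; the reward probabilities are invented for illustration:

# Each round, pick the arm with the highest mean reward plus an
# exploration bonus that shrinks as the arm is pulled more often.
import math
import random

true_probs = [0.2, 0.5, 0.7]  # hidden reward probability per arm
counts = [0] * 3              # times each arm was pulled
rewards = [0.0] * 3           # total reward per arm

for t in range(1, 1001):
    if 0 in counts:           # pull each arm once first
        arm = counts.index(0)
    else:
        arm = max(range(3), key=lambda a: rewards[a] / counts[a]
                  + math.sqrt(2 * math.log(t) / counts[a]))
    counts[arm] += 1
    rewards[arm] += 1.0 if random.random() < true_probs[arm] else 0.0

print(counts)  # most pulls should concentrate on the best arm (index 2)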
The document discusses association rule learning, which analyzes data to find patterns and relationships between attributes or items. Association rules have two parts - an antecedent (if) and consequent (then) that occur frequently together. For example, people who buy bread often also buy milk. The Apriori algorithm is commonly used to generate association rules and considers support, confidence and lift to determine strong rules. Support measures how often an itemset occurs, confidence measures the likelihood of the consequent given the antecedent, and lift measures their independence while accounting for item popularity.
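A short worked example of support, confidence, and lift on an invented five-transaction basket dataset:

# Support: how often an itemset occurs. Confidence: P(consequent | antecedent).
# Lift: confidence divided by the consequent's own support.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "milk", "butter"},
    {"bread", "butter"},
    {"eggs"},
]
n = len(transactions)

support_bread = sum("bread" in t for t in transactions) / n           # 0.8
support_milk = sum("milk" in t for t in transactions) / n             # 0.6
support_both = sum({"bread", "milk"} <= t for t in transactions) / n  # 0.6

confidence = support_both / support_bread  # P(milk | bread) = 0.75
lift = confidence / support_milk           # 1.25: >1 means more co-occurrence than chance

print(support_both, confidence, lift)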
Linear Regression
Simple Linear Regression
Multiple Linear Regression
Polynomial Regression
Non-Linear Regression
Support Vector Regression (SVR)
Decision Tree Regression
Random Forest Regression
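A minimal sketch fitting several of the regression families listed above to one toy dataset, assuming scikit-learn:

# Linear, polynomial, SVR, decision tree, and random forest regression
# on the same non-linear 1-D target.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

models = {
    "linear": LinearRegression(),
    "polynomial": make_pipeline(PolynomialFeatures(degree=3), LinearRegression()),
    "svr": SVR(kernel="rbf"),
    "tree": DecisionTreeRegressor(max_depth=4),
    "forest": RandomForestRegressor(n_estimators=100),
}
for name, model in models.items():
    print(name, model.fit(X, y).score(X, y))  # R^2 on the training data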
The document outlines machine learning practicals using Python. It includes 14 practical programming assignments on topics like scatter plots, linear regression, decision trees, k-nearest neighbors, and clustering. It also provides an overview of Python libraries for machine learning like NumPy, Pandas, Scikit-Learn, and Matplotlib for tasks like data preprocessing, modeling, visualization, and more. Data preprocessing concepts covered are importing data, handling missing values, encoding categorical variables, and splitting data into training and test sets.
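A minimal preprocessing sketch covering those steps (missing values, categorical encoding, train/test split), assuming scikit-learn and pandas; the tiny data frame is invented:

# Impute missing numeric values and one-hot encode categoricals,
# after splitting the data into training and test sets.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"age": [25, np.nan, 47, 38],
                   "city": ["Pune", "Delhi", "Pune", "Mumbai"],
                   "label": [0, 1, 0, 1]})

X_train, X_test, y_train, y_test = train_test_split(
    df[["age", "city"]], df["label"], test_size=0.25, random_state=0)

prep = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean"), ["age"]),           # fill missing values
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),  # encode categoricals
])
print(prep.fit_transform(X_train))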
The document discusses machine learning concepts including:
1) Machine learning is an application of artificial intelligence that allows systems to automatically learn and improve from experience without being explicitly programmed.
2) There are different types of machine learning including supervised learning, unsupervised learning, and reinforcement learning.
3) The machine learning process involves learning tasks, performance metrics, experience, and optimizing models using techniques like gradient descent.
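A minimal gradient-descent sketch for point 3 above, fitting y = w*x + b by repeatedly stepping against the gradient of the squared error:

# Each iteration moves the parameters opposite the gradient of the MSE.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, size=100)
y = 3.0 * x + 1.0 + rng.normal(scale=0.05, size=100)  # true w=3, b=1

w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    y_hat = w * x + b
    grad_w = 2 * np.mean((y_hat - y) * x)  # d(MSE)/dw
    grad_b = 2 * np.mean(y_hat - y)        # d(MSE)/db
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # should be close to 3.0 and 1.0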
42. What is the Kernel Trick?
The kernel trick is a very interesting and powerful tool. It is powerful because it provides a bridge from linearity to non-linearity for any algorithm that can be expressed solely in terms of dot products between two vectors. It comes from the fact that, if we first map our input data into a higher-dimensional space, a linear algorithm operating in this space will behave non-linearly in the original input space. And we do not need the exact data points, only their inner products, to compute our decision boundary.
This implies that if we want to transform our existing data into higher-dimensional data, which in many cases helps us classify better, we need not compute the exact transformation of our data; we just need the inner product of our data in that higher-dimensional space.
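A quick numerical check of this claim: for the feature map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), the inner product in feature space equals the polynomial kernel k(x, y) = (x . y)^2, so the mapping never has to be computed explicitly.

# Inner product after an explicit feature map vs. the kernel shortcut.
import numpy as np

def phi(v):
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

explicit = phi(x) @ phi(y)  # inner product in the mapped space
kernel = (x @ y) ** 2       # the same value from the kernel function alone

print(explicit, kernel)     # both print 121.0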
89. False Positives & False Negatives
There are two errors that often rear their heads when you are learning about hypothesis testing: false positives and false negatives, technically referred to as type I and type II errors respectively.
A false positive (type I error) occurs when you reject a true null hypothesis.
A false negative (type II error) occurs when you accept a false null hypothesis.
The false positive rate is an evaluation metric that can be measured on binary classification models.
91. False Positives & False Negatives
In binary prediction/classification terminology, there are four conditions for any given outcome:
• True Positive: the correct identification of anomalous data as such, i.e., classifying as "abnormal" data which is in fact abnormal.
• True Negative: the correct identification of data as not being anomalous, i.e., classifying as "normal" data which is in fact normal.
• False Positive: the incorrect identification of data as anomalous, i.e., classifying as "abnormal" data which is in fact normal.
• False Negative: the incorrect identification of data as not being anomalous, i.e., classifying as "normal" data which is in fact abnormal.
93. False Positives & False Negatives
• A true positive is an outcome where the model correctly predicts the positive class. Similarly, a true negative is an outcome where the model correctly predicts the negative class.
• A false positive is an outcome where the model incorrectly predicts the positive class. And a false negative is an outcome where the model incorrectly predicts the negative class.
100. False Positives & False Negatives
A false positive is an outcome where the model incorrectly predicts the positive class. A false negative is an outcome where the model incorrectly predicts the negative class.
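A minimal sketch computing the four conditions and the false positive rate from predictions, assuming scikit-learn; the label vectors are invented for illustration (1 = positive class):

# confusion_matrix returns [[tn, fp], [fn, tp]] for binary labels.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)                 # 3 true positives, 3 true negatives, 1 each FP/FN
false_positive_rate = fp / (fp + tn)  # fraction of actual negatives flagged positive
print(false_positive_rate)            # 0.25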