Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
KNN CLASSIFICATION
Basic Terminologies
• Target variable, usually denoted by Y, is the variable being predicted. It is also called the dependent variable, output variable, response variable or outcome variable (Ex : Satisfaction level, highlighted in the red box in the table below)
• Predictor, sometimes called an independent variable, is a variable used to predict the target variable (Ex : Age, Marital Status and Gender, highlighted in the green box in the table below)
Age  Marital Status  Gender  Satisfaction level
58   married         Female  High
44   single          Female  Low
33   married         Male    Medium
47   married         Female  High
33   single          Female  Medium
Introduction
• An instance (data record or case) is assigned the class that is most common among its k nearest neighbors
• Here, k is a positive integer, typically an odd number between 1 and 10
For instance, for k = 3, the majority class of the 3 nearest neighbors of the center point (shown as a star in the image) is Class B (two out of three circles are purple, i.e. Class B), whereas for k = 6 the majority is Class A (four out of six circles are yellow, i.e. Class A)
Note : K is identified automatically by the algorithm as the value that gives the highest classification accuracy
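The note does not say how the accuracy for each candidate k is measured. One possible way, assumed here purely for illustration, is leave-one-out checking: each record is predicted from all the other records, and the k with the highest share of correct predictions wins. The sketch below uses Euclidean distance and a made-up toy dataset; the function names `knn_predict` and `loo_accuracy` are hypothetical.

```python
from collections import Counter
import math


def knn_predict(points, labels, query, k):
    """Majority class among the k training points closest to query."""
    ranked = sorted(zip(points, labels), key=lambda pl: math.dist(pl[0], query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def loo_accuracy(points, labels, k):
    """Leave-one-out accuracy: predict each record from all the others."""
    hits = 0
    for i, (point, label) in enumerate(zip(points, labels)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        hits += knn_predict(rest_points, rest_labels, point, k) == label
    return hits / len(points)


# Made-up toy data, for illustration only: two well-separated groups
points = [(1, 1), (1, 2), (2, 1), (2, 2), (6, 6), (6, 7), (7, 6), (7, 7)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

# Try odd values of k and keep the one with the highest accuracy
scores = {k: loo_accuracy(points, labels, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On this toy data the largest k performs worst, because with seven neighbors every record is outvoted by the other group; this is one reason k is usually kept small.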
Steps :
• Determine K = number of nearest neighbors (in terms of distance) to check for class assignment
• Calculate the distance between the new instance and all the training instances
• Rank the training instances by distance and find the k nearest neighbors, i.e. those with the shortest distance from the new instance
• Gather the classes of the nearest neighbors to find the majority class
• Use this majority class as the final predicted class
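The steps above can be sketched as a short function. This is a minimal illustration rather than the implementation used by any particular tool: it assumes Euclidean distance, and the name `knn_predict` and the sample points are made up for the example.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, new_point, k):
    """Predict the class of new_point by majority vote of its k nearest neighbors."""
    # Distance from the new instance to every training instance (Euclidean, assumed)
    distances = [math.dist(p, new_point) for p in train_points]
    # Rank training instances by distance and keep the k closest
    ranked = sorted(zip(distances, train_labels), key=lambda pair: pair[0])
    nearest_labels = [label for _, label in ranked[:k]]
    # The most common class among the neighbors is the prediction
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up points for illustration: two "A" records near the origin, two "B" records far away
points = [(0, 0), (0, 1), (5, 5), (6, 5)]
labels = ["A", "A", "B", "B"]
print(knn_predict(points, labels, (1, 1), k=3))
```

With two classes, an odd k guarantees that the vote cannot end in a tie, which is why odd values are typical.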
Example : Input
• Based on two attributes, acid durability and strength, we want to classify a paper tissue into good/bad quality classes :
Acid durability (in seconds)  Strength (kg/square meter)  Paper tissue quality
7                             7                           Good
7                             4                           Good
3                             4                           Bad
1                             4                           Good
Acid durability and strength are the independent variables/predictors; paper tissue quality is the target variable (Y)
Example : Steps : Calculate distance and ranking to find k nearest neighbors
• For instance, if a paper tissue’s acid durability = 3 and strength = 7, take the following steps :
• Step 1 : Decide the value of k; say it is 3 (based on classification accuracy)
• Step 2, 3 : Calculate the distance between each input instance and the new instance, and rank each input instance by distance to find the k nearest neighbors
Acid durability (in seconds)  Strength (kg/square meter)  Paper tissue quality  Squared distance to new instance  Rank by distance
7                             7                           Good                  (7 - 3)^2 + (7 - 7)^2 = 16        3
7                             4                           Good                  (7 - 3)^2 + (4 - 7)^2 = 25        4
3                             4                           Bad                   (3 - 3)^2 + (4 - 7)^2 = 9         1
1                             4                           Good                  (1 - 3)^2 + (4 - 7)^2 = 13        2
The first three columns are the input dataset; the last two columns are the derived results used to find the majority class of the k nearest neighbors
Example : Steps : Select majority class of k nearest neighbors as predicted class
Step 4, 5 : As the majority class is Good for the three nearest neighbors (two out of three records have class = Good), the predicted class of the instance is Good, i.e. the quality of a paper tissue having acid durability = 3 and strength = 7 is good
Final output :
Acid durability (in seconds)  Strength (kg/square meter)  Paper tissue quality
7                             7                           Good
7                             4                           Good
3                             4                           Bad
1                             4                           Good
3                             7                           Good
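The paper tissue example can be checked with a few lines of Python. The sketch uses the four training records and the new instance (3, 7) from the example, and the same sum of squared differences; the square root is skipped because it does not change the ranking. Variable names are illustrative.

```python
# Training data from the example: (acid durability, strength) -> quality
training = [((7, 7), "Good"), ((7, 4), "Good"), ((3, 4), "Bad"), ((1, 4), "Good")]
new_tissue = (3, 7)
k = 3

# Sum of squared differences between each training record and the new tissue
rows = []
for (acid, strength), quality in training:
    sq_dist = (acid - new_tissue[0]) ** 2 + (strength - new_tissue[1]) ** 2
    rows.append((sq_dist, acid, strength, quality))

rows.sort()  # smallest distance first, so list position gives the rank

# Majority vote among the k nearest neighbors
votes = [quality for _, _, _, quality in rows[:k]]
predicted = max(set(votes), key=votes.count)

for rank, (sq_dist, acid, strength, quality) in enumerate(rows, start=1):
    print(rank, acid, strength, quality, sq_dist)
print("Predicted quality:", predicted)
```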
Example : Steps : Find out accuracy
Actual versus predicted :
             Predicted Good  Predicted Bad
Actual Good  35              4
Actual Bad   4               70
CLASSIFICATION ACCURACY : (35 + 70) / (35 + 70 + 4 + 4) = 92.9%
• The prediction accuracy is a useful criterion for assessing model performance
• A model with prediction accuracy >= 70% is generally considered useful
CLASSIFICATION ERROR = 100 - Accuracy = 7.1%
There is a 7.1% chance of error in classification
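The accuracy and error figures follow directly from the four counts in the table. A minimal sketch of that arithmetic is shown here; it assumes rows are actual classes and columns are predicted classes, which does not affect the result because the two off-diagonal counts are equal.

```python
# Confusion matrix counts from the table: (actual, predicted) -> count
confusion = {
    ("Good", "Good"): 35,
    ("Good", "Bad"): 4,
    ("Bad", "Good"): 4,
    ("Bad", "Bad"): 70,
}

# Correct predictions sit on the diagonal, where actual == predicted
correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
total = sum(confusion.values())

accuracy = 100 * correct / total  # percentage of records classified correctly
error = 100 - accuracy            # percentage of records misclassified
print(f"Accuracy: {accuracy:.1f}%  Error: {error:.1f}%")
```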
Standard input parameters & sample UI
Standard output 1 : Model Summary
ACTUAL VERSUS PREDICTED
             Predicted Good  Predicted Bad
Actual Good  35              4
Actual Bad   4               70
PROFILE OF CLASSES
• Good quality class has average acid durability = 6 and Strength = 7
• Bad quality class has average acid durability = 3 and Strength = 4
Standard output 2 : Predicted class
Acid durability  Strength  Predicted class  Probability
7                7         Good             0.6
7                4         Bad              0.4
3                4         Bad              0.5
1                4         Good             0.6
5                6         Good             0.7
4                5         Bad              0.3
7                3         Bad              0.1
Sample output 3 : Classification plot
• The less the overlap between the two classes in the plot, the better the classification done by the model
Thus, the output will contain predicted class and probability columns, a confusion matrix and a classification plot
Limitations :
• Data needs to be scaled [(x - min(x)) / (max(x) - min(x))] before being input to the algorithm; otherwise variables with large ranges dominate the distance, which can lead to a high % of misclassification and in turn low accuracy
• Not well suited to categorical predictor variables, since the distance calculation needs numeric values
• Individual variable importance cannot be measured, i.e. which variable(s) are most important or contribute most to the classification model
• For instance, age/income might be impactful variables, or determinant factors, when classifying applicants into likely defaulters/non defaulters
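The scaling mentioned in the first limitation is min-max scaling, which maps every variable to the range 0 to 1 so that no single variable dominates the distance. A minimal sketch follows; the function name `min_max_scale` is hypothetical, the handling of a constant column is an added assumption, and the sample values are the annual income and debt to income ratio of the first three customers in the Use case 1 input dataset.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no distance information, so map it to 0.0
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Two variables on very different scales
annual_income = [105000, 92000, 110000]
debt_to_income = [9, 11, 10]

print(min_max_scale(annual_income))
print(min_max_scale(debt_to_income))
```

Without scaling, a difference of a few thousand in income would swamp any difference in the ratio when distances are computed; after scaling, both variables contribute on the same footing.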
General applications
Credit/loan approval analysis
• Given a list of a client’s transactional attributes, predict whether the client will default or not on a bank loan
Weather prediction
• Based on temperature, humidity, pressure etc., predict if it will be rainy/sunny/cold weather
Rain forecasting
• Based on temperature, humidity, pressure etc., predict if it will rain or not
Fraud analysis
• Based on various bills submitted by an employee for reimbursement of food, travel, medical expenses etc., predict the likelihood of the employee committing fraud
Use case 1
Business problem :
• A bank loans officer wants to predict whether a loan applicant will be a defaulter or non defaulter based on attributes such as loan amount, monthly installment, employment tenure, times delinquent, annual income, debt to income ratio etc.
• Here the target variable would be ‘past default status’ and the predicted class would contain the values ‘yes’ or ‘no’, representing the ‘likely to default’ and ‘unlikely to default’ classes respectively
Business benefit :
• Once classes are assigned, the bank will have a loan applicants’ dataset with each applicant labeled as “likely/unlikely to default”
• Based on these labels, the bank can easily decide whether to give a loan to an applicant and, if so, what credit limit and interest rate the applicant is eligible for given the amount of risk involved
Use case 1 : Input Dataset
Customer ID  Loan amount  Monthly installment  Annual income  Debt to income ratio  Times delinquent  Employment tenure  Past default status
1039153      21000        701.73               105000         9                     5                 4                  No
1069697      15000        483.38               92000          11                    5                 2                  No
1068120      25600        824.96               110000         10                    9                 2                  No
563175       23000        534.94               80000          9                     2                 12                 No
562842       19750        483.65               57228          11                    3                 21                 Yes
562681       25000        571.78               113000         10                    0                 9                  No
562404       21250        471.2                31008          12                    1                 12                 Yes
700159       14400        448.99               82000          20                    6                 6                  No
696484       10000        241.33               45000          18                    8                 2                  Yes
Use case 1 : Output : Predicted Class
Output : Each record will have the predicted class assigned as shown below (Column : Predicted class) :
Customer ID  Loan amount  Monthly installment  Annual income  Debt to income ratio  Times delinquent  Employment tenure  Past default status  Predicted class
1039153      21000        701.73               105000         9                     5                 4                  No                   No
1069697      15000        483.38               92000          11                    5                 2                  No                   No
1068120      25600        824.96               110000         10                    9                 2                  No                   No
563175       23000        534.94               80000          9                     2                 12                 No                   No
562842       19750        483.65               57228          11                    3                 21                 Yes                  No
562681       25000        571.78               113000         10                    0                 9                  No                   No
562404       21250        471.2                31008          12                    1                 12                 Yes                  Yes
700159       14400        448.99               82000          20                    6                 6                  No                   No
696484       10000        241.33               45000          18                    8                 2                  Yes                  Yes
Use case 1 : Output : Class profile
• As can be seen in the table below, defaulters (Class : Yes) and non defaulters (Class : No) have distinctive characteristics
• Defaulters tend to be delinquent more often and to have a higher debt to income ratio and a lower employment tenure than non defaulters
• Hence, delinquency, employment tenure and debt to income ratio are the determinant factors when classifying loan applicants into likely defaulters/non defaulters
Class (Likely to default)  Average loan amount  Average monthly installment  Average annual income  Average debt to income ratio  Average times delinquent  Average employment tenure
No                         10447.30             304.87                       66467.74               9.58                          1.69                      16.82
Yes                        7521.32              227.43                       60935.28               16.55                         6.91                      4.01
Use case 2
Business problem :
• A doctor/pharmacist wants to predict the likelihood of a new patient’s disease being cured/not cured based on various attributes of the patient such as blood pressure, hemoglobin level, sugar level, name of the drug given to the patient, name of the treatment given to the patient etc.
• Here the target variable would be ‘past cure status’ and the predicted class would contain the values ‘yes’ or ‘no’, meaning ‘likely to be cured’ and ‘not likely to be cured’ respectively
Business benefit :
• Given the body profile of a patient and the recent treatments and drugs taken by him/her, the probability of a cure can be predicted and changes in treatment/drug can be suggested if required
Use case 3
Business problem :
• An accountant/human resource manager wants to predict the likelihood of an employee defrauding the company based on the various bills he/she has submitted so far, such as food bills, travel bills and medical bills
• The target variable in this case would be ‘past fraud status’ and the predicted class would contain the values ‘yes’ or ‘no’, representing likely fraud and no fraud respectively
Business benefit :
• Such classification can prevent a company from spending unreasonably on any employee and can in turn protect the company budget by detecting such fraud beforehand
Want to Learn More?
Get in touch with us @ support@Smarten.com
And do check out the Learning section on Smarten.com
June 2018

What is Simple Linear Regression and How Can an Enterprise Use this Technique...
 
What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?
What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?
What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?
 

Recently uploaded

Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 

Recently uploaded (20)

Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 

What is KNN Classification and How Can This Analysis Help an Enterprise?

  • 1. Master the Art of Analytics: A Simplistic Explainer Series for Citizen Data Scientists. Journey Towards Augmented Analytics
  • 3. Basic Terminologies
      • Target variable: usually denoted by Y, this is the variable being predicted. It is also called the dependent, output, response or outcome variable (e.g. Satisfaction level in the table below).
      • Predictor: sometimes called an independent variable, this is a variable used to predict the target variable (e.g. Age, Marital Status and Gender in the table below).

      Age   Marital Status   Gender   Satisfaction level
      58    married          Female   High
      44    single           Female   Low
      33    married          Male     Medium
      47    married          Female   High
      33    single           Female   Medium
  • 4. Introduction
      • An instance (a data record or case) is assigned the class that is most common among its k nearest neighbors.
      • Here, k is a positive integer, typically an odd number (which avoids tied votes between two classes) between 1 and 10.
      For instance, for k = 3, the majority class of the 3 nearest neighbors of the center point (shown as a star in the image) is Class B, because two of the three circles are purple, i.e. Class B. For k = 6, the majority is Class A, because four of the six circles are yellow, i.e. Class A.
      Note: K is identified automatically by the algorithm as the value that gives the highest classification accuracy.
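The note about choosing K by accuracy can be illustrated with a minimal sketch. It assumes leave-one-out evaluation (the deck does not say how accuracy is measured) and reuses the four paper tissue records from the example in this deck; the function names `predict` and `leave_one_out_accuracy` are hypothetical.

```python
from collections import Counter

# Four labelled records: (acid durability, strength, quality)
DATA = [(7, 7, "Good"), (7, 4, "Good"), (3, 4, "Bad"), (1, 4, "Good")]

def predict(training, point, k):
    """Majority class among the k training records closest to point."""
    ranked = sorted(
        training,
        key=lambda row: (row[0] - point[0]) ** 2 + (row[1] - point[1]) ** 2,
    )
    votes = Counter(row[2] for row in ranked[:k])
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Hold out each record in turn, predict it from the rest, count the hits."""
    hits = 0
    for i, (x, y, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += predict(rest, (x, y), k) == label
    return hits / len(data)

# Assumption: only odd candidate values of k are tried, to avoid tied votes.
scores = {k: leave_one_out_accuracy(DATA, k) for k in (1, 3)}
best_k = max(scores, key=scores.get)
print(scores, best_k)  # {1: 0.5, 3: 0.75} 3
```

On this tiny dataset k = 3 scores higher than k = 1. A real tool may use a different validation scheme and a wider range of candidate values.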
  • 5. Steps:
      • Determine K, the number of nearest neighbors (in terms of distance) to check for class assignment.
      • Calculate the distance between the new instance and all the training instances.
      • Rank the training instances by distance and find the k nearest neighbors, i.e. those with the shortest distance from the new instance.
      • Gather the classes of the nearest neighbors to find the majority class.
      • Use this majority class as the final predicted class.
  • 6. Example: Input
      • Based on two attributes, acid durability and strength, we want to classify a paper tissue into good or bad quality classes:

      Acid durability (seconds)   Strength (kg/square meter)   Paper tissue quality
      7                           7                            Good
      7                           4                            Good
      3                           4                            Bad
      1                           4                            Good

      Acid durability and strength are the independent variables (predictors); paper tissue quality is the target variable (Y).
  • 7. Example: Steps: Calculate distance and ranking to find the k nearest neighbors
      • For instance, if a paper tissue's acid durability = 3 and strength = 7, take the following steps:
      • Step 1: Decide the value of k; say it is 3 (based on classification accuracy).
      • Steps 2 and 3: Calculate the distance between each input instance and the new instance, then rank the input instances by distance to find the k nearest neighbors.

      Acid durability (seconds)   Strength (kg/square meter)   Paper tissue quality   Squared distance to new instance   Rank by distance
      7                           7                            Good                   (7 - 3)^2 + (7 - 7)^2 = 16         3
      7                           4                            Good                   (7 - 3)^2 + (4 - 7)^2 = 25         4
      3                           4                            Bad                    (3 - 3)^2 + (4 - 7)^2 = 9          1
      1                           4                            Good                   (1 - 3)^2 + (4 - 7)^2 = 13         2

      The first three columns are the input dataset; the last two are derived results used to find the majority class of the k nearest neighbors. The distance column shows the squared Euclidean distance, which gives the same ranking as the distance itself.
  • 8. Example: Steps: Select the majority class of the k nearest neighbors as the predicted class
      Steps 4 and 5: The majority class of the three nearest neighbors is Good (two out of three records have class = Good), so the predicted class of the instance is Good, i.e. the quality of a paper tissue with acid durability = 3 and strength = 7 is good.
      Final output:

      Acid durability (seconds)   Strength (kg/square meter)   Paper tissue quality
      7                           7                            Good
      7                           4                            Good
      3                           4                            Bad
      1                           4                            Good
      3                           7                            Good
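The worked example above can be reproduced with a short sketch. It uses squared Euclidean distance, as in the distance column of the example; the names `squared_distance` and `knn_predict` are illustrative, not part of any product API.

```python
from collections import Counter

# Training records from the example: (acid durability, strength, quality)
TRAINING = [
    (7, 7, "Good"),
    (7, 4, "Good"),
    (3, 4, "Bad"),
    (1, 4, "Good"),
]

def squared_distance(a, b):
    """Squared Euclidean distance; it ranks neighbors the same as the true distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_predict(training, new_point, k):
    """Return (predicted class, all training records ranked by distance)."""
    ranked = sorted(
        ((squared_distance(row[:-1], new_point), row[-1]) for row in training),
        key=lambda pair: pair[0],
    )
    nearest = ranked[:k]                             # k closest records
    votes = Counter(label for _, label in nearest)   # count classes among them
    return votes.most_common(1)[0][0], ranked

predicted, ranked = knn_predict(TRAINING, (3, 7), k=3)
print(ranked)     # [(9, 'Bad'), (13, 'Good'), (16, 'Good'), (25, 'Good')]
print(predicted)  # Good
```

The three closest records are one Bad and two Good, so the vote returns Good, matching the final output table.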
  • 9. Example: Steps: Find out accuracy

                     Predicted Good   Predicted Bad
      Actual Good    35               4
      Actual Bad     4                70

      CLASSIFICATION ACCURACY: (35 + 70) / (35 + 70 + 4 + 4) = 105 / 113 ≈ 92.9%
      • Prediction accuracy is a useful criterion for assessing model performance.
      • A model with prediction accuracy of 70% or more is considered useful.
      CLASSIFICATION ERROR = 100% - accuracy ≈ 7.1%
      There is roughly a 7% chance of error in classification.
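The accuracy arithmetic can be checked with a few lines. This sketch assumes the matrix is read as (actual, predicted) pairs; because the two off-diagonal counts are equal, the result is the same either way.

```python
# Confusion matrix from the slide, keyed as (actual, predicted) -> count
confusion = {
    ("Good", "Good"): 35,
    ("Good", "Bad"): 4,
    ("Bad", "Good"): 4,
    ("Bad", "Bad"): 70,
}

total = sum(confusion.values())                       # 113 records
correct = sum(n for (actual, predicted), n in confusion.items()
              if actual == predicted)                 # 105 on the diagonal
accuracy = correct / total
error = 1 - accuracy
print(f"accuracy = {accuracy:.1%}, error = {error:.1%}")
# accuracy = 92.9%, error = 7.1%
```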
  • 11. Standard output 1: Model summary
      ACTUAL VERSUS PREDICTED

                     Predicted Good   Predicted Bad
      Actual Good    35               4
      Actual Bad     4                70

      PROFILE OF CLASSES
      • Good quality class has average acid durability = 6 and strength = 7.
      • Bad quality class has average acid durability = 3 and strength = 4.
  • 12. Standard output 2: Predicted class

      Acid durability   Strength   Predicted class   Probability
      7                 7          Good              0.6
      7                 4          Bad               0.4
      3                 4          Bad               0.5
      1                 4          Good              0.6
      5                 6          Good              0.7
      4                 5          Bad               0.3
      7                 3          Bad               0.1
  • 13. Sample output 3: Classification plot
      • The less the two classes overlap in the plot, the better the classification done by the model.
      Thus, the output contains the predicted class and probability columns, the confusion matrix and the classification plot.
  • 14. Limitations:
      • Data needs to be scaled, using (x - min(x)) / (max(x) - min(x)), before it is fed to the algorithm. Otherwise variables with large ranges dominate the distance calculation, which can lead to a high percentage of misclassification and, in turn, low accuracy.
      • Not well suited to categorical predictors, because the distance calculation needs numeric inputs.
      • Individual variable importance cannot be measured, i.e. the model does not show which variables contribute most to the classification.
      • For instance, age or income might be impactful variables (determinant factors) when classifying applicants into likely defaulters and non defaulters, but KNN does not report this.
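The scaling formula in the first limitation can be sketched as follows. The helper name `min_max_scale` is hypothetical, and the handling of a constant column (mapping it to zeros) is an assumption, since the formula is undefined when max(x) = min(x).

```python
def min_max_scale(values):
    """Rescale numbers to the range [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no distance information,
        # so map it to zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]

# Acid durability values from the paper tissue example
acid_durability = [7, 7, 3, 1]
scaled = min_max_scale(acid_durability)
print(scaled)  # smallest value maps to 0.0, largest to 1.0
```

After scaling, every predictor contributes on the same 0 to 1 range, so a variable such as annual income no longer swamps one such as debt to income ratio.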
  • 15. General applications
      Credit/loan approval analysis
      • Given a list of a client's transactional attributes, predict whether the client will default on a bank loan.
      Weather prediction
      • Based on temperature, humidity, pressure etc., predict whether the weather will be rainy, sunny or cold.
      Rain forecasting
      • Based on temperature, humidity, pressure etc., predict whether it will rain or not.
      Fraud analysis
      • Based on the bills an employee submits for reimbursement of food, travel, medical expenses etc., predict the likelihood of the employee committing fraud.
  • 16. Use case 1
      Business problem:
      • A bank loans officer wants to predict whether a loan applicant will be a defaulter or non defaulter based on attributes such as loan amount, monthly installment, employment tenure, times delinquent, annual income, debt to income ratio etc.
      • Here the target variable is 'past default status', and the predicted class contains the values 'yes' or 'no', representing the 'likely to default' and 'unlikely to default' classes respectively.
      Business benefit:
      • Once classes are assigned, the bank will have a loan applicants' dataset with each applicant labeled as likely or unlikely to default.
      • Based on these labels, the bank can easily decide whether to give a loan to an applicant and, if so, what credit limit and interest rate the applicant is eligible for given the amount of risk involved.
  • 17. Use case 1: Input dataset

      Customer ID   Loan amount   Monthly installment   Annual income   Debt to income ratio   Times delinquent   Employment tenure   Past default status
      1039153       21000         701.73                105000          9                      5                  4                   No
      1069697       15000         483.38                92000           11                     5                  2                   No
      1068120       25600         824.96                110000          10                     9                  2                   No
      563175        23000         534.94                80000           9                      2                  12                  No
      562842        19750         483.65                57228           11                     3                  21                  Yes
      562681        25000         571.78                113000          10                     0                  9                   No
      562404        21250         471.2                 31008           12                     1                  12                  Yes
      700159        14400         448.99                82000           20                     6                  6                   No
      696484        10000         241.33                45000           18                     8                  2                   Yes
  • 18. Use case 1: Output: Predicted class
      Each record will have a predicted class assigned, as shown below (column: Predicted class):

      Customer ID   Loan amount   Monthly installment   Annual income   Debt to income ratio   Times delinquent   Employment tenure   Past default status   Predicted class
      1039153       21000         701.73                105000          9                      5                  4                   No                    No
      1069697       15000         483.38                92000           11                     5                  2                   No                    No
      1068120       25600         824.96                110000          10                     9                  2                   No                    No
      563175        23000         534.94                80000           9                      2                  12                  No                    No
      562842        19750         483.65                57228           11                     3                  21                  Yes                   No
      562681        25000         571.78                113000          10                     0                  9                   No                    No
      562404        21250         471.2                 31008           12                     1                  12                  Yes                   Yes
      700159        14400         448.99                82000           20                     6                  6                   No                    No
      696484        10000         241.33                45000           18                     8                  2                   Yes                   Yes
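As a quick check on the output table, the past default status and predicted class columns can be compared record by record. This is only a sketch over the nine rows shown here, not a measure of the full model.

```python
from collections import Counter

# (past default status, predicted class) for the nine customers, in table order
results = [
    ("No", "No"), ("No", "No"), ("No", "No"), ("No", "No"), ("Yes", "No"),
    ("No", "No"), ("Yes", "Yes"), ("No", "No"), ("Yes", "Yes"),
]

confusion = Counter(results)  # counts per (actual, predicted) pair
matches = sum(1 for actual, predicted in results if actual == predicted)
print(f"{matches} of {len(results)} records match")  # 8 of 9 records match
print("missed defaulters:", confusion[("Yes", "No")])  # 1 (customer 562842)
```

The single mismatch is customer 562842, a past defaulter predicted as unlikely to default.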
  • 19. Use case 1: Output: Class profile

      Class (likely to default)   Average loan amount   Average monthly installment   Average annual income   Average debt to income ratio   Average times delinquent   Average employment tenure
      No                          10447.30              304.87                        66467.74                9.58                           1.69                       16.82
      Yes                         7521.32               227.43                        60935.28                16.55                          6.91                       4.01

      • As the table shows, defaulters (class: Yes) and non defaulters (class: No) have distinctive characteristics.
      • Defaulters tend to be delinquent more often, with a higher debt to income ratio and a shorter employment tenure than non defaulters.
      • Hence delinquency, employment tenure and debt to income ratio are the determinant factors when classifying loan applicants into likely defaulters and non defaulters.
  • 20. Use case 2
      Business problem:
      • A doctor or pharmacist wants to predict the likelihood of a new patient's disease being cured or not cured based on patient attributes such as blood pressure, hemoglobin level, sugar level, name of the drug given to the patient, name of the treatment given to the patient etc.
      • Here the target variable is 'past cure status', and the predicted class contains the values 'yes' or 'no', meaning 'likely to be cured' and 'not likely to be cured' respectively.
      Business benefit:
      • Given the body profile of a patient and the recent treatments and drugs they have taken, the probability of a cure can be predicted, and changes in treatment or drug can be suggested if required.
  • 21. Use case 3
      Business problem:
      • An accountant or human resources manager wants to predict the likelihood of an employee defrauding the company based on the bills the employee has submitted so far, such as food, travel and medical bills.
      • The target variable in this case is 'past fraud status', and the predicted class contains the values 'yes' or 'no', representing likely fraud and no fraud respectively.
      Business benefit:
      • Such classification can prevent a company from spending unreasonably on any employee and, in turn, protect the company budget by detecting such fraud beforehand.
  • 22. Want to Learn More? Get in touch with us at support@Smarten.com, and do check out the Learning section on Smarten.com. June 2018