Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
SVM CLASSIFICATION
Basic Terminologies
• Target variable: usually denoted by Y, this is the variable being predicted. It is also called the dependent variable, output variable, response variable or outcome variable (e.g., Satisfaction level in the table below)
• Predictor: sometimes called an independent variable, this is a variable used to predict the target variable (e.g., Age, Marital Status and Gender in the table below)

Age  Marital Status  Gender  Satisfaction level
58   married         Female  High
44   single          Female  Low
33   married         Male    Medium
47   married         Female  High
33   single          Female  Medium
Basic Terminologies
• Hyperplane:
  • A line (in 2D) or a plane (in 3D) that linearly separates a set of data points into classes
• Support vectors:
  • The data points nearest to the hyperplane; they "support" the separation of the dataset into predefined classes
• Margin:
  • The distance between the hyperplane and the nearest data point from either class
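The three terms above can be made concrete with a few lines of Python. This is a minimal sketch, not a fitted SVM: the points, the weights w and the offset b are hypothetical values chosen by hand, and the function names are illustrative. It uses the standard formula for the distance from a point x to the hyperplane w·x + b = 0, which is |w·x + b| / ||w||.

```python
import math

def distance_to_hyperplane(point, w, b):
    """Perpendicular distance from a point to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, point))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))

def margin_and_support_vectors(points, w, b, tol=1e-9):
    """Return the smallest distance (the margin) and the points that attain it."""
    distances = [distance_to_hyperplane(p, w, b) for p in points]
    margin = min(distances)
    support = [p for p, d in zip(points, distances) if d - margin <= tol]
    return margin, support

# Hypothetical 2D data and a hand-picked separating line x1 + x2 - 5 = 0
points = [(1.0, 2.0), (2.0, 1.0), (4.0, 4.0), (5.0, 3.0)]
w, b = (1.0, 1.0), -5.0

margin, support = margin_and_support_vectors(points, w, b)
print(f"margin = {margin:.3f}")      # margin = 1.414
print("nearest points:", support)
```

Here the two points closest to the line play the role of support vectors for this particular line. A trained SVM would additionally search for the line that makes this smallest distance as large as possible.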
Introduction with Example
Introduction
SVMs are based on the idea of finding a hyperplane that best divides a
dataset into predefined classes.
The goal is to choose the hyperplane with the greatest possible margin
between the hyperplane and any point in the training set, giving new data
a greater chance of being classified correctly
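The "greatest possible margin" goal is commonly written as an optimization problem. The sketch below covers only the linearly separable case and uses symbols that do not appear in the slides: training points x_i, class labels y_i in {-1, +1}, a weight vector w and an offset b.

```latex
\begin{aligned}
\min_{w,\,b}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2} \\
\text{subject to}\quad & y_i\,(w^{\top} x_i + b) \ge 1, \qquad i = 1, \dots, n
\end{aligned}
```

Under this scaling the nearest training points lie at distance 1/||w|| from the hyperplane, so making ||w|| as small as possible makes the margin as large as possible.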
Example : Input
Let’s conduct the SVM classification on the following variables:
Target variable (Y): Default Status
Independent variables (Xi): Age, Marital Status, Existing Loan Status, Income

Default Status  Age  Marital Status  Existing Loan Status  Income
Defaulted       58   married         no                    46,399
Not Defaulted   44   single          no                    47,971
Defaulted       33   married         yes                   52,618
Defaulted       47   married         no                    28,717
Not Defaulted   33   single          no                    41,216
Defaulted       35   married         no                    34,372
Not Defaulted   28   single          yes                   64,811
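An SVM works on numeric vectors, so text columns such as Marital Status have to be turned into numbers before training. The sketch below shows one possible way to do that for the seven example rows. The 0/1 encoding, the min-max scaling step and all function names are assumptions made for illustration; the slides do not say how the tool prepares its input.

```python
# Rows copied from the example table:
# (default status, age, marital status, existing loan status, income)
rows = [
    ("Defaulted", 58, "married", "no", 46399),
    ("Not Defaulted", 44, "single", "no", 47971),
    ("Defaulted", 33, "married", "yes", 52618),
    ("Defaulted", 47, "married", "no", 28717),
    ("Not Defaulted", 33, "single", "no", 41216),
    ("Defaulted", 35, "married", "no", 34372),
    ("Not Defaulted", 28, "single", "yes", 64811),
]

def encode_row(age, marital_status, existing_loan, income):
    """Illustrative encoding: married -> 1.0, anything else -> 0.0; yes -> 1.0, no -> 0.0."""
    return [
        float(age),
        1.0 if marital_status == "married" else 0.0,
        1.0 if existing_loan == "yes" else 0.0,
        float(income),
    ]

def min_max_scale(matrix):
    """Rescale every column to the range [0, 1] so no feature dominates by size alone."""
    columns = list(zip(*matrix))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in matrix
    ]

X = [encode_row(*row[1:]) for row in rows]                    # independent variables (Xi)
y = [1 if row[0] == "Defaulted" else -1 for row in rows]      # target variable (Y)
X_scaled = min_max_scale(X)
```

Scaling is included because Income is several orders of magnitude larger than Age, and distance-based methods such as SVM are sensitive to that difference.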
Example : Output 1

Age  Marital Status  Existing Loan Status  Income  Default Status  Predicted class
58   married         no                    46,399  Defaulted       Defaulted
44   single          no                    47,971  Not Defaulted   Not Defaulted
33   married         yes                   52,618  Defaulted       Defaulted
47   married         no                    28,717  Defaulted       Defaulted
33   single          no                    41,216  Not Defaulted   Not Defaulted
35   married         no                    34,372  Defaulted       Not Defaulted
28   single          yes                   64,811  Not Defaulted   Defaulted

• Thus, each existing or new instance is assigned a predicted class
Example : Output 2
Classification Accuracy : (35 + 70) / (35 + 70 + 4 + 4) = 105 / 113 ≈ 92.9%
• Prediction accuracy is a useful criterion for assessing model performance
• A model with prediction accuracy >= 70% is considered useful
Classification Error = 100% - Accuracy ≈ 7.1%
About 7.1% of the records are misclassified

Actual versus predicted (rows: actual, columns: predicted)

               Defaulted  Not defaulted
Defaulted      35         4
Not defaulted  4          70
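The accuracy and error figures can be recomputed directly from the four cells of the confusion matrix. A minimal sketch, with the dictionary layout and the function name chosen here for illustration:

```python
# Confusion matrix from the slide, keyed as (actual class, predicted class)
confusion = {
    ("Defaulted", "Defaulted"): 35,
    ("Defaulted", "Not defaulted"): 4,
    ("Not defaulted", "Defaulted"): 4,
    ("Not defaulted", "Not defaulted"): 70,
}

def accuracy(matrix):
    """Share of records whose predicted class equals the actual class."""
    correct = sum(n for (actual, predicted), n in matrix.items() if actual == predicted)
    total = sum(matrix.values())
    return correct / total

acc = accuracy(confusion)
error = 1.0 - acc
print(f"accuracy = {acc:.1%}, error = {error:.1%}")  # accuracy = 92.9%, error = 7.1%
```

The diagonal cells (35 and 70) are the correct predictions; the off-diagonal cells (4 and 4) are the misclassified records.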
Standard Input Parameters & Sample UI
Sample output 1 : Predicted class

Age  Marital Status  Existing Loan Status  Income  Default Status  Predicted class
58   married         no                    46,399  Defaulted       Defaulted
44   single          no                    47,971  Not Defaulted   Not Defaulted
33   married         yes                   52,618  Defaulted       Defaulted
47   married         no                    28,717  Defaulted       Defaulted
33   single          no                    41,216  Not Defaulted   Not Defaulted
35   married         no                    34,372  Defaulted       Not Defaulted
28   single          yes                   64,811  Not Defaulted   Defaulted
42   divorced        no                    53,000  Not Defaulted   Not Defaulted
58   married         no                    41,375  Defaulted       Defaulted
43   single          no                    53,778  Not Defaulted   Defaulted
Sample output 2 : Model Summary

ACTUAL VERSUS PREDICTED (rows: actual, columns: predicted)

             Default  Non default
Default      35       4
Non default  4        70

PROFILE OF CLASSES

Class          Average loan amount  Average annual income  Average Age
Non defaulter  10447.30             66467.74               40
Defaulter      7521.32              60935.28               26
Sample output 3 : Classification plot
• The less the two classes overlap in the plot, the better the classification done by the model
Thus, the output will contain a predicted class column, a confusion matrix, a class profile and a classification plot
Limitations
• Processing time of SVM algorithm on large datasets can be high
• Less effective on datasets with overlapping classes
Business use cases
General applications
CREDIT/LOAN APPROVAL ANALYSIS
• Given a list of a client’s transactional attributes, predict whether the client will default on a bank loan
MEDICAL DIAGNOSIS
• Given a list of symptoms, predict whether a patient has disease X
RAIN FORECASTING
• Based on temperature, humidity, pressure etc., predict whether it will rain
TREATMENT EFFECTIVENESS ANALYSIS
• Based on a patient’s body attributes such as blood pressure, sugar and hemoglobin, the name of the drug taken, the type of treatment taken etc., check the likelihood of a disease being cured
FRAUD ANALYSIS
• Based on the bills an employee has submitted for reimbursement of food, travel, medical expenses etc., predict the likelihood of the employee committing fraud
Use case 1
• Business problem :
• A bank loans officer wants to predict whether a loan applicant will be a defaulter or a
non defaulter based on attributes such as Loan amount, Monthly installment,
Employment tenure, Times delinquent, Annual income, Debt to income ratio etc.
• Here the target variable would be ‘past default status’, and the predicted class would
contain the values ‘yes’ or ‘no’, representing the ‘likely to default’ and ‘unlikely to default’
classes respectively
• Business benefit:
• Once classes are assigned, the bank will have a loan applicants’ dataset with each
applicant labeled as “likely to default” or “unlikely to default”
• Based on these labels, the bank can decide whether to give a loan to an applicant
and, if so, what credit limit and interest rate the applicant is eligible for, given the
amount of risk involved
Use case 1 : Input Dataset

Customer ID  Loan amount  Monthly installment  Annual income  Debt to income ratio  Times delinquent  Employment tenure  Past default status
1039153      21000        701.73               105000         9                     5                 4                  No
1069697      15000        483.38               92000          11                    5                 2                  No
1068120      25600        824.96               110000         10                    9                 2                  No
563175       23000        534.94               80000          9                     2                 12                 No
562842       19750        483.65               57228          11                    3                 21                 Yes
562681       25000        571.78               113000         10                    0                 9                  No
562404       21250        471.2                31008          12                    1                 12                 Yes
700159       14400        448.99               82000          20                    6                 6                  No
696484       10000        241.33               45000          18                    8                 2                  Yes
Use case 1 : Output : Predicted Class
Output : Each record will have the predicted class assigned as shown below (Column : Predicted class) :

Customer ID  Loan amount  Monthly installment  Annual income  Debt to income ratio  Times delinquent  Employment tenure  Past default status  Predicted class
1039153      21000        701.73               105000         9                     5                 4                  No                   No
1069697      15000        483.38               92000          11                    5                 2                  No                   No
1068120      25600        824.96               110000         10                    9                 2                  No                   No
563175       23000        534.94               80000          9                     2                 12                 No                   No
562842       19750        483.65               57228          11                    3                 21                 Yes                  No
562681       25000        571.78               113000         10                    0                 9                  No                   No
562404       21250        471.2                31008          12                    1                 12                 Yes                  Yes
700159       14400        448.99               82000          20                    6                 6                  No                   No
696484       10000        241.33               45000          18                    8                 2                  Yes                  Yes
Use case 1 : Output : Class profile

Class (Likely to default)  Average loan amount  Average monthly installment  Average annual income  Average debt to income ratio  Average times delinquent  Average employment tenure
No                         10447.3              304.87                       66467.74               9.58                          1.69                      16.82
Yes                        7521.32              227.43                       60935.28               16.55                         6.91                      4.01

As can be seen in the table above, defaulters (Class : Yes) and non defaulters (Class : No)
have distinctive characteristics
Defaulters tend to be delinquent more often, have a higher debt to income ratio and have a
shorter employment tenure than non defaulters
Hence, delinquency, employment tenure and debt to income ratio are the determining
factors when classifying loan applicants into likely defaulters and non defaulters
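A class profile of this kind is a per-class average of each attribute. The sketch below computes it for three attributes using only the nine sample rows from the predicted class table, so its numbers differ from the profile figures quoted in this section, which evidently come from a larger dataset. The field names and the function name are illustrative.

```python
from collections import defaultdict

# Nine sample rows from the predicted class table:
# (debt to income ratio, times delinquent, employment tenure, predicted class)
rows = [
    (9, 5, 4, "No"),
    (11, 5, 2, "No"),
    (10, 9, 2, "No"),
    (9, 2, 12, "No"),
    (11, 3, 21, "No"),
    (10, 0, 9, "No"),
    (12, 1, 12, "Yes"),
    (20, 6, 6, "No"),
    (18, 8, 2, "Yes"),
]
FIELDS = ("debt_to_income", "times_delinquent", "employment_tenure")

def class_profile(records):
    """Average every numeric field separately for each predicted class."""
    grouped = defaultdict(list)
    for *values, predicted in records:
        grouped[predicted].append(values)
    return {
        label: {name: sum(col) / len(col) for name, col in zip(FIELDS, zip(*members))}
        for label, members in grouped.items()
    }

profile = class_profile(rows)
for label, averages in profile.items():
    print(label, averages)
```

With so few rows the averages are only a toy illustration of the calculation, not evidence about defaulters in general.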
Use case 2
Business problem :
• A doctor or pharmacist wants to predict the likelihood of a new patient’s disease
being cured based on attributes of the patient such as blood pressure, hemoglobin
level, sugar level, the name of the drug given to the patient, the name of the
treatment given to the patient etc.
• Here the target variable would be ‘past cure status’, and the predicted class would
contain the values ‘yes’ or ‘no’, meaning ‘likely to be cured’ and ‘unlikely to be cured’
respectively
Business benefit:
• Given the body profile of a patient and the recent treatments and drugs he or she has
taken, the probability of a cure can be predicted, and changes in treatment or drug
can be suggested if required
Use case 3
Business problem :
• An accountant or human resource manager wants to predict the likelihood of an
employee defrauding the company, based on the bills he or she has submitted so far,
such as food bills, travel bills and medical bills
• The target variable in this case would be ‘past fraud status’, and the predicted class
would contain the values ‘yes’ or ‘no’, representing likely fraud and no fraud
respectively
Business benefit:
• Such classification can prevent a company from spending unreasonably on any
employee and, by detecting such fraud beforehand, help protect the company budget
Want to Learn More?
Get in touch with us at support@Smarten.com
And do check out the Learning section on Smarten.com
June 2018

What is SVM Classification Analysis and How Can It Benefit Business Analytics?

  • 1. Master the Art of Analytics A Simplistic Explainer Series For Citizen Data Scientists J o u r n e y To w a r d s A u g m e n t e d A n a l y t i c s
  • 3. Basic Terminologies  Target variable usually denoted by Y , is the variable being predicted and is also called dependent variable, output variable, response variable or outcome variable (Ex : One highlighted in red box in table below)  Predictor, sometimes called an independent variable, is a variable that is being used to predict the target variable ( Ex : variables highlighted in green box in table below ) Age Marital Status Gender Satisfaction level 58 married Female High 44 single Female Low 33 married Male Medium 47 married Female High 33 single Female Medium
  • 4. Basic Terminologies  Hyperplane:  It is a line(in 2D) and a plane(in 3D) that linearly separates and classifies a set of data as shown in image in right  Support vectors :  Support vectors are the data points nearest to the hyperplane boundary and "support" the separation of datasets into predefined classes  Margin:  The distance between the hyperplane and the nearest data point from either set is known as the margin
  • 6. Introduction SVMs are based on the idea of finding a hyperplane that best divides a dataset into predefined classes, as shown in the image below. The goal is to choose a hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being classified correctly
  • 7. Example : Input Let’s conduct the SVM classification on following variables : Default Status Age Marital Status Existing Loan Status Income Defaulted 58 married no 46,399 Not Defaulted 44 single no 47,971 Defaulted 33 married yes 52,618 Defaulted 47 married no 28,717 Not Defaulted 33 single no 41,216 Defaulted 35 married no 34,372 Not Defaulted 28 single yes 64,811 Independent variables (Xi)Target Variable (Y)
  • 8. Example : Output 1

      Age   Marital Status   Existing Loan Status   Income   Default Status   Predicted class
      58    married          no                     46,399   Defaulted        Defaulted
      44    single           no                     47,971   Not Defaulted    Not Defaulted
      33    married          yes                    52,618   Defaulted        Defaulted
      47    married          no                     28,717   Defaulted        Defaulted
      33    single           no                     41,216   Not Defaulted    Not Defaulted
      35    married          no                     34,372   Defaulted        Not Defaulted
      28    single           yes                    64,811   Not Defaulted    Defaulted

      Thus, each existing or new instance is assigned a predicted class.
  • 9. Example : Output 2
      Actual versus predicted:

                                Predicted
                                Defaulted   Not defaulted
      Actual   Defaulted        35          4
               Not defaulted    4           70

      Classification accuracy = (35 + 70) / (35 + 70 + 4 + 4) = 105 / 113 ≈ 93%
      Classification error = 100% - accuracy ≈ 7%
      • Prediction accuracy is a useful criterion for assessing model performance
      • A model with prediction accuracy >= 70% is considered useful
      • There is roughly a 7% chance of error in classification
  • 12. Sample output 1 : Predicted class

      Age   Marital Status   Existing Loan Status   Income   Default Status   Predicted class
      58    married          no                     46,399   Defaulted        Defaulted
      44    single           no                     47,971   Not Defaulted    Not Defaulted
      33    married          yes                    52,618   Defaulted        Defaulted
      47    married          no                     28,717   Defaulted        Defaulted
      33    single           no                     41,216   Not Defaulted    Not Defaulted
      35    married          no                     34,372   Defaulted        Not Defaulted
      28    single           yes                    64,811   Not Defaulted    Defaulted
      42    divorced         no                     53,000   Not Defaulted    Not Defaulted
      58    married          no                     41,375   Defaulted        Defaulted
      43    single           no                     53,778   Not Defaulted    Defaulted
  • 13. Sample output 2 : Model Summary
      Actual versus predicted:

                              Predicted
                              Default   Non default
      Actual   Default        35        4
               Non default    4         70

      Profile of classes:

      Class           Average loan amount   Average annual income   Average Age
      Non defaulter   10447.30              66467.74                40
      Defaulter       7521.32               60935.28                26
  • 14. Sample output 3 : Classification plot
      • The less the two classes overlap in the plot, the better the classification done by the model
      Thus, the output will contain a predicted class column, a confusion matrix, a class profile, and a classification plot.
  • 15. Limitations
      • The processing time of the SVM algorithm on large datasets can be high
      • It is less effective on datasets with overlapping classes
  • 17. General applications
      • Credit/loan approval analysis: given a list of a client's transactional attributes, predict whether the client will default on a bank loan.
      • Medical diagnosis: given a list of symptoms, predict whether a patient has disease X.
      • Rain forecasting: based on temperature, humidity, pressure, etc., predict whether it will rain.
      • Treatment effectiveness analysis: based on a patient's body attributes such as blood pressure, sugar, and hemoglobin, along with the drug and type of treatment taken, check the likelihood of a disease being cured.
      • Fraud analysis: based on the bills an employee submits for reimbursement of food, travel, medical expenses, etc., predict the likelihood of the employee committing fraud.
  • 18. Use case 1
      Business problem:
      • A bank loan officer wants to predict whether a loan applicant will be a defaulter or non-defaulter based on attributes such as loan amount, monthly installment, employment tenure, times delinquent, annual income, debt-to-income ratio, etc.
      • Here the target variable would be 'past default status', and the predicted class would contain the values 'yes' or 'no', representing the 'likely to default' and 'unlikely to default' classes respectively.
      Business benefit:
      • Once classes are assigned, the bank will have a loan applicants' dataset with each applicant labeled as "likely" or "unlikely" to default.
      • Based on these labels, the bank can decide whether to give a loan to an applicant and, if so, what credit limit and interest rate the applicant is eligible for given the amount of risk involved.
  • 19. Use case 1 : Input Dataset

      Customer ID   Loan amount   Monthly installment   Annual income   Debt to income ratio   Times delinquent   Employment tenure   Past default status
      1039153       21000         701.73                105000          9                      5                  4                   No
      1069697       15000         483.38                92000           11                     5                  2                   No
      1068120       25600         824.96                110000          10                     9                  2                   No
      563175        23000         534.94                80000           9                      2                  12                  No
      562842        19750         483.65                57228           11                     3                  21                  Yes
      562681        25000         571.78                113000          10                     0                  9                   No
      562404        21250         471.2                 31008           12                     1                  12                  Yes
      700159        14400         448.99                82000           20                     6                  6                   No
      696484        10000         241.33                45000           18                     8                  2                   Yes
  • 20. Use case 1 : Output : Predicted Class
      Each record will have a predicted class assigned, as shown below (column: Predicted class):

      Customer ID   Loan amount   Monthly installment   Annual income   Debt to income ratio   Times delinquent   Employment tenure   Past default status   Predicted class
      1039153       21000         701.73                105000          9                      5                  4                   No                    No
      1069697       15000         483.38                92000           11                     5                  2                   No                    No
      1068120       25600         824.96                110000          10                     9                  2                   No                    No
      563175        23000         534.94                80000           9                      2                  12                  No                    No
      562842        19750         483.65                57228           11                     3                  21                  Yes                   No
      562681        25000         571.78                113000          10                     0                  9                   No                    No
      562404        21250         471.2                 31008           12                     1                  12                  Yes                   Yes
      700159        14400         448.99                82000           20                     6                  6                   No                    No
      696484        10000         241.33                45000           18                     8                  2                   Yes                   Yes
  • 21. Use case 1 : Output : Class profile

      Class (Likely to default)   Average loan amount   Average monthly installment   Average annual income   Average debt to income ratio   Average times delinquent   Average employment tenure
      No                          10447.3               304.87                        66467.74                9.58                           1.69                       16.82
      Yes                         7521.32               227.43                        60935.28                16.55                          6.91                       4.01

      As can be seen in the table above, defaulters (Class: Yes) and non-defaulters (Class: No) have distinctive characteristics.
      Compared to non-defaulters, defaulters tend to have more delinquencies, a higher debt-to-income ratio, and a shorter employment tenure.
      Hence, delinquency, employment tenure, and debt-to-income ratio are the determining factors when classifying loan applicants as likely defaulters or non-defaulters.
  • 22. Use case 2
      Business problem:
      • A doctor or pharmacist wants to predict the likelihood of a new patient's disease being cured based on attributes of the patient such as blood pressure, hemoglobin level, sugar level, the name of the drug given, the name of the treatment given, etc.
      • Here the target variable would be 'past cure status', and the predicted class would contain the values 'yes' or 'no', meaning 'likely to be cured' and 'unlikely to be cured' respectively.
      Business benefit:
      • Given the body profile of a patient and the recent treatments and drugs taken, the probability of a cure can be predicted, and changes in treatment or drug can be suggested if required.
  • 23. Use case 3
      Business problem:
      • An accountant or human resources manager wants to predict the likelihood of an employee defrauding the company based on the bills the employee has submitted so far, such as food, travel, and medical bills.
      • The target variable in this case would be 'past fraud status', and the predicted class would contain the values 'yes' or 'no', representing likely fraud and no fraud respectively.
      Business benefit:
      • Such classification can prevent a company from spending unreasonably on any employee and can in turn protect the company budget by detecting such fraud beforehand.
  • 24. Want to Learn More? Get in touch with us at support@Smarten.com, and do check out the Learning section on Smarten.com. June 2018
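The terminology on slide 4 (hyperplane, margin, support vectors) can be made concrete with a small sketch. The points and the line `x1 + x2 = 4` below are hypothetical toy values, not data from the slides, and the code only measures the margin of a given separating line; it does not train an SVM.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm

def margin_and_support_vectors(w, b, points, tol=1e-9):
    """Return the margin (smallest distance) and the points that attain it."""
    distances = [distance_to_hyperplane(w, b, p) for p in points]
    margin = min(distances)
    support = [p for p, d in zip(points, distances) if d - margin <= tol]
    return margin, support

# Hypothetical 2D toy data: the first three points form one class,
# the last three form the other.
points = [(1.0, 1.0), (2.0, 0.0), (0.0, 0.0),
          (3.0, 3.0), (4.0, 2.0), (4.0, 4.0)]

# Assumed separating line x1 + x2 - 4 = 0, i.e. w = (1, 1), b = -4.
w, b = (1.0, 1.0), -4.0

margin, support = margin_and_support_vectors(w, b, points)
print("margin:", margin)
print("support vectors:", support)
```

An SVM would search over candidate lines for the one that makes this margin as large as possible; the points returned as support vectors are the only ones that determine it.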
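The accuracy arithmetic on slide 9 can be checked with a short sketch that uses the confusion-matrix counts from that slide (35, 4, 4, 70). The function name `accuracy_from_confusion` is illustrative only and does not come from any library.

```python
def accuracy_from_confusion(matrix):
    """Share of correctly classified instances: diagonal sum over total."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Rows are actual classes, columns are predicted classes
# (Defaulted, Not defaulted), as on slide 9.
confusion = [
    [35, 4],
    [4, 70],
]

accuracy = accuracy_from_confusion(confusion)
error = 1.0 - accuracy
print(f"accuracy = {accuracy:.1%}, error = {error:.1%}")
# prints "accuracy = 92.9%, error = 7.1%"
```

The exact ratio is 105 / 113, so accuracy is about 92.9% and the classification error about 7.1%.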