Threshold setting for reduction of false positives
1. Threshold setting for a high prediction rate with low false positives
Improving the performance of supervised classification
2. Why do we need a low false positive rate?
Let us take the example of a cancer prediction problem. If our model
predicts that one of our patients is going to have cancer when they
actually are not, we inflict needless trauma on the patient and their
family. In other words, while we want to accurately identify the
people who are going to have cancer, we do not want to falsely
predict cancer for someone who is not going to have it.
Hence, when we build a classification model, we need to ensure that it is
tested correctly and that its false positive rate is as low as possible
without compromising the classification accuracy of the model.
3. Testing the Classification Model
Testing requires two parameters to be observed:
• Sensitivity = (number of true positives predicted) / (total number of positives)
• Specificity = (number of true negatives predicted) / (total number of negatives)
Sensitivity can be intuitively thought of as the predictive (classifying) accuracy of the model on the positive class
(e.g. how correctly we predict the patients who have cancer).
Specificity can be intuitively thought of as the predictive (classifying) accuracy of the model on the negative
class (e.g. how correctly we predict the patients who do not have cancer).
4. For example
There is a sample of 2000 patients, 20 of whom have ovarian cancer.
The classification model built by a healthcare company predicts that 22
patients have ovarian cancer, 15 of whom actually do.
What are the sensitivity and specificity?
Sensitivity = 15/20 = 0.75
The model makes 22 − 15 = 7 false positive predictions, so of the 1980 patients
without cancer, 1980 − 7 = 1973 are correctly classified as negative:
Specificity = 1973/1980 ≈ 0.996
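The arithmetic above can be sketched in a few lines of Python; the counts are taken directly from the worked example:

```python
# Counts from the worked example: 2000 patients, 20 with ovarian cancer;
# the model flags 22 as positive, 15 of which truly have cancer.
total = 2000
actual_pos = 20
predicted_pos = 22
true_pos = 15

false_pos = predicted_pos - true_pos    # 7 healthy patients wrongly flagged
actual_neg = total - actual_pos         # 1980 patients without cancer
true_neg = actual_neg - false_pos       # 1973 correctly classified negatives

sensitivity = true_pos / actual_pos     # accuracy on the positive class
specificity = true_neg / actual_neg     # accuracy on the negative class

print(f"Sensitivity = {sensitivity:.2f}")   # Sensitivity = 0.75
print(f"Specificity = {specificity:.4f}")   # Specificity = 0.9965
```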
5. ROC Curve Analysis
• ROC curve: a plot of sensitivity vs. false positive rate
• Each point corresponds to a different threshold that separates negative samples
from positive samples
• The objective is to find a point (threshold) where the prediction rate is high
(high sensitivity) and the false positive rate is low
Source: The Use of Decision Threshold Adjustment in Classification of Cancer Prediction, http://www.ams.sunysb.edu/~hahn/psfile/papthres.pdf
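The threshold sweep behind an ROC curve can be sketched in plain Python. The scores and labels below are made up for illustration, and picking the point that maximizes sensitivity minus false positive rate (Youden's J statistic) is one common way to formalize "high sensitivity, low false positive rate"; it is not the only criterion:

```python
# Hypothetical model scores for 10 patients and their true labels
# (1 = cancer, 0 = healthy). Both lists are illustrative, not real data.
scores = [0.95, 0.90, 0.80, 0.70, 0.65, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    1,    0,    1,    0,    0,    0,    0,    0]

def roc_points(scores, labels):
    """Sensitivity and false positive rate at each candidate threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores)):
        # Predict "cancer" for every patient whose score reaches the threshold.
        tpr = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1) / pos
        fpr = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0) / neg
        points.append((t, tpr, fpr))
    return points

# Choose the threshold maximizing Youden's J = sensitivity - FPR.
best = max(roc_points(scores, labels), key=lambda p: p[1] - p[2])
print(f"threshold={best[0]:.2f}, sensitivity={best[1]:.2f}, FPR={best[2]:.2f}")
# threshold=0.65, sensitivity=1.00, FPR=0.17
```

Raising the threshold trades sensitivity for a lower false positive rate; each `(tpr, fpr)` pair returned by `roc_points` is one point on the ROC curve.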
7. Cases
• Breast Cancer Prediction – 0.98
• Fraud detection – 0.92
Source: http://www.gcxanalytics.com/papers/GCX%20Fraud%20Detection%20Performance%20Evaluation-GCX.pdf