Data Analysis for Credit Card
Fraud Detection
Alejandro Correa Bahnsen
Luxembourg University
Introduction
[Figure: Europe fraud evolution. Internet transactions (millions of euros); x-axis 2007, 2008, 2009, 2010, 2011E, 2012E; y-axis from € 500 to € 800.]
Introduction
[Figure: US fraud evolution. Online revenue lost due to fraud (billions of dollars); x-axis 2001 to 2012; y-axis from $0 to $5.0.]
Simplified transaction flow
[Diagram: transaction flow; surviving labels: "Network", "Fraud??".]
Agenda
• Introduction
• Database
• Evaluation of algorithms
• Logistic Regression
• Financial measure
• Cost Sensitive Logistic Regression
Database
• Large European card processing company
• 2012 card present transactions
• 750,000 transactions
• 3,500 frauds
• 0.467% fraud rate
• 148,562 EUR lost due to fraud on the test dataset
[Figure: timeline of the months Jan to Dec, divided into Train and Test periods.]
Database
• Raw attributes

TRXID  Client ID  Date         Amount  Location  Type      Merchant Group  Fraud
1      1          2/1/12 6:00  580     Lux       Internet  Airlines        No
2      1          2/1/12 6:15  120     Lux       Present   Car Renting     No
3      2          2/1/12 8:20  12      Bel       Present   Hotel           Yes
4      1          3/1/12 4:15  60      Lux       ATM       ATM             No
5      2          3/1/12 9:18  8       Fra       Present   Retail          No
6      1          3/1/12 9:55  1210    Lux       Internet  Airlines        Yes

• Other attributes: age, country of residence, postal code, type of card
Database
• Derived attributes

ID  Num CC  Date         Amt   Location  Type      Merchant Group  Fraud  No. of Trx (same client, last 6 hours)  Sum (same client, last 7 days)
1   1       2/1/12 6:00  580   Lux       Internet  Airlines        No     0                                       0
2   1       2/1/12 6:15  120   Lux       Present   Car Renting     No     1                                       580
3   2       2/1/12 8:20  12    Bel       Present   Hotel           Yes    0                                       0
4   1       3/1/12 4:15  60    Lux       ATM       ATM             No     0                                       700
5   2       3/1/12 9:18  8     Fra       Present   Retail          No     0                                       12
6   1       3/1/12 9:55  1210  Lux       Internet  Airlines        Yes    1                                       760

Derived attributes are built as a combination of the following criteria:

By           Group              Last      Function
Client       None               hour      Count
Credit Card  Transaction Type   day       Sum(Amount)
             Merchant           week      Avg(Amount)
             Merchant Category  month
             Merchant Group 1   3 months
             Merchant Group 2
             Merchant Country
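The two derived columns in the example table can be reproduced with a short sketch. It assumes the dates are day/month/year and that each window covers the same client's earlier transactions only, excluding the current one; the helper name `derive` is hypothetical and not from the slides.

```python
from datetime import datetime, timedelta

# Rows copied from the example table: (id, client, timestamp, amount).
# Assumption: dates on the slide are day/month/year (2/1/12 = 2 Jan 2012).
transactions = [
    (1, 1, datetime(2012, 1, 2, 6, 0), 580),
    (2, 1, datetime(2012, 1, 2, 6, 15), 120),
    (3, 2, datetime(2012, 1, 2, 8, 20), 12),
    (4, 1, datetime(2012, 1, 3, 4, 15), 60),
    (5, 2, datetime(2012, 1, 3, 9, 18), 8),
    (6, 1, datetime(2012, 1, 3, 9, 55), 1210),
]


def derive(rows, window, func):
    """Aggregate, per transaction, the same client's earlier amounts inside `window`."""
    out = []
    for _, client, ts, _ in rows:
        past = [amt for _, c, t, amt in rows
                if c == client and ts - window <= t < ts]
        out.append(func(past))
    return out


# "By" = client, "Last" = 6 hours, "Function" = Count
count_6h = derive(transactions, timedelta(hours=6), len)
# "By" = client, "Last" = 7 days, "Function" = Sum(Amount)
sum_7d = derive(transactions, timedelta(days=7), sum)

print(count_6h)
print(sum_7d)
```

Other combinations from the criteria table (for example grouping by merchant, or averaging instead of summing) would follow the same pattern with a different filter or aggregation function.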
Evaluation
• Confusion matrix

                            True class ($y_i$)
Predicted class ($c_i$)     Fraud ($y_i = 1$)   Legitimate ($y_i = 0$)
Fraud ($c_i = 1$)           TP                  FP
Legitimate ($c_i = 0$)      FN                  TN

• Misclassification = $1 - \frac{TP + TN}{TP + TN + FP + FN}$
• Recall = $\frac{TP}{TP + FN}$
• Precision = $\frac{TP}{TP + FP}$
• F-Score = $2 \, \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
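As a quick check of the four measures, the sketch below computes them from confusion-matrix counts. The counts are hypothetical and chosen only to illustrate the formulas; the zero-division guards are an added convention, not something stated on the slide.

```python
def evaluation_metrics(tp, fp, fn, tn):
    """Compute the four slide metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    misclassification = 1 - (tp + tn) / total
    # Guards: define a ratio as 0 when its denominator is 0 (e.g. no alerts raised).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {
        "misclassification": misclassification,
        "recall": recall,
        "precision": precision,
        "f_score": f_score,
    }


# Hypothetical counts, not taken from the slides.
metrics = evaluation_metrics(tp=40, fp=10, fn=60, tn=890)
print(metrics)
```

With these counts the misclassification rate is low (7%) even though 60% of the frauds are missed, which is why recall, precision and F-Score are reported alongside it on such an imbalanced dataset.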
Logistic Regression
• Model
• Cost Function
• Cost Matrix

                            True class ($y_i$)
Predicted class ($c_i$)     Fraud ($y_i = 1$)   Legitimate ($y_i = 0$)
Fraud ($c_i = 1$)           0                   1
Legitimate ($c_i = 0$)      1                   0
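The model and cost-function formulas on this slide are missing from the transcript. For reference, the textbook form of logistic regression is given below. The symbols $x_i$ (feature vector), $\theta$ (parameters), $h_\theta$ (predicted fraud probability), $J$ and $N$ (number of transactions) are the usual notation and are an assumption about what the slide showed.

```latex
h_\theta(x_i) = \frac{1}{1 + e^{-\theta^{T} x_i}}

J(\theta) = -\frac{1}{N} \sum_{i=1}^{N}
  \left[ y_i \log h_\theta(x_i) + (1 - y_i) \log\bigl(1 - h_\theta(x_i)\bigr) \right]
```

This loss treats a missed fraud and a false alert symmetrically, which matches the 0/1 cost matrix: every error costs 1 regardless of the transaction amount.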
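Logistic Regression
Under-sampling procedure
Select all the frauds and a random sample of the legitimate transactions.
[Figure: fraud rate of the training set, from 0.467% in the original data to 1%, 5%, 10%, 20% and 50% after under-sampling.]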
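The under-sampling step can be sketched as follows. The function name `undersample`, the fixed seed and the toy label vector are hypothetical; the slides do not say how the random sample was drawn.

```python
import random


def undersample(labels, target_fraud_rate, seed=0):
    """Return indices of all frauds plus a random sample of legitimate rows.

    The number of legitimate rows is chosen so that frauds make up
    roughly `target_fraud_rate` of the returned set.
    """
    rng = random.Random(seed)
    fraud_idx = [i for i, y in enumerate(labels) if y == 1]
    legit_idx = [i for i, y in enumerate(labels) if y == 0]
    n_legit = round(len(fraud_idx) * (1 - target_fraud_rate) / target_fraud_rate)
    n_legit = min(n_legit, len(legit_idx))  # cannot sample more than exist
    return sorted(fraud_idx + rng.sample(legit_idx, n_legit))


# Toy data with the same 0.467% fraud rate as the slide (35 frauds in 7,500 rows).
labels = [1] * 35 + [0] * 7465
kept = undersample(labels, target_fraud_rate=0.10)
fraud_rate = sum(labels[i] for i in kept) / len(kept)
print(len(kept), fraud_rate)
```

Note that only the training set would be under-sampled in this setup; the evaluation set should keep its original fraud rate so that the reported metrics and costs stay comparable.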
Logistic Regression
Results
[Figure: Recall, Precision, Misclassification and F1-Score (y-axis 0% to 70%) for No Model, All, and training sets under-sampled to 1%, 5%, 10%, 20% and 50% fraud rate.]
Financial evaluation
• Motivation
• False positives carry a different cost than false negatives
• Frauds range from a few to thousands of euros (dollars, pounds, etc.)
There is a need for a real comparison measure
Financial evaluation
• Cost matrix

                            True class ($y_i$)
Predicted class ($c_i$)     Fraud ($y_i = 1$)   Legitimate ($y_i = 0$)
Fraud ($c_i = 1$)           Ca                  Ca
Legitimate ($c_i = 0$)      Amt                 0

where:
Ca   Administrative costs
Amt  Amount of transaction i

• Evaluation measure
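The evaluation-measure formula itself is missing from the transcript. Reading the cost matrix cell by cell suggests one natural total, Cost = Σ_i [ y_i (c_i Ca + (1 − c_i) Amt_i) + (1 − y_i) c_i Ca ], which the sketch below implements. The value of Ca and the example amounts are hypothetical; the slides do not give them.

```python
def total_cost(y_true, y_pred, amounts, admin_cost):
    """Sum the cost-matrix entry for every transaction."""
    cost = 0.0
    for y, c, amt in zip(y_true, y_pred, amounts):
        if c == 1:
            cost += admin_cost  # predicted fraud: Ca for both TP and FP
        elif y == 1:
            cost += amt         # missed fraud (FN): the full amount is lost
        # predicted legitimate and truly legitimate (TN) costs 0
    return cost


# Hypothetical data: two frauds, two legitimate transactions.
y_true = [1, 0, 1, 0]
amounts = [500.0, 20.0, 900.0, 35.0]
CA = 5.0  # hypothetical administrative cost per alert

no_model = total_cost(y_true, [0, 0, 0, 0], amounts, CA)  # nothing flagged
flag_all = total_cost(y_true, [1, 1, 1, 1], amounts, CA)  # everything flagged
mixed = total_cost(y_true, [1, 1, 0, 0], amounts, CA)     # one TP, one FP, one FN
print(no_model, flag_all, mixed)
```

Under this reading, predicting "legitimate" for everything costs exactly the sum of the fraud amounts, which is consistent with the "No Model" cost on the results slides equalling the 148,562 EUR lost due to fraud on the test dataset.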
Logistic Regression
Results
[Figure: Cost (y-axis € 0 to € 160,000) with Recall, Precision and F1-Score (y-axis 0% to 70%) for No Model, All, and training sets under-sampled to 1%, 5%, 10%, 20% and 50% fraud rate. Annotations: "Selecting the algorithm by F1-Score" and "Selecting the algorithm by Cost".]

Cost labels on the chart, in category order:

Training set  No Model   All        1%         5%         10%       20%       50%
Cost          € 148,562  € 148,196  € 142,510  € 112,103  € 79,838  € 65,870  € 46,530
Logistic Regression
• The best model selected using the traditional F1-Score does not give the best results in terms of cost
• The model selected by cost is trained using less than 1% of the database, meaning a lot of information is excluded
• The algorithm is trained to minimize the misclassification (approx.) but is then evaluated based on cost
• Why not train the algorithm to minimize the cost instead?
Cost Sensitive Logistic Regression
• Cost Matrix

                            True class ($y_i$)
Predicted class ($c_i$)     Fraud ($y_i = 1$)   Legitimate ($y_i = 0$)
Fraud ($c_i = 1$)           Ca                  Ca
Legitimate ($c_i = 0$)      Amt                 0

• Cost Function
• Objective
Find $\theta$ that minimizes the cost function (Genetic Algorithms)
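The cost-function formula for this slide is missing from the transcript. One plausible form, assumed here, replaces the hard prediction in the cost matrix with the predicted fraud probability h, giving an expected cost per transaction. The search below is a bare-bones evolutionary loop (keep the best half, mutate copies of it), offered as a stand-in for the genetic algorithm the slide mentions, not as the author's implementation. All names, the toy data and the value of Ca are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cs_cost(theta, X, y, amt, ca):
    """Average expected cost under the cost matrix, with h = P(fraud | x)."""
    total = 0.0
    for xi, yi, ai in zip(X, y, amt):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        if yi == 1:
            total += h * ca + (1.0 - h) * ai  # TP costs Ca, FN costs Amt_i
        else:
            total += h * ca                   # FP costs Ca, TN costs 0
    return total / len(y)


def evolve(X, y, amt, ca, pop_size=30, generations=50, sigma=0.5, seed=0):
    """Minimal evolutionary search over theta (assumed stand-in for a GA)."""
    rng = random.Random(seed)
    dim = len(X[0])
    pop = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda th: cs_cost(th, X, y, amt, ca))
        parents = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        children = [[g + rng.gauss(0, sigma) for g in rng.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=lambda th: cs_cost(th, X, y, amt, ca))


# Toy data: first feature is a bias term, second is a single risk score.
X = [[1.0, 0.1], [1.0, 0.2], [1.0, 0.9], [1.0, 0.8]]
y = [0, 0, 1, 1]
amt = [20.0, 35.0, 500.0, 900.0]
CA = 5.0  # hypothetical administrative cost

theta = evolve(X, y, amt, CA)
print(theta, cs_cost(theta, X, y, amt, CA))
```

Because the objective is written directly in currency units, a derivative-free search such as this one can minimize it without requiring the cost to be convex or smooth in $\theta$, which may be one reason a genetic algorithm was chosen.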
Cost sensitive Logistic Regression
Results
[Figure: Cost (y-axis € 0 to € 160,000) with Recall, Precision and F1-Score (y-axis 0% to 100%) for No Model, All, and training sets under-sampled to 1%, 5%, 10%, 20% and 50% fraud rate.]

Cost labels on the chart, in category order:

Training set  No Model   All       1%        5%        10%       20%       50%
Cost          € 148,562  € 31,174  € 37,785  € 66,245  € 67,264  € 73,772  € 85,724
Cost sensitive Logistic Regression
Results
[Figure: Cost (y-axis € 0 to € 160,000) with Recall, Precision and F1-Score (y-axis 0% to 80%) for No Model, Logistic Regression and Cost Sensitive Logistic Regression.]

Cost labels on the chart:

Model                               Cost
No Model                            € 148,562
Logistic Regression                 € 46,530
Cost Sensitive Logistic Regression  € 31,174
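To put the three cost figures on a common scale, the sketch below expresses each as a saving relative to the "No Model" baseline. The savings ratio is an added illustration, not a measure shown on the slides; the cost figures are the chart labels.

```python
# Cost labels from the comparison chart (EUR).
costs = {
    "No Model": 148_562,
    "Logistic Regression": 46_530,
    "Cost Sensitive Logistic Regression": 31_174,
}

baseline = costs["No Model"]
# Fraction of the baseline loss avoided by each model.
savings = {name: 1 - cost / baseline for name, cost in costs.items()}

for name, value in savings.items():
    print(f"{name}: {value:.1%}")
```

On these figures, plain logistic regression avoids roughly 69% of the baseline loss and the cost-sensitive version roughly 79%.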
Conclusion
• Selecting models based on traditional statistics does not give the best results in terms of cost
• Models should be evaluated taking into account the real financial costs of the application
• Algorithms should be developed to incorporate those financial costs
Thank you!
Contact information
Alejandro Correa Bahnsen
University of Luxembourg
Luxembourg
al.bahnsen@gmail.com
http://www.linkedin.com/in/albahnsen
http://www.slideshare.net/albahnsen

Cost-Sensitive Logistic Regression for Credit Card Fraud Detection
