Population Stability Index (PSI)
How to apply PSI, a statistic widely used for scorecard
validation, to a Big Data problem.
A presentation for the American Statistical Association, Orange County, CA Chapter.
By
JEOMOAN KURIAN
Director-Risk Management Analytics, Mitsubishi UFJ Union Bank
Jeo.kurian@gmail.com
Let’s start with a super model
• A logistic regression model that predicts the probability of default.
• At least 3 versions are now active: Ver2 and Ver8 are the most common; Ver9 is the most recent.
What tells us the model needs a revision?
Front End Validation (Oversimplified)

(1)      (2)      (3)       (4)      (5)       (6)        (7)        (8)      (9)
FICO     # Dev.   # Recent  % Dev.   % Recent  Change     Ratio      WoE      PSI Portion
Range    Sample   Sample    Sample   Sample    (5) - (4)  (5) / (4)  Ln (7)   (6) * (8)
<500     4,000    2,000     17.2%    11.8%     -5.4%      0.687      -0.376   0.020
500-620  2,331    1,200     10.0%    7.1%      -2.9%      0.707      -0.347   0.010
621-660  2,448    500       10.5%    2.9%      -7.6%      0.280      -1.271   0.096
661-700  2,614    3,000     11.2%    17.7%     6.5%       1.576      0.455    0.029
701-740  2,916    2,700     12.5%    15.9%     3.4%       1.271      0.240    0.008
741-780  2,241    1,900     9.6%     11.2%     1.6%       1.164      0.152    0.002
781-820  2,664    2,400     11.4%    14.1%     2.7%       1.237      0.213    0.006
821+     4,086    3,269     17.5%    19.3%     1.7%       1.098      0.094    0.002
TOTAL    23,300   16,969    100%     100%
PSI (Sum of Column 9) = 0.173
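Written as a formula, the table multiplies the change in column (6) by the log ratio in column (8) for each bin and adds up the results. The symbols below are our own shorthand rather than the slide's: $p_i$ is the baseline share of bin $i$ (column 4), $q_i$ is the recent share (column 5), and $k$ is the number of bins.

```latex
\mathrm{PSI} \;=\; \sum_{i=1}^{k} \left(q_i - p_i\right)\,\ln\!\left(\frac{q_i}{p_i}\right)
```

Each term is non-negative, because the difference and the log ratio always share the same sign, so a shift in any single bin can only raise the total.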
[Chart: Population Stability Index. % Dev. Sample (Expected) vs. % Recent Sample, by FICO range]
[Chart: Debt to Income Ratio Distribution. % Dev. Sample vs. % Recent Sample, for ratio bands 0-10% through 51+%]
Characteristics Analysis: Let’s look at one of the explanatory
variables that caused the change.
The development sample had more people with high
debt levels, possibly a recession effect.
How does PSI help?
• It is an early indicator that something has changed
compared to a baseline.
• It is a single statistic that summarizes a set of data.
• 0.1 or less: little or no change.
• 0.1 to 0.25: some change that
requires close monitoring.
• 0.25 or higher: a major shift that
requires review.
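The calculation and the rule-of-thumb bands above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function names `psi` and `interpret` are ours, and the counts are columns (2) and (3) of the FICO table.

```python
import math

def psi(expected_counts, actual_counts):
    """Return (total PSI, per-bin contributions) from two lists of bin counts."""
    exp_total = sum(expected_counts)
    act_total = sum(actual_counts)
    parts = []
    for e, a in zip(expected_counts, actual_counts):
        p = e / exp_total                        # column (4): baseline share
        q = a / act_total                        # column (5): recent share
        parts.append((q - p) * math.log(q / p))  # column (6) times column (8)
    return sum(parts), parts

def interpret(value):
    """Map a PSI value to the three bands listed on the slide."""
    if value <= 0.1:
        return "little or no change"
    if value < 0.25:
        return "some change, monitor closely"
    return "major shift, review"

# Counts from the FICO table, columns (2) and (3).
dev = [4000, 2331, 2448, 2614, 2916, 2241, 2664, 4086]
recent = [2000, 1200, 500, 3000, 2700, 1900, 2400, 3269]

total, parts = psi(dev, recent)
print(round(total, 3), interpret(total))
```

With these counts the total comes out at about 0.17, which falls in the 0.1 to 0.25 band.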
Marketing Analytics Big Data Question
How to automatically validate a file’s content and detect a bad file?
With multiple marketing channels
and disparate data sources, the data
scene is messy. Text/XML files are
often delivered with bad content.
[Diagram: pipeline stages FILE SOURCE, ETL/STAGING, STORAGE/HADOOP. File sources shown: Clickstreams, Weblogs, Social Media, Direct Mail/email, ERP/In-store, CRM/Online, Vendor Files]
Big Challenge: How to validate
each input file for completeness?
The file structure is intact but the content
is not what’s expected.
• Bad data impacts model
outcomes and results in
inefficient processes (channel
attribution and subsequent
spending).
• It’s expensive to clean up the
data at a later point.
PSI?
• The display ad channel was 20% last time
but dropped to 2% this time. ETL does
not detect this as a technical problem.
• In-store sales dropped to 20%,
compared to 70% last month.
• A reversal of trend is difficult to detect
while loading data, but it’s important to
review such instances before the data is
loaded.
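To see why PSI would catch the two drops described above, it helps to compute the contribution of just the affected channel. The sketch below assumes the quoted percentages are shares of records in the file; `psi_contribution` is a name of our own choosing.

```python
import math

def psi_contribution(expected_share, actual_share):
    """PSI contribution of one category from its baseline and current shares."""
    return (actual_share - expected_share) * math.log(actual_share / expected_share)

display = psi_contribution(0.20, 0.02)   # display ads: 20% -> 2%
instore = psi_contribution(0.70, 0.20)   # in-store sales: 70% -> 20%

print(f"display ads: {display:.3f}")
print(f"in-store:    {instore:.3f}")
```

The two contributions are roughly 0.41 and 0.63. Since every category's contribution is non-negative, either one alone already pushes the file's total PSI past a 0.25 threshold, regardless of what the other channels did.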
File validation using PSI: Advertisement channels
[Chart: Population Stability Index, Web Channel Categories (Adwords, BingAds, Display, Flash Ads, Retarget, Video). % Last 3 Months vs. % Recent Month]
No records were received from the Adwords sub-channel.
This needs a review before we proceed to the data
loading step.
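A category with no records is an edge case for the formula: its recent share is zero, and the logarithm of zero is undefined. The slides do not say how this is handled. One common workaround, shown here purely as an assumption, is to floor each share at a small value so that an empty category produces a large but finite contribution. The sub-channel counts below are hypothetical, since the slide shows only a chart, and the function names and the `floor` parameter are ours.

```python
import math

def psi_with_floor(expected_counts, actual_counts, floor=1e-4):
    """PSI over the baseline's categories, flooring each share at `floor`
    so that empty categories do not cause a math domain error."""
    exp_total = sum(expected_counts.values())
    act_total = sum(actual_counts.values())
    total = 0.0
    for category, exp_n in expected_counts.items():
        p = max(exp_n / exp_total, floor)
        q = max(actual_counts.get(category, 0) / act_total, floor)
        total += (q - p) * math.log(q / p)
    return total

def missing_categories(expected_counts, actual_counts):
    """Categories present in the baseline but with no records in the new file."""
    return [c for c in expected_counts if actual_counts.get(c, 0) == 0]

# Hypothetical record counts, for illustration only.
baseline = {"Adwords": 400, "BingAds": 150, "Display": 200,
            "Flash Ads": 50, "Retarget": 120, "Video": 80}
recent = {"BingAds": 60, "Display": 70, "Flash Ads": 20,
          "Retarget": 40, "Video": 30}

print(missing_categories(baseline, recent))
print(psi_with_floor(baseline, recent) > 0.25)
```

Reporting the missing categories by name alongside the PSI value tells the reviewer where to look first. Note that this sketch ignores categories that appear only in the new file; a fuller check would report those as well.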
So how did PSI help?
• Set a threshold, say 0.25, to trigger a review of a possible
data issue.
• It provides a statistic to evaluate content
quality and compare it with previous months.
• Every significant variance from the expectation
leads to a higher PSI value.
• A moving-average benchmark self-adjusts to
gradual migration from one channel to
another.
• A configurable benchmark helps to handle
expected scenarios, say, no email channel
expected this month.
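The last two bullets, a moving benchmark and a configurable one, might be sketched as follows. This is one possible reading, not the presenter's design: `rolling_baseline` pools the most recent months into a single expected distribution, and `apply_overrides` drops channels that are known in advance to be absent. Both names and the monthly counts are hypothetical.

```python
from collections import Counter

def rolling_baseline(monthly_counts, window=3):
    """Sum the per-channel counts of the last `window` months into one baseline."""
    baseline = Counter()
    for month in monthly_counts[-window:]:
        baseline.update(month)   # adds counts channel by channel
    return dict(baseline)

def apply_overrides(baseline, excluded=()):
    """Drop channels that are expected to be missing this month."""
    return {c: n for c, n in baseline.items() if c not in excluded}

# Hypothetical monthly record counts per channel, oldest month first.
history = [
    {"Social": 2500, "Web": 8500, "Email": 1300},
    {"Social": 2700, "Web": 8000, "Email": 1400},
    {"Social": 2800, "Web": 7500, "Email": 1300},
]

baseline = rolling_baseline(history)
print(baseline)
print(apply_overrides(baseline, excluded={"Email"}))
```

Because the window slides forward each month, a slow drift between channels is absorbed into the baseline while a sudden drop still stands out. When a channel is excluded, the new file should be filtered the same way before shares are computed, so that both distributions cover the same set of channels.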
(1)      (2)          (3)        (4)        (5)       (6)        (7)        (8)      (9)
Channel  # records    # records  % Prev.    % Recent  Change     Ratio      WoE      PSI Portion
         Prev. Three  Recent     Three      Month     (5) - (4)  (5) / (4)  Ln (7)   (6) * (8)
         Months       Month      Months
Social   8,000        2,000      10.1%      12.8%     2.7%       1.266      0.236    0.006
Web      24,000       1,600      30.4%      10.3%     -20.1%     0.338      -1.086   0.219
Email    4,000        1,000      5.1%       6.4%      1.3%       1.266      0.236    0.003
Print    3,000        1,000      3.8%       6.4%      2.6%       1.688      0.524    0.014
Instore  40,000       10,000     50.6%      64.1%     13.5%      1.266      0.236    0.032
TOTAL    79,000       15,600     100%       100%
PSI (Sum of Column 9) = 0.274
[Chart: % Prev. 3 Months (Expected) vs. % Recent Month, by channel (Social, Web, Email, Print, Instore)]
Characteristics Analysis: Let’s look
at what in the web channel caused
the change.
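A per-channel breakdown makes it easy to see which channel to drill into. The sketch below recomputes column (9) from the record counts in the channel table, applies the 0.25 threshold, and picks out the largest contributor. The function name `psi_breakdown` is ours.

```python
import math

def psi_breakdown(expected_counts, actual_counts):
    """Return {channel: PSI contribution} for two dicts of record counts."""
    exp_total = sum(expected_counts.values())
    act_total = sum(actual_counts.values())
    out = {}
    for channel in expected_counts:
        p = expected_counts[channel] / exp_total   # column (4)
        q = actual_counts[channel] / act_total     # column (5)
        out[channel] = (q - p) * math.log(q / p)   # column (9)
    return out

# Record counts from the channel table, columns (2) and (3).
previous = {"Social": 8000, "Web": 24000, "Email": 4000,
            "Print": 3000, "Instore": 40000}
recent = {"Social": 2000, "Web": 1600, "Email": 1000,
          "Print": 1000, "Instore": 10000}

parts = psi_breakdown(previous, recent)
total = sum(parts.values())
worst = max(parts, key=parts.get)    # channel with the largest contribution
needs_review = total >= 0.25         # threshold suggested on the slide

print(f"PSI = {total:.2f}, review = {needs_review}, largest contributor = {worst}")
```

The total is about 0.27, above the 0.25 threshold, and the Web channel accounts for roughly 0.22 of it, which is why the next step is to look inside that channel.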