Transcript of "Rapid Model Refresh (RMR) in Online Fraud Detection Engine"

  1. 1. Rapid Model Refresh (RMR) in Online Fraud Detection Engine<br />SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. © 2010 SAS Institute Inc. All rights reserved. S55547.0410<br />
  2. 2. Agenda<br />Overview<br /><ul><li>Traditional Tactics Fighting Fraud
  3. 3. Best Practice in PayPal Fraud Detection
  4. 4. Rapid Model Refresh (RMR)
  5. 5. Extensions and Future</li></li></ul><li>Online Fraud in Financial Services<br />Evolution in Financial Services<br /><ul><li>Paper-Based
  6. 6. In-Branch
  7. 7. Perceptible Footprint</li></ul>… …<br /><ul><li>Electronic
  8. 8. Cyber Spaces
  9. 9. Invisible Marketplace</li></ul>… …<br /><ul><li>Emerging Fraud Trends
  10. 10. Old-Fashioned
  11. 11. Isolated Individual
  12. 12. Limited-Scope Damage
  13. 13. Traceable Patterns</li></ul>… …<br /><ul><li>Tech-Savvy
  14. 14. Organized Gang
  15. 15. Multi-Billion Loss
  16. 16. Dynamic Trends</li></ul>… …<br />
  17. 17. Industry Fact<br />Online Revenue Loss Due to Fraud<br />Source: Cybersource<br />
  18. 18. Agenda<br />Objectives<br /><ul><li>Traditional Tactics Fighting Fraud
  19. 19. Best Practice in PayPal Fraud Detection
  20. 20. Rapid Model Refresh (RMR)
  21. 21. Extensions and Future</li></li></ul><li>Traditional Mitigation Tactics<br />Heuristic Approach<br /><ul><li>Detect Anomalies
  22. 22. Identify Patterns
  23. 23. Set Review Criterion
  24. 24. Model-Based Score
  25. 25. Rely on Statistical Models (Logit Models / Neural Nets)
  26. 26. Generate Suspicion Score
  27. 27. Rank Order Transactions
  28. 28. Rule-Based System
  29. 29. Employ Machine Learning Algorithms
  30. 30. Generate Rule Sets for Segmentation
  31. 31. Target High-Risk Segments</li></li></ul><li>Pros and Cons<br />Heuristic<br /><ul><li>Review-Based & Labor Intensive
  32. 32. Local Solutions without Global View
  33. 33. Integrate Domain Knowledge
  34. 34. Easy to Implement
  35. 35. Scoring
  36. 36. Long Time-to-Market
  37. 37. Static Perspective of Fraud Trends
  38. 38. Successful Industrial Applications
  39. 39. Ideal for Large-Scale Domains
  40. 40. Rule-Induction
  41. 41. Require Frequent Refreshes
  42. 42. Burden of High-Volume Rules
  43. 43. Fits Dynamic Online Nature
  44. 44. Rapid Development & Deployment</li></li></ul><li>Next … …<br />Now What?<br />
  45. 45. Agenda<br />Objectives<br /><ul><li>Traditional Tactics Fighting Fraud
  46. 46. Best Practices in PayPal Fraud Detection
  47. 47. Rapid Model Refresh (RMR)
  48. 48. Extensions and Future</li></li></ul><li>PayPal's Way to Fight Fraud<br />PayPal Loss Trend from 200X through 200Y<br />
  49. 49. Multi-Level Detection Engine<br />Agent Review<br />Risk Scoring<br />Rule Induction<br /><ul><li>Analysts build decision trees on high-risk transactions rank-ordered by risk score.
  50. 50. The riskiest segments are then identified by balancing the bad rate against the pass-through rate.
  51. 51. Modelers develop scoring models with logistic regression or neural networks.
  52. 52. A risk score is assigned to each transaction by the system.
  53. 53. Low-risk transactions are passed through.
  54. 54. The riskiest transactions identified by the rule sets are sent to review queues.
  55. 55. Queued transactions are prioritized and routed to agents in specific domains.
  56. 56. Case review and investigation are conducted.</li></li></ul><li>Implementation Challenges<br />Problems<br />Realities<br />Fast-Growing International Footprint<br />Overwhelming Number of Segments & Models<br />Extremely Rich Data from Diversified Sources<br />Information Overload instead of Data Mining<br />Ever-Complicated IT Infrastructure<br />High Exposures to System Risks<br />Dynamic Fraud Trends & Smarter Fraudsters<br />Escalating Model Decay & Deterioration<br />
  57. 57. Data-Driven Model (DDM) Strategy<br />Dynamic Rule Induction<br />Automatic Model Development<br />Real-Time Deployment<br />Conceptual <br />DDM<br />Modular Data Processing<br />Daily Monitoring<br />Implemented by<br />Rapid Model Refresh (RMR)<br />
  58. 58. Agenda<br />Objectives<br /><ul><li>Traditional Tactics Fighting Fraud
  59. 59. Best Practice in PayPal Fraud Detection
  60. 60. Rapid Model Refresh (RMR)
  61. 61. Extensions and Future</li></li></ul><li>What’s RMR?<br />Three Common Layers<br />Data<br />Layer<br />Deployment<br />Layer<br />Algorithm<br />Layer<br /><ul><li>Packaged Processing
  62. 62. Optimized Queries
  63. 63. Repeatable Stream
  64. 64. Arbitrary Models
  65. 65. Standard Evaluation
  66. 66. Version Controlled
  67. 67. Model Specs. to XML
  68. 68. Deploy in Real-Time
  69. 69. Batched Monitor</li></li></ul><li>RMR – Data Layer<br />Model Development SAS Data<br />Fine<br />Layer<br />SAS as Wrapper around Shell / SED / BTEQ Scripts<br />Modular SAS Macros & Parameterized Scripts<br />Variables Creation / Imputation / Transformation<br />Web Logs<br />Enterprise Database<br />3rd-Party Sources<br />Coarse<br />Layer<br />
  70. 70. Data Layer at A Glance<br /><ul><li>Data Manipulation
  71. 71. Variable Transformation</li></ul>20+ SAS Macros <br /><ul><li>Update Parameters in Scripts</li></ul>SED Stream Editor<br />SAS Workflow<br /><ul><li>Create Dynamic SQL
  72. 72. Parallel Execution</li></ul>Shell Scripts<br /><ul><li>Submit SQL</li></ul>BTEQ Interface with Teradata<br />
  73. 73. Code Snippet in Data Layer<br />1<br />2<br />3<br />Use SED to update parameters in the query<br />Submit the query to Teradata through BTEQ<br />Append the log to an output file<br />
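The three steps on this slide (update parameters, submit the query, append the log) can be sketched roughly as follows. This is a Python illustration only, not the SAS / SED / BTEQ code from the talk: `string.Template` stands in for SED, and the submit step is a caller-supplied function because the actual BTEQ invocation is not shown in the slides. The table name, column names, placeholders, and `fake_submit` are all hypothetical.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical query template; $-style placeholders play the role of the
# parameters that SED rewrites in the original scripts.
QUERY_TEMPLATE = Template(
    "SELECT txn_id, amount FROM $table WHERE txn_date >= DATE '$start_date';"
)

def render_query(params):
    """Step 1: fill in the placeholders (raises KeyError if one is missing)."""
    return QUERY_TEMPLATE.substitute(params)

def run_step(params, submit, log_path):
    """Steps 2 and 3: submit the query, then append the returned log text."""
    sql = render_query(params)
    log_text = submit(sql)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(log_text + "\n")
    return sql

def fake_submit(sql):
    """Stand-in for the BTEQ call, which is not shown in the slides."""
    return f"submitted {len(sql)} chars"

log_file = Path(tempfile.mkdtemp()) / "rmr_data_layer.log"
params = {"table": "txn_daily", "start_date": "2010-01-01"}
sql_1 = run_step(params, fake_submit, log_file)
sql_2 = run_step(params, fake_submit, log_file)
log_lines = log_file.read_text(encoding="utf-8").splitlines()
```

Opening the log in append mode means repeated runs accumulate in one file, which matches the "append the log" step on the slide.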
  74. 74. RMR – Algorithm Layer<br />Best Models to Production<br />Swap Analysis for Rule Sets<br />Model Evaluation (KS / AUC / … )<br />Stump<br /><ul><li>Exhaustive Search for Best Cutoffs</li></ul>Bumping <br /><ul><li>Stochastic Search for Best Tree(s)</li></ul>Champion<br /><ul><li>Generalized Linear Model</li></ul>Arbitrary Challengers<br /><ul><li>Neural Nets
  75. 75. Bagging Trees</li></ul>… …<br />Supported by SAS / STAT & SAS / Enterprise Miner<br />
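The "Model Evaluation (KS / AUC / … )" step that selects the best model can be illustrated with a small pure-Python sketch. The slides do not show the SAS evaluation macros, so the function names, the toy scores, and the rule of picking the winner by KS alone are assumptions made for illustration.

```python
def ks_statistic(scores, labels):
    """Largest gap between the cumulative score distributions of bads (1) and goods (0)."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every record tied at this score before comparing the CDFs.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            cum_bad += pairs[j][1]
            cum_good += 1 - pairs[j][1]
            j += 1
        best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return best

def auc(scores, labels):
    """Probability that a random bad outscores a random good (ties count as 1/2)."""
    bads = [s for s, y in zip(scores, labels) if y == 1]
    goods = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((b > g) + 0.5 * (b == g) for b in bads for g in goods)
    return wins / (len(bads) * len(goods))

# Toy data: 1 = fraud, 0 = legitimate. Candidate names are hypothetical.
labels = [0, 0, 1, 0, 1, 1]
candidates = {
    "glm": [0.1, 0.2, 0.3, 0.4, 0.8, 0.9],
    "nnet": [0.5, 0.1, 0.3, 0.9, 0.4, 0.8],
}
report = {
    name: (ks_statistic(s, labels), auc(s, labels)) for name, s in candidates.items()
}
winner = max(report, key=lambda name: report[name][0])  # highest KS wins
```

The pairwise AUC here is quadratic in the number of records, which is fine for a sketch; a rank-based formula would be the usual choice at production volume.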
  76. 76. A Peek into Algorithm Layer<br />25% Validation<br />25% Testing<br />WoE Vars<br />GLM<br />SAS Evaluation Macros<br />SAS EDA Macros<br />Best Model<br />50% Training<br />NNET<br />Bagging<br />Binned Vars<br />… …<br />TreeX<br />Tree2<br />
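A minimal sketch of two pieces named on this slide: the 50% / 25% / 25% training / validation / testing split and weight-of-evidence (WoE) variables. The slides do not give the WoE formula used; the sketch assumes the common definition, ln(share of goods in the bin / share of bads in the bin), and some teams use the opposite sign. The function names and the optional `smoothing` parameter are hypothetical.

```python
import math
import random
from collections import Counter

def split_50_25_25(rows, seed=0):
    """Shuffle reproducibly, then cut into training / validation / testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) // 2
    n_valid = len(rows) // 4
    return rows[:n_train], rows[n_train:n_train + n_valid], rows[n_train + n_valid:]

def woe_table(bins, labels, smoothing=0.5):
    """WoE per bin = ln(share of goods in bin / share of bads in bin).

    `smoothing` is added to every count so that a bin with no goods or no
    bads does not produce log(0); pass 0.0 only when every bin has both.
    """
    goods = Counter(b for b, y in zip(bins, labels) if y == 0)
    bads = Counter(b for b, y in zip(bins, labels) if y == 1)
    levels = sorted(set(bins))
    total_good = sum(goods.values()) + smoothing * len(levels)
    total_bad = sum(bads.values()) + smoothing * len(levels)
    return {
        b: math.log(((goods[b] + smoothing) / total_good)
                    / ((bads[b] + smoothing) / total_bad))
        for b in levels
    }

train, valid, test = split_50_25_25(range(100))

# Toy binned variable: "low" holds 3 goods and 1 bad, "high" holds 1 good and 3 bads.
bins = ["low"] * 4 + ["high"] * 4
labels = [0, 0, 0, 1, 0, 1, 1, 1]
table = woe_table(bins, labels, smoothing=0.0)
```

In this convention a positive WoE marks a bin that is safer than average and a negative WoE marks a riskier one.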
  77. 77. One Tree, Endless Possibilities<br />Use Cases of Decision Tree in RMR’s View<br /><ul><li>Bagging
  78. 78. Simple Average of Massive Number of Trees
  79. 79. Take Advantage of RMR Deployment Layer and Parallel Computing
  80. 80. Use as A Challenger to Traditional Logistic Regression
  81. 81. Bumping
  82. 82. Stochastic Search from Massive Number of Trees
  83. 83. Improve Estimation while Retaining a Simple Tree Structure
  84. 84. Use to Enhance Vanilla-Version Tree Development
  85. 85. Stump
  86. 86. Exhaustive Search on 1-Dimensional Space, e.g. Score
  87. 87. Induce 1-Level Binary Tree by Minimizing Gini Impurity
  88. 88. Use to Find the Best Score Cutoff while Balancing Review Rate</li></li></ul><li>Pick Winner from Multiple Candidates<br />Generically Support Arbitrary Number of Score Inputs for Massive Models Evaluation and Deployment<br />
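The stump use case above (an exhaustive search over a one-dimensional score for the cutoff that minimizes Gini impurity) can be sketched in Python as follows. The slides do not say how the review rate is balanced against impurity; the `max_review_rate` cap used here is an assumption, as are the function names and the toy data.

```python
def gini(n_bad, n_total):
    """Gini impurity of a node with n_bad frauds among n_total records."""
    if n_total == 0:
        return 0.0
    p = n_bad / n_total
    return 2.0 * p * (1.0 - p)

def best_cutoff(scores, labels, max_review_rate=1.0):
    """Try every distinct score as a cutoff for the split `score >= cutoff`.

    Returns (cutoff, weighted_gini) for the split with the lowest weighted
    impurity among cutoffs that send at most `max_review_rate` of all
    records to review, or None if no cutoff qualifies.
    """
    n = len(scores)
    total_bad = sum(labels)
    best = None
    for cutoff in sorted(set(scores)):
        high = [y for s, y in zip(scores, labels) if s >= cutoff]
        n_high, bad_high = len(high), sum(high)
        n_low, bad_low = n - n_high, total_bad - bad_high
        if n_low == 0 or n_high / n > max_review_rate:
            continue  # not a real split, or too many records to review
        impurity = (n_high / n) * gini(bad_high, n_high) + (n_low / n) * gini(bad_low, n_low)
        if best is None or impurity < best[1]:
            best = (cutoff, impurity)
    return best

# Toy scored transactions: 1 = fraud, 0 = legitimate.
scores = [0.05, 0.10, 0.20, 0.35, 0.60, 0.70, 0.85, 0.95]
labels = [0, 0, 0, 0, 1, 0, 1, 1]

unconstrained = best_cutoff(scores, labels)
capped = best_cutoff(scores, labels, max_review_rate=0.25)
```

On this toy data the unconstrained search picks a cutoff of 0.60, while capping the review rate at 25% pushes the cutoff up to 0.85, trading some purity for a shorter review queue.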
  89. 89. RMR – Deployment Layer<br />Model Specifications<br />Perl<br />Convert to XML / PMML<br />Inject into Web Engine<br />Collect Web Logs in DB<br />SAS<br />Monitor Daily Scoring Stability<br />Shell<br />Email Reports to Stakeholders<br />
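The "Convert to XML / PMML" step can be illustrated with a small round trip: write a logistic model specification to XML, then score from the XML alone. The talk does this in Perl and does not show the schema, so the element and attribute names below are invented for illustration and are not PMML; the model name, version, and coefficients are likewise hypothetical.

```python
import math
import xml.etree.ElementTree as ET

def model_spec_to_xml(name, version, intercept, coefficients):
    """Serialize a logistic model spec using a made-up, non-PMML layout."""
    root = ET.Element("model", {"name": name, "version": version, "type": "logistic"})
    ET.SubElement(root, "intercept").text = repr(intercept)
    terms = ET.SubElement(root, "terms")
    for variable, beta in sorted(coefficients.items()):
        ET.SubElement(terms, "term", {"variable": variable, "coefficient": repr(beta)})
    return ET.tostring(root, encoding="unicode")

def score_from_xml(xml_text, inputs):
    """Rebuild the linear predictor from the XML and apply the logistic link."""
    root = ET.fromstring(xml_text)
    linear = float(root.findtext("intercept"))
    for term in root.find("terms"):
        linear += float(term.get("coefficient")) * inputs[term.get("variable")]
    return 1.0 / (1.0 + math.exp(-linear))

xml_text = model_spec_to_xml(
    "segment_a_glm", "3", -2.5, {"woe_amount": 0.8, "woe_account_age": -1.2}
)
parsed = ET.fromstring(xml_text)
score = score_from_xml(xml_text, {"woe_amount": 1.0, "woe_account_age": 0.5})
```

Keeping the specification as data rather than hand-written scoring code is what lets a refreshed model reach the scoring engine without an engineering release, which is the point of this layer.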
  90. 90. A Use Case: Score Monitoring<br />Objectives:<br /><ul><li>System Breakage
  91. 91. Score Shift</li></ul>Driver Table<br />Log Table<br />Lookup Tables<br />Daily Web Log<br />Baseline Distribution<br />Model / Segment / Owner Lookups<br />SAS Daily Job Scheduled by Cron<br />Population Stability Reports in HTML<br />
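The daily comparison of logged scores against a baseline distribution is commonly summarized with a population stability index (PSI). The slides do not give the formula behind the "Population Stability Reports", so the sketch below assumes the usual definition, the sum over score bins of (actual - expected) * ln(actual / expected). The bin edges, the `floor` guard, the alert threshold, and the status strings are hypothetical.

```python
import math

def bin_shares(scores, edges):
    """Share of scores in each bin; `edges` are ascending interior cut points."""
    counts = [0] * (len(edges) + 1)
    for s in scores:
        counts[sum(s >= e for e in edges)] += 1
    return [c / len(scores) for c in counts]

def psi(expected, actual, floor=1e-6):
    """Population stability index between two lists of bin shares."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, floor), max(a, floor)  # guard against empty bins
        total += (a - e) * math.log(a / e)
    return total

def daily_check(baseline_shares, todays_scores, edges, alert_at=0.25):
    """Cover both monitoring objectives: system breakage and score shift."""
    if not todays_scores:
        return "BREAKAGE: no scores logged"
    value = psi(baseline_shares, bin_shares(todays_scores, edges))
    return f"SHIFT: psi={value:.3f}" if value >= alert_at else f"OK: psi={value:.3f}"

edges = [0.3, 0.6]
shares_today = bin_shares([0.1, 0.2, 0.4, 0.7, 0.9], edges)
drift = psi([0.5, 0.3, 0.2], [0.4, 0.3, 0.3])
status_empty = daily_check([0.5, 0.3, 0.2], [], edges)
status_same = daily_check([0.4, 0.2, 0.4], [0.1, 0.2, 0.4, 0.7, 0.9], edges)
```

An empty daily log is reported separately from a shifted distribution because the two problems call for different owners: the first is an infrastructure fault, the second a modeling issue.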
  92. 92. Sample Reports<br />Overall<br />Detailed<br />
  93. 93. Agenda<br />Objectives<br /><ul><li>Traditional Tactics Fighting Fraud
  94. 94. Best Practice in PayPal Fraud Detection
  95. 95. Rapid Model Refresh (RMR)
  96. 96. Extensions and Future</li></li></ul><li>Evolution of RMR Paradigm<br />Future<br />Past<br />Now<br />Expert Process<br /><ul><li>Programmers Pull Data
  97. 97. Statisticians Build Predictive Model
  98. 98. Engineers Hard-Code Specification into On-Line Environment
  99. 99. Meets Minimum Benefit Schedule.</li></ul>Mechanized Process<br /><ul><li>Population and Performance Criterion Identified
  100. 100. A Suite of Challenger Models Built Automatically
  101. 101. Model Specifications Published in Live Scoring Platform
  102. 102. New Models Deployed in Periodic Batch</li></ul>Online Process<br /><ul><li>Models Developed & Deployed with Most Recent Online Data Dynamically
  103. 103. Re-deployment of New Models not Needed</li></li></ul><li>2-Path Directions<br />SAS / Teradata in-DB Analytics<br />Hadoop with R Integration<br />