Predicting Loan Delinquency at One Million Transactions per Second


Real-time applications of predictive models must be able to generate predictions at the rate that transactions are generated. Previously, such applications of models trained using R needed to be converted to other languages like C++ or Java to achieve the required throughput. In this talk, I’ll describe how to use the in-database R processing capabilities of Microsoft R Server to detect fraud in a SQL Server database of loan records at a rate exceeding one million transactions per second. I will also show the process of training the underlying gradient-boosted tree model on a large training set using the out-of-memory algorithms of Microsoft R.

Published in: Technology

  1. Predicting Loan Delinquency at 1M Transactions per Second
     David Smith, @revodavid
     R Community Lead, Microsoft
  2. It looks like you’ve created a predictive model… NOW WHAT?
  4. Generating Predictions
     Batch Mode
     • Create many predictions (millions!) at once
     • Time required is proportional to the number of predictions
     Real Time
     • Only a few data points (maybe only one!) available to predict
       – There may be multiple requests in a short timeframe
     • Latency is the key metric here
       – Many applications require sub-second latency at the endpoint
  5. Real-Time Operationalization Options
     • Rewrite prediction code in some other language
       – PMML / C++ / Java / …
     • OR, use your R code:
       – Deploy as a web service with Microsoft R Server
       – Deploy as a stored procedure in SQL Server
  6. Lending Club Loan Performance Data
     • Feature selection and generation:
     LoanStatNew | Description
     all_util | Balance to credit limit on all trades
     annual_inc_joint | The combined self-reported annual income provided by the co-borrowers during registration
     dti_joint | A ratio calculated using the co-borrowers' total monthly payments on the total debt obligations, excluding mortgages and the requested LC loan, divided by the co-borrowers' combined self-reported monthly income
     int_rate | Interest rate on the loan
     mths_since_last_record | The number of months since the last public record
     revol_util | Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit
     total_rec_prncp | Principal received to date
     is_bad (generated) | Late > 16 days, Default, or Charged Off
  7. Operationalization with Microsoft R Server
     • Roles: Data Scientist, Developer, Quant, IT Administrator
     • Deployment: Publish R function into web services (publishService, Microsoft R Client with the mrsdeploy package)
     • Integration: Swagger API Service; consume with any programming language
     • Consumption: Explore and consume services in R directly (Microsoft R Client with the mrsdeploy package)
     • Configuration: Microsoft R Server configured for operationalizing R analytics
       – Data Science Virtual Machine
       – Azure GS5 Instance
       – 32 cores
       – 448 GB RAM
  8. Flexible vs Real-Time Deployment
     Flexible Deployment: Publish R as Web Service
     • Any R function or package
     • R interpreter runs on-demand in Swagger via REST API
       library(mrsdeploy)
       publishService(
         serviceType='Script',
         code=<<R script or function>>)
     Real-Time Deployment: Publish R model object
     • RevoScaleR or MicrosoftML models
     • Prediction engine generates scores from data via REST API
       library(mrsdeploy)
       publishService(
         serviceType='Realtime',
         model=<<R object>>)
  9. Real-Time Deployment Models
     • Linear Regression (rxLinMod, rxFastLinear)
     • Logistic Regression (rxLogit, rxLogisticRegression)
     • Classification / Regression trees (rxDTree, rxFastTrees)
     • Classification / Regression forests (rxDForest, rxFastForest)
     • Stochastic gradient-boosted decision trees (rxBTrees)
     • One-class Support Vector Machines (rxOneClassSvm)
     • Convolutional Neural Networks (rxNeuralNet)
     • Also: pre-trained models for text sentiment and image featurization
  10. FLEXIBLE AND REAL-TIME SCORING WITH MICROSOFT R SERVER
      Demonstration
      • Server: Azure Data Science Virtual Machine, Azure GS5 instance (32 cores, 448 GB memory)
      • Client: SurfaceBook / Microsoft R Client
  16. Flexible vs Real-Time Performance Comparison
      Server: Standard_D3_v2 (4 CPU cores, 14 GB RAM), Windows
      Algorithm | Real time (ms) | Flexible (ms)
      RxLogit (model size 2K) | 3.5 | 39.2
      RxNeuralNet (model size 8K) | 2.5 | 122.0
      Model Size | Real time (ms) | Flexible (ms)
      2 MB (RxLogisticRegression) | 5.0 | 9215.7
      43 MB (RxLogisticRegression) | 5.4 | 20255.6
  17. Deployment in SQL Server 2016
      • Microsoft R Client (RevoScaleR package)
      • Flexible: sp_execute_external_script
      • Real-Time: rxSerializeObject, sp_rxPredict
  18. Flexible vs Real-Time
      • SQL Server 2017: 8 sockets, 192 cores, 6 TB RAM
      • Flexible operationalization
      • 1M predictions/sec
      • Same benchmark, one-sixth the resources
  19. Operationalization Overview
      Platform | Flexible Operationalization (any R function / package) | Real-Time Operationalization (specific RevoScaleR / MicrosoftML models)
      SQL Server | EXEC sp_execute_external_script @language = N'R', @script = N'<<R script>>' | EXEC sp_rxPredict @model=<<serialized R object>>, @inputData=<<SQL query>>
      Microsoft R Server | library(mrsdeploy) publishService(serviceType='Script', code=<<R script or function>>) | library(mrsdeploy) publishService(serviceType='Realtime', model=<<R object>>)
      • Use Microsoft R Server 9+ or SQL Server 2016+ as the deployment server
      • Flexible Operationalization supports any R code / package
      • Real-Time Operationalization supports Microsoft R models with improved latency
  20. Thank You!
      David Smith, @revodavid
      R Community Lead, Microsoft
      Special thanks: Pratik Palnitkar, Microsoft; Arun Gurunathan, Microsoft
      Download Microsoft R Client:
      Data Science Virtual Machine:
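
The latency figures in the slide 16 performance comparison can be turned into speedup ratios with a few lines of arithmetic. The following is a minimal Python sketch, not part of the talk: the latency values are copied from that slide, while the function names and the "one caller, one request at a time" throughput model are assumptions made purely for illustration.

```python
# Latencies in milliseconds copied from the slide 16 comparison,
# stored as (real-time, flexible) pairs.
LATENCY_MS = {
    "RxLogit (model size 2K)": (3.5, 39.2),
    "RxNeuralNet (model size 8K)": (2.5, 122.0),
    "RxLogisticRegression (2 MB)": (5.0, 9215.7),
    "RxLogisticRegression (43 MB)": (5.4, 20255.6),
}


def speedup(real_time_ms, flexible_ms):
    """How many times faster the real-time path is than the flexible path."""
    return flexible_ms / real_time_ms


def sequential_rate(latency_ms):
    """Requests per second for a single caller issuing requests back to back.

    Assumption (not from the slides): no concurrency and no batching.
    """
    return 1000.0 / latency_ms


def summarize(latencies=LATENCY_MS):
    """Return {model: (speedup, real-time req/sec, flexible req/sec)}."""
    return {
        name: (speedup(rt, fx), sequential_rate(rt), sequential_rate(fx))
        for name, (rt, fx) in latencies.items()
    }


if __name__ == "__main__":
    for name, (ratio, rt_rate, fx_rate) in summarize().items():
        print(f"{name}: {ratio:.1f}x faster, "
              f"{rt_rate:.1f} vs {fx_rate:.2f} requests/sec")
```

Under these assumptions the gap widens with model size: the small models come out roughly 11 and 49 times faster in real-time mode, while the 2 MB and 43 MB models differ by three orders of magnitude. The single-caller rate is only a lower bound on server throughput, since the benchmark slides do not say how requests were parallelized.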