From Zero to Production Dataiku Meetup Berlin

To gain a competitive edge in the marketplace, banks intend to drive customer engagement with data analytics. By analyzing their clients' economic activities, they can detect patterns and behavior and offer personalized, tailor-made financial products that enhance customer satisfaction. The challenge lies in the banks' legacy systems, which can impede the ability to unlock the value of the data they already hold. In his talk, Marco presented a real-world example of predictive modeling in banking. He highlighted the modeling practice used and gave practical advice on the deployment process in a production banking environment. He also explored some best-practice techniques for tackling data science projects.

Published in: Data & Analytics

  1. From Zero to Production: Deploying Machine Learning Models in a Legacy Banking Environment
  2. Why are you here? (From Zero to Production, Dataiku Meetup, 22.01.2019)
     • You can't believe Sparkasse banks are in Data Analytics (a topic reserved for sexy fintechs and software companies)
     • You are curious about the words "machine learning" (ML) and "production"
     • You are hoping to find the holy grail for your ML and production problems
     Image: https://commons.wikimedia.org/wiki/File:Holy-grail-round-table-bnf-ms_fr-116F-f610v-15th-detail.jpg, Evrard d'Espinques [Public domain], via Wikimedia Commons
  3. What does "production" mean anyway?
     Source: https://stackoverflow.com/questions/490289/what-exactly-defines-production
  4. S Rating und Risikosysteme GmbH (SR): We Data Analytics
     • Founded in 2004 with a focus on providing market, regulatory, operational and credit risk frameworks
     • > 250 employees
     • Data Analytics team started 1.5 years ago
     • Quantitative folks and product managers (20 folks in total)
     • > 30 machine learning models in production
     • "Made in Berlin" (Spittelmarkt)
     Source: https://www.berliner-sparkasse.de/de/home/200jahre.html?n=true
  5. Savings Banks Finance Group (SFG)
     • 383 independent Sparkasse commercial/retail banks
     • Decentralized structure (regional principle)
     • Central IT service partner (Finanz Informatik)
     • OneSystemPlus = core banking system for all institutions
     • S Rating und Risikosysteme GmbH: central Data Analytics partner
  6. The SFG (decentralized) data treasure chest
     • 50 million customers
     • 118 million banking accounts
     • 2.1 billion online banking visits (per year)
     • 114 billion payment transactions (per year)
  7. Example use cases of ML in Banking and Financial Services
     Categories: Customer Experience, Operational Efficiency, Sales and Marketing, Risk and Fraud
     • Chatbots and robo-advisors
     • Natural Language Processing (NLP) to decipher call logs and customer feedback
     • Optimizing operational expenses such as call center staff and tellers
     • Optimizing sales and marketing expenses
     • Optimizing operational efficiency
  8. Getting more with your score
     Preparation
     • What is your target group?
     Expert advice
     • Target group based on expert knowledge
     Data Analytics
     • Target group based on predictive analytics
     Diagram labels: Age 18-35, Age 35-75, Income 0-1000 €, Income 1000-10000 €, Product Score
  9. A model data pipeline
     Structured Data → Ingest → Transform → Model → Deploy
  10. Data Analytics closed loop
      Diagram labels: Train model pipeline; Serve request (Batch); Deploy models; Monitor service; Get feedback; Update pipelines; Prototype & develop model pipelines
  11. Challenges
      • I have time constraints: it must run fast enough
      • We need to play well with others:
        • other systems
        • other teams
      • Need to be robust and just work
      • Need to integrate into business processes
      • Does it increase profits?
      • Live ML doesn't always work the way I expect…
  12. Working well with other teams and systems? (1/3)
      Dev (SR Data Scientist): "AUC looks alright, hyperparameter tuned. Time to deploy!"
      [wall of confusion]
      DevOps (FI Mainframe + Java application developer): "What the **** is alpha and beta?"
      [wall of confusion]
      Business (Sparkasse teller): "I want to be there for my clients! Target variable what?"
      Person icons made by monkik from www.flaticon.com
  13. Understand the business processes! (2/3)
      Business + Dev: SR Data Scientist and Sparkasse teller
      • Business processes generate data; understand every single step
      • Work together on the "ground truth" (the reality you want to predict)
      • Does it generalize?
      • Verify every sub-result with practitioners
      • Lack of domain knowledge is a barrier you can overcome
  14. Understand the IT architecture! (3/3)
      DevOps: SR Data Scientist and FI Mainframe + Java application developer
      • Deploy model parameter
      • Scoring engine in SAS
      • "Ready for production, yeah!"
  15. 101: Decision tree classifier (1/2)
      • Flowchart structure starting at the root node
      • Simple IF-ELSE questions in child nodes
      • The CART (classification and regression tree) algorithm uses binary trees
      Figure source: XGBoost: A Scalable Tree Boosting System, Tianqi Chen, Conference Paper, 2016
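To make the "simple IF-ELSE questions" concrete, here is a minimal sketch of one binary tree written as nested conditions. The feature names echo the export table on slide 17; the thresholds and leaf scores are illustrative assumptions, not values from the talk's model.

```python
def tree_score(income: float, age: int) -> float:
    """Score one customer with a tiny hand-written binary tree.

    Thresholds and leaf scores are made up for illustration.
    """
    if income < 11000:       # root node question
        if age < 45:         # child node question
            return 0.003     # leaf
        return -0.002        # leaf
    return 0.044             # leaf


if __name__ == "__main__":
    print(tree_score(income=5000, age=30))
```

Every path from the root ends in exactly one leaf, so each customer receives exactly one score per tree.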
  16. 101: Ensemble prediction (2/2)
      Tree 1 → Score 1, Tree 2 → Score 2, Tree … → Score …
      Sum of all tree scores → Score
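The summation step can be sketched in a few lines. The slide only shows that the per-tree scores are summed. Turning the sum into a probability with the logistic function is an assumption here, based on how gradient-boosted classifiers with a logistic objective usually work, and any base score is left out.

```python
import math


def ensemble_margin(tree_scores):
    """Sum the scores that the individual trees assigned to one customer."""
    return sum(tree_scores)


def margin_to_probability(margin):
    """Assumed logistic link; the slide itself stops at the summed score."""
    return 1.0 / (1.0 + math.exp(-margin))


if __name__ == "__main__":
    scores = [0.0033, -0.0017, 0.0436]  # illustrative per-tree scores
    margin = ensemble_margin(scores)
    print(margin, margin_to_probability(margin))
```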
  17. Exporting model parameter
      Tree 1, Tree 2, Tree …

      Where do I need to split the input variable?

      TREE_NR  INPUT_VAR  TREE_SPLT_VAR_NR  TREE_SPLT_VALUE
      1        Income     1                 11.000
      1        Age        2                 45
      1        Occupied   3                 1
      …        …          …                 …

      Which score do I need to assign to each node?

      TREE_NR  TREE_NODE_NR  TREE_NODE_SCORE
      1        1             0.00331848000000
      1        2             -0.00174424000000
      1        3             0.04362040000000
      1        4             0.00302040000000
      …        …             …
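A scoring routine driven by two such tables might look like the sketch below. The slide does not say how split numbers map to node numbers, so the layout is an assumption: a complete depth-2 tree with split 1 at the root, splits 2 and 3 below it, values below the threshold going left, and nodes 1 to 4 as leaves from left to right. Reading the split value 11.000 as 11000 is also an assumption.

```python
# Split table: tree number -> [(input variable, split value), ...] in split order.
SPLITS = {1: [("income", 11000.0), ("age", 45.0), ("occupied", 1.0)]}

# Node score table: tree number -> leaf scores for nodes 1..4.
NODE_SCORES = {1: [0.00331848, -0.00174424, 0.04362040, 0.00302040]}


def score_tree(tree_nr, customer):
    """Walk one depth-2 tree (assumed layout) and return the leaf score."""
    (var1, cut1), (var2, cut2), (var3, cut3) = SPLITS[tree_nr]
    leaves = NODE_SCORES[tree_nr]
    if customer[var1] < cut1:
        return leaves[0] if customer[var2] < cut2 else leaves[1]
    return leaves[2] if customer[var3] < cut3 else leaves[3]


def score_customer(customer):
    """Sum the leaf scores over all exported trees."""
    return sum(score_tree(tree_nr, customer) for tree_nr in SPLITS)


if __name__ == "__main__":
    print(score_customer({"income": 5000, "age": 30, "occupied": 1}))
```

Because the model is reduced to lookup tables, the scoring logic itself needs no machine learning library, which is what makes it portable to another runtime.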
  18. SAS score engine
      • Export model parameter as CSV file
      • Import model parameter
      • Inputs to the engine: model parameter, input data
      Score engine: "Give me the model parameter and the input data for every customer and I tell you the score!"
      "Save the results, please!"
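The export step could be sketched as follows, using only Python's standard `csv` module. The column names come from slide 17; the function names are hypothetical, and the read-back function is a Python stand-in for the import on the SAS side, which the slides do not show.

```python
import csv
import io

HEADER = ["TREE_NR", "TREE_NODE_NR", "TREE_NODE_SCORE"]


def export_node_scores(rows):
    """Serialize (tree_nr, node_nr, score) tuples as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    for tree_nr, node_nr, score in rows:
        # Fixed 14 decimals, matching the precision shown on the slide.
        writer.writerow([tree_nr, node_nr, f"{score:.14f}"])
    return buffer.getvalue()


def import_node_scores(csv_text):
    """Parse the CSV text back into {(tree_nr, node_nr): score}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        (int(row["TREE_NR"]), int(row["TREE_NODE_NR"])): float(row["TREE_NODE_SCORE"])
        for row in reader
    }


if __name__ == "__main__":
    text = export_node_scores([(1, 1, 0.00331848), (1, 2, -0.00174424)])
    print(text)
    print(import_node_scores(text))
```

A round trip like this is a cheap check that no precision is lost between the modeling environment and the scoring engine.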
  19. Monitoring requests in production
      • AUC (area under the curve): a drop in AUC can signal that some business processes have changed
      • Correlation between scores and input variables
      • Descriptive statistics (mean, max, min, count) of input variables
      • "Acid" test: ratio of scores regarding target variable
      • Performance (scores/min)
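Two of these checks can be sketched with the standard library alone. The AUC here is computed as the share of (positive, negative) pairs in which the positive case receives the higher score, with ties counting half. The function names are hypothetical, and the pairwise loop is quadratic, so it suits small monitoring samples rather than full production volumes.

```python
from statistics import mean


def auc(labels, scores):
    """Pairwise AUC: P(score of a positive > score of a negative), ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def describe(values):
    """Descriptive statistics of one input variable."""
    return {
        "count": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    print(auc([1, 0, 0], [0.87, 0.19, 0.05]))
    print(describe([37, 29, 22]))
```

Comparing these numbers between scoring runs is one simple way to notice drift in either the inputs or the model's ranking quality.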
  20. Wrapping things up
      Measure, measure and measure
      • Monitor every single step of your pipeline
      • Data quality is the holy grail
      Data Scientists = translators
      • Learn the "language" (not only programming) of other teams
      • Build bridges
      • What business problem do you want to solve?
      Start your production pipeline simple
      • Understand the IT system architecture
      • Talk with your IT folks and business people
      Only production code is good code
      • A Data Scientist should know programming principles
      • Performance counts in real-world applications
      • Code quality beats model prediction quality to some extent
  21. Data First Folks! Thanks for having me
      Marco Bahrs, Data Scientist
      Get in touch with me via
      Disclaimer: This presentation is intended for educational purposes only and does not replace independent professional judgment. Statements of fact and opinions expressed are those of the participants individually and, unless expressly stated to the contrary, are not the opinion or position of the Sparkasse Rating and Risikosysteme GmbH or the Finanz Informatik. The Sparkasse Rating and Risikosysteme GmbH does not endorse or approve, and assumes no responsibility for, the content, accuracy or completeness of the information presented.
  22. Bonus material: Ensemble
      • Combining many weak learners (many trees = forest)
      Figure source: XGBoost: A Scalable Tree Boosting System, Tianqi Chen, Conference Paper, 2016
  23. Bonus material: Gradient Boosting training

      Input to Tree 1:

      Age  Balance  Employed  …  Personal Loan
      37   2560€    1            1
      29   1726€    1            0
      22   460€     0            0
      …    …        …            …

      Tree 1 output:

      Probability  Error
      0.87         -0.13
      0.19         0.19
      0.05         0.05
      …            …

      Input to Tree 2:

      Age  Balance  Employed  …  Error
      37   2560€    1            -0.13
      29   1726€    1            0.19
      22   460€     0            0.05
      …    …        …            …

      Tree 2 output:

      Prediction  Error
      0.17        0.30
      0.24        0.05
      0.08        0.03
      …           …
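The error columns on this slide can be reproduced with a few lines of arithmetic. The sketch assumes that each error is the prediction minus its target, which is what the slide's numbers suggest: Tree 1 is compared against the Personal Loan label, and Tree 2 against the error left by Tree 1. The helper name is hypothetical, and values are rounded to two decimals as on the slide.

```python
def errors(predictions, targets):
    """Prediction minus target, rounded to two decimals as on the slide."""
    return [round(p - t, 2) for p, t in zip(predictions, targets)]


personal_loan = [1, 0, 0]                # labels from the training table
tree1_probability = [0.87, 0.19, 0.05]   # Tree 1 output
tree1_error = errors(tree1_probability, personal_loan)

tree2_prediction = [0.17, 0.24, 0.08]    # Tree 2 is trained on tree1_error
tree2_error = errors(tree2_prediction, tree1_error)

if __name__ == "__main__":
    print(tree1_error)
    print(tree2_error)
```

Each further tree is fitted to whatever error the previous trees left behind, which is the core idea behind boosting.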