
Alex Korbonits, Data Scientist, Remitly, at MLconf Seattle 2017


Alex Korbonits is a Data Scientist at Remitly, Inc., where he works extensively on feature extraction and putting machine learning models into production. Outside of work, he loves Kaggle competitions, is diving deep into topological data analysis, and is exploring machine learning on GPUs. Alex is a graduate of the University of Chicago with degrees in Mathematics and Economics.

Abstract summary

Applications of machine learning and ensemble methods to risk rule optimization:
At Remitly, risk management involves a combination of manually created and curated risk rules as well as black-box inputs from machine learning models. Currently, domain experts manage risk rules in production using logical conjunctions of statements about input features. In order to scale this process, we’ve developed a tool and framework for risk rule optimization that generates risk rules from data and optimizes rule sets by ensembling rules from multiple models according to a particular objective function. In this talk, I will describe how we currently manage risk rules, how we learn rules from data, how we determine optimal rule sets, and the importance of smart input features extracted from complex machine learning models.


  1. Applications of machine learning and ensemble methods to risk rule optimization. Alex Korbonits, Data Scientist. 19 May 2017
  2. Introduction: About Remitly and Me
  3. Agenda: • Risk management and risk rules • Generating rules from machine learning models • Incremental rule ranking • Model ensembling • Rule inclusion/exclusion criteria • Why this matters to Remitly
  4. A spectre is haunting risk management — the spectre of…
  5. Risk management and risk rules (Risk rules, how do they work?): • Rules are typically managed via a GUI: dropdown menus, etc. • Rules are logical conjunctions of expressions over input data, e.g.: (x < 10) AND (y > 20) AND (z < 100) • Rule conditions are based on transaction and customer attributes. • Collectively, all rules form a logical disjunction, e.g.: rule1 OR rule2 OR rule3 • When any one rule triggers, we queue the transaction for review. • It is easy to integrate rules we’ve learned from data into this framework.
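The conjunction/disjunction structure on this slide can be sketched in a few lines of Python. The feature names (`x`, `y`, `z`, `amount`) and thresholds are hypothetical, not Remitly's actual rules:

```python
# Each rule is a conjunction of conditions on transaction attributes; the
# rule set as a whole is their disjunction (rule1 OR rule2 OR ...).
rules = [
    lambda t: t["x"] < 10 and t["y"] > 20 and t["z"] < 100,
    lambda t: t["amount"] > 5000,
]

def should_review(txn):
    """Queue the transaction for manual review if any single rule triggers."""
    return any(rule(txn) for rule in rules)
```

A transaction such as `{"x": 5, "y": 25, "z": 50, "amount": 100}` satisfies the first conjunction, so the disjunction fires and the transaction is queued.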
  6. Generating rules from machine learning models (FOILed again): • FOIL (First Order Inductive Learner): accepts binary features only; a rule is a simple conjunction of binary features; learns rules via separate-and-conquer. • Decision tree: accepts continuous and categorical features; a single rule is a root-to-leaf path; learns via divide-and-conquer.
  7. Separate-and-conquer: FOIL (First Order Inductive Learner): • FOIL takes as input sequences of features and a ground truth; we map all of our input features to a boolean space, with different strategies for continuous features (e.g., binning). • FOIL learns Horn clause programs from positive class examples. Implication form: (p ∧ q ∧ ... ∧ t) → u; disjunction form: ¬p ∨ ¬q ∨ ... ∨ ¬t ∨ u. • Covered examples are removed from the training data at each step. • FOIL rules are simply lists of features. • We map the rules FOIL learns into human-readable rules that we can implement in our risk rule management system.
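A toy separate-and-conquer loop in the spirit of this slide can be sketched as follows. Examples are modeled as sets of true binary features, and the gain heuristic is a simplified positive-minus-negative count, not FOIL's actual information-gain criterion:

```python
def learn_rule(pos, neg, literals):
    """Greedily add the literal that best separates positives from negatives
    until the rule (a conjunction of literals) covers no negative example."""
    rule = set()
    while any(rule <= n for n in neg) and literals - rule:
        best = max(
            literals - rule,
            key=lambda lit: sum(lit in p for p in pos) - sum(lit in n for n in neg),
        )
        rule.add(best)
    return rule

def separate_and_conquer(pos, neg, literals):
    """Learn one rule, remove the positives it covers, and repeat."""
    rules, remaining = [], list(pos)
    while remaining:
        rule = learn_rule(remaining, neg, literals)
        covered = [p for p in remaining if rule <= p]
        if not covered:  # no progress: stop rather than loop forever
            break
        rules.append(rule)
        remaining = [p for p in remaining if not rule <= p]
    return rules
```

With positives `[{"a", "b"}, {"a", "c"}]` and negatives `[{"b"}, {"c"}]`, the single rule `{"a"}` covers all positives and no negatives, so one pass suffices.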
  8. Divide-and-conquer: Decision Trees: • Decision trees are interpretable. • A rule is a root-to-leaf path. • Like a FOIL rule, a decision tree rule is a conjunction. • We use DFS to extract all rules from a decision tree. • Easy to evaluate together with FOIL rules. • Easily implementable in our risk rule management system.
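The DFS extraction of root-to-leaf paths can be sketched against a scikit-learn tree (the one-feature toy data set and the feature name "amount" are illustrative only):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data: fraud label flips once the single feature exceeds ~1.5.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

def extract_rules(clf, feature_names):
    """DFS from the root: every root-to-leaf path becomes one conjunction."""
    t = clf.tree_
    rules = []
    def dfs(node, path):
        if t.children_left[node] == -1:  # leaf: emit the accumulated path
            rules.append(path)
            return
        name = feature_names[t.feature[node]]
        thr = t.threshold[node]
        dfs(t.children_left[node], path + [f"{name} <= {thr:.2f}"])
        dfs(t.children_right[node], path + [f"{name} > {thr:.2f}"])
    dfs(0, [])
    return rules

rules = extract_rules(clf, ["amount"])
```

Each extracted rule is a plain list of human-readable conditions, so it can be evaluated alongside FOIL rules or loaded into a rule management GUI.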
  9. Synthesizing Production Rules (SQL to the rescue): • We synthesize hand-crafted rule performance with SQL. • For each transaction, we know whether each rule triggered or not. • We can use this to synthesize new hand-crafted rules that aren’t yet in production. • We can derive precision/recall easily from this data. • We can rank productionized rules alone to find rules we can immediately eliminate from production (i.e., remove redundancy). • We can rank productionized rules alone to establish a baseline level of performance for risk rule management.
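Once per-transaction trigger logs exist, per-rule precision and recall fall out directly. A minimal Python equivalent of the SQL aggregation described here:

```python
def precision_recall(triggered, is_fraud):
    """Compute one rule's precision/recall from parallel per-transaction
    lists: did the rule trigger, and was the transaction actually fraud?"""
    tp = sum(t and f for t, f in zip(triggered, is_fraud))
    fp = sum(t and not f for t, f in zip(triggered, is_fraud))
    fn = sum(not t and f for t, f in zip(triggered, is_fraud))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Running this per rule over historical transactions gives the ranking signal used both to spot redundant production rules and to set the baseline the learned rules must beat.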
  10. Incremental Rule Ranking (You are the weakest rule, goodbye!): • Today, there are hundreds of rules live in production. • A single decision tree or FOIL model can represent thousands of rules. • Can we find a strict subset of those rules that recalls exactly the same fraud? • First, we measure the performance of each rule individually on a test set. • At each step, we take the next best rule and remove from our test set the fraud that it catches. • We repeat this process until our rules no longer catch any uncaught fraud, whereupon the process terminates.
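This ranking step is a greedy set-cover loop, which can be sketched as follows (rule names and fraud IDs are hypothetical):

```python
def rank_rules(rules_caught, fraud_ids):
    """Greedy ranking: repeatedly pick the rule that catches the most
    still-uncaught fraud; stop when no rule catches anything new."""
    remaining = set(fraud_ids)
    ranked = []
    while remaining:
        best = max(rules_caught, key=lambda r: len(rules_caught[r] & remaining))
        gain = rules_caught[best] & remaining
        if not gain:  # every remaining fraud is uncatchable by any rule
            break
        ranked.append(best)
        remaining -= gain
    return ranked
```

For example, with `{"r1": {1, 2, 3}, "r2": {3, 4}, "r3": {1}}` against fraud IDs `{1, 2, 3, 4}`, the loop keeps only `r1` and `r2`; `r3` is redundant because the fraud it catches is already covered.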
  11. Model ensembling (Will it blend?): • Ensembling rules gives us a lot of lift. • We ensemble synthesized production rules, FOIL rules, and decision tree rules. • We rank a list of candidate rules from each model class. • Our output is a classifier of ensembled rules. • We’re seeing an 8% jump in recall and a 1% increase in precision.
  12. Rule inclusion/exclusion criteria (To include or not to include, that is the question): • Risk rule optimization is a constrained optimization problem. • Optimal rule sets must satisfy business constraints. • We must balance catching fraud against insulting customers. • Constraints can be nonlinear, e.g., with tradeoffs between precision and recall. • At each ranking step, we evaluate the whole classifier. • We include a rule when our classifier satisfies our criteria; we discard rules when our classifier violates our criteria.
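The include/discard decision can be sketched as a constraint check around the ranking loop. The evaluator and the precision threshold below are stubs invented for illustration, not Remitly's actual objective:

```python
def try_include(rule_set, candidate, evaluate, constraints):
    """Tentatively add the candidate, re-evaluate the whole classifier, and
    keep the candidate only if every business constraint still holds."""
    trial = rule_set + [candidate]
    metrics = evaluate(trial)
    return trial if all(ok(metrics) for ok in constraints) else rule_set

# Stub evaluator and constraint, purely for illustration: precision drops
# once the ensembled rule set grows past two rules.
evaluate = lambda rs: {"precision": 0.95 if len(rs) <= 2 else 0.60}
constraints = [lambda m: m["precision"] >= 0.90]
```

Because each candidate is judged by re-evaluating the classifier as a whole, a rule with good standalone precision can still be rejected if adding it pushes the ensemble past a business constraint.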
  13. Black box input features (It’s a rule in a black-box!): • The most informative rule features are derived from black-box models. • Rules (or lists of rules) with these features as conditions are a kind of model stacking. • Risk rules are limited to conjunctions, but their inputs are not. • Adding more black-box inputs improves the rules we learn. • Better black-box inputs reduce the complexity of rules (i.e., they have fewer conditions).
  14. Technologies used (How did we do this?): • Redshift • Python • S3 • EC2 p2.xlarge with the deep learning AMI • The GPU instance gives us a ~17x speedup in training/inference time compared to a laptop • TensorFlow/Keras • Scalding
  15. Bibliography (Citing our sources): Fürnkranz, Johannes. "Separate-and-conquer rule learning." Artificial Intelligence Review 13, no. 1 (1999): 3-54. Mooney, Raymond J., and Mary Elaine Califf. "Induction of first-order decision lists: Results on learning the past tense of English verbs." Journal of Artificial Intelligence Research 3 (1995): 1-24. Quinlan, J. Ross. "Induction of decision trees." Machine Learning 1, no. 1 (1986): 81-106. Quinlan, J. Ross. "Learning logical definitions from relations." Machine Learning 5, no. 3 (1990): 239-266. Quinlan, J. Ross. "Determinate literals in inductive logic programming." In Proceedings of the Eighth International Workshop on Machine Learning, pp. 442-446. 1991. Quinlan, J. Ross, and R. Mike Cameron-Jones. "FOIL: A midterm report." In Machine Learning: ECML-93, pp. 1-20. Springer Berlin/Heidelberg, 1993. Quinlan, J. Ross, and R. Mike Cameron-Jones. "Induction of logic programs: FOIL and related systems." New Generation Computing 13, no. 3-4 (1995): 287-312. Quinlan, J. Ross. C4.5: Programs for Machine Learning. Elsevier, 2014.
  16. Summary (What we talked about): • Risk management and risk rules • Generating rules from machine learning models • Incremental rule ranking • Model ensembling • Rule inclusion/exclusion criteria • Why this matters to Remitly
  17. Machine learning at Remitly: Remitly’s Data Science team uses ML for a variety of purposes. ML applications are core to our business – therefore our business must be core to our ML applications.
  18. We’re hiring!