Jean-François Puget, Distinguished Engineer, Machine Learning and Optimization, IBM at MLconf SF 2016

Why Machine Learning Algorithms Fall Short (And What You Can Do About It): Many think that machine learning is all about the algorithms. Want a self-learning system? Get your data, then start coding or hire a PhD who will build you a model that will stand the test of time. Of course, we know that this is not enough. Models degrade over time, algorithms that work great on yesterday’s data may not be the best option today, and new data sources and types become available. In short, your self-learning system may not be learning anything at all. In this session, we will examine how to overcome challenges in creating self-learning systems that perform better and are built to stand the test of time. We will show how to apply mathematical optimization algorithms that often prove superior to the local optimization methods favored by typical machine learning applications, and discuss why these methods can create better results. We will also examine the role of smart automation in the context of machine learning and how smart automation can create self-learning systems that are built to last.

Published in: Technology

  1. From ML Algorithms To Learning Machines (+ Optimization). Jean-François Puget, 11/11/2016, @JFPuget. © 2016 IBM Corporation. IBM Confidential.
  2. The Machine Learning Workflow: 25 years ago, an academic topic. Data → ML algorithm → ? → publication
  3. The Machine Learning Workflow: perception now. Data → ??? → ML Algorithm → ??? → $$$
  4. The Machine Learning Workflow: simple! Data → Data Scientist → ML Algorithm → Model → $$$. R, Sklearn, Spark ML, Deep Learning, GBM (xgboost), vw, H2O, …
  5. The Machine Learning Workflow: focus on missing pieces. Data → ??? → ML Algorithm → ??? → $$$
  6. The Machine Learning Workflow: not that simple. Data → Data Prep → ML Algo → Model → Deploy → Predict → $$$
  7. The gap between data scientists and operations is incredible
  8. Dev (training): labeled examples → data prep → algorithm → model. Ops (scoring): new data → data prep → model → predicted data. The model is deployed from Dev to Ops. For each ML toolkit we need model serialization and a scalable scoring engine. We are building that for Spark ML.
  9. The Machine Learning Workflow: not that simple. Data → Data Prep → ML Algo → Model → Deploy → Predict → $$$
  10. Cognitive Assistant for Data Scientists (CADS), IBM Research
      • Objective:
          • Bring automation into key areas of large-scale data analysis tasks
          • Overcome “analytic decision overload” for Data Scientists
      • Current CADS System:
          • Automated selection, composition, configuration, training, and deployment of modeling pipelines for supervised data mining tasks that leverages:
              • AI/Learning and Planning based principled exploration of analytic choices
              • Cross-platform analytic deployments (e.g., R, Spark, Python, SPSS) on Big Data platforms, Cloud
      • What is next:
          • Automation of more parts of the Data Scientist's workflow (e.g. automated feature engineering)
          • Extend for other problems, data types, scale and user requirements (e.g., unstructured data, Deep Learning)
          • Self-Learning and Adaptation
          • Build first-ever conversational data science system with CADS + Watson QA
  11. SystemML (IBM Research). Language: DML scripts, written in DML (Declarative Machine Learning Language). Compiler. Runtime: In-Memory Single Node (scale-up) or Hadoop or Spark Cluster (scale-out). Timeline labels: since 2010, since 2012, since 2015. Example script: Linear Regression Conjugate Gradient.
  12. The Machine Learning Workflow: pain points. Data → Data Prep → ML Algo → Model → Deploy → Predict → $$$
  13. The Machine Learning Workflow: feedback loop. Data → Data Prep → ML Algo → Model → Deploy → Predict → $$$. Prediction accuracy monitoring: collect predictions vs. actuals.
  14. What about Watson and cognitive computing? Cognitive = Natural language processing + Machine Learning + …
  15. Machine Learning and Mathematical Optimization
      • Most ML algorithms solve an optimization problem: find parameters for a given model family that minimize the sum of
          • a loss function (prediction error)
          • a regularization term (favoring model simplicity)
      • Optimization algorithms: local methods
          • Stochastic gradient descent, conjugate gradient, LBFGS, …
          • Scale to a large number of examples
          • Embarrassingly parallel
          • Can get stuck in local minima
          • Have a hard time coping with additional constraints on the optimization problem
      • Mathematical optimization (e.g. CPLEX)
          • Can find the global optimum
          • Can deal with constraints, e.g. L0 norm
          • Limited in scale
  16. Classical ML Algorithms implemented with mathematical optimization models
      • Linear models: LASSO, Ridge Classifier, Elastic Net, Hinge loss, Hinge-squared loss
      • Support Vector Machines: Primal, Dual linear, Dual RBF, Hinge models
      • Decision Forests: Decision trees vote (preliminary work)
      • Multi-label problems: Using 1-vs-rest method
      • Alternating Least Squares: Application to Collaborative Filtering (recommendations)
      • Figure label: LASSO
  17. Compressive Sensing. Image reconstruction with and without bounds on the pixel value. Figure panels: Original; Lasso (sklearn); Constrained Lasso (CPLEX); Distribution of pixel values.
  18. Matrix factorization. Used in recommendation systems. User profiles × movie profiles = observed interactions
  19. Alternating Least Squares with additional constraints (Hugues Juille)
  20. References
      • IBM Watson Machine Learning
      • System ML
      • CADS: ICML 2014
      • CPLEX-learn
      • Contributors: Jean-Francois Puget, Paul Shaw, Vincent Beraudier, Pierre Bonami, Daniel Junglas, Hugues Juille, Renaud Dumeur, Viu Long Kong, Philippe Couronne
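
The feedback loop on slide 13 (collect predictions vs. actuals) could be sketched roughly as follows. This is a minimal illustration, not the monitoring system described in the talk: the class name `AccuracyMonitor`, the rolling window, and the retraining threshold are all assumptions introduced here.

```python
from collections import deque


class AccuracyMonitor:
    """Hypothetical helper: rolling accuracy over recent (prediction, actual) pairs."""

    def __init__(self, window=100, threshold=0.8):
        # Keep only the most recent `window` outcomes (assumed policy).
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted, actual):
        # Store whether the deployed model's prediction matched the actual value.
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        # None until at least one actual has been collected.
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        # Flag the model once rolling accuracy drops below the threshold.
        acc = self.accuracy()
        return acc is not None and acc < self.threshold
```

In use, the scoring service would call `record` whenever the true outcome for an earlier prediction becomes known, and a scheduler would check `needs_retraining` to decide when to send the model back to the training stage.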
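
Slides 15 and 17 describe a Lasso (loss plus L1 regularization) with additional bounds on the solution, solved in the talk with CPLEX. The sketch below is not that approach: it swaps in a simple proximal gradient loop with NumPy, only to show what the bounded problem looks like. The function name `bounded_lasso` and its parameters are assumptions made for illustration.

```python
import numpy as np


def bounded_lasso(A, b, lam, lo, hi, n_iter=500):
    """Approximately minimize 0.5*||A x - b||^2 + lam*||x||_1 subject to lo <= x <= hi.

    Proximal gradient sketch (not the CPLEX model from the slides).
    """
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.clip(np.zeros(A.shape[1]), lo, hi)
    for _ in range(n_iter):
        # Gradient step on the squared-error loss.
        v = x - step * (A.T @ (A @ x - b))
        # Soft thresholding handles the L1 penalty.
        v = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
        # Clipping enforces the bounds (e.g. valid pixel values);
        # for this separable problem, clipping after thresholding is the exact prox.
        x = np.clip(v, lo, hi)
    return x
```

For image reconstruction as on slide 17, `lo` and `hi` would be the minimum and maximum admissible pixel values. Unlike a mathematical optimization solver, this loop offers no optimality certificate and does not extend to non-convex constraints such as an L0 norm bound.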
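
The matrix factorization of slide 18 and the Alternating Least Squares method named on slide 19 can be illustrated with a small NumPy sketch. The additional constraints mentioned on slide 19 are not specified in the transcript, so they are not modeled here; this is plain L2-regularized ALS, and the function name `als` and its parameters are assumptions.

```python
import numpy as np


def als(R, mask, rank=2, reg=0.1, n_iter=20, seed=0):
    """Factor R ~ U @ V.T using only the entries where mask is True.

    R: (users x items) interaction matrix; mask: boolean array of the same shape.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))  # user profiles
    V = rng.normal(scale=0.1, size=(n_items, rank))  # item (movie) profiles
    ridge = reg * np.eye(rank)
    for _ in range(n_iter):
        # Fix item profiles, solve a small ridge regression per user.
        for u in range(n_users):
            Vo = V[mask[u]]
            U[u] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ R[u, mask[u]])
        # Fix user profiles, solve a small ridge regression per item.
        for i in range(n_items):
            Uo = U[mask[:, i]]
            V[i] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ R[mask[:, i], i])
    return U, V
```

Each inner step is an ordinary least squares problem, which is why the method alternates: with one factor fixed, the other has a closed-form solution. Predicted interactions for unobserved pairs are read from `U @ V.T`.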