Optimization
A framework for predictive analytics
Matt Gattis, Hunch.com
Simple Optimization Problem
Machine Learning Example: Linear Regression
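A minimal sketch of linear regression framed as an optimization problem: choose the weights that minimize a squared-error loss. The toy data and the names `squared_loss`, `X`, `w_hat` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def squared_loss(w, X, y):
    """Sum of squared residuals of the linear model X @ w."""
    residuals = X @ w - y
    return float(residuals @ residuals)

# Hypothetical toy data generated from y = 2*x + 1 with no noise.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Design matrix: first column is the feature, second is the intercept term.
X = np.column_stack([x, np.ones_like(x)])

# lstsq returns the weights that minimize the squared loss above.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Here the loss function is the whole specification of the problem; the solver is interchangeable.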
Machine Learning Anecdote: Clustering
Type           Loss function
Hierarchical   max/min/mean pairwise differences between observations in each cluster
K-means        distance from each observation in a cluster to the centroid of the cluster
Hunch          1) number of K nearest neighbors not in the cluster
               2) distance from the furthest observation in the cluster
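As an illustration of the K-means entry, the sketch below evaluates the loss for a fixed cluster assignment. It assumes the usual formulation (sum of squared Euclidean distances to the centroid); the slide only says "distance", and the data and function names are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))

def kmeans_loss(clusters):
    """Sum of squared distances from each observation to its cluster centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        for p in points:
            total += sum((pi - ci) ** 2 for pi, ci in zip(p, c))
    return total

# Hypothetical 2-D observations, already assigned to two clusters.
clusters = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(10.0, 10.0), (12.0, 10.0)],
]
loss = kmeans_loss(clusters)
```

Swapping in a different loss (for example one of the Hunch criteria) changes which clustering counts as "best" without changing the overall framework.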
SYNERGY
• Align your loss function with your product goals
• Add terms to your loss function for optimizing business
metrics
Make sure your product designers, business strategists, and
engineers are all on the same page.
Techniques to Find Parameters
Step #1: Make your loss function convex
Techniques to Find Parameters
Step #2: Convex Optimization
• Gradient Descent
• Newton’s method
• Stochastic gradient descent
• Conjugate gradient descent
• Alternating variables
• Simplex Algorithm
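Of the techniques listed, plain gradient descent is the simplest to sketch. The example below assumes a hand-written gradient for a small convex quadratic; the step size, iteration count, and names are illustrative choices, not recommendations from the slides.

```python
import numpy as np

def gradient_descent(grad, x0, step=0.1, iters=200):
    """Repeatedly step against the gradient from the starting point x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# Convex loss f(x) = ||x - target||^2, whose gradient is 2 * (x - target).
target = np.array([3.0, -1.0])

def grad(x):
    return 2.0 * (x - target)

x_min = gradient_descent(grad, x0=[0.0, 0.0])
```

Because the loss is convex, the iterates head toward the single global minimum; on a non-convex loss the same loop may stop in a local minimum instead.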
Techniques to Find Parameters
Step #2: Convex Optimization
• Python: cvxopt / cvxmod
• C/C++: GLPK (GNU Linear Programming Kit)
• Matlab, R, Numpy/Scipy, etc
>>> from cvxmod import *
>>> # A, b (problem data) and x (optimization variable) must already be defined
>>> p = problem(minimize(norm2(A*x-b)))
>>> p.constr.append(x >= 0.5)
>>> p.solve()
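The same kind of problem (least squares with a lower bound of 0.5 on every variable) can also be sketched with SciPy, one of the alternatives named on the slide. The matrix `A` and vector `b` below are made-up toy data. `lsq_linear` minimizes the squared residual norm, which has the same minimizer as the unsquared `norm2` objective.

```python
import numpy as np
from scipy.optimize import lsq_linear

# Hypothetical problem data: 3 equations, 2 unknowns.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 0.0, 1.0])

# Minimize ||A x - b||^2 subject to x >= 0.5 (no upper bound).
result = lsq_linear(A, b, bounds=(0.5, np.inf))
x_opt = result.x
```

Without the bound the best fit is x = (1, 0); the constraint pushes the second component up to its lower limit and the first component adjusts to compensate.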
Example: Propagating Labels Across a Social Graph
[Figure: social graph with unlabeled nodes marked "?"]
Solution: [Zhou et al. (2005)]
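The slide does not reproduce the cited solution. As a generic illustration only, the sketch below spreads known labels over a symmetrically normalized adjacency matrix using a closed-form solve. Whether this matches the formulation in the cited paper is an assumption, and the graph, `alpha`, and function name are hypothetical.

```python
import numpy as np

def propagate_labels(W, Y, alpha=0.9):
    """Spread labels over a graph.

    W: symmetric (n x n) adjacency matrix; every node needs at least one edge.
    Y: (n x k) initial labels, one-hot rows for labeled nodes, zero rows otherwise.
    Returns the predicted class index for every node.
    """
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    n = W.shape[0]
    # Solve (I - alpha * S) F = Y for the label scores F.
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1)

# Hypothetical path graph 0 - 1 - 2 - 3; only the two end nodes are labeled.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y = np.array([[1, 0],
              [0, 0],
              [0, 0],
              [0, 1]], dtype=float)
labels = propagate_labels(W, Y)
```

Each unlabeled node ends up with the class of the labeled node it is closest to in the graph.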
Trade-offs: Not all Errors are Created Equal
                 Predicted   Actual
True Positive        +         +
False Positive       +         −
True Negative        −         −
False Negative       −         +
Precision is TP / (TP + FP), driven by the TP : FP ratio (avoid looking dumb)
Recall is TP / (TP + FN), driven by the TP : FN ratio (surface the most possibilities)
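A small sketch of how the two quantities are computed from predictions, using the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The example labels and the function name are made up for illustration.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two equal-length lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # predicted +, actual +
    fp = sum(1 for p, a in pairs if p and not a)    # predicted +, actual -
    fn = sum(1 for p, a in pairs if not p and a)    # predicted -, actual +
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical predictions: 2 true positives, 2 false positives, 1 false negative.
predicted = [True, True, True, True, False, False]
actual    = [True, True, False, False, True, False]
precision, recall = precision_recall(predicted, actual)
```

Predicting "+" more often tends to raise recall and lower precision; which error costs more depends on the product.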
Overfitting
Model is trying to explain noise
• insufficient sample size
• too many arguments
• not enough constraints / error terms
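A rough sketch of the effect, assuming synthetic data: a straight line plus noise, fitted with a 2-parameter and a 10-parameter polynomial on only 10 observations. The data, seed, and degrees are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small training sample: a linear trend plus noise.
x_train = np.linspace(-1.0, 1.0, 10)
y_train = 2.0 * x_train + rng.normal(scale=0.3, size=x_train.size)

# Held-out points drawn from the noise-free trend.
x_test = np.linspace(-0.95, 0.95, 50)
y_test = 2.0 * x_test

def mse(coeffs, x, y):
    """Mean squared error of a polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

simple = np.polyfit(x_train, y_train, deg=1)    # 2 parameters
flexible = np.polyfit(x_train, y_train, deg=9)  # 10 parameters, one per point

train_simple = mse(simple, x_train, y_train)
train_flexible = mse(flexible, x_train, y_train)
test_simple = mse(simple, x_test, y_test)
test_flexible = mse(flexible, x_test, y_test)
```

The flexible model passes through every training point, so its training error is essentially zero: it has explained the noise. Comparing `test_simple` with `test_flexible` shows how that fit typically fares on unseen data.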
