Optimization: A Framework for Predictive Analytics

Optimization
A framework for predictive analytics
Matt Gattis, Hunch.com

Machine Learning Example: Linear Regression
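The slide content for this example did not survive extraction. As a minimal sketch of how linear regression fits the optimization framing, the snippet below writes the squared-error loss explicitly and finds the parameters that minimize it. The toy data, the names `squared_loss` and `w_hat`, and the use of NumPy are all assumptions for illustration, not taken from the talk.

```python
import numpy as np

# Hypothetical toy data: y is roughly 3*x + 1 plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 3.0 * x + 1.0 + 0.05 * rng.standard_normal(x.size)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])

def squared_loss(w):
    """Sum of squared residuals for parameters w = (intercept, slope)."""
    r = X @ w - y
    return float(r @ r)

# Least-squares solution: the parameters that minimize squared_loss.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, slope:", w_hat, "loss:", squared_loss(w_hat))
```

Any perturbation of `w_hat` should give a loss at least as large, which is what "finding parameters by minimizing a loss" means here.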

Machine Learning Anecdote: Clustering
Type          Loss function
Hierarchical  max/min/mean pairwise differences between observations in each cluster
K-means       distance from each observation in a cluster to the centroid of the cluster
Hunch         1) number of K nearest neighbors not in the cluster
              2) distance from the furthest observation in the cluster
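To make the K-means row concrete, here is a rough sketch that scores a cluster assignment by its distance-to-centroid loss. It assumes the conventional squared Euclidean distance (the slide only says "distance"), and the function name `kmeans_loss` and the four sample points are hypothetical.

```python
import numpy as np

def kmeans_loss(points, labels):
    """Sum of squared distances from each point to its cluster's centroid."""
    total = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        centroid = members.mean(axis=0)
        total += float(((members - centroid) ** 2).sum())
    return total

# Hypothetical data: two tight pairs of points, far apart from each other.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
good = np.array([0, 0, 1, 1])  # groups nearby points together
bad = np.array([0, 1, 0, 1])   # groups distant points together

print(kmeans_loss(points, good))  # 4.0
print(kmeans_loss(points, bad))   # 100.0
```

A different loss function (for example, either of the Hunch rows) would rank the same assignments differently, which is the point of the table.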

SYNERGY
• Align your loss function with your product goals
• Add terms to your loss function for optimizing business metrics
Make sure your product designers, business strategists, and engineers are all on the same page.

Techniques to Find Parameters
Step #1: Make your loss function convex

• Gradient Descent
• Newton’s method
• Stochastic gradient descent
• Conjugate gradient descent
• Alternating variables
• Simplex Algorithm
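As an illustration of the first technique in the list, here is a minimal gradient descent sketch on a small convex function. The fixed step size, the iteration count, and the example function are assumptions chosen so the loop converges; real problems usually need a step-size rule or line search.

```python
import numpy as np

def gradient_descent(grad, w0, step=0.1, iters=500):
    """Repeatedly step against the gradient from the starting point w0."""
    w = np.asarray(w0, dtype=float)
    for _ in range(iters):
        w = w - step * grad(w)
    return w

# Hypothetical convex loss: f(w) = (w0 - 3)^2 + 2 * (w1 + 1)^2,
# whose unique minimum is at (3, -1).
def grad_f(w):
    return np.array([2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)])

w_min = gradient_descent(grad_f, [0.0, 0.0])
print(w_min)
```

Because the loss is convex, the starting point does not matter: any start converges to the same minimum, which is why Step #1 is worth the effort.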
Step #2: Convex Optimization

• Python: cvxopt / cvxmod
• C/C++: GLPK (GNU Linear Programming Kit)
• MATLAB, R, NumPy/SciPy, etc.
Step #2: Convex Optimization
>>> from cvxmod import *
>>> # A, b (problem data) and x (the optimization variable) are assumed to be defined already
>>> p = problem(minimize(norm2(A*x-b)))
>>> p.constr.append(x >= 0.5)
>>> p.solve()
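For readers without cvxmod, the same kind of problem (least squares with the lower bound x >= 0.5) can be sketched with SciPy's `lsq_linear`. This is a substitute, not the slide's code: `lsq_linear` minimizes half the squared norm rather than the norm itself, which has the same minimizer, and the random `A` and `b` are hypothetical data.

```python
import numpy as np
from scipy.optimize import lsq_linear

# Hypothetical problem data.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

# Minimize 0.5 * ||A x - b||^2 subject to x >= 0.5 (no upper bound).
result = lsq_linear(A, b, bounds=(0.5, np.inf))
x_opt = result.x

print("x:", x_opt)
print("residual norm:", np.linalg.norm(A @ x_opt - b))
```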

Example: Propagating Labels Across a Social Graph
[Figure: social graph in which unlabeled nodes are marked "?"]
Solution: [Zhou et al. (2005)]
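The slide credits Zhou et al. (2005) without showing the method. The sketch below is not a reproduction of that paper; it assumes a common iterative label-spreading update, F ← αSF + (1 − α)Y, on a small undirected graph. The six-node path graph, the value of `alpha`, and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical undirected path graph 0-1-2-3-4-5 as an adjacency matrix.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

# Known labels: node 0 has label 0, node 5 has label 1.
# Rows of zeros are the unlabeled ("?") nodes.
Y = np.zeros((n, 2))
Y[0, 0] = 1.0
Y[5, 1] = 1.0

# Symmetrically normalized adjacency: S = D^(-1/2) W D^(-1/2).
d = W.sum(axis=1)
S = W / np.sqrt(np.outer(d, d))

# Spread label scores along edges while pulling back toward the known labels.
alpha = 0.9
F = Y.copy()
for _ in range(200):
    F = alpha * (S @ F) + (1.0 - alpha) * Y

pred = F.argmax(axis=1)
print(pred)  # each unlabeled node takes the label of the nearer labeled node
```

This update is itself the fixed-point iteration for a convex quadratic objective (smoothness across edges plus fit to the known labels), which is how it connects to the rest of the talk.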

Trade-offs: Not all Errors are Created Equal
                 Predicted  Actual
True Positives   +          +
False Positives  +          −
True Negatives   −          −
False Negatives  −          +
Precision is TP : FP, i.e. TP / (TP + FP) (avoid looking dumb)
Recall is TP : FN, i.e. TP / (TP + FN) (surface the most possibilities)
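A small sketch of how these two ratios fall out of the confusion counts. The function name `precision_recall` and the sample predictions are hypothetical; returning 0.0 when a denominator is zero is a convention chosen here, not something the slide specifies.

```python
def precision_recall(predicted, actual):
    """Compute (precision, recall) from parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical predictions: 2 true positives, 1 false positive,
# 1 false negative, 1 true negative.
predicted = [True, True, True, False, False]
actual = [True, False, True, True, False]

precision, recall = precision_recall(predicted, actual)
print(precision, recall)
```

Predicting "+" more aggressively tends to raise recall and lower precision, so the loss function should weight FP and FN according to which error costs the product more.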

Overfitting
The model is trying to explain noise
• insufficient sample size
• too many arguments (parameters)
• not enough constraints / error terms
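The first two causes can be seen in a rough sketch: with only ten noisy samples from a straight line, a degree-9 polynomial (ten parameters) reproduces the training points almost exactly but does so by fitting the noise. The data-generating line, the noise level, and the helper names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_line(x):
    """Hypothetical ground truth y = 2x + 1 plus Gaussian noise."""
    return 2.0 * x + 1.0 + 0.1 * rng.standard_normal(x.size)

# Small training sample and a held-out sample at the midpoints between them.
x_train = np.linspace(0.0, 1.0, 10)
y_train = noisy_line(x_train)
x_test = (x_train[:-1] + x_train[1:]) / 2.0
y_test = noisy_line(x_test)

def mse(coeffs, x, y):
    """Mean squared error of a polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

low = np.polyfit(x_train, y_train, deg=1)   # 2 parameters
high = np.polyfit(x_train, y_train, deg=9)  # 10 parameters, 10 points

print("degree 1: train", mse(low, x_train, y_train), "test", mse(low, x_test, y_test))
print("degree 9: train", mse(high, x_train, y_train), "test", mse(high, x_test, y_test))
```

The near-zero training error of the degree-9 fit says nothing about held-out data. Adding a penalty term on the coefficients to the loss is one way to supply the missing "constraints / error terms".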
