Learning Linear Models with Hadoop

Linear models are some of the most successful methods for predictive analytics. In this talk we will teach you why linear models work, how to learn and apply linear models on big data, and we'll provide tips and tricks on how to improve your models.


Transcript

  • 1. Learning Linear Models with Hadoop
    Ulrich Rückert
    © 2012 Datameer, Inc. All rights reserved.
  • 2. Agenda
    • What are linear models anyway?
    • How to learn linear models with Hadoop
    • Demo
    • Tips, tricks and caveats
    • Conclusion
  • 3. Predictive Analytics
    Example learning task:
    • Ad on bookseller's web page
    • Will a customer buy this book?
    • Training set: observations on previous customers
    • Test set: new customers
    Training data (Age, Income, BuysBook):
      24  60000  yes
      65  80000  no
      60  95000  no
      35  52000  yes
      20  45000  yes
      43  75000  yes
      26  51000  yes
      52  47000  no
      47  38000  no
      25  22000  no
      33  47000  yes
    Test data (Age, Income, BuysBook):
      22  67000  ?
      39  41000  ?
    Let's learn a linear model!
    Prediction: 22 67000 yes; 39 41000 no
  • 4. Linear Models
    What's in the black box?
    • Let's pretend all attributes are expert ratings
    • A large positive value means yes
    • A small value means no
    • An intermediate value: don't know
    Let the experts vote:
    • Sum over the ratings for each row
    • Larger than the threshold: predict yes
    • Smaller: predict no
    Data (Expert1, Expert2, BuysBook):
      24  60  ?
      64  80  ?
      60  96  ?
  • 5. Linear Models
    With the threshold set to 97, the votes become predictions:
      24 + 60 = 84   ≤ 97  → no
      64 + 80 = 144  > 97  → yes
      60 + 96 = 156  > 97  → yes
  • 6. Linear Models
    Assign a weight to each expert:
    • Expert is mostly correct: large weight
    • Expert is uninformative: zero weight
    • Expert is consistently wrong: negative weight
    Learning models:
    • A linear model contains the weights and the threshold
    • Learn by finding the weights with the lowest error on the training data
    With Weight1 = 0.75, Weight2 = 0.25, threshold = 48:
      0.75 • 24 + 0.25 • 60 = 33  → no
      0.75 • 64 + 0.25 • 80 = 68  → yes
      0.75 • 60 + 0.25 • 96 = 69  → yes
  • 7. Linear Models
    With Weight1 = 0, Weight2 = 0.25, threshold = 18 (the first expert is ignored):
      0 • 24 + 0.25 • 60 = 15  → no
      0 • 64 + 0.25 • 80 = 20  → yes
      0 • 60 + 0.25 • 96 = 24  → yes
  • 8. Linear Models
    With Weight1 = -0.5, Weight2 = 0.25, threshold = -8 (the first expert's vote is inverted):
      -0.5 • 24 + 0.25 • 60 = 3    → yes
      -0.5 • 64 + 0.25 • 80 = -12  → no
      -0.5 • 60 + 0.25 • 96 = -6   → yes
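
For readers who want to see the weighted vote as code, here is a minimal sketch in plain Python, using the weights and threshold from slide 8 (the function and variable names are illustrative, not from the talk):

    def classify(weights, threshold, attributes):
        # weighted vote: predict yes iff the weighted sum exceeds the threshold
        score = sum(w * a for w, a in zip(weights, attributes))
        return "yes" if score > threshold else "no"

    weights, threshold = [-0.5, 0.25], -8
    for row in [(24, 60), (64, 80), (60, 96)]:
        print(row, classify(weights, threshold, row))   # yes, no, yes
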
  • 9. Learning Linear Models
    Stochastic Gradient Descent (SGD)
    • Main idea: start with default weights
    • For each row, check if the current weights predict correctly
    • If misclassification: adjust the weights
    How to adjust the weights?
    • If positive class: add the row to the weights
    • If negative class: subtract the row
    Flowchart: start with default weights → read next training row → do the weights predict the correct label? If yes, read the next row; if no, adjust the weights first.
  • 10. Learning Linear Models
    repeat
        row = readNextRow();
        if (predict(weights, row.attributes) != row.class)
            weights += row.class * row.attributes;
            threshold += -row.class;
        endif
    end
    State: Weight1 = 1, Weight2 = -1, threshold = 0. Next training row: Age 24, Income 60, BuysBook +1.
  • 11. Learning Linear Models
    The prediction is wrong: 1 • 24 + -1 • 60 = -36 ≤ threshold, so the model says -1, but the true class is +1.
  • 12. Learning Linear Models
    Add the row to the weights: Weight1 = 25, Weight2 = 59, threshold = 0. Now 25 • 24 + 59 • 60 = 4140 → +1.
  • 13. Learning Linear Models
    ... and adjust the threshold: Weight1 = 25, Weight2 = 59, threshold = -1.
  • 14. Learning Linear Models
    Next training row: Age 30, Income 30, BuysBook -1.
  • 15. Learning Linear Models
    Wrong again: 25 • 30 + 59 • 30 = 2520 → +1, but the true class is -1.
  • 16. Learning Linear Models
    Subtract the row from the weights: Weight1 = -5, Weight2 = 29, threshold = -1. The score is now -5 • 30 + 29 • 30 = 720 → +1.
  • 17. Learning Linear Models
    ... and adjust the threshold: Weight1 = -5, Weight2 = 29, threshold = 0.
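
Rendered as runnable code, the loop on slides 10-17 is essentially the classic perceptron update. A minimal single-machine sketch in Python (the zero initial weights, the epoch count, and the helper names are assumptions for illustration; the slides start from Weight1 = 1, Weight2 = -1):

    import numpy as np

    def predict(weights, threshold, x):
        # +1 if the weighted sum exceeds the threshold, else -1
        return 1 if weights @ x > threshold else -1

    def train(rows, n_epochs=10):
        # rows: list of (attributes, label) pairs with labels in {+1, -1}
        weights, threshold = np.zeros(len(rows[0][0])), 0.0
        for _ in range(n_epochs):
            for attrs, label in rows:
                x = np.asarray(attrs, dtype=float)
                if predict(weights, threshold, x) != label:
                    weights += label * x      # add the row for +1, subtract it for -1
                    threshold += -label
        return weights, threshold

    weights, threshold = train([((24, 60), +1), ((30, 30), -1)])
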
  • 18.-19. Learning - Convergence
    The plain update rule again, as the starting point for the convergence discussion:
    repeat
        row = readNextRow();
        if (predict(weights, row.attributes) != row.class)
            weights += row.class * row.attributes;
            threshold += -row.class;
        endif
    end
  • 20.-21. Learning - Convergence
    Scale the update by a small constant learning rate:
    repeat
        row = readNextRow();
        if (predict(weights, row.attributes) != row.class)
            weights += 0.001 * row.class * row.attributes;
            threshold += -row.class;
        endif
    end
  • 22.-23. Learning - Convergence
    A step size that decreases over time ensures convergence:
    for i = 1 to ∞
        row = readNextRow();
        if (predict(weights, row.attributes) != row.class)
            weights += (1/i) * row.class * row.attributes;
            threshold += -row.class;
        endif
    end
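
As a hedged sketch of the 1/i schedule in runnable form, reusing the `predict` helper from the perceptron sketch above (cycling through the rows and the step count are illustrative choices, not from the talk):

    def train_decreasing(rows, n_steps=10000):
        weights, threshold = np.zeros(len(rows[0][0])), 0.0
        for i in range(1, n_steps + 1):
            attrs, label = rows[(i - 1) % len(rows)]   # cycle through the training rows
            x = np.asarray(attrs, dtype=float)
            if predict(weights, threshold, x) != label:
                weights += (1.0 / i) * label * x       # 1/i step size: updates shrink over time
                threshold += -label
        return weights, threshold
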
  • 24.-26. Learning - Margin
    Update not only on misclassified rows, but whenever the margin is too small:
    for i = 1 to ∞
        row = readNextRow();
        if (margin(weights, row.attributes, threshold) <= 1)
            weights += (1/n) * row.class * row.attributes;
            threshold += -row.class;
        endif
    end
    With Weight1 = 0.5, Weight2 = 0.25, threshold = 26.5, the row Age 24, Income 60, BuysBook +1 scores 0.5 • 24 + 0.25 • 60 = 27 → +1. The prediction is correct, but only barely above the threshold, so the margin condition still triggers an update.
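
The slides do not spell out `margin`; a standard definition, and the assumption in this sketch, is the class label times the distance of the score from the threshold. The 1/n factor is read here as one over the number of training rows, which is also an assumption:

    def margin(weights, threshold, x, label):
        # large and positive when the row is classified correctly with room to spare
        return label * (weights @ x - threshold)

    def train_margin(rows, n_steps=10000):
        weights, threshold = np.zeros(len(rows[0][0])), 0.0
        n = len(rows)
        for i in range(1, n_steps + 1):
            attrs, label = rows[(i - 1) % n]
            x = np.asarray(attrs, dtype=float)
            if margin(weights, threshold, x, label) <= 1:   # also fires on barely-correct rows
                weights += (1.0 / n) * label * x
                threshold += -label
        return weights, threshold
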
  • 27. Learning - Regularization
    Attributes are often correlated:
    • Contributions cancel out
    • This leads to unreasonably large weights...
    • ... and models which are not robust to noise
    Regularization:
    • Make sure the weights don't get too large
    • L2 regularization: weights are proportional to attribute quality
    With Weight1 = 0.5, Weight2 = 0.5, threshold = 30: 0.5 • 24 + 0.5 • 60 = 42 → +1.
  • 28. Learning - Regularization
    The very same score can come from absurdly large weights. With Weight1 = 1000, Weight2 = -399.3, threshold = 30: 1000 • 24 + -399.3 • 60 = 42 → +1.
  • 29. Learning - Regularization
    Shrink the weights in each iteration:
    for i = 1 to ∞
        row = readNextRow();
        if (margin(weights, row.attributes, threshold) <= 1)
            weights += (1/n) * row.class * row.attributes;
            threshold += -row.class;
        endif
        weights = i/(i+r) * weights;
    end
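
A runnable version of the shrinkage step, reusing `margin` from the previous sketch and assuming that r is a regularization constant and that the multiplier is applied once per iteration (a sketch, not the talk's exact code):

    def train_regularized(rows, n_steps=10000, r=1.0):
        weights, threshold = np.zeros(len(rows[0][0])), 0.0
        n = len(rows)
        for i in range(1, n_steps + 1):
            attrs, label = rows[(i - 1) % n]
            x = np.asarray(attrs, dtype=float)
            if margin(weights, threshold, x, label) <= 1:
                weights += (1.0 / n) * label * x
                threshold += -label
            weights *= i / (i + r)    # L2-style shrinkage: pulls the weights toward zero
        return weights, threshold
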
  • 30. Implementation on Hadoop
    Map-Reduce:
    • Input data must be in random order
    • Mapper: send data to the reducer in random order
    • Reducer: run the actual Stochastic Gradient Descent
    Evaluation and parameter selection:
    • Perform several runs with varying parameters
    • Learn on the training set, evaluate on the test set
    • Many runs with partial data are often better than one run with all the data
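
One common way to realize this mapper/reducer split is Hadoop Streaming, with both sides as small scripts reading stdin. A hedged sketch (the comma-separated input format, the random-key shuffle trick, and the script names are assumptions, not the talk's actual code):

    # mapper.py: tag each row with a random key so the shuffle phase
    # delivers the rows to the reducer in (roughly) random order
    import random, sys
    for line in sys.stdin:
        print(f"{random.random()}\t{line.strip()}")

    # reducer.py: one SGD sweep over the randomly ordered rows
    import sys
    import numpy as np
    weights, threshold = None, 0.0
    for line in sys.stdin:
        _, row = line.split("\t", 1)
        *attrs, label = row.split(",")        # assumed format: attr1,...,attrd,label
        x, y = np.array(attrs, dtype=float), int(label)
        if weights is None:
            weights = np.zeros(len(x))        # start from default (zero) weights
        if (1 if weights @ x > threshold else -1) != y:
            weights += y * x
            threshold += -y
    print(",".join(map(str, weights)), threshold)

Running with -D mapreduce.job.reduces=1 keeps the whole sweep in a single reducer; with several reducers each partition would yield its own model, loosely matching the slide's point that many runs on partial data can beat one run on all of it.
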
  • 31. Demo
  • 32. Learning Linear Models
    Stochastic Gradient Descent: pros and cons
    • One sweep over the data: easy to implement on top of Hadoop
    • Flexible: support vector machines, logistic regression, etc.
    • Provides a good enough estimate instead of the optimum
    • Parameter selection and evaluation are crucial
    Alternative: convex optimization
    • Formulate learning as a numerical optimization problem
    • On Hadoop: usually L-BFGS
    • See Vowpal Wabbit for a large-scale implementation
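
To make the convex-optimization alternative concrete, here is a minimal single-machine sketch that fits an L2-regularized logistic regression with L-BFGS via SciPy (the toy data, the lam value, and the use of SciPy are illustrative assumptions; the talk itself refers to Hadoop-based L-BFGS and Vowpal Wabbit):

    import numpy as np
    from scipy.optimize import minimize

    X = np.array([[24, 60], [64, 80], [60, 96]], dtype=float)  # toy rows from the slides
    y = np.array([-1, 1, 1], dtype=float)                      # labels in {-1, +1}

    def loss(w, lam=0.1):
        # L2-regularized logistic loss; w[:-1] are the weights, w[-1] is the threshold
        scores = X @ w[:-1] - w[-1]
        return np.mean(np.logaddexp(0.0, -y * scores)) + lam * (w[:-1] @ w[:-1])

    result = minimize(loss, x0=np.zeros(X.shape[1] + 1), method="L-BFGS-B")
    print(result.x)   # learned weights and threshold
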
  • 33. Conclusion
    Linear models:
    • Prediction based on a weighted vote and a threshold
    Stochastic Gradient Descent:
    • Adjust the weight vector iteratively for each misclassified row
    • Decreasing step size to ensure convergence
    • Margins and regularization for robustness
    Implementation:
    • Mapper provides random order, reducer performs SGD
    • Evaluation and parameter selection are crucial
  • 34. Thanks
    urueckert@datameer.com