Practical Machine Learning and Rails Part1

Part 2: http://www.slideshare.net/ryanstout/practical-machine-learning-and-rails-part2

Published in: Technology, Education

Transcript

  • 1. Practical Machine Learning and Rails
  • 2. Andrew Cantino, VP Engineering, Mavenlink (@tectonic); Ryan Stout, Founder, Agile Productions (@ryanstout)
  • 3. This talk will: introduce machine learning; make you ML-aware; have examples
  • 4. This talk will not: give you a PhD; implement algorithms; or cover collaborative filtering, optimization, clustering, advanced statistics, genetic algorithms, classical AI, NLP, ...
  • 5. What is Machine Learning? Many different algorithms that predict data from other data using applied statistics.
  • 6. "Enhance and rotate 20 degrees"
  • 7. What data? The web is data: user decisions, APIs, A/B tests, databases, logs, streams, browser versions, reviews, clicktrails
  • 8. Okay. We have data. What do we do with it? We classify it.
  • 9. Classification
  • 10. Classification OR
  • 11. Classification :) OR :(
  • 12. Classification: Documents (sort email, e.g. Gmail's importance filter; route questions to the appropriate expert, e.g. Aardvark; categorize reviews, e.g. Amazon). Users (expertise; interests; pro vs. free; likelihood of paying; expected future karma). Events (abnormal vs. normal).
  • 13. Algorithms: Decision Tree Learning
  • 14. Algorithms: Decision Tree Learning. [Slide diagram: a decision tree whose internal nodes are features and whose leaves are labels. Root: does the email contain the word "viagra"? If no, check whether it contains the word "Ruby" (no: P(Spam)=10%; yes: P(Spam)=5%). If yes, check whether it has an attachment (no: P(Spam)=70%; yes: P(Spam)=95%).] A Ruby sketch of this tree appears after the transcript.
  • 15. Algorithms: Support Vector Machines (SVMs) Graphics from Wikipedia
  • 16. Algorithms: Support Vector Machines (SVMs) Graphics from Wikipedia
  • 17. Algorithms: Naive Bayes: break documents into words and treat each word as an independent feature; surprisingly effective on simple text and document classification; works well when you have lots of data. Graphics from Wikipedia
  • 18. Algorithms: Naive Bayes. You received 100 emails, 70 of which were spam.
        Word      Spam with this word   Ham with this word
        viagra    42 (60%)              1 (3.3%)
        ruby      7 (10%)               15 (50%)
        hello     35 (50%)              24 (80%)
    A new email contains hello and viagra. The probability that it is spam is:
    P(S|hello,viagra) = P(S) * P(hello,viagra|S) / P(hello,viagra) = 0.7 * (0.5 * 0.6) / (0.59 * 0.43) = 82%
    (A Ruby sketch of this calculation appears after the transcript.) Graphics from Wikipedia
  • 19. Algorithms: Neural Nets. [Slide diagram: an input layer (features) feeding a hidden layer, which feeds an output layer (classification).] Graphics from Wikipedia
  • 20. Curse of Dimensionality: The more features and labels that you have, the more data that you need. http://www.iro.umontreal.ca/~bengioy/yoshua_en/research_files/CurseDimensionality.jpg
  • 21. Overfitting: With enough parameters, anything is possible. We want our algorithms to generalize and infer, not memorize specific training examples. Therefore, we test our algorithms on different data than we train them on. (A Ruby sketch of a train/test split appears after the transcript.)
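
The decision tree on slide 14 maps directly to nested conditionals. The sketch below is illustrative only: the email hash with :body and :has_attachment keys and the spam_probability helper are hypothetical, and the leaf values are the P(Spam) labels read off the slide. In real decision tree learning the splits are learned from labeled data rather than written by hand.

    # Hand-coded version of the slide-14 decision tree (Ruby).
    # Internal nodes test features; leaves return the P(Spam) label.
    def spam_probability(email)
      if email[:body].downcase.include?("viagra")
        email[:has_attachment] ? 0.95 : 0.70                   # "viagra" branch: attachment?
      else
        email[:body].downcase.include?("ruby") ? 0.05 : 0.10   # other branch: word "Ruby"?
      end
    end

    puts spam_probability({ body: "Cheap viagra, act now!", has_attachment: true })      # => 0.95
    puts spam_probability({ body: "New Ruby on Rails release", has_attachment: false })  # => 0.05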
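The slide-18 arithmetic can be re-checked in a few lines of Ruby. This assumes only the counts from the table (100 emails, 70 spam); the variable names are made up for illustration. Multiplying the per-word probabilities is exactly the "naive" independence assumption.

    # 100 emails, 70 spam and 30 ham (slide 18).
    spam_total, ham_total = 70, 30
    total = spam_total + ham_total

    # word => [spam emails containing it, ham emails containing it]
    counts = {
      "viagra" => [42, 1],
      "ruby"   => [7, 15],
      "hello"  => [35, 24]
    }

    words = %w[hello viagra]

    p_spam             = spam_total.to_f / total                                                  # P(S) = 0.7
    p_words_given_spam = words.map { |w| counts[w][0].to_f / spam_total }.inject(:*)              # 0.5 * 0.6
    p_words            = words.map { |w| (counts[w][0] + counts[w][1]).to_f / total }.inject(:*)  # 0.59 * 0.43

    puts p_spam * p_words_given_spam / p_words   # => 0.8277..., the ~82% on the slide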
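Slide 21's last point, testing on different data than you train on, is usually done with a held-out split. A minimal sketch, assuming a hypothetical labeled_emails array of {features:, label:} hashes and hypothetical train_classifier / predict calls standing in for any of the algorithms above.

    # Shuffle once (fixed seed for repeatability), then hold out 20% for testing.
    examples = labeled_emails.shuffle(random: Random.new(42))
    split    = (examples.size * 0.8).floor

    train_set = examples[0...split]
    test_set  = examples[split..-1]

    model = train_classifier(train_set)   # hypothetical training call

    # Accuracy is measured only on examples the model never saw during training.
    correct = test_set.count { |ex| model.predict(ex[:features]) == ex[:label] }
    puts "held-out accuracy: #{correct.to_f / test_set.size}"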
