3. This talk will
   • introduce machine learning
   • make you ML-aware
   • have examples
4. This talk will not
   • give you a PhD
   • implement algorithms
   • cover collaborative filtering, optimization, clustering, advanced statistics, genetic algorithms, classical AI, NLP, ...
5. What is Machine Learning? Many different algorithms that predict data from other data, using applied statistics.
6. "Enhance and rotate 20 degrees"
7. What data? The web is data: user decisions, APIs, A/B tests, databases, logs, streams, browser versions, reviews, clicktrails.
8. Okay. We have data. What do we do with it? We classify it.
10. Classification OR
11. Classification :) OR :(
12. Classification
   • Documents
      o Sort email (Gmail's importance filter)
      o Route questions to the appropriate expert (Aardvark)
      o Categorize reviews (Amazon)
   • Users
      o Expertise; interests; pro vs. free; likelihood of paying; expected future karma
   • Events
      o Abnormal vs. normal
13. Algorithms: Decision Tree Learning
14. Algorithms: Decision Tree Learning
    Internal nodes test features; leaves hold the labels.
    Email contains word "viagra"?
    ├── no:  Email contains word "Ruby"?
    │        ├── no:  P(Spam) = 10%
    │        └── yes: P(Spam) = 5%
    └── yes: Email contains attachment?
             ├── no:  P(Spam) = 70%
             └── yes: P(Spam) = 95%
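A learned decision tree is just a chain of feature tests. A minimal sketch of the spam tree on this slide as plain conditionals, where the function name and the email representation (a set of lowercase words plus an attachment flag) are illustrative assumptions, not from the talk:

```python
# Sketch of the slide's decision tree: each branch tests one feature,
# each leaf holds P(Spam). Feature names and probabilities come from
# the slide; everything else is an illustrative assumption.

def p_spam(words, has_attachment):
    """Walk the tree and return the leaf's spam probability."""
    if "viagra" in words:
        if has_attachment:
            return 0.95
        return 0.70
    if "ruby" in words:
        return 0.05
    return 0.10

print(p_spam({"hello", "viagra"}, has_attachment=False))  # 0.7
```

Decision tree *learning* is the process of choosing which feature to test at each node (typically by information gain) from labeled examples; the tree above is the finished product.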
15. Algorithms: Support Vector Machines (SVMs) Graphics from Wikipedia
16. Algorithms: Support Vector Machines (SVMs) Graphics from Wikipedia
17. Algorithms: Naive Bayes
   • Break documents into words and treat each word as an independent feature
   • Surprisingly effective on simple text and document classification
   • Works well when you have lots of data
   Graphics from Wikipedia
18. Algorithms: Naive Bayes
    You received 100 emails, 70 of which were spam.

    Word     Spam with this word   Ham with this word
    viagra   42 (60%)              1 (3.3%)
    ruby     7 (10%)               15 (50%)
    hello    35 (50%)              24 (80%)

    A new email contains hello and viagra. The probability that it is spam is:

    P(S|hello,viagra) = P(S) * P(hello,viagra|S) / P(hello,viagra)
                      = 0.7 * (0.5 * 0.6) / (0.59 * 0.43)
                      ≈ 82%
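The slide's arithmetic can be reproduced directly. This sketch assumes, as the slide does, that word occurrences are conditionally independent (the "naive" assumption), so joint probabilities factor into products:

```python
# Reproducing the slide's Naive Bayes computation from its counts.
p_s = 70 / 100                         # P(Spam): 70 of 100 emails
p_hello_s = 35 / 70                    # P(hello | Spam)
p_viagra_s = 42 / 70                   # P(viagra | Spam)
p_hello = (35 + 24) / 100              # P(hello) over all email
p_viagra = (42 + 1) / 100              # P(viagra) over all email

# Naive assumption: P(hello, viagra | S) = P(hello|S) * P(viagra|S),
# and likewise in the denominator.
posterior = p_s * (p_hello_s * p_viagra_s) / (p_hello * p_viagra)
print(round(posterior, 2))  # 0.83
```

The exact value is about 0.828, which the slide truncates to 82%.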
19. Algorithms: Neural Nets
    Input layer (features) → Hidden layer → Output layer (classification)
    Graphics from Wikipedia
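A minimal sketch of what those three layers compute in one forward pass. The layer sizes, weights, and function names here are arbitrary illustrative choices, not from the talk:

```python
import math

def sigmoid(x):
    # Squashes any value into (0, 1), a common neuron activation.
    return 1 / (1 + math.exp(-x))

def forward(features, w_hidden, w_out):
    # Hidden layer: each neuron is a weighted sum of inputs, squashed.
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, features)))
              for ws in w_hidden]
    # Output layer: weighted sum of hidden activations, squashed
    # into a classification score.
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

score = forward([1.0, 0.0],
                w_hidden=[[0.5, -0.2], [0.3, 0.8]],
                w_out=[1.0, -1.0])
print(0.0 < score < 1.0)  # True
```

Training (backpropagation) adjusts the weights so the output matches the labels; this sketch only shows the prediction step.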
20. Curse of Dimensionality
    The more features and labels you have, the more data you need.
    http://www.iro.umontreal.ca/~bengioy/yoshua_en/research_files/CurseDimensionality.jpg
21. Overfitting
   • With enough parameters, anything is possible.
   • We want our algorithms to generalize and infer, not memorize specific training examples.
   • Therefore, we test our algorithms on different data than we train them on.
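The last bullet is usually implemented as a held-out split. A minimal sketch, where the toy dataset and the 80/20 ratio are illustrative assumptions:

```python
import random

# Hold out data the model never sees during training, so evaluation
# measures generalization rather than memorization.
random.seed(0)
data = [(x, x % 2) for x in range(100)]   # toy (example, label) pairs
random.shuffle(data)

split = int(0.8 * len(data))              # 80% train, 20% test
train, test = data[:split], data[split:]

print(len(train), len(test))  # 80 20
```

Evaluate only on `test`; a model that scores well on `train` but poorly on `test` has overfit.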