- 1. Machine learning: applications, types and key concepts
- 2. Outline ● What is machine learning? ● Applications ● Types ● Terminology ● Key concepts
- 3. Next classes
  1. Key concepts
  2. A tour of machine learning (linear algebra, probability theory, calculus)
  3. Machine learning pipelines (pre-processing, model training, and evaluation in Python and Scala)
  4. Machine learning case studies (Python and Scala examples)
     a. Sentiment analysis (Natural Language Processing, NLTK)
     b. Spam classifier
     c. Stock price prediction (regression)
     d. Image recognition, deep learning (TensorFlow, Keras)
     e. Recommendation engine
  5. Machine learning at scale (algorithms, linear algebra, probability, Spark MLlib, Vowpal Wabbit, scikit-learn)
- 4. Next classes

  | | Key concepts | Tour | Pipelines | Case studies | Scale |
  |---|---|---|---|---|---|
  | Concepts | ○○○ | ○ | ○○ | ○○ | ○○○ |
  | Code | ○ | ○○ | ○○○ | ○○○ | ○○○ |
  | Math/stats | ○ | ○○○ | ○ | ○ | ○○○ |
- 5. What is machine learning? ● Learn from data (past experiences) ● Generalize (find the signal/pattern) ● Predict, forward looking ● Observational data
- 6. What is machine learning? Relationship to data science and deep learning [Diagram labels: Data Science, ML, DL]
- 7. Applications ● Autonomous cars ● Siri ● Facial recognition ● People who bought this also bought... ● Spam filters ● Targeted advertising ● ...
- 8. Types of machine learning ● Supervised learning ○ Classification ○ Regression ● Unsupervised learning ● Reinforcement learning
- 9. Types of machine learning
  ● Supervised: cancer diagnosis, stock market prediction, customer churn, recommendation engine, anomaly detection
  ● Unsupervised: dimensionality reduction, clustering, PageRank, anomaly detection
  ● Reinforcement learning: self-driving cars, AlphaGo
- 10. (Linear) Regression ● Predict a continuous variable (e.g. price) ● Y=mx+b ● Ordinary Least Squares ● Analytical solution ● Geometric model
- 11. (Linear) Regression ● Can use multiple variables (multi-variate regression) ● Relationships are not always linear
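
The "analytical solution" of Ordinary Least Squares for a single feature can be sketched in plain Python. This is a minimal illustration, not the slides' own code: the function names `fit_line` and `r_squared` and the toy `rooms`/`price` numbers are made up for the example.

```python
def fit_line(xs, ys):
    """Closed-form OLS for y = m*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def r_squared(xs, ys, m, b):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical toy data (not the Boston housing values).
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
price = [12.0, 17.0, 21.0, 27.0, 30.0]

m, b = fit_line(rooms, price)
r2 = r_squared(rooms, price, m, b)
print(f"m={m:.2f}, b={b:.2f}, R2={r2:.3f}")
```

No iterative training is needed here: the slope and intercept come directly from sums over the data, which is what makes the solution analytical.
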
- 12. (Linear) Regression example ● Boston housing dataset ● Median value of houses (MV) vs. average # rooms (RM)

  ```python
  from sklearn.linear_model import LinearRegression

  # housing is assumed to be a pandas DataFrame holding the Boston housing data
  model = LinearRegression()
  x, y = housing[['RM']], housing['MV']
  model.fit(x, y)
  model.score(x, y)  # R² = 0.48
  ```
- 13. (Linear) Regression example ● Boston housing dataset ● Median value of houses (MV) vs. average # rooms (RM) and industrial zoning proportions (INDUS)

  ```python
  from sklearn.linear_model import LinearRegression

  # housing is assumed to be a pandas DataFrame holding the Boston housing data
  model = LinearRegression()
  x, y = housing[['RM', 'INDUS']], housing['MV']
  model.fit(x, y)
  model.score(x, y)  # R² = 0.53
  ```
- 14. (Linear) Regression example ● Intuition breaks down in high-dimensions (>3) ● Interpretability goes down ● Real-world data is usually non-linear
- 15. Terminology ● Feature (a.k.a. input, variable, predictor, explanatory, independent variable) ● Output (a.k.a. target, label, class, dependent variable) ● Training instance (a.k.a. observation, row) ● Training dataset ● Training (a.k.a. learning, modeling, fitting) ● Model validation and testing

  | RM | INDUS | ZN | ... | MV |
  |---|---|---|---|---|
  | 6.575 | 2.31 | 18.0 | ... | 24.0 |
  | 6.421 | 7.07 | 0.0 | ... | 21.6 |
  | ... | ... | ... | ... | ... |
- 21. Statistical learning ● The true underlying function is not known ● Usually can’t observe all features (e.g. policy impact, global trends, etc.) ● Most interesting phenomena are neither deterministic nor stationary ● No guarantee that a set of variables is predictive of the outcome ● Spectrum: 100% deterministic (F=ma) ← machine learning territory → 100% stochastic (coin flip)
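
The deterministic-to-stochastic spectrum can be simulated in a few lines. This is only a sketch under stated assumptions: the unobserved influences are modeled as Gaussian noise, and the names `noisy_force` and `coin_flip` are hypothetical helpers, not part of the slides.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def force(mass, acceleration):
    """Fully deterministic end of the spectrum: F = m * a."""
    return mass * acceleration


def noisy_force(mass, acceleration, noise_sd=1.0):
    """Signal plus unobserved influences, modeled as Gaussian noise (an assumption)."""
    return force(mass, acceleration) + rng.gauss(0.0, noise_sd)


def coin_flip():
    """Fully stochastic end: no feature helps predict the outcome."""
    return rng.random() < 0.5


n = 10_000
# Even knowing the true function leaves an irreducible error on noisy data.
mean_abs_error = sum(
    abs(noisy_force(2.0, 3.0) - force(2.0, 3.0)) for _ in range(n)
) / n
# Always guessing "heads" is right about half the time.
heads_rate = sum(coin_flip() for _ in range(n)) / n

print(force(2.0, 3.0))
print(round(mean_abs_error, 2))
print(round(heads_rate, 2))
```

Machine learning problems sit between the two extremes: there is a signal worth learning, but some error remains no matter how good the model is.
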
- 22. Classification ● Target variable, qualitative, classes ● Binary classification ○ Cancer patient: positive class (class of interest) ○ Healthy patient: negative class
- 23. Classification ● Linear vs. non-linear decision boundaries ● Model complexity, training time, and latency
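
The difference between linear and non-linear decision boundaries shows up even on four points. The sketch below uses an XOR-style toy dataset (an illustrative choice, not from the slides) and a coarse grid search in place of a real training algorithm; `linear_rule` and `accuracy` are hypothetical helper names.

```python
from itertools import product

# XOR-style toy data: the class is 1 when exactly one coordinate is 1.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]


def accuracy(predict):
    """Fraction of the toy points that a prediction rule classifies correctly."""
    hits = sum(predict(x1, x2) == y for (x1, x2), y in zip(points, labels))
    return hits / len(points)


def linear_rule(w1, w2, b):
    """Linear decision boundary: predict 1 on one side of w1*x1 + w2*x2 + b = 0."""
    return lambda x1, x2: int(w1 * x1 + w2 * x2 + b > 0)


# Coarse grid over candidate lines (a sketch, not a real training algorithm).
grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
best_linear = max(
    accuracy(linear_rule(w1, w2, b)) for w1, w2, b in product(grid, repeat=3)
)

# A non-linear rule: compare the two coordinates directly.
nonlinear_accuracy = accuracy(lambda x1, x2: int(x1 != x2))

print(best_linear)         # 0.75
print(nonlinear_accuracy)  # 1.0
```

No straight line separates these classes, so every linear rule misclassifies at least one point, while the more flexible rule fits all four.
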
- 24. Bias [Figure labels: cancer patient = positive class (class of interest); healthy patient = negative class]
- 26. Bias
- 27. Variance/overfit ● Learning the wrong things, memorizing ● Modeling the noise and not the signal ○ Model 1: if GPA > 3.8 and hours studied > 5 then passed ○ Model 2: if student ID != 2 then passed ○ New record: student ID = 4, hours studied = 5.5, GPA = 3.82, passed?

  | Student ID | Hours studied | GPA | ... | Passed |
  |---|---|---|---|---|
  | 1 | 10 | 4.00 | ... | Yes |
  | 2 | 0 | 2.71 | ... | No |
  | 3 | 6 | 3.95 | ... | Yes |
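
The two models from the slide can be written out and checked against the three training rows. The sketch below does that. The `weak_student` record is a hypothetical addition (not in the slides) that makes the difference between the models visible.

```python
training = [
    {"student_id": 1, "hours": 10, "gpa": 4.00, "passed": True},
    {"student_id": 2, "hours": 0, "gpa": 2.71, "passed": False},
    {"student_id": 3, "hours": 6, "gpa": 3.95, "passed": True},
]


def model_1(record):
    """Uses features that plausibly relate to passing."""
    return record["gpa"] > 3.8 and record["hours"] > 5


def model_2(record):
    """Memorizes which training row failed; the ID carries no real signal."""
    return record["student_id"] != 2


def accuracy(model, records):
    return sum(model(r) == r["passed"] for r in records) / len(records)


# Both models fit the training data perfectly.
print(accuracy(model_1, training), accuracy(model_2, training))  # 1.0 1.0

# The new record from the slide: both models happen to predict "passed".
new_record = {"student_id": 4, "hours": 5.5, "gpa": 3.82}
print(model_1(new_record), model_2(new_record))  # True True

# Hypothetical record: no studying and a low GPA.
weak_student = {"student_id": 5, "hours": 0, "gpa": 2.0}
print(model_1(weak_student), model_2(weak_student))  # False True
```

Training accuracy alone cannot tell the two models apart. Model 2 predicts "passed" for every student whose ID is not 2, whatever their GPA or study hours, which is what memorizing the noise looks like.
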
- 28. Bias/variance
- 29. Guarding against overfitting ● Split into train, validation and test ● Cross-validation
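
Both guards can be sketched with the standard library alone. The function names, the 60/20/20 default split, and the unshuffled contiguous folds are illustrative assumptions. In practice a library routine (for example from scikit-learn) would normally be used.

```python
import random


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then carve out test, validation and training subsets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs; each index is validated exactly once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


train, val, test = train_val_test_split(range(10))
print(len(train), len(val), len(test))  # 6 2 2

folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print(val_idx)
```

The model is fitted on the training part only. The validation part (or the rotating validation fold) guides model choices, and the test set is held back for a final, untouched estimate.
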
- 30. Summary Increasing model complexity generally: ● Increases model fit ● Decreases interpretability ● Increases chance of overfitting ● Increases training time ● Increases model latency
- 31. Remember that...
- 32. Next class: a tour of machine learning
- 33. Preparation for next class
  1. Test your understanding: http://bit.ly/mlseries1
  2. Check this out: Visual intro to ML
  Want to pursue machine learning more seriously?
  ● Read "A few useful things to know about machine learning"
  ● Theory and intuition: Python Machine Learning book
  ● Hands-on experience: Kaggle (start with Titanic)
  ● The Elements of Statistical Learning (advanced)
