This document outlines an introductory machine learning course, covering key concepts, applications, and types of machine learning such as supervised and unsupervised learning. It discusses techniques such as linear regression and classification, and how to handle overfitting. The course will include tutorials on sentiment analysis, spam filtering, stock prediction, image recognition, and recommendation engines using Python and Scala. Later classes cover machine learning at scale using tools such as Spark MLlib.
2. Outline
● What is machine learning?
● Applications
● Types
● Terminology
● Key concepts
3. Next classes
1. Key concepts
2. A tour of machine learning (linear algebra, probability theory, calculus)
3. Machine learning pipelines (pre-processing, model training, and evaluation in Python and Scala)
4. Machine learning case studies (Python and Scala examples)
a. Sentiment analysis (Natural Language Processing, NLTK)
b. Spam classifier
c. Stock price prediction (regression)
d. Image recognition, deep learning (TensorFlow, Keras)
e. Recommendation engine
5. Machine learning at scale (algorithms, linear algebra, probability, Spark MLlib, Vowpal Wabbit, scikit-learn)
4. Next classes
              Key concepts   Tour   Pipelines   Case studies   Scale
Concepts      ●●●            ●      ●●          ●●             ●●●
Code          ●              ●●     ●●●         ●●●            ●●●
Math/stats    ●              ●●●    ●           ●              ●●●
5. What is machine learning?
● Learn from data (past experiences)
● Generalize (find the signal/pattern)
● Predict (forward-looking)
● Observational data
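The three ideas above (learn from past data, generalize, predict) map onto a fit / evaluate-on-unseen-data / predict loop. A minimal scikit-learn sketch, assuming synthetic data invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic "past experiences" (assumed data): y depends linearly on x, plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=200)

# Hold back a quarter of the data; the model never sees it during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)   # learn from past data
test_r2 = model.score(X_test, y_test)              # does the pattern generalize to unseen data?
prediction = model.predict(np.array([[4.0]]))[0]   # forward-looking prediction for a new input
print(f"held-out R^2: {test_r2:.3f}, prediction at x=4: {prediction:.2f}")
```

Scoring on the held-out split, rather than on the training rows, is what tests generalization instead of memorization.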
6. Relationship to data science and deep learning
[Figure: diagram relating Data Science, ML (machine learning), and DL (deep learning)]
7. Applications
● Autonomous cars
● Siri
● Facial recognition
● People who bought this also bought...
● Spam filters
● Targeted advertising
● ...
10. (Linear) Regression
● Predict a continuous variable (e.g. price)
● y = mx + b (m: slope, b: intercept)
● Ordinary Least Squares
● Analytical (closed-form) solution
● Geometric model
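For a single input, the OLS analytical solution has a well-known closed form: m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b = ȳ − m·x̄ (with several inputs it generalizes to β̂ = (XᵀX)⁻¹Xᵀy). A minimal NumPy sketch, assuming noise-free toy data so the answer can be checked by hand:

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form ordinary least squares for y = m*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by variance of x
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the least-squares line passes through the point of means
    b = y_mean - m * x_mean
    return m, b

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # toy data lying exactly on y = 2x + 1
m, b = fit_simple_ols(x, y)
print(m, b)  # 2.0 1.0
```

No iteration is needed: the minimizer of the squared error is computed directly, which is what "analytical solution" refers to.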
11. (Linear) Regression
● Can use multiple input variables (multiple, or multivariable, regression)
● Relationships are not always linear
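One common way to handle a curved relationship while keeping a linear model is to add transformed inputs such as x² (polynomial features); the model stays linear in its coefficients. This is one illustrative technique, not necessarily the course's approach; a sketch with assumed quadratic toy data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data with a curved (quadratic) relationship; assumed for illustration
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(100, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=100)

# A straight line cannot follow the curve
line = LinearRegression().fit(X, y)

# Same linear model, but fitted on the features [x, x^2]
curve = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression()).fit(X, y)

print(f"straight line R^2: {line.score(X, y):.2f}")
print(f"with x^2 term R^2: {curve.score(X, y):.2f}")
```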
12. (Linear) Regression example
● Boston housing dataset
● Median value of owner-occupied homes (MV) vs. average number of rooms per dwelling (RM)
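import pandas as pd
from sklearn.linear_model import LinearRegression
# load_boston was removed in scikit-learn 1.2; assumes a hypothetical local CSV with RM and MV columns
housing = pd.read_csv('housing.csv')
model = LinearRegression()
x, y = housing[['RM']], housing['MV']  # 2-D feature table, 1-D target
model.fit(x, y)
model.score(x, y)  # R² on the same (training) data
R² = 0.48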
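In the example above, the fitted LinearRegression stores the slope m in `coef_` and the intercept b in `intercept_`, and `score` returns R² = 1 − SS_res / SS_tot. The sketch below checks that relationship on synthetic (RM, MV)-like pairs; the slope, intercept, and noise level are made up and are not the Boston data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for (RM, MV) pairs; made-up values, NOT the Boston dataset
rng = np.random.default_rng(2)
rm = rng.uniform(4, 9, size=(300, 1))
mv = 9.0 * rm[:, 0] - 35.0 + rng.normal(0, 6.0, size=300)

model = LinearRegression().fit(rm, mv)
m, b = model.coef_[0], model.intercept_  # slope and intercept of y = mx + b

# R^2 computed by hand: 1 - (residual sum of squares / total sum of squares)
pred = model.predict(rm)
r2_manual = 1 - np.sum((mv - pred) ** 2) / np.sum((mv - mv.mean()) ** 2)
print(f"m={m:.2f}, b={b:.2f}, score={model.score(rm, mv):.3f}, manual R^2={r2_manual:.3f}")
```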
13. (Linear) Regression example
● Boston housing dataset
● Median value of owner-occupied homes (MV) vs. average number of rooms per dwelling (RM)
and proportion of non-retail business acres per town (INDUS)
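import pandas as pd
from sklearn.linear_model import LinearRegression
housing = pd.read_csv('housing.csv')  # hypothetical local CSV with RM, INDUS and MV columns
model = LinearRegression()
x, y = housing[['RM', 'INDUS']], housing['MV']  # two features now
model.fit(x, y)
model.score(x, y)  # R² on the training data
R² = 0.53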
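One caution when comparing 0.48 with 0.53: on the training data, adding a feature to an ordinary least squares model can never lower R², even when the feature is pure noise. The rise by itself therefore does not show that INDUS will help on new data; a held-out set is the usual check. A sketch with an assumed noise feature:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)
noise_feature = rng.normal(size=n)  # unrelated to y by construction

X_one = x1.reshape(-1, 1)
X_two = np.column_stack([x1, noise_feature])

# Training R^2 with and without the useless extra column
r2_one = LinearRegression().fit(X_one, y).score(X_one, y)
r2_two = LinearRegression().fit(X_two, y).score(X_two, y)
print(f"1 feature: {r2_one:.4f}, + pure-noise feature: {r2_two:.4f}")
```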
14. (Linear) Regression example
● Intuition breaks down in high dimensions (more than 3)
● Interpretability goes down
● Real-world data is usually non-linear
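One way to see why intuition from 2-D and 3-D fails: with points scattered uniformly at random, the gap between a point's nearest and farthest neighbors shrinks relative to the distances themselves as the dimension grows (often called distance concentration). A small NumPy sketch, assuming uniform random data:

```python
import numpy as np

rng = np.random.default_rng(6)

def distance_contrast(dim, n_points=500):
    """(farthest - nearest) / nearest distance from a random query point."""
    points = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

contrasts = {dim: distance_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, c in contrasts.items():
    print(f"dim={dim:>4}: relative contrast {c:.2f}")
```

When "near" and "far" become almost the same distance, methods that rely on closeness lose much of their meaning.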
21. Statistical learning
● The true underlying function is not known
● We usually can’t observe all relevant features (e.g. policy impact, global trends)
● Most interesting phenomena are neither deterministic nor stationary
● No guarantee that a set of variables is predictive of the outcome
[Figure: a spectrum running from 100% deterministic (e.g. F=ma) to 100% stochastic (e.g. a coin flip), with "Machine learning territory" marked between the two]
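A common way to formalize these points is Y = f(X) + ε, where f is the unknown true function and ε is noise no model can remove. The simulation below assumes a known f (which real problems never provide) and shows that even predicting with f itself leaves a mean squared error close to the noise variance:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 0.5  # assumed noise standard deviation
x = rng.uniform(0, 1, size=100_000)

def f(x):
    """Assumed 'true' function for the simulation; in practice f is unknown."""
    return np.sin(2 * np.pi * x)

y = f(x) + rng.normal(0, sigma, size=x.size)  # observed outcome = signal + noise

# Even the perfect model cannot explain the noise term
mse_true_f = np.mean((y - f(x)) ** 2)
print(f"MSE using the true f: {mse_true_f:.3f} (noise variance {sigma**2:.3f})")
```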
22. Classification
● Target variable is qualitative: a set of classes
● Binary classification, e.g.:
○ Cancer patient: positive class (class of interest)
○ Healthy patient: negative class
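A minimal binary classifier, assuming scikit-learn's LogisticRegression and a hypothetical one-dimensional measurement, with 1 coding the positive class (cancer) and 0 the negative class (healthy):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical measurements; 1 = positive class (cancer), 0 = negative class (healthy)
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [7.5], [8.0], [8.5], [9.0], [9.5]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
labels = clf.predict(np.array([[1.2], [8.8]]))         # predicted class for two new patients
p_positive = clf.predict_proba(np.array([[8.8]]))[:, 1]  # estimated probability of the positive class
print(clf.classes_, labels, p_positive)
```

Beyond a hard label, `predict_proba` exposes a score for the class of interest, which is what thresholds and evaluation metrics are built on.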
27. Variance/overfitting
● Learning the wrong things, memorizing
● Modeling the noise and not the signal
○ Model 1: if GPA > 3.8 and hours studied > 5, then passed
○ Model 2: if student ID != 2, then passed
○ New record: student ID = 4, hours studied = 5.5, GPA = 3.82. Passed?
Student ID   Hours studied   GPA    ...   Passed
1            10              4.00   ...   Yes
2            0               2.71   ...   No
3            6               3.95   ...   Yes
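The two rules can be written out directly and checked against the table. Both fit all three training rows perfectly and both say the new record passed, but Model 2 only memorizes which ID failed: it would also pass a hypothetical student (invented here, not from the slides) with no study time and a low GPA.

```python
# The three training records from the table (the elided columns are omitted)
students = [
    {"id": 1, "hours": 10, "gpa": 4.00, "passed": True},
    {"id": 2, "hours": 0, "gpa": 2.71, "passed": False},
    {"id": 3, "hours": 6, "gpa": 3.95, "passed": True},
]

def model_1(s):
    # Rule based on grades and study time
    return s["gpa"] > 3.8 and s["hours"] > 5

def model_2(s):
    # Rule that memorizes an identifier: fits the training rows, carries no signal
    return s["id"] != 2

train_accuracy = {
    m.__name__: sum(m(s) == s["passed"] for s in students) / len(students)
    for m in (model_1, model_2)
}
print(train_accuracy)  # both rules fit all three training rows

new_record = {"id": 4, "hours": 5.5, "gpa": 3.82}
unprepared = {"id": 5, "hours": 0, "gpa": 2.0}  # hypothetical record, not from the slides
for m in (model_1, model_2):
    print(m.__name__, "new record:", m(new_record), "| unprepared student:", m(unprepared))
```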
30. Summary
Increasing model complexity generally:
● Increases fit to the training data
● Decreases interpretability
● Increases chance of overfitting
● Increases training time
● Increases model latency
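The slides don't name a specific model, so this sketch swaps in a decision tree, where max_depth is an easy complexity knob. On assumed noisy sine-wave data, training R² rises with depth and the unrestricted tree memorizes the training set, while its held-out R² stays lower:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumed synthetic data: smooth signal plus noise
rng = np.random.default_rng(5)
X = rng.uniform(0, 1, size=(400, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.3, size=400)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

results = {}
for depth in (1, 3, None):  # None = grow until every leaf is pure (most complex)
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"max_depth={depth}: train R^2={results[depth][0]:.2f}, test R^2={results[depth][1]:.2f}")
```

The gap between training and held-out R² for the deepest tree is the overfitting the summary warns about; deeper trees also take longer to train and evaluate.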
33. Preparation for next class
1- Test your understanding: http://bit.ly/mlseries1
2- Check this out: A Visual Introduction to Machine Learning
Want to pursue machine learning more seriously?
● Read "A Few Useful Things to Know About Machine Learning" (Pedro Domingos)
● Theory and intuition: Python Machine Learning (book, Sebastian Raschka)
● Hands-on experience: Kaggle (start with the Titanic competition)
● The Elements of Statistical Learning (advanced)