2. Agenda
• Why Machine Learning?
• What is Machine Learning?
• Some ML Applications
• Data Science Pipeline
• Data -> Big Data
• Big Data -> Feature Selection
• Machine Learning Modelling
• Model Evaluation
• Inference/Analytics
• Summary
12/24/2017 2
10. Data => Big Data
• Structured Data
• Unstructured Data
• Big Data
- Text Data
- Time Series Data
- Spatial/Location-based Data
- Image/Video/Audio Data
12. Big Data => Feature Selection
• Simplification of models - easier to interpret
• Shorter training times
• To avoid the curse of dimensionality
• Enhanced generalization by reducing
overfitting (formally, reduction of variance)
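One simple way to picture feature selection is a variance filter: a column that never changes cannot help a model, so it can be dropped. The sketch below is only an illustration under stated assumptions. The helper names (select_by_variance, keep_columns), the threshold, and the toy matrix X are hypothetical and do not come from the slides, and variance filtering is just one of many selection techniques.

```python
from statistics import pvariance


def select_by_variance(rows, threshold=0.0):
    """Return indices of columns whose population variance exceeds threshold."""
    columns = list(zip(*rows))  # transpose: rows -> columns
    return [j for j, col in enumerate(columns) if pvariance(col) > threshold]


def keep_columns(rows, kept):
    """Build a reduced data set containing only the selected columns."""
    return [[row[j] for j in kept] for row in rows]


# Hypothetical toy data: the middle column is constant, so it carries no information.
X = [
    [1.0, 5.0, 0.2],
    [2.0, 5.0, 0.1],
    [3.0, 5.0, 0.4],
    [4.0, 5.0, 0.3],
]

kept = select_by_variance(X, threshold=0.0)
X_reduced = keep_columns(X, kept)
print(kept)       # indices of the surviving features
print(X_reduced)  # the data with the constant column removed
```

Fewer columns mean fewer quantities to estimate, which is where the shorter training times and the lower variance come from.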
22. Parametric Modelling
• Data is assumed to follow a known probability
distribution
• Number of parameters is fixed, regardless of the
training set size
• Focused on group means
23. Non-Parametric Modelling
• Does not assume a particular probability
distribution
• Number of parameters grows with the number of
training samples
• Focused on group medians
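The difference in parameter count between the two modelling styles can be sketched with two tiny regressors. This is a minimal illustration, assuming a straight-line fit as the parametric model and a nearest-neighbour predictor as the non-parametric one. The function names fit_line and knn_predict and the toy data are hypothetical, not from the slides.

```python
def fit_line(xs, ys):
    """Parametric: least-squares line. Always 2 parameters, whatever len(xs) is."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def knn_predict(xs, ys, query, k=1):
    """Non-parametric: the 'model' is the stored training set itself."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - query))[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)       # the whole model is two numbers
line_pred = slope * 1.5 + intercept
knn_pred = knn_predict(xs, ys, 1.2, k=1)  # needs every training point at prediction time
print(slope, intercept, line_pred, knn_pred)
```

After fitting, the line can discard the training data, while the nearest-neighbour predictor has to keep all of it, so its effective size grows with the number of samples.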
25. Frequentist ML Modelling
Maximum Likelihood Estimation (MLE)
You need to model your random variables realistically
- Discrete r.v.
e.g. Bernoulli/Binomial/Geometric/Poisson
- Continuous r.v.
e.g. Uniform/Exponential/Gamma/Normal (Gaussian)
Explained in Regression Modeling – Probabilistic
Interpretation
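For two of the distributions listed above, the maximum likelihood estimate has a well-known closed form: the sample proportion for a Bernoulli variable, and the sample mean with the 1/n variance for a Gaussian. The sketch below assumes independent, identically distributed samples. The function names and the toy samples are hypothetical and used only for illustration.

```python
import math


def bernoulli_mle(samples):
    """MLE of p for 0/1 samples: the fraction of ones."""
    return sum(samples) / len(samples)


def gaussian_mle(samples):
    """MLE of (mean, variance) for a Gaussian; the variance uses 1/n, not 1/(n-1)."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, var


def bernoulli_log_likelihood(p, samples):
    """Log-likelihood of 0/1 samples under Bernoulli(p), for 0 < p < 1."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in samples)


# Hypothetical toy samples.
coin = [1, 0, 1, 1, 0, 1, 1, 0]        # 5 ones out of 8
values = [1.0, 2.0, 3.0, 4.0, 5.0]

p_hat = bernoulli_mle(coin)
mu_hat, var_hat = gaussian_mle(values)
print(p_hat, mu_hat, var_hat)

# The estimate should score at least as well as any other candidate value of p.
print(bernoulli_log_likelihood(p_hat, coin) >= bernoulli_log_likelihood(0.5, coin))
```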
26. Bayesian ML Modelling
Very powerful modeling approach
Prior knowledge is incorporated through a prior distribution over the parameters
Maximum a Posteriori (MAP) estimation
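A small sketch of how a prior changes the estimate, assuming a Bernoulli likelihood with a Beta(alpha, beta) prior. With that pairing the posterior is again a Beta distribution, and the MAP estimate is its mode. The function name bernoulli_map, the prior values, and the toy sample are hypothetical choices for illustration.

```python
def bernoulli_map(samples, alpha=2.0, beta=2.0):
    """MAP estimate of p under a Beta(alpha, beta) prior.

    The posterior is Beta(alpha + ones, beta + zeros); this returns its mode,
    which assumes both posterior parameters are greater than 1.
    """
    ones = sum(samples)
    n = len(samples)
    return (ones + alpha - 1) / (n + alpha + beta - 2)


# Hypothetical toy sample: 5 ones out of 8.
coin = [1, 0, 1, 1, 0, 1, 1, 0]

p_mle = sum(coin) / len(coin)                      # no prior knowledge
p_map = bernoulli_map(coin, alpha=2.0, beta=2.0)   # prior belief centred on 0.5
p_flat = bernoulli_map(coin, alpha=1.0, beta=1.0)  # uniform prior
print(p_mle, p_map, p_flat)
```

The Beta(2, 2) prior pulls the estimate toward 0.5, while the uniform prior leaves it equal to the maximum likelihood estimate.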
31. Supervised Learning
• Given a training set of N example input–
output pairs (x1, y1), (x2, y2), ..., (xN, yN),
where each yj was generated by an
unknown function y = f(x),
• discover a function h that approximates the
true function f.
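The definition above can be sketched as a search over candidate functions: each candidate h is scored on the training pairs and the one with the lowest error is kept. This is a minimal illustration, assuming mean squared error as the score and a tiny hand-written hypothesis set. The function f is visible here only to generate toy data, and all names are hypothetical.

```python
def mean_squared_error(h, pairs):
    """Average squared difference between h(x) and the observed y."""
    return sum((h(x) - y) ** 2 for x, y in pairs) / len(pairs)


def f(x):
    """The 'unknown' true function; a learner would never see this directly."""
    return x * x


# Training set of N = 4 input-output pairs (x_j, y_j) with y_j = f(x_j).
training_set = [(x, f(x)) for x in [0.0, 1.0, 2.0, 3.0]]

# A small, hand-written hypothesis space.
hypotheses = {
    "constant": lambda x: 3.5,
    "linear": lambda x: 3.0 * x - 1.0,
    "quadratic": lambda x: x * x,
}

best_name = min(
    hypotheses,
    key=lambda name: mean_squared_error(hypotheses[name], training_set),
)
h = hypotheses[best_name]
print(best_name, mean_squared_error(h, training_set))
```

Real learners search much larger hypothesis spaces, but the goal is the same: an h that agrees with f on the training pairs and, ideally, on unseen inputs.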
33. Supervised Learning
• Regression
When the output y is a number
e.g. tomorrow's temperature
• Classification
When the output y is one of a finite set of
values
e.g. sunny, cloudy or rainy
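The two task types differ only in the kind of output. The sketch below uses the same nearest-neighbour idea for both: averaging the neighbours' numbers for regression and taking a majority vote over the neighbours' labels for classification. The data (humidity as input, temperature or weather label as output) is a made-up toy example, and the function names are hypothetical.

```python
from collections import Counter


def nearest(train, query, k):
    """Return the k training pairs whose input is closest to the query."""
    return sorted(train, key=lambda pair: abs(pair[0] - query))[:k]


def knn_regress(train, query, k=3):
    """Regression: the output is a number (mean of the neighbours' values)."""
    return sum(y for _, y in nearest(train, query, k)) / k


def knn_classify(train, query, k=3):
    """Classification: the output is one label from a finite set (majority vote)."""
    votes = Counter(y for _, y in nearest(train, query, k))
    return votes.most_common(1)[0][0]


# Hypothetical toy data: x = today's humidity (%).
temperature = [(30, 28.0), (40, 26.0), (50, 24.0), (70, 20.0), (90, 16.0)]
weather = [(30, "sunny"), (40, "sunny"), (50, "cloudy"), (70, "cloudy"), (90, "rainy")]

predicted_temp = knn_regress(temperature, 45, k=3)   # a number
predicted_label = knn_classify(weather, 45, k=3)     # one of sunny/cloudy/rainy
print(predicted_temp, predicted_label)
```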