Location:
QuantUniversity Meetup
August 24th 2016
Boston MA
Machine Learning: An intuitive foundation
Machine learning meetup

With R, Python, Apache Spark and a plethora of other open source tools, anyone with a computer can run machine learning algorithms in a jiffy! However, without an understanding of which algorithms to choose and when to apply a particular technique, most machine learning efforts turn into trial and error experiments with conclusions like "The algorithms don't work" or "Perhaps we should get more data".

In this lecture, we will focus on the key tenets of machine learning algorithms and how to choose an algorithm for a particular purpose. Rather than just showing how to run experiments in R, Python or Apache Spark, we will provide an intuitive introduction to machine learning with just enough mathematics and basic statistics.

We will address:

• How do you differentiate Clustering, Classification and Prediction algorithms?

• What are the key steps in running a machine learning algorithm?

• How do you choose an algorithm for a specific goal?

• Where do exploratory data analysis and feature engineering fit into the picture?

• Once you run an algorithm, how do you evaluate the performance of an algorithm?

Published in: Data & Analytics

Machine learning meetup

  1. Machine Learning: An intuitive foundation
     Location: QuantUniversity Meetup, August 24th 2016, Boston MA
     Presented By: Sri Krishnamurthy, CFA, CAP
     www.QuantUniversity.com
     sri@quantuniversity.com
     2016 Copyright QuantUniversity LLC.
  2. Slides and Code will be available at: http://www.analyticscertificate.com
  3. - Analytics Advisory services
     - Custom training programs
     - Architecture assessments, advice and audits
  4. Sri Krishnamurthy, Founder and CEO
     • Founder of QuantUniversity LLC. and www.analyticscertificate.com
     • Advisory and Consultancy for Financial Analytics
     • Prior Experience at MathWorks, Citigroup and Endeca and 25+ financial services and energy customers
     • Regular Columnist for the Wilmott Magazine
     • Author of forthcoming book “Financial Modeling: A case study approach” published by Wiley
     • Chartered Financial Analyst and Certified Analytics Professional
     • Teaches Analytics in the Babson College MBA program and at Northeastern University, Boston
  5. Quantitative Analytics and Big Data Analytics Onboarding
     • Trained more than 500 students in Quantitative methods, Data Science and Big Data Technologies using MATLAB, Python and R
     • Launching the Analytics Certificate Program in September
  6. Quantitative Analytics and Big Data Analytics Onboarding
     • Apply at: www.analyticscertificate.com
     • Program starting September 18th
     • Module 1:
       ▫ Sep 18th, 25th, Oct 2nd, 9th
     • Module 2:
       ▫ Oct 16th, 23rd, 30th, Nov 6th
     • Module 3:
       ▫ Nov 13th, 20th, Dec 4th, Dec 11th
     • Capstone + Certification Ceremony
       ▫ Dec 18th
  7. Events of Interest
     • September
       ▫ 11th, 12th : Spark Workshop, Boston
         - www.analyticscertificate.com/SparkWorkshop
         - Sponsored by IBM
       ▫ 19th, 20th : Anomaly Detection Workshop, New York
         - www.analyticscertificate.com/AnomalyNYC
         - Sponsored by Microsoft
  8. 8. 9
  9. Agenda
     1. Data
     2. Goals
     3. Machine learning algorithms
     4. Process
     5. Performance Evaluation
  10. 10. 11
  11. Dataset, variable and Observations
      • Dataset: A rectangular array with rows as observations and columns as variables
      • Variable: A characteristic of members of a population (Age, State etc.)
      • Observation: List of variable values for a member of the population
  12. Variables
      A variable could be:
      • Categorical
        ▫ Yes/No flags
        ▫ AAA, BB ratings for bonds
      • Numerical
        ▫ 35 mpg
        ▫ $170K salary
  13. Datasets
      • Longitudinal
        ▫ Observations are dependent
        ▫ Temporal-continuity is required
      • Cross-sectional
        ▫ Observations are independent
  14. Summary
      Data
      • Cross sectional
        ▫ Numerical
        ▫ Categorical
      • Longitudinal
        ▫ Numerical
  15. 15. 16
  16. Goal
      • Descriptive Statistics
        ▫ Goal is to describe the data at hand
        ▫ Backward looking
        ▫ Statistical techniques employed here
      • Predictive Analytics
        ▫ Goal is to use historical data to build a model for prediction
        ▫ Forward looking
        ▫ Machine learning techniques employed here
  17. Descriptive Statistics – Cross sectional datasets
      • How do you summarize numerical variables?
      • How do you summarize categorical variables?
      • How do you describe variability in numerical variables?
      • How do you summarize relationships between categorical and numerical variables?
      • How do you summarize relationships between 2 numerical variables?
      See Data Analysis Taxonomy.xlsx
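The questions on the descriptive statistics slide can be illustrated with a minimal sketch that uses only the Python standard library. The sample observations and every variable name below are hypothetical and are not taken from the talk or from Data Analysis Taxonomy.xlsx.

```python
import statistics
from collections import Counter, defaultdict

# Hypothetical cross-sectional sample, one tuple per observation:
# (salary in $K, years of experience, rating treated as a category)
observations = [
    (170, 12, "AAA"),
    (95, 3, "BB"),
    (120, 7, "AAA"),
    (88, 2, "BB"),
    (143, 10, "AAA"),
]
salary = [o[0] for o in observations]
experience = [o[1] for o in observations]
rating = [o[2] for o in observations]

# Numerical variable: central tendency and variability.
salary_mean = statistics.mean(salary)
salary_median = statistics.median(salary)
salary_stdev = statistics.stdev(salary)  # sample standard deviation

# Categorical variable: frequency counts.
rating_counts = Counter(rating)

# Categorical vs numerical: mean of the numerical variable per category.
by_rating = defaultdict(list)
for s, _, r in observations:
    by_rating[r].append(s)
mean_salary_by_rating = {r: statistics.mean(v) for r, v in by_rating.items()}

def pearson(x, y):
    """Pearson correlation between two numerical variables."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Numerical vs numerical: linear association.
salary_experience_corr = pearson(salary, experience)
```

Each statistic maps to one question on the slide: mean and median summarize a numerical variable, the standard deviation describes its variability, counts summarize a categorical variable, per-category means relate a categorical to a numerical variable, and the correlation relates two numerical variables.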
  18. Longitudinal datasets
      • Goal is to extract the various components
  19. Predictive Analytics : Cross sectional datasets
      • Given a dataset, build a model that captures the similarities in different observations and assigns them to different buckets.
      • Given a set of variables, predict the value of another variable in a given data set
        ▫ Predict Salaries given work experience, education etc.
        ▫ Predict whether a loan would be approved given fico score, current loans, employment status etc.
  20. Predictive Analytics : Time series datasets
      • Given a time series dataset, build a model that can be used to forecast values in the future
  21. Summary
      Goal
      • Descriptive Statistics
        ▫ Cross sectional
          - Numerical
          - Categorical
          - Numerical vs Categorical
          - Categorical vs Categorical
          - Numerical vs Numerical
        ▫ Time series
      • Predictive Analytics
        ▫ Cross-sectional
          - Segmentation
          - Prediction: Predict a number, Predict a category
        ▫ Time-series
  22. 22. 23
  23. Machine Learning Algorithms
      Goal
      • Descriptive Statistics
        ▫ Cross sectional
          - Numerical
          - Categorical
          - Numerical vs Categorical
          - Categorical vs Categorical
          - Numerical vs Numerical
        ▫ Time series
      • Predictive Analytics
        ▫ Cross-sectional
          - Segmentation
          - Prediction: Predict a number, Predict a category
        ▫ Time-series
  24. Machine Learning
      • Supervised Algorithms
        ▫ Given a set of variables $x_i$, predict the value of another variable $y$ in a given data set such that
        ▫ If y is numeric => Prediction
        ▫ If y is categorical => Classification
      Diagram: x1, x2, x3… → Model F(X) → y
  25. Machine Learning
      • Unsupervised Algorithms
        ▫ Given a dataset with variables $x_i$, build a model that captures the similarities in different observations and assigns them to different buckets => Clustering
      Diagram: Obs1, Obs2, Obs3 etc. → Model → Obs1 - Class 1, Obs2 - Class 2, Obs3 - Class 1
  26. Supervised learning Algorithms - Prediction
      Supervised Learning algorithms
      • Parametric models
      • Non-Parametric models
  27. Supervised Learning models - Prediction
      • Parametric models
        ▫ Assume some functional form
        ▫ Fit coefficients
      • Examples : Linear Regression, Neural Networks
      Linear Regression Model: $Y = \beta_0 + \beta_1 X_1$
      Neural network Model
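To make "assume some functional form, fit coefficients" concrete, here is a minimal sketch that fits the one-variable linear model Y = β0 + β1 X1 with the closed-form least squares estimates. The function name and the experience/salary numbers are hypothetical and chosen only for illustration.

```python
def fit_simple_linear_regression(x, y):
    """Return (beta0, beta1) minimizing the sum of squared errors for y = beta0 + beta1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

# Hypothetical data that follows salary = 40 + 5 * experience exactly.
experience = [1, 2, 3, 4, 5]
salary = [45, 50, 55, 60, 65]

beta0, beta1 = fit_simple_linear_regression(experience, salary)
predicted_salary = beta0 + beta1 * 6  # prediction for a new observation
```

Once the two coefficients are fitted, the training data is no longer needed to make a prediction, which is the practical difference from the non-parametric models on the following slide.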
  28. Supervised Learning models
      • Non-Parametric models
        ▫ No functional form assumed
      • Examples : K-nearest neighbors, Decision Trees
      K-nearest neighbor Model
      Decision tree Model
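As a contrast with the parametric case, a K-nearest neighbors model keeps the training observations and predicts from the closest ones, with no coefficients to fit. The sketch below is one simple way to write it, assuming Euclidean distance and an unweighted vote or average; the function names and data are hypothetical.

```python
import math
from collections import Counter

def k_nearest(train_x, query, k):
    """Indices of the k training points closest to query (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return order[:k]

def knn_predict_number(train_x, train_y, query, k=3):
    """Prediction: average the y values of the k nearest neighbors."""
    idx = k_nearest(train_x, query, k)
    return sum(train_y[i] for i in idx) / k

def knn_predict_category(train_x, train_labels, query, k=3):
    """Classification: majority label among the k nearest neighbors."""
    idx = k_nearest(train_x, query, k)
    return Counter(train_labels[i] for i in idx).most_common(1)[0][0]

# Hypothetical training data: two well separated groups of observations.
train_x = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = [10, 12, 11, 50, 52, 51]
train_labels = ["low", "low", "low", "high", "high", "high"]

number_near_low = knn_predict_number(train_x, train_y, (1.5, 1.5))
label_near_low = knn_predict_category(train_x, train_labels, (1.5, 1.5))
label_near_high = knn_predict_category(train_x, train_labels, (8.5, 8.5))
```

The same neighbor search serves both goals from the talk: averaging the neighbors' values predicts a number, and taking their majority label predicts a category.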
  29. Multiple linear regression
      • Given estimates $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p$, we can make predictions using the formula
        $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_p x_p$
      • The parameters are estimated using the least squares approach to minimize the sum of squared errors
        $RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
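The prediction formula and the RSS objective can be written directly in code. This sketch only evaluates them for given coefficient estimates; it does not perform the least squares fit itself. The data and the candidate coefficients are hypothetical.

```python
def predict(betas, x):
    """y_hat = beta_0 + beta_1 * x_1 + ... + beta_p * x_p for one observation x."""
    return betas[0] + sum(b * xj for b, xj in zip(betas[1:], x))

def rss(betas, X, y):
    """Residual sum of squares over all observations."""
    return sum((yi - predict(betas, xi)) ** 2 for xi, yi in zip(X, y))

# Hypothetical data generated exactly by y = 1 + 2*x1 + 3*x2.
X = [(1, 2), (2, 0), (3, 1), (0, 4)]
y = [9, 5, 10, 13]

rss_true = rss([1, 2, 3], X, y)  # coefficients that generated the data
rss_off = rss([1, 2, 2], X, y)   # a worse candidate for beta_2
```

Least squares estimation searches for the coefficient vector with the smallest RSS, so the generating coefficients score lower here than the perturbed candidate.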
  30. Supervised Learning models - Classification
      • Parametric models
        ▫ Assume some functional form
        ▫ Fit coefficients
      • Examples : Logistic Regression, Neural Networks
      Logistic Regression Model
      Neural network Model
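For classification, logistic regression assumes that the probability of class 1 is a sigmoid of a linear function and fits the coefficients to the data. The sketch below fits a one-variable model by plain gradient ascent on the log-likelihood. That is only one of several possible fitting methods and is an assumption here, as are the learning rate, the iteration count, the function names and the centered toy data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, learning_rate=0.5, iterations=500):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        # Difference between the actual label and the predicted probability.
        errors = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += learning_rate * sum(errors) / n
        b1 += learning_rate * sum(e * xi for e, xi in zip(errors, x)) / n
    return b0, b1

def predict_class(b0, b1, x_new, cutoff=0.5):
    """Assign class 1 when the predicted probability reaches the cutoff."""
    return 1 if sigmoid(b0 + b1 * x_new) >= cutoff else 0

# Hypothetical centered feature: negative values belong to class 0, positive to class 1.
x = [-3, -2, -1, 1, 2, 3]
y = [0, 0, 0, 1, 1, 1]

b0, b1 = fit_logistic(x, y)
class_for_negative = predict_class(b0, b1, -2.0)
class_for_positive = predict_class(b0, b1, 2.0)
```

The cutoff of 0.5 is a modeling choice, and varying it is what traces out the ROC curve discussed in the evaluation part of the talk.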
  31. Supervised Learning models
      • Non-Parametric models
        ▫ No functional form assumed
      • Examples : K-nearest neighbors, Decision Trees
      K-nearest neighbor Model
      Decision tree Model
  32. Machine Learning
      • Unsupervised Algorithms
        ▫ Given a dataset with variables $x_i$, build a model that captures the similarities in different observations and assigns them to different buckets => Clustering
      Diagram: Obs1, Obs2, Obs3 etc. → Model → Obs1 - Class 1, Obs2 - Class 2, Obs3 - Class 1
  33. K-means clustering
      • These methods partition the data into k clusters by assigning each data point to its closest cluster centroid by minimizing the within-cluster sum of squares (WSS), which is:
        $\sum_{k=1}^{K} \sum_{i \in S_k} \sum_{j=1}^{P} (x_{ij} - \mu_{kj})^2$
        where $S_k$ is the set of observations in the kth cluster and $\mu_{kj}$ is the mean of the jth variable of the cluster center of the kth cluster.
      • Then, they select the top n points that are the farthest away from their nearest cluster centers as outliers.
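The two steps on the K-means slide, clustering by minimizing WSS and then flagging the farthest points as outliers, can be sketched as follows. This is a bare-bones version of the standard alternating assign/update procedure; the initialization from the first k points, the fixed iteration count and the sample points are all simplifying assumptions made for this illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    centroids = [points[i] for i in range(k)]  # naive initialization (assumption)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # New centroid = per-variable mean of the cluster; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[c]
            for c, cl in enumerate(clusters)
        ]
    return centroids

def wss(points, centroids):
    """Within-cluster sum of squares, with each point assigned to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)

def top_n_outliers(points, centroids, n):
    """The n points farthest from their nearest cluster center."""
    return sorted(
        points,
        key=lambda p: min(math.dist(p, c) for c in centroids),
        reverse=True,
    )[:n]

# Hypothetical data: two tight groups plus one point far from both.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8), (12, 15)]

centroids = kmeans(points, k=2)
total_wss = wss(points, centroids)
outliers = top_n_outliers(points, centroids, n=1)
```

With this data the far point is still assigned to a cluster, which is why the outlier step looks at the distance to the nearest center rather than at cluster membership.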
  34. Anomaly Detection vs Unsupervised Learning
  35. Distance functions
      • Euclidean distance:
  36. Distance functions
      • Manhattan distance: $D = |X_A - X_B| + |Y_A - Y_B|$
  37. Distance functions
      • Correlation distance:
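Only the Manhattan formula survives in the transcript, so the sketch below uses the usual textbook definitions for the other two distances: Euclidean distance as the square root of the summed squared differences, and correlation distance as one minus the Pearson correlation. Those two definitions are assumptions and may differ from what the original slides showed.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences, matching D = |XA - XB| + |YA - YB| in two dimensions."""
    return sum(abs(x - y) for x, y in zip(a, b))

def correlation_distance(a, b):
    """1 - Pearson correlation: 0 for perfectly correlated vectors, 2 for perfectly anti-correlated ones."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1 - cov / (sd_a * sd_b)

d_euclidean = euclidean((0, 0), (3, 4))
d_manhattan = manhattan((0, 0), (3, 4))
d_corr_same_shape = correlation_distance([1, 2, 3], [2, 4, 6])
d_corr_opposite = correlation_distance([1, 2, 3], [3, 2, 1])
```

The choice of distance matters for K-means and K-nearest neighbors: Euclidean and Manhattan distances respond to magnitude, while correlation distance treats two observations as close when their profiles move together regardless of scale.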
  38. Machine Learning Algorithms
      Machine Learning
      • Supervised
        ▫ Prediction
          - Parametric: Linear Regression, Neural Networks
          - Non-parametric: KNN, Decision Trees
        ▫ Classification
          - Parametric: Logistic Regression, Neural Networks
          - Non Parametric: Decision Trees, KNN
      • Unsupervised algorithms: K-means, Associative rule mining
  39. 39. 40
  40. The Process
      1. Data cleansing
      2. Feature Engineering
      3. Training and Testing
      4. Model building
      5. Model selection
  41. Feature Engineering
      • What transformations do I need for the x and y variables?
      • Which are the best features to use?
        ▫ Dimension Reduction – PCA
        ▫ Best subset selection
          - Forward selection
          - Backward elimination
          - Stepwise regression
  42. Training the model
      Data
      • Training 80%
      • Testing 20%
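The 80/20 division on the training slide can be sketched as a shuffled split. The function name, the fixed seed and the placeholder rows are hypothetical. Shuffling suits cross-sectional data where observations are independent; for longitudinal data a chronological cut would probably be the safer choice.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical dataset of 100 observations, represented here by their row ids.
rows = list(range(100))
train, test = train_test_split(rows)
```

The model is fitted on `train` only, and the evaluation measures are then computed on `test`, which the model has not seen.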
  43. 43. 44
  44. Evaluation framework
      Evaluating Machine learning algorithms
      • Supervised - Prediction: R-square, RMS, MAE, MAPE
      • Supervised - Classification: Confusion Matrix, ROC Curves
  45. Prediction Accuracy Measures
      • The prediction error for record i is defined as the difference between its actual y value and its predicted y value
        $e_i = y_i - \hat{y}_i$
      • $R^2$ indicates how well data fits the statistical model
        $R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
        where $\bar{y}$ is the mean of the actual y values
  46. Prediction Accuracy Measures
      • Fit measures in classical regression modeling:
      • Adjusted $R^2$ has been adjusted for the number of predictors. It increases only when the improvement in the model is more than one would expect to see by chance (p is the total number of explanatory variables)
        $Adjusted\ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - p - 1)}{\sum_{i=1}^{n} (y_i - \bar{y})^2 / (n - 1)}$
      • MAE or MAD (mean absolute error/deviation) gives the magnitude of the average absolute error
        $MAE = \frac{1}{n} \sum_{i=1}^{n} |e_i|$
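The two R-squared formulas translate directly into code. The actual and predicted values below are hypothetical, and p = 1 assumes a model with a single explanatory variable.

```python
def r_squared(y, y_hat):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(y, y_hat, p):
    """R-squared penalized for the number of explanatory variables p."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))

# Hypothetical actual values and model predictions.
y = [3, 5, 7, 9, 11]
y_hat = [2.8, 5.3, 6.9, 9.4, 10.6]

r2 = r_squared(y, y_hat)
adj_r2 = adjusted_r_squared(y, y_hat, p=1)
```

For a model with at least one predictor and an imperfect fit, the adjusted value is smaller than the plain one, which reflects the penalty for each added predictor.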
  47. Prediction Accuracy Measures
      • MAPE (mean absolute percentage error) gives a percentage score of how predictions deviate on average
        $MAPE = \frac{1}{n} \sum_{i=1}^{n} |e_i / y_i| \times 100\%$
      • RMSE (root-mean-squared error) is computed on the training and validation data
        $RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2}$
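The error-based measures (MAE, MAPE and RMSE) all start from the per-record errors e_i = y_i - ŷ_i. A minimal sketch with hypothetical values follows. Note that MAPE is undefined when any actual value is zero, which this sketch does not guard against.

```python
import math

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mape(y, y_hat):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    return sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y) * 100

def rmse(y, y_hat):
    """Root-mean-squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

# Hypothetical actual values and model predictions.
y = [3, 5, 7, 9, 11]
y_hat = [2.8, 5.3, 6.9, 9.4, 10.6]

mae_value = mae(y, y_hat)
mape_value = mape(y, y_hat)
rmse_value = rmse(y, y_hat)
```

RMSE is never smaller than MAE on the same errors because squaring weights large errors more heavily, so a wide gap between the two suggests a few large misses.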
  48. Classification matrix
      • Consider a two-class case with classes $C_0$ and $C_1$
      • Classification matrix:
        Actual Class | Predicted Class $C_0$ | Predicted Class $C_1$
        $C_0$ | $n_{0,0}$ = number of $C_0$ cases classified correctly | $n_{0,1}$ = number of $C_0$ cases classified incorrectly as $C_1$
        $C_1$ | $n_{1,0}$ = number of $C_1$ cases classified incorrectly as $C_0$ | $n_{1,1}$ = number of $C_1$ cases classified correctly
  49. Accuracy Measures
      • Estimated misclassification rate (overall error rate) is a main accuracy measure
        $err = \frac{n_{0,1} + n_{1,0}}{n_{0,0} + n_{0,1} + n_{1,0} + n_{1,1}} = \frac{n_{0,1} + n_{1,0}}{n}$
      • Overall accuracy:
        $Accuracy = 1 - err = \frac{n_{0,0} + n_{1,1}}{n}$
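The classification matrix and the two rates derived from it can be computed as below. Classes C0 and C1 are encoded as 0 and 1, and the actual and predicted labels are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Counts n[(a, p)]: number of cases of actual class a predicted as class p."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts

# Hypothetical labels for ten records.
actual = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
predicted = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]

n = confusion_counts(actual, predicted)
total = sum(n.values())

# Overall error rate: off-diagonal counts divided by all cases.
err = (n[(0, 1)] + n[(1, 0)]) / total
# Overall accuracy: diagonal counts divided by all cases, equal to 1 - err.
accuracy = (n[(0, 0)] + n[(1, 1)]) / total
```

The dictionary keys follow the slide's subscripts, so `n[(0, 1)]` corresponds to n0,1, the C0 cases incorrectly classified as C1.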
  50. ROC Curve
      • The ROC curve plots the pairs {sensitivity, 1-specificity} as the cutoff value increases from 0 to 1
      • Sensitivity (also called the true positive rate, or the recall in some fields) measures the proportion of positives that are correctly identified (e.g., the percentage of sick people who are correctly identified as having the condition).
      • Specificity (also called the true negative rate) measures the proportion of negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).
      • Better performance is reflected by curves that are closer to the top left corner
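The points of an ROC curve can be generated by sweeping the cutoff and recomputing sensitivity and specificity each time. In this sketch the labels, scores and cutoff grid are hypothetical, a record is classified as positive when its score is at or above the cutoff, and each point is stored as (1 - specificity, sensitivity) so that the first element is the horizontal axis.

```python
def roc_points(actual, scores, cutoffs):
    """Return one (1 - specificity, sensitivity) pair per cutoff value."""
    points = []
    for cutoff in cutoffs:
        predicted = [1 if s >= cutoff else 0 for s in scores]
        tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
        fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
        tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
        fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
        sensitivity = tp / (tp + fn)  # true positive rate
        specificity = tn / (tn + fp)  # true negative rate
        points.append((1 - specificity, sensitivity))
    return points

# Hypothetical actual classes and model scores (predicted probability of class 1).
actual = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.8, 0.9]
cutoffs = [0.0, 0.25, 0.5, 0.75, 1.0]

points = roc_points(actual, scores, cutoffs)
```

At a cutoff of 0 every record is called positive, which gives the top right corner of the plot, and at a cutoff of 1 none is, which gives the bottom left corner. The intermediate cutoffs trace the curve between them.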
  51. Agenda
      1. Data
      2. Goals
      3. Machine learning algorithms
      4. Process
      5. Performance Evaluation
  52. Handling Data
      Data
      • Cross sectional
        ▫ Numerical
        ▫ Categorical
      • Longitudinal
        ▫ Numerical
  53. Goal
      • Descriptive Statistics
        ▫ Cross sectional
          - Numerical
          - Categorical
          - Numerical vs Categorical
          - Categorical vs Categorical
          - Numerical vs Numerical
        ▫ Time series
      • Predictive Analytics
        ▫ Cross-sectional
          - Segmentation
          - Prediction: Predict a number, Predict a category
        ▫ Time-series
  54. Machine Learning Algorithms
      Machine Learning
      • Supervised
        ▫ Prediction
          - Parametric: Linear Regression, Neural Networks
          - Non-parametric: KNN, Decision Trees
        ▫ Classification
          - Parametric: Logistic Regression, Neural Networks
          - Non Parametric: Decision Trees, KNN
      • Unsupervised algorithms: K-means, Associative rule mining
  55. The Process
      1. Data cleansing
      2. Feature Engineering
      3. Training and Testing
      4. Model building
      5. Model selection
  56. Evaluation framework
      Evaluating Machine learning algorithms
      • Supervised - Prediction: R-square, RMS, MAE, MAPE
      • Supervised - Classification: Confusion Matrix, ROC Curves
  57. www.analyticscertificate.com/SparkWorkshop
  58. Q&A
      Slides, code and details about the Apache Spark Workshop at: http://www.analyticscertificate.com/SparkWorkshop/
  59. Thank you! Members & Sponsors!
      Contact
      Sri Krishnamurthy, CFA, CAP
      Founder and CEO
      QuantUniversity LLC.
      srikrishnamurthy
      www.QuantUniversity.com
      Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
