Overview of Machine Learning and Feature Engineering

Machine Learning 101 Tutorial at Strata NYC, Sep 2015
Overview of machine learning models and features. Visualization of feature space and feature engineering methods.

  1. 1. Overview of Machine Learning & Feature Engineering Machine Learning 101 Tutorial Strata + Hadoop World, NYC, Sep 2015 Alice Zheng, Dato 1
  2. 2. 2 About us Chris DuBois Intro to recommenders Alice Zheng Overview of ML Piotr Teterwak Intro to image search & deep learning Krishna Sridhar Deploying ML as a predictive service Danny Bickson TA Alon Palombo TA
  3. 3. 3 Why machine learning? Model data. Make predictions. Build intelligent applications.
  4. 4. Classification Predict amongst a discrete set of classes 4
  5. 5. 5 Input Output
  6. 6. 6 Spam filtering data prediction Spam vs. Not spam
  7. 7. Text classification EDUCATION FINANCE TECHNOLOGY
  8. 8. Regression Predict real/numeric values 8
  9. 9. 9 Stock market Input Output
  10. 10. Similarity Find things like this 10
  11. 11. 11 Similar products Product I’m buying Output: other products I might be interested in
  12. 12. 12 Given image, find similar images http://www.tiltomo.com/
  13. 13. Recommender systems Learn what I want before I know it 13
  14. 14. 14
  15. 15. 15 Playlist recommendations Recommendations form a coherent & diverse sequence
  16. 16. 16 Friend recommendations Users and “items” are of the same type
  17. 17. Clustering Grouping similar items 17
  18. 18. 18 Clustering images Goldberger et al. Set of Images
  19. 19. 19 Clustering web search results
  20. 20. 20 Machine learning … how? Data Answers I fell in love the instant I laid my eyes on that puppy. His big eyes and playful tail, his soft furry paws, … Many systems Many tools Many teams Lots of methods/jargon
  21. 21. 21 The machine learning pipeline I fell in love the instant I laid my eyes on that puppy. His big eyes and playful tail, his soft furry paws, … Raw data Features Models Predictions Deploy in production
  22. 22. 22 Three things to know about ML • Feature = numeric representation of raw data • Model = mathematical “summary” of features • Making something that works = choose the right model and features, given data and task
  23. 23. Feature = numeric representation of raw data
  24. 24. 24 Representing natural text It is a puppy and it is extremely cute. What’s important? Phrases? Specific words? Ordering? Subject, object, verb? Classify: puppy or not? Raw Text {“it”:2, “is”:2, “a”:1, “puppy”:1, “and”:1, “extremely”:1, “cute”:1 } Bag of Words
  25. 25. 25 Representing natural text It is a puppy and it is extremely cute. Classify: puppy or not? Raw Text Bag of Words it 2 they 0 I 1 am 0 how 0 puppy 1 and 1 cat 0 aardvark 0 cute 1 extremely 1 … … Sparse vector representation
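A minimal sketch (not from the slides) of the bag-of-words mapping above, assuming plain Python and a naive whitespace tokenizer:

    from collections import Counter

    def bag_of_words(text):
        # Lowercase, drop periods, split on whitespace, and count occurrences.
        tokens = text.lower().replace(".", "").split()
        return Counter(tokens)

    doc = "It is a puppy and it is extremely cute."
    print(bag_of_words(doc))
    # e.g. Counter({'it': 2, 'is': 2, 'a': 1, 'puppy': 1, 'and': 1, 'extremely': 1, 'cute': 1})

A Counter behaves like a sparse vector: any word not seen in the document implicitly has count 0, which matches the sparse representation on slide 25.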
  26. 26. 26 Representing images Image source: “Recognizing and learning object categories,” Li Fei-Fei, Rob Fergus, Anthony Torralba, ICCV 2005—2009. Raw image: millions of RGB triplets, one for each pixel Classify: person or animal? Raw Image Bag of Visual Words
  27. 27. 27 Representing images Classify: person or animal? Raw Image → Deep learning features: 3.29, -15, -5.24, 48.3, 1.36, 47.1, -1.92, 36.5, 2.83, 95.4, -19, -89, 5.09, 37.8 (dense vector representation)
  28. 28. 28 Feature space in machine learning • Raw data → high dimensional vectors • Collection of data points → point cloud in feature space • Feature engineering = creating features of the appropriate granularity for the task
  29. 29. Crudely speaking, mathematicians fall into two categories: the algebraists, who find it easiest to reduce all problems to sets of numbers and variables, and the geometers, who understand the world through shapes. -- Masha Gessen, “Perfect Rigor”
  30. 30. 30 Algebra vs. Geometry Algebra: a² + b² = c² Geometry: a right triangle with sides a, b, c Pythagorean Theorem (Euclidean space)
  31. 31. 31 Visualizing a sphere in 2D x² + y² = 1 (the unit circle of radius 1; Pythagorean theorem: a² + b² = c²)
  32. 32. 32 Visualizing a sphere in 3D x² + y² + z² = 1 (the unit sphere over the x, y, z axes)
  33. 33. 33 Visualizing a sphere in 4D x² + y² + z² + t² = 1
  34. 34. 34 Why are we looking at spheres? Poincaré Conjecture: every physical object without holes is “equivalent” to a sphere.
  35. 35. 35 The power of higher dimensions • A sphere in 4D can model the birth and death process of physical objects • High dimensional features can model many things
  36. 36. Visualizing Feature Space
  37. 37. 37 The challenge of high dimension geometry • Feature space can have hundreds to millions of dimensions • In high dimensions, our geometric imagination is limited - Algebra comes to our aid
  38. 38. 38 Visualizing bag-of-words On the puppy and cute axes, the document “I have a puppy and it is extremely cute” becomes the point (puppy = 1, cute = 1); its full sparse vector is it 1, they 0, I 1, am 0, how 0, puppy 1, and 1, cat 0, aardvark 0, zebra 0, cute 1, extremely 1, …
  39. 39. 39 Visualizing bag-of-words Adding an extremely axis, the documents “I have a puppy and it is extremely cute”, “I have an extremely cute cat”, and “I have a cute puppy” become three points in the (puppy, cute, extremely) space
  40. 40. 40 Document point cloud Each document is a point plotted against the word 1 and word 2 axes
  41. 41. Model = mathematical “summary” of features
  42. 42. 42 What is a summary? • Data → point cloud in feature space • Model = a geometric shape that best “fits” the point cloud
  43. 43. 43 Clustering model Feature 2 Feature 1 Group data points tightly
  44. 44. 44 Classification model Feature 2 Feature 1 Decide between two classes
  45. 45. 45 Regression model Target Feature Fit the target values
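To make the three model types on slides 43-45 concrete, here is a brief sketch that fits each one to a toy 2-D point cloud. It uses scikit-learn purely for illustration (an assumption; the tutorial itself uses GraphLab Create, shown later):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.RandomState(0)

    # Clustering model: group data points tightly (two blobs in feature space).
    X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + [5, 5]])
    cluster_labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)

    # Classification model: decide between two classes (find a separating surface).
    y = np.array([0] * 50 + [1] * 50)
    classifier = LogisticRegression().fit(X, y)

    # Regression model: fit the target values (a line through the point cloud).
    feature = rng.uniform(0, 10, size=(100, 1))
    target = 3.0 * feature.ravel() + rng.randn(100)
    regressor = LinearRegression().fit(feature, target)
    print(regressor.coef_, regressor.intercept_)  # slope close to 3, intercept close to 0

In each case the fitted model is the geometric “summary” from slide 42: cluster centers, a decision boundary, or a fitted line.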
  46. 46. Visualizing Feature Engineering
  47. 47. 47 When does bag-of-words fail? The documents “I have a puppy”, “I have a cat”, “I have a kitten”, and “I have a dog and I have a pen” are plotted against the puppy, cat, and have axes. Task: find a surface that separates documents about dogs vs. cats. Problem: the word “have” adds fluff instead of information.
  48. 48. 48 Improving on bag-of-words • Idea: “normalize” word counts so that popular words are discounted • Term frequency (tf) = number of times a term appears in a document • Inverse document frequency of a word (idf) = log(N / number of documents containing the word), where N = total number of documents • Tf-idf count = tf x idf
  49. 49. 49 From BOW to tf-idf With the four documents “I have a puppy”, “I have a cat”, “I have a kitten”, and “I have a dog and I have a pen” (N = 4): idf(puppy) = log 4, idf(cat) = log 4, idf(have) = log 1 = 0
  50. 50. 50 From BOW to tf-idf After rescaling, tfidf(puppy) = log 4, tfidf(cat) = log 4, tfidf(have) = 0: the have dimension collapses, the documents spread out along the puppy and cat axes, and a decision surface between the dog and cat documents is easy to find. Tf-idf flattens uninformative dimensions in the BOW point cloud
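A short sketch of the tf-idf computation on slides 48-50 in plain Python (not code from the talk; natural log is assumed, since the log base only rescales the weights):

    import math
    from collections import Counter

    docs = ["I have a puppy",
            "I have a cat",
            "I have a kitten",
            "I have a dog and I have a pen"]
    tokenized = [d.lower().split() for d in docs]
    N = len(docs)

    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    def tfidf(tokens):
        tf = Counter(tokens)
        return {word: count * math.log(N / df[word]) for word, count in tf.items()}

    print(tfidf(tokenized[0]))
    # "puppy" keeps weight log(4); "i", "have", "a" appear in all 4 docs and get weight 0

This reproduces the numbers on slide 49: informative words keep their weight, while words shared by every document are flattened to zero.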
  51. 51. 51 Entry points of feature engineering • Start from data and task - What’s the best text representation for classification? • Start from modeling method - What kind of features does k-means assume? - What does linear regression assume about the data?
  52. 52. Dato’s Machine Learning Platform
  53. 53. 53 Dato’s machine learning platform Raw data Features Models Predictions Deploy in production GraphLab Create Dato Distributed Dato Predictive Services
  54. 54. 54 Data structures for feature engineering Features SFrames User Com. Title Body User Disc. SGraphs
  55. 55. 55 Machine learning toolkits in GraphLab Create • Classification/regression • Clustering • Recommenders • Deep learning • Similarity search • Data matching • Sentiment analysis • Churn prediction • Frequent pattern mining • And on…
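For orientation, a rough sketch of how these toolkits were typically invoked in GraphLab Create at the time. This is written from memory against the 2015-era Python API, and the file name and column names ('reviews.csv', 'is_positive', 'user_id', 'item_id') are made up for illustration; check that release's documentation for exact signatures:

    import graphlab as gl

    # Tabular features live in an SFrame (a scalable, on-disk data frame).
    sf = gl.SFrame.read_csv('reviews.csv')

    # Classification: predict a label column from the other columns.
    classifier = gl.logistic_classifier.create(sf, target='is_positive')
    predictions = classifier.predict(sf)

    # Recommender: learn user-item preferences from interaction data.
    recommender = gl.recommender.create(sf, user_id='user_id', item_id='item_id')
    recommendations = recommender.recommend()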
  56. 56. Demo
  57. 57. 57 Dimensionality reduction Feature 1 Feature 2 Flatten non-useful features PCA: Find most non-flat linear subspace
  58. 58. 58 PCA : Principal Component Analysis Center data at origin
  59. 59. 59 PCA : Principal Component Analysis Find a line, such that the average distance of every data point to the line is minimized. This is the 1st Principal Component
  60. 60. 60 PCA : Principal Component Analysis Find a 2nd line, - at right angles to the 1st - such that the average distance of every data point to the line is minimized. This is the 2nd Principal Component
  61. 61. 61 PCA : Principal Component Analysis Find a 3rd line - at right angles to the previous lines - such that the average distance of every data point to the line is minimized. … There can only be as many principal components as the dimensionality of the data.
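A compact sketch of the procedure on slides 58-61 (not code from the talk), assuming NumPy and using the SVD of the centered data to obtain the principal components:

    import numpy as np

    def pca(X, k):
        # Slide 58: center the data at the origin.
        X_centered = X - X.mean(axis=0)
        # Rows of Vt are orthonormal directions ordered by variance explained;
        # each is the line minimizing the average squared distance to the points,
        # at right angles to the previous ones (slides 59-61).
        U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
        components = Vt[:k]
        return components, X_centered @ components.T

    # Toy point cloud: 200 points in 3-D that mostly vary along one direction.
    rng = np.random.RandomState(0)
    X = rng.randn(200, 1) @ np.array([[2.0, 1.0, 0.5]]) + 0.1 * rng.randn(200, 3)
    components, projected = pca(X, k=2)
    print(components.shape, projected.shape)  # (2, 3) (200, 2)

As slide 61 notes, there can be at most as many components as the data has dimensions; scikit-learn's sklearn.decomposition.PCA packages the same computation behind a fit/transform interface.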
  62. 62. Demo
  63. 63. 63 Coursera Machine Learning Specialization • Learn machine learning in depth • Build and deploy intelligent applications • Year-long certification program • Joint project between University of Washington + Dato • Details: https://www.coursera.org/specializations/machine-learning
  64. 64. 64 Next up today alicez@dato.com @RainyData, #StrataConf 11:30am - Intro to recommenders Chris DuBois 1:30pm - Intro to image search & deep learning Piotr Teterwak 3:30pm - Deploying ML as a predictive service Krishna Sridhar