
A very easy explanation to understanding machine learning (Supervised & Unsupervised Learning)

Explains machine learning very simply, using simplified algorithms for supervised and unsupervised learning.

Published in: Data & Analytics

1. Introduction to Machine Learning
    Ryo Onozuka, 2016/11/30

2. Why Machine Learning?
    - Using machine learning in business models is becoming popular.
    - To analyze those business models, we need to understand what machine learning does inside them.
    [Figure: Google Trends search interest in "Big Data", "Machine Learning", and "Data Analysis", 2012 to 2016. Data from Google Trends.]

3. What is Machine Learning?
    - Machine learning algorithms become wiser by incorporating experience (i.e., data).
    - They acquire knowledge and rules from data automatically.
    - They emerged when earlier AI research hit a limit: humans had to supply the AI with knowledge and rules explicitly.
    [Figure: training data is used to build a classification model, which estimates a boundary between classes.]
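
To make "estimate a boundary" concrete, here is a minimal sketch in Python. It assumes a single numeric feature and uses invented data: the values and the labels "A" and "B" are hypothetical and do not come from the slides. Nobody writes the rule (a threshold) by hand. The code picks the threshold that makes the fewest mistakes on the labeled examples.

```python
# Hypothetical 1-D training data: (feature value, label).
# These numbers are invented for illustration only.
TRAINING_DATA = [(1.0, "A"), (2.0, "A"), (3.0, "A"), (6.0, "B"), (7.0, "B"), (8.0, "B")]


def learn_threshold(data):
    """Learn a boundary: the threshold with the fewest training errors.

    Values below the threshold are predicted "A", the rest "B".
    Candidate thresholds are midpoints between consecutive sorted values.
    """
    values = sorted(v for v, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        # Count training points the threshold t would misclassify.
        return sum(1 for v, label in data if ("A" if v < t else "B") != label)

    return min(candidates, key=errors)


def predict(threshold, value):
    """Apply the learned boundary to a new value."""
    return "A" if value < threshold else "B"


boundary = learn_threshold(TRAINING_DATA)
print(boundary)                # 4.5
print(predict(boundary, 5.0))  # B
```

This toy learner is essentially a one-feature "decision stump". Real classifiers estimate boundaries in many dimensions, but the idea is the same: the rule comes from the data.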

4. Rule-Based Approach: Before Machine Learning
    - How can we classify the people on this campus into categories?
    - A human makes the classification rule.
    - When the amount of data is small, or the data is difficult to quantify, human inference gives better results. A machine cannot answer questions about things it has not learned.
    [Figure: decision tree that asks "Looks young?" and then "Looks tired?"]
    Category                Condition
    Teacher                 Not young
    Undergraduate students  Young, not tired
    Graduate students       Young, tired
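
A human-made rule like the one on this slide can be written down directly as code. This is a minimal sketch. The function name and the two boolean inputs are illustrative assumptions, and the mapping follows the Category/Condition pairs reconstructed from the slide. Every condition is fixed by a person, and nothing is learned from data.

```python
def classify_by_rule(looks_young: bool, looks_tired: bool) -> str:
    """Hand-written rule: a human decided these conditions in advance."""
    if not looks_young:
        return "Teacher"
    if looks_tired:
        return "Graduate students"
    return "Undergraduate students"


print(classify_by_rule(looks_young=True, looks_tired=False))  # Undergraduate students
```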

5. How Does Machine Learning Solve This?
    - There are several approaches:
      - Supervised learning: regression, Naïve Bayes, k-NN, neural networks, etc.
      - Unsupervised learning: k-means
      - Semi-supervised learning
      - Reinforcement learning
    Important!
    - Main purposes: classification, prediction, recommendation.

6. How Does Supervised Learning Solve This?
    - First, we need a LARGE amount of appropriate data on the people on this campus.
    - Second, we need to extract FEATURES from each person.
    - Third, we select an algorithm and train it.
    Training data for supervised learning:
    Id  Young      Hair   Fashion   Tired      Label
    1   young      blond  t shirts  not tired  Undergrad.
    2   young      black  suits     tired      Undergrad.
    3   not young  white  suits     tired      Teacher
    4   young      black  t shirts  not tired  Undergrad.

7. How Does It Learn a Rule to Classify?
    - It works like a poll. Each person in the training data (slide 6) gives one point to their own label for every feature they have.
    Feature      Undergrad.  Teacher
    Tired        +1          +1
    Not tired    +2
    Young        +3
    Not young                +1
    Blond hair   +1
    Black hair   +2
    White hair               +1
    Suits        +1          +1
    T shirts     +2

8. When a New Person Comes Up
    - How will it classify this person? Tired, young, black hair, suits
    - Undergrad.: 1 (tired) + 3 (young) + 2 (black hair) + 1 (suits) = 7 points
    - Teacher: 1 (tired) + 1 (suits) = 2 points
    - → Undergrad.!!
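
The poll on slides 7 and 8 can be sketched in a few lines of Python. This is a simplified counting scheme for illustration, not a full Naïve Bayes classifier, and the function names are hypothetical. Training counts how often each feature appears with each label. Because the counts come straight from the four training people, black hair scores +2 for Undergrad., since persons 2 and 4 both have it. Classifying sums the counts for the new person's features and picks the label with the most points.

```python
from collections import Counter, defaultdict

# Training data from slide 6: each person's features plus a label.
TRAINING = [
    ({"young", "blond hair", "t shirts", "not tired"}, "Undergrad."),
    ({"young", "black hair", "suits", "tired"}, "Undergrad."),
    ({"not young", "white hair", "suits", "tired"}, "Teacher"),
    ({"young", "black hair", "t shirts", "not tired"}, "Undergrad."),
]


def train(data):
    """Build the 'poll' table: label -> feature -> number of votes."""
    table = defaultdict(Counter)
    for features, label in data:
        for feature in features:
            table[label][feature] += 1
    return table


def score(table, features):
    """Sum each label's points for the given features (missing features count 0)."""
    return {label: sum(counts[f] for f in features) for label, counts in table.items()}


def classify(table, features):
    """Return the label with the most points."""
    scores = score(table, features)
    return max(scores, key=scores.get)


table = train(TRAINING)
new_person = {"tired", "young", "black hair", "suits"}
print(score(table, new_person))     # {'Undergrad.': 7, 'Teacher': 2}
print(classify(table, new_person))  # Undergrad.
```

Raw vote counts favor whichever label has more training examples. Here three of the four people are undergraduates, which is one reason to consider the accuracy questions on the next slide.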

9. Question (10 min.)
    1) How will it classify this person? Not tired, not young, black hair, suits
    2) Do you have any ideas to improve its accuracy?
       - Change the features it extracts from people?
       - Change the way training data is gathered?
       - Change its algorithm?

10. How Does Unsupervised Learning Solve This?
    - Unsupervised learning does not need labels, so the Label column of the training data is not used.
    Training data for unsupervised learning:
    Id  Young      Hair   Fashion   Tired
    1   young      blond  t shirts  not tired
    2   young      black  suits     tired
    3   not young  white  suits     tired
    4   young      black  t shirts  not tired
    Each feature value becomes its own column (1 = has the feature, 0 = does not):
    Id  young  not young  blond  black  white  t shirts  suits  tired  not tired
    1   1      0          1      0      0      1         0      0      1
    2   1      0          0      1      0      0         1      1      0
    3   0      1          0      0      1      0         1      1      0
    4   1      0          0      1      0      1         0      0      1
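
Turning each feature value into its own 0/1 column is usually called one-hot encoding. Here is a minimal Python sketch of how the 0/1 table on this slide can be produced. The column order follows the slide, and the helper name `one_hot` is illustrative.

```python
# Column order as on the slide.
COLUMNS = ["young", "not young", "blond", "black", "white",
           "t shirts", "suits", "tired", "not tired"]

# The four people from the training data, without labels.
PEOPLE = {
    1: {"young", "blond", "t shirts", "not tired"},
    2: {"young", "black", "suits", "tired"},
    3: {"not young", "white", "suits", "tired"},
    4: {"young", "black", "t shirts", "not tired"},
}


def one_hot(features, columns=COLUMNS):
    """1 if the person has the feature in that column, else 0."""
    return [1 if column in features else 0 for column in columns]


for person_id, features in PEOPLE.items():
    print(person_id, one_hot(features))
# 1 [1, 0, 1, 0, 0, 1, 0, 0, 1]
# 2 [1, 0, 0, 1, 0, 0, 1, 1, 0]
# 3 [0, 1, 0, 0, 1, 0, 1, 1, 0]
# 4 [1, 0, 0, 1, 0, 1, 0, 0, 1]
```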

11. Make a Distance Matrix
    - Using the 0/1 table, calculate the distance between every pair of people.
    - There are many ways to measure distance (or similarity):
      1) Euclidean distance
      2) Correlation coefficient
      3) Cosine similarity, etc.
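
12. Calculate the Euclidean Distance
    $d(x, y) = \sqrt{\sum_{k=1}^{9} (x_k - y_k)^2}$, where the nine 0/1 columns are, in order: young, not young, blond, black, white, t shirts, suits, tired, not tired.
    $d(1,2) = \sqrt{(1-1)^2 + (0-0)^2 + (1-0)^2 + (0-1)^2 + (0-0)^2 + (1-0)^2 + (0-1)^2 + (0-1)^2 + (1-0)^2} = \sqrt{6}$
    $d(1,3) = \sqrt{(1-0)^2 + (0-1)^2 + (1-0)^2 + (0-0)^2 + (0-1)^2 + (1-0)^2 + (0-1)^2 + (0-1)^2 + (1-0)^2} = \sqrt{8} = 2\sqrt{2}$
    $d(1,4) = \sqrt{(1-1)^2 + (0-0)^2 + (1-0)^2 + (0-1)^2 + (0-0)^2 + (1-1)^2 + (0-0)^2 + (0-0)^2 + (1-1)^2} = \sqrt{2}$
    $d(2,3) = \sqrt{(1-0)^2 + (0-1)^2 + (0-0)^2 + (1-0)^2 + (0-1)^2 + (0-0)^2 + (1-1)^2 + (1-1)^2 + (0-0)^2} = \sqrt{4} = 2$
    The remaining pairs are calculated the same way.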
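
The same calculation for every pair can be sketched in Python. The vectors are the 0/1 rows from slide 10. The sketch prints squared distances because they are whole numbers and easy to check against the slide: 6 means √6, 8 means 2√2. It also finds the nearest pair of different people.

```python
import math

# 0/1 rows from slide 10 (young, not young, blond, black, white,
# t shirts, suits, tired, not tired).
VECTORS = {
    1: [1, 0, 1, 0, 0, 1, 0, 0, 1],
    2: [1, 0, 0, 1, 0, 0, 1, 1, 0],
    3: [0, 1, 0, 0, 1, 0, 1, 1, 0],
    4: [1, 0, 0, 1, 0, 1, 0, 0, 1],
}


def euclid(x, y):
    """Square root of the sum of squared differences, column by column."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


ids = sorted(VECTORS)
matrix = {(i, j): euclid(VECTORS[i], VECTORS[j]) for i in ids for j in ids}

# Print squared distances so they can be compared with the slide.
for i in ids:
    print(i, [round(matrix[(i, j)] ** 2) for j in ids])
# 1 [0, 6, 8, 2]
# 2 [6, 0, 4, 4]
# 3 [8, 4, 0, 8]
# 4 [2, 4, 8, 0]

# The most similar pair of different people.
nearest = min(((i, j) for i in ids for j in ids if i < j), key=lambda p: matrix[p])
print("nearest pair:", nearest)  # nearest pair: (1, 4)
```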

13. Find the Most Similar People
    Distance matrix:
         1      2      3      4
    1    0      √6     2√2    √2
    2    √6     0      2      2
    3    2√2    2      0      2√2
    4    √2     2      2√2    0
    - The nearest pair: persons 1 and 4 (distance √2).
    - The 2nd nearest pair: persons 2 and 3 (distance 2). Persons 2 and 4 are also at distance 2, but person 4 has already joined person 1, and the group {1, 4} is √6 away from person 2 when measured by its furthest member.
    (Gathered data: the four people from slide 10.)

14. Show the Result as a Dendrogram
    - Merge the nearest people first, then the next nearest, and so on.
    - Use the furthest pair of members as the distance between two clusters (complete linkage).
    [Figure: dendrogram with leaves 1, 4, 2, 3 on a distance axis from 0 to 3. Persons 1 and 4 join as Cluster A (at √2), persons 2 and 3 join as Cluster B (at 2), and A and B join at 2√2.]
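
The merging procedure on this slide (agglomerative clustering with complete linkage) can be sketched in Python. The pairwise distances are taken from the matrix on slide 13. The function names are illustrative, and this is a small teaching version, not a library implementation.

```python
import math
from itertools import combinations

# Pairwise Euclidean distances between persons 1-4 (slide 13).
D = {
    frozenset({1, 2}): math.sqrt(6), frozenset({1, 3}): 2 * math.sqrt(2),
    frozenset({1, 4}): math.sqrt(2), frozenset({2, 3}): 2.0,
    frozenset({2, 4}): 2.0, frozenset({3, 4}): 2 * math.sqrt(2),
}


def cluster_distance(a, b):
    """Complete linkage: the furthest pair of members defines the distance."""
    return max(D[frozenset({x, y})] for x in a for y in b)


def complete_linkage(items):
    """Repeatedly merge the two closest clusters; record each merge."""
    clusters = [frozenset({i}) for i in items]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        merges.append((sorted(a), sorted(b), cluster_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


for left, right, dist in complete_linkage([1, 2, 3, 4]):
    print(left, "+", right, "at", round(dist, 3))
# [1] + [4] at 1.414
# [2] + [3] at 2.0
# [1, 4] + [2, 3] at 2.828
```

The recorded merge heights (√2, 2, 2√2) are the heights at which the branches join in the dendrogram.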

15. Name the Clusters
    Cluster A: persons 1 and 4. Cluster B: persons 2 and 3.
    - A "not tired, young" cluster? A "tired, suits" cluster?
    - → Are there better names?
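
One way to get ideas for cluster names is to list the features that every member of a cluster shares. Here is a minimal Python sketch, assuming the feature sets from slide 10 and the clusters from slide 14. The helper name `shared_features` is illustrative.

```python
# Feature sets from slide 10.
PEOPLE = {
    1: {"young", "blond", "t shirts", "not tired"},
    2: {"young", "black", "suits", "tired"},
    3: {"not young", "white", "suits", "tired"},
    4: {"young", "black", "t shirts", "not tired"},
}

# Clusters from the dendrogram.
CLUSTERS = {"Cluster A": [1, 4], "Cluster B": [2, 3]}


def shared_features(members):
    """Features that every member of the cluster has."""
    return set.intersection(*(PEOPLE[m] for m in members))


for name, members in CLUSTERS.items():
    print(name, sorted(shared_features(members)))
# Cluster A ['not tired', 't shirts', 'young']
# Cluster B ['suits', 'tired']
```

The shared features support the candidate names on the slide. Cluster A also shares "t shirts", which could be part of a better name.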

16. Question (15 min.)
    - Make a new distance matrix and a new dendrogram.
    - Name each cluster.
    - Which cluster will this person belong to? Not tired, not young, black hair, suits
    - Do you have any ideas to improve this result?

17. Discussion (20 min.)
    - You are a data analyst at a smartphone game company. You want to predict whether a new customer will buy a premium membership.
      1) Which algorithm would you use?
      2) How would you gather data?
      3) If you use supervised learning, how would you make labels for training?
      4) What features would you extract from user activities?
