1. Models and Learning
Ridho Rahmadi
Center of Data Science UII
May 10, 2020
Ridho Rahmadi (Center of Data Science UII) Models and Learning May 10, 2020 1 / 30
3. Artificial Intelligence (AI)
Note that most of the processes are automated procedures.
4. ML, DL, CM
Machine Learning (ML), Deep Learning (DL), and Causal Modeling (CM) are parts of
AI.
5. Redefining Data
49K photos on Instagram
3.9M Google searches
4.3M YouTube views
473K Twitter tweets
12.9M texts sent
750K Spotify streams
156M emails sent
154K Skype calls
6. Data To Expect
An estimated > 2,500,000,000,000,000,000 bytes (2.5 × 10^18) are generated per day.
7. Perspectives in Data Science
Problem | Activity | Questions | Examples
Association, P(y|x) | Seeing | What is? How would seeing X change my belief in Y? | What does a symptom tell me about a disease?
Intervention, P(y|do(x), z) | Doing | What if? What if I do X = x? | What if I take aspirin, will my headache be cured?
Counterfactual, P(y_x | x', y') | Imagining | Why? Was it X that caused Y? What if I had acted differently? | What if I had not been smoking the past 2 years?
8. Machine Learning
Data {x, y} -> supervised machine learning -> f
Data, e.g.: house area, calorie intake
Supervised machine learning: linear regression, polynomial regression, etc.
f: linear regression model, e.g., for house price or weight gain
9. Machine Learning
Data {x, y} -> supervised machine learning -> f
Data, e.g.: Twitter tweets, clinical assessment, students' study hours
Supervised machine learning: logistic regression, naïve Bayes classifier, random forest, support vector machine, etc.
f: classification model, e.g., hoax or not, diabetes or not, pass exam or not
10. Example when x and y are continuous
[Plot: Kg versus cookies, with marked points "eat 1 cookie" and "eat 2 cookies"]
What if I eat 3 cookies?
11. Extend the problem
[Scatter plot: Price versus Area in m2]
What is the price of a house if the area is 558 m2?
13. A good model?
[Scatter plot: Price versus Area in m2, with a line connecting all the points]
Draw a line by connecting all the points like this? Our objective is a model
that generalizes; the model above will not fit other data well.
14. A better model?
[Scatter plot: Price versus Area in m2, with a line]
Draw a line like this?
15. Which one?
[Scatter plot: Price versus Area in m2, with several candidate lines]
But which line?
17. Linear Regression
1 Pick an initial line/model h(θ) by randomly choosing the parameter θ
2 Compute the corresponding cost function J(θ)
3 Update the line/model h(θ) by changing θ so that J(θ) becomes smaller, using, e.g., gradient descent
4 Repeat steps 2 and 3 until convergence
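The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method from the slides: the cost J(θ) is taken to be the halved mean squared error, the model is a straight line with intercept theta[0] and slope theta[1], and the learning rate alpha, the tolerance tol, the function names, and the toy data are all hypothetical.

```python
import random

def predict(theta, x):
    """Line/model h: intercept theta[0] plus slope theta[1] times x."""
    return theta[0] + theta[1] * x

def cost(theta, xs, ys):
    """Assumed cost J(theta): mean squared error, halved by convention."""
    m = len(xs)
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def gradient_descent(xs, ys, alpha=0.1, tol=1e-12, max_iter=100_000, seed=0):
    rng = random.Random(seed)
    # Step 1: pick an initial line by choosing theta at random.
    theta = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    m = len(xs)
    # Step 2: compute the cost of the current line.
    previous = cost(theta, xs, ys)
    for _ in range(max_iter):
        errors = [predict(theta, x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Step 3: move theta against the gradient so that J gets smaller.
        theta = [theta[0] - alpha * grad0, theta[1] - alpha * grad1]
        current = cost(theta, xs, ys)
        # Step 4: stop once the cost no longer changes noticeably.
        if abs(previous - current) < tol:
            break
        previous = current
    return theta

# Hypothetical toy data generated from y = 1 + 2x (not from the slides).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta = gradient_descent(xs, ys)
print(theta, cost(theta, xs, ys))
```

On this toy data the fitted intercept and slope should end up close to 1 and 2. With real data such as house area in m2, the inputs would typically need rescaling first, otherwise a fixed learning rate like the one assumed here may diverge.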
18. When x continuous and y discrete
Cholesterol x1 Exercise x2 Status y
100 200 healthy
200 50 unhealthy
90 300 healthy
95 250 healthy
250 30 unhealthy
... ... ...
[Scatter plot: x2 versus x1]
Given a training set with two classes, “healthy” and “unhealthy”,
what is the class of a new sample with x1 = 300, x2 = 20?
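One possible way to answer this question is to fit a classifier to the five rows of the table. The sketch below uses logistic regression, one of the classifiers listed earlier, trained with plain gradient descent. It is an illustration only: the standardization of the features, the learning rate, the number of iterations, and all function names are assumptions, and five samples are far too few for a real model.

```python
import math

# Training set from the table: (cholesterol x1, exercise x2) and status y.
X = [(100, 200), (200, 50), (90, 300), (95, 250), (250, 30)]
labels = ["healthy", "unhealthy", "healthy", "healthy", "unhealthy"]
y = [1 if s == "unhealthy" else 0 for s in labels]

def column_stats(rows, j):
    """Mean and standard deviation of feature j."""
    values = [r[j] for r in rows]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, std

stats = [column_stats(X, j) for j in range(2)]

def scale(row):
    """Standardize a sample (assumed preprocessing, not from the slides)."""
    return [(v - mean) / std for v, (mean, std) in zip(row, stats)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, targets, alpha=0.1, iterations=2000):
    """Fit weights w and bias b by gradient descent on the logistic loss."""
    w, b = [0.0, 0.0], 0.0
    m = len(rows)
    for _ in range(iterations):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for row, t in zip(rows, targets):
            p = sigmoid(w[0] * row[0] + w[1] * row[1] + b)
            for j in range(2):
                grad_w[j] += (p - t) * row[j] / m
            grad_b += (p - t) / m
        w = [w[j] - alpha * grad_w[j] for j in range(2)]
        b -= alpha * grad_b
    return w, b

w, b = train([scale(r) for r in X], y)

def classify(row):
    """Predict 'unhealthy' when the estimated probability is at least 0.5."""
    s = scale(row)
    p = sigmoid(w[0] * s[0] + w[1] * s[1] + b)
    return "unhealthy" if p >= 0.5 else "healthy"

prediction = classify((300, 20))
print(prediction)
```

For the new sample (high cholesterol, little exercise) this sketch predicts "unhealthy", which matches the two unhealthy rows in the table.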
19. A good classifier?
[Scatter plot: x2 versus x1]
Note that this data set is slightly different from the previous one.
21. Support Vector Machine (SVM)
[Figure: axes x and y, with the margin labeled]
22. Unsupervised Learning
Cholesterol x1 Exercise x2
100 200
200 50
90 300
95 250
250 30
... ...
[Scatter plot: x2 versus x1]
In unsupervised learning, the training set has no target variable y; that is,
it consists only of {x(1), ..., x(m)}, and thus regression and classification
are no longer of interest.
23. Unsupervised Learning
In unsupervised learning, we want to find interesting patterns/structures in the data.
[Scatter plot: x2 versus x1]
24. Unsupervised Learning
For example, clusters or smaller groups in a data set.
The idea: partition the data into distinct groups so that
  observations within each group are similar
  observations in different groups are different
[Scatter plot: x2 versus x1]
25. An algorithm for clustering: K-Means
1 Initialize cluster centroids
2 Repeat until convergence (no change)
  1 Assign each observation i to the closest cluster centroid
  2 For each cluster, move the centroid to the mean of the observations belonging to the cluster
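The two steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance, takes the initial centroids as an argument instead of choosing them at random, and keeps the old centroid if a cluster ends up empty. The function name k_means and the choice of the first two table rows as initial centroids are hypothetical.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until nothing changes."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Step 2.1: assign each observation to the closest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Step 2.2: move each centroid to the mean of its observations.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # assumption: keep an empty cluster's centroid
        # Convergence: stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignment

# The five (x1, x2) observations from the unsupervised-learning table.
points = [(100, 200), (200, 50), (90, 300), (95, 250), (250, 30)]
# Assumed initialization: the first two observations serve as centroids.
centroids, assignment = k_means(points, [points[0], points[1]])
print(centroids)
print(assignment)
```

With this initialization the loop stops after the second pass: the low-cholesterol, high-exercise observations form one cluster and the remaining two form the other. A different initialization can lead to a different result, which is why K-Means is often run several times.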