This document provides an overview of machine learning algorithms and techniques. It begins by distinguishing machine learning from other types of artificial intelligence. It then describes popular machine learning algorithms including supervised learning algorithms like linear regression, logistic regression, support vector machines, and neural networks as well as unsupervised learning algorithms like k-means clustering and principal component analysis. The document provides examples of how these algorithms can be applied to tasks like text classification, sentiment analysis, and clustering financial data into categories. It emphasizes that machine learning involves finding patterns in data to make predictions without being explicitly programmed.
11. Question 1: Supervised or Unsupervised?
You are designing an agent for The Matrix.
Its task is to classify people who are threats to the system.
Feature Set:
Age
IQ
Level of Education
# of Times They Watched the Movie The Matrix
Training Set of 100,000 people: 50k threats, 50k non-threats
12. Question 2: Supervised or Unsupervised?
You are designing the brain of a battle robot.
Its primary attack is hand-to-hand combat. Your task is to find the most effective move combos.
Feature Set:
# of Kicks
# of Punches
# of Head-butts
# of Leg Sweeps
Training Set of 100,000 winning battles
13. Natural Language Processing
Convert text into a numerical representation
Find commonalities within data
  Clustering
Make predictions from data
  Classification
    Category, Popularity, Sentiment, Relationships
14. Bag of Words Model
Corpus
Cats like to chase mice.
Dogs like to eat big bones.
15. Create a Dictionary
Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Corpus
Cats like to chase mice.
Dogs like to eat big bones.
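The dictionary on this slide can be rebuilt from the corpus with a few lines of Python. This is only a sketch: the slide does not say how the words were selected, so the one-word stop list (which drops "to") and the helper names `tokenize` and `build_dictionary` are assumptions made for illustration.

```python
import re

# Assumption: the slide drops "to", so it is treated here as a stop word.
STOP_WORDS = {"to"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_dictionary(corpus):
    """Map each distinct non-stop word to an index, in order of first appearance."""
    dictionary = {}
    for document in corpus:
        for word in tokenize(document):
            if word not in STOP_WORDS and word not in dictionary:
                dictionary[word] = len(dictionary)
    return dictionary

corpus = ["Cats like to chase mice.", "Dogs like to eat big bones."]
dictionary = build_dictionary(corpus)
for word, index in dictionary.items():
    print(index, "-", word)
```

With this corpus the printed list matches the slide: indices 0 through 7 for cats, like, chase, mice, dogs, eat, big, bones.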
16. Digitize Text
Cats like to chase mice.
1 1 1 1 0 0 0 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1
Vector Length = 8
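As a rough illustration of the digitizing step, the sketch below turns a sentence into a binary vector with one position per dictionary word. The function name `vectorize` is hypothetical, and the choice of binary presence flags (rather than word counts) is an assumption that happens to agree with the vectors shown on the slide.

```python
import re

# Dictionary copied from the slide: word -> position in the vector.
DICTIONARY = {"cats": 0, "like": 1, "chase": 2, "mice": 3,
              "dogs": 4, "eat": 5, "big": 6, "bones": 7}

def vectorize(text, dictionary):
    """Return a binary bag-of-words vector: 1 where a dictionary word occurs in the text."""
    vector = [0] * len(dictionary)
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in dictionary:  # words outside the dictionary (e.g. "to") are ignored
            vector[dictionary[word]] = 1
    return vector

cats = vectorize("Cats like to chase mice.", DICTIONARY)
dogs = vectorize("Dogs like to eat big bones.", DICTIONARY)
print(cats)  # [1, 1, 1, 1, 0, 0, 0, 0]
print(dogs)  # [0, 1, 0, 0, 1, 1, 1, 1]
```

Every vector has length 8 because the dictionary has 8 entries, regardless of how long the sentence is.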
17. Classify Documents
Cats like to chase mice.
1 1 1 1 0 0 0 0    label (eating): 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1    label (eating): 1
18. Predict on New Data
Cats like to chase mice.
1 1 1 1 0 0 0 0    label (eating): 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1    label (eating): 1
Bats eat bugs.
0 0 0 0 0 1 0 0    label (eating): ?
20. Predict on New Data
Cats like to chase mice.
1 1 1 1 0 0 0 0    label (eating): 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1    label (eating): 1
Bats eat bugs.
0 0 0 0 0 1 0 0    predicted label (eating): 1
21. Does it Really Work?
> data
[1] "Cats like to chase mice." "Dogs like to eat big bones."
> train
  big bone cat chase dog eat like mice y
1   0    0   1     1   0   0    1    1 0
2   1    1   0     0   1   1    1    0 1
> predict(fit, newdata = train)
[1] 0 1
> data2
[1] "Bats eat bugs."
> test
  big bone cat chase dog eat like mice
1   0    0   0     0   0   1    0    0
> predict(fit, newdata = test)
[1] 1
Document-term matrix: train
Training: 100% accuracy
Test case: success!
Source code: https://goo.gl/UxjPBs
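The R session above can be approximated in plain Python. The slides do not say which model `fit` is, so the sketch below assumes a logistic regression trained by batch gradient descent; the function names, learning rate, and epoch count are all illustrative choices, and the vectors use the dictionary order from the earlier slides rather than the stemmed column order of the R output.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, learning_rate=0.5, epochs=200):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        # Prediction error (probability minus label) for every training row.
        errors = [sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) - label
                  for row, label in zip(X, y)]
        for j in range(len(weights)):
            weights[j] -= learning_rate * sum(e * row[j] for e, row in zip(errors, X))
        bias -= learning_rate * sum(errors)
    return weights, bias

def predict(row, weights, bias):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    p = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)
    return 1 if p >= 0.5 else 0

# Training data: bag-of-words vectors and the "eating" label from the slides.
X = [[1, 1, 1, 1, 0, 0, 0, 0],   # Cats like to chase mice.
     [0, 1, 0, 0, 1, 1, 1, 1]]   # Dogs like to eat big bones.
y = [0, 1]
weights, bias = train_logistic(X, y)

bats = [0, 0, 0, 0, 0, 1, 0, 0]  # Bats eat bugs.
print([predict(row, weights, bias) for row in X])  # [0, 1]
print(predict(bats, weights, bias))                # 1
```

The new sentence is classified as "eating" because the only dictionary word it contains, "eat", appeared solely in the positive training example. Two training documents are far too few to draw conclusions from; the example only shows the mechanics.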
22. Unsupervised Learning
Finding patterns in data
Grouping similar data into clusters
Does not require labeled data
Exploratory data analysis
Predict clusters for new data!
23. K-Means Clustering
Popular clustering algorithm
Groups data into k clusters
Data points belong to the cluster with closest mean
Each cluster has a centroid (center)
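The rule that a point belongs to the cluster with the closest mean can be sketched in a few lines. Euclidean distance is assumed here, and the centroids and points are made-up values used only for illustration; the same function also covers predicting the cluster of a new data point.

```python
import math

def closest_centroid(point, centroids):
    """Return the index of the centroid nearest to the point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical 2-D centroids, chosen only for illustration.
centroids = [(1.0, 1.0), (8.0, 8.0)]
print(closest_centroid((2.0, 1.5), centroids))  # 0
print(closest_centroid((7.0, 9.0), centroids))  # 1
```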
24. k-Means Algorithm
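Choose a value for k (number of clusters)
  Guess
  Rule of thumb: ~~(Math.sqrt(points.length * 0.5)), i.e. the square root of half the number of points, rounded down
Initialize centroids
  Random
  Farthest Point
  K-means++
Assign data points to closest centroid
Move centroids to center of assigned points
Repeat the last two steps until the assignments stop changing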
Demo: https://goo.gl/AjNEJk
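The steps on this slide can be put together in a short Python sketch. It is not the code behind the demo: the function names are hypothetical, Euclidean distance is assumed, and the farthest-point initialization starts from the first point in the list (an arbitrary choice made here so that the result is deterministic).

```python
import math

def rule_of_thumb_k(n_points):
    """Python equivalent of ~~(Math.sqrt(points.length * 0.5)) from the slide."""
    return int(math.sqrt(n_points * 0.5))

def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point that is
    farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, max_iterations=100):
    centroids = farthest_point_init(points, k)
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        if new_assignments == assignments:
            break  # nothing changed, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = mean(members)
    return centroids, assignments

# Made-up data: two well-separated groups of four points each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0)]
k = rule_of_thumb_k(len(points))
centroids, assignments = k_means(points, k)
print(k)            # 2
print(centroids)    # [(1.5, 1.5), (8.5, 8.5)]
print(assignments)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Random initialization is simpler to write but can settle on a poor clustering depending on the starting points, which is one reason the slide lists farthest-point and K-means++ as alternatives.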