Machine Learning for Data Mining
Introduction
Andres Mendez-Vazquez
May 13, 2015
Outline
1 Why are we interested in Analyzing Data?
Intuitive Definition: The 3V’s
Complexity
Data Everywhere
2 Machine Learning
Machine Learning Process
Features
Classification
Clustering Analysis
3 Data Mining
Definition
Applications
Example: Frequent Itemsets
4 Hardware Support
ASICs
GPUs
5 Projects
What projects can you do?
Intuitive Definition: Volume
When looking at the Volumes of Information, we have:
Volumes of it: Terabyte ($10^{12}$ bytes), Petabyte ($10^{15}$ bytes) and UP!!!
Examples of these Volumes are
1 Records
2 Transactions
3 Web Searches
4 etc
However
Something Notable
What constitutes truly “high” volume varies by industry and even
geography!!!
Simply look at the DNA data for a cellular cycle.
Example
Intuitive Definition: Variety
When looking at the Structure of the Information, we have:
Variety like there is no tomorrow:
It is structured, semi-structured, unstructured
So
Do you have some examples of structures in Information?
Intuitive Definition: Velocity
When looking at the Velocity of this Information, we have:
Data in Motion!!!
Velocity:
Dynamic Generation
Real Time Generation
Problems with that: Latency
Lag time between capture or generation and when it is available!!!
For example
Imagine that I have a stream of $m = 10^{25}$
integers taking values in
$[a_1, \ldots, a_n]$ with $n = 10{,}000{,}000$
Now, somebody asks you to find the most frequent item!!!
A naive algorithm
1 Take a hash table with one counter per distinct item.
2 Then, for each number in the stream, increment its counter in the hash table.
Problems
Which problems do we have?
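The naive algorithm above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `most_frequent` and the tiny `example_stream` are hypothetical, and an empty stream is not handled. It makes the problem visible: the hash table keeps one counter per distinct value, so memory grows with the number of distinct items (up to n), and each counter may need about log m bits.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def most_frequent(stream: Iterable[Hashable]) -> Tuple[Hashable, int]:
    """Return (item, count) for the most frequent item in a non-empty stream."""
    counts = Counter()      # hash table: item -> counter
    for item in stream:     # single pass over the stream
        counts[item] += 1   # one counter per distinct item is kept in memory
    return counts.most_common(1)[0]


# Hypothetical toy stream; a real stream would not fit in a list.
example_stream = [3, 1, 3, 2, 3, 1]
top_item, top_count = most_frequent(example_stream)
```

For n = 10,000,000 distinct values this may still fit in memory, but the approach does not scale when the value range grows or the counters must live on a small device.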
However
There is the
Count-Min Sketch Algorithm
Invented by
Cormode and Muthukrishnan, presented in 2004 (the related Count Sketch is due to Charikar, Chen and Farach-Colton, 2002)
With Properties
Space Used: $O\left(\frac{1}{\epsilon} \log\frac{1}{\delta} \cdot (\log m + \log n)\right)$
Error Probability: $\delta$
Error: $\epsilon$
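A minimal sketch of a Count-Min Sketch follows, assuming the usual sizing of width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉ and assuming items are non-negative integers. The class name, the hash family `(a·x + b) mod p mod width` and the seed are illustrative choices, not taken from the slides. Estimates never fall below the true count; with probability at least 1 − δ they exceed it by at most ε times the stream length.

```python
import math
import random


class CountMinSketch:
    """Approximate frequency counter using depth x width counters."""

    def __init__(self, epsilon: float, delta: float, seed: int = 0):
        self.width = math.ceil(math.e / epsilon)       # counters per row
        self.depth = math.ceil(math.log(1.0 / delta))  # number of hash rows
        self.prime = (1 << 61) - 1                     # Mersenne prime 2^61 - 1
        rng = random.Random(seed)
        # One (a, b) pair per row for the hash h(x) = ((a*x + b) mod p) mod width.
        self.hash_params = [
            (rng.randrange(1, self.prime), rng.randrange(0, self.prime))
            for _ in range(self.depth)
        ]
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _index(self, row: int, item: int) -> int:
        a, b = self.hash_params[row]
        return ((a * item + b) % self.prime) % self.width

    def add(self, item: int, count: int = 1) -> None:
        """Increment one counter in every row."""
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item: int) -> int:
        """Minimum over rows: collisions can only inflate a counter."""
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )


# Hypothetical toy stream of non-negative integers.
sketch = CountMinSketch(epsilon=0.01, delta=0.01)
stream = [7] * 50 + [3] * 20 + list(range(100, 200))
for value in stream:
    sketch.add(value)
```

The table size depends only on ε and δ, not on the number of distinct items, which is the point of the data structure.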
Complexity
Given all these things
It is necessary to correlate and share data across entities.
It is necessary to link, match and transform data across business
entities and systems.
With this...
Complexity goes through the roof!!!
And it is through the roof!!! The Linking Open Data community project
Cautionary Tale
Something Notable
In 1880 the USA made a Census of the Population in different aspects:
Population
Mortality
Agriculture
Manufacturing
However
Once data was collected it took 7 years to say something!!!
Ahhh...
Thus, Hollerith came up with the following machine (circa 1890)!!!
Hollerith Tabulating Machine
It was basically a sorter and counter
Using punched cards as memory.
And mercury sensors.
Example
It was FAST!!!
It took only!!!
2 years!!!
Nevertheless in 1837
Babbage’s Analytical Engine was
The First Design for a General-Purpose Computer!!!
Turing-complete!!!
Way more complex than the tabulator!!! 53 years earlier!!!
Funny!!!
The Problem
Actually, it never reached completion because
Babbage was actually a yucky project manager!!!
Data is Everywhere!
Lots of data is being collected and warehoused
Web data, e-commerce
Purchases at department/ grocery stores
Bank/Credit Card transactions
Social Networks
Many Places
The Staggering Numbers
An Ocean of Data
How much data is there in the world?
800 Terabytes, 2000
160 Exabytes, 2006
500 Exabytes (Internet), 2009
2.7 Zettabytes, 2012
35 Zettabytes by 2020
Generation
How much data is generated in ONE day?
7 TB, Twitter
10 TB, Facebook
Source: “Big data: The next frontier for innovation, competition, and productivity”,
McKinsey Global Institute, 2011
Type of Data
Thus
Relational Data (Tables/Transaction/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
And more...
Graph Data
Social Network, Semantic Web (RDF), . . .
Streaming Data
You can only scan the data once
The Ever Growing Landscape
Machine Learning
Definition
Algorithms or techniques that enable a computer (machine) to “learn” from
data. Related to many areas such as data mining, statistics, information
theory, etc.
Algorithm Types:
Unsupervised Learning
Supervised learning
Reinforcement learning
Examples
Artificial Neural Network (ANN)
Support Vector Machine (SVM)
Expectation-Maximization (EM)
Deterministic Annealing (DA)
Machine Learning Process
Process
1 Feature Extraction/Feature Generation
2 Clustering ≈ Class Identification ≈ Unsupervised Learning
3 Classification ≈ Supervised Learning
Then...
We start thinking: We need to process a lot of data...
Or...
LARGE SCALE MACHINE LEARNING
Feature Generation/Dimensionality Reduction
Feature Generation
Given a set of measurements, the goal is to discover compact and
informative representations of the obtained data.
Examples
1 The Karhunen–Loève transform ≈ Principal Component Analysis
1 Popular for feature generation and Dimensionality Reduction
2 The Singular Value Decomposition
1 Used for Dimensionality Reduction
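As an illustration of how the two examples connect, PCA can be computed from the Singular Value Decomposition of the centered data matrix. The sketch below assumes NumPy is available; the function name `pca_reduce` and the synthetic data are hypothetical and only serve to show the shapes involved.

```python
import numpy as np


def pca_reduce(X: np.ndarray, k: int):
    """Project the rows of X onto the first k principal components."""
    X_centered = X - X.mean(axis=0)            # PCA requires zero-mean features
    # Thin SVD: rows of Vt are the principal directions, ordered by singular value.
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]                        # shape (k, n_features)
    explained_variance = (s[:k] ** 2) / (X.shape[0] - 1)
    return X_centered @ components.T, components, explained_variance


# Synthetic data: three measured features that really vary along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2.0 * t + 0.01 * rng.normal(size=200),
    0.01 * rng.normal(size=200),
])
Z, components, variances = pca_reduce(X, k=1)
```

Here `Z` is a compact one-dimensional representation of the three original measurements, and `components[0]` is close to the direction (1, 2, 0) up to sign and scale.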
Dimension Reduction/Feature Extraction
Definition
Process to transform high-dimensional data into low-dimensional data for
improving accuracy, improving understanding, or removing noise.
Why?
Curse of dimensionality: The volume of the space, and with it the amount of
data needed to cover it, grows exponentially as extra dimensions are added.
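Two small computations can make the curse of dimensionality concrete. This is a sketch with hypothetical function names: the first counts the cells of a regular grid with a fixed number of bins per axis, the second measures what fraction of the cube [-1, 1]^d is occupied by the inscribed unit ball, using the standard closed-form volume of the d-dimensional ball.

```python
import math


def grid_cells(bins_per_axis: int, d: int) -> int:
    """Number of cells needed to cover d dimensions with a regular grid."""
    return bins_per_axis ** d


def unit_ball_volume(d: int) -> float:
    """Volume of the d-dimensional ball of radius 1."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def inscribed_ball_fraction(d: int) -> float:
    """Fraction of the cube [-1, 1]^d that lies inside the unit ball."""
    return unit_ball_volume(d) / 2 ** d


fractions = {d: inscribed_ball_fraction(d) for d in (1, 2, 3, 10, 20)}
```

With 10 bins per axis, 6 dimensions already need a million cells, and in 10 dimensions the ball covers well under 1% of the cube: almost all of the volume sits in the corners, far from the center.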
Feature Selection
Feature Selection
Which features should be used for the classifier?
Why? The Curse of Dimensionality!!!
Hypothesis Testing to discriminate good features
What can be done?
Measures for Class Separability
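Example: Between-class scatter matrix:
$$S_b = \sum_{i=1}^{M} P_i \left(\mu_i - \mu_0\right)\left(\mu_i - \mu_0\right)^T \quad (1)$$
Where:
$\mu_0$ is the global mean vector, $\mu_0 = \sum_{i=1}^{M} P_i \mu_i$.
$\mu_i$ is the mean vector of class $\omega_i$.
$P_i \cong \frac{n_i}{N}$, with $n_i$ the number of samples in class $\omega_i$ out of $N$ samples in total.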
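The between-class scatter matrix of equation (1) can be computed directly from labeled samples. The sketch below assumes NumPy, uses the class mean vectors for µi (the standard definition) and estimates Pi as ni/N. The function name and the four-sample data set are hypothetical.

```python
import numpy as np


def between_class_scatter(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """S_b = sum_i P_i (mu_i - mu_0)(mu_i - mu_0)^T with P_i = n_i / N."""
    classes = np.unique(y)
    N, d = X.shape
    priors = np.array([np.sum(y == c) / N for c in classes])      # P_i
    means = np.array([X[y == c].mean(axis=0) for c in classes])   # mu_i
    mu0 = priors @ means                                          # global mean
    Sb = np.zeros((d, d))
    for P_i, mu_i in zip(priors, means):
        diff = (mu_i - mu0).reshape(-1, 1)   # column vector
        Sb += P_i * (diff @ diff.T)          # weighted outer product
    return Sb


# Two classes with means (1, 0) and (5, 1); the global mean is (3, 0.5).
X = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 1.0], [6.0, 1.0]])
y = np.array([0, 0, 1, 1])
Sb = between_class_scatter(X, y)
```

For this toy data the result is [[4, 1], [1, 0.25]]: the classes are far apart along the first feature and only slightly apart along the second.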
What can be done?
Feature Subset Selection
Examples:
Filter Approach:
Candidate combinations of features are scored with a separability
measure, independently of any classifier.
Wrapper Approach:
Use the chosen classifier itself to score candidate sets and find the best one.
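A filter approach can be sketched as an exhaustive search over feature subsets scored by a separability measure. The criterion used here, J = trace(S_b) / trace(S_w), where S_w is the within-class scatter matrix, is an assumption chosen for brevity and is not taken from the slides; the function names and data are hypothetical, and NumPy is assumed to be available.

```python
from itertools import combinations

import numpy as np


def separability(X: np.ndarray, y: np.ndarray) -> float:
    """J = trace(S_b) / trace(S_w); larger means better separated classes."""
    N = X.shape[0]
    mu0 = X.mean(axis=0)
    between, within = 0.0, 0.0
    for c in np.unique(y):
        Xc = X[y == c]
        P = Xc.shape[0] / N
        between += P * np.sum((Xc.mean(axis=0) - mu0) ** 2)  # trace of S_b term
        within += P * np.sum(Xc.var(axis=0))                 # trace of S_w term
    return between / within


def best_subset(X: np.ndarray, y: np.ndarray, k: int):
    """Score every subset of k feature columns and return the best one."""
    candidates = combinations(range(X.shape[1]), k)
    return max(candidates, key=lambda cols: separability(X[:, list(cols)], y))


# Feature 0 separates the classes; features 1 and 2 are mostly noise.
X = np.array([
    [0.0, 5.0, 1.0], [0.2, 1.0, 2.0], [0.1, 3.0, 3.0],
    [3.0, 4.0, 1.5], [3.2, 2.0, 2.5], [3.1, 0.0, 3.5],
])
y = np.array([0, 0, 0, 1, 1, 1])
best_one = best_subset(X, y, k=1)
best_two = best_subset(X, y, k=2)
```

The number of subsets grows combinatorially with the number of features, so in practice the exhaustive search is usually replaced by a greedy or otherwise suboptimal search.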
Classification
Definition
A procedure dividing data into the given set of categories based on the
training set in a supervised way.
What do we want from classification?
Generalization vs. Specialization
Hard to achieve both
Avoid overfitting/overtraining
Early stopping
Holdout validation
K-fold cross validation
Leave-one-out cross-validation
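The validation schemes listed above differ mainly in how the sample indices are split. The following is a minimal sketch of K-fold splitting in plain Python; the generator name `k_fold_indices` and the seed are hypothetical. Leave-one-out is the special case where K equals the number of samples, and holdout validation corresponds to using a single one of these splits.

```python
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (training, validation) index lists for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]     # k nearly equal folds
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_indices(n_samples=10, k=5))
leave_one_out = list(k_fold_indices(n_samples=4, k=4))
```

A classifier would be trained on each `training` list and scored on the matching `validation` list; averaging the k scores gives an estimate of the generalization error.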
Avoid overfitting/overtraining
[Figure: Validation and Training Error. Curves of the validation error and the training error, with the underfitting and overfitting regions marked.]
Examples of Classification Algorithms
Many Possible Algorithms
Linear Classifiers: Perceptron
Probabilistic Classifiers: Naive Bayes
Kernel Method Classifiers: Support Vector Machines
Non-Linear Classifiers: Artificial Neural Networks
Graph Model Classifiers:
. . .
Clustering Analysis
Definition
Grouping unlabeled data into clusters, for the purpose of inference of
hidden structures or information.
Using, for example
Dissimilarity measurement
Angle : Inner product, . . .
Non-metric : Rank, Intensity, . . .
Distance : Euclidean (l2), Manhattan(l1), . . .
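Three of the dissimilarity measures named above can be written in a few lines of plain Python. This is a sketch with hypothetical function names; the angle-based measure is taken here to be one minus the cosine of the angle, which is an assumption about how the inner product is turned into a dissimilarity, and it requires non-zero vectors.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """l2 distance: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    """l1 distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_dissimilarity(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - cos(angle between x and y), computed from the inner product."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)
```

The choice matters: the vectors (1, 2) and (2, 4) are far apart in Euclidean distance but have zero angular dissimilarity, since they point in the same direction.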
Examples of Clustering Algorithms
Clustering
1 Basic Clustering Algorithms
1 K-means
2 Clustering Based on Cost Functions
1 Fuzzy C-means
2 Possibilistic
3 Hierarchical Clustering
1 Entropy based
4 Clustering Based on Graph Theory
40 / 56
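To make the first algorithm in the list concrete, here is a minimal sketch of K-means in its usual form (Lloyd's iteration: assign each point to the nearest centroid, then move each centroid to the mean of its members). The function name, the random initialization from the data, and the toy points are assumptions made for this example.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct data points to start
    assignment = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (squared l2 distance).
        new_assignment = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2
                                  for p, q in zip(point, centroids[c])))
            for point in points
        ]
        if new_assignment == assignment:  # nothing changed: converged
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, assignment


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = kmeans(points, k=2)
```

The result depends on the initial centroids in general, which is why practical implementations often restart from several random initializations and keep the best run.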
What Is Data Mining?
Data mining (knowledge discovery in databases):
Extraction of interesting information or patterns from data in large
databases.
Alternative names and their “inside stories”:
Knowledge discovery (mining) in databases (KDD)
Knowledge extraction
Data/pattern analysis
Data archeology
Business intelligence
etc.
42 / 56
Examples: What is (not) Data Mining?
What is not Data Mining?
1 Looking up a phone number in a phone directory
2 Querying a Web search engine for information about “Amazon”
What is Data Mining?
1 Discovering that certain names are more prevalent in certain US
locations (O’Brien, O’Rourke, O’Reilly, ... in the Boston area)
2 Grouping together similar documents returned by a search engine
according to their context (e.g. Amazon rainforest, Amazon.com)
43 / 56
Data Mining Applications
Applications
Mining the Web for structured data
Near-neighbor search in high-dimensional data
Frequent itemsets and association rules
Structure of the web graph
PageRank
Link analysis
Proximity on graphs
Mining data streams
Large-scale supervised machine learning techniques
45 / 56
Example: Frequent Itemsets
Based on the Market-Basket Model
1 On the one hand, we have items.
2 On the other, we have baskets, sometimes called “transactions.”
1 Each basket consists of a set of items (an itemset).
2 Baskets are usually small compared with the total number of items.
Examples
1 {Cat, and, dog, bites}
2 {Yahoo, news, claims, cat, dog, and, produced, viable, offspring}
3 {Cat, killer, likely, is, a, big, dog}
4 {Professional, free, advice, on, dog, training, puppy}
47 / 56
Example: Frequent Itemsets
Then, we do the following
Transaction ID | Cat | Dog | and | a | mated
1              | 1   | 1   | 1   | 0 | 0
2              | 1   | 1   | 1   | 1 | 1
3              | 1   | 1   | 0   | 1 | 0
4              | 0   | 1   | 0   | 0 | 0
48 / 56
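The 0/1 table above can be read back as baskets: each row is an itemset containing exactly the columns marked 1. The following sketch shows that conversion in both directions and counts how often each item occurs. The row values are copied from the table; the function and variable names are hypothetical.

```python
columns = ["Cat", "Dog", "and", "a", "mated"]

# Indicator rows copied from the table, keyed by transaction ID.
rows = {
    1: [1, 1, 1, 0, 0],
    2: [1, 1, 1, 1, 1],
    3: [1, 1, 0, 1, 0],
    4: [0, 1, 0, 0, 0],
}


def row_to_itemset(row):
    """Keep the column names whose indicator is 1."""
    return {name for name, flag in zip(columns, row) if flag == 1}


def itemset_to_row(itemset):
    """Encode an itemset as a 0/1 vector over the fixed column order."""
    return [1 if name in itemset else 0 for name in columns]


baskets = {tid: row_to_itemset(row) for tid, row in rows.items()}

# Column sums: in how many transactions does each single item appear?
item_counts = {
    name: sum(row[i] for row in rows.values())
    for i, name in enumerate(columns)
}
```

The indicator form is convenient for column-wise counting, while the set form is convenient for subset tests on larger itemsets.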
Combinatorial Problem
Problem
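How many subsets do we have? For n distinct items there are 2^n possible itemsets.
But we can do the following
Given an itemset x and a database D of transactions $\{t_i\}_{i \in I}$, define the support of x as
$\mathrm{supp}(x, D) = |\{ t_i \in D \mid x \subseteq t_i \}|$ (2)
Then, setting a threshold
How many itemsets are frequent, i.e., have supp(x, D) greater than the threshold?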
49 / 56
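A brute-force sketch of the support count in equation (2) follows, assuming the four transactions of the 0/1 table from the previous slide. The threshold value of 2 and all names are hypothetical. Enumerating every candidate itemset is only feasible for a handful of items, which is exactly the combinatorial problem the slide points at.

```python
from itertools import combinations

# Transactions read off the 0/1 table (columns: Cat, Dog, and, a, mated).
transactions = [
    {"Cat", "Dog", "and"},
    {"Cat", "Dog", "and", "a", "mated"},
    {"Cat", "Dog", "a"},
    {"Dog"},
]


def support(itemset, database):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in database if itemset <= t)


items = sorted(set().union(*transactions))
n_subsets = 2 ** len(items)  # all subsets of the item universe, empty set included

threshold = 2  # hypothetical value; an itemset is frequent if support > threshold

frequent = {}
for size in range(1, len(items) + 1):
    for candidate in combinations(items, size):
        count = support(set(candidate), transactions)
        if count > threshold:
            frequent[candidate] = count
```

With five items there are only 32 subsets to test, but the count doubles with every additional item, so practical algorithms prune candidates instead of enumerating them all. A standard observation that makes pruning possible is that every subset of a frequent itemset is itself frequent.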
Hardware Solutions: ASICs
Application-Specific Integrated Circuit (ASIC)
An ASIC is an integrated circuit customized for a particular use, rather
than intended for general-purpose use.
It allows for
1 Lower Power Consumption.
2 Better Cooling Approaches.
Example: From Microsoft Research
51 / 56
Hardware Solutions: GPUs
IDEAS
Based on CUDA parallel computing architecture from Nvidia
Emphasis on executing many concurrent LIGHT threads instead of
one HEAVY thread as in CPUs
Hardware for 8800
53 / 56
Advantages
Massively parallel
Hundreds of cores, millions of threads
High throughput
Limitations
May not be applicable for all tasks
Generic hardware (CPUs) closing the gap
54 / 56
Projects
Possible topics are:
Oil exploration detection.
Association Rule Preprocessing Project.
Neural Network-Based Financial Market Forecasting Project.
Page Ranking - Improving over the Google Matrix
Influence Maximization in Social Networks.
Web Word Relevance Measures.
Recommendation Systems.
There are more possibilities at https://www.kaggle.com/competitions
56 / 56

01 Machine Learning Introduction

  • 1.
    Machine Learning forData Mining Introduction Andres Mendez-Vazquez May 13, 2015 1 / 56
  • 2.
    Outline 1 Why arewe interested in Analyzing Data? Intuitive Definition: The 3V’s Complexity Data Everywhere 2 Machine Learning Machine Learning Process Features Classification Clustering Analysis 3 Data Mining Definition Applications Example: Frequent Itemsets 4 Hardware Support ASICS GPU’s 5 Projects What projects can you do? 2 / 56
  • 3.
    Outline 1 Why arewe interested in Analyzing Data? Intuitive Definition: The 3V’s Complexity Data Everywhere 2 Machine Learning Machine Learning Process Features Classification Clustering Analysis 3 Data Mining Definition Applications Example: Frequent Itemsets 4 Hardware Support ASICS GPU’s 5 Projects What projects can you do? 3 / 56
  • 4.
    Intuitive Definition: Volume Whenlooking at the Volumes of Information, we have: Volumes of it: Terabyte(1012), Petabyte(1015) and UP!!! Examples of these Volumes are 1 Records 2 Transactions 3 Web Searches 4 etc 4 / 56
  • 5.
    Intuitive Definition: Volume Whenlooking at the Volumes of Information, we have: Volumes of it: Terabyte(1012), Petabyte(1015) and UP!!! Examples of these Volumes are 1 Records 2 Transactions 3 Web Searches 4 etc 4 / 56
  • 6.
    Intuitive Definition: Volume Whenlooking at the Volumes of Information, we have: Volumes of it: Terabyte(1012), Petabyte(1015) and UP!!! Examples of these Volumes are 1 Records 2 Transactions 3 Web Searches 4 etc 4 / 56
  • 7.
    Intuitive Definition: Volume Whenlooking at the Volumes of Information, we have: Volumes of it: Terabyte(1012), Petabyte(1015) and UP!!! Examples of these Volumes are 1 Records 2 Transactions 3 Web Searches 4 etc 4 / 56
  • 8.
    Intuitive Definition: Volume Whenlooking at the Volumes of Information, we have: Volumes of it: Terabyte(1012), Petabyte(1015) and UP!!! Examples of these Volumes are 1 Records 2 Transactions 3 Web Searches 4 etc 4 / 56
  • 9.
    However Something Notable What constitutestruly “high” volume varies by industry and even geography!!! Simply look at the DNA data for a cellular cycle. Example 5 / 56
  • 10.
    However Something Notable What constitutestruly “high” volume varies by industry and even geography!!! Simply look at the DNA data for a cellular cycle. Example 5 / 56
  • 11.
    Intuitive Definition: Variety Whenlooking at the Structure of the Information, we have: Variety like there is not tomorrow: It is structured, semi-structured, unstructured So Do you have some examples of structures in Information? 6 / 56
  • 12.
    Intuitive Definition: Variety Whenlooking at the Structure of the Information, we have: Variety like there is not tomorrow: It is structured, semi-structured, unstructured So Do you have some examples of structures in Information? 6 / 56
  • 13.
    Intuitive Definition: Variety Whenlooking at the Structure of the Information, we have: Variety like there is not tomorrow: It is structured, semi-structured, unstructured So Do you have some examples of structures in Information? 6 / 56
  • 14.
    Intuitive Definition: Volume WhenLooking at the Velocity of this Information? Data in Motion!!! Velocity: Dynamic Generation Real Time Generation Problems with that: Latency Lag time between capture or generation and when it is available!!! 7 / 56
  • 15.
    Intuitive Definition: Volume WhenLooking at the Velocity of this Information? Data in Motion!!! Velocity: Dynamic Generation Real Time Generation Problems with that: Latency Lag time between capture or generation and when it is available!!! 7 / 56
  • 16.
    Intuitive Definition: Volume WhenLooking at the Velocity of this Information? Data in Motion!!! Velocity: Dynamic Generation Real Time Generation Problems with that: Latency Lag time between capture or generation and when it is available!!! 7 / 56
  • 17.
    Intuitive Definition: Volume WhenLooking at the Velocity of this Information? Data in Motion!!! Velocity: Dynamic Generation Real Time Generation Problems with that: Latency Lag time between capture or generation and when it is available!!! 7 / 56
  • 18.
    Intuitive Definition: Volume WhenLooking at the Velocity of this Information? Data in Motion!!! Velocity: Dynamic Generation Real Time Generation Problems with that: Latency Lag time between capture or generation and when it is available!!! 7 / 56
  • 19.
    For example Imagine thatI have a stream of m = 1025 integers with Ranges from [a1, ..., an] with n = 10, 000, 000 Now, somebody ask you to find the most frequent item!!! A naive algorithm 1 Take hash table with a counter. 2 Then, put numbers in the hash table. Problems Which problems we have? 8 / 56
  • 20.
    For example Imagine thatI have a stream of m = 1025 integers with Ranges from [a1, ..., an] with n = 10, 000, 000 Now, somebody ask you to find the most frequent item!!! A naive algorithm 1 Take hash table with a counter. 2 Then, put numbers in the hash table. Problems Which problems we have? 8 / 56
  • 21.
    For example Imagine thatI have a stream of m = 1025 integers with Ranges from [a1, ..., an] with n = 10, 000, 000 Now, somebody ask you to find the most frequent item!!! A naive algorithm 1 Take hash table with a counter. 2 Then, put numbers in the hash table. Problems Which problems we have? 8 / 56
  • 22.
    However There is the Count-MinSketch Algorithm Invented by Charikar, Chen and Farch-Colton in 2004 With Properties Space Used Error Probability Error O 1 log 1 δ · (log m + log n) δ 9 / 56
  • 23.
    However There is the Count-MinSketch Algorithm Invented by Charikar, Chen and Farch-Colton in 2004 With Properties Space Used Error Probability Error O 1 log 1 δ · (log m + log n) δ 9 / 56
  • 24.
    However There is the Count-MinSketch Algorithm Invented by Charikar, Chen and Farch-Colton in 2004 With Properties Space Used Error Probability Error O 1 log 1 δ · (log m + log n) δ 9 / 56
  • 25.
    Outline 1 Why arewe interested in Analyzing Data? Intuitive Definition: The 3V’s Complexity Data Everywhere 2 Machine Learning Machine Learning Process Features Classification Clustering Analysis 3 Data Mining Definition Applications Example: Frequent Itemsets 4 Hardware Support ASICS GPU’s 5 Projects What projects can you do? 10 / 56
  • 26.
    Complexity Given all thesethings It is necessary to correlate and share data across entities. It is necessary to link, match and transform data across business entities and systems. With this... Complexity goes through the roof!!! 11 / 56
  • 27.
    Complexity Given all thesethings It is necessary to correlate and share data across entities. It is necessary to link, match and transform data across business entities and systems. With this... Complexity goes through the roof!!! 11 / 56
  • 28.
    Complexity Given all thesethings It is necessary to correlate and share data across entities. It is necessary to link, match and transform data across business entities and systems. With this... Complexity goes through the roof!!! 11 / 56
  • 29.
    And it isthrough the roof!!! Linking open-data community project 12 / 56
  • 30.
    Cautionary Tale Something Notable In1880 the USA made a Census of the Population in different aspects: Population Mortality Agriculture Manufacturing However Once data was collected it took 7 years to say something!!! 13 / 56
  • 31.
    Cautionary Tale Something Notable In1880 the USA made a Census of the Population in different aspects: Population Mortality Agriculture Manufacturing However Once data was collected it took 7 years to say something!!! 13 / 56
  • 32.
    Cautionary Tale Something Notable In1880 the USA made a Census of the Population in different aspects: Population Mortality Agriculture Manufacturing However Once data was collected it took 7 years to say something!!! 13 / 56
  • 33.
    Cautionary Tale Something Notable In1880 the USA made a Census of the Population in different aspects: Population Mortality Agriculture Manufacturing However Once data was collected it took 7 years to say something!!! 13 / 56
  • 34.
    Cautionary Tale Something Notable In1880 the USA made a Census of the Population in different aspects: Population Mortality Agriculture Manufacturing However Once data was collected it took 7 years to say something!!! 13 / 56
  • 35.
    Cautionary Tale Something Notable In1880 the USA made a Census of the Population in different aspects: Population Mortality Agriculture Manufacturing However Once data was collected it took 7 years to say something!!! 13 / 56
  • 36.
    Cautionary Tale Something Notable In1880 the USA made a Census of the Population in different aspects: Population Mortality Agriculture Manufacturing However Once data was collected it took 7 years to say something!!! 13 / 56
  • 37.
    Ahhh... Thus, Hollering camewith the following machine (Circa 1890)!!! 14 / 56
  • 38.
    Hollering Tabulating Machine Itwas basically a sorter and counter Using punching cards as memories. And Mercury Sensors. Example 15 / 56
  • 39.
    Hollering Tabulating Machine Itwas basically a sorter and counter Using punching cards as memories. And Mercury Sensors. Example 15 / 56
  • 40.
    It was FAST!!! Ittook only!!! 2 years!!! Nevertheless in 1837 Babbage’s Difference engine was The First General Computer!!! Turing-complete!!! Way more complex than the tabulator!!! 53 years earlier!!! 16 / 56
  • 41.
    It was FAST!!! Ittook only!!! 2 years!!! Nevertheless in 1837 Babbage’s Difference engine was The First General Computer!!! Turing-complete!!! Way more complex than the tabulator!!! 53 years earlier!!! 16 / 56
  • 42.
    It was FAST!!! Ittook only!!! 2 years!!! Nevertheless in 1837 Babbage’s Difference engine was The First General Computer!!! Turing-complete!!! Way more complex than the tabulator!!! 53 years earlier!!! 16 / 56
  • 43.
    It was FAST!!! Ittook only!!! 2 years!!! Nevertheless in 1837 Babbage’s Difference engine was The First General Computer!!! Turing-complete!!! Way more complex than the tabulator!!! 53 years earlier!!! 16 / 56
  • 44.
    It was FAST!!! Ittook only!!! 2 years!!! Nevertheless in 1837 Babbage’s Difference engine was The First General Computer!!! Turing-complete!!! Way more complex than the tabulator!!! 53 years earlier!!! 16 / 56
  • 45.
  • 46.
    The Problem Actually, itnever reached completion because Babbage was actually a yucky project manager!!! 18 / 56
  • 47.
    The Problem Actually, itnever reached completion because Babbage was actually a yucky project manager!!! 18 / 56
  • 48.
    Outline 1 Why arewe interested in Analyzing Data? Intuitive Definition: The 3V’s Complexity Data Everywhere 2 Machine Learning Machine Learning Process Features Classification Clustering Analysis 3 Data Mining Definition Applications Example: Frequent Itemsets 4 Hardware Support ASICS GPU’s 5 Projects What projects can you do? 19 / 56
  • 49.
    Data is Everywhere! Lotsof data is being collected and warehoused Web data, e-commerce Purchases at department/ grocery stores Bank/Credit Card transactions Social Network Many Places 20 / 56
  • 50.
    Data is Everywhere! Lotsof data is being collected and warehoused Web data, e-commerce Purchases at department/ grocery stores Bank/Credit Card transactions Social Network Many Places 20 / 56
  • 51.
    Data is Everywhere! Lotsof data is being collected and warehoused Web data, e-commerce Purchases at department/ grocery stores Bank/Credit Card transactions Social Network Many Places 20 / 56
  • 52.
    Data is Everywhere! Lotsof data is being collected and warehoused Web data, e-commerce Purchases at department/ grocery stores Bank/Credit Card transactions Social Network Many Places 20 / 56
  • 53.
    Data is Everywhere! Lotsof data is being collected and warehoused Web data, e-commerce Purchases at department/ grocery stores Bank/Credit Card transactions Social Network Many Places 20 / 56
  • 54.
    The Staggering Numbers AOcean of Data How many data in the world? 800 Terabytes, 2000 160 Exabytes, 2006 500 Exabytes (Internet), 2009 2.7 Zettabytes, 2012 35 Zettabytes by 2020 Generation How many data generated ONE day? 7 TB, Twitter 10 TB, Facebook Source: “Big data: The next frontier for innovation, competition, and pro- ductivity” McKinsey Global Institute 2011 21 / 56
  • 63.
    Type of Data
    Thus
      Relational Data (Tables/Transaction/Legacy Data)
      Text Data (Web)
      Semi-structured Data (XML)
    And more...
      Graph Data
        Social Network, Semantic Web (RDF), . . .
      Streaming Data
        You can only scan the data once
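The one-scan constraint on streaming data can be illustrated with a running mean and variance. This is a minimal sketch using Welford's update, which is not named in the slides; the function name `running_stats` and the sample values are assumptions chosen for illustration.

```python
def running_stats(stream):
    """Consume an iterable once and return (count, mean, population variance)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / n if n else 0.0
    return n, mean, variance


# A generator can be read only once, like a data stream.
readings = (x for x in [2, 4, 4, 4, 5, 5, 7, 9])
count, mean, variance = running_stats(readings)
```

Each element is touched exactly once and only three numbers are kept in memory, which is the property a single-pass algorithm needs.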
  • 70.
    The Ever Growing Landscape
  • 71.
    Machine Learning
    Definition
      Algorithms or techniques that enable a computer (machine) to “learn” from data.
      Related to many areas such as data mining, statistics, information theory, etc.
    Algorithm Types:
      Unsupervised learning
      Supervised learning
      Reinforcement learning
    Examples
      Artificial Neural Network (ANN)
      Support Vector Machine (SVM)
      Expectation-Maximization (EM)
      Deterministic Annealing (DA)
  • 80.
    Machine Learning Process
    Process
      1 Feature Extraction/Feature Generation
      2 Clustering ≈ Class Identification ≈ Unsupervised Learning
      3 Classification ≈ Supervised Learning
    Then...
      We start thinking: We need to process a lot of data...
    Or...
      LARGE SCALE MACHINE LEARNING
  • 84.
    Feature Generation/Dimensionality Reduction
    Feature Generation
      Given a set of measurements, the goal is to discover compact and informative representations of the obtained data.
    Examples
      1 The Karhunen–Loève transform ≈ Principal Component Analysis
          1 Popular for feature generation and Dimensionality Reduction
      2 The Singular Value Decomposition
          1 Used for Dimensionality Reduction
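As a rough illustration of Principal Component Analysis, the leading component can be found from the covariance matrix by power iteration. This is a sketch under assumptions: the slides do not prescribe a method, the function name and sample points are hypothetical, and a real project would more likely call a linear algebra library.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (unit direction, variance along it) of the leading principal component."""
    n = len(rows)
    dim = len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(dim)]
    centered = [[r[k] - mean[k] for k in range(dim)] for r in rows]
    # Covariance matrix with 1/n normalisation.
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(dim)]
           for a in range(dim)]
    # Power iteration: apply cov repeatedly and renormalise.
    # Caveat: a start vector orthogonal to the leading eigenvector would not converge to it.
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(dim))
                   for a in range(dim))
    return v, variance


# Hypothetical 2-D measurements lying on the line y = x.
points = [(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0)]
direction, variance = first_principal_component(points)
```

Projecting the centered data onto `direction` would give a one-dimensional feature that keeps all the variance of this toy data set.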
  • 88.
    Dimension Reduction/Feature Extraction
    Definition
      Process of transforming high-dimensional data into a low-dimensional representation to improve accuracy, aid understanding, or remove noise.
    Why?
      Curse of dimensionality: the volume of the feature space grows exponentially with each added dimension, so a fixed number of samples covers it ever more sparsely.
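The exponential growth behind the curse of dimensionality can be made concrete with two small calculations on the unit hypercube. The function names are hypothetical and the choice of a 10% fraction and 10 bins per axis is only an illustrative assumption.

```python
def edge_length_for_fraction(fraction, d):
    """Edge of a sub-cube of the unit hypercube [0, 1]^d holding `fraction` of its volume."""
    return fraction ** (1.0 / d)


def grid_cells(bins_per_axis, d):
    """Number of cells when every axis is split into `bins_per_axis` intervals."""
    return bins_per_axis ** d


for d in (1, 2, 10, 100):
    print(d, round(edge_length_for_fraction(0.1, d), 3), grid_cells(10, d))
```

In one dimension a neighbourhood holding 10% of the volume spans 10% of the axis; in ten dimensions it must span roughly 79% of every axis, so "local" neighbourhoods stop being local.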
  • 90.
    Feature Selection
    Feature Selection
      Which features should be used for the classifier?
    Why?
      The Curse of Dimensionality!!!
    Hypothesis Testing to discriminate good features
  • 92.
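    What can be done?
    Measures for Class Separability
    Example: Between-class scatter matrix:
      $S_b = \sum_{i=1}^{M} P_i (\mu_i - \mu_0)(\mu_i - \mu_0)^T$    (1)
    Where:
      $\mu_0$ is the global mean vector, $\mu_0 = \sum_{i=1}^{M} P_i \mu_i$.
      $\mu_i$ is the mean vector of class $\omega_i$.
      $P_i \cong n_i / N$ is the prior probability of class $\omega_i$, estimated from the $n_i$ samples of that class among $N$ samples in total.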
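Equation (1) can be evaluated directly on a small labelled data set. The following is a minimal sketch, assuming the priors are estimated as $P_i = n_i / N$; the function name and the two toy classes are hypothetical.

```python
def between_class_scatter(classes):
    """classes: one list of feature vectors per class. Returns (S_b, mu_0)."""
    n_total = sum(len(samples) for samples in classes)
    dim = len(classes[0][0])
    priors = [len(samples) / n_total for samples in classes]  # P_i = n_i / N
    means = [[sum(x[k] for x in samples) / len(samples) for k in range(dim)]
             for samples in classes]                            # mu_i
    mu0 = [sum(p * m[k] for p, m in zip(priors, means)) for k in range(dim)]
    s_b = [[0.0] * dim for _ in range(dim)]
    for p, m in zip(priors, means):
        diff = [m[k] - mu0[k] for k in range(dim)]
        for r in range(dim):
            for c in range(dim):
                s_b[r][c] += p * diff[r] * diff[c]  # P_i (mu_i - mu_0)(mu_i - mu_0)^T
    return s_b, mu0


# Two hypothetical classes in two dimensions.
class_1 = [(0.0, 0.0), (2.0, 0.0)]
class_2 = [(4.0, 2.0), (6.0, 2.0)]
s_b, mu0 = between_class_scatter([class_1, class_2])
```

Here the class means are (1, 0) and (5, 2), the global mean is (3, 1), and the resulting matrix is large along the direction that separates the two means.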
  • 96.
    What can be done?
    Feature Subset Selection
    Examples:
      Filter Approach
        All combinations of features are evaluated with a separability measure, independently of the classifier.
      Wrapper Approach
        Use the chosen classifier itself to find the best feature set.
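The filter approach can be sketched as an exhaustive search over feature subsets scored by a separability measure. The criterion used here, J = trace(S_b) / trace(S_w) with S_w the within-class scatter, is an assumption (the slides only show S_b), and the function names and data are hypothetical.

```python
from itertools import combinations


def separability(classes, features):
    """J = trace(S_b) / trace(S_w), restricted to the chosen feature indices."""
    n_total = sum(len(samples) for samples in classes)
    between = 0.0
    within = 0.0
    for k in features:
        global_mean = sum(x[k] for samples in classes for x in samples) / n_total
        for samples in classes:
            prior = len(samples) / n_total
            class_mean = sum(x[k] for x in samples) / len(samples)
            between += prior * (class_mean - global_mean) ** 2
            within += prior * sum((x[k] - class_mean) ** 2 for x in samples) / len(samples)
    return between / within if within > 0 else float("inf")


def best_subset(classes, n_features, size):
    """Score every subset of the given size and return the best one (no classifier involved)."""
    return max(combinations(range(n_features), size),
               key=lambda subset: separability(classes, subset))


# Hypothetical data: only feature 0 separates the two classes.
class_a = [(0.0, 5.0, 1.0), (1.0, 1.0, 0.0)]
class_b = [(10.0, 1.0, 1.0), (11.0, 5.0, 0.0)]
best_one = best_subset([class_a, class_b], n_features=3, size=1)
best_two = best_subset([class_a, class_b], n_features=3, size=2)
```

The number of subsets grows combinatorially with the number of features, which is why practical filter methods usually rely on greedy or ranking-based searches instead of full enumeration.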
  • 102.
    Classification
    Definition
      A procedure that assigns data to a given set of categories, based on a training set, in a supervised way.
    What do we want from classification?
      Generalization vs. Specification
        Hard to achieve both
      Avoid overfitting/overtraining
        Early stopping
        Holdout validation
        K-fold cross-validation
        Leave-one-out cross-validation
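K-fold and leave-one-out cross-validation differ only in how the sample indices are split. A minimal sketch of the index bookkeeping, assuming contiguous folds without shuffling (the function name is hypothetical):

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, validation_indices) pairs covering range(n_samples)."""
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, validation))
        start += size
    return folds


three_fold = k_fold_indices(10, 3)
leave_one_out = k_fold_indices(5, 5)  # k equal to the sample count gives leave-one-out
```

Each sample is used for validation exactly once and for training in the other k - 1 rounds; in practice the data would normally be shuffled before splitting.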
  • 110.
    Avoid overfitting/overtraining
    Figure: Validation and Training Error (curves for validation error and training error, with the underfitting and overfitting regions marked)
  • 111.
    Examples of Classification Algorithms
    Many Possible Algorithms
      Linear Classifiers: Perceptron
      Probability Classifiers: Naive Bayes
      Kernel Methods Classifiers: Support Vector Machines
      Non-Linear Classifiers: Artificial Neural Networks
      Graph Model Classifiers: . . .
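As an example of the simplest entry in the list, a perceptron can be trained with the classic mistake-driven update. This is a sketch under assumptions: labels are coded as +1/-1, the learning rate and epoch limit are arbitrary, and the AND data set is only a toy illustration.

```python
def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Learn weights w and bias b so that sign(w.x + b) matches labels in {-1, +1}."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified (or on the boundary): update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: training set is separated
            break
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Logical AND is linearly separable, so the perceptron can learn it.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [-1, -1, -1, 1]
weights, bias = train_perceptron(inputs, targets)
```

The loop is only guaranteed to stop early when the classes are linearly separable; on data such as XOR it would run for all epochs without finding a separating line.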
  • 118.
    Clustering Analysis
    Definition
      Grouping unlabeled data into clusters in order to infer hidden structure or information.
    Using, for example
      Dissimilarity measurement
        Angle: Inner product, . . .
        Non-metric: Rank, Intensity, . . .
        Distance: Euclidean (l2), Manhattan (l1), . . .
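The distance-based and angle-based measures named above can be written in a few lines. The function names are hypothetical, and using one minus the cosine of the angle as the angle-based dissimilarity is one common choice rather than the only one.

```python
import math


def euclidean(a, b):
    """l2 distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """l1 distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_dissimilarity(a, b):
    """1 - cos(angle between a and b), built from the inner product; needs non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

The angle-based measure ignores vector length, so (1, 1) and (2, 2) count as identical under it even though both distances between them are non-zero.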
  • 124.
    Examples of Clustering Algorithms
    Clustering
      1 Basic Clustering Algorithms
          1 K-means
      2 Clustering Based on Cost Functions
          1 Fuzzy C-means
          2 Possibilistic
      3 Hierarchical Clustering
          1 Entropy based
      4 Clustering Based on Graph Theory
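K-means, the basic algorithm in the list, alternates between assigning points to the nearest centroid and recomputing each centroid as the mean of its points. A minimal sketch, assuming squared Euclidean distance and caller-supplied initial centroids (names and data are hypothetical):

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, initial_centroids, iterations=100):
    """Lloyd's iterations; returns (centroids, cluster label of each point)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(len(centroids)),
                      key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: mean of the members; an empty cluster keeps its old centroid.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, labels


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels = k_means(data, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the initial centroids, so practical implementations usually run several random initialisations and keep the best clustering.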
  • 133.
    What Is Data Mining?
    Data mining (knowledge discovery in databases):
      Extraction of interesting information or patterns from data in large databases.
    Alternative names and their “inside stories”:
      Knowledge discovery (mining) in databases (KDD)
      Knowledge extraction
      Data/pattern analysis
      Data archeology
      Business intelligence
      etc.
  • 139.
    Examples: What is (not) Data Mining?
    What is not Data Mining?
      1 Look up a phone number in a phone directory
      2 Query a Web search engine for information about “Amazon”
    What is Data Mining?
      1 Discover that certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly. . . in Boston area)
      2 Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)
  • 144.
    Data Mining Applications
    Applications
      Mining the Web for Structured Data
      Near Neighbor Search in High Dimensional Data
      Frequent itemsets and Association Rules
      Structure of the webgraph
      PageRank
      Link Analysis
      Proximity on Graphs
      Mining data streams
      Large scale supervised machine learning techniques
  • 154.
    Example: Frequent Itemsets
    Based on the Market-Basket Model
      1 On the one hand, we have items.
      2 On the other, we have baskets, sometimes called “transactions.”
          1 Each basket consists of a set of items (an itemset).
          2 Baskets are usually small: each holds far fewer items than the total number of items.
    Examples
      1 {Cat, and, dog, bites}
      2 {Yahoo, news, claims, cat, dog, and, produced, viable, offspring}
      3 {Cat, killer, likely, is, a, big, dog}
      4 {Professional, free, advice, on, dog, training, puppy}
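The four example baskets can be used to count how often itemsets occur. This sketch assumes the usual notion of a support threshold (an itemset is frequent if it appears in at least `min_support` baskets), which this slide does not state; it also lowercases the words and enumerates itemsets by brute force instead of using a pruning algorithm such as A-Priori. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(baskets, min_support, max_size=2):
    """Count every itemset of up to max_size items and keep those in >= min_support baskets."""
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # sorted so each itemset has one canonical tuple
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


baskets = [
    {"cat", "and", "dog", "bites"},
    {"yahoo", "news", "claims", "cat", "dog", "and", "produced", "viable", "offspring"},
    {"cat", "killer", "likely", "is", "a", "big", "dog"},
    {"professional", "free", "advice", "on", "dog", "training", "puppy"},
]
frequent = frequent_itemsets(baskets, min_support=3)
```

With a threshold of three baskets, the frequent itemsets are {dog}, {cat} and the pair {cat, dog}; brute-force enumeration is only workable because the baskets are small.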
  • 162.
    Example: Frequent Itemsets
    Then, we do the following
    Transaction ID | Cat | Dog | and | a | mated
    1              | 1   | 1   | 1   | 0 | 0
    2              | 1   | 1   | 1   | 1 | 1
    3              | 1   | 1   | 0   | 1 | 0
    4              | 0   | 1   | 0   | 0 | 0
    48 / 56
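    The encoding step can be sketched in Python as follows. This is a minimal illustration, not the author's procedure: the names `BASKETS`, `COLUMNS`, and `to_binary_rows` are hypothetical, and lowercasing the words before matching is an assumption made here. Note that basket 2, as abbreviated on the example slide, lists neither "a" nor "mated", so the row computed from it has zeros in those two columns.

```python
# Baskets copied from the example slide (hypothetical variable names).
BASKETS = [
    ["Cat", "and", "dog", "bites"],
    ["Yahoo", "news", "claims", "cat", "dog", "and",
     "produced", "viable", "offspring"],
    ["Cat", "killer", "likely", "is", "a", "big", "dog"],
    ["Professional", "free", "advice", "on", "dog", "training", "puppy"],
]

# Items chosen as table columns, in the order used on the slide.
COLUMNS = ["cat", "dog", "and", "a", "mated"]


def to_binary_rows(baskets, columns):
    """Return one 0/1 row per basket: 1 if the column item is in the basket."""
    rows = []
    for basket in baskets:
        # Assumption: matching is case-insensitive, so "Cat" counts as "cat".
        words = {word.lower() for word in basket}
        rows.append([1 if column in words else 0 for column in columns])
    return rows


ROWS = to_binary_rows(BASKETS, COLUMNS)

if __name__ == "__main__":
    for transaction_id, row in enumerate(ROWS, start=1):
        print(transaction_id, row)
```

    Each basket becomes a fixed-length binary vector, which is the form that counting-based algorithms for frequent itemsets typically work on.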
  • 163.
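    Combinatorial Problem
    Problem
    How many subsets do we have?
    But we can do the following
    Given an itemset $x$ and a database $D$ consisting of a set of transactions $\{t_i\}_{i \in I}$, define the support of $x$ as
    $$\mathrm{supp}(x, D) = \left|\{\, t_i \in D \mid x \subseteq t_i \,\}\right| \qquad (2)$$
    Then, setting a threshold
    How many itemsets are frequent, that is, have $\mathrm{supp}(x, D)$ above the threshold?
    49 / 56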
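    To make the counting question concrete, here is a brute-force sketch in Python. It is only an illustration under stated assumptions: the names `TRANSACTIONS`, `support`, and `frequent_itemsets` are hypothetical, the transactions are read off the rows of the binary table in the Frequent Itemsets example, and the threshold values are arbitrary choices made here. With n items there are 2^n - 1 non-empty candidate itemsets, so exhaustive enumeration like this is only feasible for tiny examples.

```python
from itertools import combinations

# Transactions read off the binary table (items with a 1 in each row).
TRANSACTIONS = [
    {"cat", "dog", "and"},
    {"cat", "dog", "and", "a", "mated"},
    {"cat", "dog", "a"},
    {"dog"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for transaction in transactions if itemset <= transaction)


def frequent_itemsets(transactions, threshold):
    """Enumerate all non-empty itemsets whose support is strictly above threshold."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = support(candidate, transactions)
            if count > threshold:
                result[frozenset(candidate)] = count
    return result


if __name__ == "__main__":
    n_items = len(set().union(*TRANSACTIONS))
    print("candidate itemsets:", 2 ** n_items - 1)
    for itemset, count in frequent_itemsets(TRANSACTIONS, threshold=2).items():
        print(sorted(itemset), count)
```

    Raising the threshold shrinks the answer quickly: with these four transactions, 11 itemsets have support above 1, but only 3 have support above 2.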
  • 167.
    Hardware Solutions: ASICs
    Application-Specific Integrated Circuit (ASIC)
    An ASIC is an integrated circuit customized for a particular use, rather than intended for general-purpose use.
    It allows for
    1 Lower power consumption.
    2 Better cooling approaches.
    Example: From Microsoft Research
    51 / 56
  • 171.
    Hardware Solutions: GPUs
    IDEAS
    Based on the CUDA parallel computing architecture from Nvidia.
    Emphasis on executing many concurrent LIGHT threads instead of one HEAVY thread, as in CPUs.
    Hardware for 8800
    53 / 56
  • 172.
    Advantages
    Massively parallel.
    Hundreds of cores, millions of threads.
    High throughput.
    Limitations
    May not be applicable for all tasks.
    Generic hardware (CPUs) is closing the gap.
    54 / 56
  • 174.
    Projects
    Possible topics are:
    Oil exploration detection.
    Association Rule Preprocessing Project.
    Neural Network-Based Financial Market Forecasting Project.
    Page Ranking: Improving over the Google Matrix.
    Influence Maximization in Social Networks.
    Web Word Relevance Measures.
    Recommendation Systems.
    There are more possibilities at https://www.kaggle.com/competitions
    56 / 56