3. Data Mining
An interdisciplinary field at the intersection of machine learning, database systems, pattern recognition, information science, and statistics
“Extracting Knowledge from the Data”
4. CRISP-DM
CRoss Industry Standard Process for Data Mining
Six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment
http://www.crisp-dm.org/ (founded in 1996)
9. General-purpose computation on GPUs in applications “other than 3D graphics”
Flexible and programmable
It fully supports vectorized floating-point operations at IEEE single precision
Additional levels of programmability are emerging with every GPU generation (about every 18 months)
An attractive platform for general-purpose computation
11. Thread block
“A batch of threads that can cooperate together by efficiently sharing data through some fast shared memory and synchronizing their execution to coordinate memory accesses.”
Example of block ID: a block (x, y) in a grid of dimensions (X, Y) has block ID x + y·X
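The block ID formula can be checked with a short sketch. Python is used here only for illustration, and the function names are hypothetical. The sketch numbers the blocks of an X × Y grid row by row; in CUDA C the same value would typically be computed as blockIdx.x + blockIdx.y * gridDim.x.

```python
from typing import List


def block_id(x: int, y: int, grid_dim_x: int) -> int:
    """Linear ID of block (x, y) in a grid that is grid_dim_x blocks wide: x + y * X."""
    return x + y * grid_dim_x


def all_block_ids(grid_dim_x: int, grid_dim_y: int) -> List[int]:
    # Visit blocks row by row (y outer, x inner), the order in which the IDs count up.
    return [block_id(x, y, grid_dim_x)
            for y in range(grid_dim_y)
            for x in range(grid_dim_x)]


if __name__ == "__main__":
    print(block_id(1, 2, 4))    # block (1, 2) in a 4 x 3 grid: 1 + 2 * 4 = 9
    print(all_block_ids(4, 3))  # [0, 1, ..., 11], each ID used exactly once
```

Because x only ranges over 0..X-1, consecutive rows are offset by exactly X, so every block gets a distinct ID from 0 to X·Y - 1.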
14. Data Mining on Cloud (Nov 22nd ‘10)
SVM for Estimation of Aqueous Solubility
GPU Miner: http://code.google.com/p/gpuminer/
16. An itemset is frequent if its support is not less than a user-specified threshold
Thresholds:
Minimum Confidence (in %): how strongly the items of a rule are bound together, i.e. the percentage of transactions containing the rule's antecedent that also contain its consequent
Minimum Support Count (a number): how many transactions in the database contain the itemset
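A minimal sketch of the two thresholds, assuming each transaction is represented as a set of item names. The toy data and function names are hypothetical; confidence is computed for a rule A → B as support(A ∪ B) / support(A), expressed in %.

```python
from typing import FrozenSet, List

# Hypothetical toy database: each transaction is a set of item names.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support_count(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_frequent(itemset: FrozenSet[str], transactions: List[FrozenSet[str]],
                min_support_count: int) -> bool:
    # Frequent: support is not less than the user-specified threshold.
    return support_count(itemset, transactions) >= min_support_count


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Confidence of the rule antecedent -> consequent, in percent."""
    base = support_count(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never applies; treated as 0 % by convention here
    return 100.0 * support_count(antecedent | consequent, transactions) / base


if __name__ == "__main__":
    bread_milk = frozenset({"bread", "milk"})
    print(support_count(bread_milk, TRANSACTIONS))
    print(is_frequent(bread_milk, TRANSACTIONS, 2))
    print(confidence(frozenset({"bread"}), frozenset({"milk"}), TRANSACTIONS))
```

On this toy data {bread, milk} has a support count of 2, and the rule bread → milk has a confidence of about 66.7 %, so it would pass a 60 % minimum confidence but not a 70 % one.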
17. “If an itemset is not frequent, none of its supersets can be frequent”
Apriori, proposed by Agrawal & Srikant at VLDB ’94
An influential algorithm for mining frequent itemsets for association rules.
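The pruning rule above can be seen in a small level-wise Apriori sketch. It is an illustration under simplifying assumptions, not the authors' implementation: candidates are generated by uniting pairs of frequent (k-1)-itemsets instead of the sorted prefix join used in the original paper, support is counted by a plain scan, and the data and names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Hypothetical toy database.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def count_frequent(candidates: Set[FrozenSet[str]], transactions: List[FrozenSet[str]],
                   min_support_count: int) -> Dict[FrozenSet[str], int]:
    # One scan over the database; keep only candidates meeting the threshold.
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return {c: n for c, n in counts.items() if n >= min_support_count}


def apriori(transactions: List[FrozenSet[str]],
            min_support_count: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset with its support count."""
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = count_frequent(items, transactions, min_support_count)
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join: unite two frequent (k-1)-itemsets when the union has exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune (Apriori property): drop candidates with any infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        frequent = count_frequent(candidates, transactions, min_support_count)
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    found = apriori(TRANSACTIONS, 2)
    for itemset, n in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), n)
```

With a minimum support count of 2, {milk, butter} occurs only once, so the 3-itemset {bread, milk, butter} is pruned without ever being counted against the database.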
25. o We have presented a GPU-based implementation of the Apriori algorithm for frequent itemset mining.
o This implementation employs a bitmap data structure to encode the transaction database on the GPU and utilizes the GPU's SIMD parallelism for support counting.
o Our implementation stores the itemsets in a bitmap and runs entirely on the GPU.
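To make the bitmap idea concrete, here is a sequential sketch of bitmap-based support counting, assuming a vertical layout in which each item owns one bitmap row with one bit per transaction. The layout, names, and data are assumptions for illustration only. On a GPU the AND and population count over bitmap words would be spread across threads; this sketch runs entirely on the CPU.

```python
from functools import reduce
from typing import Dict, Iterable, List, Set

# Hypothetical toy database; transaction IDs are list positions 0..3.
TRANSACTIONS: List[Set[str]] = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def build_item_bitmaps(transactions: List[Set[str]]) -> Dict[str, int]:
    """Bit t of bitmaps[item] is 1 iff transaction t contains item."""
    bitmaps: Dict[str, int] = {}
    for tid, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def bitmap_support(itemset: Iterable[str], bitmaps: Dict[str, int],
                   n_transactions: int) -> int:
    # AND the rows of all items; the surviving 1-bits mark transactions
    # that contain the whole itemset.
    all_ones = (1 << n_transactions) - 1
    rows = (bitmaps.get(item, 0) for item in itemset)
    combined = reduce(lambda a, b: a & b, rows, all_ones)
    # Population count of the combined row is the support count.
    return bin(combined).count("1")


if __name__ == "__main__":
    bm = build_item_bitmaps(TRANSACTIONS)
    print(bitmap_support({"bread", "milk"}, bm, len(TRANSACTIONS)))
    print(bitmap_support({"milk", "butter"}, bm, len(TRANSACTIONS)))
```

The appeal of this layout for SIMD hardware is that every candidate's support reduces to the same two word-wise operations (AND, then popcount), with no data-dependent branching per transaction.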