Introduction to Datamining
using Practical View
Created by: Ngô Tùng Sơn
Part 1
Schedule:
1. Example of Datamining
2. What and Where is Datamining in the System
3. Datamining Techniques
   • Data preprocessing
   • Data Analysis
   • Data Visualization
What does the data look like?
X Y
3 3
3 1
2 2
4 6
2 3
6 7
7 5
5 6
Can we get something from this?
Each row represents an object, and its columns represent the object's attributes.
Ex: can we identify the groups of these objects? YES
1. Example of Datamining
Now forget the table and consider each row as a point. Then we have:
[Figure: scatter plot of the points on X and Y axes (ticks 0 to 8), with points A, B and C labeled]
From each data point, we find its neighbors by scanning within a radius r.
For example: A has 2 neighbors, B and C, denoted A{B,C}.
A and D have the same neighbors, so they are also considered neighbors.
Similarly: B{A,B,C,D}, C{A,B,C,D}, D{B,C}.
Points that are neighbors are placed in the same group.
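The grouping idea can be sketched in a few lines of Python. This is a simplified variant, not necessarily the exact procedure behind the slides: two points are linked when they lie within r of each other, and a group is any chain of linked points. The radius r = 2.5 and the function name radius_groups are assumptions chosen for this illustration.

```python
from math import dist

# The eight (X, Y) rows from the table
points = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]

def radius_groups(points, r):
    """Give every point a group id; points within r of each other share a group."""
    labels = [None] * len(points)
    group = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue  # already reached from an earlier point
        labels[start] = group
        stack = [start]
        while stack:
            i = stack.pop()
            # Scan around point i with radius r and pull unlabeled neighbors in
            for j, q in enumerate(points):
                if labels[j] is None and dist(points[i], q) <= r:
                    labels[j] = group
                    stack.append(j)
        group += 1
    return labels

labels = radius_groups(points, r=2.5)  # r = 2.5 is an assumed radius
print(labels)
```

With this radius the eight points fall into 2 groups. A much smaller r splits them into more groups, and a much larger r merges everything into one.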
Finally, after considering all points, we have 2 groups.
[Figure: the same scatter plot (X and Y axes, ticks 0 to 8) with the points split into 2 groups]
What do we see here?
The data was not classified into groups beforehand, but now we have the groups.
This is just one example of a technique called CLUSTERING in DATAMINING.
2. What and Where is Datamining in the System
So, what exactly is Datamining?
Datamining is the set of tools and techniques used to retrieve hidden knowledge/rules from data.
The name "datamining" can be misleading: the data is already there, so we do not need to "mine" it. What we mine is the knowledge hidden in it.
For ore mining you need hammers and shovels.
For datamining, however, you need mathematics, statistics and probability, machine learning, computer programming, database techniques, ...
Where is Datamining in the system?
1. Employees/staff: day by day, the staff use software (web, desktop or mobile applications) to record all of their business activities (customers, products, order details, contracts, ...), which generates data.
2. Database: the data is added to a database through online transaction processing (OLTP).
3. Integration service: data from several data sources (OLTP databases) is collected into a common repository, the data warehouse.
4. Data mining: the datamining service accesses the data warehouse to process the data.
3. Datamining Techniques
What are the techniques in Datamining?
Many techniques can be applied in datamining.
Basically, we can classify them into 3 groups / phases:
• Data Preprocessing
• Data Analysis
• Data Presentation (Visualization)
Data-Preprocessing
We can expect that:
The quality of the collected data is often not good.
It is necessary to clean / format / transform it before analyzing.
This is a very important process, but it is very hard to describe in an abstract way.
Here we will see a few examples of data pre-processing techniques:
• Similarity Measure
• Down Sampling
• Dimension Reduction
• Vectorization
Data-Preprocessing: Similarity Measure
How can we know which objects are similar?
[Figure: points A(x1,y1), B(x2,y2) and C(x3,y3), with distance D1 between A and B and distance D2 between A and C]
Measure the distances AB (D1) and AC (D2).
We see that D1 < D2 -> A is more similar to B than to C.
Every point can also be represented as a vector. Measure the angle between each pair of vectors: A and B (𝜶), then A and C (𝜷).
We see that 𝜶 < 𝜷 -> A is more similar to B than to C.
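Both measures are easy to try out. The sketch below assumes Euclidean distance for D1 and D2 and takes the angle from the dot product. The coordinates of A, B and C and the helper name angle_between are made up for illustration and do not come from the slides.

```python
import math

def angle_between(p, q):
    """Angle in degrees between the vectors from the origin to p and to q."""
    dot = sum(a * b for a, b in zip(p, q))
    cos = dot / (math.hypot(*p) * math.hypot(*q))
    cos = max(-1.0, min(1.0, cos))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos))

# Hypothetical coordinates for A, B and C
A, B, C = (2, 3), (3, 3), (7, 5)

d1 = math.dist(A, B)        # distance AB
d2 = math.dist(A, C)        # distance AC
alpha = angle_between(A, B)
beta = angle_between(A, C)

print(d1 < d2)       # smaller distance -> A is more similar to B
print(alpha < beta)  # smaller angle    -> A is more similar to B
```

The two measures do not always agree: distance reacts to how far apart the points are, while the angle only compares the directions of the vectors.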
Data-Preprocessing: Down Sampling
What if you have so much data that analyzing all of it is unnecessary and hurts performance?
Just pick some of the objects to evaluate.
Example: divide the space into cells of size 𝑔 and keep only one object per cell.
[Figure: two panels, "Original Data" and "Down Sampling", with a 𝑔 x 𝑔 grid over the points]
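A minimal sketch of grid-based down sampling, assuming we simply keep the first object that lands in each cell (other choices, such as the object closest to the cell center, are just as valid). The function name grid_downsample and the cell size g = 4 are assumptions for this example.

```python
def grid_downsample(points, g):
    """Keep one point per g x g cell (the first one encountered)."""
    kept = {}
    for x, y in points:
        cell = (int(x // g), int(y // g))  # index of the cell containing (x, y)
        kept.setdefault(cell, (x, y))      # ignore later points in the same cell
    return list(kept.values())

# The eight (X, Y) rows from the first example
points = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]

sample = grid_downsample(points, g=4)  # g = 4 is an assumed cell size
print(sample)
```

A larger 𝑔 keeps fewer objects, and a smaller 𝑔 keeps more of the original detail.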
Data-Preprocessing: Dimension Reduction
All the example data presented so far have 2 dimensions, i.e. 2 attributes (X, Y). What if each object had ~10,000 attributes?
This could reduce the performance (and/or accuracy) of data analysis algorithms, so we need to reduce the number of dimensions.
Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) are 2 of the most effective methods for doing this.
Data-Preprocessing: Dimension Reduction - PCA
[Figure: two panels, "Original Data" on the X and Y axes and "Data projected to Principal Components" on the 𝑃1 and 𝑃2 axes]
We keep only the 𝑘 principal components that have the highest eigenvalues. In the example above, we can let 𝑘 = 1 and keep only 𝑃1 instead of both 𝑃1 and 𝑃2.
In this way the number of dimensions is reduced.
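For 2 attributes, PCA can be written out by hand, which makes the idea of "keep the component with the highest eigenvalue" concrete. The sketch below is limited to the 2D case and uses the closed-form eigen-decomposition of a 2 x 2 covariance matrix. For real data with many attributes you would normally rely on a numerical library instead. The function name pca_2d is an assumption for this illustration.

```python
import math

# The eight (X, Y) rows from the first example
points = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]

def pca_2d(points):
    """Return (P1 direction, eigenvalues, 1D projection onto P1) for 2D data."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2 x 2 matrix (largest first)
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    eigenvalues = (half_trace + spread, half_trace - spread)

    # Direction of the eigenvector that belongs to the largest eigenvalue
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    p1 = (math.cos(theta), math.sin(theta))

    # k = 1: each point is reduced to a single coordinate along P1
    projected = [x * p1[0] + y * p1[1] for x, y in centered]
    return p1, eigenvalues, projected

p1, eigenvalues, projected = pca_2d(points)
print(p1, eigenvalues)
print(projected)
```

The variance of the projected values equals the largest eigenvalue, which is why keeping 𝑃1 loses as little spread as possible.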
Data-Preprocessing: Vectorization
Most data analysis algorithms take a set of vectors as input, so we need to transform the collected data into a set of vectors.
Ex: given a document: “Mr A has not passed the exam this year. He will do it again next year”
Some important words are extracted, such as “Mr A”, “not”, “pass”, “exam”, “again”, “next”, “year”.
Measuring the frequency of each word gives the vector that represents the document:
Mr A   not   pass   exam   again   next   year
1      1     1      1      1       1      2
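The counting step can be sketched as follows. The keyword extraction itself (dropping unimportant words, reducing "passed" to "pass", treating "Mr A" as one term) is assumed to have been done already, so the list extracted below is a hand-written stand-in for its output.

```python
from collections import Counter

# Fixed order of the important words (the columns of the vector)
vocabulary = ["Mr A", "not", "pass", "exam", "again", "next", "year"]

# Assumed output of the keyword-extraction step for the example document
extracted = ["Mr A", "not", "pass", "exam", "year", "again", "next", "year"]

counts = Counter(extracted)
vector = [counts[word] for word in vocabulary]  # missing words count as 0
print(vector)
```

Every document is mapped onto the same vocabulary, so all documents end up as vectors of the same length and can be compared with a similarity measure.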
Data Analysis
There are many techniques in this phase:
• Clustering
• Classification
• Regression
• Rule Bases
• ….
This is the most important phase, where we find the hidden knowledge/rules in the data.
Data Analysis: Clustering
The process of clustering is to find ways to put objects into groups (clusters).
Objects in the same cluster are similar to each other; objects in different clusters are not.
There are 2 types of clustering: Partitional & Hierarchical.
In this presentation we see an example of the most famous clustering method: K-Means.
Data Analysis: Clustering - K-Means Algorithm
1. Randomly select K centers (centroids) for the K clusters.
2. Calculate the distance from each object to the K centroids.
3. Assign each object to the group of its nearest centroid.
4. Compute the new centroid of each group as the mean of its members.
5. Repeat from step 2 until the groups no longer change.
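The five steps can be sketched in plain Python as below. This is a minimal illustration, not a production implementation: the function name k_means, the optional centroids argument and the seed are assumptions. In the usage the starting centroids are fixed by hand so that the run is reproducible, whereas step 1 normally picks them at random.

```python
import random
from math import dist

def k_means(points, k, centroids=None, seed=0):
    # Step 1: choose K starting centroids (randomly unless they are given)
    if centroids is None:
        centroids = random.Random(seed).sample(points, k)
    centroids = list(centroids)
    labels = None
    while True:
        # Steps 2 and 3: assign every object to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Step 5: stop as soon as the groups no longer change
        if new_labels == labels:
            return labels, centroids
        labels = new_labels
        # Step 4: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty group keeps its old centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))

# The eight (X, Y) rows from the first example
points = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]

labels, centroids = k_means(points, k=2, centroids=[(3, 3), (4, 6)])
print(labels)
print(centroids)
```

Different starting centroids can lead to different final groups, which is one reason K-Means is often run several times.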
Consider the data below. Plotting them, we have:
[Figure: example data table and its scatter plot]
[Figure: K-Means on the example data, with panels:]
• Select K=2 centroids.
• Compute the new positions of the centroids.
• Finally, the centroids stop changing.
Each object belongs to the group of its closest centroid.
The key point of the algorithm is to select a good K.
Data Analysis: Classification
How can we identify the group of an unclassified object?
Sure, we can perform clustering to do this.
However, what if we already know some objects that were classified in the past? Can we do better than clustering? YES.
We can construct a prediction model that predicts the group of unclassified objects based on the classified objects.
This process is called CLASSIFICATION.
The process of classification can be described as follows:
Training data (classified objects) -> Learning Algorithm -> Model
The model is then used to predict the group of unclassified objects.
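To make the split between learning and predicting concrete, here is a sketch that uses a deliberately simple learning algorithm, a nearest-centroid classifier. This technique is chosen only for illustration and is not taken from the slides. The group names "G1" and "G2" and the function names learn and predict are also assumptions.

```python
from math import dist

# Classified objects from the past: (point, group)
training = [
    ((3, 3), "G1"), ((3, 1), "G1"), ((2, 2), "G1"), ((2, 3), "G1"),
    ((4, 6), "G2"), ((6, 7), "G2"), ((7, 5), "G2"), ((5, 6), "G2"),
]

def learn(training):
    """Learning algorithm: the model is one centroid per group."""
    by_group = {}
    for point, group in training:
        by_group.setdefault(group, []).append(point)
    return {
        group: tuple(sum(v) / len(members) for v in zip(*members))
        for group, members in by_group.items()
    }

def predict(model, point):
    """Predict the group whose centroid is closest to the point."""
    return min(model, key=lambda group: dist(point, model[group]))

model = learn(training)
print(model)
print(predict(model, (6, 6)))  # an unclassified object
print(predict(model, (1, 2)))
```

Unlike clustering, the groups here are known in advance. The algorithm only learns how to recognize them.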
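Data Analysis: Classification - SVM
Support Vector Machine (SVM) is one of the well-known classification methods. In its basic form it belongs to the group of linear classifiers.
For example: training data classified as red and blue.
[Figure: red and blue training points separated by a line, with an unclassified point marked "?"]
Classification model:
𝑤 : normal vector of the separating line
𝑏 : bias, which sets the offset of the line from the origin
$w \cdot (x, y) + b > 0 \rightarrow$ blue
$w \cdot (x, y) + b < 0 \rightarrow$ red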
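Once 𝑤 and 𝑏 are known, using the model is a one-line computation. The sketch below only shows this decision rule and does not train an SVM. The values of w and b are hypothetical (they describe the line x + y = 8) and were not learned from data.

```python
def classify(point, w, b):
    """Apply the linear decision rule: the sign of w . point + b picks the color."""
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    # A score of exactly 0 means the point lies on the line; by an arbitrary
    # convention (an assumption of this sketch) such points are labeled red.
    return "blue" if score > 0 else "red"

# Hypothetical model parameters: the separating line is x + y = 8
w, b = (1.0, 1.0), -8.0

print(classify((6, 7), w, b))  # above the line
print(classify((2, 2), w, b))  # below the line
```

Training an SVM means finding the 𝑤 and 𝑏 that separate the two colors with the widest possible margin.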
Data Analysis: Regression
Regression is also used for prediction, but to predict the missing value of an attribute.
For example:
[Figure: data points on X and Y axes with a fitted line; a known 𝑥𝑖 on the X axis maps to 𝑦𝑖 on the Y axis]
• How do we find 𝑦𝑖 if 𝑥𝑖 is known?
• We can estimate the line that describes the data.
• Plug 𝑥𝑖 into the line equation to find 𝑦𝑖.
• This is just an example of Linear Regression.
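A minimal sketch of these steps, assuming the line is estimated with ordinary least squares. The data values and the function name fit_line are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimate of slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations where both X and Y are known
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)

x_i = 5                        # object whose Y value is missing
y_i = slope * x_i + intercept  # plug x_i into the line equation
print(slope, intercept, y_i)
```

Real data rarely lies exactly on a line, so the estimated 𝑦𝑖 is a best guess rather than an exact value.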
Data Analysis: Rule Base
Rule base techniques find hidden patterns in the data.
Examples of rules found by rule base techniques:
• Frequent pattern: customers who normally buy rice always buy vegetables.
• Gradual pattern: young people want more expensive phones than others.
• Sequential pattern: people always buy a laptop before buying a cell phone.
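For the frequent pattern case, a rule such as "rice -> vegetable" is usually judged by its support (how often the items appear together) and its confidence (how often the rule holds when rice is bought). The sketch below computes both on a handful of made-up transactions. The data and the function names are assumptions for illustration.

```python
# Hypothetical shopping baskets
transactions = [
    {"rice", "vegetable", "fish"},
    {"rice", "vegetable"},
    {"rice", "vegetable", "milk"},
    {"milk", "bread"},
    {"vegetable", "bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

rule_support = support({"rice", "vegetable"}, transactions)
rule_confidence = confidence({"rice"}, {"vegetable"}, transactions)
print(rule_support, rule_confidence)
```

A rule is normally kept only when both values are above thresholds chosen by the analyst.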
Data Visualization
Techniques to present the knowledge you retrieved to the user.
[Figure: example chart of Series 1, Series 2 and Series 3 over four categories (Y axis 0 to 14)]
             Series 1   Series 2   Series 3
Category 1   4.3        2.4        2
Category 2   2.5        4.4        2
Category 3   3.5        1.8        3
Category 4   4.5        2.8        5
Thank you for your attention

Introduction to Datamining Concept and Techniques

  • 1. Introduction to Datamining: A Practical View. Created by: Ngô Tùng Sơn. Part 1
  • 2. Schedule: 1. Example of Datamining. 2. What and Where is Datamining in the System. 3. Datamining Techniques: Data Preprocessing, Data Analysis, Data Visualization.
  • 3. 1. Example of Datamining. What does the data look like? Eight objects with attributes (X, Y): (3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6). Each row represents an object and its columns represent the object's attributes. Can we get something from this? For example, can we identify the groups of these objects? Yes.
  • 4. 1. Example of Datamining. Now forget the table and consider each row as a point. [Figure: scatter plot of the points on X and Y axes from 0 to 8, with points A, B, C, D labelled and a scanning circle of radius r.] From each data point, we find its neighbors by scanning within a radius r. For example, A has 2 neighbors, B and C, denoted A{B,C}. A and D have the same neighbors, so they are also considered neighbors. Likewise: B{A,B,C,D}, C{A,B,C,D}, D{B,C}. Points that are neighbors are placed in the same group.
  • 5. 1. Example of Datamining. After considering all points, we end up with 2 groups. [Figure: the same scatter plot (X and Y axes, 0 to 8) with the two groups marked.] What do we see here? The data came with no group labels, yet we now have the groups. This is just one example of a technique called CLUSTERING in DATAMINING.
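
The radius-based grouping in slides 4 and 5 can be sketched as follows. This is a minimal sketch, not the author's implementation: it assumes that two points closer than r are neighbors and that groups are formed by chaining neighbors together. The radius r = 2.5 is an assumed value (the slides do not state one), and `radius_groups` is a hypothetical helper name.

```python
from math import dist

# The eight (X, Y) objects from slide 3.
points = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]

def radius_groups(points, r):
    """Group points by chaining together any points within distance r."""
    unvisited = set(range(len(points)))
    groups = []
    while unvisited:
        start = unvisited.pop()
        group, frontier = [start], [start]
        while frontier:
            i = frontier.pop()
            # Scan around point i with radius r to find unvisited neighbors.
            neighbors = [j for j in unvisited
                         if dist(points[i], points[j]) <= r]
            for j in neighbors:
                unvisited.remove(j)
            group.extend(neighbors)
            frontier.extend(neighbors)
        groups.append(sorted(points[i] for i in group))
    return sorted(groups)

# r = 2.5 is an assumption chosen for this data set.
groups = radius_groups(points, r=2.5)
for g in groups:
    print(g)
```

With this assumed radius the eight points fall into two groups, matching the slide. A smaller radius would split off isolated points, so the result depends on the choice of r.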
  • 6. 2. What and Where is Datamining in the System. So what exactly is Datamining? Datamining is the set of tools and techniques used to retrieve hidden knowledge/rules from data. The name can be misleading: the data is already there, so we do not need to "mine" the data itself. For ore mining you need hammers and shovels. For datamining, however, you need mathematics, statistics and probability, machine learning, computer programming, database techniques, ...
  • 7. 2. What and Where is Datamining in the System. Where is Datamining in the system? [Diagram: Employee/Staff → Databases (OLTP) → Integration Service → Data Warehouse → Data Mining.] Day by day, staff use software (web, desktop, or mobile applications) to record all of their business activities (customers, products, order details, contracts, …), which generates data. This data is added to a database; this is online transaction processing (OLTP). Data from several OLTP data sources is collected into a common repository, the data warehouse, by an integration service. The datamining service then accesses the data warehouse to process the data.
  • 8. 3. Datamining Techniques. What are the techniques in Datamining? Many techniques can be applied in datamining. Broadly, we can classify them into 3 groups/phases: Data Preprocessing, Data Analysis, and Data Presentation.
  • 10. 3. Datamining Techniques. Data Preprocessing. Collected data is often of poor quality, so it has to be cleaned, formatted, transformed, ... before it is analyzed. This is a very important process, and it is hard to describe in a general, abstract way. Here we will see a few examples of data preprocessing techniques: Similarity Measure, Down Sampling, Dimension Reduction, and Vectorization.
  • 11. 3. Datamining Techniques. Data Preprocessing: Similarity Measure. How can we know which objects are similar? Take three points A(x1, y1), B(x2, y2), and C(x3, y3). By distance: measure the distance D1 between A and B and the distance D2 between A and C. If D1 < D2, then A is more similar to B than to C. By angle: every point can be represented as a vector. Measure the angle α between vectors A and B and the angle β between vectors A and C. If α < β, then A is more similar to B than to C.
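
The two similarity measures above can be sketched as follows. This is only an illustration: the coordinates of A, B, and C are hypothetical, and `angle_between` is a helper name introduced here. Distance is taken to be Euclidean distance, which the slide does not state explicitly.

```python
from math import acos, degrees, dist, hypot

def angle_between(u, v):
    """Angle in degrees between two 2-D vectors, via the dot product."""
    dot = u[0] * v[0] + u[1] * v[1]
    cos_angle = dot / (hypot(*u) * hypot(*v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return degrees(acos(max(-1.0, min(1.0, cos_angle))))

# Hypothetical points, chosen only for illustration.
A, B, C = (4, 3), (5, 4), (1, 6)

d1 = dist(A, B)               # distance between A and B
d2 = dist(A, C)               # distance between A and C
alpha = angle_between(A, B)   # angle between vectors A and B
beta = angle_between(A, C)    # angle between vectors A and C

print("closer by distance:", "B" if d1 < d2 else "C")
print("closer by angle:   ", "B" if alpha < beta else "C")
```

For these points both measures agree that A is more similar to B. In general the two measures can disagree, because the angle ignores vector length while the distance does not.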
  • 12. 3. Datamining Techniques. Data Preprocessing: Down Sampling. What if you have a very large amount of data? Analyzing all of it may be unnecessary and may hurt performance. Instead, pick just some of the objects to evaluate. Example: overlay a grid with cell size g and keep only one object per cell. [Figure: the original data and the down-sampled data on a grid of g × g cells.]
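
Grid-based down sampling can be sketched as follows, assuming that "keep only one object per cell" means keeping the first object seen in each g × g cell. The slide does not say which object to keep, so that choice, the cell size g = 2, and the name `grid_downsample` are assumptions.

```python
from math import floor

def grid_downsample(points, g):
    """Keep the first point that falls into each g-by-g grid cell."""
    kept = {}
    for x, y in points:
        cell = (floor(x / g), floor(y / g))  # index of the cell containing the point
        kept.setdefault(cell, (x, y))        # later points in the same cell are dropped
    return list(kept.values())

# The eight (X, Y) objects from slide 3; g = 2 is an assumed cell size.
points = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]
sample = grid_downsample(points, g=2)
print(sample)
```

A larger g keeps fewer objects. Other policies, such as keeping the point nearest the cell center or the average of the cell, would fit the same scheme.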
  • 13. 3. Datamining Techniques. Data Preprocessing: Dimension Reduction. All the example data presented so far has 2 dimensions, that is, 2 attributes (X, Y). What if each object had ~10,000 attributes? That could reduce the performance (and/or accuracy) of data analysis algorithms, so we need some way to reduce the number of dimensions. Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) are 2 of the most effective methods for this.
  • 14. 3. Datamining Techniques. Data Preprocessing: Dimension Reduction (PCA). [Figure: the original data on X and Y axes, and the same data projected onto the principal components P1 and P2.] We keep only the k principal components that have the highest eigenvalues. In the example above, we can set k = 1 and keep P1 instead of both P1 and P2. In this way the number of dimensions is reduced.
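
For 2-D data, keeping k = 1 principal component can be sketched with the closed-form eigen-decomposition of the 2 × 2 covariance matrix. This is an illustrative shortcut that only works in two dimensions; a real data set with many attributes would use a linear algebra library. The function name `pca_2d_first_component` is hypothetical.

```python
from math import atan2, cos, sin, sqrt

def pca_2d_first_component(points):
    """Return (eigenvalues, P1 direction, data projected onto P1) for 2-D data."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
    mid = (sxx + syy) / 2
    spread = sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = (mid + spread, mid - spread)

    # Direction of the eigenvector with the largest eigenvalue (P1).
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    p1 = (cos(theta), sin(theta))

    # k = 1: each 2-D point becomes a single coordinate along P1.
    projected = [x * p1[0] + y * p1[1] for x, y in centered]
    return eigenvalues, p1, projected

points = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]
eigenvalues, p1, projected = pca_2d_first_component(points)
print("eigenvalues:", eigenvalues)
print("P1 direction:", p1)
print("1-D data:", projected)
```

The variance of the projected values equals the largest eigenvalue, which is why the components with the highest eigenvalues are the ones worth keeping.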
  • 15. 3. Datamining Techniques. Data Preprocessing: Vectorization. Most data analysis algorithms take a set of vectors as input, so we need to transform the collected data into a set of vectors. Example: given the document "Mr A has not passed the exam this year. He will do it again next year", some important words are extracted, such as "Mr A", "not", "pass", "exam", "again", "next", "year". Measuring the frequency of each word gives the vector that represents the document: Mr A = 1, not = 1, pass = 1, exam = 1, again = 1, next = 1, year = 2, that is, (1, 1, 1, 1, 1, 1, 2).
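
The word-frequency vector above can be reproduced with a short sketch. It assumes the list of important words is given in advance (the slide does not say how they are chosen), and it uses crude prefix matching so that "passed" counts toward "pass". The name `vectorize` is hypothetical.

```python
import re

def vectorize(document, vocabulary):
    """Count how often each vocabulary term occurs in the document."""
    text = document.lower()
    vector = []
    for term in vocabulary:
        # Prefix match at a word boundary: "pass" also matches "passed".
        # This is a crude stand-in for real stemming.
        pattern = r"\b" + re.escape(term.lower()) + r"\w*"
        vector.append(len(re.findall(pattern, text)))
    return vector

document = ("Mr A has not passed the exam this year. "
            "He will do it again next year")
# The important words listed on the slide, assumed to be known beforehand.
vocabulary = ["Mr A", "not", "pass", "exam", "again", "next", "year"]

vector = vectorize(document, vocabulary)
print(dict(zip(vocabulary, vector)))
```

Prefix matching over-counts in general (for example, "not" would also match "nothing"), so a real pipeline would use a proper tokenizer and stemmer.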
  • 17. 3. Datamining Techniques. Data Analysis. There are many techniques in this phase: Clustering, Classification, Regression, Rule Base, …. This is the most important phase, where we find the hidden knowledge/rules in the data.
  • 18. 3. Datamining Techniques. Data Analysis: Clustering. Clustering is the process of grouping objects into groups (clusters). Objects in the same cluster are similar to each other; objects in different clusters are not. There are 2 types of clustering: Partitional and Hierarchical. In this presentation, we look at an example of the most famous clustering method: K-Means.
  • 19. 3. Datamining Techniques. Data Analysis: Clustering (K-Means Algorithm). 1. Randomly select K centers (centroids), one for each of the K clusters. 2. Calculate the distance from each object to each of the K centroids. 3. Assign each object to the group of its nearest centroid. 4. Compute the new centroid of each group. 5. Repeat from step 2 until the group assignments no longer change.
  • 20. 3. Datamining Techniques. Data Analysis: Clustering (K-Means Algorithm). Consider the data below; plotting it, we have: [Figure: data table and scatter plot.]
  • 21. 3. Datamining Techniques. Data Analysis: Clustering (K-Means Algorithm). Select K = 2 centroids. Compute the new positions of the centroids. Eventually the centroids stop changing. Each object belongs to the group of its closest centroid. The key point of the algorithm is to select a good K.
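
The five K-Means steps can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the initial centroids are drawn from the data points using a fixed random seed for repeatability, distance is Euclidean, and the data is the eight-point table from slide 3 rather than the data set shown on slide 20, which is not preserved in this transcript. The `max_iter` guard is an added safety limit.

```python
import random
from math import dist

def kmeans(points, k, seed=0, max_iter=100):
    """Cluster 2-D points with the K-Means steps listed on slide 19."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # step 1: pick K starting centroids
    labels = None
    for _ in range(max_iter):
        # Steps 2-3: assign each object to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:                 # step 5: stop when nothing changes
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its group.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                          # keep the old centroid if a group is empty
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels

points = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]
centroids, labels = kmeans(points, k=2)
for p, lab in zip(points, labels):
    print(p, "-> group", lab)
```

The result can depend on the starting centroids, which is one reason practical implementations run the algorithm several times with different seeds.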
  • 22. 3. Datamining Techniques. Data Analysis: Classification. How can we identify the group of an unclassified object? We could perform clustering to do this. However, what if we already know the groups of some objects that were classified in the past? Can we do better than clustering? Yes. We can construct a prediction model that predicts the group of unclassified objects based on the classified ones. This process is called CLASSIFICATION.
  • 23. 3. Datamining Techniques. Data Analysis: Classification. The process of classification can be described as follows: [Diagram: Learning Algorithm → Model.]
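  • 24. 3. Datamining Techniques. Data Analysis: Classification (SVM). Support Vector Machine (SVM) is one of the best-known classification methods. In its basic form it belongs to the group of linear classifiers. For example, take training data classified as red and blue. [Figure: training data with a separating line.] The classification model is a line defined by w, the normal vector, and b, the bias, which sets the line's offset from the origin (the distance from the origin to the line is |b| / ‖w‖). For a new point (x, y): if (x, y) · w + b > 0, the point is blue; if (x, y) · w + b < 0, the point is red.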
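
The decision rule of a linear classifier can be sketched as below. This shows only how an already trained model classifies a point; it does not train an SVM. The values of w and b are hypothetical, chosen so that the line x + y = 8 separates the example points.

```python
from math import hypot

def classify(point, w, b):
    """Apply the linear decision rule: sign of (x, y) . w + b."""
    score = point[0] * w[0] + point[1] * w[1] + b
    if score > 0:
        return "blue"
    if score < 0:
        return "red"
    return "on the line"

# Hypothetical model parameters (not learned from data): the line x + y = 8.
w, b = (1.0, 1.0), -8.0

print(classify((6, 7), w, b))   # a point on the positive side of the line
print(classify((3, 1), w, b))   # a point on the negative side of the line
print("distance from origin to line:", abs(b) / hypot(*w))
```

Training an SVM means choosing w and b so that the line separates the two classes with the widest possible margin; that optimization step is outside this sketch.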
  • 25. 3. Datamining Techniques. Data Analysis: Regression. Regression is also used for prediction, but it predicts the missing value of an attribute rather than a group. For example: [Figure: scatter plot on X and Y axes with a fitted line, marking a known x_i and the unknown y_i.] How do we find y_i if x_i is known? We estimate the line that describes the data, then plug x_i into the line equation to find y_i. This is just an example of Linear Regression.
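
Estimating the line and plugging in x_i can be sketched with ordinary least squares. Least squares is an assumption here, since the slide does not say how the line is estimated, and the sample data is hypothetical (it lies exactly on y = 2x + 1 to keep the arithmetic easy to check).

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations lying on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
x_i = 5                              # known attribute value
y_i = slope * x_i + intercept        # predicted missing value
print(f"y = {slope}x + {intercept}; predicted y_i for x_i = {x_i}: {y_i}")
```

With noisy data the points would not sit exactly on the line, and the fitted line would be the one that minimizes the sum of squared vertical errors.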
  • 26. 3. Datamining Techniques. Data Analysis: Rule Base. Rule base techniques find hidden patterns in the data. Examples: Frequent Pattern: customers who normally buy rice always buy vegetables. Gradual Pattern: young people want more expensive phones than others. Sequential Pattern: people always buy a laptop before buying a cell phone.
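
A frequent pattern such as "rice, then vegetables" is commonly checked with two measures that the slide does not name: support (how often the items appear together) and confidence (how often the rule holds when its left side occurs). The sketch below computes both over a hypothetical list of shopping baskets.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    return (support(transactions, antecedent | consequent)
            / support(transactions, antecedent))

# Hypothetical shopping baskets, invented for illustration.
transactions = [
    {"rice", "vegetable"},
    {"rice", "vegetable", "fish"},
    {"rice"},
    {"bread", "vegetable"},
    {"rice", "vegetable", "bread"},
]

rule_support = support(transactions, {"rice", "vegetable"})
rule_confidence = confidence(transactions, {"rice"}, {"vegetable"})
print("support:", rule_support)
print("confidence of rice -> vegetable:", rule_confidence)
```

A rule is usually reported only when both values exceed thresholds chosen by the analyst. A rule that "always" holds, as worded on the slide, would have a confidence of 1.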
  • 28. 3. Datamining Techniques. Data Visualization. Techniques for presenting the knowledge you have retrieved to the user. [Figure: sample bar chart of Series 1, Series 2, and Series 3 across Category 1 to Category 4.]
  • 29. Thank you for your attention