Md. Sanzidul Islam
M.Sc. In CSE, DIU
ID: 211-25-953
There is a huge amount of data available in the
Information Industry. This data is of no use until it is
converted into useful information. It is necessary to
analyze this huge amount of data and extract useful
information from it.
Extracting information is not the only step. Data mining
also involves other processes such as Data Cleaning, Data
Integration, Data Transformation, Data Mining, Pattern
Evaluation and Data Presentation.
Once all these processes are complete, the information can
be used in many applications such as fraud detection,
market analysis, science exploration, etc.
• What is Data Mining?
• Why Data Mining?
• What is KDD Process?
• On What Kind of Data?
• Data Mining Techniques
• Data Mining Query Language
• Applications of Data Mining
Extraction of interesting
patterns or knowledge
from huge amounts of data
(Knowledge Discovery
from Data)
One of the steps in the KDD
process
The Explosive Growth of Data: from
terabytes to petabytes
We are drowning in data, but starving for
knowledge!
Fraud detection and detection of unusual
patterns
• Data cleaning: remove noise and inconsistent data
• Data integration: combine multiple data sources
• Data selection: retrieve the data relevant to the analysis
• Data transformation: convert the data into a unified format
• Data mining: extract patterns
• Pattern evaluation: identify the truly interesting patterns
  representing knowledge
• Knowledge presentation: present the mined knowledge to the user
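The KDD steps can be sketched as a small chain of functions. This is only an illustrative toy, not a prescribed implementation: the purchase records, the two source names (`store_a`, `store_b`) and every helper function are assumptions made up for the example.

```python
# Toy sketch of the KDD steps. All data and helper names are hypothetical.
from collections import Counter

# Two made-up sources of purchase records: (customer, item, amount as text).
store_a = [("alice", "Computer", "900"), ("bob", "software", "50"), ("bob", None, "20")]
store_b = [("carol", "computer", "1100"), ("alice", "Software", "60")]

def clean(records):
    """Data cleaning: drop records that have a missing field."""
    return [r for r in records if all(field is not None for field in r)]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records):
    """Data selection: keep only the fields relevant to the task."""
    return [(customer, item) for customer, item, _amount in records]

def transform(records):
    """Data transformation: bring item names into a unified (lowercase) format."""
    return [(customer, item.lower()) for customer, item in records]

def mine(records):
    """Data mining: extract a simple pattern (how often each item is bought)."""
    return Counter(item for _customer, item in records)

def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep only patterns that occur often enough."""
    return {item: n for item, n in patterns.items() if n >= min_count}

def present(patterns):
    """Knowledge presentation: render the result as readable lines."""
    return [f"{item}: bought {n} times" for item, n in sorted(patterns.items())]

data = transform(select(integrate(clean(store_a), clean(store_b))))
report = present(evaluate(mine(data)))
print("\n".join(report))
```

Each function stands for one step, so the order of the calls mirrors the order of the process.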
• Relational Databases: a collection of tables
• Data Warehouses: data from different sources
• Transactional Databases: a file where each record
  represents a transaction
• Advanced Data & Applications: multimedia, spatial data
  and the WWW
• Classification
• Clustering
• Regression
• Association Rules
Classification is the process of predicting the class
of a new item.
The goal is to classify the new item, that is, to identify
which class it belongs to.
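As a minimal sketch of classifying a new item, the example below uses a nearest-neighbour rule: the new item gets the class of the closest labelled item. The technique choice, the training points and the class names "A" and "B" are assumptions for illustration; the slides do not name a specific classifier.

```python
import math

# Labelled training items: (features, class). Values are made up for illustration.
training = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 5.5), "B"),
]

def classify(item):
    """Return the class of the training item closest to `item` (1-nearest neighbour)."""
    nearest = min(training, key=lambda pair: math.dist(pair[0], item))
    return nearest[1]

print(classify((1.2, 1.4)))  # close to the "A" points
print(classify((5.5, 5.0)))  # close to the "B" points
```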
Group data into clusters
Similar data is grouped in the same cluster
Dissimilar data is grouped in different clusters
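One way to illustrate clustering is a small k-means loop on one-dimensional values. This is a sketch under assumptions: k-means is only one of many clustering methods, and the values and starting centers below are invented for the example.

```python
def k_means_1d(values, centers, iterations=10):
    """Very small k-means for 1-D data; returns (clusters, centers)."""
    for _ in range(iterations):
        # Assignment step: put each value into the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centers

values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]  # made-up measurements
clusters, centers = k_means_1d(values, centers=[0.0, 5.0])
print(clusters)
print(centers)
```

The small values end up together in one cluster and the large values in the other, which is the "similar together, dissimilar apart" idea in practice.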
• “Regression deals with the
  prediction of a value, rather
  than a class.”
• Regression is a data mining
  function that predicts a number
• For example, a regression
  model could be used to predict
  children's height, given their
  age, weight, and other factors.
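The height example can be sketched with a simple least-squares line. To keep it short, this assumes a single input (age) instead of age, weight and other factors, and the ages and heights are made-up numbers chosen to lie exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

ages = [2, 4, 6, 8]              # years (illustrative values)
heights = [86, 100, 114, 128]    # cm (illustrative values, height = 72 + 7 * age)

slope, intercept = fit_line(ages, heights)
predicted = slope * 10 + intercept  # predicted height in cm at age 10
print(slope, intercept, predicted)
```

The output is a number (a height), not a class label, which is what separates regression from classification.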
“An association algorithm creates
rules that describe how often events
have occurred together.”
Example: When a customer buys a
computer, 90% of the time
they will also buy software.
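The "how often events occur together" idea is usually measured with support and confidence. The sketch below computes both for the rule computer → software on a handful of invented transactions; the data is an assumption and does not reproduce the 90% figure from the example.

```python
# Made-up transactions; each one is the set of items bought together.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"software"},
    {"computer", "software"},
]

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    return count(antecedent | consequent) / count(antecedent)

rule_support = support({"computer", "software"})
rule_confidence = confidence({"computer"}, {"software"})
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Here 3 of the 4 computer purchases also include software, so the rule's confidence is 3/4.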
A Data Mining Query Language (DMQL) can support interactive
data mining.
It adopts an SQL-like syntax
and hence can be easily integrated with relational query
languages.
Market Basket Analysis
• Market basket analysis is a modeling technique based upon the theory that
  if you buy a certain group of items, you are more likely to buy another
  group of items.
• This information may help the retailer understand the buyer’s needs and
  improve the store’s layout.
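A first step in market basket analysis is often just counting which items land in the same basket. The sketch below counts item pairs; the baskets are invented, and a real analysis would add thresholds and rule measures on top of these counts.

```python
from collections import Counter
from itertools import combinations

# Made-up shopping baskets.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "butter", "cereal"],
]

# Count every unordered pair of distinct items that appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

A retailer could use the most frequent pairs as candidates for placing items near each other.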
Bioinformatics
• Mining biological data helps to extract useful knowledge from massive
  datasets gathered in biology and in other related life sciences areas.
• Applications of data mining to bioinformatics include
  gene finding, protein function inference, disease diagnosis, and disease
  treatment.
Education
An institution can use data mining to make better-informed
decisions and to predict students' results.
Students' learning patterns can be captured and used to
develop techniques for teaching them.
Customer Relationship Management (CRM)
To maintain a good relationship with its customers, a business
needs to collect data and analyze it.
Data mining technologies make it possible to analyze the
collected data.
