INTRODUCTION TO
DATA MINING
Samrat Devidas Tayade
TE IT
ARMIET
Prerequisites
• Knowledge of databases
• Data warehousing
• OLAP
Knowledge of databases
• Database: A database is an organized collection of data, generally
stored and accessed electronically from a computer system.
More complex databases are often developed
using formal design and modeling techniques.
• Database Management System (DBMS): software that lets users
  – add, remove, and update records
  – retrieve data that match certain criteria
  – cross-reference data in different tables
  – perform complex aggregate calculations
• A database table consists of columns (attributes) and rows (records).
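The DBMS operations listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names, and values are hypothetical and do not come from the slides.

```python
import sqlite3

# In-memory database; all table/column names and values are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)

# Add records.
cur.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Asha"), (2, "Ravi"), (3, "Meera")],
)
cur.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 250.0), (1, 100.0), (2, 75.0)],
)

# Update and remove records.
cur.execute("UPDATE orders SET amount = 80.0 WHERE customer_id = 2")
cur.execute("DELETE FROM customers WHERE id = 3")

# Retrieve data that match a criterion.
large_orders = cur.execute(
    "SELECT amount FROM orders WHERE amount > ? ORDER BY amount", (90.0,)
).fetchall()

# Cross-reference two tables (JOIN) and run an aggregate calculation (SUM).
totals = cur.execute(
    "SELECT c.name, SUM(o.amount) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
conn.close()

print(large_orders)
print(totals)
```

Each row of a table is one record and each column is one attribute, which is the structure the queries above rely on.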
Data warehousing
Data warehousing is the process of constructing and using a data
warehouse. A data warehouse is constructed by integrating data from
multiple heterogeneous sources to support analytical reporting,
structured and/or ad hoc queries, and decision making. Data warehousing
involves data cleaning, data integration, and data consolidation.
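The three steps named above (cleaning, integration, consolidation) can be sketched in plain Python. The two sources, their field names, and the "Unknown" placeholder for missing values are assumptions made for illustration; a real warehouse would use dedicated ETL tooling.

```python
# Two hypothetical sources with different schemas and formats.
crm_rows = [
    {"customer": " Asha ", "city": "mumbai", "sales": "250"},
    {"customer": "Ravi", "city": "Pune", "sales": "75"},
]
web_rows = [
    {"name": "asha", "location": "Mumbai", "amount": 100.0},
    {"name": "Meera", "location": None, "amount": 40.0},
]


def clean(name, city, amount):
    """Data cleaning: normalize text, fill a missing city, coerce amount to float."""
    return {
        "customer": name.strip().title(),
        "city": (city or "Unknown").strip().title(),
        "amount": float(amount),
    }


# Data integration: map both source schemas onto one common schema.
integrated = [clean(r["customer"], r["city"], r["sales"]) for r in crm_rows]
integrated += [clean(r["name"], r["location"], r["amount"]) for r in web_rows]

# Data consolidation: one total per (customer, city) pair.
warehouse = {}
for row in integrated:
    key = (row["customer"], row["city"])
    warehouse[key] = warehouse.get(key, 0.0) + row["amount"]

print(warehouse)
```

After cleaning, " Asha " from the first source and "asha" from the second resolve to the same customer, so their amounts are consolidated into a single entry.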
OLAP
An Online Analytical Processing (OLAP) server is based on the
multidimensional data model. It allows managers and analysts to gain
insight into information through fast, consistent, and interactive
access.
Types of OLAP servers:
1. Relational OLAP (ROLAP)
2. Multidimensional OLAP (MOLAP)
3. Hybrid OLAP (HOLAP)
4. Specialized SQL Servers
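To make the multidimensional data model concrete, the sketch below treats a small list of fact rows as a cube with three dimensions and applies two common OLAP operations, roll-up and slice. The dimension names, the sales figures, and the helper functions roll_up and slice_cube are hypothetical and only illustrate the idea; they are not how any particular OLAP server is implemented.

```python
from collections import defaultdict

# Fact rows: (product, region, quarter, sales). Values are hypothetical.
facts = [
    ("Laptop", "West", "Q1", 10),
    ("Laptop", "East", "Q1", 7),
    ("Phone", "West", "Q1", 12),
    ("Phone", "West", "Q2", 15),
    ("Laptop", "West", "Q2", 9),
]
DIMENSIONS = ("product", "region", "quarter")


def roll_up(rows, keep):
    """Sum sales over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)


def slice_cube(rows, dimension, value):
    """Keep only the rows where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [r for r in rows if r[i] == value]


by_region = roll_up(facts, ["region"])
west_by_quarter = roll_up(slice_cube(facts, "region", "West"), ["quarter"])

print(by_region)
print(west_by_quarter)
```

An analyst moving from by_region to west_by_quarter is doing the kind of interactive exploration described above: first a summary per region, then a closer look at one region broken down by quarter.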
Contents
• What is data mining
• Kinds of knowledge and data to be mined
• Technologies used
• Major issues in data mining
What is data mining
• Data mining is the practice of examining large pre-existing databases in order
to generate new information.
• Data mining is defined as extracting information from huge sets of data. In other
words, data mining is the procedure of mining knowledge from data.
• The information or knowledge so extracted can be used for any of the following
applications:
1. Market Analysis
2. Fraud Detection
3. Customer Retention
4. Production Control
5. Science Exploration
Kinds of knowledge to be mined
• The kind of knowledge to be mined refers to the kind of functions to be performed.
• These functions are:
I. Characterization
II. Discrimination
III. Association and Correlation Analysis
IV. Classification
V. Prediction
VI. Clustering
VII. Outlier Analysis
VIII. Evolution Analysis
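As one worked instance of the functions listed above, the sketch below illustrates association analysis by computing the support and confidence of a rule over a handful of market-basket transactions. The transactions and item names are made up for illustration, and the code shows only the two basic measures, not a full rule-mining algorithm.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)


rule_support = support({"bread", "milk"})
rule_confidence = confidence({"bread"}, {"milk"})

print(f"support(bread, milk) = {rule_support:.2f}")
print(f"confidence(bread -> milk) = {rule_confidence:.2f}")
```

Here bread and milk appear together in 2 of 4 transactions, and bread appears in 3 of 4, so the rule "bread implies milk" has support 0.5 and confidence 2/3.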
Kinds of data to be mined
1. Flat Files
2. Relational Databases
3. Data Warehouses
4. Transactional Databases
5. Multimedia Databases
6. Spatial Databases
7. Time Series Databases
8. World Wide Web (WWW)
Technologies used
Major issues in data mining
