Introduction to Data Mining
BY: BASMA GAMAL
RESEARCHER AT COMPUTER SCIENCE - MINA UNIVERSITY
Outline
What is Data Mining?
Technologies used in data mining
Database Processing vs. Data Mining Processing
Data Mining Models and Tasks
Patterns in Data Mining
Types of Data
Data Mining Tools
What is Data Mining?
Data mining is the process of extracting useful information from large databases.
Data mining is also called knowledge discovery, knowledge extraction, data/pattern analysis,
information harvesting, etc.
The extracted information or knowledge can be used in applications such as:
o Market Analysis
o Fraud Detection
o Customer Retention
o Production Control
o Science Exploration
Technologies used in data mining
Statistics
•Statistics uses mathematical analysis to represent, model, and summarize empirical
data or real-world observations.
•Statistical analysis involves a collection of methods that can be applied to large amounts of
data to draw conclusions and report trends.
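As a small illustration of summarizing empirical data, the sketch below uses Python's standard statistics module. The observations list is made-up sample data, not taken from any real source.

```python
import statistics

# Hypothetical observations (illustrative values only).
observations = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(observations)      # central tendency
median_value = statistics.median(observations)  # middle of the sorted data
spread = statistics.pstdev(observations)        # population standard deviation

print(mean_value)    # 5
print(median_value)  # 4.5
print(spread)        # 2.0
```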
Machine learning
•Arthur Samuel defined machine learning as a field of study that gives computers the ability to
learn without being explicitly programmed.
•As new data arrives, machine learning algorithms adapt to it: a model is built from the
available data and used to predict new or unseen values (predictive analysis).
Database Processing vs. Data Mining Processing
Database processing
◦ Query: well defined, expressed in SQL
◦ Data: operational data
◦ Output: precise, a subset of the database
Data mining processing
◦ Query: poorly defined, no precise query language
◦ Data: not operational data
◦ Output: fuzzy, not a subset of the database
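To make the database side of this comparison concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sales table and its rows are hypothetical. The query is well defined, and its output is a precise subset of the stored rows; a data mining question such as "which customers behave alike?" has no equivalent single query.

```python
import sqlite3

# In-memory database with a hypothetical "sales" table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("ann", 120.0), ("bob", 40.0), ("cy", 300.0)],
)

# A well-defined SQL query: the result is an exact subset of the table.
rows = conn.execute(
    "SELECT customer, amount FROM sales WHERE amount > 100 ORDER BY customer"
).fetchall()
conn.close()

print(rows)  # [('ann', 120.0), ('cy', 300.0)]
```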
Data Mining Models and Tasks
Patterns in Data Mining
1. Association
Association mining looks for associations or correlations among the items or objects
in relational databases, transactional databases, or other information
repositories.
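One common way to quantify an association between items is with support and confidence. The sketch below is an illustration under assumptions: the transactions are made up, and the helper names support and confidence are chosen here, not taken from the slides.

```python
# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

pair_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)

print(pair_support)     # 0.5
print(rule_confidence)  # about 0.67
```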
2. Classification
The goal of classification is to construct a model from historical
data that can accurately predict the class of new data.
It maps the data into predefined groups or classes and searches for
new patterns.
For example:
Predicting whether the weather on a particular day will be sunny, rainy, or cloudy.
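The weather example can be sketched with a very simple classifier. The code below uses a 1-nearest-neighbour rule, which is one possible technique among many and is an assumption of this sketch; the temperature and humidity readings are invented for illustration.

```python
import math

# Hypothetical historical observations: (temperature in °C, humidity in %) -> label.
history = [
    ((30.0, 20.0), "sunny"),
    ((28.0, 30.0), "sunny"),
    ((18.0, 90.0), "rainy"),
    ((16.0, 85.0), "rainy"),
    ((22.0, 60.0), "cloudy"),
    ((21.0, 65.0), "cloudy"),
]

def classify(point, history):
    """Return the label of the closest historical observation (1-nearest neighbour)."""
    _, label = min(history, key=lambda record: math.dist(record[0], point))
    return label

print(classify((29.0, 25.0), history))  # sunny
print(classify((17.0, 88.0), history))  # rainy
```

The predefined classes are fixed in advance (sunny, rainy, cloudy); the model only decides which one a new day belongs to.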
3. Regression
Regression creates predictive models. Regression analysis makes predictions from existing
data by fitting formulas to it.
It is useful for predicting unknown values from previously
known information.
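As a minimal sketch of "applying formulas" to existing data, the code below fits a straight line by ordinary least squares and uses it to predict a new value. Linear regression is only one kind of regression, and the data points are made up so that they lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical known data (illustrative values only).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predict y for the unseen x = 5

print(slope, intercept)  # 2.0 1.0
print(prediction)        # 11.0
```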
4. Cluster analysis
Cluster analysis is the process of partitioning a set of data into meaningful subclasses, called clusters.
It is used to place data elements into related groups without advance knowledge of
the group definitions.
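A minimal sketch of clustering, assuming one-dimensional data and the k-means (Lloyd's) procedure, which is one clustering technique among several. The values and the starting centers are invented for illustration; no group labels are given in advance.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical data with two visible groups (illustrative values only).
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])

print(centers)   # [2.0, 11.0]
print(clusters)  # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```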
5. Forecasting
Forecasting is concerned with the discovery of knowledge or information patterns in data that
can lead to reasonable predictions about the future.
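One of the simplest forecasting techniques is a moving average, sketched below as an illustration. The function name, the window size, and the sales figures are assumptions made for this example.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window

# Hypothetical monthly sales (illustrative values only).
sales = [100, 110, 120, 130, 140]
forecast = moving_average_forecast(sales, window=3)

print(forecast)  # 130.0
```

A moving average lags behind a steady trend, as it does here, so it serves as a baseline rather than a full forecasting model.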
Data Mining Implementation Process
Business understanding:
•In this phase, business and data mining goals are established.
•Understand the business and client objectives.
•Using the business objectives and the current scenario, define your data mining goals.
Data understanding:
In this phase, a sanity check is performed on the data to confirm that it is
appropriate for the data mining goals.
Data preparation:
In this phase, the data is made production ready.
Data preparation often consumes most of the project's time (about 90% by some estimates).
Modelling:
In this phase, mathematical models are used to determine data patterns.
Evaluation:
In this phase, the identified patterns are evaluated against the business objectives.
Deployment:
In the deployment phase, you ship your data mining discoveries to everyday
business operations.
Types of Data
Data mining can be performed on the following types of data:
Relational databases
Data warehouses
Advanced DB and information repositories
Object-oriented and object-relational databases
Transactional and Spatial databases
Heterogeneous and legacy databases
Multimedia and streaming databases
Text databases
Text mining and Web mining
Data Mining Tools
The following are two popular data mining tools widely used in industry:
R is an open-source language for statistical computing and graphics. It offers a wide variety of
statistical techniques, including classical statistical tests, time-series analysis, classification, and graphical techniques.
It also offers effective data handling and storage facilities.
Oracle Data Mining, popularly known as ODM, is a module of the Oracle Advanced Analytics
Database. This data mining tool allows data analysts to generate detailed insights and make
predictions. It helps predict customer behavior, develop customer profiles, and identify
cross-selling opportunities.
Reference
Data Mining Tutorial
https://www.guru99.com/data-mining-tutorial.html
https://www.tutorialride.com/data-mining/
https://www.tutorialspoint.com/data_mining/
