1. Introduction to Data Mining
Mahmoud Rafeek Alfarra
http://mfarra.cst.ps
University College of Science & Technology- Khan yonis
Development of computer systems
2016
Chapter 1 – Lecture 2
2. Outline
Definition of Data Mining
Data Mining as an Interdisciplinary field
Process of Data Mining
Data Mining Tasks
Challenges of Data Mining
Data mining application examples
Introduction to RapidMiner
4. Data Mining as an Interdisciplinary field
“Data mining is an interdisciplinary field bringing
together techniques from machine learning,
pattern recognition, statistics, databases, and
visualization to address the issue of information
extraction from large data bases”.
5. Data Mining as an Interdisciplinary field
[Diagram: Data Mining and the fields it draws on: Database Technology, Statistics, Artificial Intelligence, Machine Learning, Visualization, and Other Disciplines]
6. Data Mining as an Interdisciplinary field
Data mining differs from statistics in the kind of data
(not only numerical), the kinds of methods (mostly
machine learning methods), the number of hypotheses
(more than one), and the amount of data (statistics uses samples).
7. Data Mining as an Interdisciplinary field
Data Mining uses methods from Machine
Learning, such as decision trees and neural nets.
Machine Learning uses samples, whereas Data Mining
uses the whole data set.
Data Mining can access data from a database.
Machine Learning is sometimes used to replace
humans, whereas Data Mining is used to help them.
8. Data Mining as an Interdisciplinary field
Databases are the part of Data Mining that provides
fast and reliable access to data.
Databases are used for data operations (storing and
retrieving data), whereas Data Mining is used for
decision making.
9. Data Mining as an Interdisciplinary field
Search techniques, knowledge representation, and
knowledge acquisition, maintenance, and
application are other branches of Artificial
Intelligence that are closely related to Data
Mining.
10. Data Mining as an Interdisciplinary field
Visualization is used to gain visual insights
into the structure of the data.
Visualization is widely used as a
pre- and post-processing tool for data mining.
11. Process of Data Mining
Data Mining is essentially a process of
data-driven extraction of non-obvious but useful
information from large databases.
The entire process is interactive and iterative.
12. Process of Data Mining
[Diagram: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
13. Data cleaning
Real-world data tends to be incomplete, noisy, and inconsistent.
Incomplete: lacking attribute values or lacking certain attributes of
interest.
◦ e.g., occupation=“ ” (missing data)
Noisy: containing noise, errors, or outliers.
◦ e.g., Salary=“−10” (an error)
Inconsistent: containing discrepancies in codes or names.
◦ e.g., Age=“42” Birthday=“03/07/1997”
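The three kinds of problems above can be detected with simple checks. The following is a minimal sketch in Python, not a complete cleaning routine. The record layout, the field names, and the reference date of 31 December 2016 are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical records, one per problem type (field names are assumptions).
records = [
    {"occupation": "", "salary": 1200, "age": 30, "birthday": date(1986, 1, 15)},
    {"occupation": "teacher", "salary": -10, "age": 25, "birthday": date(1991, 2, 1)},
    {"occupation": "engineer", "salary": 2000, "age": 42, "birthday": date(1997, 7, 3)},
]

def find_problems(record, today=date(2016, 12, 31)):
    """Return the list of data-quality problems found in one record."""
    problems = []
    # Incomplete: a required attribute value is missing.
    if not record["occupation"].strip():
        problems.append("incomplete")
    # Noisy: a value that cannot be correct (a negative salary).
    if record["salary"] < 0:
        problems.append("noisy")
    # Inconsistent: the stored age disagrees with the age implied by the birthday.
    b = record["birthday"]
    computed_age = today.year - b.year - ((today.month, today.day) < (b.month, b.day))
    if computed_age != record["age"]:
        problems.append("inconsistent")
    return problems

report = [find_problems(r) for r in records]
```

Detection is only the first half of cleaning; how a flagged record is repaired (filled in, corrected, or dropped) depends on the analysis task.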
14. Data Integration
Data integration is the merging of data
from multiple sources.
These sources may include multiple
databases, data cubes, or flat files.
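As a rough illustration of merging two sources, the sketch below joins rows from a database table with attributes read from a flat file. Both sources, the `customer_id` key, and the `integrate` helper are hypothetical; a real integration step also has to resolve naming and format conflicts between sources.

```python
import csv
import io

# Source 1: rows as they might come from a database table (hypothetical).
db_rows = [
    {"customer_id": 1, "name": "Ali"},
    {"customer_id": 2, "name": "Sara"},
]

# Source 2: a flat file, represented here by an in-memory CSV (hypothetical).
flat_file = io.StringIO("customer_id,city\n1,Gaza\n2,Khan Younis\n")

def integrate(rows, csv_file, key="customer_id"):
    """Merge CSV attributes into the database rows that share the same key."""
    # Index the flat-file rows by key; CSV values are strings, so convert the key.
    extra = {int(r[key]): r for r in csv.DictReader(csv_file)}
    merged = []
    for row in rows:
        combined = dict(row)
        match = extra.get(row[key], {})
        # Copy every flat-file attribute except the key itself.
        combined.update({k: v for k, v in match.items() if k != key})
        merged.append(combined)
    return merged

integrated = integrate(db_rows, flat_file)
```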
15. Data Selection
Data relevant to the analysis task are
retrieved from the database. Irrelevant,
weakly relevant, or redundant attributes
may therefore be detected and removed.
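One simple way to picture attribute removal is sketched below. It assumes three hypothetical rules: attributes named by the analyst are irrelevant, constant attributes carry no information, and an attribute whose values duplicate an already kept attribute is redundant. The data and the `select_attributes` helper are illustrative only.

```python
# Hypothetical data set: "id" is irrelevant, "country" is constant,
# and "age_copy" duplicates "age".
rows = [
    {"id": 1, "age": 25, "age_copy": 25, "country": "PS", "bought": "yes"},
    {"id": 2, "age": 40, "age_copy": 40, "country": "PS", "bought": "no"},
    {"id": 3, "age": 31, "age_copy": 31, "country": "PS", "bought": "yes"},
]

def select_attributes(rows, exclude=()):
    """Keep attributes that vary and do not duplicate an earlier kept attribute."""
    kept = []
    seen_columns = set()
    for name in rows[0]:
        column = tuple(r[name] for r in rows)
        if name in exclude:          # irrelevant to the task (e.g. an identifier)
            continue
        if len(set(column)) == 1:    # constant: carries no information
            continue
        if column in seen_columns:   # redundant copy of a kept attribute
            continue
        seen_columns.add(column)
        kept.append(name)
    return [{k: r[k] for k in kept} for r in rows]

selected = select_attributes(rows, exclude=("id",))
```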
16. Data Transformation
Data are transformed or consolidated into forms
appropriate for mining by performing:
Summary or aggregation operations, for example:
daily sales may be aggregated to monthly sales or
annual sales.
Generalization, for example:
city may be generalized to country, or age may be
generalized to young, middle-aged, and senior.
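Both operations can be sketched in a few lines of Python. The sales figures are made up, and the age boundaries (29 and 59) are assumptions chosen only to make the example concrete.

```python
from collections import defaultdict

# Hypothetical daily sales as (ISO date, amount) pairs.
daily_sales = [("2016-01-05", 100.0), ("2016-01-20", 50.0), ("2016-02-02", 75.0)]

def aggregate_monthly(sales):
    """Aggregation: sum daily amounts into one total per month (YYYY-MM)."""
    totals = defaultdict(float)
    for day, amount in sales:
        totals[day[:7]] += amount   # the first 7 characters are the year and month
    return dict(totals)

def generalize_age(age, young_max=29, middle_max=59):
    """Generalization: replace a numeric age with a coarser category.

    The boundaries are illustrative assumptions, not fixed definitions.
    """
    if age <= young_max:
        return "young"
    if age <= middle_max:
        return "middle-aged"
    return "senior"

monthly = aggregate_monthly(daily_sales)
groups = [generalize_age(a) for a in (22, 45, 70)]
```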
17. Data Mining
An essential process where intelligent
methods are applied to data to convert it into
knowledge for decision making.
A wide range of methods can be used in data
mining, such as neural nets, decision trees, and
association rules.
18. Pattern evaluation
Pattern evaluation identifies the truly interesting patterns based on
some interestingness measures.
A pattern is considered interesting if it is:
Valid
Novel
Actionable
Understandable
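The slides do not name specific interestingness measures. Two widely used ones for association rules are support and confidence, and the sketch below computes them for a rule "bread → milk" over a small made-up set of transactions. The thresholds of 0.5 and 0.6 are assumptions; such measures speak mainly to validity, while novelty, actionability, and understandability usually need human judgment.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)

# Assumed thresholds: the rule is kept only if it passes both.
is_interesting = rule_support >= 0.5 and rule_confidence >= 0.6
```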
19. Knowledge Representation
Knowledge representation is the framework that
converts a large amount of data into a specific
form or procedure that a human being can understand
for a given purpose.
In knowledge representation, visualization tools
and knowledge representation techniques are used
to present the mined knowledge to the user.