2. Evolution of Database Technology
YEAR     PURPOSE
1960s    Network model, batch reports
1970s    Relational data model, executive information systems
1980s    Application-specific DBMS (spatial data, scientific data, image data, …)
1990s    Terabyte data warehouses, object-oriented, middleware and web technology
2000s    Business process
2010s    Sensor DB systems, DBs on embedded systems, large-scale pub/sub systems
Data Mining July 16, 2009 2
3. Motivation: Necessity is the mother of invention
Data explosion problem
◦ Automated data collection tools and mature database technology
lead to tremendous amounts of data stored in databases, data
warehouses and other information repositories
We are drowning in data, but starving for knowledge!
Solution: Data warehousing and data mining
◦ Extraction of interesting knowledge (rules, regularities, patterns,
constraints) from data in large databases
4. Why Data Mining?
Data, data, data everywhere …
I can’t find the data I need: data is scattered over the network
I can’t get the data I need
I can’t understand the data I need
I can’t use the data I found
5. An abundance of data
Sources of data:
Supermarket scanners, POS data
Credit card transactions
Call center records
ATMs
Demographic data
Sensor networks
Cameras
Web server logs
Customer web site trails
Geographic information systems
National medical records
Weather images
This data occupies:
Terabytes - 10^12 bytes
Petabytes - 10^15 bytes
Exabytes - 10^18 bytes
Zettabytes - 10^21 bytes
Yottabytes - 10^24 bytes
Walmart - 24 terabytes
6. Data mining has been defined as:
The process of sorting through large amounts of data and picking out relevant information
The process of analyzing data from different perspectives and summarizing it into useful information
Discovering hidden value in a database
The non-trivial process of identifying valid, novel, useful and understandable patterns in data
Extracting or mining knowledge from large amounts of data
7. History Notes: Many Names of Data Mining
YEAR    NAME                                      USED BY
1960    Data Fishing, Data Dredging               Statisticians
1990    Data Mining                               DB community, business
1989    Knowledge Discovery in Databases (KDD)    AI, machine learning community
Other names: Data Archaeology, Information Harvesting, Information Discovery, Knowledge Extraction
8. Data Warehousing provides the Enterprise with a memory
Data Mining provides the Enterprise with intelligence
9. Why Data Mining? (cont.)
A data warehouse is a single, complete and consistent store of data from a variety of different sources, made available to end users
For example, AT&T handles billions of calls per day. Europe's Very Long Baseline Interferometer (VLBI) has 16 telescopes, each of which produces 1 gigabit/second of astronomical data over a 25-day observation session
We need data mining for:
Transforming data into information that is useful to users
Presenting data in a useful format
Providing data access to business analysts and information technology professionals
10. Data Mining Process
Data mining is the technique used to carry out knowledge discovery in databases (KDD).
Data mining turns data into information and then into knowledge:
Data → Information → Knowledge
11. Steps in Data Mining
1. Data cleaning
To remove noise and inconsistent data
2. Data integration
To integrate (combine) multiple data sources
3. Data selection
Data relevant to the analysis is selected
4. Data transformation
Summary, normalization, or aggregation operations are performed to transform and consolidate the data into a form suitable for mining (for example, a two-dimensional table)
12. Steps in Data Mining (cont.)
5. Data mining
Intelligent methods are applied to the data to discover knowledge or patterns
6. Pattern evaluation
The truly interesting patterns are identified by thresholding (for example, on an interestingness measure)
7. Knowledge presentation
Visualization and presentation methods are used to present the mined knowledge to the user.
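The seven steps can be sketched end to end on toy data. This is only an illustrative sketch: the records, the function names, the "above-average spender" pattern and the threshold of 50 are all assumptions made for the example, not part of the original slides.

```python
from statistics import mean

# Hypothetical toy records from two sources; None marks a missing value.
store_a = [{"customer": "c1", "amount": 20.0}, {"customer": "c2", "amount": None}]
store_b = [{"customer": "c1", "amount": 35.0}, {"customer": "c3", "amount": 5.0}]

def clean(records):
    """Step 1: drop records whose amount is missing."""
    return [r for r in records if r["amount"] is not None]

def integrate(*sources):
    """Step 2: combine several sources into one list."""
    return [r for src in sources for r in src]

def select(records):
    """Step 3: keep only the fields relevant to the analysis."""
    return [(r["customer"], r["amount"]) for r in records]

def transform(pairs):
    """Step 4: aggregate to total spend per customer."""
    totals = {}
    for customer, amount in pairs:
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals

def mine(totals):
    """Step 5: a deliberately trivial 'pattern': customers spending above the mean."""
    avg = mean(totals.values())
    return {c: t for c, t in totals.items() if t > avg}

def evaluate(patterns, threshold):
    """Step 6: keep only patterns that pass a threshold."""
    return {c: t for c, t in patterns.items() if t >= threshold}

totals = transform(select(integrate(clean(store_a), clean(store_b))))
patterns = evaluate(mine(totals), threshold=50.0)

# Step 7: present the result to the user.
for customer, total in sorted(patterns.items()):
    print(f"{customer}: {total:.2f}")
```

In a real project each step would be far more involved; the point here is only the order in which the steps feed into one another.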
13. Data mining: the core of the knowledge discovery process
[Figure: Databases → Data Cleaning, Data Integration → Data Warehouse → Data Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
14. Data Mining Tasks
1. Classification
• Classification maps data into predefined groups or classes.
• It may be represented by methods such as decision trees.
Decision tree
Flow-chart-like tree structure
Each node denotes a test of an attribute value
Each branch represents an outcome of the test
Leaves represent classes or class distributions.
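The decision tree structure described above can be sketched as a nested dictionary. The attributes (income, has_guarantor) and the class labels are hypothetical, chosen only to illustrate nodes, branches and leaves; a real tree would be learned from training data rather than written by hand.

```python
# Hypothetical hand-built tree: internal nodes test one attribute,
# branches are keyed by the outcome of the test, leaves hold a class label.
tree = {
    "attribute": "income",
    "branches": {
        "high": {"label": "approve"},
        "low": {
            "attribute": "has_guarantor",
            "branches": {
                True: {"label": "approve"},
                False: {"label": "reject"},
            },
        },
    },
}

def classify(node, record):
    """Walk from the root to a leaf, following the branch for each test outcome."""
    while "label" not in node:
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node["label"]

print(classify(tree, {"income": "low", "has_guarantor": False}))
```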
15. Data Mining Tasks (cont.)
2. Regression
Used to map a data item to a real-valued prediction variable.
Example: a manager wants to reach a certain level of savings before his retirement. Periodically he predicts his retirement savings from the current value and several past values, using a simple linear regression formula to predict future values of savings.
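The savings example can be sketched with an ordinary least-squares line fit. The yearly figures below are made up for illustration, and fit_line is a helper written for this sketch, not a function from any particular library.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical savings (in thousands) observed at the end of years 1 to 5.
years = [1, 2, 3, 4, 5]
savings = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(years, savings)
predicted_year_10 = slope * 10 + intercept
print(f"Predicted savings in year 10: {predicted_year_10:.1f}")
```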
3. Prediction
Many real-world applications can be seen as predicting future data states based on past and current data.
Example: predicting flooding is a difficult problem.
16. Data Mining Tasks (cont.)
4. Clustering
Clustering is similar to classification except that the groups are not predefined.
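The slides do not name a clustering algorithm. As one common choice, a minimal one-dimensional k-means sketch shows how groups can emerge from the data instead of being predefined. The values and the starting centers are assumptions made for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means (Lloyd's algorithm) on one-dimensional data."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers, clusters)
```

No class labels are supplied anywhere; the two groups are discovered from the distances between values alone.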
5. Association Rule
Association refers to uncovering relationships among data.
Used in the retail sales community to identify the items (products) that are frequently purchased together.
[Figure: cartoon, 1998: "Bread and Jam sell together!"]
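The bread-and-jam idea can be made concrete with support and confidence, two standard measures for association rules. The slides do not name these measures, and the transactions below are invented for illustration.

```python
# Hypothetical market baskets.
transactions = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "butter"},
    {"milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print("support(bread, jam)      =", support({"bread", "jam"}, transactions))
print("confidence(bread -> jam) =", confidence({"bread"}, {"jam"}, transactions))
```

A rule such as "bread → jam" is usually reported only when both measures exceed user-chosen minimum thresholds.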
17. Data Mining Tasks (cont.)
6. Summarization
Summarization of the general characteristics or features of a target class of data.
Data characterization can be presented in various forms: pie charts, bar charts, curves.
Data discrimination is the comparison of the general features of a target class of data objects with the general features of objects from one or a set of contrasting classes.
7. Outlier Analysis
A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are called outliers.
Many data mining methods discard outliers as noise or exceptions.
However, in applications such as fraud detection, rare events may be more interesting than regularly occurring events.
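One simple way to flag outliers is a z-score test: values that lie far from the mean, measured in standard deviations, are reported instead of discarded. This is only one of many possible methods; the transaction amounts and the cutoff of 2.0 are assumptions made for this sketch.

```python
from statistics import mean, pstdev

def find_outliers(values, cutoff=2.0):
    """Return values whose distance from the mean exceeds `cutoff` standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > cutoff]

# Hypothetical card transaction amounts; 500.0 is the unusual one.
amounts = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 500.0]
outliers = find_outliers(amounts)
print(outliers)
```

In a fraud-detection setting the flagged values would be the records of interest, which is the opposite of treating them as noise.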
18. Data Mining: Types of Data
Relational data and transactional data
Text
Images, video
Mixtures of data
19. Data Mining Products
DataMind -- neurOagent
Information Discovery -- IDIS
SAS Institute -- SAS/Neuronets
20. Data Mining Software
RapidMiner and Weka: tools for defining the data mining process
Top 8 data mining software packages in 2008:
Angoss software
Infor CRM Epiphany
Portrait Software
SAS
SPSS
ThinkAnalytics
Unica
Viscovery
21. Application Areas
Industry             Application
Finance              Credit card analysis
Insurance            Fraud analysis
Telecommunication    Call record analysis
22. Applications
Financial industry, banks, businesses, e-commerce
◦ Stock and investment analysis
◦ Identify loyal customers and risky customers
◦ Predict customer spending
Database analysis and decision support
◦ Market analysis and management: target marketing, customer relationship management, market basket analysis
◦ Risk analysis and management: forecasting, quality control, competitive analysis
◦ Fraud detection and management
23. Data Mining in Usage
1. Intelligent Miner
An IBM data mining product.
Distinctive features include the scalability of its mining algorithms and tight integration with IBM's DB2 relational database system.
5. DBMiner
Developed by DBMiner Technologies Inc.
A distinctive feature of DBMiner is its data-cube-based online analytical mining.
25. Conclusion
Data mining: discovering interesting patterns from large amounts of data
A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
Mining can be performed in a variety of information repositories
Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier analysis, etc.