September 27, 2024
Data Mining: Concepts and
Techniques 1
Chp-1: Introduction to
Data Mining
September 27, 2024
Data Mining: Concepts and
Techniques 2
Chapter 1. Introduction
 Motivation: Why data mining?
 What is data mining?
 Data Mining: On what kind of data?
 Kind of patterns to be mined
 Technologies used
 Major issues in data mining
September 27, 2024
Data Mining: Concepts and
Techniques 3
Why Data Mining?
 The Explosive Growth of Data: from terabytes to petabytes(1000
terabytes)

Data collection and data availability

Automated data collection tools, database systems, Web,
computerized society

Major sources of abundant data

Business: Web, e-commerce, transactions, stocks, …

Science: Remote sensing, bioinformatics, scientific simulation, …

Society and everyone: news, digital cameras, YouTube
 We are drowning in data, but starving for knowledge!
 “Necessity is the mother of invention”—Data mining—Automated
analysis of massive data sets
September 27, 2024
Data Mining: Concepts and
Techniques 4
Evolution of Database Technology
 1960s:
 Data collection, database creation and network DBMS
 1970s:
 Relational data model, relational DBMS implementation
 1980s:

RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
 Application-oriented DBMS (spatial, scientific, engineering, etc.)
 1990s:
 Data mining, data warehousing, multimedia databases, and Web databases
 2000s

Stream data management and mining
 Data mining and its applications

Web technology (XML, data integration) and global information systems
September 27, 2024
Data Mining: Concepts and
Techniques 5
Evolution of database system technology
September 27, 2024
Data Mining: Concepts and
Techniques 6
What Is Data Mining?
 Data mining (knowledge discovery from data)
 Extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) patterns or knowledge
from huge amount of data
 Data mining: a misnomer?
 Alternative names
 Knowledge discovery (mining) in databases (KDD),
knowledge extraction, data/pattern analysis, data
archeology, data dredging, information harvesting,
business intelligence, etc.
Decision making
September 27, 2024
Data Mining: Concepts and
Techniques 7
September 27, 2024
Data Mining: Concepts and
Techniques 8
Knowledge Discovery (KDD) Process
 Data mining—core of
knowledge discovery
process
Data Cleaning
Data Integration
Databases
Data
Warehouse
Selection & Transformation
Data Mining
Pattern Evaluation
9
Data Mining in Business Intelligence
Increasing potential
to support
business decisions End User
Business
Analyst
Data
Analyst
DBA
Decision
Making
Data Presentation
Visualization Techniques
Data Mining
Information Discovery
Data Exploration
Statistical Summary, Querying, and Reporting
Data Preprocessing/Integration, Data Warehouses
Data Sources
Paper, Files, Web documents, Scientific experiments, Database Systems
Why Data Mining?—Potential Applications
 Data analysis and decision support

Market analysis and management

Target marketing, customer relationship
management (CRM), market basket analysis, cross
selling
 Risk analysis and management

Forecasting, customer retention, quality control,
competitive analysis
 Fraud detection and detection of unusual
patterns (outliers
 Other Applications

Text mining (news group, email, documents) and Web
mining 

Stream data mining

Bioinformatics and bio-data analysis
September 27, 2024
Data Mining: Concepts and
Techniques 10
Ex. 1: Market Analysis and Management
 Where does the data come from?—Credit card
transactions, loyalty cards, discount coupons,
customer complaint calls, plus (public) lifestyle studies
 Target marketing
 Find clusters of “model” customers who share the same
characteristics: interest, income level, spending habits, etc.

Determine customer purchasing patterns over time
 Cross-market analysis—Find associations/co-relations
between product sales, & predict based on such
association
 Customer profiling—What types of customers buy
what products (clustering or classification)
September 27, 2024
Data Mining: Concepts and
Techniques 11
Ex. 1: Market Analysis and Management
 Customer requirement analysis

Identify the best products for different groups of customers

Predict what factors will attract new customers
 Provision of summary information

Multidimensional summary reports

Statistical summary information (data central tendency and
variation)
September 27, 2024
Data Mining: Concepts and
Techniques 12
September 27, 2024
Data Mining: Concepts and
Techniques 13
Data Mining: On What Kinds of Data?
 Database-oriented data sets and applications
 Relational database, data warehouse, transactional database
 Advanced data sets and advanced applications
 Data streams and sensor data
 Time-series data, temporal data, sequence data (incl. bio-sequences)
 Structure data, graphs, social networks and multi-linked data
 Object-relational databases
 Heterogeneous databases and legacy databases
 Spatial data and spatiotemporal data(geographical data)
 Multimedia database
 Text databases
 The World-Wide Web
Data Mining: On What Kinds of Data?
 Mining relational databases

Eg. Anaylze customer data to predict the credit risk
of new customers based on their income, age and
previous credit information.
 Data Warehouses

Sales per item type per branch for third quarter.

Data stored to provide information from historical
perespective. Eg. In past 6 to 12 months,
summarized data

Modeled by multidimentional data structure called
data cube.
September 27, 2024
Data Mining: Concepts and
Techniques 14
September 27, 2024
Data Mining: Concepts and
Techniques 15
 Transactional data

Eg analyze which items are sold well together?

Printers are normally purchased together with
computer
September 27, 2024
Data Mining: Concepts and
Techniques 16
Data Mining: On What Kinds of Data?
Kinds of Patterns to be mined
September 27, 2024
Data Mining: Concepts and
Techniques 17
What Kinds of Patterns Can Be Mined?
1) Generalization
2) Association and Correlation Analysis
3) Classification
4) Cluster Analysis
5) Outlier Analysis
September 27, 2024
Data Mining: Concepts and
Techniques 18
Data Mining Function: (1) Generalization
 Multidimensional concept description:
Characterization and discrimination
 Generalize, summarize, and contrast data
characteristics, e.g., summarize the characteristics
of customers who spend more than Rs. 50,000 a
year at an electronics store

Data characterization is a summarization of the
general characteristics or features of a target class
of data
 Data cube technology for computing

OLAP (online analytical processing)
 Examples of Output forms : pie charts, MDD
cubes, bar charts, curves etc.
September 27, 2024
Data Mining: Concepts and
Techniques 19
Data Mining Function: (1) Generalization
contd.
 Data discrimination is a comparison of the general
features of the target class data objects against the
general features of objects from one or multiple
contrasting classes.

Eg. Compare 2 groups of customers- those who shop
for computer products regularly(more than twice a
month) and those who rarely shop for such
products(less than 3 times a year)
 Data cube technology for computing

Drill down on any dimension

Discriminant rules: Discrimination descriptions
expressed in the form of rules
 Output forms : same as that of data characterization
along with discrimination descriptions
September 27, 2024
Data Mining: Concepts and
Techniques 20
Data Mining Function: (2) Association and
Correlation Analysis
 Frequent patterns (or frequent itemsets)

What items are frequently purchased together in
your mart? Eg. Milk & bread
 Association, correlation vs. causality

A typical association rule
 Computer software [1%, 50%] (support,
→
confidence)
 Confidence means that if one buys a computer there is a 50%
chance that she will buy software too. A 1% support means
that 1% of all transactions under analysis show that computer
& software are purchased together
 Association rules are discarded as uninteresting if they
do not satisfy both a minimum support threshold
and a minimum confidence threshold
September 27, 2024
Data Mining: Concepts and
Techniques 21
Data Mining Function: (3) Classification
 Classification and label prediction
 Construct models (functions) based on some training examples

Describe and distinguish classes or concepts for future prediction

E.g., classify countries based on (climate), or classify cars
based on (gas mileage)

Predict some unknown class labels
 Typical methods

Decision trees, naïve Bayesian classification, support vector
machines, neural networks, rule-based classification, pattern-
based classification, logistic regression, …
 Typical applications:

Credit card fraud detection, direct marketing, classifying stars,
diseases, web-pages, …
September 27, 2024
Data Mining: Concepts and
Techniques 22
Various forms of a classification model
September 27, 2024
Data Mining: Concepts and
Techniques 23
Data Mining Function: (4) Cluster Analysis
 Unsupervised learning (i.e., Class label is
unknown)
 Group data to form new categories (i.e.,
clusters), e.g., cluster houses to find
distribution patterns
 Data objects are clustered or grouped based
on the principle of maximizing intraclass
similarity and minimizing interclass similarity
September 27, 2024
Data Mining: Concepts and
Techniques 24
Data Mining Function: (4) Cluster Analysis
September 27, 2024
Data Mining: Concepts and
Techniques 25
Data Mining Function: (5) Outlier Analysis
 Outlier analysis (anomaly mining)
 Outlier: A data object that does not comply
with the general behaviour of the data
 Noise or exception? ― One person’s garbage
could be another person’s treasure
 Methods: by product of clustering or
regression analysis, …
 Useful in fraud detection, rare events analysis
September 27, 2024
Data Mining: Concepts and
Techniques 26
September 27, 2024
Data Mining: Concepts and
Techniques 27
Are All the “Discovered” Patterns Interesting?
 Data mining may generate thousands of patterns: Not all of them
are interesting

Suggested approach: Human-centered, query-based, focused mining
 Interestingness measures
 A pattern is interesting if it is easily understood by humans, valid on
new or test data with some degree of certainty, potentially useful,
novel, or validates some hypothesis that a user seeks to confirm
 Objective vs. subjective interestingness measures

Objective: based on statistics and structures of patterns, e.g.,
support, confidence, etc.

Subjective: based on user’s belief in the data, e.g. large earthquake
often follows a cluster of small earthquake.
September 27, 2024
Data Mining: Concepts and
Techniques 28
Find All and Only Interesting Patterns?
 Find all the interesting patterns: Completeness
 Can a data mining system find all the interesting patterns? Do
we need to find all of the interesting patterns?
 Association vs. classification vs. clustering
 Search for only interesting patterns: An optimization problem
 Can a data mining system find only the interesting patterns?
 Approaches

First generate all the patterns and then filter out the
uninteresting ones

Generate only the interesting
Technologies Used
 As a highly application-driven domain, data mining has
incorporated many techniques from other domains
 The interdisciplinary nature of data mining research and
development contributes significantly to the success of data
mining and its extensive applications
September 27, 2024
Data Mining: Concepts and
Techniques 29
September 27, 2024
Data Mining: Concepts and
Techniques 30
Data Mining: Confluence of Multiple Disciplines
Data Mining
Machine
Learning
Statistics
Applications
Algorithm
Pattern
Recognition
High-Performance
Computing
Visualization
Database
Technology
Data Mining: Confluence of Multiple Disciplines
 Statistics

Statistical models are widely used to model data and
data classes.

Eg. We can use statistics to model noise and
missing data.
 Machine learning

Computer programs automatically learn to
recognize complex patterns and make intelligent
decisions based on data.

e.g. Handwritten postal codes
September 27, 2024
Data Mining: Concepts and
Techniques 31
September 27, 2024
Data Mining: Concepts and
Techniques 32
Why Confluence of Multiple Disciplines?
 Tremendous amount of data
 Algorithms must be highly scalable to handle such as tera-bytes
of data
 High-dimensionality of data
 Micro-array may have tens of thousands of dimensions
 High complexity of data
 Data streams and sensor data
 Time-series data, temporal data, sequence data
 Structure data, graphs, social networks and multi-linked data
 Heterogeneous databases and legacy databases
 Spatial, spatiotemporal, multimedia, text and Web data
 Software programs, scientific simulations
 New and sophisticated applications
September 27, 2024
Data Mining: Concepts and
Techniques 33
Major Issues in Data Mining
 Mining methodology
 Mining different kinds of knowledge from diverse data types,
e.g., files in pdf or doc
 Mining knowledge in multi-dimensional space.
 Data mining: An interdisciplinary effort( mine data with NL
text)
 Pattern evaluation: the interestingness problem
 Handling noise, uncertainty, and incompleteness of data
 Integration of the discovered knowledge with existing one:
knowledge fusion
 Pattern evaluation and pattern- or constraint-guided mining
September 27, 2024
Data Mining: Concepts and
Techniques 34
Major Issues in Data Mining (1)
 User interaction

Interactive mining( dynamically change focus of search)
 Incorporation of background knowledge(constraints, rules)
 presentation and visualization of data mining results
 Efficiency and Scalability
 Efficiency and scalability of data mining algorithms(run time …
predictable,short,acceptable)
 Parallel, distributed, stream, and incremental mining methods
 Diversity of data types

Handling complex types of data(simple to temporal data
objects)
 Mining dynamic, networked, and global data repositories
September 27, 2024
Data Mining: Concepts and
Techniques 35
Major Issues in Data Mining (2)
 Data mining and society
 Social impacts of data mining(benefit to society)
 Privacy-preserving data mining
 Invisible data mining(system have buit in function.. click of
mouse)
September 27, 2024
Data Mining: Concepts and
Techniques 36
Architecture: Typical Data Mining System
data cleaning, integration, and selection
Database or Data
Warehouse Server
Data Mining Engine
Pattern Evaluation
Graphical User Interface
Know
ledge
-Base
Database
Data
Warehouse
World-Wide
Web
Other Info
Repositories
September 27, 2024
Data Mining: Concepts and
Techniques 37
Summary
 Data mining: Discovering interesting patterns from large amounts
of data
 A natural evolution of database technology, in great demand, with
wide applications
 A KDD process includes data cleaning, data integration, data
selection, transformation, data mining, pattern evaluation, and
knowledge presentation
 Mining can be performed in a variety of information repositories
 Data mining functionalities: characterization, discrimination,
association, classification, clustering, outlier and trend analysis, etc.
 Data mining systems and architectures
 Major issues in data mining

ch_1_dm data preprocessing in data mining

  • 1.
    September 27, 2024 DataMining: Concepts and Techniques 1 Chp-1: Introduction to Data Mining
  • 2.
    September 27, 2024 DataMining: Concepts and Techniques 2 Chapter 1. Introduction  Motivation: Why data mining?  What is data mining?  Data Mining: On what kind of data?  Kind of patterns to be mined  Technologies used  Major issues in data mining
  • 3.
    September 27, 2024 DataMining: Concepts and Techniques 3 Why Data Mining?  The Explosive Growth of Data: from terabytes to petabytes(1000 terabytes)  Data collection and data availability  Automated data collection tools, database systems, Web, computerized society  Major sources of abundant data  Business: Web, e-commerce, transactions, stocks, …  Science: Remote sensing, bioinformatics, scientific simulation, …  Society and everyone: news, digital cameras, YouTube  We are drowning in data, but starving for knowledge!  “Necessity is the mother of invention”—Data mining—Automated analysis of massive data sets
  • 4.
    September 27, 2024 DataMining: Concepts and Techniques 4 Evolution of Database Technology  1960s:  Data collection, database creation and network DBMS  1970s:  Relational data model, relational DBMS implementation  1980s:  RDBMS, advanced data models (extended-relational, OO, deductive, etc.)  Application-oriented DBMS (spatial, scientific, engineering, etc.)  1990s:  Data mining, data warehousing, multimedia databases, and Web databases  2000s  Stream data management and mining  Data mining and its applications  Web technology (XML, data integration) and global information systems
  • 5.
    September 27, 2024 DataMining: Concepts and Techniques 5 Evolution of database system technology
  • 6.
    September 27, 2024 DataMining: Concepts and Techniques 6 What Is Data Mining?  Data mining (knowledge discovery from data)  Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amount of data  Data mining: a misnomer?  Alternative names  Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
  • 7.
    Decision making September 27,2024 Data Mining: Concepts and Techniques 7
  • 8.
    September 27, 2024 DataMining: Concepts and Techniques 8 Knowledge Discovery (KDD) Process  Data mining—core of knowledge discovery process Data Cleaning Data Integration Databases Data Warehouse Selection & Transformation Data Mining Pattern Evaluation
  • 9.
    9 Data Mining inBusiness Intelligence Increasing potential to support business decisions End User Business Analyst Data Analyst DBA Decision Making Data Presentation Visualization Techniques Data Mining Information Discovery Data Exploration Statistical Summary, Querying, and Reporting Data Preprocessing/Integration, Data Warehouses Data Sources Paper, Files, Web documents, Scientific experiments, Database Systems
  • 10.
    Why Data Mining?—PotentialApplications  Data analysis and decision support  Market analysis and management  Target marketing, customer relationship management (CRM), market basket analysis, cross selling  Risk analysis and management  Forecasting, customer retention, quality control, competitive analysis  Fraud detection and detection of unusual patterns (outliers  Other Applications  Text mining (news group, email, documents) and Web mining   Stream data mining  Bioinformatics and bio-data analysis September 27, 2024 Data Mining: Concepts and Techniques 10
  • 11.
    Ex. 1: MarketAnalysis and Management  Where does the data come from?—Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies  Target marketing  Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.  Determine customer purchasing patterns over time  Cross-market analysis—Find associations/co-relations between product sales, & predict based on such association  Customer profiling—What types of customers buy what products (clustering or classification) September 27, 2024 Data Mining: Concepts and Techniques 11
  • 12.
    Ex. 1: MarketAnalysis and Management  Customer requirement analysis  Identify the best products for different groups of customers  Predict what factors will attract new customers  Provision of summary information  Multidimensional summary reports  Statistical summary information (data central tendency and variation) September 27, 2024 Data Mining: Concepts and Techniques 12
  • 13.
    September 27, 2024 DataMining: Concepts and Techniques 13 Data Mining: On What Kinds of Data?  Database-oriented data sets and applications  Relational database, data warehouse, transactional database  Advanced data sets and advanced applications  Data streams and sensor data  Time-series data, temporal data, sequence data (incl. bio-sequences)  Structure data, graphs, social networks and multi-linked data  Object-relational databases  Heterogeneous databases and legacy databases  Spatial data and spatiotemporal data(geographical data)  Multimedia database  Text databases  The World-Wide Web
  • 14.
    Data Mining: OnWhat Kinds of Data?  Mining relational databases  Eg. Anaylze customer data to predict the credit risk of new customers based on their income, age and previous credit information.  Data Warehouses  Sales per item type per branch for third quarter.  Data stored to provide information from historical perespective. Eg. In past 6 to 12 months, summarized data  Modeled by multidimentional data structure called data cube. September 27, 2024 Data Mining: Concepts and Techniques 14
  • 15.
    September 27, 2024 DataMining: Concepts and Techniques 15
  • 16.
     Transactional data  Eganalyze which items are sold well together?  Printers are normally purchased together with computer September 27, 2024 Data Mining: Concepts and Techniques 16 Data Mining: On What Kinds of Data?
  • 17.
    Kinds of Patternsto be mined September 27, 2024 Data Mining: Concepts and Techniques 17
  • 18.
    What Kinds ofPatterns Can Be Mined? 1) Generalization 2) Association and Correlation Analysis 3) Classification 4) Cluster Analysis 5) Outlier Analysis September 27, 2024 Data Mining: Concepts and Techniques 18
  • 19.
    Data Mining Function:(1) Generalization  Multidimensional concept description: Characterization and discrimination  Generalize, summarize, and contrast data characteristics, e.g., summarize the characteristics of customers who spend more than Rs. 50,000 a year at an electronics store  Data characterization is a summarization of the general characteristics or features of a target class of data  Data cube technology for computing  OLAP (online analytical processing)  Examples of Output forms : pie charts, MDD cubes, bar charts, curves etc. September 27, 2024 Data Mining: Concepts and Techniques 19
  • 20.
    Data Mining Function:(1) Generalization contd.  Data discrimination is a comparison of the general features of the target class data objects against the general features of objects from one or multiple contrasting classes.  Eg. Compare 2 groups of customers- those who shop for computer products regularly(more than twice a month) and those who rarely shop for such products(less than 3 times a year)  Data cube technology for computing  Drill down on any dimension  Discriminant rules: Discrimination descriptions expressed in the form of rules  Output forms : same as that of data characterization along with discrimination descriptions September 27, 2024 Data Mining: Concepts and Techniques 20
  • 21.
    Data Mining Function:(2) Association and Correlation Analysis  Frequent patterns (or frequent itemsets)  What items are frequently purchased together in your mart? Eg. Milk & bread  Association, correlation vs. causality  A typical association rule  Computer software [1%, 50%] (support, → confidence)  Confidence means that if one buys a computer there is a 50% chance that she will buy software too. A 1% support means that 1% of all transactions under analysis show that computer & software are purchased together  Association rules are discarded as uninteresting if they do not satisfy both a minimum support threshold and a minimum confidence threshold September 27, 2024 Data Mining: Concepts and Techniques 21
  • 22.
    Data Mining Function:(3) Classification  Classification and label prediction  Construct models (functions) based on some training examples  Describe and distinguish classes or concepts for future prediction  E.g., classify countries based on (climate), or classify cars based on (gas mileage)  Predict some unknown class labels  Typical methods  Decision trees, naïve Bayesian classification, support vector machines, neural networks, rule-based classification, pattern- based classification, logistic regression, …  Typical applications:  Credit card fraud detection, direct marketing, classifying stars, diseases, web-pages, … September 27, 2024 Data Mining: Concepts and Techniques 22
  • 23.
    Various forms ofa classification model September 27, 2024 Data Mining: Concepts and Techniques 23
  • 24.
    Data Mining Function:(4) Cluster Analysis  Unsupervised learning (i.e., Class label is unknown)  Group data to form new categories (i.e., clusters), e.g., cluster houses to find distribution patterns  Data objects are clustered or grouped based on the principle of maximizing intraclass similarity and minimizing interclass similarity September 27, 2024 Data Mining: Concepts and Techniques 24
  • 25.
    Data Mining Function:(4) Cluster Analysis September 27, 2024 Data Mining: Concepts and Techniques 25
  • 26.
    Data Mining Function:(5) Outlier Analysis  Outlier analysis (anomaly mining)  Outlier: A data object that does not comply with the general behaviour of the data  Noise or exception? ― One person’s garbage could be another person’s treasure  Methods: by product of clustering or regression analysis, …  Useful in fraud detection, rare events analysis September 27, 2024 Data Mining: Concepts and Techniques 26
  • 27.
    September 27, 2024 DataMining: Concepts and Techniques 27 Are All the “Discovered” Patterns Interesting?  Data mining may generate thousands of patterns: Not all of them are interesting  Suggested approach: Human-centered, query-based, focused mining  Interestingness measures  A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm  Objective vs. subjective interestingness measures  Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.  Subjective: based on user’s belief in the data, e.g. large earthquake often follows a cluster of small earthquake.
  • 28.
    September 27, 2024 DataMining: Concepts and Techniques 28 Find All and Only Interesting Patterns?  Find all the interesting patterns: Completeness  Can a data mining system find all the interesting patterns? Do we need to find all of the interesting patterns?  Association vs. classification vs. clustering  Search for only interesting patterns: An optimization problem  Can a data mining system find only the interesting patterns?  Approaches  First generate all the patterns and then filter out the uninteresting ones  Generate only the interesting
  • 29.
    Technologies Used  Asa highly application-driven domain, data mining has incorporated many techniques from other domains  The interdisciplinary nature of data mining research and development contributes significantly to the success of data mining and its extensive applications September 27, 2024 Data Mining: Concepts and Techniques 29
  • 30.
    September 27, 2024 DataMining: Concepts and Techniques 30 Data Mining: Confluence of Multiple Disciplines Data Mining Machine Learning Statistics Applications Algorithm Pattern Recognition High-Performance Computing Visualization Database Technology
  • 31.
    Data Mining: Confluenceof Multiple Disciplines  Statistics  Statistical models are widely used to model data and data classes.  Eg. We can use statistics to model noise and missing data.  Machine learning  Computer programs automatically learn to recognize complex patterns and make intelligent decisions based on data.  e.g. Handwritten postal codes September 27, 2024 Data Mining: Concepts and Techniques 31
  • 32.
    September 27, 2024 DataMining: Concepts and Techniques 32 Why Confluence of Multiple Disciplines?  Tremendous amount of data  Algorithms must be highly scalable to handle such as tera-bytes of data  High-dimensionality of data  Micro-array may have tens of thousands of dimensions  High complexity of data  Data streams and sensor data  Time-series data, temporal data, sequence data  Structure data, graphs, social networks and multi-linked data  Heterogeneous databases and legacy databases  Spatial, spatiotemporal, multimedia, text and Web data  Software programs, scientific simulations  New and sophisticated applications
  • 33.
    September 27, 2024 DataMining: Concepts and Techniques 33 Major Issues in Data Mining  Mining methodology  Mining different kinds of knowledge from diverse data types, e.g., files in pdf or doc  Mining knowledge in multi-dimensional space.  Data mining: An interdisciplinary effort( mine data with NL text)  Pattern evaluation: the interestingness problem  Handling noise, uncertainty, and incompleteness of data  Integration of the discovered knowledge with existing one: knowledge fusion  Pattern evaluation and pattern- or constraint-guided mining
  • 34.
    September 27, 2024 DataMining: Concepts and Techniques 34 Major Issues in Data Mining (1)  User interaction  Interactive mining( dynamically change focus of search)  Incorporation of background knowledge(constraints, rules)  presentation and visualization of data mining results  Efficiency and Scalability  Efficiency and scalability of data mining algorithms(run time … predictable,short,acceptable)  Parallel, distributed, stream, and incremental mining methods  Diversity of data types  Handling complex types of data(simple to temporal data objects)  Mining dynamic, networked, and global data repositories
  • 35.
    September 27, 2024 DataMining: Concepts and Techniques 35 Major Issues in Data Mining (2)  Data mining and society  Social impacts of data mining(benefit to society)  Privacy-preserving data mining  Invisible data mining(system have buit in function.. click of mouse)
  • 36.
    September 27, 2024 DataMining: Concepts and Techniques 36 Architecture: Typical Data Mining System data cleaning, integration, and selection Database or Data Warehouse Server Data Mining Engine Pattern Evaluation Graphical User Interface Know ledge -Base Database Data Warehouse World-Wide Web Other Info Repositories
  • 37.
    September 27, 2024 DataMining: Concepts and Techniques 37 Summary  Data mining: Discovering interesting patterns from large amounts of data  A natural evolution of database technology, in great demand, with wide applications  A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation  Mining can be performed in a variety of information repositories  Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.  Data mining systems and architectures  Major issues in data mining

Editor's Notes

  • #7 Information refers to the processed data. This means that the data, facts, figures when acted upon to derive certain solutions and that processed data acts as an information. This information could later be utilized for decisions making and taking actions. Knowledge is what gets imbibed after taking the information in our minds. From the above diagram, it could be said that data and the information help in enhancing our knowledge and then based on the understanding, we take or make decisions.
  • #32 Add a definition/description of “traditional data analysis”.