Data Mining
Ajith G.S: poposir.orgfree.com
DATA MINING
• Extracting Knowledge
• Knowledge mining from data
• Knowledge Discovery from Data (KDD)
• KDD Process Steps
• 1) Data cleaning – remove noise and inconsistent data
• 2) Data integration – combine multiple data sources
• 3) Data selection – retrieve the data relevant to the analysis
• 4) Data transformation – convert the data into a form suitable for mining
• 5) Data mining – apply methods to extract data patterns
• 6) Pattern evaluation – identify the interesting patterns that represent knowledge
• 7) Knowledge presentation – present the mined knowledge using visualization techniques
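The seven steps can be read as a chain of small functions, each feeding the next. The sketch below is only an illustration under assumed data: the two sources, the field names and the thresholds are all hypothetical, and the "mining" step is deliberately trivial.

```python
from statistics import mean

# Two hypothetical sources with overlapping, partly dirty records.
store_a = [{"customer": "ann", "amount": "120"}, {"customer": "bob", "amount": None}]
store_b = [{"customer": "ann", "amount": "80"}, {"customer": "cid", "amount": "300"}]

def clean(records):
    """Step 1: drop records with a missing amount."""
    return [r for r in records if r["amount"] is not None]

def integrate(*sources):
    """Step 2: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records):
    """Step 3: keep only the attributes relevant to the analysis."""
    return [(r["customer"], r["amount"]) for r in records]

def transform(pairs):
    """Step 4: convert amounts to numbers and total them per customer."""
    totals = {}
    for customer, amount in pairs:
        totals[customer] = totals.get(customer, 0.0) + float(amount)
    return totals

def mine(totals):
    """Step 5: a trivial 'pattern': customers who spend above the average."""
    threshold = mean(totals.values())
    return {c: t for c, t in totals.items() if t > threshold}

def evaluate(patterns, min_total=250.0):
    """Step 6: keep only patterns that pass an interestingness threshold."""
    return {c: t for c, t in patterns.items() if t >= min_total}

def present(patterns):
    """Step 7: render the surviving patterns as readable lines."""
    return [f"{c}: total spend {t:.0f}" for c, t in sorted(patterns.items())]

data = integrate(clean(store_a), clean(store_b))
report = present(evaluate(mine(transform(select(data)))))
print(report)
```

In a real system each step would be far more involved; the point is only the order in which the data passes through them.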
• Data mining is one step in the knowledge discovery process
• Architecture of data mining system
• Components are
• Database, data warehouse, World Wide Web, or other information repository
• - data cleaning and integration techniques may be performed on the data
• Database or data warehouse server
• - responsible for fetching the data relevant to the user's mining request
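• Knowledge base
• - domain knowledge used to guide the search and to evaluate the interestingness of the resulting patterns
• Data mining engine
• - functional modules for tasks such as characterization, association and correlation analysis, classification, ...
• Pattern evaluation module
• - applies interestingness measures to select the needed patterns
• User interface
• - communication between the user and the data mining system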
• Data mining deals with a number of different data repositories on which mining can be performed.
• It is applicable to any kind of repository, as well as to data streams.
• Data repositories include
• Relational Databases
• Data Warehouses
• Transactional Databases
• Advanced database systems
• Flat files
• Data streams
• WWW
Data Mining- On What Kinds of Data
• Advanced database systems like
• Object relational databases
• Temporal, sequence and time-series databases
• Spatial databases
• Multimedia databases
• Relational Databases
• DBMS - a collection of interrelated data plus a set of software programs to access and manage the data
• Relational database - a collection of tables, each of which is assigned a unique name
• Each table consists of a set of attributes and usually stores a large set of tuples
• Each tuple represents an object identified by a unique key and described by a set of attribute values
• Relational data can be accessed with a relational query language such as SQL, or with the assistance of a GUI.
• A given query is transformed into relational operations such as join, selection and projection
• Data mining in relational databases: searching for trends or data patterns. Example: predicting the credit risk of new customers based on the data available in the database.
• Relational databases are among the most commonly available and richest information repositories.
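To make join, selection and projection concrete, here is a small sketch using Python's built-in sqlite3 module. The `customer` and `loan` tables, their columns and their rows are invented for illustration and are not taken from the slides.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, income REAL);
    CREATE TABLE loan (loan_id INTEGER PRIMARY KEY, cust_id INTEGER, defaulted INTEGER);
    INSERT INTO customer VALUES (1, 'Ann', 52000), (2, 'Bob', 18000), (3, 'Cid', 75000);
    INSERT INTO loan VALUES (10, 1, 0), (11, 2, 1), (12, 3, 0);
""")

rows = conn.execute("""
    SELECT c.name, c.income                      -- projection: keep two attributes
    FROM customer AS c
    JOIN loan AS l ON l.cust_id = c.cust_id      -- join on the shared key
    WHERE l.defaulted = 1                        -- selection: keep matching tuples
    ORDER BY c.name
""").fetchall()
conn.close()
print(rows)
```

A query like this only retrieves stored facts. A data mining step would go further, for example by learning from such rows which attribute values tend to go with a default.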
• Data Warehouse
• It is a repository of information collected from multiple sources, stored under a unified schema, that usually resides at a single site.
• Constructed through data cleaning, integration, transformation, loading and periodic data refreshing.
• Rather than storing every detail, a data warehouse may store a summary of the data from a historical perspective.
• Multidimensional database structure. Dimension: an attribute or a set of attributes in the schema. Cell: stores the value of an aggregate measure
• Usually modeled by a multidimensional data cube.
• Data mart: a departmental subset of a data warehouse that focuses on selected subjects
• OLAP operations: roll-up, drill-down
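Roll-up can be pictured as summing a measure over the dimensions that are dropped, and drill-down as going back to the finer view. The sketch below assumes a tiny hypothetical cube of units sold by city, quarter and item; the names and numbers are made up.

```python
from collections import defaultdict

# Hypothetical cube cells: (city, quarter, item) -> units sold.
cube = {
    ("Kochi", "Q1", "printer"): 10,
    ("Kochi", "Q2", "printer"): 15,
    ("Kochi", "Q1", "computer"): 7,
    ("Delhi", "Q1", "printer"): 20,
    ("Delhi", "Q2", "computer"): 5,
}
DIMENSIONS = ("city", "quarter", "item")

def roll_up(cells, keep):
    """Aggregate the measure over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    out = defaultdict(int)
    for key, units in cells.items():
        out[tuple(key[i] for i in idx)] += units
    return dict(out)

by_city = roll_up(cube, keep=("city",))               # coarse view
by_city_item = roll_up(cube, keep=("city", "item"))   # finer view (drill-down from by_city)
print(by_city)
print(by_city_item)
```

Moving from `by_city_item` to `by_city` is a roll-up; reading them in the other order corresponds to a drill-down.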
• Typical framework of a data warehouse (figure not reproduced)
• Multidimensional data cube (figure not reproduced)
• Transactional Database
• Consists of a file where each record represents a transaction.
• Each record includes a unique transaction identity number and a list of the items making up the transaction
• Example: for a sales transactional database, “Which items sold well together?” Data mining on transactional data answers this by identifying frequent itemsets
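The "which items sold well together?" question can be sketched by counting how often each pair of items shares a transaction. This is a brute-force illustration over invented transactions, not an efficient algorithm such as Apriori; the transaction IDs, items and the `min_count` threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactional database: transaction ID -> items bought.
transactions = {
    "T100": ["milk", "bread", "butter"],
    "T200": ["milk", "bread"],
    "T300": ["bread", "jam"],
    "T400": ["milk", "bread", "jam"],
}

def frequent_pairs(db, min_count):
    """Return item pairs that occur together in at least `min_count` transactions."""
    counts = Counter()
    for items in db.values():
        # Sort so that each pair has a single canonical ordering.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

pairs = frequent_pairs(transactions, min_count=2)
print(pairs)
```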
• Advanced Data and Information Systems and Advanced
Applications
• Object Relational Databases
• Temporal Databases, Sequence Databases and Time-Series Databases
• Spatial Databases and spatio-temporal databases
• Text Databases and Multimedia Databases
• Heterogeneous Databases and legacy Databases
• Data Streams
• WWW
• Object Relational Databases
• Handles complex objects
• Each entity is considered an object: individual items, employees, etc.
• Data and code relating to an object are encapsulated into a single unit
• Each object has
• A set of variables: its attributes
• A set of messages to communicate with other objects
• A set of methods: each holds the code that implements a message
• Object class: a group of objects that share a common set of properties
• Each object is an instance of a class.
• Temporal Databases, Sequence Databases and Time-Series Databases
• Temporal databases handle data involving time: they store relational data that include time-related attributes
• Sequence databases store sequences of ordered events, with or without a concrete notion of time. Example: customer shopping sequences
• Time-series databases store sequences of values or events obtained over repeated measurements of time. Example: data collected from the stock exchange.
• Data mining techniques can be used to find the trends of change for objects in the database.
• Spatial Databases and Spatio-Temporal Databases
• A spatial database contains objects defined in a geometric space. Examples: maps, CAD databases
• Data mining can be used to examine the relationships among a set of spatial objects
• Spatio-temporal databases: spatial databases that store spatial objects that change with time. Example: tracking of moving vehicles
• Text Databases and Multimedia Databases
• Text databases contain word descriptions of objects, typically long sentences or paragraphs. Example: product specifications
• Text databases may be highly unstructured (web pages on the WWW), semi-structured (email) or well structured.
• By mining text data we can uncover general and concise descriptions of the text documents, keywords, etc.
• Multimedia databases store image, audio and video data, and must support large objects
• Heterogeneous Databases and Legacy Databases
• A heterogeneous database consists of a set of interconnected component databases, where the objects in the component databases may differ greatly.
• A legacy database is a group of heterogeneous databases
• Information exchange across these databases is very difficult because of their diverse semantics. Data mining can help by transforming the data into higher, more generalized levels
• Data Streams
• A new kind of data, where the data flow into and out of an observation platform dynamically.
• Example: video surveillance
• Data streams are normally not stored in any kind of repository, which poses challenges for management and analysis
• Uses a continuous query model
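One way to picture a continuous query is a function that updates its answer every time a new reading arrives, while keeping only a small window of recent data instead of storing the stream. The sketch below is a minimal illustration; the window size and the readings are assumptions.

```python
from collections import deque

def sliding_average(stream, window):
    """Yield the mean of the last `window` readings after each new arrival."""
    recent = deque(maxlen=window)  # older readings fall out automatically
    for reading in stream:
        recent.append(reading)
        yield sum(recent) / len(recent)

# A finite iterator stands in for an unbounded stream of sensor readings.
readings = iter([10, 20, 30, 40])
averages = list(sliding_average(readings, window=2))
print(averages)
```

Because only `window` readings are held at a time, memory use stays constant no matter how long the stream runs.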
• World Wide Web
• Data objects are linked together to facilitate interactive access.
• An opportunity as well as a challenge for data mining
• Web usage mining: capturing user access patterns in a distributed information environment
• Keyword-based search offers only limited help to users
• Authoritative web page analysis: ranks web pages based on their importance
• Automated web page clustering and classification: arranges web pages based on their contents
• Web community analysis: identifies hidden social networks and communities
• What kinds of patterns can be mined?
• Data mining functionalities are used to specify the kinds of patterns to be found in data mining tasks.
• Tasks can be classified into two categories:
• Descriptive: characterize the general properties of the data in the database
• Predictive: perform inference on the current data in order to make predictions
Data Mining Functionalities
• Concept/ Class Description: Characterization and
Discrimination
• Mining frequent Patterns, Association and Correlations
• Classification and Prediction
• Cluster Analysis
• Outlier Analysis
• Evolution Analysis
• Concept/Class Description: Characterization and Discrimination
• Data can be associated with classes or concepts.
• Example:
• classes of items for sale - computers and printers
• concepts of customers - big spenders and budget spenders
• Individual classes and concepts can be described in summarized, concise and yet precise terms.
• Such descriptions of a class or a concept are called class/concept descriptions
• These descriptions can be derived via
• Data characterization − summarizing the data of the class under study (the target class)
• Data discrimination − comparing the target class with one or a set of comparative classes (the contrasting classes)
• Both of the above methods together
• Mining Frequent Patterns
• Frequent patterns are patterns that occur frequently in data, for example in transactional data.
• Frequent itemset − a set of items that frequently appear together, e.g. milk and bread
• Frequent subsequence − a sequence of events that occurs frequently, e.g. the purchase of a camera followed by a memory card
• Frequent substructure − a substructure refers to different structural forms, such as graphs, trees or lattices, which may be combined with itemsets or subsequences.
• Mining frequent patterns leads to the discovery of interesting associations and correlations within the data
• Association and Correlations
• Association Rules: 2 types
• Single dimensional association rules
• Multi-dimensional association rules
• Association rules are discarded as uninteresting if they do not satisfy both a minimum support threshold and a minimum confidence threshold.
• Confidence: the certainty of the rule, i.e. how often transactions that contain the antecedent also contain the consequent
• Support: how frequently the items in the rule appear together in the database
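Support and confidence for a rule A ⇒ B can be computed directly from their definitions: support is the fraction of transactions containing both A and B, and confidence is that support divided by the support of A alone. The sketch below uses four invented baskets; the function names and thresholds are illustrative only.

```python
def support(db, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in db) / len(db)

def confidence(db, antecedent, consequent):
    """support(A and B) / support(A) for the rule A => B."""
    both = set(antecedent) | set(consequent)
    return support(db, both) / support(db, antecedent)

def is_interesting(db, antecedent, consequent, min_sup, min_conf):
    """A rule is kept only if it meets BOTH thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(db, both) >= min_sup
            and confidence(db, antecedent, consequent) >= min_conf)

# Hypothetical market baskets.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk"},
    {"bread"},
]
s = support(baskets, {"milk", "bread"})
c = confidence(baskets, {"milk"}, {"bread"})
print(s, c)
```

Here milk and bread appear together in 2 of 4 baskets, and 2 of the 3 baskets with milk also contain bread.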
• Classification
• Classification is the process of finding a model that describes and distinguishes data classes or concepts.
• The derived model is based on the analysis of a set of training data, i.e. objects whose class labels are known
• The model is then used to predict the class of objects whose class label is unknown.
• The derived model can be presented in the following forms −
• (IF-THEN) Rules
• Decision Trees
• Mathematical Formulae
• Neural Networks
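As a small illustration of learning a model from labelled training data and then applying it to unlabelled objects, the sketch below learns a single IF-THEN rule (a one-level decision tree) on one attribute. The income figures, the "safe"/"risky" labels and the assumption that higher income means "safe" are all hypothetical.

```python
# Hypothetical training data: (income, known class label).
training = [
    (18000, "risky"),
    (22000, "risky"),
    (30000, "risky"),
    (40000, "safe"),
    (55000, "safe"),
]

def classify(income, cut):
    """The learned rule: IF income >= cut THEN safe ELSE risky."""
    return "safe" if income >= cut else "risky"

def learn_threshold(samples):
    """Pick the income cut-off that misclassifies the fewest training samples."""
    best_cut, best_errors = None, len(samples) + 1
    for cut in sorted({income for income, _ in samples}):
        errors = sum(classify(income, cut) != label for income, label in samples)
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut

cut = learn_threshold(training)
print(f"IF income >= {cut} THEN safe ELSE risky")
print(classify(60000, cut), classify(25000, cut))
```

Real classifiers consider many attributes and guard against overfitting; this only shows the train-then-predict shape of the task.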
• Prediction
• Models continuous-valued functions
• It is used to predict missing or unavailable numerical data values rather than class labels.
• Regression analysis is the statistical methodology most often used for numeric prediction.
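A minimal example of numeric prediction is simple linear regression fitted by least squares. The sketch below assumes a made-up data set of years of experience against salary (in thousands) and predicts the value for an unseen input.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: years of experience -> salary in thousands.
years = [1, 2, 3, 4]
salary = [30.0, 35.0, 40.0, 45.0]

slope, intercept = fit_line(years, salary)
predicted = slope * 5 + intercept  # predict the unavailable value for year 5
print(slope, intercept, predicted)
```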
• Cluster Analysis
• Analyzes data objects without consulting a known class label
• The objects are clustered or grouped based on the principle of “maximizing the intra-class similarity and minimizing the inter-class similarity”
• Objects within a cluster are highly similar to one another but dissimilar to objects in other clusters
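The slides do not name a clustering algorithm; as one common choice, the sketch below uses k-means on one-dimensional data. The ages and the two starting centers are assumptions picked so that the grouping is easy to follow.

```python
def k_means_1d(values, centers, iterations=10):
    """Plain k-means on numbers: assign to the nearest center, then re-average."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical customer ages with no class labels attached.
ages = [21, 23, 22, 45, 47, 46]
centers, clusters = k_means_1d(ages, centers=[20.0, 50.0])
print(centers, clusters)
```

No labels are consulted at any point: the two groups emerge only from how close the values are to each other.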
• Outlier Analysis
• Outliers: data objects in a database that do not comply with the general behavior or model of the data.
• In some applications, such as fraud detection, the rare events can be more interesting than the regularly occurring ones. The analysis of outlier data is called outlier mining
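One simple statistical way to flag outliers, among many, is to mark values that lie more than a chosen number of standard deviations from the mean. The sketch below assumes invented transaction amounts and a threshold of 2; both are illustrative choices rather than anything prescribed by the slides.

```python
from statistics import mean, pstdev

def z_score_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds `threshold` standard deviations."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts; one is far from the rest.
amounts = [52, 48, 50, 51, 49, 50, 47, 53, 50, 400]
suspicious = z_score_outliers(amounts)
print(suspicious)
```

Because a single extreme value also inflates the mean and standard deviation, this test becomes unreliable on very small samples; more robust measures exist for that case.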
• Evolution Analysis
• Evolution analysis describes and models regularities or trends for objects whose behavior changes over time.
Data Mining Classification of Data Mining System
• Classification according to the kinds of databases mined
• Data models (relational, transactional, object-relational)
• Types of data (spatial, time-series, text, stream, multimedia, WWW)
• Classification according to the kinds of knowledge mined
• Based on different data mining functionalities
• According to the level of abstraction of knowledge mined
• According to the regularity or irregularity of data that is mined
• Classification according to the kinds of techniques utilized
• Degree of user interaction involved
• Methods of data analysis employed (database-oriented, data warehouse-oriented, etc.)
• Classification according to the applications adapted
• Finance
• Telecommunications
• DNA analysis
• Each user has a data mining task in mind, which is submitted to the system as a data mining query
• A query is defined in terms of data mining task primitives, which allow the user to interact with the data mining system.
• DMQL: Data Mining Query Language
Data Mining Task Primitives
• The primitives specify
• The set of task relevant data to be mined
• Specifies the portions of database or the set of data in which the
user is interested
• It includes
• Database or data warehouse name
• Database tables or Data warehouse cubes
• Conditions for data selection
• Relevant attributes or dimensions
• Data grouping criteria
• The kind of knowledge to be mined
• Specifies the data mining functions to be performed
• Characterization
• Discrimination
• Association/ Correlation
• Classification/Prediction
• Clustering
• Outlier or Evolution Analysis
• The background knowledge to be used in the discovery process
• Knowledge about the domain to be mined
• Guides the knowledge discovery process and evaluations of
the patterns found
• User beliefs regarding the relationships in the data
• The interestingness measures and thresholds for pattern evaluation
• Used to guide the mining process or to evaluate the discovered patterns
• Different kinds of knowledge have different interestingness measures
• e.g. for association rules
• Support
• Confidence
• The expected representation for visualizing the discovered patterns
• Refers to the form in which discovered patterns are to be displayed
• Rules
• Tables
• Charts
• Graphs
• Decision Trees
• Cubes
• Integration of Data Mining System with Database or Data
Warehouse System
• When a DM system works in an environment, it needs to communicate with other information system components, such as a DB or DW system
• The possible integration schemes are
• No coupling
• Loose coupling
• Semi-tight coupling
• Tight coupling
• No coupling
• The DM system does not use any facility of a DB/DW system
• It fetches data from a particular source (such as a file), processes the data and stores the results in another file.
• Simple integration scheme
• Drawbacks
• Time is wasted on preprocessing the data
• Other tools must be used to extract the data
• Poor design
• Loose coupling
• The DM system uses some facilities of a DB/DW system
• It fetches data from a data repository, processes the data and stores the results in a DB or DW
• The data is fetched using query processing, indexing and other DB/DW system facilities
• Drawback
• It is difficult to achieve high scalability and good performance with large data sets
• Semi-tight coupling
• Efficient implementations of a few essential data mining primitives are provided in the DB/DW system
• Sorting
• Indexing
• Aggregation
• Histogram analysis
• Precomputation of statistical measures
• Some frequently used intermediate mining results can also be precomputed and stored in the DB/DW system.
• This design enhances the performance of the DM system
• Tight coupling
• The DM system is smoothly integrated into the DB/DW system
• The DM system is treated as one functional component of an information system
• Data mining queries and functions are optimized based on the data structures, indexing schemes and query processing methods of the DB/DW system.
Issues in Data Mining
• Data mining is not an easy task
• The algorithms used can be very complex, and the data is not always available in one place
• The data needs to be integrated from various heterogeneous data sources.
• Common issues are
• Mining methodology and user interaction issues
• Performance issues
• Issues related to the diversity of database types
Issues in Data Mining: Mining Methodology and User Interaction Issues
• Mining different kinds of knowledge in databases
• Different users may be interested in different kinds of knowledge, so data mining should cover a broad range of knowledge discovery tasks (classification, clustering, ...)
• These tasks may use the same database in different ways
• Interactive mining of knowledge at multiple levels of abstraction
• The data mining process needs to be interactive: this allows users to focus the search for patterns, providing and refining data mining requests based on the returned results.
• It enables the user to view the data from different angles and at different levels of abstraction
• Incorporation of background knowledge (knowledge about the domain under study)
• Background knowledge can be used to guide the discovery process and to express the discovered patterns, not only in concise terms but at multiple levels of abstraction.
• Data mining query languages and ad hoc data mining
• Data mining query languages that allow the user to describe ad hoc mining tasks should be developed.
• These languages should be integrated with a database or data warehouse query language and optimized for efficient and flexible data mining.
• Presentation and visualization of data mining results
• Once patterns are discovered, they need to be expressed in high-level languages and visual representations.
• These representations should be easily understandable
• Handling noisy and incomplete data
• Data cleaning methods are required to handle noise and incomplete objects while mining data regularities.
• Without data cleaning methods, the accuracy of the discovered patterns will be poor
• Pattern evaluation
• Discovered patterns may be uninteresting because they either represent common knowledge or lack novelty
• Interestingness measures or user-specified constraints are needed to guide the discovery process and reduce the search space.
Issues in Data Mining: Performance Issues
• Efficiency and scalability of data mining algorithms
• To extract information effectively from the huge amounts of data in databases, data mining algorithms must be efficient and scalable
• The running time must be predictable and acceptable for large databases.
• Parallel, distributed and incremental mining algorithms
• These algorithms divide the data into partitions, which are processed in parallel.
• The results from the partitions are then merged.
• Incremental algorithms incorporate database updates without mining the entire data again from scratch.
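The partition, process-in-parallel and merge idea, along with an incremental update, can be sketched with item counting as the mining step. Everything here is illustrative: the transactions are invented, and a thread pool is used only to keep the example short (a process pool or a cluster would be the usual choice for heavy workloads).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_items(partition):
    """Mine one partition: count in how many transactions each item occurs."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts

def mine_in_partitions(db, n_parts):
    """Split the data, count each partition in parallel, then merge the results."""
    parts = [db[i::n_parts] for i in range(n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partial = list(pool.map(count_items, parts))
    return sum(partial, Counter())

# Hypothetical transactions.
db = [["milk", "bread"], ["milk"], ["bread", "jam"], ["milk", "bread"]]
counts = mine_in_partitions(db, n_parts=2)

# Incremental step: fold in newly arrived transactions without recounting `db`.
counts += count_items([["jam", "milk"]])
print(counts)
```

Merging works here because counts from disjoint partitions simply add up; mining tasks whose partial results do not combine so easily need more careful merge logic.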
Issues in Data Mining: Issues Relating to the Diversity of Database Types
• Handling of relational and complex types of data
• A database may contain complex data objects, multimedia data objects, spatial data, temporal data, etc.
• It is unrealistic to expect one system to mine all these kinds of data.
• Mining information from heterogeneous databases and global information systems
• The data is available from different data sources on a LAN or WAN.
• These data sources may be structured, semi-structured or unstructured.
• Mining knowledge from them therefore adds challenges to data mining.
• When
Ajith G.S: poposir.orgfree.com
Issues in Data Mining

More Related Content

What's hot

Odbms concepts
Odbms conceptsOdbms concepts
Odbms concepts
Dabbal Singh Mahara
 
Introduction to database
Introduction to databaseIntroduction to database
Introduction to database
Pongsakorn U-chupala
 
11 Database Concepts
11 Database Concepts11 Database Concepts
11 Database Concepts
Praveen M Jigajinni
 
1.2 steps and functionalities
1.2 steps and functionalities1.2 steps and functionalities
1.2 steps and functionalities
Krish_ver2
 
Database , 4 Data Integration
Database , 4 Data IntegrationDatabase , 4 Data Integration
Database , 4 Data Integration
Ali Usman
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
DataminingTools Inc
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
Gajanand Sharma
 
Types of Database Models
Types of Database ModelsTypes of Database Models
Types of Database Models
Murassa Gillani
 
File organization 1
File organization 1File organization 1
File organization 1
Rupali Rana
 
Data mining query language
Data mining query languageData mining query language
Data mining query language
GowriLatha1
 
Data mining tasks
Data mining tasksData mining tasks
Data mining tasks
Khwaja Aamer
 
DBMS
DBMSDBMS
Advanced DBMS presentation
Advanced DBMS presentationAdvanced DBMS presentation
Advanced DBMS presentation
Hindustan Petroleum
 
Distributed Database System
Distributed Database SystemDistributed Database System
Distributed Database System
Sulemang
 
Database & Database Users
Database & Database UsersDatabase & Database Users
Database & Database Users
M.Zalmai Rahmani
 
Object Based Databases
Object Based DatabasesObject Based Databases
Object Based Databases
Farzad Nozarian
 
IBM DB2
IBM DB2IBM DB2
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
hktripathy
 
Chapter 2 database environment
Chapter 2 database environmentChapter 2 database environment
Chapter 2 database environment
>. <
 
Object oriented databases
Object oriented databasesObject oriented databases
Object oriented databases
Sajith Ekanayaka
 

What's hot (20)

Odbms concepts
Odbms conceptsOdbms concepts
Odbms concepts
 
Introduction to database
Introduction to databaseIntroduction to database
Introduction to database
 
11 Database Concepts
11 Database Concepts11 Database Concepts
11 Database Concepts
 
1.2 steps and functionalities
1.2 steps and functionalities1.2 steps and functionalities
1.2 steps and functionalities
 
Database , 4 Data Integration
Database , 4 Data IntegrationDatabase , 4 Data Integration
Database , 4 Data Integration
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Types of Database Models
Types of Database ModelsTypes of Database Models
Types of Database Models
 
File organization 1
File organization 1File organization 1
File organization 1
 
Data mining query language
Data mining query languageData mining query language
Data mining query language
 
Data mining tasks
Data mining tasksData mining tasks
Data mining tasks
 
DBMS
DBMSDBMS
DBMS
 
Advanced DBMS presentation
Advanced DBMS presentationAdvanced DBMS presentation
Advanced DBMS presentation
 
Distributed Database System
Distributed Database SystemDistributed Database System
Distributed Database System
 
Database & Database Users
Database & Database UsersDatabase & Database Users
Database & Database Users
 
Object Based Databases
Object Based DatabasesObject Based Databases
Object Based Databases
 
IBM DB2
IBM DB2IBM DB2
IBM DB2
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
 
Chapter 2 database environment
Chapter 2 database environmentChapter 2 database environment
Chapter 2 database environment
 
Object oriented databases
Object oriented databasesObject oriented databases
Object oriented databases
 

Similar to Dm1.1

Unit 3 part i Data mining
Unit 3 part i Data miningUnit 3 part i Data mining
Unit 3 part i Data mining
Dhilsath Fathima
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1
malathieswaran29
 
Web Mining
Web MiningWeb Mining
Web Mining
Mudit Dholakia
 
Web mining
Web miningWeb mining
Web mining
Innovative Pencils
 
Data, Text and Web Mining
Data, Text and Web Mining Data, Text and Web Mining
Data, Text and Web Mining
Jeremiah Fadugba
 
4- DB Ch6 18-3-2020.pptx
4- DB Ch6 18-3-2020.pptx4- DB Ch6 18-3-2020.pptx
4- DB Ch6 18-3-2020.pptx
Shoaibmirza18
 
Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse
Lesa Cote
 
DW (1).ppt
DW (1).pptDW (1).ppt
DW (1).ppt
RahulSingh986955
 
Various Applications of Data Warehouse.ppt
Various Applications of Data Warehouse.pptVarious Applications of Data Warehouse.ppt
Various Applications of Data Warehouse.ppt
RafiulHasan19
 
Compilerpt
CompilerptCompilerpt
Compilerpt
Muhammad Tahir
 
Comparative study of modern databases
Comparative study of modern databasesComparative study of modern databases
Comparative study of modern databases
Anirban Konar
 
Data warehouse introduction
Data warehouse introductionData warehouse introduction
Data warehouse introduction
Murli Jha
 
Foundations of business intelligence databases and information management
Foundations of business intelligence databases and information managementFoundations of business intelligence databases and information management
Foundations of business intelligence databases and information management
Amity University | FMS - DU | IMT | Stratford University | KKMI International Institute | AIMA | DTU
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
Laurent Leturgez
 
Data warehouseold
Data warehouseoldData warehouseold
Data warehouseold
Shwetabh Jaiswal
 
Managing Your Research Data
Managing Your Research DataManaging Your Research Data
Managing Your Research Data
Kristin Briney
 
Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL
David Smelker
 
Combining Data Mining and Machine Learning for Effective User Profiling
Combining Data Mining and Machine Learning for Effective User ProfilingCombining Data Mining and Machine Learning for Effective User Profiling
Combining Data Mining and Machine Learning for Effective User Profiling
CodePolitan
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
AttaUrRahman78
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
ReyersonMax
 

Similar to Dm1.1 (20)

Unit 3 part i Data mining
Unit 3 part i Data miningUnit 3 part i Data mining
Unit 3 part i Data mining
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1
 
Web Mining
Web MiningWeb Mining
Web Mining
 
Web mining
Web miningWeb mining
Web mining
 
Data, Text and Web Mining
Data, Text and Web Mining Data, Text and Web Mining
Data, Text and Web Mining
 
4- DB Ch6 18-3-2020.pptx
4- DB Ch6 18-3-2020.pptx4- DB Ch6 18-3-2020.pptx
4- DB Ch6 18-3-2020.pptx
 
Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse
 
DW (1).ppt
DW (1).pptDW (1).ppt
DW (1).ppt
 
Various Applications of Data Warehouse.ppt
Various Applications of Data Warehouse.pptVarious Applications of Data Warehouse.ppt
Various Applications of Data Warehouse.ppt
 
Compilerpt
CompilerptCompilerpt
Compilerpt
 
Comparative study of modern databases
Comparative study of modern databasesComparative study of modern databases
Comparative study of modern databases
 
Data warehouse introduction
Data warehouse introductionData warehouse introduction
Data warehouse introduction
 
Foundations of business intelligence databases and information management
Foundations of business intelligence databases and information managementFoundations of business intelligence databases and information management
Foundations of business intelligence databases and information management
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Data warehouseold
Data warehouseoldData warehouseold
Data warehouseold
 
Managing Your Research Data
Managing Your Research DataManaging Your Research Data
Managing Your Research Data
 
Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL Colorado Springs Open Source Hadoop/MySQL
Colorado Springs Open Source Hadoop/MySQL
 
Combining Data Mining and Machine Learning for Effective User Profiling
Combining Data Mining and Machine Learning for Effective User ProfilingCombining Data Mining and Machine Learning for Effective User Profiling
Combining Data Mining and Machine Learning for Effective User Profiling
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
 

Dm1.1

  • 1. Data Mining Ajith G.S: poposir.orgfree.com DATA MINING
  • 2. • Extracting Knowledge • Knowledge mining from data • Knowledge Discovery from Data (KDD) Ajith G.S: poposir.orgfree.com Data Mining
  • 4. • KDD Process Steps • 1) Data Cleaning • 2) Data Integration • 3) Data Selection • 4) Data Transformation • 5) Data Mining • 6) Pattern Evaluation • 7) Knowledge Presentation Ajith G.S: poposir.orgfree.com Data Mining
  • 5. • KDD Process Steps • 1) Data Cleaning – remove noise and inconsistent data • 2) Data Integration – combine multiple data sources • 3) Data Selection – select the data relevant to the analysis • 4) Data Transformation – convert the data into the format needed for mining • 5) Data Mining – apply methods to extract data patterns • 6) Pattern Evaluation – select the patterns that represent knowledge • 7) Knowledge Presentation – present the mined knowledge using different visualization techniques Ajith G.S: poposir.orgfree.com Data Mining
  • 6. • Data mining is one step in the knowledge discovery process Ajith G.S: poposir.orgfree.com Data Mining
  • 7. • Architecture of data mining system • . Ajith G.S: poposir.orgfree.com Data Mining
  • 8. • Architecture of data mining system • Components are • Database, data warehouse, World Wide Web, or other information repository • - data cleaning and integration techniques may be performed on the data • Database or data warehouse server • - responsible for fetching the needed data Ajith G.S: poposir.orgfree.com Data Mining
  • 9. • Architecture of data mining system • Knowledge base • - used to guide the search • Data mining engine • - performs tasks such as characterization, association, correlation analysis, classification, etc. • Pattern evaluation module • - selects the needed patterns • User interface • - handles communication with the user Ajith G.S: poposir.orgfree.com Data Mining
  • 10. • Data mining deals with a number of different data repositories on which mining can be performed. • It can be applied to any kind of repository as well as to data streams. • Data repositories include • Relational Databases • Data Warehouses • Transactional Databases • Advanced database systems • Flat files • Data streams • WWW Ajith G.S: poposir.orgfree.com Data Mining- On What Kinds of Data
  • 11. • Advanced database systems like • Object relational databases • Temporal, sequence and time series databases • Spatial databases • Multimedia databases Ajith G.S: poposir.orgfree.com Data Mining- On What Kinds of Data
  • 12. • Relational Databases • DBMS - a collection of interrelated data plus a set of software programs to access and manage the data • Relational database - a collection of tables, each of which is assigned a unique name • Each table consists of a set of attributes and stores a large set of tuples • A tuple represents an object identified by a unique key and described by a set of attribute values Ajith G.S: poposir.orgfree.com Data Mining- On What Kinds of Data
  • 13. • Relational Databases • Relational data can be accessed with a relational query language such as SQL or with the assistance of a GUI. • A given query is transformed into relational operations such as join, selection and projection • Data mining in a relational database: searching for data patterns. Example: predicting the credit risk of new customers based on the data available in the database. • Relational databases are among the most commonly available and richest information repositories. Ajith G.S: poposir.orgfree.com Data Mining- On What Kinds of Data
  • 14. • Data Warehouse • A repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site. • Constructed using data cleaning, integration, transformation, loading and periodic data refreshing. • Rather than storing detailed records, a data warehouse may store a summary of the data from a historical perspective. Ajith G.S: poposir.orgfree.com Data Mining- On What Kinds of Data
  • 15. • Data Warehouse • Multidimensional database structure: Dimension - an attribute or a set of attributes in the schema; Cell - an aggregate measure • Usually modeled by a multidimensional data cube. • Data mart: a department-level subset of a data warehouse that focuses on selected subjects • OLAP operations: roll-up, drill-down Ajith G.S: poposir.orgfree.com Data Mining- On What Kinds of Data
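The selection, projection and join operations named on slide 13 can be seen in a single SQL query. The following is a minimal sketch using Python's built-in `sqlite3` module; the `customer` and `purchase` tables, their columns and their rows are hypothetical and are not taken from the slides.

```python
import sqlite3

# Hypothetical in-memory tables; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, income REAL);
    CREATE TABLE purchase (trans_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Asha', 52000), (2, 'Ravi', 18000);
    INSERT INTO purchase VALUES (100, 1, 250.0), (101, 1, 90.0), (102, 2, 40.0);
""")

# Join (JOIN ... ON), selection (WHERE) and projection (the SELECT column list).
rows = conn.execute("""
    SELECT c.name, p.amount
    FROM customer AS c
    JOIN purchase AS p ON p.cust_id = c.cust_id
    WHERE p.amount > 50
    ORDER BY p.amount
""").fetchall()
conn.close()

print(rows)
```

Only the two purchases above 50 survive the selection, and only the name and amount columns survive the projection.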
  • 16. • Typical framework of a data warehouse Ajith G.S: poposir.orgfree.com Data Mining- On What Kinds of Data
  • 17. • Multidimensional data cube Ajith G.S: poposir.orgfree.com Data Mining- On What Kinds of Data
  • 18. • Transactional Database • Consists of a file where each record represents a transaction. • Each record includes a unique transaction identity number and a list of the items making up the transaction • Example: a transactional database for sales, “Which items sold well together?” • Data mining on transactional data identifies such frequent item sets Ajith G.S: poposir.orgfree.com Data Mining- On What Kinds of Data
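The "which items sold well together?" question from slide 18 can be sketched as a brute-force count of item pairs. This is an illustration only, not an algorithm given in the slides: the transaction IDs, the items and the `frequent_pairs` helper are hypothetical, and practical miners avoid enumerating every pair.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales transactions: transaction ID -> items bought together.
transactions = {
    "T100": ["bread", "milk", "butter"],
    "T200": ["bread", "milk"],
    "T300": ["milk", "eggs"],
    "T400": ["bread", "milk", "eggs"],
}

def frequent_pairs(transactions, min_count):
    """Return the item pairs that occur together in at least min_count transactions."""
    counts = Counter()
    for items in transactions.values():
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

pairs = frequent_pairs(transactions, min_count=2)
print(pairs)
```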
  • 19. • Advanced Data and Information Systems and Advanced Applications • Object Relational Databases • Temporal Databases, Sequence Databases and Time-Series Database • Spatial Databases and spatio-temporal databases • Text Databases and Multimedia Databases • Heterogeneous Databases and legacy Databases • Data Streams • WWW Ajith G.S: poposir.orgfree.com Data Mining- On What Kinds of Data
  • 20. • Advanced Data and Information Systems and Advanced Applications • Object Relational Databases • Handle complex objects • Each entity is considered an object, e.g. individual items, employees etc. • Data and code relating to an object are encapsulated into a single unit • Each object has • A set of variables: the attributes • A set of messages to communicate with other objects • A set of methods: each holds the code that implements a message • Object class: objects that share a common set of properties • Each object is an instance of a class. Ajith G.S: poposir.orgfree.com Data Mining- On What Kinds of Data
  • 21. • Advanced Data and Information Systems and Advanced Applications • Temporal Databases, Sequence Databases and Time-Series Databases • Temporal databases handle data involving time: they store relational data that include time-related attributes • Sequence databases store sequences of ordered events, with or without a concrete notion of time. Example: customer shopping sequences • Time-series databases store sequences of values or events obtained over repeated measurements of time. Example: data collected from the stock exchange. • Data mining techniques can be used to find the trends of changes for objects in the database. Ajith G.S: poposir.orgfree.com Data Mining- On What Kinds of Data
  • 22. • Advanced Data and Information Systems and Advanced Applications • Spatial Databases and Spatio-temporal Databases • A spatial database contains objects defined in a geometric space. Example: maps, CAD databases • Using data mining, the relationships among a set of spatial objects can be examined • Spatio-temporal databases: spatial databases that store spatial objects that change with time. Example: tracking of moving vehicles Ajith G.S: poposir.orgfree.com Data Mining- On What Kinds of Data
  • 23. • Advanced Data and Information Systems and Advanced Applications • Text Databases and Multimedia Databases • Text databases contain word descriptions of objects, often long sequences of paragraphs. Example: product specifications • Text databases may be highly unstructured (web pages on the WWW), semi-structured (email) or well structured. • By mining text data we can uncover general and concise descriptions of the text documents, keywords etc. • Multimedia databases store image, audio and video data and must support large objects Ajith G.S: poposir.orgfree.com Data Mining- On What Kinds of Data
  • 24. • Advanced Data and Information Systems and Advanced Applications • Heterogeneous Databases and Legacy Databases • Heterogeneous databases consist of a set of interconnected component databases where the objects in the component databases differ greatly. • A legacy database is a group of heterogeneous databases • Information exchange across these databases is very difficult due to their diverse semantics • Data mining offers a solution by transforming the data into higher, more generalized levels Ajith G.S: poposir.orgfree.com Data Mining- On What Kinds of Data
  • 25. • Advanced Data and Information Systems and Advanced Applications • Data Streams • A new kind of data, where the data flow in and out of an observation platform dynamically. • Example: video surveillance • Data streams are normally not stored in any kind of repository, which poses challenges for management and analysis • Uses a continuous query model Ajith G.S: poposir.orgfree.com Data Mining- On What Kinds of Data
  • 26. • Advanced Data and Information Systems and Advanced Applications • World Wide Web • Data objects are linked together to facilitate interactive access. • An opportunity as well as a challenge for data mining • Web usage mining: capturing user access patterns in a distributed information environment • Keyword-based search offers limited help to users • Authoritative web page analysis: ranks web pages based on their importance • Automated web page clustering and classification: arranges web pages based on their contents • Web community analysis: identifies hidden social networks and communities Ajith G.S: poposir.orgfree.com Data Mining- On What Kinds of Data
  • 27. • What kinds of patterns can be mined? • Data mining functionalities are used to specify the kind of patterns to be found in data mining tasks. • Tasks can be classified into two categories: • Descriptive: deals with the general properties of the data in the database • Predictive: performs inference on the current data in order to make predictions Ajith G.S: poposir.orgfree.com Data Mining Functionalities
  • 28. • Concept/ Class Description: Characterization and Discrimination • Mining frequent Patterns, Association and Correlations • Classification and Prediction • Cluster Analysis • Outlier Analysis • Evolution Analysis Ajith G.S: poposir.orgfree.com Data Mining Functionalities
  • 29. • Concept/Class Description: Characterization and Discrimination • Data can be associated with classes or concepts. • Example: • classes of items for sale - computers and printers • concepts of customers - big spenders and budget spenders • Individual classes and concepts can be described in summarized, concise and precise terms. • Such descriptions of a class or a concept are called class/concept descriptions • These descriptions can be derived via • Data Characterization: summarizing the data of the class under study (the target class) • Data Discrimination: comparing the target class with one or a set of comparative classes (the contrasting classes) • Both of the above methods Ajith G.S: poposir.orgfree.com Data Mining Functionalities
  • 30. • Mining frequent Patterns • Patterns that occur frequently in transactional data. • Frequent Item Set: a set of items that frequently appear together, e.g. milk and bread • Frequent Subsequence: a sequence of patterns that occurs frequently, e.g. the purchase of a camera is followed by the purchase of a memory card • Frequent Substructure: substructure refers to different structural forms, such as graphs, trees, or lattices, which may be combined with item sets or subsequences. • Mining frequent patterns leads to the discovery of interesting associations and correlations within the data Ajith G.S: poposir.orgfree.com Data Mining Functionalities
  • 31. • Association and Correlations • Association Rules: 2 types • Single dimensional association rules • Multi-dimensional association rules Ajith G.S: poposir.orgfree.com Data Mining Functionalities
  • 32. • Association and Correlations • Association rules are discarded as uninteresting if they do not satisfy both a minimum support threshold and a minimum confidence threshold. • Confidence: the certainty of the rule, i.e. how often transactions that contain the rule's antecedent also contain its consequent • Support: an indication of how frequently the items appear together in the database Ajith G.S: poposir.orgfree.com Data Mining Functionalities
  • 33. • Classification • Classification is the process of finding a model that describes the data classes or concepts. • The derived model is based on the analysis of a set of training data, i.e. objects whose class labels are known • The model is then used to predict the class of objects whose class label is unknown. • The derived model can be presented in the following forms: • (IF-THEN) Rules • Decision Trees • Mathematical Formulae • Neural Networks Ajith G.S: poposir.orgfree.com Data Mining Functionalities
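The support and confidence thresholds from slide 32 can be made concrete with a small sketch. It uses the standard definitions: support is the fraction of transactions containing every item of the rule, and confidence is the support of the whole rule divided by the support of its antecedent. The baskets, the function names and the threshold values are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0

def is_interesting(transactions, antecedent, consequent, min_support, min_confidence):
    """A rule is kept only if it meets BOTH thresholds."""
    s = support(transactions, set(antecedent) | set(consequent))
    c = confidence(transactions, antecedent, consequent)
    return s >= min_support and c >= min_confidence

# Hypothetical baskets for the rule camera => memory card.
baskets = [
    {"camera", "memory card"},
    {"camera", "memory card", "tripod"},
    {"camera"},
    {"tripod"},
]

rule_support = support(baskets, {"camera", "memory card"})
rule_confidence = confidence(baskets, {"camera"}, {"memory card"})
keep = is_interesting(baskets, {"camera"}, {"memory card"}, 0.4, 0.6)
```

Here the rule appears in 2 of 4 baskets (support 0.5) and holds in 2 of the 3 baskets that contain a camera (confidence 2/3), so it passes thresholds of 0.4 and 0.6 but would be discarded if the minimum confidence were raised to 0.7.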
  • 34. • Classification & Prediction Ajith G.S: poposir.orgfree.com Data Mining Functionalities
  • 35. • Prediction • Models continuous valued functions • It is used to predict missing or unavailable numerical data values rather than class labels. • Regression Analysis is generally used for prediction. Ajith G.S: poposir.orgfree.com Data Mining Functionalities
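As an illustration of regression-based prediction, the sketch below fits a straight line by ordinary least squares and uses it to estimate a numeric value that is not in the data. Simple linear regression is only one of many regression techniques, and the `fit_line` helper and the experience/salary figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: years of experience vs. salary (in thousands).
years = [1, 2, 3, 4]
salary = [30.0, 35.0, 40.0, 45.0]

slope, intercept = fit_line(years, salary)

# Predict the unavailable numeric value for 5 years of experience.
predicted = slope * 5 + intercept
```

The output is a continuous value rather than a class label, which is the distinction slide 35 draws between prediction and classification.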
  • 36. • Cluster Analysis • Analyzes data objects without consulting a known class label • The objects are clustered or grouped based on the principle of “maximizing the intra-class similarity and minimizing the interclass similarity” • Within a cluster, the data objects are highly similar to one another but dissimilar to objects in other clusters Ajith G.S: poposir.orgfree.com Data Mining Functionalities
  • 37. • Cluster Analysis Ajith G.S: poposir.orgfree.com Data Mining Functionalities
  • 38. • Outlier Analysis • Outliers: data objects in a database that do not comply with the general behavior or model of the data. • In some applications, the rare events can be more interesting than the regularly occurring ones. Example: fraud detection • The analysis of outlier data is called outlier mining Ajith G.S: poposir.orgfree.com Data Mining Functionalities
  • 39. • Evolution Analysis • Evolution analysis describes and models regularities or trends for objects whose behavior changes over time. Ajith G.S: poposir.orgfree.com Data Mining Functionalities
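One simple way to flag objects that do not follow the general behavior of the data is a z-score test, sketched below. The slides do not name a particular method, so this statistical approach, the threshold of 2 standard deviations and the sample amounts are all assumptions made for illustration.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mean) / spread > threshold]

# Hypothetical card transaction amounts; one is far larger than the rest.
amounts = [20, 22, 19, 21, 23, 20, 500]
outliers = zscore_outliers(amounts)
print(outliers)
```

In a fraud detection setting, the flagged amount would be the rare event worth investigating rather than noise to discard.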
  • 40. Ajith G.S: poposir.orgfree.com Data Mining Classification of Data Mining System
  • 41. • Classification according to the kinds of database mined • Data models (relational, transactional, object relational) • Type of data (spatial, time series, text, stream, multimedia, WWW) • Classification according to the kinds of knowledge mined • Based on different data mining functionalities • According to the level of abstraction of the knowledge mined • According to the regularity or irregularity of the data that is mined Ajith G.S: poposir.orgfree.com Data Mining Classification of Data Mining System
  • 42. • Classification according to the kinds of techniques utilized • Degree of user interaction involved • Methods of data analysis involved (database oriented, data warehouse oriented etc.) • Classification according to the applications adapted • Finance • Telecommunication • DNA Ajith G.S: poposir.orgfree.com Data Mining Classification of Data Mining System
  • 43. • Each user has a data mining task in mind and performs it with the help of a data mining query • A query is defined in terms of data mining task primitives, which allow the user to interact with the data mining system. • DMQL: Data Mining Query Language Ajith G.S: poposir.orgfree.com Data Mining Task Primitives
  • 44. • The primitives specify • The set of task relevant data to be mined • Specifies the portions of database or the set of data in which the user is interested • It includes • Database or data warehouse name • Database tables or Data warehouse cubes • Conditions for data selection • Relevant attributes or dimensions • Data grouping criteria Ajith G.S: poposir.orgfree.com Data Mining Task Primitives
  • 45. • The primitives specify • The kind of knowledge to be mined • Specifies the data mining functions to be performed • Characterization • Discrimination • Association/ Correlation • Classification/Prediction • Clustering • Outlier or Evolution Analysis Ajith G.S: poposir.orgfree.com Data Mining Task Primitives
  • 46. • The primitives specify • The background knowledge to be used in the discovery process • Knowledge about the domain to be mined • Guides the knowledge discovery process and evaluations of the patterns found • User beliefs regarding the relationships in the data Ajith G.S: poposir.orgfree.com Data Mining Task Primitives
  • 47. • The primitives specify • The interestingness measures and thresholds for pattern evaluation • Used to guide the mining process or to evaluate the discovered patterns • Different kinds of knowledge have different interestingness measures • e.g. • Support • Confidence Ajith G.S: poposir.orgfree.com Data Mining Task Primitives
  • 48. • The primitives specify • The expected representation for visualizing the discovered patterns • Refers to the form in which discovered patterns are to be displayed • Rules • Tables • Charts • Graphs • Decision Trees • Cubes Ajith G.S: poposir.orgfree.com Data Mining Task Primitives
  • 49. • Integration of Data Mining System with Database or Data Warehouse System Ajith G.S: poposir.orgfree.com
  • 50. • When a DM system works in an environment, it is required to communicate with other information system components such as DB and DW systems • The different integration schemes are • No coupling • Loose coupling • Semi-tight coupling • Tight coupling Ajith G.S: poposir.orgfree.com Integration of Data Mining System with Database or Data Warehouse System
  • 51. • No coupling • The DM system does not use any facilities of a DB/DW system • It fetches data from a particular source (a file), processes the data and stores the results in another file. • Simple integration scheme • Drawbacks • Time is wasted on preprocessing the data • Other tools must be used to extract the data • Poor design Ajith G.S: poposir.orgfree.com Integration of Data Mining System with Database or Data Warehouse System
  • 52. • Loose coupling • The data mining system uses some facilities of a DB/DW system • It fetches data from a data repository, processes the data and stores the results in a DB or DW • It fetches the data using query processing, indexing and other DB/DW system facilities • Drawback • It is difficult to achieve high scalability and good performance with large data sets Ajith G.S: poposir.orgfree.com Integration of Data Mining System with Database or Data Warehouse System
  • 53. • Semi-tight coupling • Essential data mining primitives are provided in the DB/DW system • Sorting • Indexing • Aggregation • Histogram Analysis • Pre-computation of statistical measures • Some frequently used intermediate mining results can also be pre-computed and stored in the DB/DW system. • This design enhances the performance of a DM system Ajith G.S: poposir.orgfree.com Integration of Data Mining System with Database or Data Warehouse System
  • 54. • Tight coupling • Smoothly integrated into the DB/DW system • DM system is treated as one functional component of an information system • Data mining queries and functions are optimized based on different methods of DB/DW system. Ajith G.S: poposir.orgfree.com Integration of Data Mining System with Database or Data Warehouse System
  • 55. • Data mining is not an easy task • The algorithms used are very complex, and the data is not always available in one place • The data needs to be integrated from various heterogeneous data sources. • Common issues are • Mining methodology and user interaction issues • Performance issues • Issues related to the different types of database Ajith G.S: poposir.orgfree.com Issues in Data Mining
  • 56. • Mining different kinds of knowledge in the databases • Different users may be interested in different kinds of knowledge, so data mining should cover a broad range of knowledge discovery tasks (classification, clustering) • These tasks use the same database in different ways • Interactive mining of knowledge at multiple levels of abstraction • The data mining process needs to be interactive: it should allow users to focus the search for patterns, providing and refining data mining requests based on the returned results. • This enables the user to view the data from different angles and at different levels of abstraction Ajith G.S: poposir.orgfree.com Issues in Data Mining Mining methodology and user interaction Issues
  • 57. • Incorporation of background knowledge (knowledge about the domain under study) • Background knowledge can be used to guide the discovery process and to express the discovered patterns, not only in concise terms but also at multiple levels of abstraction. • Data mining query languages and ad hoc data mining • A data mining query language that allows the user to describe ad hoc mining tasks should be developed. • These languages should be integrated with a database or data warehouse query language and optimized for efficient and flexible data mining. Ajith G.S: poposir.orgfree.com Issues in Data Mining Mining methodology and user interaction Issues
  • 58. • Presentation and visualization of data mining results • Once patterns are discovered, they need to be expressed in high-level languages and visual representations. • These representations should be easily understandable • Handling noisy and incomplete data • Data cleaning methods are required to handle noise and incomplete objects while mining the data regularities. • Without data cleaning methods, the accuracy of the discovered patterns will be poor Ajith G.S: poposir.orgfree.com Issues in Data Mining Mining methodology and user interaction Issues
  • 59. • Pattern evaluation • The patterns discovered may be uninteresting because either they represent common knowledge or lack novelty • To guide the discovery process and reduce the search space, interestingness measures or user specified constraints should be there. Ajith G.S: poposir.orgfree.com Issues in Data Mining Mining methodology and user interaction Issues
  • 60. • Efficiency and scalability of data mining algorithms • Needed in order to effectively extract information from the huge amounts of data in databases • The running time must be predictable and acceptable for large databases. • Parallel, distributed and incremental mining algorithms • These algorithms divide the data into partitions, which are then processed in parallel. • The results from the partitions are then merged. • Incremental algorithms incorporate database updates without mining the entire data again from scratch. Ajith G.S: poposir.orgfree.com Issues in Data Mining Performance Issues
  • 61. • Handling of relational and complex types of data • The database may contain complex data objects, multimedia data objects, spatial data, temporal data etc. • It is not possible for one system to mine all these kinds of data. • Mining information from heterogeneous databases and global information systems • The data is available at different data sources on a LAN or WAN. • These data sources may be structured, semi-structured or unstructured. • Mining knowledge from them therefore adds challenges to data mining. Ajith G.S: poposir.orgfree.com Issues in Data Mining Issues relating to the diversity of database types
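The partition, merge and incremental-update ideas from slide 60 can be sketched with simple item counting. In this illustration the partitions are processed one after another for brevity; because each partition is counted independently, the same calls could be handed to separate workers. The function names and the data are hypothetical.

```python
from collections import Counter

def count_partition(partition):
    """Count, for one partition, how many transactions contain each item."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts

def merge_counts(partials):
    """Merge the per-partition counts into a single global count."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Hypothetical data set split into two partitions.
partitions = [
    [["bread", "milk"], ["milk"]],
    [["bread"], ["bread", "eggs"]],
]

total = merge_counts(count_partition(p) for p in partitions)

# Incremental step: fold in newly arrived transactions without
# re-counting the partitions that were already processed.
new_transactions = [["milk", "eggs"]]
total.update(count_partition(new_transactions))
```

Merging works here because item counts are additive across partitions; mining results that are not additive need a more careful merge step.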
  • 62. • When Ajith G.S: poposir.orgfree.com Issues in Data Mining