1. Introduction
• Knowledge discovery describes the process of automatically searching large volumes of data for patterns that can be considered knowledge about the data.
• It can be categorized according to:
1) what kind of data is searched, and
2) in what form the result of the search is represented.
• Knowledge discovery developed out of the data mining domain and is closely related to it in both methodology and terminology.
• Knowledge representation is a formalism for representing, at a minimum, the data, information, and knowledge used in an application.
• Knowledge can be represented either as programs in an imperative language or as rules in a declarative language.
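The distinction in the last point can be sketched in Python. The rules (bread → butter, pasta → tomato sauce), the `(antecedent, consequent)` rule format and the helper names are hypothetical, chosen only for illustration: the same piece of knowledge is written once as imperative control flow and once as declarative data that a small generic interpreter applies.

```python
# Imperative representation: the knowledge is embedded in the program's control flow.
def recommend_imperative(basket: set[str]) -> set[str]:
    suggestions = set()
    if "bread" in basket:
        suggestions.add("butter")
    if "pasta" in basket:
        suggestions.add("tomato sauce")
    return suggestions - basket  # never suggest something already in the basket


# Declarative representation: the same knowledge stored as data (if-then rules).
# Hypothetical rule format: (antecedent items, consequent item).
RULES: list[tuple[frozenset[str], str]] = [
    (frozenset({"bread"}), "butter"),
    (frozenset({"pasta"}), "tomato sauce"),
]


def apply_rules(basket: set[str], rules: list[tuple[frozenset[str], str]]) -> set[str]:
    """Generic interpreter: fire every rule whose antecedent is contained in the basket."""
    return {consequent for antecedent, consequent in rules
            if antecedent <= basket and consequent not in basket}


basket = {"bread", "milk"}
print(recommend_imperative(basket))  # {'butter'}
print(apply_rules(basket, RULES))    # {'butter'}
```

Adding knowledge to the declarative version means appending a rule to `RULES`; the imperative version needs a code change.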
2. Knowledge Discovery
• Knowledge discovery is also known as Knowledge Discovery in Databases (KDD).
• Data → Knowledge Discovery Process → useful information. The process can require much elapsed time.
Figure: Five steps of the KDD process
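The figure with the five KDD steps is not reproduced here. The sketch below assumes the commonly cited five-step formulation (selection, preprocessing, transformation, data mining, interpretation/evaluation), which may not match the slide exactly; the records, field layout and function names are hypothetical. Each step is a plain function so the chain from raw data to "useful information" is visible.

```python
from statistics import mean

# Hypothetical raw records: (customer_id, region, age, monthly_spend); None marks a missing value.
RAW = [
    ("c1", "north", 23, 120.0),
    ("c2", "south", None, 80.0),
    ("c3", "north", 45, 300.0),
    ("c4", "south", 52, 310.0),
    ("c5", "north", 31, 150.0),
]


def select(records):
    """1. Selection: keep only the attributes relevant to the question (age, spend)."""
    return [(age, spend) for _cid, _region, age, spend in records]


def preprocess(rows):
    """2. Preprocessing: remove rows with missing values."""
    return [(age, spend) for age, spend in rows if age is not None and spend is not None]


def transform(rows):
    """3. Transformation: min-max normalize spend into [0, 1]."""
    spends = [spend for _age, spend in rows]
    lo, hi = min(spends), max(spends)
    return [(age, (spend - lo) / (hi - lo)) for age, spend in rows]


def mine(rows, threshold=0.5):
    """4. Data mining: a deliberately simple pattern, grouping customers by normalized spend."""
    groups = {"high": [], "low": []}
    for age, score in rows:
        groups["high" if score >= threshold else "low"].append(age)
    return groups


def interpret(groups):
    """5. Interpretation/evaluation: summarize each non-empty group as (size, mean age)."""
    return {name: (len(ages), mean(ages)) for name, ages in groups.items() if ages}


knowledge = interpret(mine(transform(preprocess(select(RAW)))))
print(knowledge)
```

On this toy data the result suggests that high spenders are older on average, which is the kind of pattern the final step hands to a human for evaluation.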
3. Data Mining
• Data mining involves many different algorithms to accomplish different tasks.
• Data mining algorithms can be characterized as consisting of three parts:
  • Model: the purpose of the algorithm is to fit a model to the data.
  • Preference: some criterion must be used to prefer one model over another.
  • Search: all algorithms require some technique to search the data.
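A minimal Python sketch of the three parts, assuming a toy problem not taken from the slides: the model is a line through the origin, y = a·x; the preference criterion is the sum of squared errors; and the search is an exhaustive scan over a grid of candidate slopes. The data points and the grid are illustrative assumptions.

```python
# Hypothetical toy data, roughly following y = 2x.
DATA = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]


def model(a: float, x: float) -> float:
    """Model: a line through the origin, y = a * x, with a single parameter a."""
    return a * x


def preference(a: float, data) -> float:
    """Preference: sum of squared errors; a lower score means a preferred model."""
    return sum((y - model(a, x)) ** 2 for x, y in data)


def search(data, candidates):
    """Search: try every candidate parameter and keep the preferred one."""
    return min(candidates, key=lambda a: preference(a, data))


candidates = [i / 10 for i in range(41)]  # a = 0.0, 0.1, ..., 4.0
best_a = search(DATA, candidates)
print(best_a)  # 2.0
```

Swapping any one part (for example, a gradient-based search instead of the grid) leaves the other two unchanged, which is the point of the decomposition.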
4. Classification of Data Mining
5. Working of Data Mining
• Data mining provides a link between separate transactional and analytical systems.
• Data mining software analyzes relationships and patterns in stored transaction data based on user queries.
• Generally, four types of relationships are sought: classes, clusters, associations, and sequential patterns.
Data mining (diagram):
1) Extract, transform, and load transaction data.
2) Store and manage the data.
3) Provide data access to business analysts and IT professionals.
4) Analyze the data by application software.
5) Present the data in a useful format.
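The flow above can be sketched end to end for one of the four relationship types, associations. In this hypothetical Python example the log format, item names and function names are assumptions, and a plain dictionary stands in for the "store and manage" and "provide data access" stages, which in practice would be a database or data warehouse.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw transaction log lines: "transaction_id;item,item,..."
RAW_LOG = [
    "t1; Bread, Milk",
    "t2;bread,butter,milk",
    "t3;Beer, Bread",
    "t4;bread,butter",
]


def extract_transform_load(lines):
    """Extract, transform and load: parse each line and normalize item names into a store."""
    store = {}
    for line in lines:
        tid, items = line.split(";", 1)
        store[tid.strip()] = frozenset(item.strip().lower() for item in items.split(","))
    return store


def analyze_associations(store):
    """Analyze: count how often each pair of items is bought together."""
    pair_counts = Counter()
    for items in store.values():
        pair_counts.update(combinations(sorted(items), 2))
    return pair_counts


def present(pair_counts, top=3):
    """Present: format the most frequent pairs as readable lines."""
    return [f"{a} & {b}: {n} transactions" for (a, b), n in pair_counts.most_common(top)]


store = extract_transform_load(RAW_LOG)
pair_counts = analyze_associations(store)
for line in present(pair_counts):
    print(line)
```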
6. Clustering
WHAT IS A CLUSTER?
• A cluster is a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters.
WHAT IS CLUSTERING?
• Clustering is the process of organizing objects into groups whose members are similar in some way.
• Distance-based clustering and conceptual clustering are two types of clustering.
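Distance-based clustering makes "similar" and "dissimilar" measurable. A minimal sketch, assuming hypothetical 2-D points and Euclidean distance (`math.dist`, Python 3.8+): in a good clustering the average distance inside each cluster is smaller than the average distance between clusters.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)

# Hypothetical 2-D points already grouped into two clusters.
clusters = {
    "A": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)],
    "B": [(8.0, 8.0), (8.5, 9.0), (9.0, 8.0)],
}


def mean_within(points):
    """Average distance between members of the same cluster (small means 'similar')."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_between(points_a, points_b):
    """Average distance between members of two different clusters (large means 'dissimilar')."""
    total = sum(dist(p, q) for p in points_a for q in points_b)
    return total / (len(points_a) * len(points_b))


within_a = mean_within(clusters["A"])
within_b = mean_within(clusters["B"])
between = mean_between(clusters["A"], clusters["B"])
print(f"within A={within_a:.2f}, within B={within_b:.2f}, between={between:.2f}")
```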
Possible applications of clustering
• Marketing
• Biology
• Libraries
• World Wide Web
Problems of clustering
• Clustering techniques cannot address all requirements adequately.
• A large number of data items can make time complexity a problem.
• The result can be interpreted in different ways.
• If no obvious distance measure exists, defining one is not easy.
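For the last problem, one common option for purely categorical data is the simple matching distance: the fraction of attributes on which two tuples differ. This is one illustrative choice among many, not a method from the slides; the book records below are hypothetical.

```python
# Hypothetical library records described only by categorical attributes:
# (subject, language, format). There is no natural numeric distance between them.
BOOK_A = ("databases", "english", "hardcover")
BOOK_B = ("databases", "english", "ebook")
BOOK_C = ("biology", "french", "ebook")


def simple_matching_distance(x, y):
    """Fraction of attributes on which two equal-length tuples disagree (0 = identical, 1 = all differ)."""
    if len(x) != len(y):
        raise ValueError("tuples must have the same number of attributes")
    mismatches = sum(1 for a, b in zip(x, y) if a != b)
    return mismatches / len(x)


print(simple_matching_distance(BOOK_A, BOOK_B))  # one of three attributes differs
print(simple_matching_distance(BOOK_A, BOOK_C))  # all three attributes differ
```

Whether such a measure is appropriate depends on the application; for example, it treats every attribute as equally important.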
Classification of clustering algorithms
• Exclusive
• Overlapping
• Hierarchical
• Probabilistic
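Two of these categories can be contrasted in a few lines of Python. The sketch below is an illustration under stated assumptions: the cluster means are hypothetical, and the probabilistic version assumes equal-variance Gaussian clusters with equal priors and an arbitrary `sigma`. An exclusive assignment picks exactly one cluster, while a probabilistic one gives a membership degree for every cluster, summing to 1.

```python
from math import exp

MEANS = [2.5, 16.0]  # hypothetical cluster centres (one numeric attribute)


def exclusive_membership(x, means):
    """Exclusive (hard) clustering: the item belongs to exactly one cluster, the nearest."""
    return min(range(len(means)), key=lambda j: abs(x - means[j]))


def probabilistic_membership(x, means, sigma=3.0):
    """Probabilistic (soft) clustering: a degree of membership in every cluster.

    Assumes equal-variance Gaussian clusters with equal priors; sigma is illustrative.
    """
    weights = [exp(-((x - m) ** 2) / (2 * sigma ** 2)) for m in means]
    total = sum(weights)
    return [w / total for w in weights]


for x in (3.0, 9.0, 15.0):
    soft = [round(p, 3) for p in probabilistic_membership(x, MEANS)]
    print(x, exclusive_membership(x, MEANS), soft)
```

The point 9.0 shows the difference: the exclusive version assigns it entirely to the first cluster, while the probabilistic version splits its membership roughly 59/41.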
K-means Clustering
Figure: Clustering on the “mouse” data set (panels: Original data, K-means clustering)
• K-means is an iterative clustering algorithm in which items are moved among sets of clusters until the desired set is reached.
• This definition assumes that each ‘tuple’ has only one numeric value, as opposed to a ‘tuple’ with many attribute values.
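Dropping the single-value assumption mainly changes two operations: the mean becomes an attribute-wise centroid, and the distance becomes a multi-attribute measure. A minimal sketch, assuming Euclidean distance and hypothetical (age, income) tuples; tuples with a single attribute, such as `(4,)`, reduce to the one-value case.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def centroid(tuples):
    """Attribute-wise mean of a non-empty list of equal-length numeric tuples."""
    n = len(tuples)
    return tuple(sum(values) / n for values in zip(*tuples))


def nearest(t, means):
    """Index of the mean closest to tuple t under Euclidean distance."""
    return min(range(len(means)), key=lambda j: dist(t, means[j]))


# Hypothetical 2-attribute tuples, e.g. (age, income in thousands).
cluster = [(25, 40), (30, 50), (35, 45)]
print(centroid(cluster))                                # (30.0, 45.0)
print(nearest((28, 44), [(30.0, 45.0), (60.0, 90.0)]))  # 0
```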
K-means algorithm
Input:
• D = {t1, t2, …, tn} // set of elements
• k // number of desired clusters
Output:
• K // set of clusters
Assign initial values for means m1, m2, …, mk;
Repeat
    Assign each item ti to the cluster which has the closest mean;
    Calculate the new mean for each cluster;
Until no item changes cluster (convergence);
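The pseudocode translates almost line for line into Python. This sketch handles the one-value case only, takes the initial means as a parameter (the pseudocode does not say how to choose them), stops when no item changes cluster, and adds an iteration cap as a safeguard; `k_means` and its parameters are illustrative names. Ties go to the lower-numbered cluster, which matches the worked example, where item 3 is equally close to 2 and 4 in the first pass.

```python
def k_means(items, initial_means, max_iter=100):
    """One-value k-means. Returns (means, clusters); ties go to the lower-numbered cluster."""
    means = list(initial_means)
    clusters = None
    for _ in range(max_iter):
        # Assign each item to the cluster with the closest mean.
        new_clusters = [[] for _ in means]
        for t in items:
            j = min(range(len(means)), key=lambda i: abs(t - means[i]))
            new_clusters[j].append(t)
        # Convergence test: stop once no item changes cluster.
        if new_clusters == clusters:
            break
        clusters = new_clusters
        # Calculate the new mean for each cluster (keep the old mean if a cluster is empty).
        means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
    return means, clusters


# Usage on the example data, with the first two items as initial means.
means, clusters = k_means([2, 4, 10, 12, 3, 20, 30, 11, 25], [2, 4])
print(means)     # [7.0, 25.0]
print(clusters)  # [[2, 4, 10, 12, 3, 11], [20, 30, 25]]
```

Different initial means can lead to different final clusters, so in practice the algorithm is often run several times from different starting points.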
---Example---
Input: k = 2, D = {2, 4, 10, 12, 3, 20, 30, 11, 25}
Output (one row per iteration):
m1      m2      K1                    K2
2       4       {2,3}                 {4,10,12,20,30,11,25}
2.5     16      {2,3,4}               {10,12,20,30,11,25}
3       18      {2,3,4,10}            {12,20,30,11,25}
4.75    19.6    {2,3,4,10,11,12}      {20,30,25}
7       25      {2,3,4,10,11,12}      {20,30,25}
The last two rows have the same clusters, so the algorithm stops.
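K-means can be viewed as reducing the within-cluster sum of squared errors (SSE): no iteration increases it. The Python check below recomputes the SSE of each row of the example, using each cluster's own mean; the cluster contents are copied from the table, and `sse` is an illustrative helper name.

```python
def sse(clusters):
    """Within-cluster sum of squared errors: squared distance of each item to its cluster mean."""
    total = 0.0
    for c in clusters:
        m = sum(c) / len(c)
        total += sum((t - m) ** 2 for t in c)
    return total


# Cluster pairs (K1, K2) copied from the rows of the example table.
ROWS = [
    ([2, 3], [4, 10, 12, 20, 30, 11, 25]),
    ([2, 3, 4], [10, 12, 20, 30, 11, 25]),
    ([2, 3, 4, 10], [12, 20, 30, 11, 25]),
    ([2, 3, 4, 10, 11, 12], [20, 30, 25]),
    ([2, 3, 4, 10, 11, 12], [20, 30, 25]),
]

scores = [sse(row) for row in ROWS]
print([round(s, 2) for s in scores])  # [514.5, 348.0, 307.95, 150.0, 150.0]
```

The SSE falls from 514.5 to 150 and then stays constant once the clusters stop changing.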
Pictorial Representation
So we conclude with...
Thank You

Knowledge Discovery & Representation
