Knowledge Discovery Process
SUBMITTED BY: SHUVRA GHOSH
ROLL NO: 07
COURSE: MLIS
GUIDED BY: PROF. UDAYAN BHATTACHARYA
DEPARTMENT OF LIBRARY AND
INFORMATION SCIENCE
JADAVPUR UNIVERSITY

Knowledge discovery is the process of discovering valuable information
in a collection of data; in other words, it is the process of converting
raw data into useful information.
Knowledge discovery is an activity that produces
knowledge by discovering it in, or deriving it from, existing
information.
Knowledge discovery refers to the overall process of
discovering useful knowledge from data, while data mining
refers to a particular step in that process.

Why do we need a knowledge discovery process?

Kinds of data:
• Database data
• Data warehouse
• Transactional data
• Other kinds of data:
    Time-related data
    Sequence data (historical data records, stock exchange data)
    Data streams (video surveillance, sensor data)
    Spatial data (maps)
    Hypertext and multimedia data (text, video, audio)
    Graph and networked data
    Engineering design data (AutoCAD)
    Web

Characteristics of the knowledge discovery process:
• Interactive
• Iterative
• Procedure to extract knowledge from data
• Knowledge being searched for is:
    implicit
    previously unknown
    potentially useful

Data Cleaning − in this step, noise and inconsistent data are
removed.
Example: parsing the data. Parsing is performed to detect syntax
errors: a parser decides whether a given string of data is
acceptable within the data specification.

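The parsing idea in the data cleaning step can be sketched as follows. This is only a minimal illustration: the record layout ("id,name,amount"), the `RECORD_SPEC` pattern, and the sample lines are assumptions made for the example, not part of the original slides.

```python
import re

# Hypothetical data specification: "<id>,<name>,<amount>",
# where id is an integer and amount is a non-negative number.
RECORD_SPEC = re.compile(r"^(\d+),([A-Za-z ]+),(\d+(?:\.\d+)?)$")


def parse_record(line):
    """Return a parsed record, or None if the line violates the specification."""
    match = RECORD_SPEC.match(line.strip())
    if match is None:
        return None
    record_id, name, amount = match.groups()
    return {"id": int(record_id), "name": name.strip(), "amount": float(amount)}


def clean(lines):
    """Split raw lines into accepted records and rejected (invalid) lines."""
    accepted, rejected = [], []
    for line in lines:
        record = parse_record(line)
        if record is None:
            rejected.append(line)
        else:
            accepted.append(record)
    return accepted, rejected


# Made-up sample input: two well-formed lines and two with syntax errors.
raw = ["1,Asha,250.00", "2,,100", "three,Ravi,75", "4,Meera,90"]
accepted, rejected = clean(raw)
```

Rejected lines are kept rather than silently dropped, so they can be inspected or corrected in a later iteration.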
Data Integration − in this step, multiple data sources are combined.
Example: data from the retail loan, commercial loan, and
demand deposit applications are combined in a bank's data
warehouse.
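A rough sketch of the bank example: records from three sources are merged into one record per customer. The field names, the `customer_id` key, and the sample values are hypothetical, chosen only to make the idea concrete.

```python
from collections import defaultdict

# Hypothetical extracts from three separate banking applications.
retail_loans = [{"customer_id": 101, "retail_loan": 5000}]
commercial_loans = [{"customer_id": 102, "commercial_loan": 80000}]
demand_deposits = [
    {"customer_id": 101, "deposit_balance": 1200},
    {"customer_id": 102, "deposit_balance": 45000},
]


def integrate(*sources):
    """Merge records from several sources into one record per customer."""
    warehouse = defaultdict(dict)
    for source in sources:
        for record in source:
            # Records that share a customer_id are combined field by field.
            warehouse[record["customer_id"]].update(record)
    return dict(warehouse)


warehouse = integrate(retail_loans, commercial_loans, demand_deposits)
```

In practice the sources rarely share a clean common key, so matching records is usually the hard part; the sketch assumes that problem is already solved.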
Data Selection − in this step, data relevant to the analysis task
are retrieved from the database.
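Data selection can be illustrated with a small in-memory database. The table name, columns, and the analysis task (loan amounts for one branch) are assumptions for the sake of the example.

```python
import sqlite3

# Build a small in-memory database with made-up loan records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans (customer_id INTEGER, branch TEXT, amount REAL, year INTEGER)"
)
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?, ?)",
    [
        (101, "Kolkata", 5000.0, 2015),
        (102, "Delhi", 80000.0, 2016),
        (103, "Kolkata", 12000.0, 2016),
    ],
)

# Retrieve only the attributes and rows relevant to the (hypothetical) task.
selected = conn.execute(
    "SELECT customer_id, amount FROM loans WHERE branch = ? ORDER BY customer_id",
    ("Kolkata",),
).fetchall()
conn.close()
```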
Data Transformation − in this step, data is transformed or consolidated into
forms appropriate for mining by performing summary or aggregation
operations.
Aggregation operators perform mathematical operations such as Average,
Aggregate, Count, Max, Min, and Sum on a numeric property of the
elements in a collection.

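The aggregation operations named above can be sketched with the standard library. The per-customer transaction amounts are invented for illustration.

```python
from statistics import mean

# Hypothetical transaction amounts grouped by customer.
transactions = {"A": [120.0, 80.0, 100.0], "B": [40.0, 60.0]}


def summarize(values):
    """Consolidate a list of numbers into a single summary record."""
    return {
        "count": len(values),
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "average": mean(values),
    }


# One summary row per customer replaces the raw transaction list.
summary = {customer: summarize(amounts) for customer, amounts in transactions.items()}
```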
Data Mining − in this step, intelligent methods are applied in order to
extract data patterns.
Intelligent methods include:
• Association
• Classification
    Decision tree
• Clustering
• Regression

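As one illustration of the association method listed above, the sketch below counts how often pairs of items occur together and keeps the pairs that meet a minimum support. The baskets and the 0.5 threshold are made up, and this is a brute-force count rather than any particular published algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (share of baskets containing both) is at least min_support."""
    counts = Counter()
    for basket in transactions:
        # Sorting gives each pair a single canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(transactions)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }


pairs = frequent_pairs(transactions, min_support=0.5)
```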
Pattern Evaluation − in this step, the discovered data patterns are
evaluated to identify those that are truly interesting.

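The slides do not say which measures are used to evaluate patterns. As an assumption, the sketch below scores an association rule with three commonly used measures: support, confidence, and lift. The baskets are invented.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in transactions if items <= basket) / len(transactions)


def evaluate_rule(transactions, antecedent, consequent):
    """Score the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    confidence = both / support(transactions, antecedent)
    lift = confidence / support(transactions, consequent)
    return {"support": both, "confidence": confidence, "lift": lift}


scores = evaluate_rule(transactions, {"bread"}, {"milk"})
```

A lift below 1, as this rule has on the sample data, suggests the antecedent does not make the consequent more likely, so the pattern would probably not be judged interesting.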
Knowledge Presentation − in this step, knowledge is
presented using various visualization tools:
• Table
• Chart
• Graph

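For the table form of presentation, a plain-text rendering might look like the sketch below. The `render_table` helper and the sample rows are hypothetical; a real project would more likely use a charting or reporting tool.

```python
def render_table(headers, rows):
    """Render rows as an aligned plain-text table."""
    columns = list(zip(headers, *rows))
    widths = [max(len(str(cell)) for cell in column) for column in columns]

    def format_row(row):
        cells = (str(cell).ljust(width) for cell, width in zip(row, widths))
        return " | ".join(cells).rstrip()

    separator = "-+-".join("-" * width for width in widths)
    return "\n".join([format_row(headers), separator] + [format_row(row) for row in rows])


# Made-up discovered patterns to present.
rows = [("bread -> milk", 0.5), ("bread -> butter", 0.5)]
table = render_table(("rule", "support"), rows)
print(table)
```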
Knowledge discovery process (KDP) models fall into three categories:
• Academic research models
• Industrial models
• Hybrid models

• Efforts to establish a KDP model were initiated in
academia in the mid-1990s.
• When the data mining (DM) field was being shaped, researchers started
defining multistep procedures to guide users of DM tools in
the complex knowledge discovery world.
• Two process models, developed in 1996 and 1998, are the
nine-step model by Fayyad et al. and the eight-step model by
Anand and Buchner.

The nine-step model by Fayyad et al. begins with the following steps:
1. Developing and understanding the application domain. This step
includes learning the relevant prior knowledge and the goals of the end user of
the discovered knowledge.
2. Creating a target data set. Here the data miner selects a subset of variables
(attributes) and data points (examples) that will be used to perform discovery
tasks. This step usually includes querying the existing data to select the desired
subset.
3. Data cleaning and pre-processing. This step consists of removing outliers,
dealing with noise and missing values in the data, and accounting for time
sequence information and known changes.
4. Data reduction and projection. This step consists of finding useful
attributes by applying dimension reduction and transformation methods, and
finding invariant representations of the data.
5. Choosing the data mining task. Here the data miner matches the goals
defined in Step 1 with a particular DM method, such as classification,
regression, clustering, etc.

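Step 3 above (handling missing values and removing outliers) could be sketched as follows. The choice of median imputation and the 1.5 × IQR outlier rule are assumptions made for illustration; the model itself does not prescribe specific techniques. The sample readings are invented.

```python
from statistics import median, quantiles


def preprocess(values):
    """Fill missing values with the median, then drop outliers by the 1.5 * IQR rule."""
    known = [v for v in values if v is not None]
    fill = median(known)
    filled = [fill if v is None else v for v in values]

    # Quartiles of the filled data define the acceptable range.
    q1, _, q3 = quantiles(filled, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in filled if low <= v <= high]


# Made-up readings with one missing value (None) and one extreme value (300).
readings = [10, 12, None, 11, 13, 300, 12]
cleaned = preprocess(readings)
```

Whether an extreme value is an error or a genuine observation depends on the domain, which is one reason Step 1 comes first.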
Two representative industrial models are the five-step model by
Cabena et al., developed with support from IBM, and the industrial
six-step CRISP-DM model, developed by a large consortium of
European companies.

The CRISP-DM (Cross-Industry Standard Process for Data Mining) model
was first established in the late 1990s by four companies: Integral
Solutions Ltd. (a provider of commercial data mining solutions),
NCR (a database provider), DaimlerChrysler (an automobile
manufacturer), and OHRA (an insurance company).

The development of academic and industrial models has led to
hybrid models, i.e., models that combine aspects of both.
One such model is the six-step KDP model developed by Cios et al.
Its main differences from and extensions to the CRISP-DM model include:
• providing a more general, research-oriented description of the steps;
• introducing a data mining step instead of the modeling step;
• introducing several new explicit feedback mechanisms (the CRISP-DM
model has only three major feedback sources, while the hybrid
model has more detailed feedback mechanisms); and
• modifying the last step, since in the hybrid model the
knowledge discovered for a particular domain may be applied in other
domains.

The six-step hybrid model begins with the following steps:
1. Understanding of the problem domain. This initial step involves
working closely with domain experts to define the problem and
determine the project goals, identifying key people, and learning about
current solutions to the problem. It also involves learning
domain-specific terminology. A description of the problem, including its
restrictions, is prepared. Finally, project goals are translated into DM
goals, and the initial selection of DM tools to be used later in the process
is performed.
2. Understanding of the data. This step includes collecting sample data
and deciding which data, including format and size, will be needed.
Background knowledge can be used to guide these efforts. Data are
checked for completeness, redundancy, missing values, plausibility of
attribute values, etc. Finally, the step includes verification of the
usefulness of the data with respect to the DM goals.

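The data checks mentioned in step 2 (missing values, redundancy, plausibility of attribute values) might be sketched like this. The `profile` helper, the field names, and the plausibility rule for age are all hypothetical.

```python
def profile(records, required, plausible):
    """Count missing values, duplicate records, and implausible attribute values."""
    missing = sum(1 for r in records for field in required if r.get(field) is None)

    seen, duplicates = set(), 0
    for r in records:
        key = tuple(sorted(r.items()))  # order-independent fingerprint of the record
        if key in seen:
            duplicates += 1
        seen.add(key)

    implausible = sum(
        1
        for r in records
        for field, is_plausible in plausible.items()
        if r.get(field) is not None and not is_plausible(r[field])
    )
    return {"missing": missing, "duplicates": duplicates, "implausible": implausible}


# Made-up sample: one missing age, one implausible age, one exact duplicate.
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},
    {"age": 230, "income": 61000},
    {"age": 34, "income": 52000},
]
report = profile(
    records,
    required=("age", "income"),
    plausible={"age": lambda a: 0 <= a <= 120},
)
```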
Knowledge Discovery in Databases is the process by which a task is
identified and performed upon a database in order to extract
information about the elements of the database. This process involves
first collecting the data to be analysed, cleaning the data, and
reducing it to those features of interest to the process. At that point, the
tool or tools to be used on the data are identified. These tools are
then used to mine the data for information. Once the information has
been extracted, it must be evaluated as to its efficacy for the process. Any
knowledge thereby gained is then re-incorporated into the process as
well as used for purposes outside the scope of the process.
This is a very complex process, but it is one that lends itself to a fair
degree of automation. As such, it enters the field of artificial
intelligence, not just because of the tools it employs, but because the
process tries to re-incorporate the knowledge it has created.

Thank you
