Knowledge Discovery Using Data Mining


- KDD process
- ETL tools
- Data mining methodologies

Published in: Technology

  1. 1. CmpE 274: Business Intelligence Technologies
  2. 2.
     - Jinal Shah (ID-005242095)
     - Sohel Dadia (ID-005177251)
     - Ankit Khera (ID-005226495)
     - Riddhi Shah (ID-005359513)
     - Vivek Modi (ID-005208581)
     - Parth Vora (ID-005169100)
  3. 3.
     - Knowledge Discovery
     - KDD Process
     - Data Mining Algorithms
     - Different Forms of Mining Models
     - Classification of Algorithms
     - Weka
     - Demo
     - Questions
  4. 4.
     - Knowledge discovery is the process of searching for knowledge in data; it focuses on the high-level application of data mining methods.
     - Its main goal is to extract information from raw data in large databases.
     - It uses different data mining algorithms to extract that information.
  5. 5.
     - KDD is used in machine learning, pattern recognition, databases, AI, MIS, and many other applications.
     - It applies specified measures and thresholds to decide which extracted patterns count as knowledge.
     - It also takes into account any required preprocessing, subsampling, and transformation of the database.
  6. 6.
     1. Data Cleaning
     2. Data Integration
     3. Data Selection
     4. Data Transformation
     5. Data Mining
     6. Pattern Evaluation
     7. Knowledge Presentation
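The seven KDD steps can be read as a chain of small functions, each feeding the next. The sketch below is only an illustration under assumed data: the two store lists, the field names, the age and spend cut-offs, and every function name are hypothetical and do not come from the slides.

```python
from collections import Counter

# Hypothetical raw records from two sources; None marks a missing value.
store_a = [{"customer": "c1", "age": 25, "spend": 120.0},
           {"customer": "c2", "age": None, "spend": 80.0}]
store_b = [{"customer": "c3", "age": 52, "spend": 300.0},
           {"customer": "c4", "age": 47, "spend": 40.0},
           {"customer": "c5", "age": 31, "spend": 260.0}]

def clean(records):
    """1. Data cleaning: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """2. Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, fields):
    """3. Data selection: keep only the task-relevant attributes."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """4. Data transformation: discretize numeric values (assumed cut-offs)."""
    return [{"age_group": "under_40" if r["age"] < 40 else "40_plus",
             "spender": "high" if r["spend"] >= 100 else "low"}
            for r in records]

def mine(records):
    """5. Data mining: count co-occurring (age_group, spender) pairs."""
    return Counter((r["age_group"], r["spender"]) for r in records)

def evaluate(patterns, min_count):
    """6. Pattern evaluation: keep patterns that meet a count threshold."""
    return {p: n for p, n in patterns.items() if n >= min_count}

def present(patterns):
    """7. Knowledge presentation: render patterns as readable lines."""
    return [f"{age} customers are {spender} spenders ({n} cases)"
            for (age, spender), n in sorted(patterns.items())]

def run_kdd(min_count=2):
    data = integrate(clean(store_a), clean(store_b))
    data = transform(select(data, ["age", "spend"]))
    return present(evaluate(mine(data), min_count))

for line in run_kdd():
    print(line)
```

Real pipelines usually do far more at each stage (for example, imputing missing values instead of dropping records); the point here is only the order of the steps and what each one hands to the next.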
  7. 8.
     - The data mining algorithm is the mechanism that creates mining models.
     - To create a model, an algorithm first analyzes a set of data, looking for specific patterns and trends.
     - The algorithm then uses the results of this analysis to define the parameters of the mining model.
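To make the algorithm/model distinction concrete, here is a minimal sketch in which the algorithm is a function that scans labelled data and the model is just the parameters it returns. The one-threshold rule (a "decision stump"), the `history` data, and all names are assumptions chosen for illustration; the slides do not prescribe any particular algorithm here.

```python
def train_stump(examples):
    """Algorithm: scan (value, label) pairs and pick the threshold that
    misclassifies the fewest training examples.

    Assumes at least two distinct values; otherwise it returns None.
    """
    values = sorted({v for v, _ in examples})
    # Candidate thresholds lie midway between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    labels = sorted({label for _, label in examples})
    best = None
    for t in candidates:
        # Try both ways of assigning the labels to the two sides.
        for below, above in [(labels[0], labels[-1]), (labels[-1], labels[0])]:
            errors = sum((below if v < t else above) != label
                         for v, label in examples)
            if best is None or errors < best["errors"]:
                best = {"threshold": t, "below": below,
                        "above": above, "errors": errors}
    return best

def predict(model, value):
    """Model: apply the learned parameters to a new case."""
    return model["below"] if value < model["threshold"] else model["above"]

# Hypothetical training data: (monthly store visits, bought the product?)
history = [(1, "no"), (2, "no"), (3, "no"), (6, "yes"), (8, "yes"), (9, "yes")]

model = train_stump(history)   # the analysis defines the model's parameters
print(model)
print(predict(model, 7))
```

Once `train_stump` has run, the training data is no longer needed: the dictionary it returns is the whole mining model.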
  8. 9.
     - Decision trees and rules
     - Nonlinear regression and classification methods
     - Example-based methods
     - Probabilistic graphical dependency models
     - Relational learning models
  9. 10.
     - A set of rules that describe how products are grouped together in a transaction.
     - A decision tree that predicts whether a particular customer will buy a product.
     - A mathematical model that forecasts sales.
     - A set of clusters that describe how the cases in a dataset are related.
  10. 11.
     - Classification algorithms predict one or more discrete variables, based on the other attributes in the dataset.
     - Regression algorithms predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset.
     - Segmentation algorithms divide data into groups, or clusters, of items that have similar properties.
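The three families differ mainly in what they return: a discrete label, a continuous value, or a grouping. The sketch below pairs each family with one deliberately simple technique (nearest neighbour, a least-squares line, and one-dimensional k-means). These techniques, the function names, and the sample numbers are assumptions for illustration, not methods named in the slides.

```python
def classify_1nn(train, x):
    """Classification: return the discrete label of the nearest training point."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def fit_line(points):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def segment_1d(values, centers, rounds=10):
    """Segmentation: simple k-means on a single numeric attribute."""
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to its group's mean; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups

# Hypothetical data: (income in thousands, spending class)
labelled = [(20, "low"), (25, "low"), (60, "high"), (70, "high")]
print(classify_1nn(labelled, 55))                 # a discrete label

slope, intercept = fit_line([(1, 3), (2, 5), (3, 7)])
print(slope * 4 + intercept)                      # a continuous prediction

print(segment_1d([1, 2, 3, 10, 11, 12], centers=[0, 5]))  # groups of similar items
```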
  11. 12.
     - Association algorithms find correlations between different attributes in a dataset. The most common application of this kind of algorithm is creating association rules, which can be used in market basket analysis.
     - Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a Web path flow.
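In market basket analysis, an association rule such as "bread implies milk" is usually scored with two standard measures, support and confidence. The slides do not name these measures, so treat the sketch below as background; the `baskets` data and the function names are hypothetical.

```python
# Hypothetical market baskets: each set is one customer transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent. Assumes the antecedent occurs at least once."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

print(support({"bread", "milk"}, baskets))
print(confidence({"bread"}, {"milk"}, baskets))
```

A rule is normally kept only when both values clear chosen minimums, which is one place where user-specified thresholds enter the KDD process.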
  12. 13.
     - Apriori is a classic algorithm for learning association rules.
     - Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or records of website visits).
     - Apriori uses breadth-first search and a hash tree structure to count candidate item sets efficiently.
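The breadth-first idea can be sketched as a level-wise search: count all 1-item sets, keep the frequent ones, join them into 2-item candidates, and so on. This is a minimal sketch, not a reference implementation: it counts candidates with a plain scan over the transactions instead of the hash tree mentioned above, it stops at frequent item sets without generating rules, and the `apriori` signature and sample data are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for item sets found in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate with a plain scan (stand-in for the hash tree).
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: merge pairs of frequent k-item sets into (k+1)-item sets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
    return frequent

# Hypothetical transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

for itemset, count in sorted(apriori(baskets, min_count=2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what keeps the search tractable: a candidate is never counted if any of its smaller subsets has already been found infrequent.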
  13. 14.
     - What is Weka?
       - Weka is a collection of machine learning algorithms for data mining tasks.
     - Why Weka?
       - It is open source.
       - The algorithms can either be applied directly to a dataset or called from your own Java code.
  14. 15.
     - Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
     - It is also well suited for developing new machine learning schemes.
  15. 16.
     - Java 1.4 (or later) is required to run Weka 3.4.x and older versions.
     - The developer versions, starting with 3.5.3, require Java 5.0.
     - Platform: Windows/Linux
  16. 21.
     - Textbook: “Data Mining” by Jiawei Han and Micheline Kamber