Knowledge Discovery Using Data Mining


Published on

-KDD process
-ETL tools
-Data Mining methodologies

Published in: Technology

  1. 1. <ul><li>CmpE 274 – Business Intelligence Technologies </li></ul>
  2. 2. <ul><li>Jinal Shah (ID-005242095) </li></ul><ul><li>Sohel Dadia (ID-005177251) </li></ul><ul><li>Ankit Khera (ID-005226495) </li></ul><ul><li>Riddhi Shah (ID-005359513) </li></ul><ul><li>Vivek Modi (ID-005208581) </li></ul><ul><li>Parth Vora (ID-005169100) </li></ul>
  3. 3. <ul><li>Knowledge Discovery? </li></ul><ul><li>KDD Process </li></ul><ul><li>Data Mining Algorithms </li></ul><ul><li>Different Forms of Mining Models </li></ul><ul><li>Classification of Algorithms </li></ul><ul><li>Weka </li></ul><ul><li>Demo </li></ul><ul><li>Questions? </li></ul>
  4. 4. <ul><ul><li>Knowledge discovery is the process of extracting knowledge from data; it focuses on the high-level application of data mining methods. </li></ul></ul><ul><ul><li>Its main goal is to mine information from raw data in the context of large databases. </li></ul></ul><ul><ul><li>It uses different data mining algorithms to extract that information. </li></ul></ul>
  5. 5. <ul><ul><li>KDD is used in machine learning, pattern recognition, databases, AI, MIS, and many other applications. </li></ul></ul><ul><ul><li>It extracts what is deemed knowledge according to specified measures and thresholds. </li></ul></ul><ul><ul><li>It also takes into account any required preprocessing, sub-sampling, and transformation of the database. </li></ul></ul>
  6. 6. <ul><li>1. Data Cleaning </li></ul><ul><li>2. Data Integration </li></ul><ul><li>3. Data Selection </li></ul><ul><li>4. Data Transformation </li></ul><ul><li>5. Data Mining </li></ul><ul><li>6. Pattern Evaluation </li></ul><ul><li>7. Knowledge Presentation </li></ul>
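The seven KDD steps listed on this slide can be sketched as a chain of small functions. This is only an illustrative toy, not part of the original deck: the record fields (`id`, `age`, `spend`), the two sources, and the "high/low spender" pattern are all assumptions made up for the example.

```python
# Hypothetical toy records from two sources; all field names are illustrative.
store_a = [{"id": 1, "age": 25, "spend": 120.0}, {"id": 2, "age": None, "spend": 80.0}]
store_b = [{"id": 3, "age": 40, "spend": 300.0}, {"id": 4, "age": 35, "spend": 310.0}]

def clean(rows):
    # 1. Data cleaning: drop records that contain missing values.
    return [r for r in rows if all(v is not None for v in r.values())]

def integrate(*sources):
    # 2. Data integration: combine several sources into one collection.
    return [r for src in sources for r in src]

def select(rows, fields):
    # 3. Data selection: keep only the attributes relevant to the task.
    return [{f: r[f] for f in fields} for r in rows]

def transform(rows):
    # 4. Data transformation: divide spend by its maximum so values fall in (0, 1].
    hi = max(r["spend"] for r in rows)
    return [{**r, "spend": r["spend"] / hi} for r in rows]

def mine(rows):
    # 5. Data mining: a deliberately trivial "pattern" (high vs. low spenders).
    return {
        "high": [r for r in rows if r["spend"] >= 0.5],
        "low": [r for r in rows if r["spend"] < 0.5],
    }

def evaluate(patterns, min_size=1):
    # 6. Pattern evaluation: keep only groups with at least min_size members.
    return {k: v for k, v in patterns.items() if len(v) >= min_size}

def present(patterns):
    # 7. Knowledge presentation: summarize each group as a readable line.
    return [f"{name}: {len(members)} customer(s)" for name, members in sorted(patterns.items())]

rows = integrate(clean(store_a), clean(store_b))
rows = transform(select(rows, ["age", "spend"]))
report = present(evaluate(mine(rows)))
print(report)
```

Each function stands in for a whole stage; in practice every stage is far more involved, and the stages are often revisited iteratively rather than run once in sequence.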
  7. 8. <ul><li>The data mining algorithm is the mechanism that creates mining models. </li></ul><ul><li>To create a model, an algorithm first analyzes a set of data, looking for specific patterns and trends. </li></ul><ul><li>The algorithm then uses the results of this analysis to define the parameters of the mining model. </li></ul>
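As a rough illustration of "analyze the data, then define the model's parameters", the sketch below fits a one-level decision rule (a decision stump). The stump is chosen here only because it is small; the slide does not name a specific algorithm, and the income data and function names are hypothetical.

```python
def fit_stump(values, labels):
    """Pick the threshold on one numeric attribute that misclassifies the fewest cases."""
    best_threshold, best_errors = None, len(values) + 1
    for t in sorted(set(values)):
        # Rule under test: predict True when value >= t.
        errors = sum((v >= t) != y for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold

def predict(threshold, value):
    # The fitted model is fully described by its single parameter, the threshold.
    return value >= threshold

# Hypothetical data: customer income (in thousands) and whether they bought.
incomes = [20, 25, 30, 45, 50, 60]
bought = [False, False, False, True, True, True]

threshold = fit_stump(incomes, bought)
print(threshold, predict(threshold, 55), predict(threshold, 22))
```

Here the "analysis" is the scan over candidate thresholds, and the "model parameter" is the threshold that scan selects.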
  8. 9. <ul><li>Decision Trees and Rules </li></ul><ul><li>Non-linear regression and classification Methods </li></ul><ul><li>Example-based Methods </li></ul><ul><li>Probabilistic Graphical Dependency Models </li></ul><ul><li>Relational Learning Models </li></ul>
  9. 10. <ul><li>A set of rules that describe how products are grouped together in a transaction. </li></ul><ul><li>A decision tree that predicts whether a particular customer will buy a product. </li></ul><ul><li>A mathematical model that forecasts sales. </li></ul><ul><li>A set of clusters that describe how the cases in a dataset are related. </li></ul>
  10. 11. <ul><li>Classification algorithms predict one or more discrete variables, based on the other attributes in the dataset. </li></ul><ul><li>Regression algorithms predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset. </li></ul><ul><li>Segmentation algorithms divide data into groups, or clusters, of items that have similar properties. </li></ul>
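To make the segmentation case concrete, here is a minimal one-dimensional k-means sketch. K-means is one common clustering method, picked here as an assumption since the slide does not name one; the data points and starting centers are invented for the example.

```python
def kmeans_1d(points, centers, iterations=10):
    """Group numbers around the given starting centers using plain k-means."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical data: two obvious groups of values.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(points, centers=[1.0, 12.0])
print(centers, clusters)
```

Unlike classification and regression, no target attribute is supplied: the groups emerge only from how similar the items are to each other.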
  11. 12. <ul><li>Association algorithms find correlations between different attributes in a dataset. The most common application of this kind of algorithm is for creating association rules, which can be used in a market basket analysis. </li></ul><ul><li>Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a Web path flow. </li></ul>
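Association rules in market basket analysis are usually scored by support and confidence. The sketch below computes both for a rule "bread implies milk"; the transactions are made-up sample data, and the helper names are illustrative.

```python
# Hypothetical market baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # Of the transactions containing the antecedent, the share that also
    # contain the consequent.
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

In this sample, bread and milk appear together in 2 of 4 baskets (support 0.5), and 2 of the 3 baskets with bread also contain milk (confidence 2/3).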
  12. 13. <ul><li>The Apriori algorithm is a classic algorithm for learning association rules. </li></ul><ul><li>Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or records of visits to a website). </li></ul><ul><li>Apriori uses breadth-first search and a hash tree structure to count candidate item sets efficiently. </li></ul>
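The breadth-first, level-by-level search of Apriori can be sketched as follows. This is a simplified version under stated assumptions: it counts candidates with a plain scan rather than the hash tree mentioned on the slide, `min_support` is an absolute count, and the transactions are invented sample data.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate with a plain scan (a hash tree would speed this up).
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-itemsets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: a candidate survives only if all of its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical market baskets.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
frequent = apriori(baskets, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what keeps the search tractable: since {milk, butter} is infrequent in this sample, the three-item set containing it is discarded without ever being counted.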
  13. 14. <ul><li>What is Weka? </li></ul><ul><ul><li>Weka is a collection of machine learning algorithms for data mining tasks. </li></ul></ul><ul><li>Why Weka? </li></ul><ul><ul><li>It is open source. </li></ul></ul><ul><ul><li>The algorithms can either be applied directly to a dataset or called from your own Java code. </li></ul></ul>
  14. 15. <ul><ul><li>It contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. </li></ul></ul><ul><ul><li>It is also well-suited for developing new machine learning schemes. </li></ul></ul>
  15. 16. <ul><li>Java 1.4 (or later) is required to run Weka 3.4.x and older versions. </li></ul><ul><li>The developer versions, starting with 3.5.3, require Java 5.0. </li></ul><ul><li>Platform: Windows/Linux </li></ul>
  16. 21. <ul><li>Textbook: “Data Mining” by Jiawei Han and Micheline Kamber </li></ul>