Introduction to Data Mining


Published in: Technology, Education


  1. 1. Introduction to Data Mining<br />
  2. 2. What is Data Mining?<br />Non-trivial extraction of implicit, previously unknown, and potentially useful information from data.<br />Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns.<br />The process of automatically discovering useful information in large data repositories.<br />
  3. 3. Simple Examples for Data Mining<br /><ul><li>Predicting whether a newly arrived customer will spend more than $100 at a department store.</li><li>Grouping together similar documents returned by a search engine according to their context (e.g. Amazon rainforest).</li></ul>
  4. 4. Why Data Mining<br />Credit ratings/targeted marketing:<br />Given a database of 100,000 names, which persons are the least likely to default on their credit cards?<br />Identify likely responders to sales promotions.<br />Fraud detection:<br />Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?<br />
  5. 5. Origins of Data Mining<br />Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems.<br />Traditional techniques may be unsuitable due to:<br />Enormity of data<br />High dimensionality of data<br />Heterogeneous, distributed nature of data<br />
  6. 6. Data Mining Tasks<br />Prediction Methods<br />Use some variables to predict unknown or future values of other variables<br />Description Methods<br />Find human-interpretable patterns that describe the data.<br />
  7. 7. Data Mining Tasks<br />Classification [Predictive]<br />Clustering [Descriptive]<br />Association Rule Discovery [Descriptive]<br />Sequential Pattern Discovery [Descriptive]<br />Regression [Predictive]<br />Deviation Detection [Predictive]<br />
  8. 8. Classification: Definition<br />Classification is used when the target variable is discrete.<br />Ex: predicting whether a Web user will make a purchase at an online store is a classification task because the target variable is binary-valued.<br />
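
One way to picture the Web-purchase example is a small nearest-neighbour classifier. This is only a sketch: the features (`pages_viewed`, `minutes_on_site`), the training rows, and the function name `predict_purchase` are invented for illustration and do not come from the slides.

```python
from collections import Counter
import math

# Hypothetical training data: (pages_viewed, minutes_on_site) -> purchased?
TRAIN = [
    ((2, 1.0), False),
    ((3, 2.5), False),
    ((4, 2.0), False),
    ((10, 9.0), True),
    ((12, 11.0), True),
    ((9, 8.5), True),
]

def predict_purchase(features, train=TRAIN, k=3):
    """Classify a visitor by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(predict_purchase((11, 10.0)))  # visitor resembling past buyers
print(predict_purchase((3, 1.5)))    # visitor resembling past non-buyers
```

The target is binary (purchase or no purchase), which is what makes this a classification task rather than regression.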
  9. 9. Clustering: Definition<br />Cluster analysis seeks to find groups of closely related observations, so that observations that belong to the same cluster are more similar to each other than to observations that belong to other clusters.<br />Ex: finding areas of the ocean that have a significant impact on the earth’s climate.<br />
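
The slides do not name a clustering algorithm. As one common choice, a minimal k-means sketch on made-up 2-D points shows what "more similar within a cluster than across clusters" looks like in practice. The points and the starting centroids are assumptions chosen for illustration.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its cluster; keep it unchanged if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical observations forming two visibly separate groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(clusters)
```

Fixed starting centroids keep the run deterministic; real implementations usually pick them at random and repeat the run several times.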
  10. 10. Association Rule Discovery: Definition<br />Given a set of records, each of which contains some number of items from a given collection,<br />produce dependency rules that predict the occurrence of an item based on occurrences of other items.<br />
  11. 11. Contd…<br />Rules Discovered:<br />{Milk} --&gt; {Coke}<br /> {Diaper, Milk} --&gt; {Beer}<br />
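
How strongly a rule such as {Diaper, Milk} --> {Beer} holds is usually measured by its support and confidence. The sketch below computes both. The transaction list is hypothetical (the slide's own table did not survive in this transcript), and the helper names are illustrative.

```python
# Hypothetical market-basket transactions, one set of items per record.
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def count(itemset, transactions=TRANSACTIONS):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(antecedent, consequent, transactions=TRANSACTIONS):
    """Fraction of all transactions that contain both sides of the rule."""
    return count(antecedent | consequent, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions=TRANSACTIONS):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

print(support({"Milk"}, {"Coke"}), confidence({"Milk"}, {"Coke"}))
print(support({"Diaper", "Milk"}, {"Beer"}), confidence({"Diaper", "Milk"}, {"Beer"}))
```

A rule miner would enumerate candidate itemsets and keep only the rules whose support and confidence exceed chosen thresholds. This sketch only scores rules that are already given.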
  12. 12. Sequential Pattern Discovery: Definition<br />Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.<br />(A B) (C) ---&gt; (D E)<br />
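  13. 13. Contd…<br />Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.<br />Pattern: (A B) (C) (D E)<br />Constraints: gap between consecutive elements &lt;= xg (max-gap) and &gt; ng (min-gap); window for events within one element &lt;= ws (window size); span of the whole pattern &lt;= ms (maximum span)<br />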
  14. 14. Sequential Pattern Discovery: Example<br />In telecommunications alarm logs:<br />(Inverter_Problem Excessive_Line_Current) (Rectifier_Alarm) --&gt; (Fire_Alarm)<br />
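
Checking whether one object's timeline contains a given sequential pattern can be sketched as an ordered-subsequence test. The log entries and the name `contains_pattern` are hypothetical, each timestamp is assumed to appear only once, and the timing constraints (max-gap, min-gap, window size, maximum span) are deliberately left out to keep the sketch short.

```python
def contains_pattern(timeline, pattern):
    """Return True if the pattern's elements occur in order at increasing timestamps.

    timeline: list of (timestamp, set_of_events); pattern: list of event sets.
    """
    position = 0
    for _, events in sorted(timeline, key=lambda entry: entry[0]):
        # Advance when the current log entry contains every event of the next pattern element.
        if position < len(pattern) and pattern[position] <= events:
            position += 1
    return position == len(pattern)

# Pattern from the alarm-log example: two alarms together, then a rectifier alarm, then fire.
PATTERN = [
    {"Inverter_Problem", "Excessive_Line_Current"},
    {"Rectifier_Alarm"},
    {"Fire_Alarm"},
]

# Hypothetical alarm logs for two devices.
log_a = [
    (1, {"Inverter_Problem", "Excessive_Line_Current"}),
    (2, {"Rectifier_Alarm"}),
    (5, {"Fire_Alarm"}),
]
log_b = [
    (1, {"Rectifier_Alarm"}),
    (2, {"Inverter_Problem", "Excessive_Line_Current"}),
    (5, {"Fire_Alarm"}),
]

print(contains_pattern(log_a, PATTERN))
print(contains_pattern(log_b, PATTERN))
```

Counting how many objects contain the pattern gives its support, and a miner would keep only patterns that are frequent enough.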
  15. 15. Regression<br />Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.<br />Extensively studied in statistics and in the neural network field.<br />
  16. 16. Regression: Examples<br />Predicting sales amounts of a new product based on advertising expenditure.<br />Predicting wind velocities as a function of temperature, humidity, air pressure, etc.<br />Time series prediction of stock market indices.<br />
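
For the sales-versus-advertising example, the simplest linear model is a least-squares line. The figures below are invented for illustration, and `fit_line` is a hypothetical helper written out by hand so the arithmetic is visible.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising expenditure vs. sales amount (arbitrary units).
ad_spend = [1, 2, 3, 4, 5]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(ad_spend, sales)
predicted = intercept + slope * 6  # predicted sales for a new expenditure of 6
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

Unlike classification, the predicted quantity here is continuous, which is the distinction the slides draw between the two predictive tasks.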
  17. 17. Deviation/Anomaly Detection<br />Detect significant deviations from normal behavior<br />Applications:<br />Credit Card Fraud Detection<br />Network Intrusion Detection<br />
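
A very simple way to flag "significant deviations from normal behavior" is a z-score test. This is a sketch under assumptions: the transaction amounts, the threshold of 2 standard deviations, and the name `find_anomalies` are all illustrative, and production fraud detection uses far richer models.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / spread > threshold]

# Hypothetical card transaction amounts for one customer.
amounts = [20, 25, 22, 19, 24, 21, 23, 500]
print(find_anomalies(amounts))
```

Because a large outlier inflates the mean and standard deviation it is measured against, robust variants based on the median are often preferred for small samples.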
  18. 18. Visit more self help tutorials<br />Pick a tutorial of your choice and browse through it at your own pace.<br />The tutorials section is free, self-guiding and will not involve any additional support.<br />Visit us at<br />