Introduction to Data Mining



Published in: Technology, Education


  1. Introduction to Data Mining
  2. What is Data Mining
     Non-trivial extraction of implicit, previously unknown, and potentially useful information from data.
     Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns.
     Data mining is the process of automatically discovering useful information in large data repositories.
  3. Simple Examples for Data Mining
     Predicting whether a newly arrived customer will spend more than $100 at a department store.
     Grouping together similar documents returned by a search engine according to their context (e.g. Amazon rainforest).
  4. Why Data Mining
     Credit ratings/targeted marketing:
       Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
       Identify likely responders to sales promotions.
     Fraud detection:
       Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?
  5. Origins of Data Mining
     Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems.
     Traditional techniques may be unsuitable due to:
       Enormity of data
       High dimensionality of data
       Heterogeneous, distributed nature of data
  6. Data Mining Tasks
     Prediction Methods
       Use some variables to predict unknown or future values of other variables.
     Description Methods
       Find human-interpretable patterns that describe the data.
  7. Data Mining Tasks
     Classification [Predictive]
     Clustering [Descriptive]
     Association Rule Discovery [Descriptive]
     Sequential Pattern Discovery [Descriptive]
     Regression [Predictive]
     Deviation Detection [Predictive]
  8. Classification: Definition
     Used when the target variable is discrete.
     Ex: Predicting whether a Web user will make a purchase at an online store is a classification task because the target variable is binary-valued.
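As a rough illustration of a binary classification task like the online-purchase example, here is a minimal k-nearest-neighbour sketch. The technique, the feature choice (pages viewed, minutes on site), and the data are assumptions made for illustration; the slides do not name a specific classifier.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Sort labelled examples by Euclidean distance to the query and keep the k nearest.
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical sessions: (pages viewed, minutes on site) -> made a purchase?
sessions = [
    ((2, 1.0), False),
    ((3, 2.5), False),
    ((4, 3.0), False),
    ((10, 12.0), True),
    ((12, 15.0), True),
    ((9, 11.0), True),
]

prediction = knn_predict(sessions, (11, 13.0))
print(prediction)  # True
```

The target here takes only two values (purchase or no purchase), which is what makes this a classification problem rather than a regression problem.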
  9. Clustering: Definition
     Cluster analysis seeks to find groups of closely related observations, so that observations that belong to the same cluster are more similar to each other than to observations that belong to other clusters.
     Ex: Finding areas of the ocean that have a significant impact on the Earth's climate.
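One common way to find such groups is k-means, sketched below. The algorithm choice, the 2-D points, and the initial centroids are assumptions for illustration; the slides only define clustering in general terms.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Index of the centroid closest to p (Euclidean distance).
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # New centroid = coordinate-wise mean of its points; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical observations forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(clusters)
```

Starting from fixed initial centroids keeps the sketch deterministic; practical implementations usually pick starting centroids at random and rerun several times.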
  10. Association Rule Discovery: Definition
     Given a set of records, each of which contains some number of items from a given collection:
     Produce dependency rules that predict the occurrence of an item based on occurrences of other items.
  11. Contd…
     Rules Discovered:
       {Milk} --> {Coke}
       {Diaper, Milk} --> {Beer}
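Rules like these are usually scored by support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both; the transaction list is made up for illustration and is not taken from the slides.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(transactions, antecedent, consequent):
    """Fraction of all transactions containing both sides of the rule."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)

# Hypothetical market-basket data.
baskets = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

print(confidence(baskets, {"Milk"}, {"Coke"}))  # 0.75
print(support(baskets, {"Diaper", "Milk"}, {"Beer"}))  # 0.4
```

A rule-discovery algorithm would enumerate candidate itemsets and keep only rules above chosen support and confidence thresholds; this sketch only scores rules that are already given.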
  12. Sequential Pattern Discovery: Definition
     Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.
       (A B) (C) --> (D E)
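  13. Contd…
     Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.
       (A B) (C) (D E)
     Timing constraints:
       gap between consecutive elements <= xg (max-gap) and > ng (min-gap)
       events within one element occur within <= ws (window size)
       the whole pattern spans <= ms (max-span)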
  14. Sequential Pattern Discovery: Example
     In telecommunications alarm logs:
       (Inverter_Problem Excessive_Line_Current) (Rectifier_Alarm) --> (Fire_Alarm)
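To make the notion of a sequential pattern concrete, the sketch below checks whether one object's timeline contains a pattern such as (Inverter_Problem Excessive_Line_Current) (Rectifier_Alarm) (Fire_Alarm). It is a simplified, assumed formulation: each pattern element must be contained in a single timeline entry, and the timing constraints (xg, ng, ws, ms) are ignored. The alarm logs are hypothetical.

```python
def contains_sequence(timeline, pattern):
    """True if the pattern's elements occur, in order, in the timeline.

    timeline: list of (timestamp, set_of_events), sorted by timestamp.
    pattern:  list of sets; each set must be a subset of one timeline entry,
              and matched entries must appear in increasing time order.
    """
    pos = 0
    for element in pattern:
        # Advance to the earliest remaining entry that contains this element.
        while pos < len(timeline) and not element <= timeline[pos][1]:
            pos += 1
        if pos == len(timeline):
            return False
        pos += 1  # the next element must match a strictly later entry
    return True

pattern = [
    {"Inverter_Problem", "Excessive_Line_Current"},
    {"Rectifier_Alarm"},
    {"Fire_Alarm"},
]

# Hypothetical alarm logs for two devices.
log_a = [
    (1, {"Inverter_Problem", "Excessive_Line_Current"}),
    (2, {"Rectifier_Alarm"}),
    (5, {"Fire_Alarm"}),
]
log_b = [
    (1, {"Rectifier_Alarm"}),
    (3, {"Inverter_Problem"}),
    (4, {"Fire_Alarm"}),
]

print(contains_sequence(log_a, pattern))  # True
print(contains_sequence(log_b, pattern))  # False
```

Counting how many timelines contain a pattern gives its support, which is the basis for deciding which patterns are frequent enough to turn into rules.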
  15. Regression
     Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
     Extensively studied in statistics and in the neural network field.
  16. Regression: Examples
     Predicting sales amounts of a new product based on advertising expenditure.
     Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
     Time series prediction of stock market indices.
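For the linear case, a one-variable least-squares fit can be sketched as follows. The advertising and sales figures are invented for illustration and chosen to lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising expenditure vs. sales amount (arbitrary units).
advertising = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = fit_line(advertising, sales)
predicted = slope * 6 + intercept
print(slope, intercept, predicted)  # 2.0 1.0 13.0
```

Unlike classification, the predicted quantity here is continuous, so the output is a number rather than a class label.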
  17. Deviation/Anomaly Detection
     Detect significant deviations from normal behavior.
     Applications:
       Credit Card Fraud Detection
       Network Intrusion Detection
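One simple way to flag deviations from normal behavior is a modified z-score based on the median and the median absolute deviation (MAD). This is an assumed technique chosen for illustration, not one named in the slides; the transaction amounts and the 3.5 threshold are likewise illustrative.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread estimate available, so nothing can be scored
    # 0.6745 rescales MAD so the score is comparable to a standard z-score.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical card transaction amounts for one customer.
amounts = [20, 22, 19, 21, 23, 20, 500]
print(mad_outliers(amounts))  # [500]
```

A plain mean-and-standard-deviation z-score would be distorted by the 500 itself in a sample this small, which is why the sketch uses the median-based variant.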