Introduction to data mining

Published in: Technology

  1. Introduction to Data Mining
  2. What is Data Mining
     - Non-trivial extraction of implicit, previously unknown, and potentially useful information from data
     - Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
     - The process of automatically discovering useful information in large data repositories
  3. Simple Examples for Data Mining
     - Predicting whether a newly arrived customer will spend more than $100 at a department store.
     - Grouping similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com).
  4. Why Data Mining
     - Credit ratings/targeted marketing:
       - Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
       - Identify likely responders to sales promotions.
     - Fraud detection:
       - Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?
  5. Origins of Data Mining
     - Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
     - Traditional techniques may be unsuitable due to:
       - Enormity of data
       - High dimensionality of data
       - Heterogeneous, distributed nature of data
  6. Data Mining Tasks
     - Prediction methods: use some variables to predict unknown or future values of other variables.
     - Description methods: find human-interpretable patterns that describe the data.
  7. Data Mining Tasks
     - Classification [Predictive]
     - Clustering [Descriptive]
     - Association Rule Discovery [Descriptive]
     - Sequential Pattern Discovery [Descriptive]
     - Regression [Predictive]
     - Deviation Detection [Predictive]
  8. Classification: Definition
     - Used for discrete target variables.
     - Example: predicting whether a Web user will make a purchase at an online store is a classification task because the target variable is binary-valued.
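
As a rough illustration of a binary classification task like the Web-purchase example, here is a minimal sketch of a k-nearest-neighbour classifier. The features (pages viewed, minutes on site), the training records, and the name `knn_predict` are all hypothetical and chosen only for illustration; the slides do not prescribe any particular classifier.

```python
from collections import Counter
from math import dist

# Hypothetical training data: (pages_viewed, minutes_on_site) -> made a purchase?
TRAIN = [
    ((2, 1.0), False),
    ((3, 2.5), False),
    ((1, 0.5), False),
    ((12, 9.0), True),
    ((9, 7.5), True),
    ((15, 11.0), True),
]

def knn_predict(point, train=TRAIN, k=3):
    """Return the majority class among the k training points closest to `point`."""
    neighbours = sorted(train, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

print(knn_predict((10, 8.0)))  # a visitor who resembles past buyers
print(knn_predict((2, 1.5)))   # a visitor who resembles past non-buyers
```

The target takes only two values (`True` or `False`), which is what makes this a classification problem rather than a regression problem.
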
  9. Clustering: Definition
     - Cluster analysis seeks to find groups of closely related observations, so that observations that belong to the same cluster are more similar to each other than to observations that belong to other clusters.
     - Example: finding areas of the ocean that have a significant impact on the Earth's climate.
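
The idea of grouping observations by similarity can be sketched with a plain k-means loop. This is only one of many clustering methods and is not named in the slides; the 2-D points, the function name `kmeans`, and the simplistic "first k points" initialisation are assumptions made for the sake of a small, deterministic example.

```python
from math import dist

def kmeans(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = list(points[:k])  # simplistic, deterministic initialisation (an assumption)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return clusters

# Hypothetical observations forming two visibly separate groups.
POINTS = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.2, 0.8), (9.5, 8.0)]
for cluster in kmeans(POINTS, k=2):
    print(cluster)
```

No class labels are supplied: the groups emerge from distances alone, which is why clustering is listed as a descriptive task.
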
  10. Association Rule Discovery: Definition
      - Given a set of records, each of which contains some number of items from a given collection, produce dependency rules that predict the occurrence of an item based on the occurrences of other items.
  11. Association Rule Discovery (contd.)
      - Rules discovered:
        - {Milk} --> {Coke}
        - {Diaper, Milk} --> {Beer}
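
Rules such as {Milk} --> {Coke} are usually judged by their support and confidence. The sketch below computes both for the two rules above; the transaction table is a small illustrative one assumed for this example (the slides do not include the underlying records), and the helper names `support` and `confidence` are likewise chosen here.

```python
# Illustrative market-basket records (assumed; not given in the slides).
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions=TRANSACTIONS):
    """Estimated P(consequent | antecedent) over the transactions."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

for lhs, rhs in [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]:
    print(sorted(lhs), "-->", sorted(rhs),
          f"support={support(lhs | rhs):.2f}",
          f"confidence={confidence(lhs, rhs):.2f}")
```

A real rule miner would also have to search for candidate itemsets (for example with Apriori); this sketch only scores rules that are already given.
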
  12. Sequential Pattern Discovery: Definition
      - Given a set of objects, each associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.
      - (A B) (C) --> (D E)
  13. Sequential Pattern Discovery (contd.)
      - Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.
      - Pattern: (A B) (C) (D E)
      - Constraints: <= xg (maximum gap), > ng (minimum gap), <= ws (window size), <= ms (maximum span)
  14. Sequential Pattern Discovery: Example
      - In telecommunications alarm logs:
        (Inverter_Problem Excessive_Line_Current) (Rectifier_Alarm) --> (Fire_Alarm)
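
To make the notion of a sequential pattern concrete, the following sketch checks whether a pattern such as (Inverter_Problem Excessive_Line_Current) (Rectifier_Alarm) (Fire_Alarm) occurs, in order, within one timeline. It deliberately ignores the timing constraints (xg, ng, ws, ms), and the log contents, the extra `Door_Open` event, and the function name `contains_pattern` are hypothetical.

```python
def contains_pattern(timeline, pattern):
    """True if the element sets of `pattern` occur in order within `timeline`.

    `timeline` is a time-ordered list of event sets. Each pattern element must be
    a subset of a timeline entry that comes strictly after the entry matched by
    the previous element. Timing constraints are not checked in this sketch.
    """
    position = 0
    for element in pattern:
        while position < len(timeline) and not element <= timeline[position]:
            position += 1
        if position == len(timeline):
            return False
        position += 1  # the next element must match a later entry
    return True

# Hypothetical alarm log for one device, oldest entry first.
LOG = [
    {"Inverter_Problem", "Excessive_Line_Current"},
    {"Door_Open"},
    {"Rectifier_Alarm"},
    {"Fire_Alarm"},
]
PATTERN = [
    {"Inverter_Problem", "Excessive_Line_Current"},
    {"Rectifier_Alarm"},
    {"Fire_Alarm"},
]
print(contains_pattern(LOG, PATTERN))
```

Counting how many timelines contain a pattern gives its support; discovering the patterns themselves requires a search procedure that is beyond this sketch.
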
  15. Regression
      - Predict the value of a continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
      - Extensively studied in statistics and in the neural network field.
  16. Regression: Examples
      - Predicting sales amounts of a new product based on advertising expenditure.
      - Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
      - Time series prediction of stock market indices.
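
For the simplest case, a linear dependency on a single variable, regression can be sketched as an ordinary least squares line fit. The advertising and sales figures below are made up (and deliberately lie on an exact line so the result is easy to verify), and `fit_line` is a name chosen for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: advertising spend vs. sales, constructed as sales = 2 * spend + 5.
AD_SPEND = [10.0, 20.0, 30.0, 40.0]
SALES = [25.0, 45.0, 65.0, 85.0]

slope, intercept = fit_line(AD_SPEND, SALES)
predicted = slope * 50.0 + intercept  # predicted sales for a spend of 50
print(slope, intercept, predicted)
```

Unlike classification, the predicted quantity here is a continuous value rather than a class label.
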
  17. Deviation/Anomaly Detection
      - Detect significant deviations from normal behavior.
      - Applications:
        - Credit card fraud detection
        - Network intrusion detection
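
One very simple way to flag "significant deviations from normal behavior" is a z-score test: mark values that lie far from the mean in units of standard deviation. This is a minimal sketch, not a production fraud detector; the transaction amounts, the threshold of 2.0, and the name `flag_outliers` are assumptions for illustration.

```python
from statistics import mean, pstdev

def flag_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # all values identical: nothing deviates
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts for one customer.
AMOUNTS = [20, 25, 22, 19, 24, 21, 23, 950]
print(flag_outliers(AMOUNTS))
```

Because an extreme value inflates both the mean and the standard deviation, more robust statistics (for example the median) are often preferred in practice.
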
  18. Visit more self-help tutorials
      - Pick a tutorial of your choice and browse through it at your own pace.
      - The tutorials section is free and self-guiding, and does not involve any additional support.
      - Visit us at www.dataminingtools.net
