2.
What is Data Mining?<br />Non-trivial extraction of implicit, previously unknown, and potentially useful information from data<br />Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns<br />Data mining is the process of automatically discovering useful information in large data repositories<br />
3.
Simple Examples for Data Mining<br /><ul><li>Predicting whether a newly arrived customer will spend more than $100 at a department store.</li><li>Grouping together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com).</li></ul>
4.
Why Data Mining<br />Credit ratings/targeted marketing:<br />Given a database of 100,000 names, which persons are the least likely to default on their credit cards?<br />Identify likely responders to sales promotions<br />Fraud detection:<br />Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?<br />
5.
Origins of Data Mining<br />Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems<br />Traditional techniques may be unsuitable due to:<br />Enormity of data<br />High dimensionality of data<br />Heterogeneous, distributed nature of data<br />
6.
Data Mining Tasks<br />Prediction Methods<br />Use some variables to predict unknown or future values of other variables<br />Description Methods<br />Find human-interpretable patterns that describe the data.<br />
8.
Classification: Definition<br />Classification is used to predict discrete target variables.<br />Ex: predicting whether a Web user will make a purchase at an online store is a classification task because the target variable is binary-valued.<br />
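As an illustration of predicting a binary target, the following is a minimal sketch of a k-nearest-neighbour classifier. The slides do not name a particular algorithm; the technique, the function name `knn_predict`, the two features (pages viewed, minutes on site), and all data values are assumptions made up for this example.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a discrete label for `query` by majority vote
    among its k nearest training examples (squared Euclidean distance)."""
    by_distance = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical Web visitors: (pages viewed, minutes on site) -> purchased?
train = [
    ((2, 1.0), "no"), ((3, 2.5), "no"), ((1, 0.5), "no"),
    ((12, 9.0), "yes"), ((15, 14.0), "yes"), ((10, 11.0), "yes"),
]

prediction = knn_predict(train, (11, 10.0))
print(prediction)
```

The target takes only two values ("yes" or "no"), which is what makes this a classification task rather than a regression task.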
9.
Clustering: Definition<br />Cluster analysis seeks to find groups of closely related observations, so that observations that belong to the same cluster are more similar to each other than to observations that belong to other clusters.<br />Ex:<br />- Finding areas of the ocean that have a significant impact on the earth’s climate.<br />
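One common way to form such groups is k-means, sketched below under stated assumptions: the slides do not prescribe an algorithm, the function name `kmeans` and the two-dimensional points are hypothetical, and the starting centroids are chosen by hand so the run is deterministic.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning each point to its nearest centroid
    and moving each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical observations with two numeric attributes each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, [points[0], points[3]])
print(clusters)
```

Points that end up in the same cluster are closer to each other than to points in the other cluster, which mirrors the definition above.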
10.
Association Rule Discovery: Definition<br />Given a set of records, each of which contains some number of items from a given collection,<br />produce dependency rules that predict the occurrence of an item based on occurrences of other items.<br />
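A small sketch of how such rules might be extracted from records follows. It uses support and confidence, two measures commonly used for association rules but not named on the slide. The function name `pair_rules`, the thresholds, and the shopping baskets are all assumptions for illustration, and only single-item rules of the form lhs -> rhs are considered.

```python
from itertools import combinations

def pair_rules(transactions, min_support=0.6, min_confidence=0.8):
    """Return rules (lhs, rhs, support, confidence) between single items
    that meet both thresholds."""
    n = len(transactions)

    def support(items):
        # Fraction of records that contain every item in `items`.
        return sum(1 for t in transactions if items <= t) / n

    all_items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(all_items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs})
            if s < min_support:
                continue
            confidence = s / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, s, confidence))
    return rules

# Hypothetical market-basket records.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

rules = pair_rules(transactions)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs} (support={s:.2f}, confidence={c:.2f})")
```

Real rule miners also handle multi-item antecedents and prune the search space; this sketch only shows the basic idea of predicting one item from another.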
12.
Sequential Pattern Discovery: Definition<br />Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.<br />(A B) (C) ---> (D E)<br />
13.
Contd…<br />Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.<br />(A B) (C) (D E)<br />Timing constraints shown on the slide: <= xg, > ng, <= ws, <= ms<br />
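To make the (A B) (C) ---> (D E) rule concrete, the sketch below checks whether an object's timeline contains a pattern in order and then estimates how often the antecedent is followed by the consequent. It deliberately ignores the timing constraints, and the function name `contains_sequence`, the object names, and the timelines are hypothetical.

```python
def contains_sequence(timeline, pattern):
    """Return True if every element of `pattern` is a subset of a distinct
    event set in `timeline`, with the matches occurring in order."""
    position = 0
    for itemset in pattern:
        # Advance to the earliest remaining event set that contains this element.
        while position < len(timeline) and not itemset <= timeline[position]:
            position += 1
        if position == len(timeline):
            return False
        position += 1
    return True

# Hypothetical objects, each with a time-ordered list of event sets.
timelines = {
    "obj1": [{"A", "B"}, {"C"}, {"D", "E"}],
    "obj2": [{"A", "B", "F"}, {"G"}, {"C"}, {"D", "E", "H"}],
    "obj3": [{"A", "B"}, {"C"}, {"D"}],
    "obj4": [{"C"}, {"A", "B"}],
}

antecedent = [{"A", "B"}, {"C"}]
full_pattern = antecedent + [{"D", "E"}]

with_antecedent = [t for t in timelines.values() if contains_sequence(t, antecedent)]
with_full = [t for t in with_antecedent if contains_sequence(t, full_pattern)]
confidence = len(with_full) / len(with_antecedent)
print(confidence)
```

Adding gap, window, or span limits would require tracking event timestamps and trying more than the earliest match, which this sketch does not attempt.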
15.
Regression<br />Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.<br />Extensively studied in statistics and in the neural network field.<br />
16.
Regression: Examples<br />Predicting sales amounts of a new product based on advertising expenditure.<br />Predicting wind velocities as a function of temperature, humidity, air pressure, etc.<br />Time series prediction of stock market indices.<br />
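The first example can be sketched with an ordinary least-squares line, assuming a linear model of dependency with a single predictor. The function name `fit_line` and the advertising and sales figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising expenditure and resulting sales (same units).
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(advertising, sales)
predicted_sales = slope * 6.0 + intercept  # forecast for a new spending level
print(slope, intercept, predicted_sales)
```

Unlike classification, the predicted quantity here is continuous, so the output is a number rather than a class label.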
17.
Deviation/Anomaly Detection<br />Detect significant deviations from normal behavior<br />Applications:<br />Credit Card Fraud Detection<br />Network Intrusion Detection<br />
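One simple way to flag deviations from normal behavior is to score each value by its distance from the median, scaled by the median absolute deviation (MAD). This is only a sketch of one possible technique, not a method given on the slide. The function name `flag_anomalies`, the threshold of 5, and the transaction amounts are assumptions.

```python
from statistics import median

def flag_anomalies(values, threshold=5.0):
    """Return the values whose distance from the median exceeds
    `threshold` times the median absolute deviation (MAD)."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # All typical values are identical; anything different is unusual.
        return [v for v in values if v != center]
    return [v for v in values if abs(v - center) / mad > threshold]

# Hypothetical credit card transaction amounts for one customer.
amounts = [20, 25, 22, 19, 24, 21, 23, 500]
anomalies = flag_anomalies(amounts)
print(anomalies)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value inflates the latter two and can hide itself in a small sample.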
18.
Visit more self help tutorials<br />Pick a tutorial of your choice and browse through it at your own pace.<br />The tutorials section is free, self-guiding and will not involve any additional support.<br />Visit us at www.dataminingtools.net<br />