Introduction to Data Mining

What is Data Mining
Non-trivial extraction of implicit, previously unknown, and potentially useful information from data.
Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns.
Data mining is the process of automatically discovering useful information in large data repositories.

Simple Examples for Data Mining
Predicting whether a newly arrived customer will spend more than $100 at a department store.
Grouping together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com).

Why Data Mining
Credit ratings/targeted marketing:
Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
Identify likely responders to sales promotions.
Fraud detection:
Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?

Origins of Data Mining
Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems.
Traditional techniques may be unsuitable due to the enormity of the data, the high dimensionality of the data, and the heterogeneous, distributed nature of the data.

Data Mining Tasks
Prediction methods: use some variables to predict unknown or future values of other variables.
Description methods: find human-interpretable patterns that describe the data.

Data Mining Tasks
Classification [Predictive]
Clustering [Descriptive]
Association Rule Discovery [Descriptive]
Sequential Pattern Discovery [Descriptive]
Regression [Predictive]
Deviation Detection [Predictive]

Classification: Definition
Classification is used for discrete target variables.
Ex: Predicting whether a Web user will make a purchase at an online store is a classification task because the target variable is binary-valued.
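
As a rough illustration of a binary classification task like the one above, here is a minimal sketch of a k-nearest-neighbour classifier. The slides do not name a specific algorithm, so the choice of k-NN, the function name knn_predict, the two features (pages viewed, minutes on site), and all data values are assumptions made for the example.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Predict a discrete label by majority vote among the k nearest training points."""
    neighbours = sorted(train, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical visitors: (pages viewed, minutes on site) -> made a purchase?
train = [
    ((1, 0.5), False),
    ((2, 1.0), False),
    ((3, 2.0), False),
    ((8, 6.0), True),
    ((9, 7.5), True),
    ((10, 9.0), True),
]

print(knn_predict(train, (9, 7.0)))  # a long, active visit
print(knn_predict(train, (2, 1.5)))  # a short visit
```

The target here takes only two values (purchase or no purchase), which is what makes this a classification problem rather than a regression problem.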

Clustering: Definition
Cluster analysis seeks to find groups of closely related observations, so that observations that belong to the same cluster are more similar to each other than to observations that belong to other clusters.
Ex: Finding areas of the ocean that have a significant impact on the earth's climate.
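
One common way to form such groups is k-means. The sketch below is an assumption for illustration only: the slides do not prescribe an algorithm, and the function name kmeans, the 2-D points, and the starting centroids are all made up.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2-D observations forming two visibly separate groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])

for centre, members in zip(centroids, clusters):
    print(centre, members)
```

The starting centroids are fixed here so that the run is repeatable; in practice they are usually chosen at random or by a seeding heuristic.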

Association Rule Discovery: Definition
Given a set of records, each of which contains some number of items from a given collection, produce dependency rules that predict the occurrence of an item based on occurrences of other items.

Contd…
Rules discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
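
The slides do not say how such rules are scored. A common choice, assumed here, is support (the fraction of records containing all the items) and confidence (how often the right-hand side appears when the left-hand side does). The transaction list and the helper names count, support, and confidence below are hypothetical and only illustrate the two rules above.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)


# Hypothetical market-basket records (not taken from the slides).
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

print(support(transactions, {"Milk", "Coke"}))
print(confidence(transactions, {"Milk"}, {"Coke"}))
print(confidence(transactions, {"Diaper", "Milk"}, {"Beer"}))
```

A rule is typically reported only when both its support and its confidence exceed thresholds chosen by the analyst.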

Sequential Pattern Discovery: Definition
Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.
(A B) (C) ---> (D E)

Contd…
Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.
[Figure: the pattern (A B) (C) (D E) annotated with the timing constraints <= xg, > ng, <= ws, <= ms]

Sequential Pattern Discovery: Example
In telecommunications alarm logs:
(Inverter_Problem Excessive_Line_Current) (Rectifier_Alarm) --> (Fire_Alarm)
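
Before a rule like this can be proposed, a miner has to test whether a timeline contains a given sequence of event sets. The sketch below checks only ordering and deliberately ignores timing constraints; the function name contains_pattern and both alarm logs are hypothetical.

```python
def contains_pattern(timeline, pattern):
    """Return True if the pattern's event sets occur, in order, at strictly later timeline entries.

    timeline: list of (timestamp, set_of_events), sorted by timestamp.
    pattern:  list of event sets, e.g. [{"A", "B"}, {"C"}].
    Timing constraints between entries are not checked in this sketch.
    """
    position = 0
    for element in pattern:
        # Advance to the first remaining entry that includes every event in this element.
        while position < len(timeline) and not element <= timeline[position][1]:
            position += 1
        if position == len(timeline):
            return False
        position += 1  # the next element must match a later entry
    return True


pattern = [
    {"Inverter_Problem", "Excessive_Line_Current"},
    {"Rectifier_Alarm"},
    {"Fire_Alarm"},
]

# Hypothetical alarm logs for two pieces of equipment.
log_a = [
    (1, {"Inverter_Problem", "Excessive_Line_Current"}),
    (2, {"Rectifier_Alarm"}),
    (5, {"Fire_Alarm"}),
]
log_b = [
    (1, {"Rectifier_Alarm"}),
    (2, {"Inverter_Problem", "Excessive_Line_Current"}),
    (3, {"Fire_Alarm"}),
]

print(contains_pattern(log_a, pattern))  # events occur in the pattern's order
print(contains_pattern(log_b, pattern))  # same events, different order
```

Counting how many timelines contain the full pattern, relative to how many contain only its first two elements, gives one way to judge how strong the dependency is.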

Regression
Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
Extensively studied in statistics and in the neural network field.

Regression: Examples
Predicting sales amounts of a new product based on advertising expenditure.
Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
Time series prediction of stock market indices.
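
The first example can be sketched with ordinary least squares on a single predictor, assuming a linear model of dependency. The function name fit_line and the advertising and sales figures are invented for illustration, and the data is deliberately perfectly linear so that the fitted line is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising expenditure vs. sales amount (same arbitrary units).
advertising = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = fit_line(advertising, sales)
predicted = slope * 6 + intercept  # predicted sales for an expenditure of 6

print(slope, intercept, predicted)
```

Unlike the classification case, the predicted quantity here is continuous, so the model outputs a number rather than a class label.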

Deviation/Anomaly Detection
Detect significant deviations from normal behavior.
Applications:
Credit card fraud detection
Network intrusion detection
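
One simple way to make "significant deviation from normal behavior" concrete is a z-score test: estimate the mean and standard deviation from a history assumed to be normal, then flag new values that lie too many standard deviations away. The slides do not specify a method, so the z-score rule, the threshold of 3, the function name flag_deviations, and the card amounts below are all assumptions.

```python
from statistics import mean, stdev


def flag_deviations(history, new_values, threshold=3.0):
    """Return the new values lying more than `threshold` standard deviations from the historical mean."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least two history values
    return [x for x in new_values if abs(x - mu) / sigma > threshold]


# Hypothetical past card transactions for one customer, assumed to be legitimate.
history = [20, 22, 19, 21, 23, 20, 22, 21]

# Hypothetical new transactions to screen.
new_values = [21, 24, 500]

flagged = flag_deviations(history, new_values)
print(flagged)
```

The baseline is estimated from past data rather than from the values being screened, because a single extreme value in a small sample inflates the standard deviation enough to hide itself.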

