Introduction to Data Mining
What is Data Mining?
- Non-trivial extraction of implicit, previously unknown, and potentially useful information from data.
- Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns.
- The process of automatically discovering useful information in large data repositories.
Simple Examples of Data Mining
- Predicting whether a newly arrived customer will spend more than $100 at a department store.
- Grouping together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com).
Why Data Mining?
- Credit ratings/targeted marketing: Given a database of 100,000 names, which persons are the least likely to default on their credit cards? Identify likely responders to sales promotions.
- Fraud detection: Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?
Origins of Data Mining
- Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems.
- Traditional techniques may be unsuitable due to:
  - the enormity of the data
  - the high dimensionality of the data
  - the heterogeneous, distributed nature of the data
Data Mining Tasks
- Prediction methods: use some variables to predict unknown or future values of other variables.
- Description methods: find human-interpretable patterns that describe the data.
Data Mining Tasks
- Classification [Predictive]
- Clustering [Descriptive]
- Association Rule Discovery [Descriptive]
- Sequential Pattern Discovery [Descriptive]
- Regression [Predictive]
- Deviation Detection [Predictive]
Classification: Definition
- Used for discrete target variables.
- Ex: predicting whether a Web user will make a purchase at an online store is a classification task because the target variable is binary-valued.
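
As an illustration of predicting a binary target, here is a minimal sketch using a k-nearest-neighbour vote. The slides do not name a particular classifier, so the choice of k-NN, the two features (pages viewed, minutes on site), and the training rows are all assumptions made up for this example.

```python
from collections import Counter
from math import dist

# Hypothetical training data (not from the slides):
# (pages_viewed, minutes_on_site) -> 1 if the user purchased, 0 otherwise.
TRAIN = [
    ((2, 1.0), 0),
    ((3, 2.5), 0),
    ((1, 0.5), 0),
    ((12, 9.0), 1),
    ((15, 14.0), 1),
    ((10, 11.5), 1),
]

def knn_predict(train, x, k=3):
    """Predict the discrete label of x by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_predict(TRAIN, (11, 10.0)))  # a long, active session
print(knn_predict(TRAIN, (2, 2.0)))    # a short visit
```

The output is always one of the discrete labels seen in training, which is what distinguishes classification from regression.
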
Clustering: Definition
- Cluster analysis seeks to find groups of closely related observations, so that observations that belong to the same cluster are more similar to each other than to observations that belong to other clusters.
- Ex: finding areas of the ocean that have a significant impact on the Earth's climate.
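
One way to make "more similar to each other than to observations in other clusters" concrete is k-means. The sketch below is a bare-bones one-dimensional version; the algorithm choice, the sample values, and the initial centers are assumptions for illustration and do not come from the slides.

```python
def kmeans_1d(values, centers, iterations=10):
    """Run a fixed number of k-means iterations on 1-D data from the given initial centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach every value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical measurements forming two obvious groups.
values = [1.0, 1.5, 2.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(values, centers=[1.0, 2.0])
print(centers)
print(clusters)
```

No labels are supplied: the groups emerge from the distances alone, which is why clustering is listed as a descriptive task.
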
Association Rule Discovery: Definition
- Given a set of records, each of which contains some number of items from a given collection,
- produce dependency rules that predict the occurrence of an item based on the occurrences of other items.
Contd…
Rules discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
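
Rules such as {Diaper, Milk} --> {Beer} are usually scored by support and confidence. The following sketch computes both measures; the transaction list is hypothetical sample data (the slides do not include the underlying records), and the function names are chosen here for illustration.

```python
# Hypothetical market-basket transactions (assumed for this example).
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that also contain the consequent.

    Assumes the antecedent occurs at least once; otherwise this divides by zero.
    """
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)

print(support({"Milk", "Coke"}, TRANSACTIONS))
print(confidence({"Milk"}, {"Coke"}, TRANSACTIONS))
print(confidence({"Diaper", "Milk"}, {"Beer"}, TRANSACTIONS))
```

A rule miner would enumerate candidate itemsets and keep only the rules whose support and confidence exceed chosen thresholds; this sketch only scores rules that are handed to it.
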
Sequential Pattern Discovery: Definition
- Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.
(A B) (C) ---> (D E)
Contd…
- Rules are formed by first discovering patterns.
- Event occurrences in the patterns are governed by timing constraints.
[Figure: the pattern (A B) (C) (D E) annotated with the timing constraints <= xg, > ng, <= ws, <= ms]
Sequential Pattern Discovery: Example
- In telecommunications alarm logs:
(Inverter_Problem Excessive_Line_Current) (Rectifier_Alarm) --> (Fire_Alarm)
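
Before a sequential rule can be scored, one needs to test whether an object's timeline contains a given pattern. The sketch below performs that containment check for ordered groups of events. It deliberately ignores the timing constraints (xg, ng, ws, ms), and the sample timeline, including the Door_Open event, is hypothetical.

```python
def contains_pattern(sequence, pattern):
    """Return True if the pattern's elements occur, in order, as subsets of distinct elements of sequence.

    Both arguments are lists of sets; each set holds the events observed together.
    """
    position = 0
    for wanted in pattern:
        # Greedily advance to the earliest remaining element that includes all wanted events.
        while position < len(sequence) and not wanted <= sequence[position]:
            position += 1
        if position == len(sequence):
            return False
        position += 1  # the next pattern element must match strictly later
    return True

# Hypothetical alarm timeline for one piece of equipment.
timeline = [
    {"Inverter_Problem", "Excessive_Line_Current"},
    {"Door_Open"},
    {"Rectifier_Alarm"},
    {"Fire_Alarm"},
]
pattern = [
    {"Inverter_Problem", "Excessive_Line_Current"},
    {"Rectifier_Alarm"},
    {"Fire_Alarm"},
]
print(contains_pattern(timeline, pattern))
```

Counting how many timelines contain a pattern gives its support, in the same spirit as itemset support for association rules.
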
Regression
- Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
- Extensively studied in statistics and in the neural network field.
Regression: Examples
- Predicting sales amounts of a new product based on advertising expenditure.
- Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
- Time series prediction of stock market indices.
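
The first example above can be sketched with the simplest linear model of dependency, an ordinary least squares line with one predictor. The figures for advertising spend and sales are invented for illustration, and a single-variable linear fit is only one of the models the slides allow for.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising expenditure and resulting sales (arbitrary units).
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 5.0 + intercept  # continuous-valued prediction for a new spend level
print(slope, intercept, predicted)
```

Unlike classification, the prediction here is a number on a continuous scale, not one of a fixed set of labels.
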
Deviation/Anomaly Detection
- Detect significant deviations from normal behavior.
- Applications:
  - Credit card fraud detection
  - Network intrusion detection
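
A simple way to flag a "significant deviation" is to measure how many standard deviations a value lies from the mean. The sketch below does this with a z-score; the threshold of 2.0 and the transaction amounts are assumptions for illustration, and real fraud or intrusion detectors typically use richer models.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds threshold population standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts for one customer.
amounts = [20, 22, 19, 21, 23, 20, 18, 500]
print(find_outliers(amounts))
```

Because one extreme value inflates both the mean and the standard deviation, z-scores on small samples are easy to mask; a median-based measure is a common, more robust alternative.
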
