Data mining is the process of automatically discovering useful information from large data sets. It draws from machine learning, statistics, and database systems to analyze data and identify patterns. Common data mining tasks include classification, clustering, association rule mining, and sequential pattern mining. These tasks are used for applications like credit risk assessment, fraud detection, customer segmentation, and market basket analysis. Data mining aims to extract unknown and potentially useful patterns from large data sets.
What is Data Mining?
Non-trivial extraction of implicit, previously unknown, and potentially useful information from data.
Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns.
Data mining is the process of automatically discovering useful information in large data repositories.
Simple Examples for Data Mining
Predicting whether a newly arrived customer will spend more than $100 at a department store.
Grouping together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com).
Why Data Mining?
Credit ratings/targeted marketing: Given a database of 100,000 names, which persons are the least likely to default on their credit cards? Identify likely responders to sales promotions.
Fraud detection: Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?
Origins of Data Mining
Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems.
Traditional techniques may be unsuitable due to the enormity of the data, its high dimensionality, and its heterogeneous, distributed nature.
Data Mining Tasks
Prediction methods: Use some variables to predict unknown or future values of other variables.
Description methods: Find human-interpretable patterns that describe the data.
Classification: Definition
Classification is used for discrete target variables. Ex: predicting whether a Web user will make a purchase at an online store is a classification task because the target variable is binary-valued.
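A binary classification task like the purchase example can be sketched with a 1-nearest-neighbor rule. This is only one of many possible classifiers, and the features (pages viewed, minutes on site), the training records, and the function name are hypothetical, chosen purely for illustration.

```python
from math import dist

def nearest_neighbor_predict(train, query):
    """Return the label of the training record closest to `query` (1-NN)."""
    _features, label = min(train, key=lambda rec: dist(rec[0], query))
    return label

# Hypothetical visitors: (pages viewed, minutes on site) -> made a purchase?
train = [
    ((2, 1.0), False),
    ((3, 2.5), False),
    ((12, 9.0), True),
    ((15, 11.5), True),
]

print(nearest_neighbor_predict(train, (13, 10.0)))  # True
print(nearest_neighbor_predict(train, (1, 0.5)))    # False
```

The target variable takes only two values (True or False), which is what makes this a classification problem rather than a regression problem.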
Clustering: Definition
Cluster analysis seeks to find groups of closely related observations, so that observations that belong to the same cluster are more similar to each other than to observations that belong to other clusters. Ex: finding areas of the ocean that have a significant impact on the Earth's climate.
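As a rough illustration of grouping similar observations, the sketch below uses k-means, one common clustering algorithm (the slides do not name a specific one). The points, the starting centroids, and the function name are assumptions made for the example.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate an assignment step and a centroid-update step."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (a centroid with an empty cluster stays where it is).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical 2-D observations forming two visibly separate groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])

print(clusters[0])  # [(1, 1), (1.5, 2), (1, 1.5)]
print(clusters[1])  # [(8, 8), (9, 9.5), (8.5, 9)]
```

Points within each resulting cluster are closer to one another than to points in the other cluster, which matches the definition above.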
Association Rule Discovery: Definition
Given a set of records, each of which contains some number of items from a given collection, produce dependency rules that predict the occurrence of an item based on occurrences of other items.
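Candidate dependency rules are usually scored by support and confidence. The sketch below computes both for a single rule over a small set of records. The transactions and the helper names are made up for illustration, and a real miner would also need to search over candidate itemsets, which is not shown here.

```python
# Hypothetical market-basket records; each record is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def count(itemset, transactions):
    """Number of records that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of records that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the records containing `antecedent`, the fraction that also contain `consequent`.
    Assumes the antecedent occurs in at least one record."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

# Rule under test: {diapers} -> {beer}
print(support({"diapers", "beer"}, transactions))       # 0.6
print(confidence({"diapers"}, {"beer"}, transactions))  # 0.75
```

Here the rule {diapers} -> {beer} holds in 3 of the 4 records that contain diapers, so seeing diapers in a record predicts beer with confidence 0.75.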
Sequential Pattern Discovery: Definition
Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.
(A B) (C) ---> (D E)
Sequential Pattern Discovery (contd.)
Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.
Pattern: (A B) (C) (D E)
Timing constraints shown in the diagram: <= xg, > ng, <= ws, <= ms
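To make the idea of a pattern with a timing constraint concrete, the sketch below checks whether one object's timeline contains the pattern (A B) (C) (D E). It enforces only a single constraint, treating xg as a maximum allowed gap between consecutive pattern elements. That reading of xg, the timestamps, and the function name are assumptions, and the remaining constraints (ng, ws, ms) are not modeled.

```python
def contains(timeline, pattern, max_gap=None):
    """Check whether `timeline` contains `pattern` in order.

    timeline: list of (time, set_of_events), sorted by time.
    pattern:  list of event sets; each set must occur within a single timeline entry.
    max_gap:  optional upper bound on the time between consecutive matched entries.
    """
    def match(p_idx, start, prev_time):
        if p_idx == len(pattern):
            return True
        for i in range(start, len(timeline)):
            t, events = timeline[i]
            if prev_time is not None and max_gap is not None and t - prev_time > max_gap:
                break  # timeline is sorted, so every later entry is also too far away
            # Try this entry; backtrack to later entries if the rest cannot be matched.
            if pattern[p_idx] <= events and match(p_idx + 1, i + 1, t):
                return True
        return False

    return match(0, 0, None)

# Hypothetical timeline of one object: (timestamp, events observed at that time).
timeline = [(1, {"A", "B"}), (2, {"C"}), (5, {"D", "E", "F"})]
pattern = [{"A", "B"}, {"C"}, {"D", "E"}]

print(contains(timeline, pattern))             # True
print(contains(timeline, pattern, max_gap=2))  # False: gap from t=2 to t=5 is 3
print(contains(timeline, pattern, max_gap=3))  # True
```

Counting how many objects contain the full pattern, relative to how many contain only (A B) (C), would give a confidence for the rule (A B) (C) ---> (D E).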
Regression
Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
Extensively studied in the statistics and neural network fields.
Regression: Examples
Predicting the sales amount of a new product based on advertising expenditure.
Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
Time series prediction of stock market indices.
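The sales example can be sketched with ordinary least squares for a single predictor, which is the simplest linear model of dependency. The advertising and sales figures below are invented for illustration, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising expenditure vs. sales amount (same arbitrary units).
advertising = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = fit_line(advertising, sales)
print(slope, intercept)       # 2.0 1.0
print(slope * 6 + intercept)  # 13.0, the predicted sales for an expenditure of 6
```

Unlike classification, the predicted quantity here is continuous, so the model returns a number rather than a class label.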