Data mining can be thought of as essentially ‘customer analytics’ or, more precisely, analytics carried out at the request of a customer with the purpose of gaining insight (knowledge) from some data. Typically we view customer analytics as predictive and descriptive modelling, usually in relation to large CRM (Customer Relationship Management) or marketing databases. Data mining exercises often model customers, but any entity for which data is stored can be investigated, for example households, web sessions or calls.
http://www.thebusinessintelligenceguide.com/bi_tools/Difference_Between_Analytics_and_Advanced_Analytics.php
At this point, we must consider whether the model does indeed reflect the reality of what we are attempting to model and (more importantly) whether the model will in fact achieve the business objectives. Thus the model must be thoroughly evaluated, and this includes reviewing the steps taken to construct it. In particular, it is essential to ensure that the model incorporates every important business issue. This may mean that the model needs to be reviewed and reworked, so there is some interaction between phases 4 and 5. This phase typically concludes with a decision on how the data mining results will be used.
Data description and summarisation
Initial exploratory data analysis can help to investigate and understand the data, and provide potential hypotheses for hidden information. Summarisation also plays a significant role in the presentation of final results.

Segmentation
A segmentation data mining analysis aims to separate the data into interesting and meaningful subgroups or classes, so that members of a subgroup share common characteristics. A classic example would be a shopping basket analysis where the segments of baskets depend on the items they contain.

Concept descriptions
Concept description aims to give an understandable description of the concepts or classes. This is not done to produce complete models with high prediction accuracy, but in order to gain insights. For example, a company might be interested in learning more about its loyal and disloyal customers. From concept descriptions such as this, the company could then conclude what might be done to keep customers loyal, or to transform disloyal customers into loyal ones. Concept description has close connections with both segmentation and classification. Segmentation could lead to a concept or class of data being generated without any understandable description of the elements in that class.

Classification
Classification has connections to almost all other problem types. For example, credit scoring attempts to assess the credit risk of a new customer. This problem can be transformed into a classification problem by partitioning customers into two classes: good customers and bad customers. The resulting model can then be used to assign prospective customers to one of the two classes, and hence either accept or reject them.

Prediction
Prediction problems are similar to classification problems, with one major difference: in prediction, the target attribute (or class) is not a qualitative discrete attribute, but a continuous one. This means that the aim of a prediction model is to find and assign a numerical value of a target attribute for unseen objects. In particular, if the prediction model is dealing with time series data, then it is often referred to as forecasting.

Dependency analysis
Dependency analysis consists of finding a model that describes significant dependencies (or associations) between data items or events. Dependencies can be used to predict the value of a data item given information on other items. Although dependencies can be used for predictive modelling, in general they are mostly used for understanding.
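
As a rough illustration of dependency analysis on shopping baskets, the sketch below measures how strongly one item is associated with another using two common measures: support (how often both items appear together) and confidence (how often the second item appears given the first). The baskets, item names and the `rule_stats` helper are hypothetical and are not taken from the presentation.

```python
# Hypothetical toy baskets; the item names are made up for illustration.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
    {"bread", "butter", "jam"},
]


def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    total = len(baskets)
    # Baskets containing the antecedent item.
    with_antecedent = sum(1 for b in baskets if antecedent in b)
    # Baskets containing both items.
    with_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = with_both / total
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


support, confidence = rule_stats(baskets, "bread", "butter")
print(f"bread -> butter: support={support:.2f}, confidence={confidence:.2f}")
```

A high confidence alone does not prove a useful dependency; in practice it would be compared with how often the consequent appears overall.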
Key Principles Of Data Mining
Presentation by Tobie Muir (Data-Decisions)<br />Henry Stewart Briefing:<br />An Introduction to Marketing Analytics<br />London, 23rd June 2010<br />
What is data mining?<br />“Data mining is the process of finding patterns in your data which you can use to do your business better”<br />Alan Montgomery, formerly Managing Director, Integral Solutions Limited (now part of IBM/SPSS)<br /><ul><li>Nowadays every credit card used, every transaction processed, every loan application, etc. is recorded digitally, creating massive databases of raw information.
These datasets can be incomprehensibly large – too large to analyse without the aid of computer-driven processes.
The role of data mining is to introduce (semi-)automated, computer-driven processes and statistical techniques to extract meaningful patterns from such data, with the goal of improving the business in question. A classic example in marketing is using data mining insights to generate revenue with a smaller marketing budget.
For very large datasets, data mining can focus on a sample within the dataset. Instead of analysing millions (or billions) of records, which can be computationally expensive and slow, we analyse a subset of the data on the expectation that patterns prevalent in the subset also apply to the entire dataset.
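
The sampling idea above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the presenter's method: the population size, the 5% response rate and the `response_rate` helper are all assumptions chosen for the example.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 200,000 customer records, each flagged True
# if the customer responded to a campaign (roughly 5% do).
population = [random.random() < 0.05 for _ in range(200_000)]


def response_rate(records):
    """Share of records flagged as responders."""
    return sum(records) / len(records)


# Simple random sample of 5,000 records, drawn without replacement.
sample = random.sample(population, 5_000)

full_rate = response_rate(population)
sample_rate = response_rate(sample)
print(f"full dataset: {full_rate:.4f}, sample: {sample_rate:.4f}")
```

With a simple random sample of this size the two rates should normally be close, but a sample is only representative if it is drawn without bias, and rare patterns may need a larger or stratified sample.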
Where does data mining fit with BI tools?<br /><ul><li>Data mining is generally thought of as a subset of Business Intelligence (BI).
Business intelligence tools can also encompass the extraction, storage, visualisation and distribution of business information, not just the analysis of business data.
Leading BI tools will typically contain data mining capabilities as well as other more general activities including decision support systems, query and reporting, online analytical processing (OLAP), statistical analysis and forecasting.</li></ul>Business Intelligence<br /><ul><li>Decision Support Systems
How models should be evaluated and monitored<br /><ul><li>Keeping data up to date is essential. Data becomes obsolete quickly (even within a year), so frequently re-evaluating models against up-to-date data helps keep them accurate. This includes updating the data with the latest campaign data, response data, etc., and reassessing the model's error rates against the new data.
Models need to be evaluated to see that the results produced are compatible with the project objectives.
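
One simple way to monitor a model along these lines is to compare its error rate on the data it was built with against its error rate on the latest campaign results. The sketch below assumes a binary response model; the data, the `error_rate` and `needs_review` helpers and the 0.05 tolerance are hypothetical choices for illustration only.

```python
def error_rate(predictions, outcomes):
    """Fraction of cases where the model's prediction missed the outcome."""
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must be the same length")
    wrong = sum(1 for p, o in zip(predictions, outcomes) if p != o)
    return wrong / len(outcomes)


def needs_review(baseline_error, latest_error, tolerance=0.05):
    """Flag the model when its error rate has risen by more than `tolerance`."""
    return latest_error - baseline_error > tolerance


# Hypothetical data: 1 = responded, 0 = did not respond.
build_predictions = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
build_outcomes = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
latest_predictions = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
latest_outcomes = [0, 0, 1, 1, 0, 0, 0, 1, 1, 0]

baseline = error_rate(build_predictions, build_outcomes)
latest = error_rate(latest_predictions, latest_outcomes)
print(f"baseline error: {baseline:.2f}, latest error: {latest:.2f}")
print("model needs review:", needs_review(baseline, latest))
```

The tolerance is a business decision rather than a statistical constant; a flagged model would then be rebuilt or recalibrated with the refreshed data.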