Key Principles of Data Mining Presentation by Tobie Muir (Data-Decisions) Henry Stewart Briefing: An Introduction to Marketing Analytics London, 23rd June 2010
What is data mining? “Data mining is the process of finding patterns in your data which you can use to do your business better” (Alan Montgomery, formerly Managing Director, Integral Solutions Limited, now part of IBM/SPSS)
These datasets can be incomprehensibly large – too large to analyse without the aid of computer-driven processes.
The role of data mining is to introduce (semi-)automated computer-driven processes and statistical techniques to extract meaningful patterns from such data, with the goal of improving the business in question. A classic marketing example is using data mining (DM) insights to achieve revenue with a smaller marketing budget.
For very large datasets, data mining can focus on a sample of the dataset: instead of analysing millions (or billions) of records, which can be computationally expensive and slow, we analyse a subset in the hope that patterns prevalent in the subset also apply to the entire dataset.
Careful analysis is then required to determine whether any patterns found are meaningful: they could be spurious or coincidental, or they may exist only in the subset. Copyright © 2010 Data-Decisions Ltd
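The sampling idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the customer records, the `responded` field and the 5% response rate are hypothetical, and the "pattern" is simply the share of records matching a condition. It compares the pattern's rate in a random sample with its rate in the full dataset.

```python
import random


def pattern_rate(records, predicate):
    """Share of records for which the pattern (predicate) holds."""
    return sum(1 for r in records if predicate(r)) / len(records)


def compare_sample_to_full(records, predicate, sample_size, seed=0):
    """Return (rate in a random sample, rate in the full dataset)."""
    rng = random.Random(seed)
    sample = rng.sample(records, sample_size)  # simple random sample, no replacement
    return pattern_rate(sample, predicate), pattern_rate(records, predicate)


# Hypothetical data: 100,000 customers, roughly 5% of whom responded.
data_rng = random.Random(42)
customers = [{"id": i, "responded": data_rng.random() < 0.05} for i in range(100_000)]

sample_rate, full_rate = compare_sample_to_full(
    customers, lambda c: c["responded"], sample_size=5_000
)
print(f"sample: {sample_rate:.3f}  full: {full_rate:.3f}")
```

In practice the full-dataset rate is usually not available (that is the reason for sampling), so a check like this is more typically run against a second, independent sample.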
Where does data mining fit with BI tools?
Business intelligence tools can also encompass the extraction, storage, visualisation and distribution of business information, not just the analysis of business data.
Leading BI tools will typically contain data mining capabilities as well as other more general activities including decision support systems, query and reporting, online analytical processing (OLAP), statistical analysis and forecasting.
[Diagram labels: Business Intelligence; Online analytical processing (OLAP); Statistical analysis and forecasting; Query and Reporting; Data Mining]
[Diagram labels: Business Intelligence; Data Mining]
The Relationship between Data Mining and Advanced Analytics
[Diagram labels: Advanced Analytics; Data Mining; Focus on Customers; Everything else...; Optimise Best Media Mix; Optimise Responses; Customer Acquisition; Customer Retention; Cross-Sell]

Key Principles Of Data Mining

  • 1. Key Principles of Data Mining Presentation by Tobie Muir (Data-Decisions) Henry Stewart Briefing: An Introduction to Marketing Analytics London, 23rd June 2010
  • 18. [Diagram labels, continued: Up-Sell; Customer Expansion]
  • 19. The CRISP data mining process. CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It was developed by the CRISP-DM consortium, consisting of DaimlerChrysler (formerly Daimler-Benz), SPSS (formerly ISL), and NCR. The idea was to standardise the process of data mining across the industry: a common pattern for the process was established among all collaborators, and CRISP-DM was also a mechanism to introduce uniform terminology and differentiation. CRISP-DM 1.0 was rolled out in August 2000, including detailed documentation. That document gives the standard six-part CRISP model of how the data mining process occurs [diagram not reproduced]. The model highlights the relationships and interdependencies between all six phases: the data mining process is a dynamic one.
  • 20. The CRISP data mining process: Phases 1 and 2. 1. Business understanding: We begin by understanding the requirements of the project from the business perspective. What does the company in question want to achieve or get out of this? What are the priorities? How will we measure the outcome? We conclude this phase by producing a preliminary plan to tackle the established objectives. 2. Data understanding: The data understanding phase has two broad aims. The first is to test the data (on which the analysis will be based) in order to identify any quality issues. The second is to try to discover any initial insights into the data that might provide additional meaningful information. Some basic data visualisation (scatter plots, bar charts, distribution analysis) is a great way to get to grips with the data, spot any immediate patterns, and test the general data sufficiency, which leads logically on to the next phase, data preparation.
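A first data quality check of the kind described in the data understanding phase might look like the sketch below. The rows and the `age` field are hypothetical, and the function is only one simple way to profile a numeric field: it counts missing values and reports the range and mean, which is often enough to spot implausible entries.

```python
from statistics import mean


def profile(rows, field):
    """Summarise one numeric field: missing count, min, max and mean."""
    values = [row.get(field) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
    }


# Hypothetical records: one missing age and one implausible age (430).
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 51},
    {"id": 4, "age": 29},
    {"id": 5, "age": 430},
]

summary = profile(rows, "age")
print(summary)
```

Here the maximum of 430 and the inflated mean both point to a data entry problem that would need handling in the data preparation phase.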
  • 21. The CRISP data mining process: Phases 3 and 4. 3. Data preparation: The data preparation phase does exactly what its name suggests: the initial (raw) data is modified to produce the final dataset on which the analysis will take place. Data preparation covers all activities that turn the raw data into the final dataset, ready for the modelling phase, including merging separate datasets and further data pooling, table/record/attribute selection, missing-value imputation, data cleaning, spurious data removal, and transformation. It is also advisable to consider how to partition the data into modelling and testing segments (typically a 70/30 split, depending on data volumes). Data preparation, in my experience, is the most time-consuming, but absolutely ESSENTIAL, phase of the entire CRISP process.
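The 70/30 partition mentioned above can be sketched as follows. This is a minimal illustration, assuming a simple random split is appropriate; the record list is hypothetical, and real projects may need a stratified or time-based split instead.

```python
import random


def partition(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and split it into (modelling, testing) segments."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset of 1,000 record identifiers.
records = list(range(1000))
train, test = partition(records)
print(len(train), len(test))
```

Fixing the seed means every modelling technique is later evaluated against the same held-out records, which keeps comparisons between models fair.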
  • 22. The CRISP data mining process: Phases 3 and 4. 4. Modelling: The modelling phase is the heart of the CRISP model. This is the point at which we take the prepared dataset and apply (typically) several modelling techniques. We use several techniques because no single technique is perfect, and the range of results gathered should overcome the limitations of any one particular model. There is some interaction between phases 3 and 4: different techniques may require the data in different forms, so it may be necessary to prepare the data in multiple ways for the various models. We will cover some of the different modelling techniques later in the presentation.
  • 23. The CRISP data mining process: Phase 5. 5. Evaluation: There are many different techniques and methods for evaluating the models created during the modelling phase. First and foremost, you are looking to compare the model error rates or, inversely, the model accuracy rates. These are estimated by how well the models perform on the test data (data that was omitted during the model-building phase). There are a number of ways to measure this, but most methods simply amount to providing a score that allows you to choose the model with the lowest error rate. Lift charts provide a very effective way to visualise and compare model performance over the test set. They are also a good way to assess whether you may need to combine models to arrive at a better overall solution.
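The two evaluation measures mentioned above can be sketched as follows. The outcomes and scores are hypothetical, the 0.5 decision threshold is an assumption, and cumulative lift is computed here in one common way: the response rate among the top-scored records divided by the overall response rate. The function assumes at least one positive outcome and at least as many records as bands.

```python
def error_rate(actual, predicted):
    """Fraction of test records where the prediction differs from the outcome."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


def cumulative_lift(actual, scores, bands=10):
    """Lift for the top 1..bands score bands (the values plotted on a lift chart)."""
    ranked = [a for _, a in sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)]
    overall = sum(ranked) / len(ranked)
    lifts = []
    for k in range(1, bands + 1):
        top = ranked[: len(ranked) * k // bands]
        lifts.append((sum(top) / len(top)) / overall)
    return lifts


# Hypothetical test set: 1 = responded, 0 = did not respond.
actual = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

test_error = error_rate(actual, predicted)
lifts = cumulative_lift(actual, scores, bands=5)
print(test_error, [round(x, 2) for x in lifts])
```

A model that ranks no better than chance gives a lift close to 1 in every band, so the further the early bands sit above 1, the more useful the model is for targeting.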
  • 25. The CRISP data mining process: Phase 6. 6. Deployment: The deployment phase consolidates the results that the model produces in a form that is usable by the customer. It could be that the data mining exercise was undertaken simply to increase knowledge of the data, but even in this restricted remit, and more generally, any knowledge gained from the exercise must be presented in a way that is of use to the customer. Depending on the nature of the data mining project, the deployment phase can vary from simply generating a report all the way through to implementing a repeatable data mining process across the enterprise. It is not unusual for the customer, rather than the data analyst, to perform the deployment phase. In either case it is important that the customer understands the actions that need to be carried out in order to make best use of the models created.
  • 36. [Slide labels: Genetic algorithms; Decision trees; Bayes; Clustering]
  • 37. How data mining models are built and applied
  • 39. Models need to be evaluated to check that the results produced are compatible with the project objectives.
  • 40. No model is ever perfect, so every model should be treated as work in progress, subject to scheduled, ongoing refinement and improvement.
  • 41. Conclusion: “Data mining is the process of finding patterns in your data which you can use to do your business better.” Data mining is a subset of a much larger sphere known as Business Intelligence, which includes data parsing, visualisation, OLAP and data warehousing. Advanced analytics encompasses data mining but also includes non-customer-focussed activities that require mathematical and statistical approaches. CRISP is an established, proven data mining framework. The key emphasis in data mining must be on understanding; also, never underestimate the importance or amount of work involved in data mining. No model is ever perfect; each is only the starting point for future iterative improvements.
  • 46. Applied Data Mining: Statistical Methods for Business and Industry (Paolo Giudici)
  • 47. Data Mining Techniques: for Marketing, Sales and Customer Relationship Management (Berry and Linoff)
    Contact: Tobie Muir (Managing Director). E. tobie@data-decisions.co.uk; T. 0208 144 7422 / 07903 525358; W. data-decisions.co.uk

Editor's Notes

  1. Data mining could be thought of as essentially ‘customer analytics’ or, more precisely, analytics instigated at the request of a customer with the purpose of gaining insight into (knowledge of) some data. Typically we view customer analytics as predictive and descriptive modelling, usually in relation to large CRM (Customer Relationship Management) or marketing databases. Data mining exercises often model customers, but any entity for which data is stored can be investigated. Others could include households, web sessions, calls, etc. http://www.thebusinessintelligenceguide.com/bi_tools/Difference_Between_Analytics_and_Advanced_Analytics.php
  2. http://www.spss.fi/pdf/crisp-dm.pdf
  3. At this point, we must consider whether the model does indeed reflect the reality of what we are attempting to model and (more importantly) whether the model will in fact achieve the business objectives. Thus the model must be thoroughly evaluated, and this includes reviewing the steps taken to construct the model. In particular, it is essential that we ensure the model incorporates every important business issue. This may mean that the model needs to be reviewed and worked on, so we have some interaction between phases 4 and 5. This phase typically concludes with a decision on how the data mining results achieved will be used.
  4. http://msdn.microsoft.com/en-us/library/ms175428(SQL.100).aspx
  5. Data description and summarisation: Initial exploratory data analysis can help to investigate and understand the data, and provide potential hypotheses for hidden information. Summarisation also plays a significant role in the presentation of final results.
     Segmentation: A segmentation analysis aims to separate the data into interesting and meaningful subgroups or classes, so that members of a subgroup share common characteristics. A classic example would be a shopping basket analysis where the segments of baskets depend on the items they contain.
     Concept description: Concept description aims to give an understandable description of the concepts or classes. This is done not to produce complete models with high prediction accuracy, but to gain insights. For example, a company might be interested in learning more about its loyal and disloyal customers. From concept descriptions such as this, the company could then conclude what might be done to keep customers loyal, or to transform disloyal customers into loyal ones. Concept description has close connections with both segmentation and classification. Segmentation could lead to a concept or class of data being generated without any understandable description of the elements in that class.
     Classification: Classification has connections to almost all other problem types. For example, credit scoring attempts to assess the credit risk of a new customer. This problem can be transformed into a classification problem by partitioning customers into two classes: good customers and bad customers. The resulting model can then be used to assign prospective customers to one of the two classes, and hence either accept or reject them.
     Prediction: Prediction problems are similar to classification problems, with one major difference: in prediction, the target attribute (or class) is not a qualitative discrete attribute but a continuous one. This means that the aim of a prediction model is to find and assign a numerical value of a target attribute for unseen objects. In particular, if the prediction model deals with time series data, it is often referred to as forecasting.
     Dependency analysis: Dependency analysis consists of finding a model that describes significant dependencies (or associations) between data items or events. Dependencies can be used to predict the value of a data item given information on other items. Although they can be used for predictive modelling, in general they are mostly used for understanding.
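The dependency analysis described in the last note can be illustrated with a small sketch. The notes do not name a specific measure, so support and confidence are used here as an assumption; they are two commonly used ways of quantifying an association between items. The shopping baskets are hypothetical.

```python
def support(baskets, items):
    """Share of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= set(basket)) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the share that also contain `consequent`."""
    both = support(baskets, set(antecedent) | set(consequent))
    return both / support(baskets, antecedent)


# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]

bread_butter_support = support(baskets, {"bread", "butter"})
bread_to_butter = confidence(baskets, {"bread"}, {"butter"})
print(f"support: {bread_butter_support:.2f}  confidence: {bread_to_butter:.2f}")
```

A high confidence for "bread implies butter" is the kind of dependency that can help predict one item from another, although, as the note says, such findings are mostly used for understanding.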