Business Intelligence and Analytics:
Systems for Decision Support
(10th Edition)
Chapter 5:
Data Mining
Copyright © 2014 Pearson Education, Inc. 5-2
Learning Objectives
for business intelligence
business analytics and data mining
range of applications of data
mining
CRISP-DM
Learning Objectives
preprocessing for data mining
mining
software tools
mining
Opening Vignette…
Cabela’s Reels in More Customers with
Advanced Analytics and Data Mining
Questions for the
Opening Vignette
1. Why should retailers, especially omni-channel
retailers, pay extra attention to advanced
analytics and data mining?
2. What are the top challenges for multi-channel
retailers? Can you think of other industry
segments that face similar problems/challenges?
3. What are the sources of data that retailers such
as Cabela’s use for their data mining projects?
4. What does it mean to have a “single view of the
customer”? How can it be accomplished?
Data Mining Concepts/Definitions
Why Data Mining?
Availability of quality data on customers, vendors,
transactions, Web, etc.
into data warehouses.
storage capabilities; and decrease in cost.
toward conversion of information
resources into nonphysical form.
Definition of Data Mining
The nontrivial process of identifying valid,
novel, potentially useful, and ultimately
understandable patterns in data stored in
structured databases. - Fayyad et al. (1996)
valid, novel, potentially useful, understandable.
Knowledge extraction, pattern
analysis, knowledge discovery, information
harvesting, pattern searching, data dredging,…
[Figure: Data Mining is at the Intersection of Many Disciplines. DATA MINING sits at the center of Statistics, Management Science & Information Systems, Artificial Intelligence, Databases, Pattern Recognition, Machine Learning, and Mathematical Modeling]
ed
data warehouse (not always!).
client-server or a
Web-based information systems architecture.
may include soft/unstructured data.
are essential (Web, Parallel processing, etc.)
Data Mining
Characteristics/Objectives
Application Case 5.1
Smarter Insurance: Infinity P&C Improves
Customer Service and Combats Fraud with
Predictive Analytics
Questions For Discussion
1. How did Infinity P&C improve customer service
with data mining?
2. What were the challenges, the proposed solution,
and the obtained results?
3. What was their implementation strategy? Why is it
important to produce results as early as possible
in data mining studies?
Data in Data Mining
Data: a collection of facts usually obtained as the result of
experiences, observations, or experiments.
and knowledge are derived).
Data
  Structured
    Categorical: Nominal, Ordinal
    Numerical: Interval, Ratio
  Unstructured or Semi-Structured
    Textual, Multimedia, HTML/XML
or
symbolic) relationship among data items
What Does DM Do?
How Does it Work?
Application Case 5.2
Harnessing Analytics to Combat Crime:
Predictive Analytics Helps Memphis
Police Department Pinpoint Crime and
Focus Police Resources
Questions For Discussion
1. How did the Memphis Police Department
use data mining to better combat crime?
2. What were the challenges, the proposed
solution, and the obtained results?
A Taxonomy for
Data Mining Tasks
Data Mining Task | Learning Method | Popular Algorithms
Prediction | Supervised | Classification and Regression Trees, ANN, SVM, Genetic Algorithms
  Classification | Supervised | Decision trees, ANN/MLP, SVM, Rough sets, Genetic Algorithms
  Regression | Supervised | Linear/Nonlinear Regression, Regression trees, ANN/MLP, SVM
Association | Unsupervised | Apriori, OneR, ZeroR, Eclat
  Link analysis | Unsupervised | Expectation Maximization, Apriori Algorithm, Graph-based Matching
  Sequence analysis | Unsupervised | Apriori Algorithm, FP-Growth technique
Clustering | Unsupervised | K-means, ANN/SOM
  Outlier analysis | Unsupervised | K-means, Expectation Maximization (EM)
Data Mining Tasks (cont.)
Time-series forecasting
Visualization
-driven data mining
-driven data mining
Data Mining Applications
Maximize return on marketing campaigns
-, up-selling)
Detecting fraudulent transactions
-, up-selling)
Data Mining Applications (cont.)
Optimize inventory levels at different locations
machinery failures
the use manufacturing capacity
Data Mining Applications (cont.)
Data Mining Applications (cont.)
Increasingly more
popular application areas
for data mining
Application Case 5.3
A Mine on Terrorist Funding
Questions For Discussion
1. How can data mining be used to fight
terrorism? Comment on what else can be
done beyond what is covered in this short
application case.
2. Do you think data mining, while essential
for fighting terrorist cells, also jeopardizes
individuals’ rights of privacy?
Data Mining Process
CRISP-DM (Cross-Industry Standard Process
for Data Mining)
Assess)
Data Mining Process
Source: KDNuggets.com
Data Mining Process: CRISP-DM
[Figure: the CRISP-DM cycle around the Data Sources: 1 Business Understanding, 2 Data Understanding, 3 Data Preparation, 4 Model Building, 5 Testing and Evaluation, 6 Deployment]
Data Mining Process: CRISP-DM
Step 1: Business Understanding
Step 2: Data Understanding
Step 3: Data Preparation (!)
Step 4: Model Building
Step 5: Testing and Evaluation
Step 6: Deployment
experimental (DM: art versus science?)
Accounts for
~85% of total
project time
Data Preparation – A Critical DM Task
Real-world Data
Data Consolidation
· Collect data
· Select data
· Integrate data
Data Cleaning
· Impute missing values
· Reduce noise in data
· Eliminate inconsistencies
Data Transformation
· Normalize data
· Discretize/aggregate data
· Construct new attributes
Data Reduction
· Reduce number of variables
· Reduce number of cases
· Balance skewed data
Well-formed Data
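Three of the steps listed above (impute missing values, normalize data, discretize data) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the mean-imputation strategy, the min-max scaling, and the sample values are all hypothetical choices, not something the slides prescribe.

```python
from bisect import bisect_right
from statistics import mean


def impute_missing(values):
    """Data cleaning: replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, cut_points, labels):
    """Data transformation: map each value to a bin label.

    cut_points must be sorted, and labels needs len(cut_points) + 1 entries.
    """
    return [labels[bisect_right(cut_points, v)] for v in values]


# Hypothetical attribute with one missing value.
incomes = [30, None, 50, 70]
cleaned = impute_missing(incomes)
scaled = min_max_normalize(cleaned)
binned = discretize(scaled, [0.33, 0.66], ["low", "medium", "high"])
```

Other imputation or scaling choices (median, z-scores) fit the same pipeline shape.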
Data Mining Process: SEMMA
Sample (Generate a representative sample of the data)
Explore (Visualization and basic description of the data)
Modify (Select variables, transform variable representations)
Model (Use a variety of statistical and machine learning models)
Assess (Evaluate the accuracy and usefulness of the models)
Application Case 5.4
Data Mining in Cancer Research
Questions For Discussion
1. How can data mining be used for
ultimately curing illnesses like cancer?
2. What do you think are the promises and
major challenges for data miners in
contributing to medical and biological
research endeavors?
Data Mining Methods:
Classification
machine-learning family
or ordinal) in nature
Predictive accuracy
Assessment Methods for
Classification
Accuracy of Classification Models
accuracy estimation is the confusion matrix
                      True Class: Positive         True Class: Negative
Predicted Positive    True Positive Count (TP)     False Positive Count (FP)
Predicted Negative    False Negative Count (FN)    True Negative Count (TN)

$$\text{True Positive Rate} = \frac{TP}{TP + FN}$$
$$\text{True Negative Rate} = \frac{TN}{TN + FP}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
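The five measures defined from the confusion matrix can be computed directly from the four counts. The sketch below is illustrative: the function name and the example counts are hypothetical, and it assumes every denominator is nonzero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the confusion-matrix measures from the four cell counts."""
    return {
        "true_positive_rate": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),  # same quantity as the true positive rate
    }


# Hypothetical counts for a 100-record test set.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Recall and the true positive rate are the same ratio under two names, which is why both entries share one formula.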
Estimation Methodologies for
Classification
estimation)
(~70%) and testing (30%)
is split into three sub-sets (training
[~60%], validation [~20%], testing [~20%])
[Figure: simple split. Preprocessed Data is divided into Training Data (2/3), used for Model Development to produce a Classifier, and Testing Data (1/3), used for Model Assessment (scoring) to produce Prediction Accuracy]
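A simple split can be sketched as a shuffle followed by a cut. The function name, the default 70% training fraction, and the fixed seed are assumptions made for this illustration; the slide itself mentions both a ~70/30 and a 2/3 to 1/3 division.

```python
import random


def holdout_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records and cut them into (training, testing) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical data set of ten record ids.
train, test = holdout_split(range(10))
```

Passing train_fraction=2/3 gives the division shown in the figure.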
Estimation Methodologies for
Classification
k-Fold Cross Validation (rotation estimation)
subsets as training
prediction accuracy training
leave-one-out, bootstrapping, jackknifing
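A minimal sketch of k-fold cross validation: each fold serves once as the test set while the remaining folds form the training set, and the per-fold accuracies are averaged. The function names and the evaluate_fold callback are hypothetical; records are assumed to be shuffled beforehand.

```python
def k_fold_indices(n_records, k):
    """Yield (train_indices, test_indices) once per fold."""
    indices = list(range(n_records))
    # Spread the remainder so fold sizes differ by at most one record.
    sizes = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validated_accuracy(n_records, k, evaluate_fold):
    """Average the accuracy returned by evaluate_fold(train, test) over k folds."""
    scores = [evaluate_fold(train, test) for train, test in k_fold_indices(n_records, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(10, 3))
```

Setting k equal to the number of records turns this into leave-one-out.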
Estimation Methodologies for
Classification – ROC Curve
[Figure: ROC curves for three classifiers A, B, and C. x-axis: False Positive Rate (1 - Specificity), 0 to 1; y-axis: True Positive Rate (Sensitivity), 0 to 1]
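The points of an ROC curve come from sweeping the decision threshold over the classifier's scores and recording the false positive rate and true positive rate at each step. The sketch below assumes distinct scores and 0/1 labels; the scores, labels, and function names are hypothetical, and the area under the curve is approximated with the trapezoid rule.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points for an ROC curve."""
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    # Lowering the threshold admits one more record at a time, highest score first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under a curve given as ordered (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical classifier scores and true labels (1 = positive).
points = roc_points([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0])
auc = area_under_curve(points)
```

A curve that hugs the upper-left corner gives an area close to 1, while the diagonal gives 0.5.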
Classification Techniques
Case-based reasoning
Bayesian classifiers
Decision Trees
A general algorithm for decision tree building:
1. Create a root node and assign all of the training
data to it.
2. Select the best splitting attribute.
3. Add a branch to the root node for each value of
the split. Split the data into mutually exclusive
subsets along the lines of the specific split.
4. Repeat steps 2 and 3 for each and every leaf
node until the stopping criterion is reached.
division consists of examples from one class
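The four steps of the general algorithm can be sketched recursively. This is an illustration, not any specific named algorithm: it assumes categorical attributes, uses weighted Gini impurity as the splitting criterion, and stops when a node is pure or no attributes remain. All names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Step 2: pick the attribute whose split has the lowest weighted impurity."""
    def weighted_gini(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())
    return min(attributes, key=weighted_gini)


def build_tree(rows, labels, attributes):
    """Steps 1 to 4: returns a class label (leaf) or a nested dict (internal node)."""
    if len(set(labels)) == 1 or not attributes:  # stopping criterion
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    node = {"attribute": attr, "branches": {}}
    for value in sorted({row[attr] for row in rows}):  # step 3: one branch per value
        subset = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
        node["branches"][value] = build_tree(
            [r for r, _ in subset], [l for _, l in subset], remaining)
    return node


def predict(tree, row):
    """Follow branches until a leaf label is reached (unseen values raise KeyError)."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attribute"]]]
    return tree


# Hypothetical training data.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "overcast", "windy": "yes"},
]
labels = ["play", "play", "play", "stay", "play"]
tree = build_tree(rows, labels, ["outlook", "windy"])
```

No pruning is applied here, so the tree fits the training data exactly.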
Decision Trees
Algorithms mainly differ on
1. Splitting criteria
2. Stopping criteria
3. Pruning (generalization method)
Pre-pruning versus post-pruning
Decision Trees
class as a result of a decision to branch along
a particular attribute/value
extent of uncertainty or randomness of a
particular attribute/value split
Chi-square statistics (used in CHAID)
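As a small worked illustration of an uncertainty-based splitting criterion, the sketch below computes entropy and the information gain of a candidate split. The function names and the two toy splits are hypothetical; entropy is measured in bits.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def information_gain(parent_labels, child_label_groups):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    n = len(parent_labels)
    remainder = sum(len(group) / n * entropy(group) for group in child_label_groups)
    return entropy(parent_labels) - remainder


parent = ["yes", "yes", "no", "no"]
# A split that separates the classes perfectly removes all uncertainty.
perfect_gain = information_gain(parent, [["yes", "yes"], ["no", "no"]])
# A split that leaves each branch as mixed as the parent gains nothing.
useless_gain = information_gain(parent, [["yes", "no"], ["yes", "no"]])
```

The attribute with the largest gain would be chosen as the split under this criterion.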
Application Case 5.5
2degrees Gets a 1275 Percent Boost in
Churn Identification
Questions For Discussion
1. What does 2degrees do? Why is it important for
2degrees to accurately identify churn?
2. What were the challenges, the proposed solution,
and the obtained results?
3. How can data mining help in identifying customer
churn? How do some companies do it without using
data mining tools and techniques?
4. Why is it important for Delta Lloyd Group to comply
with industry regulations?
Cluster Analysis for Data Mining
natural groupings of things
machine-learning family
data, then assigns new instances
Cluster Analysis for Data Mining
Results may be used to
classes for targeting/diagnostic purposes
of populations
f problems
for other data mining methods
rare-event detection)
Cluster Analysis for Data Mining
including both
hierarchical and nonhierarchical), such as k-means,
k-modes, and so on.
[ART], self-organizing map [SOM])
-means algorithm)
Cluster Analysis for Data Mining
it
use of a distance measure to calculate the
closeness between pairs of items.
distance
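Two commonly used distance measures for judging the closeness between pairs of items are sketched below. Which measures the slide named is not recoverable from the text, so the choice of Euclidean and Manhattan distance here is an assumption, as are the function names and sample points.

```python
from math import sqrt


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences (rectilinear distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


p, q = (0, 0), (3, 4)
straight = euclidean(p, q)
rectilinear = manhattan(p, q)
```

Attributes on very different scales are usually normalized first so that no single attribute dominates either measure.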
Cluster Analysis for Data Mining
k-Means Clustering Algorithm
pre-determined number of clusters
Step 1: Randomly generate k points as initial
cluster centers.
Step 2: Assign each point to the nearest cluster center.
Step 3: Re-compute the new cluster centers.
Repetition step: Repeat steps 2 and 3 until some
convergence criterion is met (usually that the
assignment of points to clusters becomes stable).
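The steps above can be sketched as follows. One deliberate simplification: the initial centers are sampled from the data points themselves instead of being generated as arbitrary random points. The function name, the seed and iteration-cap parameters, and the sample points are hypothetical.

```python
import random
from math import dist


def k_means(points, k, seed=0, max_iterations=100):
    """Cluster points (tuples of numbers) into k groups; returns (centers, assignment)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # Step 1 (variant): k data points as initial centers
    assignment = None
    for _ in range(max_iterations):
        # Step 2: assign each point to the nearest cluster center.
        new_assignment = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_assignment == assignment:  # convergence: assignments are stable
            break
        assignment = new_assignment
        # Step 3: re-compute each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, assignment


# Hypothetical two-dimensional data with two visible groups.
low = [(1, 1), (1.5, 2), (1, 0.5)]
high = [(8, 8), (9, 9.5), (8.5, 9)]
points = low + high
centers, assignment = k_means(points, k=2)
clusters = {frozenset(p for p, a in zip(points, assignment) if a == c) for c in range(2)}
```

The result can depend on the initial centers in general, which is why k-means is often run several times with different seeds.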
Cluster Analysis for Data Mining -
k-Means Clustering Algorithm
[Figure: three panels illustrating Step 1, Step 2, and Step 3 of the k-means algorithm]
Association Rule Mining
between variables (items or events)
machine learning family
ordinary people, such as the famous
“relationship between diapers and beers!”
Association Rule Mining
point-of-sale transaction data
“Customers who bought a laptop computer and
virus protection software also bought an
extended service plan 70 percent of the time.”
from each other!
Association Rule Mining
mining include
cross-marketing, cross-selling, store
design, catalog design, e-commerce site design,
optimization of online advertising, product pricing,
and sales/promotion configuration
illnesses; diagnosis and patient characteristics and
treatments (to be used in medical DSS); and genes
and their functions (to be used in genomics projects)
Association Rule Mining
X, Y: products and/or services
X: Left-hand-side (LHS)
Y: Right-hand-side (RHS)
S: Support: how often X and Y go together
C: Confidence: how often Y goes together with X
{Laptop Computer, Virus Protection Software} ⇒ {Extended Service Plan} [30%, 70%]
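Support and confidence can be computed by counting transactions. The sketch below uses the six SKU transactions from the Apriori example in this deck as data; the function names and the choice of the rule {2, 3} ⇒ {4} are hypothetical.

```python
# The six raw transactions from the Apriori example (sets of SKU numbers).
TRANSACTIONS = [
    {1, 2, 3, 4},
    {2, 3, 4},
    {2, 3},
    {1, 2, 4},
    {1, 2, 3, 4},
    {2, 4},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Of the transactions containing the LHS, the fraction that also contain the RHS."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


rule_support = support({2, 3, 4}, TRANSACTIONS)
rule_confidence = confidence({2, 3}, {4}, TRANSACTIONS)
```

In the bracket notation used on the slide, this rule would be written {2, 3} ⇒ {4} [50%, 75%].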
Association Rule Mining
association rules
FP-Growth
identify the frequent
itemsets, which are then converted to
association rules
Association Rule Mining
minimum number of the itemsets
bottom-up approach
(the size of frequent subsets increases from one-
item subsets to two-item subsets, then three-item
subsets, and so on), and
level are tested
against the data for minimum support.
Association Rule Mining
Apriori Algorithm
Raw Transaction Data
Transaction No | SKUs (Item No)
1 | 1, 2, 3, 4
1 | 2, 3, 4
1 | 2, 3
1 | 1, 2, 4
1 | 1, 2, 3, 4
1 | 2, 4

One-item Itemsets
Itemset (SKUs) | Support
1 | 3
2 | 6
3 | 4
4 | 5

Two-item Itemsets
Itemset (SKUs) | Support
1, 2 | 3
1, 3 | 2
1, 4 | 3
2, 3 | 4
2, 4 | 5
3, 4 | 3

Three-item Itemsets
Itemset (SKUs) | Support
1, 2, 4 | 3
2, 3, 4 | 3
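The level-wise search behind these tables can be sketched as below, using the six raw transactions as input. The minimum support count of 3 is an assumption (the slide text does not state its threshold), chosen because it yields exactly the two three-item itemsets shown. The function name is hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset as sorted tuple: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = sorted({item for t in transactions for item in t})
    singles = count([frozenset([item]) for item in items])
    current = {c: n for c, n in singles.items() if n >= min_support}
    frequent = dict(current)
    size = 2
    while current:
        # Join step: merge frequent itemsets of the previous size into larger candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every (size - 1)-item subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, size - 1))}
        current = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(current)
        size += 1
    return {tuple(sorted(c)): n for c, n in frequent.items()}


transactions = [{1, 2, 3, 4}, {2, 3, 4}, {2, 3}, {1, 2, 4}, {1, 2, 3, 4}, {2, 4}]
frequent_itemsets = apriori(transactions, min_support=3)
```

With this threshold the pair (1, 3), whose support is 2, is dropped, so no three-item candidate containing both SKUs is ever counted.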
Data Mining
Software
IBM SPSS Modeler (formerly Clementine)
SAS - Enterprise Miner
IBM - Intelligent Miner
StatSoft – Statistica Data Miner
[Bar chart: data mining software tools; values in parentheses]
Predixion Software (3)
WordStat (3)
11 Ants Analytics (4)
Teradata Miner (4)
RapidInsight/Veera (5)
Angoss (7)
SAP (BusinessObjects/Sybase/Hana)(7)
XLSTAT (7)
Salford SPM/CART/MARS/TreeNet/RF (9)
Revolution Computing (11)
C4.5/C5.0/See5 (13)
Bayesia (14)
KXEN (14)
Zementis (14)
Stata (15)
IBM Cognos (16)
Miner3D (19)
Mathematica (23)
JMP (32)
Other commercial software (32)
Oracle Data Miner (35)
Tableau (35)
TIBCO Spotfire / S+ / Miner (37)
Other free software (39)
Microsoft SQL Server (40)
Orange (42)
SAS Enterprise Miner (46)
IBM SPSS Modeler (54)
IBM SPSS Statistics (62)
MATLAB (80)
Rapid-I RapidAnalytics (83)
SAS (101)
StatSoft Statistica (112)
Weka / Pentaho (118)
KNIME (174)
Rapid-I RapidMiner (213)
Excel (238)
R (245)
Source: KDNuggets.com
Big Data Software Tools
and Platforms
[Bar chart: Big Data software tools and platforms; values in parentheses]
Other Hadoop-based tools (10)
Other Big Data software (21)
NoSQL databases (33)
Amazon Web Services (AWS) (36)
Apache Hadoop/Hbase/Pig/Hive (67)
[Bar chart: languages; values in parentheses]
F# (5)
Awk/Gawk/Shell (31)
Perl (37)
Other languages (57)
C/C++ (66)
Python (119)
Java (138)
SQL (185)
R (245)
Application Case 5.6
Data Mining Goes to Hollywood:
Predicting Financial Success of Movies
Questions For Discussion
oblem
Application Case 5.6
Data Mining Goes to Hollywood!
A Typical Classification Problem

Independent Variables
Independent Variable | Number of Values | Possible Values
MPAA Rating | 5 | G, PG, PG-13, R, NR
Competition | 3 | High, Medium, Low
Star value | 3 | High, Medium, Low
Genre | 10 | Sci-Fi, Historic Epic Drama, Modern Drama, Politically Related, Thriller, Horror, Comedy, Cartoon, Action, Documentary
Special effects | 3 | High, Medium, Low
Sequel | 1 | Yes, No
Number of screens | 1 | Positive integer

Dependent Variable
Class No. | Range (in $Millions)
1 | < 1 (Flop)
2 | > 1, < 10
3 | > 10, < 20
4 | > 20, < 40
5 | > 40, < 65
6 | > 65, < 100
7 | > 100, < 150
8 | > 150, < 200
9 | > 200 (Blockbuster)
Application Case 5.6
Data Mining Goes to Hollywood!
[Figure: The DM Process Map in IBM SPSS Modeler, showing the Model Development process and the Model Assessment process]
Application Case 5.6
Data Mining Goes to Hollywood!
Prediction Models (Individual Models: SVM, ANN, C&RT; Ensemble Models: Random Forest, Boosted Tree, Fusion (Average))
Performance Measure | SVM | ANN | C&RT | Random Forest | Boosted Tree | Fusion (Average)
Count (Bingo) | 192 | 182 | 140 | 189 | 187 | 194
Count (1-Away) | 104 | 120 | 126 | 121 | 104 | 120
Accuracy (% Bingo) | 55.49% | 52.60% | 40.46% | 54.62% | 54.05% | 56.07%
Accuracy (% 1-Away) | 85.55% | 87.28% | 76.88% | 89.60% | 84.10% | 90.75%
Standard deviation | 0.93 | 0.87 | 1.05 | 0.76 | 0.84 | 0.63
* Training set: 1998 – 2005 movies; Test set: 2006 movies
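The two accuracy rows can be reproduced from the two count rows. The test-set size of 346 movies is not stated on the slide; it is inferred here because it is the size that makes every reported percentage consistent with its counts. The reading that % 1-Away counts both exact (Bingo) and off-by-one-class predictions is likewise inferred from the numbers.

```python
TEST_SET_SIZE = 346  # assumption: inferred from the counts and percentages, not stated

BINGO = {"SVM": 192, "ANN": 182, "C&RT": 140,
         "Random Forest": 189, "Boosted Tree": 187, "Fusion (Average)": 194}
ONE_AWAY = {"SVM": 104, "ANN": 120, "C&RT": 126,
            "Random Forest": 121, "Boosted Tree": 104, "Fusion (Average)": 120}


def percent(count, total):
    """Percentage rounded to two decimals, as in the table."""
    return round(100 * count / total, 2)


# Exact-class hit rate.
bingo_accuracy = {m: percent(c, TEST_SET_SIZE) for m, c in BINGO.items()}
# Hit rate when predictions within one class of the true class also count.
one_away_accuracy = {m: percent(BINGO[m] + ONE_AWAY[m], TEST_SET_SIZE) for m in BINGO}
```

For SVM, for example, 192 / 346 gives 55.49% and (192 + 104) / 346 gives 85.55%.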
Application Case 5.7
Data Mining & Privacy Issues
Predicting Customer Buying Patterns—
The Target Story
Questions For Discussion
1. What do you think about data mining and its
implication for privacy? What is the threshold
between discovery of knowledge and
infringement of privacy?
2. Did Target go too far? Did it do anything
illegal? What do you think Target should have
done? What do you think Target should do
next (quit these types of practices)?
Data Mining Myths
provides instant solutions/predictions
degrees
customer data
another name for good old statistics
Common Data Mining Blunders
1. Selecting the wrong problem for data mining
2. Ignoring what your sponsor thinks data mining
is and what it really can/cannot do
3. Leaving insufficient time for data
acquisition, selection, and preparation
4. Looking only at aggregated results and not at
individual records/predictions
5. Being sloppy about keeping track of the data
mining procedure and results
6. …more in the book
End of the Chapter
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.

Applicants must submit a 500 essay describing how current or future .docxApplicants must submit a 500 essay describing how current or future .docx
Applicants must submit a 500 essay describing how current or future .docx
 
Apple Inc., Microsoft Corp., Berkshire Hathaway, and Facebook ha.docx
Apple Inc., Microsoft Corp., Berkshire Hathaway, and Facebook ha.docxApple Inc., Microsoft Corp., Berkshire Hathaway, and Facebook ha.docx
Apple Inc., Microsoft Corp., Berkshire Hathaway, and Facebook ha.docx
 
Appcelerator Titanium was released in December 2008, and has been st.docx
Appcelerator Titanium was released in December 2008, and has been st.docxAppcelerator Titanium was released in December 2008, and has been st.docx
Appcelerator Titanium was released in December 2008, and has been st.docx
 
APA Style300 words per topic2 peer reviewed resources per to.docx
APA Style300 words per topic2 peer reviewed resources per to.docxAPA Style300 words per topic2 peer reviewed resources per to.docx
APA Style300 words per topic2 peer reviewed resources per to.docx
 
Ape and Human Cognition What’s theDifferenceMichael To.docx
Ape and Human Cognition What’s theDifferenceMichael To.docxApe and Human Cognition What’s theDifferenceMichael To.docx
Ape and Human Cognition What’s theDifferenceMichael To.docx
 
Apply what you have learned about Health Promotion and Disease P.docx
Apply what you have learned about Health Promotion and Disease P.docxApply what you have learned about Health Promotion and Disease P.docx
Apply what you have learned about Health Promotion and Disease P.docx
 
APA formatCite there peer-reviewed, scholarly references300 .docx
APA formatCite there peer-reviewed, scholarly references300 .docxAPA formatCite there peer-reviewed, scholarly references300 .docx
APA formatCite there peer-reviewed, scholarly references300 .docx
 
APA formatCite 2 peer-reviewed reference175-265 word count.docx
APA formatCite 2 peer-reviewed reference175-265 word count.docxAPA formatCite 2 peer-reviewed reference175-265 word count.docx
APA formatCite 2 peer-reviewed reference175-265 word count.docx
 
APA formatCite at least 1 referenceWrite a 175- to 265-w.docx
APA formatCite at least 1 referenceWrite a 175- to 265-w.docxAPA formatCite at least 1 referenceWrite a 175- to 265-w.docx
APA formatCite at least 1 referenceWrite a 175- to 265-w.docx
 

Recently uploaded

A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...Pooja Nehwal
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...anjaliyadav012327
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 

Recently uploaded (20)

Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 

Business Intelligence and Analytics Systems for Decision .docx

  • 1. Business Intelligence and Analytics: Systems for Decision Support (10th Edition) Chapter 5: Data Mining Business Intelligence and Analytics: Systems for Decision Support (10th Edition) Copyright © 2014 Pearson Education, Inc. 5-2 Learning Objectives for business intelligence business analytics and data mining nge of applications of data mining
  • 2. -DM Copyright © 2014 Pearson Education, Inc. 5-3 Learning Objectives preprocessing for data mining mining software tools mining Copyright © 2014 Pearson Education, Inc. 5-4 Opening Vignette… Cabela’s Reels in More Customers with Advanced Analytics and Data Mining
  • 3. Copyright © 2014 Pearson Education, Inc. 5-5 Questions for the Opening Vignette 1. Why should retailers, especially omni-channel retailers, pay extra attention to advanced analytics and data mining? 2. What are the top challenges for multi-channel retailers? Can you think of other industry segments that face similar problems/challenges? 3. What are the sources of data that retailers such as Cabela’s use for their data mining projects? 4. What does it mean to have a “single view of the customer”? How can it be accomplished? Copyright © 2014 Pearson Education, Inc. 5-6
  • 4. Data Mining Concepts/Definitions Why Data Mining? ty of quality data on customers, vendors, transactions, Web, etc. into data warehouses. storage capabilities; and decrease in cost. toward conversion of information resources into nonphysical form. Copyright © 2014 Pearson Education, Inc. 5-7 Definition of Data Mining novel, potentially useful, and ultimately understandable patterns in data stored in structured databases. - Fayyad et al., (1996) valid, novel, potentially useful, understandable.
  • 5. ction, pattern analysis, knowledge discovery, information harvesting, pattern searching, data dredging,… Copyright © 2014 Pearson Education, Inc. 5-8 Statistics Management Science & Information Systems Artificial Intelligence
  • 6. Databases Pattern Recognition Machine Learning Mathematical Modeling DATA MINING Data Mining is at the Intersection of Many Disciplines Copyright © 2014 Pearson Education, Inc. 5-9 ed data warehouse (not always!). -server or a Web-based information systems architecture. may include soft/unstructured data.
  • 7. are essential (Web, Parallel processing, etc.) Data Mining Characteristics/Objectives Copyright © 2014 Pearson Education, Inc. 5-10 Application Case 5.1 Smarter Insurance: Infinity P&C Improves Customer Service and Combats Fraud with Predictive Analytics Questions For Discussion 1. How did Infinity P&C improve customer service with data mining? 2. What were the challenges, the proposed solution, and the obtained results? 3. What was their implementation strategy? Why is it important to produce results as early as possible in data mining studies? Copyright © 2014 Pearson Education, Inc. 5-11 Data in Data Mining
  • 8. ta: a collection of facts usually obtained as the result of experiences, observations, or experiments. and knowledge are derived). Data Categorical Numerical Nominal Ordinal Interval Ratio Structured Unstructured or Semi-Structured MultimediaTextual HTML/XML Copyright © 2014 Pearson Education, Inc. 5-12 or symbolic) relationship among data items
  • 9. What Does DM Do? How Does it Work? Copyright © 2014 Pearson Education, Inc. 5-13 Application Case 5.2 Harnessing Analytics to Combat Crime: Predictive Analytics Helps Memphis Police Department Pinpoint Crime and Focus Police Resources Questions For Discussion 1. How did the Memphis Police Department use data mining to better combat crime? 2. What were the challenges, the proposed solution, and the obtained results? Copyright © 2014 Pearson Education, Inc. 5-14 A Taxonomy for Data Mining Tasks
  • 10. Data Mining Task | Learning Method | Popular Algorithms
    Prediction | Supervised | Classification and Regression Trees, ANN, SVM, Genetic Algorithms
    Classification | Supervised | Decision trees, ANN/MLP, SVM, Rough sets, Genetic Algorithms
    Regression | Supervised | Linear/Nonlinear Regression, Regression trees, ANN/MLP, SVM
    Association | Unsupervised | Apriori, OneR, ZeroR, Eclat
    Link analysis | Unsupervised | Expectation Maximization, Apriori Algorithm, Graph-based Matching
    Sequence analysis | Unsupervised | Apriori Algorithm, FP-Growth technique
    Clustering | Unsupervised | K-means, ANN/SOM
    Outlier analysis | Unsupervised | K-means, Expectation Maximization (EM)
  • 11. Copyright © 2014 Pearson Education, Inc. 5-15 Data Mining Tasks (cont.) -series forecasting ualization -driven data mining
  • 12. -driven data mining Copyright © 2014 Pearson Education, Inc. 5-16 Data Mining Applications ze return on marketing campaigns -, up-selling) etecting fraudulent transactions -, up-selling) Copyright © 2014 Pearson Education, Inc. 5-17
  • 13. Data Mining Applications (cont.) ptimize inventory levels at different locations chinery failures the use manufacturing capacity Copyright © 2014 Pearson Education, Inc. 5-18 Data Mining Applications (cont.)
  • 14. Copyright © 2014 Pearson Education, Inc. 5-19 Data Mining Applications (cont.)
  • 15. Increasingly more popular application areas for data mining Copyright © 2014 Pearson Education, Inc. 5-20 Application Case 5.3 A Mine on Terrorist Funding Questions For Discussion 1. How can data mining be used to fight terrorism? Comment on what else can be done beyond what is covered in this short application case. 2. Do you think data mining, while essential for fighting terrorist cells, also jeopardizes individuals’ rights of privacy? Copyright © 2014 Pearson Education, Inc. 5-21
  • 16. Data Mining Process -DM (Cross-Industry Standard Process for Data Mining) Assess) Copyright © 2014 Pearson Education, Inc. 5-22 Data Mining Process Source: KDNuggets.com Copyright © 2014 Pearson Education, Inc. 5-23 Data Mining Process: CRISP-DM Data Sources Business
  • 18. Step 1: Business Understanding Step 2: Data Understanding Step 3: Data Preparation (!) Step 4: Model Building Step 5: Testing and Evaluation Step 6: Deployment experimental (DM: art versus science?) Accounts for ~85% of total project time Copyright © 2014 Pearson Education, Inc. 5-25 Data Preparation – A Critical DM Task
  • 19. Real-world Data → Data Consolidation → Data Cleaning → Data Transformation → Data Reduction → Well-formed Data
    Data Consolidation: collect data; select data; integrate data
    Data Cleaning: impute missing values; reduce noise in data; eliminate inconsistencies
    Data Transformation: normalize data; discretize/aggregate data; construct new attributes
    Data Reduction: reduce number of variables; reduce number of cases; balance skewed data
    Copyright © 2014 Pearson Education, Inc. 5-26
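Two of the data preparation tasks listed on this slide, imputing missing values and normalizing data, can be illustrated with a minimal Python sketch. Mean imputation and min-max scaling are assumptions chosen for illustration; the slides do not prescribe a particular method, and the `ages` values are hypothetical.

```python
from statistics import mean

def impute_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical attribute with one missing value.
ages = [20, None, 40, 60]
cleaned = impute_mean(ages)           # data cleaning: fill the gap
scaled = min_max_normalize(cleaned)   # data transformation: rescale
```

Other imputation or scaling choices (median, z-scores) would slot into the same two steps.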
  • 20. Data Mining Process: SEMMA Sample (Generate a representative sample of the data) Explore (Visualization and basic description of the data) Modify (Select variables, transform variable representations) Model (Use a variety of statistical and machine learning models) Assess (Evaluate the accuracy and usefulness of the models) Copyright © 2014 Pearson Education, Inc. 5-27 Application Case 5.4
  • 21. Data Mining in Cancer Research Questions For Discussion 1. How can data mining be used for ultimately curing illnesses like cancer? 2. What do you think are the promises and major challenges for data miners in contributing to medical and biological research endeavors? Copyright © 2014 Pearson Education, Inc. 5-28 Data Mining Methods: Classification -learning family or ordinal) in nature
  • 22. Copyright © 2014 Pearson Education, Inc. 5-29 ctive accuracy Assessment Methods for Classification Copyright © 2014 Pearson Education, Inc. 5-30 Accuracy of Classification Models accuracy estimation is the confusion matrix
  • 23. Confusion matrix (rows: predicted class; columns: true class, Positive then Negative)
    Predicted Positive | True Positive Count (TP) | False Positive Count (FP)
    Predicted Negative | False Negative Count (FN) | True Negative Count (TN)
  • 25. $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$; $\text{Precision} = \frac{TP}{TP + FP}$; $\text{Recall} = \frac{TP}{TP + FN}$ Copyright © 2014 Pearson Education, Inc. 5-31 Estimation Methodologies for Classification
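The accuracy, precision, and recall measures derived from the confusion matrix can be computed directly from the four counts. The sketch below is illustrative only; the function name and the example counts are hypothetical, not taken from the slides.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all cases classified correctly
        "precision": tp / (tp + fp),     # share of predicted positives that are real
        "recall": tp / (tp + fn),        # share of real positives that were found
    }

# Hypothetical counts for a 100-case test set.
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

A production version would also need to guard against zero denominators, for example when a model predicts no positives at all.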
  • 26. estimation) (~70%) and testing (30%) is split into three sub-sets (training [~60%], validation [~20%], testing [~20%]) [Figure: Preprocessed Data is split into Training Data (2/3) for Model Development and Testing Data (1/3) for Model Assessment (scoring), producing the Classifier and its Prediction Accuracy]
  • 27. Copyright © 2014 Pearson Education, Inc. 5-32 Estimation Methodologies for Classification -Fold Cross Validation (rotation estimation) subsets as training prediction accuracy training -one-out, bootstrapping, jackknifing Copyright © 2014 Pearson Education, Inc. 5-33
  • 28. Estimation Methodologies for Classification – ROC Curve [Figure: ROC curves A, B, and C; x-axis: False Positive Rate (1 - Specificity); y-axis: True Positive Rate (Sensitivity)]
  • 29. Copyright © 2014 Pearson Education, Inc. 5-34 Classification Techniques
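The k-fold cross validation (rotation estimation) methodology named in the estimation slides can be sketched as an index generator: the data is shuffled once, cut into k folds, and each fold serves as the test set exactly once while the remaining folds form the training set. The function name, the seed parameter, and the fold-assignment scheme are assumptions made for this sketch.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

splits = list(k_fold_indices(10, k=5))
```

The overall accuracy estimate would then be the average of the k per-fold accuracies obtained by training on `train` and scoring on `test`.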
  • 30. -based reasoning an classifiers Copyright © 2014 Pearson Education, Inc. 5-35 Decision Trees 1. Create a root node and assign all of the training data to it. 2. Select the best splitting attribute. 3. Add a branch to the root node for each value of the split. Split the data into mutually exclusive subsets along the lines of the specific split. 4. Repeat steps 2 and 3 for each leaf node until the stopping criterion is reached. A general algorithm for decision tree building
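Step 2 of the general algorithm above, selecting the best splitting attribute, can be sketched as follows. The sketch assumes Gini impurity as the splitting criterion and categorical attributes; the attribute names and the four toy records are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return the attribute whose split gives the lowest weighted Gini impurity."""
    n = len(rows)
    best_attr, best_score = None, float("inf")
    for attr in rows[0]:
        # Partition the labels into mutually exclusive subsets by attribute value.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        score = sum(len(g) / n * gini(g) for g in groups.values())
        if score < best_score:
            best_attr, best_score = attr, score
    return best_attr, best_score

# Toy training data (hypothetical).
rows = [
    {"star_value": "High", "sequel": "Yes"},
    {"star_value": "High", "sequel": "No"},
    {"star_value": "Low", "sequel": "Yes"},
    {"star_value": "Low", "sequel": "No"},
]
labels = ["hit", "hit", "flop", "flop"]
attr, score = best_split(rows, labels)
```

A full tree builder would apply `best_split` recursively to each resulting subset until a stopping criterion holds, such as a pure node.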
  • 31. division consists of examples from one class Copyright © 2014 Pearson Education, Inc. 5-36 Decision Trees thms mainly differ on 1. Splitting criteria 2. Stopping criteria 3. Pruning (generalization method) -pruning versus post-pruning Copyright © 2014 Pearson Education, Inc. 5-37 Decision Trees
  • 32. class as a result of a decision to branch along a particular attribute/value extent of uncertainty or randomness of a particular attribute/value split -square statistics (used in CHAID) Copyright © 2014 Pearson Education, Inc. 5-38 Application Case 5.5 2degrees Gets a 1275 Percent Boost in Churn Identification Questions For Discussion 1. What does 2degrees do? Why is it important for 2degrees to accurately identify churn? 2. What were the challenges, the proposed solution, and the obtained results? 3. How can data mining help in identifying customer churn? How do some companies do it without using data mining tools and techniques?
  • 33. 4. Why is it important for Delta Lloyd Group to comply with industry regulations? Copyright © 2014 Pearson Education, Inc. 5-39 Cluster Analysis for Data Mining natural groupings of things -learning family data, then assigns new instances Copyright © 2014 Pearson Education, Inc. 5-40 Cluster Analysis for Data Mining lts may be used to classes for targeting/diagnostic purposes
  • 34. of populations f problems for other data mining methods rare-event detection) Copyright © 2014 Pearson Education, Inc. 5-41 Cluster Analysis for Data Mining luding both hierarchical and nonhierarchical), such as k- means, k-modes, and so on. [ART], self-organizing map [SOM]) -means algorithm) Copyright © 2014 Pearson Education, Inc. 5-42 Cluster Analysis for Data Mining
  • 35. it use of a distance measure to calculate the closeness between pairs of items. distance Copyright © 2014 Pearson Education, Inc. 5-43 Cluster Analysis for Data Mining k-Means Clustering Algorithm -determined number of clusters Step 1: Randomly generate k points as initial cluster centers. Step 2: Assign each point to the nearest cluster center. Step 3: Re-compute the new cluster centers. Repetition step: Repeat steps 2 and 3 until some convergence criterion is met (usually that the assignment of points to clusters becomes stable).
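The k-means steps above can be sketched in a few lines of Python. The sketch makes two assumptions not stated on the slide: initial centers are drawn from the data points themselves, and closeness is measured with Euclidean distance. The sample points are hypothetical.

```python
import math
import random

def k_means(points, k, max_iter=100, seed=0):
    """Cluster points (tuples of numbers) into k groups."""
    rng = random.Random(seed)
    # Step 1: pick k initial cluster centers (assumption: sampled from the data).
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest cluster center.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Convergence: stop once the assignment no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: re-compute each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, assignment

# Two well-separated groups of hypothetical 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = k_means(points, k=2)
```

Because the result depends on the initial centers, k-means is often run several times with different seeds and the tightest clustering kept.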
  • 36. Copyright © 2014 Pearson Education, Inc. 5-44 Cluster Analysis for Data Mining - k-Means Clustering Algorithm Step 1 Step 2 Step 3 Copyright © 2014 Pearson Education, Inc. 5-45 Association Rule Mining between variables (items or events) ne learning family ordinary people, such as the famous “relationship between diapers and beers!” Copyright © 2014 Pearson Education, Inc. 5-46 Association Rule Mining
  • 37. -of-sale transaction data “Customers who bought a laptop computer and virus protection software also bought an extended service plan 70 percent of the time.” from each other! Copyright © 2014 Pearson Education, Inc. 5-47 Association Rule Mining mining include -marketing, cross-selling, store design, catalog design, e-commerce site design, optimization of online advertising, product pricing, and sales/promotion configuration illnesses; diagnosis and patient characteristics and treatments (to be used in medical DSS); and genes
  • 38. and their functions (to be used in genomics projects) Copyright © 2014 Pearson Education, Inc. 5-48 Association Rule Mining X, Y: products and/or services X: Left-hand-side (LHS) Y: Right-hand-side (RHS) S: Support: how often X and Y go together C: Confidence: how often Y goes together with X Example: {Laptop Computer, Antivirus Software} ⇒ {Extended Service Plan} [30%, 70%] Copyright © 2014 Pearson Education, Inc. 5-49
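Support and confidence as defined on this slide can be computed by counting transactions. In the sketch below, the basket data is constructed (hypothetically) so that the laptop and antivirus rule reproduces the [30%, 70%] figures quoted for the extended service plan; the item names and function names are illustrative.

```python
def count_containing(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)

def support(transactions, lhs, rhs):
    """How often X and Y occur together, as a share of all transactions."""
    return count_containing(transactions, set(lhs) | set(rhs)) / len(transactions)

def confidence(transactions, lhs, rhs):
    """How often Y occurs in the transactions that contain X."""
    both = count_containing(transactions, set(lhs) | set(rhs))
    return both / count_containing(transactions, lhs)

# Hypothetical baskets: 21 with all three items, 9 with only the LHS items,
# and 40 unrelated baskets, for 70 transactions in total.
baskets = (
    [{"laptop", "antivirus", "service_plan"}] * 21
    + [{"laptop", "antivirus"}] * 9
    + [{"printer"}] * 40
)
lhs, rhs = {"laptop", "antivirus"}, {"service_plan"}
s = support(baskets, lhs, rhs)      # 21 / 70
c = confidence(baskets, lhs, rhs)   # 21 / 30
```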
  • 39. Association Rule Mining association rules -Growth dentify the frequent item sets, which are, then converted to association rules Copyright © 2014 Pearson Education, Inc. 5-50 Association Rule Mining minimum number of the itemsets -up approach (the size of frequent subsets increases from one- item subsets to two-item subsets, then three-item subsets, and so on), and el are tested
  • 40. against the data for minimum support. -- Copyright © 2014 Pearson Education, Inc. 5-51 Association Rule Mining Apriori Algorithm
  • 41. Raw Transaction Data, SKUs (Item No) per transaction: {1, 2, 3, 4}; {2, 3, 4}; {2, 3}; {1, 2, 4}; {1, 2, 3, 4}; {2, 4}
  • 42. Itemset (SKUs) | Support
    1 | 3
    2 | 6
    3 | 4
    4 | 5
    1, 2 | 3
    1, 3 | 2
    1, 4 | 3
    2, 3 | 4
    2, 4 | 5
    3, 4 | 3
    1, 2, 4 | 3
    2, 3, 4 | 3
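The level-wise search can be traced on the six transactions from this slide with a short Python sketch. A minimum support count of 3 is an assumption here; it is consistent with the slide's tables, where itemset {1, 3} (support 2) does not carry forward to the three-item level. The function name is illustrative.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent itemset: support count} using a bottom-up search."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        # Test the candidates of the current size against the data.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join: combine frequent itemsets into candidates one item larger.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune: keep a candidate only if all of its subsets are frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent

# The six raw transactions (SKUs) from the slide.
transactions = [
    {1, 2, 3, 4}, {2, 3, 4}, {2, 3}, {1, 2, 4}, {1, 2, 3, 4}, {2, 4},
]
frequent = apriori(transactions, min_support=3)
```

With these inputs the frequent three-item itemsets are {1, 2, 4} and {2, 3, 4}, each with support 3, matching the slide.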
  • 43. One-item Itemsets Two-item Itemsets Three-item Itemsets Copyright © 2014 Pearson Education, Inc. 5-52 Data Mining Software (formerly Clementine) - Enterprise Miner - Intelligent Miner – Statistica Data Miner 0 50 100 150 200 250 300 Predixion Software (3) WordStat (3)
  • 44. 11 Ants Analytics (4) Teradata Miner (4) RapidInsight/Veera (5) Angoss (7) SAP (BusinessObjects/Sybase/Hana)(7) XLSTAT (7) Salford SPM/CART/MARS/TreeNet/RF (9) Revolution Computing (11) C4.5/C5.0/See5 (13) Bayesia (14) KXEN (14) Zementis (14) Stata (15) IBM Cognos (16) Miner3D (19) Mathematica (23) JMP (32) Other commercial software (32)
  • 45. Oracle Data Miner (35) Tableau (35) TIBCO Spotfire / S+ / Miner (37) Other free software (39) Microsoft SQL Server (40) Orange (42) SAS Enterprise Miner (46) IBM SPSS Modeler (54) IBM SPSS Statistics (62) MATLAB (80) Rapid-I RapidAnalytics (83) SAS (101) StatSoft Statistica (112) Weka / Pentaho (118) KNIME (174) Rapid-I RapidMiner (213) Excel (238) R (245)
  • 46. Source: KDNuggets.com Copyright © 2014 Pearson Education, Inc. 5-53 Big Data Software Tools and Platforms 0 10 20 30 40 50 60 70 80 Other Hadoop-based tools (10) Other Big Data software (21) NoSQL databases (33) Amazon Web Services (AWS) (36) Apache Hadoop/Hbase/Pig/Hive (67) 0 50 100 150 200 250 300 F# (5) Awk/Gawk/Shell (31) Perl (37) Other languages (57) C/C++ (66) Python (119)
  • 47. Java (138) SQL (185) R (245) Copyright © 2014 Pearson Education, Inc. 5-54 Application Case 5.6 Data Mining Goes to Hollywood: Predicting Financial Success of Movies Questions For Discussion oblem Copyright © 2014 Pearson Education, Inc. 5-55 Application Case 5.6 Data Mining Goes to Hollywood!
  • 48. Independent Variable | Number of Values | Possible Values
    MPAA Rating | 5 | G, PG, PG-13, R, NR
    Competition | 3 | High, Medium, Low
    Star value | 3 | High, Medium, Low
    Genre | 10 | Sci-Fi, Historic Epic Drama, Modern Drama, Politically Related, Thriller, Horror, Comedy, Cartoon, Action, Documentary
    Special effects | 3 | High, Medium, Low
    Sequel | 1 | Yes, No
    Number of screens | 1 | Positive integer
  • 49. Class No. | Range (in $Millions)
    1 | < 1 (Flop)
    2 | > 1, < 10
    3 | > 10, < 20
    4 | > 20, < 40
    5 | > 40, < 65
    6 | > 65, < 100
    7 | > 100, < 150
    8 | > 150, < 200
    9 | > 200 (Blockbuster)
  • 50. Dependent Variable Independent Variables A Typical Classification Problem Copyright © 2014 Pearson Education, Inc. 5-56 Application Case 5.6 Data Mining Goes to Hollywood! Model Development process Model Assessment process The DM Process Map in IBM SPSS Modeler
  • 51. Copyright © 2014 Pearson Education, Inc. 5-57 Application Case 5.6 Data Mining Goes to Hollywood! Prediction Models (Individual Models: SVM, ANN, C&RT; Ensemble Models: Random Forest, Boosted Tree, Fusion (Average))
    Performance Measure | SVM | ANN | C&RT | Random Forest | Boosted Tree | Fusion (Average)
    Count (Bingo) | 192 | 182 | 140 | 189 | 187 | 194
    Count (1-Away) | 104 | 120 | 126 | 121 | 104 | 120
    Accuracy (% Bingo) | 55.49% | 52.60% | 40.46% | 54.62% | 54.05% | 56.07%
    Accuracy (% 1-Away) | 85.55% | 87.28% | 76.88% | 89.60% | 84.10% | 90.75%
    Standard deviation | 0.93 | 0.87 | 1.05 | 0.76 | 0.84 | 0.63
  • 52. * Training set: 1998 – 2005 movies; Test set: 2006 movies Copyright © 2014 Pearson Education, Inc. 5-58 Application Case 5.7 Data Mining & Privacy Issues Predicting Customer Buying Patterns: The Target Story Questions For Discussion 1. What do you think about data mining and its implication for privacy? What is the threshold between discovery of knowledge and infringement of privacy? 2. Did Target go too far? Did it do anything illegal? What do you think Target should have done? What do you think Target should do next (quit these types of practices)? Copyright © 2014 Pearson Education, Inc. 5-59 Data Mining Myths
  • 53. provides instant solutions/predictions degrees customer data ther name for the good-old statistics Copyright © 2014 Pearson Education, Inc. 5-60 Common Data Mining Blunders 1. Selecting the wrong problem for data mining 2. Ignoring what your sponsor thinks data mining is and what it really can/cannot do 3. Leaving insufficient time for data acquisition, selection and preparation 4. Looking only at aggregated results and not at individual records/predictions 5. Being sloppy about keeping track of the data mining procedure and results 6. …more in the book
  • 54. Copyright © 2014 Pearson Education, Inc. 5-61 End of the Chapter Copyright © 2014 Pearson Education, Inc. 5-62 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America. Copyright © 2014 Pearson Education, Inc.