5. @DLC-UNISBANK
ANALYTICS FOR STRUCTURED AND UNSTRUCTURED DATA
CRM analytics for structured data are well developed. Simple statistical
procedures such as computing totals, averages, modes, medians and ranges
are the foundation of many of the standard descriptive reports generated by
CRM users.
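As a minimal sketch of the simple descriptive statistics listed above, the following uses Python's standard `statistics` module on a small, hypothetical list of order values (the data and the `report` structure are illustrative assumptions, not taken from any CRM product):

```python
from statistics import mean, median, mode

# Hypothetical order values for one customer segment (illustrative data only)
order_values = [120, 85, 85, 240, 60, 150]

# The five descriptive measures named in the text
report = {
    "total": sum(order_values),
    "average": mean(order_values),
    "mode": mode(order_values),
    "median": median(order_values),
    "range": max(order_values) - min(order_values),
}

for name, value in report.items():
    print(f"{name}: {value}")
```

Here the total is 740, the mode is 85, the median is 102.5 and the range is 180; a CRM reporting module would typically compute the same measures per segment, product or period.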
Unstructured data are data that do not fit a predefined data model. Textual
and non-textual files, including spreadsheets, documents, PDFs, handwritten
notes, and image, audio, video and multimedia data, are unstructured.
Unstructured data often reside outside the business in social media data
repositories, which can be huge, hence the term “big data”. Analytics for
these types of data are still evolving. Currently, the most advanced form of
unstructured data analytics is text analytics.
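One of the most basic text analytics steps is turning free text into term counts. The sketch below assumes a handful of hypothetical customer posts and a tiny hand-picked stop-word list (`STOP_WORDS` and `term_frequencies` are illustrative names); real text analytics tools add stemming, much larger stop-word lists, sentiment scoring and more.

```python
import re
from collections import Counter

# Hypothetical customer posts (illustrative data, not from the source)
posts = [
    "Great service at the branch today!",
    "App keeps crashing, terrible service.",
    "Service was slow but staff were great.",
]

# A tiny, hand-picked stop-word list (an assumption for this sketch)
STOP_WORDS = {"the", "at", "was", "but", "were", "keeps", "today"}

def term_frequencies(texts):
    """Lower-case each text, split it into alphabetic tokens,
    drop stop words and count the remaining terms."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts

freqs = term_frequencies(posts)
print(freqs.most_common(3))
```

Counting terms converts unstructured text into a structured table (term, frequency) that the descriptive techniques used for structured data can then work on.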
BIG DATA ANALYTICS
Big data is characterized by three Vs: volume, variety and velocity.
Volume. Whilst some big data assets do include structured data (for example, sensor
data), many big data assets are unstructured. The massive scale and growth of unstructured
data have outpaced traditional storage and analytical solutions. The volume of data is
set to increase dramatically with the advent of the “Internet of Things”.
Variety. Big data are collected from new sources that have not been mined for insight
in the past. Traditional analytical processes applied to structured data cannot cope
with the heterogeneity of big data, which includes email, social media posts, video,
images, blogs, location and sensor data.
Velocity. Big data are not just batched data; they are also streamed and produced in real
time. Streamed data do not reside quietly in back-office relational databases, ready to
be analyzed periodically; they are updated continually.
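Because streamed data arrive continually, analytics on them are often updated record by record rather than recomputed over a stored batch. A minimal sketch, assuming hypothetical sensor readings and an illustrative `RunningStats` class, keeps a running mean with the incremental formula new_mean = old_mean + (x - old_mean) / n:

```python
class RunningStats:
    """Maintain a count and mean incrementally as records stream in,
    instead of re-reading a stored batch (a minimal sketch)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean

# Hypothetical sensor readings arriving one at a time
stats = RunningStats()
for reading in [10.0, 12.0, 14.0, 16.0]:
    current = stats.update(reading)
    print(f"after {stats.count} readings, mean = {current}")
```

Only the count and the current mean are stored, so memory use stays constant however long the stream runs; the printed means are 10.0, 11.0, 12.0 and 13.0.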
Data mining
In the CRM context, data mining can be defined as follows:
Data mining is the application of descriptive and
predictive analytics to large datasets to support the
marketing, sales and service functions.
There are two approaches to data mining.
1. Directed data mining (also called supervised, predictive or targeted data
mining) has the goal of predicting some future event or value. The analyst
uses input data to predict a specified output. For example: What is the
probability that customers will respond positively to our next offer? Which
customers are most likely to churn in the next year? What is the profile of
customers who default on payment? Directed data mining stresses
classification, prediction and estimation.
2. Undirected (or unsupervised) data mining is simply the exploration of a
dataset to see what can be learned. It is about discovering new patterns in
the data; the analyst is not trying to predict or estimate some output. The
following questions require undirected data mining: How can we segment
our customer base? Are there any patterns of purchasing behaviour in our
customer base? Undirected data mining uses clustering and affinity-grouping
techniques.
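Affinity grouping has no target variable: it simply looks for items that occur together. The sketch below, assuming a few hypothetical product baskets, computes the support of each item pair (the share of baskets containing both items), which is the first step of market-basket analysis; the function name `pair_support` is illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets (illustrative only)
baskets = [
    {"current account", "debit card", "mobile app"},
    {"current account", "debit card"},
    {"savings account", "mobile app"},
    {"current account", "debit card", "credit card"},
]

def pair_support(transactions):
    """Share of transactions in which each pair of items appears together."""
    pair_counts = Counter()
    for basket in transactions:
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) match
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    n = len(transactions)
    return {pair: count / n for pair, count in pair_counts.items()}

support = pair_support(baskets)
for pair, s in sorted(support.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, s)
```

In this toy data, "current account" and "debit card" appear together in three of the four baskets (support 0.75). Nothing is being predicted; the pattern emerges from exploration.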
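Decision trees are so called because the graphical model output of decision tree
analysis looks like an inverted tree, with the root at the top and branches below.
Decision trees work through a process called recursive partitioning: the records are
split into subsets on the predictor that best separates the outcome, and each subset is
then split again in the same way.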
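Recursive partitioning can be sketched in a few dozen lines. This example assumes Gini impurity as the splitting criterion (as in CART-style trees; other tools use entropy or chi-square tests, as in CHAID) and hypothetical churn records of the form [tenure_months, complaints]. The names `best_split`, `grow_tree` and `predict` are illustrative.

```python
def gini(labels):
    """Gini impurity of a node: 0.0 when all labels are the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Return (weighted_gini, feature, threshold) for the best binary split."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must produce two non-empty partitions
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best

def grow_tree(rows, labels, depth=0, max_depth=3):
    """Split the records, then apply the same procedure to each partition."""
    majority = max(sorted(set(labels)), key=labels.count)
    if depth >= max_depth or gini(labels) == 0.0:
        return majority  # leaf node
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": grow_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": grow_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the branches from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Hypothetical records: [tenure_months, complaints] -> churned?
rows = [[2, 3], [3, 2], [5, 4], [24, 0], [36, 1], [48, 0]]
labels = ["yes", "yes", "yes", "no", "no", "no"]
tree = grow_tree(rows, labels)
print(tree)
print(predict(tree, [4, 1]), predict(tree, [40, 2]))
```

The root splits on tenure at 5 months; because both resulting partitions are pure, the recursion stops there.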
Logistic regression measures the influence of one or more
independent variables that are usually continuous (interval or
ratio data) on a categorical dependent variable (nominal or
ordinal data). The output of logistic regression modelling reports
regression coefficients that represent the effects of the predictor
(independent) variables on the log-odds of the outcome.
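The sketch below fits a logistic regression by batch gradient descent on the mean log-loss, using hypothetical data (number of previous purchases versus whether the customer responded to an offer). Statistical packages typically fit the same maximum-likelihood model with Newton-type methods such as iteratively reweighted least squares; gradient descent is used here only because it is short. `fit_logistic` is an illustrative name.

```python
import math

def sigmoid(z):
    """Map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Batch gradient descent on the mean log-loss.
    Returns [intercept, b1, b2, ...] on the log-odds scale."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(y)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

# Hypothetical data: previous purchases vs. responded to offer (1) or not (0)
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
intercept, b1 = fit_logistic(X, y)
print(f"intercept={intercept:.2f}, coefficient={b1:.2f}")
# exp(b1) is the multiplicative change in the odds of responding per extra purchase
print(f"odds ratio per extra purchase: {math.exp(b1):.2f}")
prob = sigmoid(intercept + b1 * 6)
print(f"P(respond | 6 purchases) = {prob:.2f}")
```

A positive coefficient means that more previous purchases raise the odds of a positive response.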
Multiple regression (like logistic regression) is a technique that
uses two or more predictor variables to predict a dependent
variable, but in the case of multiple regression the dependent
variable is a continuous (interval or ratio) variable.
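For a continuous dependent variable, multiple regression is usually estimated by ordinary least squares. This sketch solves the normal equations (X'X)b = X'y with plain Gaussian elimination so that every step is visible; the data are hypothetical and were constructed so that annual spend = 5 + 2 × products_held + 0.5 × tenure_months exactly, so the fit should recover those coefficients. In practice a statistics library would be used.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(xi) for xi in X]  # prepend an intercept column
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical predictors: [products_held, tenure_months]; outcome: annual spend
X = [[1, 10], [2, 20], [3, 10], [4, 30], [5, 20]]
y = [12, 19, 16, 28, 25]
b = fit_ols(X, y)
print([round(v, 3) for v in b])
```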
Discriminant analysis. Whereas regressions are essentially
scoring models, discriminant analysis (DA) assigns observations
to two or more predefined classes.
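A common form of discriminant analysis is linear discriminant analysis (LDA), which assumes the classes share one covariance matrix and assigns each observation to the class with the highest linear discriminant score. The sketch below assumes two hypothetical classes of customers described by [income, balance] in thousands; the helper names are illustrative and the 2 × 2 matrix algebra is written out by hand to keep it self-contained.

```python
import math

def mean_vec(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def pooled_cov(groups):
    """Pooled within-class 2x2 covariance matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    total = 0
    for pts in groups:
        m = mean_vec(pts)
        for p in pts:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        total += len(pts)
    dof = total - len(groups)
    return [[v / dof for v in row] for row in s]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def fit_lda(classes):
    """classes: dict label -> list of 2-D points.
    Score for class k: x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + ln(prior_k)."""
    groups = list(classes.values())
    n_total = sum(len(g) for g in groups)
    inv = inv2(pooled_cov(groups))
    model = {}
    for label, pts in classes.items():
        mu = mean_vec(pts)
        w = [inv[0][0] * mu[0] + inv[0][1] * mu[1], inv[1][0] * mu[0] + inv[1][1] * mu[1]]
        c = -0.5 * (mu[0] * w[0] + mu[1] * w[1]) + math.log(len(pts) / n_total)
        model[label] = (w, c)
    return model

def classify(model, x):
    scores = {label: w[0] * x[0] + w[1] * x[1] + c for label, (w, c) in model.items()}
    return max(scores, key=scores.get)

# Hypothetical training data: [income, balance] in thousands
classes = {
    "default": [(20, 8), (25, 9), (22, 12), (18, 10)],
    "repay": [(50, 3), (55, 5), (60, 2), (52, 6)],
}
model = fit_lda(classes)
print(classify(model, (21, 10)), classify(model, (56, 4)))
```

Unlike clustering, the classes here are known in advance; the model learns how to assign new observations to them.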
Neural networks are another way of fitting a model to existing
data for classification, estimation and prediction purposes.
Despite the anthropomorphic metaphor of brain function, the
foundations of neural networks lie in machine learning and
artificial intelligence.
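The simplest building block of a neural network is a single artificial neuron: a weighted sum of inputs passed through an activation function. This sketch trains one neuron with the classic perceptron learning rule on hypothetical binary inputs [recent_purchase, holds_product], labelling a customer 1 only when both are true. Practical CRM neural networks use many neurons in hidden layers trained by backpropagation; this is only an illustration of the underlying unit.

```python
def step(z):
    """Threshold activation: fire (1) if the weighted sum is positive."""
    return 1 if z > 0 else 0

def predict(w, b, x):
    return step(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_perceptron(X, y, lr=1.0, epochs=20):
    """Adjust weights and bias in proportion to each prediction error."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            error = yi - predict(w, b, xi)
            w = [wj + lr * error * xj for wj, xj in zip(w, xi)]
            b += lr * error
    return w, b

# Hypothetical inputs: [recent_purchase, holds_product]; target: 1 only if both
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
preds = [predict(w, b, x) for x in X]
print(w, b, preds)
```

Because this target is linearly separable, the perceptron rule converges and the neuron reproduces all four labels.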
Hierarchical clustering is the “Mother of all clustering models”.
It works by treating each record as a cluster of one and
progressively merging the closest clusters until there is one
super-cluster comprising all records. The results are presented in a
table or dendrogram.
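The merging process can be sketched as follows. This example assumes single linkage (the distance between two clusters is the distance between their closest members; complete, average and Ward's linkage are common alternatives) and hypothetical one-dimensional sales figures for five markets labelled A to E. The returned merge history, with the distance at which each merge happened, is exactly what a dendrogram draws.

```python
def agglomerative(points, labels):
    """Start with every record as its own cluster and repeatedly merge the
    two closest clusters (single linkage) until one cluster remains."""
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the closest pair of members
                d = min(abs(points[i] - points[j]) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append(([labels[i] for i in clusters[a]], [labels[i] for i in clusters[b]], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history

# Hypothetical annual sales (in millions) for five markets
labels = ["A", "B", "C", "D", "E"]
sales = [10, 12, 30, 33, 70]
history = agglomerative(sales, labels)
for left, right, dist in history:
    print(left, "+", right, "at distance", dist)
```

Reading the history from top to bottom gives the dendrogram from the leaves up: A and B merge first, then C and D, then those two groups, and finally E joins the single super-cluster.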
Figure: An example of a dendrogram that groups export markets into clusters on
the basis of historical sales and the sales mix.
K-means clustering is the most widely used clustering routine. It works by
assigning the records to a predetermined number of clusters, “k”. “Means” refers to
the use of averages in the computation: each cluster is represented by the mean
(centroid) of its records, and each record is assigned to the nearest centroid.
Figure: K-means clustering output.
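A minimal sketch of k-means (Lloyd's algorithm) alternates two steps until the assignments stop changing: assign each record to its nearest centroid, then recompute each centroid as the mean of its members. For reproducibility this sketch simply uses the first k records as starting centroids, an assumption; real tools usually use random or k-means++ starts. The customer data, [purchase_frequency, average_spend], are hypothetical.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm with the first k points as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each record joins its nearest centroid
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no record changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids

# Hypothetical customers: [purchase_frequency, average_spend]
points = [(1, 1), (9, 9), (1.5, 2), (2, 1), (8, 8), (9, 8.5)]
assignment, centroids = kmeans(points, k=2)
print(assignment)
print(centroids)
```

The analyst must choose k up front, which is why the text calls the number of clusters "predetermined".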
Two-step clustering combines predetermined and hierarchical
clustering processes. In step one, records are assigned to a
predetermined number of clusters (alternatively, you can allow
the algorithm to determine the number of clusters). In step
two, each of these clusters is treated as a single case, and
these cases are then subjected to hierarchical clustering.
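A simplified two-step sketch, assuming one-dimensional hypothetical annual-spend values: step one runs a small k-means into m pre-clusters (starting from evenly spaced sorted values, an assumption), and step two treats each pre-cluster as a single case summarised by its mean and merges the closest pair hierarchically until the final number of clusters remains. Commercial implementations differ in detail; `precluster` and `merge_preclusters` are illustrative names.

```python
def precluster(values, m, iterations=50):
    """Step one: 1-D k-means into m pre-clusters (m >= 2).
    Initial centroids are evenly spaced values from the sorted data."""
    ordered = sorted(values)
    centroids = [ordered[i * (len(ordered) - 1) // (m - 1)] for i in range(m)]
    for _ in range(iterations):
        groups = [[] for _ in range(m)]
        for v in values:
            nearest = min(range(m), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        new = [sum(g) / len(g) if g else centroids[c] for c, g in enumerate(groups)]
        if new == centroids:
            break  # centroids stable
        centroids = new
    return [g for g in groups if g]

def merge_preclusters(groups, final_k):
    """Step two: treat each pre-cluster as one case (its mean) and merge
    the closest pair repeatedly until final_k clusters remain."""
    clusters = [list(g) for g in groups]
    while len(clusters) > final_k:
        centres = [sum(c) / len(c) for c in clusters]
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: abs(centres[ij[0]] - centres[ij[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical annual spend values (in hundreds)
spend = [10, 11, 12, 20, 21, 22, 50, 51, 53, 60, 61]
pre = precluster(spend, 4)
final = merge_preclusters(pre, 2)
print(pre)
print(final)
```

Working on a few pre-clusters rather than every record keeps the expensive hierarchical step small, which is the usual motivation for the two-step design on large datasets.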
Factor analysis is a data reduction procedure. It does this by
identifying a smaller number of underlying, unobservable (latent)
variables that are reflected in the observed (manifest) variables.
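As a rough sketch of data reduction, the code below extracts a single factor with the principal-component method: it computes the correlation matrix of three hypothetical survey items, finds its leading eigenvector by power iteration, and scales it by the square root of the eigenvalue to obtain loadings. This is only one of several extraction methods (principal axis factoring and maximum likelihood are common alternatives), and no rotation is applied.

```python
import math

def correlation_matrix(columns):
    """Pearson correlations between observed (manifest) variables."""
    def standardise(col):
        n = len(col)
        m = sum(col) / n
        sd = math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
        return [(x - m) / sd for x in col]
    z = [standardise(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def first_factor_loadings(R, iterations=200):
    """Principal-component extraction of one factor: the leading eigenvector
    of R (found by power iteration) scaled by the square root of its eigenvalue."""
    k = len(R)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(R[i][j] * v[j] for j in range(k)) for i in range(k))
    return [math.sqrt(eigenvalue) * x for x in v], eigenvalue

# Hypothetical 1-5 ratings from six customers on three satisfaction items
staff = [5, 4, 4, 2, 1, 3]
speed = [5, 5, 4, 2, 1, 3]
overall = [4, 5, 4, 1, 2, 3]
R = correlation_matrix([staff, speed, overall])
loadings, eigenvalue = first_factor_loadings(R)
print("loadings:", [round(l, 2) for l in loadings])
print("eigenvalue:", round(eigenvalue, 2))
```

High loadings on all three items suggest that a single latent variable (here, perhaps overall service satisfaction) is reflected in all three manifest variables; a common rule of thumb retains factors whose eigenvalue exceeds 1.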