2. Contents
1. About Organisation
2. Predictive Analytics
3. Background in Market Analytics
4. Segmentation
5. Limitations
6. References
3. About C-DAC
Centre for Development of Advanced Computing (C-DAC) is the premier R&D
organization of the Ministry of Electronics and Information Technology
(MeitY) for carrying out R&D in IT, Electronics and associated areas.
Different areas of C-DAC originated at different times, many of them as a
result of identifying new opportunities.
4. C-DAC was set up in 1988 to build supercomputers, in response to the USA's
denial of supercomputer imports. Since then, C-DAC has built multiple
generations of supercomputers, starting with PARAM at 1 GF in 1988.
At almost the same time, C-DAC started building Indian Language Computing
Solutions by setting up the GIST group (Graphics and Intelligence based
Script Technology). The National Centre for Software Technology (NCST), set up
in 1985, had also initiated work in Indian Language Computing around the
same period.
6. What is Predictive Analytics?
Data mining has been in use for many purposes including finding interesting
trends and patterns from data.
In the mid-2000s the term “predictive analytics” became synonymous with the
use of data mining to develop tools to predict the behavior of individuals (or
other entities, such as limited companies).
One of the earliest applications of predictive analytics was credit scoring, which
was first used to decide who to give credit to.
7. By the mid-1980s, credit scoring had become the primary decision-making
tool across the financial services industry.
Predictive analytics is used to analyze data from thousands of historic loan
agreements to identify what characteristics of borrowers were indicative of
them being “good” customers who repaid their loans or “bad” customers who
defaulted.
These relationships are encapsulated by the model. One can then use the
model to make predictions about the future repayment behavior of new loan
applicants. The CIBIL score is one example.
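The idea of learning from historic loans and then scoring a new applicant can be sketched in a few lines. This is only an illustration: the loan records, the two borrower characteristics (income and existing debt) and the choice of logistic regression are assumptions made for the example, not how any real credit bureau computes its score.

```python
import math

# Hypothetical historic loans: (income, existing debt, defaulted?), amounts in thousands.
HISTORIC_LOANS = [
    (20, 15, 1), (25, 18, 1), (30, 20, 1), (35, 25, 1),   # "bad": defaulted
    (60, 5, 0), (70, 10, 0), (80, 8, 0), (90, 12, 0),     # "good": repaid
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def default_probability(model, income, debt):
    """Predicted probability that a borrower with these characteristics defaults."""
    w_income, w_debt, bias = model
    # Divide by 100 so both features sit on a roughly 0..1 scale.
    return sigmoid(w_income * income / 100.0 + w_debt * debt / 100.0 + bias)

def train(loans, epochs=5000, learning_rate=1.0):
    """Fit a logistic regression model by plain gradient descent."""
    model = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        gradient = [0.0, 0.0, 0.0]
        for income, debt, defaulted in loans:
            error = default_probability(model, income, debt) - defaulted
            gradient[0] += error * income / 100.0
            gradient[1] += error * debt / 100.0
            gradient[2] += error
        model = [w - learning_rate * g / len(loans) for w, g in zip(model, gradient)]
    return model

# The fitted weights encapsulate the relationships found in the historic data.
model = train(HISTORIC_LOANS)

# Score two hypothetical new applicants.
p_safe = default_probability(model, income=85, debt=6)
p_risky = default_probability(model, income=22, debt=19)
```

A lender would then compare the predicted probability against a cut-off of its own choosing to accept or decline an application.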
8. Market Analytics
Deals with analyzing logged data like customer purchases.
Helpful in gaining insightful knowledge about business processes to maximise
profits.
Makes predictions about the future, such as:
Number of customers
Annual growth
Type of customers
9. Background
From the Web 2.0 era, the primary source of individual (consumer) data was
the electronic footprints left behind through credit card transactions, online
purchases, among others.
This information was used to generate bills, keep accounts up to date, and to
provide an audit of the transactions that happened between service providers
and their customers.
In recent years organizations have become increasingly interested in the
spaces between the user transactions and the paths that led the users to the
10. As users do more things electronically, information that gives insights about
users’ thought processes and the influences that led them to engage in one
activity or another has become available.
All this information about people is very useful for many reasons, but one
application in particular is predicting future behavior. By using information
about people’s lifestyles, movements and past behaviors, organizations can
predict what they are likely to do, when they will do it and where that activity
will occur.
These predictions are used to tailor how organizations interact with people.
Their reason for doing this is to influence people’s behavior, in order to
maximize the value of the relationships that they have with them.
13. K-Means
1. Ask the user how many clusters are expected, say k
2. Randomly guess k centroids
3. Each object finds out which centroid it is closest to
4. Each cluster recomputes its centroid as the mean of the objects it owns
5. Repeat steps 3 and 4 until the assignments stop changing
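The five steps above can be sketched in plain Python. Some details are assumptions made for the example: the initial centroids are drawn from the data points themselves, distance is Euclidean, and the loop stops when no object changes cluster or after max_iter rounds.

```python
import random

def closest(point, centroids):
    """Index of the centroid nearest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                 # step 2: random initial guess
    labels = None
    for _ in range(max_iter):
        new_labels = [closest(p, centroids) for p in points]   # step 3
        if new_labels == labels:                      # step 5: stop when stable
            break
        labels = new_labels
        for i in range(k):                            # step 4
            members = [p for p, label in zip(points, labels) if label == i]
            if members:                               # keep the old centroid if empty
                centroids[i] = mean(members)
    return labels, centroids

# Hypothetical 2-D data with two visible groups; k is supplied by the user (step 1).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(points, k=2)
```

The returned labels give the cluster index of each object, in the same order as the input points.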
14. Issues in K-Means
Need to know k in advance.
Sensitive to noise and outlier data.
Not suitable for clusters with non-convex shapes.
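The sensitivity to outliers follows from the centroid being a mean. A small worked example, using made-up spend figures, shows how far a single extreme value drags it:

```python
def centroid(values):
    """Centroid of one-dimensional data is simply the arithmetic mean."""
    return sum(values) / len(values)

# Hypothetical monthly spend of four similar customers.
spend = [10.0, 12.0, 11.0, 13.0]
# The same group with one extreme customer added.
spend_with_outlier = spend + [500.0]

clean_centre = centroid(spend)                  # 46 / 4 = 11.5
skewed_centre = centroid(spend_with_outlier)    # 546 / 5 = 109.2
```

With the outlier included, the centroid no longer resembles any of the typical customers in the group.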
15. Hierarchical Agglomerative Clustering (HAC)
1. Put each object in a cluster by itself
2. Find the most mergeable pair of clusters and merge them into a single cluster
(now we have one less cluster!)
3. Compute the distance between the new cluster and each of the old clusters
4. Repeat steps 2 and 3 until all the objects are merged into a single cluster
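A naive sketch of the procedure follows. It assumes Euclidean distance and takes "most mergeable" to mean the pair of clusters whose closest members are nearest to each other; other criteria can be passed in through the hypothetical cluster_distance parameter. Distances are recomputed on every round rather than cached, which keeps the code short at the cost of speed.

```python
import math
from itertools import combinations

def closest_members(c1, c2):
    """Distance between the two nearest members of clusters c1 and c2."""
    return min(math.dist(a, b) for a in c1 for b in c2)

def hac(points, cluster_distance=closest_members):
    """Return the list of merges, in the order they were made."""
    clusters = [[p] for p in points]              # step 1: one cluster per object
    merges = []
    while len(clusters) > 1:                      # repeat until one cluster is left
        # Step 2: find the most mergeable pair. Recomputing the distances
        # here on every round also covers step 3.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return merges

# Hypothetical 2-D data: two tight pairs and one distant point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
merges = hac(points)
```

Each entry in merges records the two clusters that were joined, so the list read from first to last describes the tree from the leaves up to the root.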
16. Properties of HAC
Creates a binary tree (a “dendrogram”) of clusters.
Various ways to determine mergeability.
Single-link: distance between closest neighbors
Complete-link: distance between farthest neighbors
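The two criteria differ only in whether the minimum or the maximum pairwise distance is taken. A short sketch, assuming Euclidean distance and two made-up clusters on a line:

```python
import math

def single_link(c1, c2):
    """Distance between the closest pair of members, one from each cluster."""
    return min(math.dist(a, b) for a in c1 for b in c2)

def complete_link(c1, c2):
    """Distance between the farthest pair of members, one from each cluster."""
    return max(math.dist(a, b) for a in c1 for b in c2)

# Hypothetical clusters laid out along the x-axis.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (7.0, 0.0)]

d_single = single_link(cluster_a, cluster_b)      # (1, 0) to (3, 0): 2.0
d_complete = complete_link(cluster_a, cluster_b)  # (0, 0) to (7, 0): 7.0
```

Single-link tends to chain nearby clusters together, while complete-link favours compact clusters, so the same data can yield different dendrograms under the two criteria.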
17. Issues in HAC
The choice of merge or split points is critical.
Once a cluster is formed at any stage, it cannot be undone at a later stage.
The method does not scale well: the naive algorithm needs O(n^2) memory for
the pairwise distances and O(n^3) time.
18. Segmentation in Market Data Analytics
Clustering is used for segmentation in market analytics.
Hierarchical Agglomerative Clustering is used in this example.
The distance between closest neighbours (single-link) is taken as the
mergeability criterion in HAC.
Classes of users are formed.
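A sketch of such a segmentation is given here. The customer records, the two features and the target of three segments are all made up for illustration. Merging the closest pair of customers that are not yet in the same segment, in increasing order of distance, and stopping when k segments remain gives the same result as single-link HAC cut at k clusters; a union-find structure keeps track of which segment each customer belongs to.

```python
import math
from itertools import combinations

# Hypothetical customers: name -> (annual spend in thousands, visits per month).
CUSTOMERS = {
    "A": (2.0, 1.0), "B": (2.5, 1.5), "C": (3.0, 1.0),
    "D": (20.0, 8.0), "E": (21.0, 9.0),
    "F": (50.0, 2.0), "G": (52.0, 2.5),
}

def segment(customers, k):
    """Single-link segmentation of customers into k classes."""
    parent = {name: name for name in customers}

    def find(name):
        # Follow parent links to the representative of the segment.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # All customer pairs, closest neighbours first.
    pairs = sorted(combinations(customers, 2),
                   key=lambda p: math.dist(customers[p[0]], customers[p[1]]))
    remaining = len(customers)
    for a, b in pairs:
        if remaining == k:
            break
        root_a, root_b = find(a), find(b)
        if root_a != root_b:          # merge two different segments
            parent[root_a] = root_b
            remaining -= 1

    segments = {}
    for name in customers:
        segments.setdefault(find(name), []).append(name)
    return sorted(segments.values())

classes = segment(CUSTOMERS, k=3)
```

The features are used unscaled here; with real data, spend would dominate the distance unless the features were first brought to comparable ranges.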
19. Limitations of Clustering in Market Analytics
Market data is updated frequently, so cluster models need to be updated
accordingly.
New customers are also continually added to the data, which necessitates
creating new cluster models.
Stability