Industrial Training
Market Analytics in R
Presented To: Mr. Vikas Bhatnagar
Presented By: Vivek Kumar
Contents
1. About Organisation
2. Predictive Analytics
3. Background in Market Analytics
4. Segmentation
5. Limitations
6. References
About CDAC
Centre for Development of Advanced Computing (C-DAC) is the premier R&D
organization of the Ministry of Electronics and Information Technology
(MeitY) for carrying out R&D in IT, Electronics and associated areas.
The different areas of C-DAC originated at different times, many of them
as a result of identifying opportunities.
C-DAC was set up in 1988 to build supercomputers, in the context of the
denial of import of supercomputers by the USA. Since then C-DAC has been
building multiple generations of supercomputers, starting from
PARAM with 1 GF in 1988.
At almost the same time, C-DAC started building Indian Language Computing
Solutions by setting up the GIST group (Graphics and Intelligence based
Script Technology). The National Centre for Software Technology (NCST), set up
in 1985, had also initiated work in Indian Language Computing around the
same period.
CDAC, Mumbai
What is Predictive Analytics?
Data mining has been in use for many purposes including finding interesting
trends and patterns from data.
In the mid-2000s the term “predictive analytics” became synonymous with the
use of data mining to develop tools to predict the behavior of individuals (or
other entities, such as limited companies).
One of the earliest applications of predictive analytics was credit scoring, which
was first used to decide who to give credit to.
By the mid-1980s, credit scoring had become the primary decision-making tool
across the financial services industry.
Predictive analytics is used to analyze data from thousands of historic loan
agreements to identify which characteristics of borrowers indicate
“good” customers who repaid their loans and which indicate “bad” customers who
defaulted.
These relationships are encapsulated by the model. One can then use the
model to make predictions about the future repayment behavior of new loan
applicants. The CIBIL score is one example.
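The idea of learning "good" versus "bad" borrowers from historic loans can be sketched as follows. The deck shows no code, so this is only a minimal illustration in plain Python: the loan records, the two features (income and past missed payments), the scaling constants and the choice of logistic regression are all assumptions made for the example, not the method behind any real credit score.

```python
import math

# Hypothetical historic loans: (income in lakhs per year, past missed payments, repaid?)
# All numbers are invented for illustration.
HISTORIC_LOANS = [
    (12.0, 0, 1), (9.5, 0, 1), (8.0, 1, 1), (7.0, 0, 1),
    (3.0, 3, 0), (4.0, 2, 0), (2.5, 4, 0), (5.0, 3, 0),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def scale(income, missed):
    """Bring both features into a roughly comparable 0..1 range (assumed constants)."""
    return income / 10.0, missed / 4.0


def fit(loans, learning_rate=0.5, epochs=2000):
    """Fit a two-feature logistic regression by plain gradient descent."""
    w_income, w_missed, bias = 0.0, 0.0, 0.0
    n = len(loans)
    for _ in range(epochs):
        g_income = g_missed = g_bias = 0.0
        for income, missed, repaid in loans:
            x1, x2 = scale(income, missed)
            error = sigmoid(w_income * x1 + w_missed * x2 + bias) - repaid
            g_income += error * x1
            g_missed += error * x2
            g_bias += error
        w_income -= learning_rate * g_income / n
        w_missed -= learning_rate * g_missed / n
        bias -= learning_rate * g_bias / n
    return w_income, w_missed, bias


def repayment_probability(model, income, missed):
    """Predicted probability that a new applicant repays."""
    w_income, w_missed, bias = model
    x1, x2 = scale(income, missed)
    return sigmoid(w_income * x1 + w_missed * x2 + bias)


model = fit(HISTORIC_LOANS)
good_applicant = repayment_probability(model, income=10.0, missed=0)
risky_applicant = repayment_probability(model, income=3.0, missed=3)
print(f"good applicant: {good_applicant:.2f}, risky applicant: {risky_applicant:.2f}")
```

The fitted weights play the role of "the model" in the text: they summarise the relationship found in the historic data and are then reused to score applicants that were not in it.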
Market Analytics
Deals with analyzing logged data like customer purchases.
Helpful in gaining insightful knowledge about business processes to maximise
profits.
Makes predictions about the future.
Number of customers
Annual Growth
Type of customers
Background
From the Web 2.0 era, the primary source of individual (consumer) data was
the electronic footprints left behind through credit card transactions, online
purchases, among others.
This information was used to generate bills, keep accounts up to date, and to
provide an audit of the transactions that happened between service providers
and their customers.
In recent years organizations have become increasingly interested in the
spaces between the user transactions and the paths that led the users to the ...
As users do more things electronically, information that gives insights about
users’ thought processes and the influences that led them to engage in one
activity or another has become available.
All this information about people is very useful for many reasons, but one
application in particular is predicting future behavior. By using information
about people’s lifestyles, movements and past behaviors, organizations can
predict what they are likely to do, when they will do it and where that activity
will occur.
These predictions are used to tailor how organizations interact with people.
Their reason for doing this is to influence people’s behavior, in order to
maximize the value of the relationships that they have with them.
Segmentation
Unsupervised Learning Model.
Also called descriptive modelling.
Customers are segmented into groups.
Inactive
New active
Active Low Value
Active High Value
Segmentation Algorithms
1. Partitional Clustering
a. k-means Clustering
2. Hierarchical Clustering
a. Hierarchical Agglomerative Clustering (HAC)
k-means
1. Ask user how many clusters are expected, say k
2. Randomly guess k centroids
3. Each object finds out which centroid it’s closest to
4. Each cluster finds the centroid of the objects it owns
5. Repeat steps 3 and 4 until no object changes cluster (or an iteration limit is reached)
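The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the deck's own implementation: the function names, the `seed` and `max_iterations` parameters, and the one-dimensional "annual spend" data are assumptions made for the example.

```python
import math
import random


def closest(point, centroids):
    """Index of the centroid nearest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def k_means(points, k, max_iterations=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # step 2: random initial guess
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [closest(p, centroids) for p in points]   # step 3
        if new_assignment == assignment:         # step 5: nothing changed, stop
            break
        assignment = new_assignment
        for i in range(k):                       # step 4: recompute each centroid
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:                          # an empty cluster keeps its old centroid
                centroids[i] = mean(members)
    return assignment, centroids


# Hypothetical customers described by annual spend (in thousands); step 1 fixes k = 2.
customers = [(2,), (3,), (4,), (40,), (41,), (42,)]
assignment, centroids = k_means(customers, k=2)
print(assignment, sorted(centroids))
```

On data this well separated the result does not depend on the random starting centroids. On less tidy data it can, which is why k-means is commonly run from several random starts.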
Issues in k-means
Need to know k in advance.
Sensitive to noise and outlier data.
Not suitable for clusters with non-convex shapes.
Hierarchical Agglomerative Clustering (HAC)
1. Put each object in a cluster by itself
2. Find the most mergeable pair of clusters and merge them into a single cluster
(now we have one less cluster!)
3. Compute the distance between the new cluster and each of the old
clusters
4. Repeat steps 2 and 3 until all the objects are merged into a single
cluster
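A naive version of this procedure can be sketched as below. It is an illustration under stated assumptions, not the deck's implementation: "most mergeable" is taken to mean the smallest closest-neighbour (single-link) distance, `stop_at` is a hypothetical parameter that halts merging once that many clusters remain, and the data is invented.

```python
import math


def single_link(a, b):
    """Distance between the closest pair of members of clusters a and b."""
    return min(math.dist(p, q) for p in a for q in b)


def hac(points, linkage=single_link, stop_at=1):
    """Naive agglomerative clustering; returns the clusters and the merge history."""
    clusters = [[p] for p in points]             # step 1: every object on its own
    merges = []
    while len(clusters) > stop_at:
        # step 2: the most mergeable pair has the smallest linkage distance
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # step 3: distances to the merged cluster are recomputed on the next pass
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters, merges


# Hypothetical customers described by annual spend (in thousands).
customers = [(2,), (3,), (4,), (40,), (41,), (42,)]
clusters, merges = hac(customers, stop_at=2)
print(clusters)
```

Recomputing every pairwise linkage on each pass keeps the sketch short but is slow. The `merges` list records the order in which clusters were joined, which is the information a dendrogram draws.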
Properties of HAC
Creates a binary tree (“dendrogram”) of clusters.
Various ways to determine mergeability.
Single-link: distance between closest neighbors
Complete-link: distance between farthest neighbors
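The two criteria differ only in whether the minimum or the maximum pairwise distance is taken. A small sketch, with function names and points chosen purely for illustration:

```python
import math


def single_link(a, b):
    """Distance between the closest pair of members of clusters a and b."""
    return min(math.dist(p, q) for p in a for q in b)


def complete_link(a, b):
    """Distance between the farthest pair of members of clusters a and b."""
    return max(math.dist(p, q) for p in a for q in b)


left = [(0.0, 0.0), (1.0, 0.0)]
right = [(4.0, 0.0), (6.0, 0.0)]
nearest = single_link(left, right)     # gap between (1, 0) and (4, 0)
farthest = complete_link(left, right)  # gap between (0, 0) and (6, 0)
print(nearest, farthest)
```

Single-link tends to chain clusters together through nearby points, while complete-link favours compact clusters, so the same data can yield different dendrograms under the two criteria.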
Issues in HAC
Decision of merge or split points is critical.
Once a cluster is formed at any stage, it can’t be undone at any later stage.
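The method does not scale well: it needs the distances between all pairs of
objects, so time and memory grow at least quadratically with the data size.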
Segmentation in Market Data Analytics
Clustering is used for segmentation in Market Analytics.
Hierarchical Agglomerative Clustering is used in this example.
The distance between closest neighbours (single-link) is taken as the
mergeability criterion in HAC.
Classes of users are formed.
Limitations of Clustering in Market Analytics
Market data gets updated frequently, so cluster models need to be updated
accordingly.
New customers also keep getting added to the data, which necessitates the
creation of new cluster models.
Stability
Visualizations
References
1. http://www.cdac.in/
2. https://cran.r-project.org/
3. R Mailing Lists
4. https://www.analyticsvidhya.com
5. https://data.gov.in


Editor's Notes

  1. Unsupervised -> unknown target variables. Intra-cluster similarity should be high; inter-cluster similarity should be low.
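One way to check this note on a finished clustering is to compare average distances within and between clusters, treating a small distance as high similarity. The sketch below is illustrative only: the function names, the use of Euclidean distance as the (inverse) similarity measure and the data are assumptions.

```python
import math
from itertools import combinations


def mean_intra_distance(clusters):
    """Average distance between pairs of objects that share a cluster."""
    pairs = [math.dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(pairs) / len(pairs)


def mean_inter_distance(clusters):
    """Average distance between pairs of objects in different clusters."""
    pairs = [
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    ]
    return sum(pairs) / len(pairs)


# Hypothetical segments of customers described by annual spend (in thousands).
clusters = [[(2,), (3,), (4,)], [(40,), (41,), (42,)]]
intra = mean_intra_distance(clusters)
inter = mean_inter_distance(clusters)
print(f"intra-cluster: {intra:.2f}, inter-cluster: {inter:.2f}")
```

A good segmentation shows a small intra-cluster distance relative to the inter-cluster distance, as it does for these two well-separated groups.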