2. DO THESE FOUR CITIES LOOK IDENTICAL TO YOU?
Average price is the same. Average sales is the same too.
Variance in price is the same. So is the variance in sales.
Take a look at the sales report alongside. A company has branches in four cities, and each branch changes the product price every month. This leads to a corresponding change in sales.
Here is the performance of the four branches, with price and sales for each month.
Going by the averages, the four branches have identical performance.
2010       Boston         Chicago        Detroit        New York
Month    Price  Sales   Price  Sales   Price  Sales   Price  Sales
Jan       10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
Feb        8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
Mar       13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
Apr        9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
May       11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
Jun       14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
Jul        6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
Aug        4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50
Sep       12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
Oct        7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
Nov        5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89
Average    9.0   7.50     9.0   7.50     9.0   7.50     9.0   7.50
Variance  10.0   3.75    10.0   3.75    10.0   3.75    10.0   3.75
DO YOU AGREE?
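The identical summary statistics can be verified directly. A minimal sketch using Python's statistics module, with the data copied from the table (two of the four cities shown):

```python
from statistics import mean, pvariance

# Monthly price and sales from the table above (Boston and New York shown).
boston = {
    "price": [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
    "sales": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
}
new_york = {
    "price": [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
    "sales": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}

for name, city in [("Boston", boston), ("New York", new_york)]:
    print(name,
          round(mean(city["price"]), 1), round(pvariance(city["price"]), 1),
          round(mean(city["sales"]), 2), round(pvariance(city["sales"]), 2))
# Both cities print the same summary: mean price 9.0, variance 10.0,
# mean sales 7.5, variance ~3.75 - yet their raw patterns differ entirely.
```

Note that `pvariance` is the population variance (dividing by n), which is what the table's 10.0 and 3.75 values correspond to.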
3. ARE THEY REALLY IDENTICAL? CHECK AGAIN…
But in fact, the four cities are totally different in behaviour.
Boston’s sales have generally increased with price.
Detroit shows a nearly perfect increase in sales with price, except for one aberration.
Chicago shows a decline in sales beyond a price of 10.
New York’s sales fluctuate despite a nearly constant price.
[Scatter plots of price vs sales for Boston, Detroit, Chicago and New York]
8. INVESTMENTS IN BIG DATA & ANALYTICS NEED
NOT GUARANTEE BUSINESS EFFECTIVENESS
No coherent consumption: enterprises have a disjoint view of data across divisions. This impedes organisational action and speed.
Last-mile disconnect: processed and analysed data is not presented effectively as a story. Meaningful consumption is an issue.
Longer realisations: implementation takes years. System stabilisation takes 1-2 years or more, with a prohibitive cost of change.
Org design impedes: org structures and authorisation processes impede quick action even after the data indicates what is needed.
ENTERPRISES NEED HELP CROSSING THE ANALYTICS CHASM
10. PREDICTING MARKS
“
What determines a child’s marks?
Do girls score better than boys?
Does the choice of subject matter?
Does the medium of instruction matter?
Does community or religion matter?
Does their birthday matter?
Does the first letter of their name matter?
EDUCATION
16. DETECTING FRAUD
“
We know meter readings are incorrect, for various reasons. We don’t, however, have the concrete proof we need to start the process of meter-reading automation.
Part of our problem is the volume of data that needs to be analysed. The other is our inexperience with the tools and analyses needed to identify such patterns.
ENERGY UTILITY
17. AN ENERGY UTILITY DETECTED BILLING FRAUD
This plot shows the frequency of all meter readings from Apr-
2010 to Mar-2011. An unusually large number of readings are
aligned with the slab boundaries.
Below is a simple histogram (or frequency distribution) of usage levels. Each bar represents the number of customers with a specific bill amount (in units, or kWh).
Tariffs are based on the usage slab. Someone with 101 units is billed in
full at a higher tariff than someone with 100 units. So people have a
strong incentive to stay at or within a slab boundary.
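The slab incentive is easy to quantify. A minimal sketch with hypothetical tariff rates (the utility's actual rates are not given in the source; the key point is that the whole usage is billed at the slab's rate):

```python
# Hypothetical slab tariff: the entire usage is billed at the rate of the
# slab it falls in. Rates are illustrative, not the utility's actual tariff.
SLABS = [(100, 3.0), (200, 5.0), (300, 7.0), (float("inf"), 9.0)]

def bill(units: float) -> float:
    """Bill the full usage at the rate of its slab."""
    for limit, rate in SLABS:
        if units <= limit:
            return units * rate

print(bill(100))  # 300.0
print(bill(101))  # 505.0 - one extra unit raises the bill by ~68%
```

The jump from 300 to 505 for a single extra unit is the economic incentive behind the spikes at the slab boundaries.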
An energy utility (with over 50 million subscribers) had 10 years’ worth of customer billing data available. Most fraud-detection software failed to load the data, and sampled data revealed little or no insight.
This can happen in one of two ways. First, people may be monitoring their usage very carefully, and turn off their lights and fans the instant their usage hits the slab boundary.
Or, more realistically, there’s probably some level of corruption involved: customers pay a small sum to the meter-reading staff to ensure that the reading stays exactly at the slab boundary, giving them the advantage of a lower price.
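One way to surface such boundary spikes programmatically is to compare the count at each slab boundary against its neighbours. A minimal sketch on synthetic readings (the real billing data is not reproduced here; the injected spikes mimic the pattern in the histogram):

```python
from collections import Counter
import random

random.seed(0)
# Synthetic meter readings: a smooth-ish distribution plus an artificial
# pile-up at the slab boundaries, mimicking the pattern in the real data.
readings = [min(int(random.expovariate(1 / 120)), 400) for _ in range(20000)]
readings += [100] * 600 + [200] * 400 + [300] * 250  # injected boundary spikes

counts = Counter(readings)

def spike_ratio(value: int, window: int = 5) -> float:
    """Count at `value` divided by the average count of its neighbours."""
    neighbours = [counts[value + d] for d in range(-window, window + 1) if d != 0]
    return counts[value] / (sum(neighbours) / len(neighbours) or 1)

for boundary in (100, 200, 300):
    print(boundary, round(spike_ratio(boundary), 1))
# The boundaries stand out with ratios well above 1 - the statistical
# signature of readings being "parked" at slab limits.
```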
18. This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the tariff slab boundaries. This clearly shows some form of collusion with the customers.
Apr-10 May-10 Jun-10 Jul-10 Aug-10 Sep-10 Oct-10 Nov-10 Dec-10 Jan-11 Feb-11 Mar-11
   217    219    200    200    200    200    200    200    200    350    200    200
   250    200    200    200    201    200    200    200    250    200    200    150
   250    150    150    200    200    200    200    200    200    200    200    150
   150    200    200    200    200    200    200    200    200    200    200     50
   200    200    200    150    180    150     50    100     50     70    100    100
   100    100    100    100    100    100    100    100    100    100    110    100
   100    150    123    123     50    100     50    100    100    100    100    100
     0    111    100    100    100    100    100    100    100    100     50     50
     0    100     27    100     50    100    100    100    100    100     70    100
     1      1      1    100     99     50    100    100    100    100    100    100
This happens with specific customers, not randomly. Here are such customers’ meter readings, one customer per row.
Section    Apr-10 May-10 Jun-10 Jul-10 Aug-10 Sep-10 Oct-10 Nov-10 Dec-10 Jan-11 Feb-11 Mar-11
Section 1     70%    97%   136%    65%   110%   116%   121%   107%   114%    88%    74%   109%
Section 2     66%    92%    66%    87%    70%    64%    63%    50%    58%    38%    41%    54%
Section 3     90%    46%    47%    43%    28%    31%    50%    32%    19%    38%     8%    34%
Section 4     44%    24%    36%    39%    21%    18%    24%    49%    56%    44%    31%    14%
Section 5      4%    63%   -27%    20%    41%    82%    26%    34%    43%     2%    37%    15%
Section 6     18%    23%    30%    21%    28%    33%    39%    41%    39%    18%     0%    33%
Section 7     36%    51%    33%    33%    27%    35%    10%    39%    12%     5%    15%    14%
Section 8     22%    21%    28%    12%    24%    27%    10%    31%    13%    11%    22%    17%
Section 9     19%    35%    14%     9%    16%    32%    37%    12%     9%     5%    -3%    11%
If we define the “extent of fraud” as the percentage excess at the 100-unit meter reading, the value varies considerably across sections and over time, with some explainable anomalies. Why would these happen?
Annotations on the table mark where a new section manager arrives … and is transferred out.
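The "extent of fraud" metric above can be computed as the spike's excess over the count expected at the boundary. A minimal sketch (the counts used are illustrative, not the utility's actual figures):

```python
def extent_of_fraud(count_at_100: float, neighbour_counts: list) -> float:
    """Percentage excess of the observed count at the 100-unit reading
    over the count expected from its neighbours (illustrative metric)."""
    expected = sum(neighbour_counts) / len(neighbour_counts)
    return 100 * (count_at_100 - expected) / expected

# Illustrative counts: ~80 customers per nearby reading, 136 at exactly 100.
print(round(extent_of_fraud(136, [78, 82, 80, 81, 79])))  # 70
```

A value of 70 corresponds to the 70%-style cells in the section table above; a negative value (a dip below the expected count) corresponds to the "negative fraud" anomaly.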
19. SIMPLE HEURISTICS
EMERGENCY
“
A man is rushed to a hospital in the throes of a heart attack. The nurse needs to decide whether the victim should be admitted into emergency care.
Although this decision can save or cost a life, the nurse must decide using only the available cues, and within a few seconds, preferably without some fancy statistical software package.
24. TAKEAWAYS
1. In a single circle with 2 crore customers, this improvement represents a saving of Rs 2.6 x 2 cr ~ Rs 5 cr / month / circle
2. The testing structure allows us to test any number of models and evaluate their effectiveness
3. We need to trade off simplicity against over-fitting. Incremental improvements are often not worth the trouble
4. Implementation needs to be constantly monitored, with continuous re-evaluation of the model
25. ANALYSING CAUSAL DRIVERS
We group by every input factor … and calculate the impact on every metric. The actual performance of each group is shown. By moving from the average to the best group, what’s the improvement? Only significant results are shown.
For example, one such grouping:
0-3m   3-6m   6m-1yr   1-2 yrs   > 2 yrs
11     12.3   12.7     15.3      16.1
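The group-by-and-compare step can be sketched in a few lines. A minimal sketch using the groups shown above (the underlying metric is not named in the source):

```python
# Mean metric value per group, from the slide; the underlying metric
# (e.g. monthly revenue) is not named in the source.
group_means = {"0-3m": 11.0, "3-6m": 12.3, "6m-1yr": 12.7,
               "1-2 yrs": 15.3, "> 2 yrs": 16.1}

overall = sum(group_means.values()) / len(group_means)
best_group, best_value = max(group_means.items(), key=lambda kv: kv[1])

# Improvement from moving the average performer up to the best group.
uplift_pct = 100 * (best_value - overall) / overall
print(best_group, round(overall, 2), round(uplift_pct, 1))
# > 2 yrs 13.48 19.4
```

In the full analysis this loop would run over every input factor and every metric, keeping only the statistically significant improvements.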
27. Tata Teleservices
Tata Consultancy Services
Tata Business Support Services
Tata Global Beverages
Tata Infotech (merged)
Tata Toyo Radiator
Honeywell Automation India
Tata Communications
A G C Networks
Tata Technologies
Tata Projects
Tata Power
Tata Finance
Idea Cellular
Tata Motors
Tata Sons
Tata Steel
Tayo Rolls
Tata Securities
Tata Coffee
Tata Investment Corp
A J Engineer
H H Malgham
H K Sethna
Keshub Mahindra
Ravi Kant
Russi Mody
Sujit Gupta
A S Bam
Amal Ganguli
D B Engineer
D N Ghosh
M N Bhagwat
N N Kampani
U M Rao
B Muthuraman
Ishaat Hussain
J J Irani
N A Palkhivala
N A Soonawala
R Gopalakrishnan
Ratan Tata
S Ramadorai
S Ramakrishnan
DIRECTORSHIPS AT THE TATAS
Every person who was a director at the Tata Group is shown here as an orange circle. The size of the circle is based on the number of directorship positions held over their lifetime.
Every company in the Tata Group is shown here as a blue circle. The size of the circle is based on the number of directors the company has had over time.
Every directorship relation is shown by a line: if a person has held a directorship position at a company, the two are connected by a line.
The group appears to be divided into two clusters based on the network of directorship roles. Some directors are mainly associated with the first group of companies, some mainly with the second, and prominent leaders bridge the two groups.
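The underlying structure is a bipartite person-company graph, where circle sizes are simply degree counts. A minimal sketch with a few illustrative edges drawn from the names above (the full directorship list is not reproduced in the source):

```python
from collections import defaultdict

# A few illustrative (person, company) directorship edges using names from
# the slide; the full relation list is not reproduced in the source.
edges = [
    ("Ratan Tata", "Tata Sons"), ("Ratan Tata", "Tata Motors"),
    ("Ratan Tata", "Tata Steel"), ("Ishaat Hussain", "Tata Sons"),
    ("Ishaat Hussain", "Tata Steel"),
    ("S Ramadorai", "Tata Consultancy Services"),
]

# Circle sizes in the visual: directorships per person (orange circles)
# and directors per company (blue circles) are just node degrees.
person_size = defaultdict(int)
company_size = defaultdict(int)
for person, company in edges:
    person_size[person] += 1
    company_size[company] += 1

print(person_size["Ratan Tata"], company_size["Tata Sons"])  # 3 2
```

A network layout (e.g. force-directed) over these edges is what makes the two company clusters and the bridging directors visible.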
28. SIMILARITIES IN AN SME TRANSACTION NETWORK
The same visual was applied to the SME clientele of a bank:
• Identified clusters of SMEs transacting with each other
• Targeted non-clients in the middle of a client cluster
• Enhanced service for clients in the middle of non-clients
This resulted in 28% QoQ GROWTH in new accounts (against a default QoQ base of 3-8% in the city for the last 5 years).
We’ve used network diagrams to detect terrorism and corporate fraud, de-duplicate customers, and identify product affinities.
30. PORTFOLIO PERFORMANCE VISUAL
Worldwide: $288.0 mn
A: Accelerate: $68.9 mn
B: Build: $77.2 mn
C: Cut down: $141.9 mn
The visualization shows the market opportunities across various countries to identify areas of focus. This chart has been built as an interactive app to present the key findings, while letting users click through and drill down to a custom view across 4 different levels.
37. FINDING PATTERNS
“
Which securities move together?
How should I diversify?
What should I sell to reduce risk?
What’s a reliable predictor of a security?
SECURITIES
38. 68% correlation between AUD & EUR. Plot of 6-month daily AUD-EUR values. Block of correlated currencies … clustered hierarchically.
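The correlation grid behind this visual is straightforward to compute. A minimal sketch on synthetic daily series, stdlib only (real market data is not included; the series are constructed so EUR tracks AUD and JPY moves against both):

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(1)
# Synthetic daily values (~6 months of trading days): EUR tracks AUD with
# noise, JPY moves against both. Purely illustrative series.
aud = [1.0 + 0.01 * i + random.gauss(0, 0.02) for i in range(126)]
eur = [a * 0.9 + random.gauss(0, 0.02) for a in aud]
jpy = [2.0 - 0.5 * a + random.gauss(0, 0.02) for a in aud]

print(round(pearson(aud, eur), 2), round(pearson(aud, jpy), 2))
# Strong positive AUD-EUR correlation, strong negative AUD-JPY correlation.
```

Computing this for every pair of securities gives the coloured grid; hierarchical clustering of the rows then produces the correlated blocks described above.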
41. RESTAURANT FOUND AN UNUSUAL DIP IN SALES
A restaurant chain had data for every
single transaction made over a few
years. Plotting this as a time series
showed them nothing unusual.
However, the same data on a calendar
map reveals a very different story.
Specifically, at the bottom-left point-of-sale terminal, sales dip every Wednesday. At the bottom-right point-of-sale terminal, sales rise every Wednesday, almost as if to compensate for the loss.
It turns out that the manager closes the bottom-left counter every Wednesday afternoon due to a shortage of staff, assuming that this results in no loss of sales. There is, however, a net loss every Wednesday.
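The calendar-map finding reduces to an aggregation by weekday. A minimal sketch on synthetic per-day sales (the restaurant's data is not reproduced; the dates and the injected Wednesday dip are illustrative):

```python
from datetime import date, timedelta
from collections import defaultdict
import random

random.seed(2)
# Synthetic daily sales for one terminal over ~7 months, with an
# artificial dip every Wednesday to mimic the pattern in the source.
start = date(2013, 1, 1)  # hypothetical period
sales = {}
for i in range(210):
    d = start + timedelta(days=i)
    base = random.uniform(900, 1100)
    sales[d] = base * (0.6 if d.weekday() == 2 else 1.0)  # 2 = Wednesday

# Average sales per weekday - the summary a calendar map shows at a glance.
totals, counts = defaultdict(float), defaultdict(int)
for d, v in sales.items():
    totals[d.weekday()] += v
    counts[d.weekday()] += 1
avg = {wd: totals[wd] / counts[wd] for wd in totals}

print(min(avg, key=avg.get))  # 2, i.e. Wednesday is the weakest day
```

The calendar map makes the same pattern visible without being asked: the weekday columns line up, so a recurring dip appears as a red stripe.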
42. BANK FOUND ALL LOANS BEFORE 20TH POOR
Every loan disbursed after the 20th of the month, i.e. from the 21st to the end of the month, shows consistently lower non-performing assets (i.e. better quality) than loans disbursed up to the 20th.
The bank mapped this back to their incentive scheme. The sales team’s
commission is based only on loans disbursed until the 20th. Hence new
loans are squeezed into this period without regard for their quality.
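The check behind this finding is a two-group comparison on disbursal day. A minimal sketch on synthetic loans (the field layout and default rates are assumptions for illustration, not the bank's data):

```python
import random
random.seed(3)

# Synthetic loans: (day_of_month_disbursed, is_non_performing).
# Loans rushed in before the 20th are given a higher default rate,
# mimicking the incentive-driven pattern described above.
loans = [(day := random.randint(1, 28),
          random.random() < (0.12 if day <= 20 else 0.05))
         for _ in range(10000)]

def npa_rate(subset):
    """Fraction of loans in the subset that are non-performing."""
    return sum(bad for _, bad in subset) / len(subset)

early = [l for l in loans if l[0] <= 20]
late = [l for l in loans if l[0] > 20]
print(round(npa_rate(early), 3), round(npa_rate(late), 3))
# The early-month cohort shows a visibly higher NPA rate.
```

Grouping a quality metric by a dimension nobody thought relevant (day of month) is exactly the kind of pattern the visual surfaced.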
The personal finance division of a
bank, focusing on retail loans, drove
its sales through a branch sales team.
A study of the non-performing assets
of loans generated over the course of
one year shows a strange pattern.
Analytics can detect something you’re specifically looking for. It takes a visual to detect what you don’t know to look for.
This representation, known as a calendar map, can show some interesting patterns, particularly weekday-based patterns, as the next example will show.
50. How does the Mahabharata, one of the largest epics at 1.8 million words, lend itself to text analytics?
Can this ‘unstructured data’ be processed to extract analytical insights?
What does sentiment analysis of this tome convey?
Is there a better way to explore relations between characters?
How can the closeness of characters be analysed & visualized?
VISUALISING THE MAHABHARATA
51. 3642 LIC
3148 MTNL
2494 BSES
444 RELIANCE ENERGY
426 ESCROW
396 ICICI
378 CLG RTD
294 MAHANAGAR GAS
232 HDFC
216 MAHANGAR GAS LTD
212 ORANGE
204 LIC OF INDIA
190 ESCROW A/C
54. TWO ROUTES TO BUILDING ANALYTIC CAPABILITY
1. Business-driven approach: stakeholder groups have a set of objectives, that can be met by initiatives, which answer specific questions, using data.
2. Data-driven approach: data suggests questions, that can address initiatives, that meet objectives, for stakeholder groups.
Initiatives are then prioritised on two axes: importance (revenue impact, breadth of usage, effort reduction) and ease (data availability, technology feasibility). Initiatives high on both are quick wins; important but hard ones are strategic; the rest are deferred. Start small with quick wins, cover the strategic landscape, and deferreds become easier with growing capability.
Actions are also marked as either addressed by current reports or a gap in current reports.
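The prioritisation step can be written as a simple scoring rule. A minimal sketch (the 1-5 scores and the 3.0 cutoff are assumptions for illustration; the initiative names come from the next slide):

```python
# Illustrative initiatives scored 1-5 on (importance, ease); the scores
# and the 3.0 cutoff are assumptions, not from the source.
initiatives = {
    "Deposit mobilisation": (5, 4),
    "Fraud detection": (5, 2),
    "Social listening": (2, 4),
}

def classify(importance: int, ease: int, cutoff: float = 3.0) -> str:
    """Map an initiative onto the importance/ease quadrants."""
    if importance >= cutoff and ease >= cutoff:
        return "quick win"
    if importance >= cutoff:
        return "strategic"
    return "deferred"

for name, (imp, ease) in initiatives.items():
    print(name, "->", classify(imp, ease))
# Deposit mobilisation -> quick win
# Fraud detection -> strategic
# Social listening -> deferred
```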
55. TYPICAL INITIATIVES WE SEE ACROSS BANKS TODAY
Performance: deposit mobilisation, product performance, branch performance, employee performance, transaction performance (e.g. ATM)
Product management: product bundling, competitive positioning
Customer management: predicting churn, driving cross-sell, product recommendations
Risk management: fraud detection, scenario modelling (e.g. interest rate change)
Client communication: data-driven insights in statements, social listening
Infrastructure initiatives in parallel: Digitisation and Data Cleansing
56. NEW TECHNIQUES MAKE THESE POSSIBLE
The visuals shown in the earlier slides were created using the Gramener visualization server, which leverages some of Gramener’s recent innovations in automating visualizations, analysis and narration.
Visualizations: visuals are templatized. As the data or the parameters change, the visuals are re-drawn to match the data, ensuring that the view shows live data in real time. For example, this has been used to view social media events, election results and oil leakages in fuel stations, to monitor retail inventory and sentiments on social media, and to plan truck delivery.
Analysis: we’ve extracted common patterns of insights that apply across all datasets. When data is fed in, these automated analysis components perform a sequence of analytic steps and display results visually. This has been applied to identify which security would go well with a given portfolio, predict which telecom customers will leave, and assess the impact of changing the delivery channel for proxy votes.
Narration: binding visuals together into a logical story using text or audio is an integral part of communicating insights. This too is automated in Gramener’s visualizations. This has been applied to automatically “write” a newspaper column on the day’s stock market, automatically write a report summarising the status of clinical trials, and produce automated videos.
These techniques are focused on automating the patterns of insight made by humans, effectively systematizing the “magic” that happens when we find something interesting in data. This is similar to how chess-playing programs work: they are not intelligent as such, they just calculate and evaluate so many moves that they seem intelligent.
AUTOMATION
57. TAKE YOUR NEXT STEP TOWARDS
DATA-DRIVEN LEADERSHIP
S Anand, Chief Data Scientist, Gramener
Editor's Notes
This company asked four of its branches to change the price of a product for one year, to measure the resulting sales and therefore the price elasticity. This table shows the price and sales of the product from Jan to Nov 2010 for Boston, Chicago, Detroit and New York.
You’ll notice that the average price across all 4 cities is the same: 9.0. The average sales is the same at 7.5. The variances, i.e. the square of the standard deviation, are also the same at 10.0 and 3.75 respectively. Usually, in such cases, it’s only the summary statistics that are presented. You rarely see the individual data points.
It’s easy to conclude based on this that the four cities are identical. Yet, are they? Let’s plot this data.
You can see that in Boston, as the price increases, the sales increase by-and-large. In Chicago, it starts dropping beyond a certain price. At Detroit, it’s a nearly perfect linear increase, except for one month. Was that a data error? An unusual market condition? Fraud? In New York, the price never changed. Didn’t they get the instructions? And yet, there is one outlier. Is that a data error? Unusual market? Fraud?
Not only are we able to see that the cities are different, we are also able to see the pattern of their behaviour, and further, identify anomalous data points for further exploration. If there’s one piece of advice that we normally give people, it is: plot data raw. It invariably leads to insights that are not obvious from summary data.
A data insight is not guaranteed until a story line is defined. A story line comes from the correct visual consumption of large data sets. Scores of datasets do not guarantee meaningful insights, but if they are connected end-to-end with exact use-cases, each milestone can be linked and actionable insights derived at the end.
Once the process is established, future data processing and consumption becomes easy.
We did the simplest possible thing – plot the number of customers who had meter readings of 0, 1, 2, 3, etc. – all the way up to 300 and beyond. (Effectively, we drew a histogram.)
As expected, it was log-normal. Relatively few users with low meter readings, and few with high meter readings. But what was striking were the spikes – at 50 units, 100 units, 200 units and 300 units – precisely at the slab boundaries.
Given the metering system, there is a strong economic incentive to stay at or within a slab boundary. Exceeding it increases the unit rate. However, there are two ways this could happen. Either the consumer watches their meter carefully, and the instant it hits 100, stops using their lights and fans – or a certain amount of money changes hands.
It was easy to see from this that there was fraud happening, but what stumped us were the spikes at 10, 20, 30, 40, etc. Here, there’s no economic incentive. There’s no significant difference between a meter reading of 10 vs 11, so there was no incentive to commit fraud. However, we later learnt that we were looking at this the wrong way. This was not a case of fraud, but of laziness. These were the meter readings taken by staff that never visited the premises, and were cooking up numbers.
When people cook up numbers, they cook up round numbers. (An official said that he had to let go of one person who had not taken readings in a colony of houses for as long as six months. “Sir, there’s a pack of dogs in the colony” was his official statement.)
The other question is, what is the nature of this fraudulent contract. Is it monthly? The meter reading guy appears and charges a small sum to adjust the reading? Or is it an annual contract that’s paid upfront? We looked at the meter readings of some of the people who were consistently at the slab boundaries. For example, the table in the middle has the readings of 10 customers, one per row. In the first row, the readings are consistently at 200 for 9 of the 12 months. However, there’s a spike in Jan-11 to 350 units. This indicated a monthly contract with a failure to pay in just one month. However, we later learnt that many of the people on this list were famous personalities. In fact, the lady in the first row had an event at their place in Jan-11, and the actual reading was expected to be well over a thousand units. But since the electricity board has a policy of not often auditing those that were in the highest slab (above 300), a more likely explanation was a collusion of the lineman with the customer to place her in the highest slab just this month, to avoid scrutiny.
Lastly, we were examining the level at which fraud can be controlled. The last table above shows the extent of fraud of each section in one city, month on month. (The extent of fraud can be measured by the relative height of the spikes compared to the expected value.) Sections vary in the level of fraud, with Section 1 having significantly more fraud than Section 9. We also observe that fraud generally decreases in the winter season (Dec – Feb) when the need for cooling is less. But what’s most striking is the negative fraud in Section 5 in Jun-10. It stays low for a couple of months, and then, as if to compensate, shoots up to 82% in Sep-10.
We learnt that this coincided with the appointment and transfer of a new section manager – under whose “regime”, fraud seems to have been dramatically controlled. It appears that a good organisation level to control fraud is at the 5,000 people strong section manager level, rather than the 100,000 people strong staff level.
Medical institutions rely on vital heuristics. Every diagnosis has parameters attached to it: an initial assessment identifies the basic ailment before proceeding further. We asked how this could be sped up, since a person with a severe heart attack cannot wait for all the scans to be done before the next stage of treatment.
Instead, a small set of parameters can be checked quickly to decide on admission. The decision of whether to admit to a critical care unit is simplified and sped up, and the visual cues help the nurse take a quick, statistically grounded decision rather than deciding by wit.
Measure the blood pressure with a stethoscope and sphygmomanometer. If the systolic pressure is below 91, the patient must be admitted immediately to the intensive care unit. Otherwise, check the age: if the patient is younger than 62, the chances of stabilising without intensive care are good. If older than 62, check the pulse: if it is higher than 100, the patient must be taken to the emergency ward.
Thus, a step-by-step pre-defined process identifying the causes and the remedies helps save lives. This simple visual cue, delivered through a dashboard, not only saves lives daily but also helps the medical workforce record and reproduce the patient history.
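The heuristic above can be written as a three-question decision tree. A minimal sketch of one coherent reading of it (the thresholds 91, 62 and 100 follow the notes and are illustrative, not medical guidance):

```python
def triage(systolic_bp: float, age: float, pulse: float) -> str:
    """Fast-and-frugal triage sketch. Thresholds (91, 62, 100) are
    illustrative values from the notes, not medical guidance."""
    if systolic_bp < 91:   # dangerously low pressure: admit immediately
        return "intensive care"
    if age < 62:           # younger patients are likely to stabilise
        return "observe"
    if pulse > 100:        # older patient with a racing pulse
        return "emergency ward"
    return "observe"

print(triage(85, 70, 80))    # intensive care
print(triage(120, 45, 80))   # observe
print(triage(120, 70, 110))  # emergency ward
```

The point of such a tree is that each question needs only one cheap cue, so a nurse can traverse it in seconds without any software at all.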
The decision tree model used here helps break complicated situations down into easier-to-understand scenarios.
A decision tree is a visual representation of choices, consequences, probabilities and opportunities: a visual representation of the average outcome.
Applying the same fundamentals to churn prediction, we calculated the cost per customer and the resulting improvements.
The first check is when the last outgoing call was made from the phone, bucketed into 0-4 days, 5-14 days and more than 15 days. If no call has been made for more than 15 days and no recharge voucher has been applied, the customer is likely to leave the network. If a call has been made within 15 days and a single recharge has been done, the next check is the recharge amount: if it is greater than 50, we establish that the consumer is engaged, spending will not be large, and the evaluation loops back to the beginning.
Earlier, the telecom operator for whom this was designed was spending more. The decision tree helped them save 62% of their costs, with interventions in only 3.2% of cases overall.
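The churn tree in these notes can be sketched as code. This is a loose reading of the description above: the bucket boundaries (4 and 15 days, Rs 50) follow the notes, while the returned action labels are illustrative assumptions:

```python
def churn_risk(days_since_last_call: int, recharge_amount: float) -> str:
    """Sketch of the churn decision tree described in the notes.
    Thresholds follow the notes; action labels are illustrative."""
    if days_since_last_call > 15:
        return "high risk: likely to leave the network"
    if days_since_last_call >= 5:
        return "medium risk: watch"
    if recharge_amount > 50:
        return "engaged: re-evaluate next cycle"
    return "low spend: engaged but small"

print(churn_risk(20, 0))    # high risk: likely to leave the network
print(churn_risk(3, 120))   # engaged: re-evaluate next cycle
```

Only the customers landing in the high-risk leaf need an intervention, which is how a small fraction of cases (3.2% here) can account for most of the cost savings.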
This use case also deals with prepaid churn for a telecom operator. Instead of a decision tree, here we applied Support Vector Machines to understand the customer costs and improvements over the organization’s earlier model, if any. (This is independent of the use case in the previous slide.)
A basic SVM represents points in a space that can be divided into categories by a visible demarcation. The first visual arranges the points in four quadrants, where the axes and the area near them remain unoccupied.
The adjacent visual shows a soft-margin SVM model depicting customers and the cost incurred.
On final analysis, the cost per customer came out to 34, an improvement of 66.6%, with negligible scores of missed and wasted opportunities.
We were working with the wealth management team of a European bank. They said, “We have a problem. When telling our customers what transactions to make, we base our advice on two very simple principles. First, if you have two securities that behave similarly, you should consolidate. For example, there is no benefit in holding shares of two oil companies. When the price of one rises, the other invariably rises too. So it’s practically like holding the same company’s stock.”
“On the other hand, having consolidated, make sure you have a good hedge. For example, if you hold oil companies, buy a bit of gold. When oil companies drop, gold typically rises. Gold is a reasonably good hedge against oil companies.”
He said, “This is the basis of the bulk of the advice we give clients. But in order to arrive at this advice, our analysts have to go through 150 reports, which is humanly impossible. We know they don’t actually do that. We sometimes pass these reports on to our clients. They clearly never read these. As a result, our transaction volumes are not as high as we would like to be, mainly because people do not understand why they need to make a trade.”
So, what we did was put a variant of this visual together. On the right, you have a series of currencies like the Australian dollar, the Euro, the British pound, etc; some commodities like silver and gold; and some stock indices like Sensex, FTSE, and S&P.
The cells here have a number inside that indicates the pairwise correlation between a pair of securities. For example, the number 68 on the top left indicates a 68% correlation between the Australian dollar and the Euro. To the left of the Euro and just below the dollar (diagonally opposite to the 68), there’s a scatter plot that shows the daily prices of both these currencies. Each dot is one day’s data. The x-axis shows the Australian dollar value. The y-axis shows the Euro value. This helps identify what the pattern of movements of any two currencies is. From this, you can easily see visually that the Australian dollar and the Euro both tend to move together. Or, where there are strong correlations like the FTSE & S&P, the pattern is almost a straight line.
In some cases there are negative correlations. For instance, if you take the Sensex against the Japanese Yen, the correlation is -79%. The cells are coloured based on their correlation values. Greens indicate strong positive correlation. Reds indicate strong negative correlation.
These are also grouped hierarchically. On the left, we have a series of lines indicating clusters. The most similar securities are grouped together. So FTSE and S&P with a 98% correlation are very close. The ones that are less correlated are kept further away based on a tree-structure.
This leads to clustering of securities. For example, there is a green block in the center which has SGD, JPY, XAU, CHF and CNY. All of these are fairly well correlated. When any one currency in this block goes up, all the others go up as well. When any one goes down, all others go down as well.
Similarly, you have another block to its top left: S&P, FTSE, Sensex and to a certain extent, the Pakistani Rupee. These move together as a block as well.
But when this block goes up, all the currencies in the other block go down, as indicated by the red negative correlations between these two blocks.
This can be used very easily for decision making. For example, one client who was trading with Singapore and Japan looked at the strong correlation and decided to consolidate their holdings in Japanese Yen. They then moved up and down this column to find a good hedge. FTSE looked like a good hedge – it was the most negatively correlated with JPY at that time – and they decided to place a third of their portfolio in FTSE.
A sheet like this improves people’s understanding of relatively complex data, and results in significantly increased trade volumes.
We were working with a restaurant who had 7 months’ worth of sales data, and asked what we could do with this data. It was a fairly open-ended problem.
Among other things, we looked at the various product categories they sold, such as starters, breads, desserts, etc. and the pairwise correlations between each of these.
The number in each cell shows the pairwise correlation between any two products. The 17 on the top left, for example, indicates a 17% correlation between side dishes and meals. The scatter plots diagonally opposite show the correlations between these visually as well. These are colour coded based on the correlation. The redder it is, the more negative the correlation. The greener it is, the more positive the correlation.
There are a few patterns that emerge. For example: desserts are positively correlated with every product. The row and column are green right through, indicating that it doesn’t matter what people eat – they usually have desserts at the end.
Starters are an interesting category. They were introduced 4 years ago as a loss-leader, with the aim of increasing the restaurant’s menu variety and to bring in footfall. As a result, they were priced at cost. You can see from this that starters sell well with breads (rotis, naans, etc). They sell well with desserts, but then, everything sells well with desserts. But they reduce the sales of every other product!
What’s been happening is that since starters were so attractive, people were coming in, ordering starters and desserts, and leaving. As a result, this initiative had been a net loss for the profit margin, though it had not been spotted for nearly four years.
When you look at the correlations at an individual item level, it turns out that there’s one product that is negatively correlated with almost every other product: the 1 litre mineral water bottle.
This is a curious phenomenon, and our client explained this once they realised what was happening. Theirs is a low-end chain of restaurants and it’s mostly individuals (not families) that visit this restaurant. Their customers are rather price-conscious. When they buy 1 litre of water, they want to make sure that they do not waste it. And when an entire litre is consumed, there’s not much space in the stomach for other things.
An obvious solution was to replace the 1 litre packaging with a smaller 200ml bottle. This ends up turning the entire row and column of reds into neutral yellows, resulting in an overall increase in sale of all products.
For the same chain, we also looked at the daily sales across restaurants. Here are a series of calendar maps showing the daily sales for four different points of sale terminals at one restaurant. Each calendar map shows a calendar for 7 months. Each day is coloured based on the value of sales on that day. Red indicates low sales, green indicates high sales.
For the two terminals at the front (i.e. the ones you see on top), sales was relatively low during the first two months, but picked up steadily thereafter. It’s easy to spot the exceptions among this. For example, the 30th and 31st of January were good days for both terminals.
Interestingly, when you look at the terminal at the bottom left, there is a red bar indicating consistent dip in sales every Wednesday. Almost as if to compensate, the terminal at the bottom right has an increase in sales every Wednesday – but not as significant as the dip.
We did not have an explanation for this, though our client did a few weeks later. It turned out that the person manning the bottom left counter takes half-day off every Wednesday, and was not being replaced by the manager. The queue naturally shifts over to the other terminal, increasing the sales. But this restaurant is in an area where there are many other food outlets. Once the queue reaches a certain size, people drop off, resulting in a net loss in sales every Wednesday – a loss that had gone unobserved for at least 7 months.