Data monetization

S ANAND, CHIEF DATA SCIENTIST, GRAMENER
MONETISING DATA
REMOVING YOUR MENTAL HURDLES

DATA
ANALYSIS VISUALS EXPLORATION
IS
EVERYWHERE

DATA
IS
EVERYWHERE
COMMON COMPLAINT #1
WE DON’T HAVE DATA

“ We have internal
information. Getting
information from outside is
our challenge. There’s no way
of doing that.
– Senior Editor
Leading Media Company

UNCOVER YOUR DARK DATA
Source: http://www.patrickcheesman.com/dark-data-problems-and-solutions/
• INACCESSIBLE data (e.g. technology is outdated)
• FORGOTTEN data (e.g. collected, but not actively used)
• UNCOLLECTED data (e.g. information exists, not digitized)
• SINGLE PURPOSE data (e.g. used for a specific purpose)

We’ve used network diagrams to detect terrorism and corporate fraud, to find
product affinities, and to segment customers by behaviour

AUGMENT YOUR
DATA
SOURCES
DATA IS
EVERYWHERE
COMMON COMPLAINT #1
WE DON’T HAVE DATA
COMMON COMPLAINT #2
THE DATA ISN’T STRUCTURED
CRM DATA
SALES DATA
PRICING DATA
CALL RECORDS
WEB LOG DATA
VENDOR INVOICES
SOCIAL MEDIA DATA
CLICKTHROUGH DATA
COMPETITOR RESEARCH
CUSTOMER TRANSACTIONS
…
CENSUS DATA
E-COMMERCE PRICES
COMMODITY PRICES
STOCK MARKET DATA
FINANCIAL REPORTING
SOCIAL MEDIA DATA
MOBILE PENETRATION
AADHAR DATA
COURT CASE BRIEFS
SHAPE FILES
…

Visualising the Mahabharata
How does the Mahabharata, one of the longest epics at 1.8 million words, lend itself to text analytics?
Can this ‘unstructured data’ be processed to extract analytical insights?
What does sentiment analysis of this tome convey?
Is there a better way to explore relations between characters?
How can closeness of characters be analysed and visualised?
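One simple way to approach the closeness question is to count how often two characters are named in the same sentence. The sketch below is an illustration under stated assumptions, not the method used for the visual: the sample text, the name list and the function `character_cooccurrence` are all hypothetical, and the sentence split is deliberately naive.

```python
import re
from collections import Counter
from itertools import combinations


def character_cooccurrence(text, characters):
    """Count how often each pair of characters is named in the same sentence."""
    pairs = Counter()
    # Naive sentence split on '.', '!' and '?'; good enough for a sketch.
    for sentence in re.split(r"[.!?]+", text):
        words = set(re.findall(r"[A-Za-z]+", sentence))
        present = sorted(name for name in characters if name in words)
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical sample text, not taken from any real translation.
sample = (
    "Arjuna spoke to Krishna before the battle. "
    "Krishna advised Arjuna again. "
    "Bhima and Arjuna marched together. "
    "Duryodhana waited alone."
)
names = ["Arjuna", "Krishna", "Bhima", "Duryodhana"]

closeness = character_cooccurrence(sample, names)
for (a, b), count in closeness.most_common():
    print(a, b, count)
```

The pair counts can serve as edge weights in a network diagram, with characters as nodes.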

“ Can we help CFOs
understand what questions
are being asked by
investors and analysts
during earnings releases?
How is this different from
the competition?
– Product Head
Global Financial
Services Firm

WHAT DO FINANCIAL ANALYSTS ASK IBM VS MSFT?

DATA IS
EVERYWHERE
EXTRACT THE
META DATA
AUGMENT YOUR
DATA
SOURCES
COMMON COMPLAINT #2
THE DATA ISN’T STRUCTURED
COMMON COMPLAINT #3
THE DATA ISN’T RICH / CLEAN
COMMON: WHO, WHAT, WHEN, WHERE
TEXT: TEXT KEYWORDS, SENTIMENT
IMAGE: VISUAL RECOGNITION
AUDIO / CALLS: TRANSCRIPTS, MOOD ANALYSIS

“ Can we get the results of
every single election in
history, and create a portal
to visualize these results?
– Rajdeep Sardesai
CNN-IBN

The PDF files have a reasonably clear structure

… that translates into text that can be parsed

Not every spelling error is easily identifiable by the first letter

… with several names spelt wrong
These are, in fact, two different constituencies
But these are exactly the same
… and so are these
I’ve no idea if these are 2, or 3, constituencies!

… with the ability for the system to correct errors automatically
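Automatic correction of misspelt names can be sketched with fuzzy string matching against a list of known-good names. The example below uses Python's standard `difflib`. The canonical list, the 0.8 cutoff and the helper `correct_name` are assumptions for illustration, not the system described on the slide.

```python
from difflib import get_close_matches

# Hypothetical canonical constituency names.
CANONICAL = ["BANGALORE NORTH", "BANGALORE SOUTH", "MYSORE"]


def correct_name(raw, canonical=CANONICAL, cutoff=0.8):
    """Return the closest canonical name, or None if nothing is similar enough."""
    matches = get_close_matches(raw.upper(), canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else None


fixed = correct_name("Bangalore Nroth")      # transposed letters
unknown = correct_name("Chennai Central")    # not in the canonical list
print(fixed, unknown)
```

Names that differ by only a word, such as the North and South pair here, score close to each other, so in practice low-margin matches are better flagged for manual review than corrected silently.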

DATA IS
EVERYWHERE
TRANSFORM THE DATA &
ENRICH IT
EXTRACT THE
META DATA
AUGMENT YOUR
DATA
SOURCES
COMMON COMPLAINT #3
THE DATA ISN’T RICH / CLEAN

DATA
IS
EVERYWHERE
COMMON COMPLAINT #1
WE DON’T HAVE THE TOOLS

LET’S LOOK AT 15 YEARS OF US BIRTH DATA
This is a dataset (1975 – 1990) that has
been around for several years, and has
been studied extensively. Yet, a
visualization can reveal patterns that
are neither obvious nor well known.
For example,
• Are birthdays uniformly distributed?
• Do doctors or parents exercise the C-section option to move dates?
• Is there any day of the month that has unusually high or low births?
• Are there any months with relatively high or low births?
Legend: more births vs. fewer births, on average, for each day of the year (from 1975 to 1990)
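The figure behind a day-of-year view like this is an average per calendar day across all years. A minimal sketch follows, assuming the data arrives as `(date, births)` pairs. The sample values are made up and `average_births_by_day` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date


def average_births_by_day(records):
    """records: iterable of (date, births). Returns {(month, day): mean births}."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for day, births in records:
        key = (day.month, day.day)
        totals[key] += births
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Made-up numbers for illustration only.
records = [
    (date(1975, 7, 4), 8000),
    (date(1976, 7, 4), 8400),
    (date(1975, 7, 5), 10000),
    (date(1976, 7, 5), 10400),
]

averages = average_births_by_day(records)
overall = sum(averages.values()) / len(averages)
# Values below 1 mean fewer births than a typical day, above 1 mean more.
relative = {key: value / overall for key, value in averages.items()}
print(relative)
```

Colouring each calendar cell by its `relative` value gives the more-births / fewer-births scale.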

THE PATTERN IN INDIA IS QUITE DIFFERENT
This is a birth date dataset that’s
obtained from school admission data
for over 10 million children. When we
compare this with births in the US, we
see none of the same patterns.
For example,
• Is there an aversion to the 13th or is there a local cultural nuance?
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?
Legend: more births vs. fewer births, on average, for each day of the year (from 2007 to 2013)

THIS ADVERSELY IMPACTS CHILDREN’S MARKS
It’s a well-established fact that older
children tend to do better at school in
most activities. Since many children
have had their birth dates brought
forward, these younger children suffer.
Children “born” on the 1st, 5th, 10th, 15th etc. of the month tend to score
lower marks on average.
Legend: higher marks vs. lower marks, on average, for children born on a given day of the year (from 2007 to 2013)

DEPLOY
MODERN
TOOLS
ANALYSIS IS
EVERYWHERE
COMMON COMPLAINT #1
WE DON’T HAVE THE TOOLS
COMMON COMPLAINT #2
WE DON’T GET INSIGHTS
R
SAS
EXCEL
PYTHON
DATABASES
ML SERVICES

RESTAURANT FOUND AN UNUSUAL DIP IN SALES
A restaurant chain had data for every
single transaction made over a few
years. Plotting this as a time series
showed them nothing unusual.
However, the same data on a calendar
map reveals a very different story.
Specifically, at the bottom-left point-of-sale terminal, sales dip every
Wednesday. At the bottom-right point-of-sale terminal, sales rise
every Wednesday (almost as if to compensate for the loss).
It turns out that the manager closes the bottom-left counter every
Wednesday afternoon due to a shortage of staff, assuming that it results in
no loss of sales. There is, however, a net loss every Wednesday.
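A weekday pattern like this becomes visible once transactions are rolled up to daily totals per terminal and then averaged by day of the week. The sketch below assumes `(date, terminal, amount)` tuples. The terminal labels, amounts and the function name are hypothetical.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def mean_daily_sales_by_weekday(transactions):
    """Return {(terminal, weekday): mean of daily sales totals}."""
    # Step 1: roll individual transactions up to one total per terminal per day.
    daily = defaultdict(float)
    for day, terminal, amount in transactions:
        daily[(terminal, day)] += amount
    # Step 2: average those daily totals by weekday.
    by_weekday = defaultdict(list)
    for (terminal, day), total in daily.items():
        by_weekday[(terminal, WEEKDAYS[day.weekday()])].append(total)
    return {key: sum(vals) / len(vals) for key, vals in by_weekday.items()}


# Made-up transactions; 2 Jan 2024 is a Tuesday, 3 and 10 Jan are Wednesdays.
transactions = [
    (date(2024, 1, 2), "bottom-left", 100.0),
    (date(2024, 1, 2), "bottom-left", 50.0),
    (date(2024, 1, 3), "bottom-left", 60.0),
    (date(2024, 1, 10), "bottom-left", 40.0),
    (date(2024, 1, 2), "bottom-right", 120.0),
    (date(2024, 1, 3), "bottom-right", 170.0),
]

result = mean_daily_sales_by_weekday(transactions)
for key in sorted(result):
    print(key, result[key])
```

Days on which a terminal records no transactions at all do not appear in this roll-up, so a fully closed counter would need explicit zero-filling before averaging.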

DEPLOY
MODERN
TOOLS
ANALYSIS IS
EVERYWHERE
TEST DATASETS
ANONYMISATION
EVALUATION CRITERIA
IMPROVEMENT METRIC
DATA INFRASTRUCTURE
MODEL INFRASTRUCTURE
VISUALS INFRASTRUCTURE
SET UP AN ML PLATFORM
INFRASTRUCTURE FOR
RAPIDITY
COMMON COMPLAINT #2
MODELS ARE COMPLICATED
COMMON COMPLAINT #3
IMPLEMENTATIONS ARE SLOW

Nation-wide statistics on
behaviour and performance of students
Over 1,000 questions each administered to
several lakhs of students across the country

Having books improves reading ability
Having more books at home improves the performance of children when it
comes to reading. (But children typically have only 1-10 books at home.)
… but the impact on social is smaller
While having more books improves the reading % score by 8%, it
increases the social % score by only 4%.

Tuitions help very little
… but children of illiterate parents do worse

Watching TV occasionally is good
Children who watch TV every day don’t do as well as children who watch TV only once a week.
But children who never watch TV fare the worst.
Watching TV every day helps improve children’s reading ability a little bit more…
… but mathematical abilities fall dramatically at that point

Having educated parents helps most
This table shows the % improvement in score due to each factor
THIS TECHNIQUE CAN BE
APPLIED TO ANY DATASET

AUTOMATING ANALYSIS IN POULTRY FARMING
We group by every input factor
… and calculate the impact on every metric.
By moving from average to the best group, what’s the improvement?
The actual performance by each group is shown
0-3m   3-6m   6m-1yr   1-2 yrs   > 2 yrs
11     12.3   12.7     15.3      16.1
Our product can create visualisations from data automatically, without any supervision.
Above is an example. Irrespective of the dataset, this visual shows which input parameters
have a significant impact on the output. Another such example is the cluster scatterplot.
Only significant results shown
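The group-means idea described on this slide can be sketched in a few lines: group by each input factor, take the mean of the metric per group, and compare the best group with the overall average. The rows, the column names (`experience`, `feed`, `weight`) and the function `improvement_by_factor` are hypothetical, and the significance filter mentioned on the slide is left out.

```python
from collections import defaultdict


def improvement_by_factor(rows, factors, metric):
    """For each factor, compare the best group's mean metric with the overall mean."""
    overall = sum(row[metric] for row in rows) / len(rows)
    result = {}
    for factor in factors:
        groups = defaultdict(list)
        for row in rows:
            groups[row[factor]].append(row[metric])
        means = {group: sum(vals) / len(vals) for group, vals in groups.items()}
        best = max(means, key=means.get)
        result[factor] = {
            "best_group": best,
            "best_mean": means[best],
            # Relative gain from moving the average up to the best group.
            "improvement": means[best] / overall - 1,
        }
    return result


# Made-up rows for illustration only.
rows = [
    {"experience": "0-3m", "feed": "X", "weight": 10.0},
    {"experience": "0-3m", "feed": "Y", "weight": 12.0},
    {"experience": "> 2 yrs", "feed": "X", "weight": 16.0},
    {"experience": "> 2 yrs", "feed": "Y", "weight": 18.0},
]

result = improvement_by_factor(rows, ["experience", "feed"], "weight")
for factor, info in result.items():
    print(factor, info["best_group"], round(info["improvement"], 3))
```

Because the function takes the factor and metric names as arguments, the same code runs unchanged on any tabular dataset.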

68% correlation between AUD & EUR
Plot of 6 months of daily AUD-EUR values
Block of correlated currencies
… clustered hierarchically
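A clustered correlation view of this kind can be approximated by computing pairwise Pearson correlations and then repeatedly merging the two most correlated clusters, so that related series end up next to each other. The sketch below uses single linkage as a stand-in technique and made-up series. It is not the product's algorithm and does not reproduce the 68% figure.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (assumes non-zero variance)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cluster_order(series):
    """Order series names so that highly correlated ones sit next to each other."""
    corr = {
        frozenset((a, b)): pearson(series[a], series[b])
        for a, b in combinations(series, 2)
    }

    def linkage(c1, c2):
        # Single linkage: similarity of the most correlated pair across clusters.
        return max(corr[frozenset((a, b))] for a in c1 for b in c2)

    clusters = [[name] for name in series]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


# Made-up daily values; the currency labels are for illustration only.
series = {
    "AUD": [1.0, 2.0, 3.0, 4.0, 5.0],
    "JPY": [5.0, 3.0, 4.0, 1.0, 2.0],
    "EUR": [1.1, 2.1, 2.9, 4.2, 5.0],
    "CHF": [5.2, 3.1, 4.1, 0.9, 2.2],
}

order = cluster_order(series)
print(order)
```

Drawing the correlation matrix with rows and columns in this order makes blocks of correlated series appear along the diagonal.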

Restaurant: Product Sales Correlation


DEPLOY
MODERN
TOOLS
ANALYSIS IS
EVERYWHERE
CLUSTER PLOTS
CORRELATIONS
CROSS TABULATION
GROUP MEANS
KEYWORD EXTRACTION
NETWORK ANALYSIS
SANKEY DRILLDOWNS
SENTIMENT ANALYSIS
…
INFRASTRUCTURE FOR
RAPIDITY
COMMON COMPLAINT #3
IMPLEMENTATIONS ARE SLOW
BUILD AND USE
TEMPLATES

S ANAND, CHIEF DATA SCIENTIST, GRAMENER
THE CAPABILITIES ARE
IN YOUR REACH TODAY
EXPLORE THE ART OF DATA
