“Story”fying your Data: How to go from Data to Insights to Stories
Shravan Kumar
DSS, Sept 14th, 2020
How to make yourself Indispensable in your Career with Data
How a nurse changed the course of a war using data storytelling
Nightingale helped curtail the death rate from a whopping 40% to a mere 2%
Created by Florence Nightingale for Queen Victoria during the Crimean War.
Visualizes deaths due to:
Red: War wounds
Black: Other war-related causes
Blue: Avoidable hospital diseases
INTRODUCTION
Shravan Kumar A
Director, Client Success
“Simplify Data Science for all”
100+ Clients
Insights as Stories
Help start, apply and adopt Data Science
@sh_ra_van
/shravankumara
Introduction to Data Portraits
How to Create a Data Portrait
COVID-19 Impact on Industries – A Perspective
Source: McKinsey – COVID-19 Briefing materials
Companies are working to minimize COVID-19 impact and build resilience
COVID-19 has disrupted every industry. All sectors display an element of fragility and are susceptible to shock. [2]
Industries at the forefront of the crisis are relying on data to inform their response and rebound strategies.
McKinsey [1] suggests three waves of data-driven actions that organizations can take:
1. Ensure data teams, and the whole organization, remain operational.
2. Lead solutions to prepare for the crisis-triggered challenges.
3. Prepare for the next normal and get ready to execute the plans.
[1] Source: BCG Covid-19 report, Apr 2, 2020
[2] Source: McKinsey - How CDOs can navigate COVID-19 response, Apr 2020
The effects of the outbreak aren’t going away quickly. This realization has settled in.
DATA SCIENCE: WHAT’S THE VALUE?
IT’S A RECESSION. WHY DATA NOW?
REALITY CHECK: HOW TO THRIVE?
FEELING LUCKY? HERE’S A DATA SCIENCE TITLE GENERATOR!
Examples: Senior Data Scientist, Principal AI Storyteller, Chief Data Wizard
Vanity keywords: Chief, Principal, Senior, Junior, Associate, Deputy, Assistant
Areas: Data, Statistical, ML, AI
Activities: Scientist, Engineer, Analyst, Designer, Developer, Storyteller, Ninja, Chef, Wrangler, Evangelist, Rock Star, Wizard, Alchemist
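The title generator can be sketched in a few lines of Python. This is only an illustration: it assumes each title is one vanity keyword, one area and one activity, in that order (as in "Senior Data Scientist"), and the function name `generate_title` is made up for this sketch.

```python
import random

# Word lists copied from the slide. Which word belongs to which column is
# inferred from the example titles, so treat the grouping as an assumption.
VANITY_KEYWORDS = ["Chief", "Principal", "Senior", "Junior",
                   "Associate", "Deputy", "Assistant"]
AREAS = ["Data", "Statistical", "ML", "AI"]
ACTIVITIES = ["Scientist", "Engineer", "Analyst", "Designer", "Developer",
              "Storyteller", "Ninja", "Chef", "Wrangler", "Evangelist",
              "Rock Star", "Wizard", "Alchemist"]


def generate_title(rng=random):
    """Pick one word from each column and join them into a job title."""
    parts = [rng.choice(VANITY_KEYWORDS),
             rng.choice(AREAS),
             rng.choice(ACTIVITIES)]
    return " ".join(parts)


# Prints a random title; the result differs on every run.
print(generate_title())
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable.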
BUZZWORDS AND BUSTED BUDGETS
THE JOURNEY FROM DATA TO DECISIONS
Maturity phases:
• Data Engineering: Data Collection, Data Storage, Data Transformation
• Data Science: Reporting, Insights, Consumption
• Data as ‘Culture’: Decisions
Source: Article – When and how to build out your data science team
REPORTING: DESCRIPTIVE SUMMARIES
2019      Boston          Chicago         Detroit         New York
Month     Price   Sales   Price   Sales   Price   Sales   Price   Sales
Jan       10.0    8.04    10.0    9.14    10.0    7.46    8.0     6.58
Feb       8.0     6.95    8.0     8.14    8.0     6.77    8.0     5.76
Mar       13.0    7.58    13.0    8.74    13.0    12.74   8.0     7.71
Apr       9.0     8.81    9.0     8.77    9.0     7.11    8.0     8.84
May       11.0    8.33    11.0    9.26    11.0    7.81    8.0     8.47
Jun       14.0    9.96    14.0    8.10    14.0    8.84    8.0     7.04
Jul       6.0     7.24    6.0     6.13    6.0     6.08    8.0     5.25
Aug       4.0     4.26    4.0     3.10    4.0     5.39    19.0    12.50
Sep       12.0    10.84   12.0    9.13    12.0    8.15    8.0     5.56
Oct       7.0     4.82    7.0     7.26    7.0     6.42    8.0     7.91
Nov       5.0     5.68    5.0     4.74    5.0     5.73    8.0     6.89
Average   9.0     7.50    9.0     7.50    9.0     7.50    9.0     7.50
Variance  10.0    3.75    10.0    3.75    10.0    3.75    10.0    3.75
Revenue numbers from four Cities
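The Average and Variance rows can be recomputed from the monthly figures. The sketch below uses only Python's standard library. It assumes the slide reports population variance (`pvariance`), since that variant matches the rounded figures shown; the helper name `summarize` is made up for this example.

```python
from statistics import mean, pvariance

# Monthly (price, sales) figures for Jan to Nov, copied from the table.
SHARED_PRICES = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
CITIES = {
    "Boston": (SHARED_PRICES,
               [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Chicago": (SHARED_PRICES,
                [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Detroit": (SHARED_PRICES,
                [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "New York": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                 [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(prices, sales):
    """Return the descriptive summary shown in the table's last two rows."""
    return {
        "avg_price": mean(prices),
        "var_price": pvariance(prices),  # population variance (assumption)
        "avg_sales": mean(sales),
        "var_sales": pvariance(sales),
    }


summaries = {city: summarize(p, s) for city, (p, s) in CITIES.items()}
for city, s in summaries.items():
    print(f"{city:9s} price avg={s['avg_price']:.1f} var={s['var_price']:.1f} "
          f"sales avg={s['avg_sales']:.2f} var={s['var_sales']:.2f}")
```

All four cities come out with practically identical summaries even though the monthly numbers differ, which is why a descriptive report on its own says little about what is happening in each city.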
INSIGHT: PREDICTING TELCO CUSTOMER CHURN
[Figure: decision tree. Root split on Tenure (months): 0-12, 12-36, 36+. Further Y/N splits on Data Usage > 1.5 GB and Bill > $65. Leaves: 0 = Low Risk, 1 = High Risk.]
• Simple decision-tree model offered ~30% reduction in churn
• Advanced black-box models offered ~50%, but with low explainability
Source: Gramener
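Part of the appeal of a simple decision tree is that it reads as a handful of if/else rules. The sketch below shows the idea using the three splits named on the slide (tenure, data usage, bill). The way the branches connect and the risk label on each leaf are hypothetical, because the slide text does not preserve the tree's layout; `churn_risk` is an illustrative name, not Gramener's model.

```python
def churn_risk(tenure_months, data_usage_gb, monthly_bill):
    """Return 1 for high churn risk, 0 for low churn risk.

    The thresholds (12 and 36 months, 1.5 GB, $65) come from the slide.
    The branch layout and leaf labels below are ASSUMED for illustration.
    """
    if tenure_months > 36:
        # Assumption: long-tenured customers are low risk.
        return 0
    if tenure_months > 12:
        # Assumption: mid-tenure customers are split on bill size.
        return 1 if monthly_bill > 65 else 0
    # Assumption: new customers are split on data usage.
    return 0 if data_usage_gb > 1.5 else 1


# A 6-month customer with light data usage under these assumed rules.
print(churn_risk(tenure_months=6, data_usage_gb=1.0, monthly_bill=40))
```

Every prediction from rules like these can be traced to one or two readable conditions, which is the explainability the black-box models give up.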
CONSUMPTION: WHEN ARE PEOPLE BORN IN THE US?
Source: https://gramener.com/posters/Birthdays.pdf
Callouts on the chart:
• Very high births..
• ..so, conceptions might happen here
• Love the Valentine’s?
• Too busy holidaying?
• Avoid April Fool’s Day?
• Unlucky 13th?
Legend: More births / Fewer births
CONSUMPTION: WHAT’S THE BIRTH PATTERN IN INDIA?
Source: https://gramener.com/posters/Birthdays.pdf
Callouts on the chart:
• Most births in the first half
• Very low births Aug onwards
• A striking birth pattern seen on the 5th, 10th, 15th, 20th and 25th of each month… this is a typical indication of fraud!
• Why? Birthdates are ‘changed’ to aid early school admissions
Legend: More births / Fewer births
This adversely impacts children’s marks
It’s a well-established fact that older children tend to do
better at school in most activities. Since many children
have had their birth dates brought forward, these younger
children suffer.
Children “born” on the 1st, 5th, 10th, 15th, etc. of the month tend to score lower marks on average.
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?
Legend: Higher marks / Lower marks … on average, for children born on a given day of the year (from 2007 to 2013)
Children “born” on round numbered days score lower marks on average,
due to a higher proportion of younger children
Class Xth English Marks Distribution
[Chart: histogram of marks (X-axis, 0 to 100) against number of students (Y-axis, 0 to 20,000)]
Stories have four types of narratives to explain visualizations
Remember “SEAR”: Summarize, Explain, Annotate, Recommend
[Chart: Class Xth English marks distribution. X-axis: Marks (0 to 100). Y-axis: # students (0 to 20,000).]
Summarize the visual in its title
• Don’t describe the chart. Don’t write the user’s question. Write the answer itself. Like a headline.
• Example: “Teachers add marks to stop some students from failing”
Explain & interpret the visual
• How should the user read it? What do you say when you talk through it?
• Explain what the visual is. Then the axes. Then its contents. Then the inference.
• Example: “This chart shows Class 10 students’ English marks in Tamil Nadu, India, in 2011. The X-axis has the mark a student has scored. The Y-axis has the # of students who scored that mark. This is a bell curve. But the spike at 35 (the mark at which students pass) is unusual. Teachers must be adding marks to some of the students who are likely to fail by a small margin.”
Annotate essential elements
• What should the user focus their eyes on? Point it out, or highlight it with colors.
• Interpret what they’re seeing – in words.
• Example callouts: “Large number of students score exactly 35 marks”, “Few (but not 0) students fail at 31-34 marks”, “No one scores 0-4 marks”
• What’s unusual: Large number of students score 35 marks. Few (but not 0) students score between 30-35.
Recommend an action
• How should I act on this? You need to change the audience. (Otherwise, you made no difference.)
• Example: “Only some students get this benefit. Identify a fair policy that will be applied consistently.”
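A spike like the one at the pass mark can also be flagged in code before anyone looks at the chart. The sketch below is a rough heuristic, not the method used for the slide: it marks any score whose count is at least `ratio` times the average of its two neighbours. The counts are made up for illustration and are not the Tamil Nadu data.

```python
def find_spikes(counts, ratio=2.0):
    """Return the marks whose count is at least `ratio` times the mean
    of the counts at the marks immediately below and above."""
    spikes = []
    for mark in range(1, len(counts) - 1):
        neighbour_mean = (counts[mark - 1] + counts[mark + 1]) / 2
        if neighbour_mean > 0 and counts[mark] >= ratio * neighbour_mean:
            spikes.append(mark)
    return spikes


# Hypothetical data: a flat 1,000 students per mark, except that most
# students who would have scored 31-34 show up at the pass mark of 35.
counts = [1000] * 101
for mark in range(31, 35):
    counts[mark] = 100
counts[35] = 4600

print(find_spikes(counts))  # prints [35]
```

On real data the distribution is bell-shaped rather than flat, so the threshold would need tuning; the point is only that "what's unusual" can be checked systematically.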
An energy utility detected billing fraud
This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011.
An unusually large number of readings are aligned with the slab boundaries.
Below is a simple histogram (or frequency distribution) of usage levels. Each bar represents the
number of customers with a specific usage reading (in units, or kWh).
Tariffs are based on the usage slab. Someone with 101 units is billed in full at a higher tariff than
someone with 100 units. So people have a strong incentive to stay at or within a slab boundary.
An energy utility (with over 50 million subscribers)
had 10 years’ worth of customer billing data
available.
Most fraud detection software failed to load the
data, and sampled data revealed little or no insight.
This can happen in one of two ways.
First, people may be monitoring their usage very
carefully, and turn off their lights and fans the instant
their usage hits the slab boundary.
Or, more realistically, there’s probably some level of corruption
involved, where customers pay a small sum to the meter reading staff
to ensure that it stays exactly at the slab boundary, giving them the
advantage of a lower price.
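The incentive described above is easy to see with a little arithmetic. The sketch below uses a made-up three-slab tariff (the slide gives no rates, so `SLABS` is purely hypothetical) in which the whole consumption is billed at the rate of the slab it falls into. It also includes a helper that measures what share of readings sit exactly on a slab boundary.

```python
# Hypothetical tariff: (upper bound of slab in units, rate per unit).
# These numbers are invented for illustration, not the utility's rates.
SLABS = [(100, 2.0), (200, 3.0), (float("inf"), 5.0)]


def bill(units):
    """Bill the full consumption at the rate of the slab it falls into."""
    for upper, rate in SLABS:
        if units <= upper:
            return units * rate


def share_at_boundaries(readings):
    """Fraction of readings that sit exactly on a finite slab boundary."""
    boundaries = {upper for upper, _ in SLABS if upper != float("inf")}
    on_boundary = sum(1 for reading in readings if reading in boundaries)
    return on_boundary / len(readings)


print(bill(100))  # 200.0
print(bill(101))  # 303.0: one extra unit raises the bill by about half
print(share_at_boundaries([98, 100, 100, 101, 150, 200]))  # 0.5
```

With a jump that large for one extra unit, an unusually high share of readings on the boundaries is exactly the signal the histogram surfaces.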
CONSUMPTION: DECODING MAHABHARATHA’S RELATIONSHIP
Source: https://gramener.com/mahabharatha/
INSIGHT + CONSUMPTION: DATA STORIES FROM THE WORLD BANK
Source: World bank storytelling, by Gramener
DATA & AI CAN SAVE LIVES TOO
The Story of Marikina City, Philippines
• Highly urbanized city situated on the river basin of Marikina
• Faced with huge flood hazard levels. Better & resilient infrastructure planning needed
• How can Urban planners plan for better emergency evac & rescue?
• Can AI be applied to solve this problem? If applied, how can the urban planner understand it?
INSIGHT: IDENTIFYING QUALITY OF LIFE FROM SATELLITE IMAGES
Source: https://qol.gramener.com/
Data stories through Comicgen
Example: COVID-19 data explained by data comics
Comic character in a data callout:
Samuel L. Jackson
Harrison Ford
Morgan Freeman
Tom Hanks
Tom Cruise
Insights and Storytelling approach
Stage 1- Identify Business Problem
Define the problem statement by understanding:
• What is the basic need and desired outcome?
• Who will benefit?
• What is the impact?
• What are the success criteria?
Roles: Sales Rep, Data Consultant, Account Manager, Solution Lead, Analyst Lead
Stage 2- Translate to Data Problem
• Break down the problem statement into multiple use cases
• Connect each use case with a data set
• Understand any limitations on data sources: internal and external
Roles: Data Consultant, Account Manager, Solution Architect, Solution Lead, Analyst Lead
Stage 3- Data Answer
Target each use case with data through:
• EDA and transformation
• Modelling
• Generating insights
Roles: Data Consultant, Data Scientist, Solution Architect, Solution Lead
Stage 4- Translate to Business Answer
• Stitch insights from individual use cases to create a story
• Connect the data story to help in better decision making
• Measure success
Roles: Data Consultant, Account Manager, Solution Lead
In summary, here are the 9 steps to go from data to a data story
1. Who is your audience? They determine the story
2. What is their problem? That defines your analysis
3. Find the right analysis to solve the problem
4. Filter for big, useful, surprising insights
5. Start with the takeaway. Summarize your entire story
6. Add supporting analyses as a tree
7. Pick a format based on how your audience will consume the story
8. Pick a visual design based on the takeaway
9. Annotate to explain & engage. Use four types of narratives
DATA SCIENCE: WHAT’S THE VALUE?
IT’S A RECESSION. WHY DATA NOW?
REALITY CHECK: HOW TO THRIVE?
1. Most Data Science projects solve the wrong Problem..
Tip #1: Master the application of knowledge
AI IS COMING FOR THE DATA SCIENCE JOBS
AI and automation will do away with most of the grunt work in the data science workflow today.
Applied knowledge will keep you relevant for much longer.
Wolbachia blocks dengue, Zika and chikungunya virus transmission
Wolbachia mosquito releases: Adults, Eggs, Community
Model design
• Identify where people live
• Detect buildings
• Estimate human population density on 100m2 grids (e.g. 20,000 ppl / km2, 15,000 ppl / km2)
Site scoping
• Set boundary of potential release area
• Identify the areas where people live
• Map mosquito release points over area with a grid
• Organise release area into stages
2. Data Analytics needs a lot more than Data & Analytics..
Tip #2: Learn non-core skills
DATA SCIENCE SOLUTION: LET’S TAKE THIS EXAMPLE..
Source: World bank storytelling, by Gramener
..AND BREAK IT DOWN INTO THE BUILDING BLOCKS
• Domain: Business workflow, Influencing factors
• Design: User journey, Visuals & aesthetics
• Analytics: Impact analytics, Clustering techniques
• Development: Frontend/backend coding, Data transformation
• Project Management: Piecing it all together, Change management
HERE ARE THE 5 ROLES & SKILLS CRITICAL FOR DATA SCIENCE
• Data Translator (Domain): Domain expertise, Business analysis, Solutioning
• ML Engineer (Development): Software engineering, Front/back-end coding, Data pipelining
• Information Designer (Design): Information design, User centered design, Interface/visual design (parts)
• Data Scientist (Analytics): Stats & ML, Interpret insights, Scripting skills
• Data Science Manager (Project Management): Project management, Business analysis/solutioning, Team handling
Comic characters from Gramener Comicgen library
3. Data cleaning takes up a majority of time on projects..
Tip #3: Sharpen ability to handle data
“In data science, 80% of the time is spent preparing data, and the other 20% on complaining about preparing the data!”
- Kirk Borne
4. Technology goes obsolete faster in Data Science..
Tip #4: Learn new tools quickly
WHAT DOES THE DATA TOOLS LANDSCAPE LOOK LIKE?
The tool does not matter. A person’s skill with the tool does.
Build the ability to learn new tools rapidly
Source: https://mattturck.com/data2019/
EXAMPLE: WHAT ARE YOUR TOOL OPTIONS TO VISUALIZE DATA?
[Chart: visualization tools arranged from plug-n-play to code-based, along axes of flexibility and complexity]
Tools shown: Google Data Studio, Excel, Google Sheets, Tableau, Raw, Vismio, Datawrapper, Timeline JS, Polestar, Vega, Vega-lite, d3, matplotlib, C3, High charts, Nvd3, Gramex, ggplot, bokeh, Plotly
Choose tools based on flexibility, your background and tool availability
Tip #1: Master the application of knowledge
Tip #2: Learn non-core skills
Tip #3: Sharpen ability to handle data
Tip #4: Learn new tools quickly
DATA SCIENCE: WHAT’S THE VALUE?
IT’S A RECESSION. WHY DATA NOW?
REALITY CHECK: HOW TO THRIVE?
WHAT DOES THE RECESSION MEAN FOR JOBS IN DATA SCIENCE?
Source: McKinsey report – Lives and Livelihoods
• Data jobs and specialized professions are relatively less impacted
• Industries with the lowest wages and lowest educational attainment are hit the hardest
HERE’S WHY DATA IS KEY FOR COVID-19 AND THE RECESSION
A. Public Health
• Identifying vulnerability and contact-tracing
• Tracking the COVID-19 patient lifecycle
• Predicting infection rates and spread
B. Enterprises
• Remote workforce & collaboration
• Market demand & Cash flows
• Supply chain & Logistics
C. Community
• Understand behavioral shifts
• Mapping the effectiveness of shutdown
• Address people concerns during Covid-19
Sources: Kinsa Health weather map; Gramener – Supply Chain flow; Gramener – NYC 311 analysis
HOW DO YOU STAY RELEVANT AND GROW IN YOUR CAREER PATH?
• Do your own data projects
• Read/Write on data science
• Maintain a public portfolio
• Compete, learn & re-apply
Source: Article – How to demonstrate your passion for Data
@sh_ra_van
/shravankumara
Please help me improve the session by answering the feedback survey that will be sent to your email.
THANK YOU!
GRACIAS!
MERCI!
