The Art Of Data Analysis

Karthik Shashidhar
Quant Consultant

                © Karthik Shashidhar

Six-step process

  Case Study

Common Pitfalls

Why do you need this workshop?

We are moving to an increasingly data-driven world

Ability to use data for day-to-day decision-making
can prove to be a massive competitive advantage

This workshop equips managers with basic tools for dealing with data

Who needs this workshop?

• Sales Managers: What is the optimal level of sales commissions in order to maximize
• Production Managers: How do we set daily production targets given probabilities of line shut downs?
• HR Managers: What are the factors that determine employee attrition?

This workshop is suitable for personnel in middle to senior management roles across functions

Six-step process

A structured, iterative approach to data-driven decision making

• Frame a clear and concise problem statement
• Break down your problem into smaller problems, and then use those to generate hypotheses
• Gather, clean and prepare data
• Test hypotheses. In the process, generate additional hypotheses
• Consolidate results to solve the main problem
• Make the data tell a story

Case Study

The Rs. 32 Poverty Line

 Based on data from the 66th NSSO Survey, the Planning
Commission fixed the “Poverty Line” at Rs. 32 per person
 per day for people living in urban areas. This has led to
 much controversy and protests. The Prime Minister has
   asked for your inputs. What do you recommend?

How would you frame the problem statement for this one?

• Your client may not have framed the question precisely. You need to do that job and frame a precise problem
• “Solving this problem” should tell you everything you want to know from your analysis
• Be concise, so that you remain focused on answering your question
• Frame your question such that it has an objective answer. Yes/No questions or questions with numerical answers are preferred

Has the poverty line been set too low at Rs. 32 per day?

• This problem statement has an objective answer (yes/no)
• The solution to this will be necessary and sufficient to answer the question our client (the PM) is asking
• The question directly addresses the situation (people complaining that the poverty line has been set too low)
• This problem statement is to the point and doesn’t take on additional responsibilities (such as defining an alternate poverty line)

What problems do we need to solve in order to solve the main problem?

• The set of “level two problems” must be precise and complete, in that:
   • The solutions of all level two problems, taken together, lead to the solution of the main problem
   • The solution of each level two problem directly impacts the main problem
• Once again, it is key to frame problems concisely and with objective answers
• We need not stop at two levels. Some level two problems might require solution of deeper problems. Add them to the list of sub-problems

What do we need to know to answer “Has the poverty line been set too low at Rs. 32 per day?”

• How is “poverty line” defined?
• What are the implications of the poverty line?
• What is the distribution of income in
• Does the distribution of income vary across states? If it varies significantly, does it make sense to have a state-wise poverty line?
• What are the essential goods that most people need?
• For a given income level, what essential goods can a person afford?

Problems generate sub-problems, and some of these will lead to hypotheses.

• Hypothesis1: There is a significant difference in income level across states
• Hypothesis2: Essential goods are those that the poorest people consume. Also, their use flattens out as income goes up

Some problems, however, are direct, and don’t need hypotheses. Some are qualitative while others need data

• Question1: How is “poverty line” defined?
   • Poverty line is the minimum income level that is deemed
   • If a family is “below poverty line” it qualifies for additional state benefits
• Question2: What is the distribution of incomes in each state?
• Question3: Is there some kind of a threshold about the proportion of population that can be below poverty line?

What data do you need here?

• It is important to frame the problem and break it down into components before listing data requirements, else data could bias you
• Define data requirements in a general fashion, to allow you to easily access proxies
• Remember to gather data that both answers your questions and will allow you to test your hypotheses

Once you’ve identified data requirements, identify sources and gather data

• Here we need
   • Distribution of a measure of income for India
   • Distribution of a measure of income for each state
   • Spending patterns for different income levels
   • Data on household sizes in different states
• The National Sample Survey Organization (NSSO) conducts surveys every 5 years about income and expenditure, so we could perhaps use this
• However, income data gathered from surveys are notoriously poor in quality
• The poor have little savings, so their total consumption is a better indicator of income than the reported income data

Data cleaning is an ugly but important step

• It is important to make sure names from data procured from different sources match
   • For example, some government sites say “AndhraPradesh”, while others say “Andhra Pradesh”. A join on these names fails
• If the data set is small, go through it once to check numbers for consistency. For example, if you have data on percentages, make sure it adds up to 100%
• For larger data sets, try writing scripts to do basic cleaning

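The name-matching and percentage checks above could be scripted roughly as follows. This is a minimal sketch, not a complete cleaning pipeline: the function names `normalize_state_name` and `percentages_add_up`, the tolerance, and the sample rows are all assumptions made for illustration.

```python
import re

def normalize_state_name(name: str) -> str:
    """Reduce a state name to a join key: lowercase, ASCII letters only.

    "AndhraPradesh" and "Andhra Pradesh" both map to "andhrapradesh".
    """
    return re.sub(r"[^a-z]", "", name.lower())

def percentages_add_up(values, tolerance=0.5):
    """Return True if the percentages sum to 100 within a rounding tolerance."""
    return abs(sum(values) - 100.0) <= tolerance

# Illustrative rows only; these are not real figures.
source_a = {"AndhraPradesh": 1, "Tamil Nadu": 2}
source_b = {"Andhra Pradesh": 10, "TamilNadu": 20}

# Re-key both sources on the normalized name so that the join succeeds.
keyed_a = {normalize_state_name(k): v for k, v in source_a.items()}
keyed_b = {normalize_state_name(k): v for k, v in source_b.items()}
joined = {k: (keyed_a[k], keyed_b[k]) for k in keyed_a.keys() & keyed_b.keys()}
```

Stripping everything except letters is deliberately aggressive; on real data it is worth checking that two different names do not collapse to the same key.
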
Understand and prepare data before you dive into analysis

• Get a general feel for the numbers before getting into the analysis
• Simple visualization techniques such as scatter plots and density plots help
• Use simple summary statistics (mean, median, SD, quartiles) to get a better feel for the data
• Check out what different functional forms of your data look like

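The summary statistics named above can be computed with Python's standard `statistics` module. The sketch below assumes a single numeric column; the helper name `summarize` and the spending figures are made up for illustration.

```python
import statistics

def summarize(values):
    """Return mean, median, SD and quartiles for one numeric column."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "q3": q3,
    }

# Made-up monthly spending figures, for illustration only.
spending = [600, 750, 800, 900, 950, 1100, 1300, 1600, 2400, 5200]
summary = summarize(spending)
```

A mean well above the median, as in this made-up sample, is a quick hint that the distribution is skewed and that a plot is worth looking at before fitting anything.
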
While testing hypotheses, be on the lookout for anything interesting/unusual

• It is impossible to generate all possible hypotheses before you begin the analysis
• Usually, as you test out some hypotheses, something in the data will stand out which will lead to further hypotheses
• It is ok to generate these hypotheses, which is what makes it an iterative process
• However, one needs to be careful to not stray from the original objective: each new hypothesis should directly tie in to the original question

Consolidate results

• Build up your case in a bottom-up manner
• Sometimes different pieces of analysis can throw up contradictory inferences. Check, and reconcile before you integrate
• Make sure all components of the solution that you required are available
• Don’t include a result in the final analysis unless it makes a definite contribution to the final solution

Use graphics intelligently!

• A picture is worth a thousand words, so use clear and easy-to-read visualizations to communicate your findings
• Use visualizations that make the solution self-evident, rather than something that requires a lot of explanation
• Use your graphics to communicate, not to confuse. If the intent of a graphic is to confuse, it is better to leave out that graphic
• Sometimes all it takes to solve the problem is to visualize the data from a different perspective!

This graphic shows the decile in which Rs. 32 per day (Rs. 960 per month) would fall in each state

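One way such a decile could be computed is sketched below. It assumes a 30-day month (which reproduces the Rs. 960 figure) and uses hypothetical expenditure samples for two placeholder states; none of these numbers are NSSO data, and `decile_of` is an illustrative helper, not the method behind the original graphic.

```python
import bisect
import statistics

POVERTY_LINE_PER_DAY = 32   # Rs., from the case study
DAYS_PER_MONTH = 30         # assumption that reproduces Rs. 960 per month
POVERTY_LINE_PER_MONTH = POVERTY_LINE_PER_DAY * DAYS_PER_MONTH

def decile_of(value, observations):
    """Return the decile (1 to 10) of `observations` in which `value` falls."""
    cut_points = statistics.quantiles(observations, n=10)  # nine cut points
    return bisect.bisect_right(cut_points, value) + 1

# Hypothetical monthly per-capita expenditure samples, NOT NSSO data.
samples_by_state = {
    "State A": [500, 650, 800, 900, 1000, 1200, 1500, 1900, 2600, 4000],
    "State B": [900, 1100, 1300, 1500, 1800, 2100, 2500, 3000, 3800, 6000],
}
deciles = {
    state: decile_of(POVERTY_LINE_PER_MONTH, observations)
    for state, observations in samples_by_state.items()
}
```

In this made-up sample the same Rs. 960 lands in a higher decile in the poorer placeholder state than in the richer one, which is the kind of state-wise contrast the graphic is meant to surface.
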

Common Pitfalls

Data-driven inference is fraught with pitfalls. Drawing the wrong conclusion out of data is easier than drawing the right conclusion.

• Correlation does not imply causality
• Beware of anecdotal evidence
• Beware of outliers
• Don’t overfit models
• Contradictory inferences from same data
• Start with getting a feel for the data
• Don’t simply throw everything into the mix
• Don’t over-complicate graphics
• Models can misbehave
• Graphics can deceive

Outliers can
significantly distort

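As a minimal sketch of how much a single outlier can move a statistic, the made-up numbers below (not the data behind this slide) compare the mean and the median before and after one extreme value is added.

```python
import statistics

# Made-up values for illustration; the appended observation is an extreme outlier.
typical = [48, 50, 51, 52, 49, 50, 53, 47]
with_outlier = typical + [500]

# How far each statistic moves when the outlier is included.
mean_shift = statistics.mean(with_outlier) - statistics.mean(typical)
median_shift = statistics.median(with_outlier) - statistics.median(typical)
```

Here the mean moves a long way while the median does not move at all, which is why it helps to look at both before trusting an average.
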
“Throwing everything into the mix” may not always produce an accurate model

It could lead to
  for example

According to this regression, the tallest person should have an extremely large right foot and a tiny left foot! That makes no sense!

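A result like this typically appears when two predictors carry almost the same information. One simple guard, sketched here under assumptions, is to check pairwise correlations between predictors before fitting. The helper names, the 0.95 threshold, and the measurements are all hypothetical; this is not the regression shown on the slide.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def collinear_pairs(predictors, threshold=0.95):
    """Return predictor pairs whose absolute correlation reaches the threshold."""
    names = sorted(predictors)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson(predictors[a], predictors[b])) >= threshold
    ]

# Made-up measurements (cm and years), for illustration only.
predictors = {
    "left_foot": [24.1, 25.0, 26.2, 27.1, 28.0, 29.2],
    "right_foot": [24.0, 25.1, 26.1, 27.2, 28.1, 29.1],
    "age": [31, 45, 22, 38, 27, 52],
}
flagged = collinear_pairs(predictors)
```

When a pair is flagged, keeping only one of the two (or their average) is usually enough to stop the coefficients from taking large offsetting values.
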
Over-fitting can
   lead to spurious

It helps to keep your models as simple as possible. A simple rule of thumb: a good model is one that can be easily explained in simple English

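The effect can be sketched with a toy comparison: a straight line versus a polynomial that passes exactly through every training point. The data below is synthetic (a line `y = 2x + 1` plus hand-picked noise), and the function names are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_exact_polynomial(xs, ys):
    """Lagrange interpolation: a polynomial through every training point."""
    def predict(x):
        total = 0.0
        for k, (xk, yk) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != k:
                    weight *= (x - xj) / (xk - xj)
            total += yk * weight
        return total
    return predict

def mean_abs_error(model, xs, ys):
    """Average absolute prediction error of `model` on (xs, ys)."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)

# Synthetic data: y = 2x + 1 plus hand-picked noise (illustrative only).
noise = [0.8, -0.9, 0.7, -0.8, 0.9, -0.7, 0.8, -0.9]
train_x = list(range(8))
train_y = [2 * x + 1 + e for x, e in zip(train_x, noise)]
test_x = [0.5, 6.5]                     # held-out points
test_y = [2 * x + 1 for x in test_x]    # noise-free truth

line_model = fit_line(train_x, train_y)
poly_model = fit_exact_polynomial(train_x, train_y)
line_test_error = mean_abs_error(line_model, test_x, test_y)
poly_test_error = mean_abs_error(poly_model, test_x, test_y)
```

The polynomial fits the training points perfectly because it has memorized the noise, and it does far worse than the plain line on the held-out points.
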
Diving into model fitting without first understanding the data can lead to suboptimal results

People are prone to doing regressions without actually looking at the data. Here, a simple linear regression gives a reasonable fit (R^2 = 42%). However, a simple scatter plot would suggest a clear Y = 1/X kind of relationship, which the regression completely misses out on

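The same point can be reproduced on synthetic data. The sketch below generates points from `y = 10 / x` (an assumption; it is not the data behind the 42% figure above) and compares the R^2 of a straight-line fit on `x` with one on the transformed predictor `1 / x`.

```python
def r_squared(xs, ys):
    """R^2 of a simple least-squares line fitted to (xs, ys).

    For a one-predictor linear fit this equals the squared correlation.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Synthetic, noise-free data following y = 10 / x (illustrative only).
xs = [float(x) for x in range(1, 11)]
ys = [10.0 / x for x in xs]

r2_raw = r_squared(xs, ys)                         # line fitted on x
r2_inverse = r_squared([1.0 / x for x in xs], ys)  # line fitted on 1/x
```

The raw fit looks passable on its R^2 alone, while the transformed fit is essentially exact, which is what a scatter plot would have suggested in the first place.
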
Contradictory inferences can be derived from the same data

Choice of axes and scales can have a significant impact on the message your graphic conveys

Correlation does not imply causality

Mistaking correlation for causality can lead to hilarious conclusions

Readers get turned off by overly complicated graphics

insufficient data can lead to false

A model is just that: a model. It is not a substitute for reality

The Art of Data Analysis will be further illustrated
by means of a detailed Case Study relevant to your

    For a half-day workshop on The Art of Data Analysis
  (including a case study), contact Karthik Shashidhar at

                         © Karthik Shashidhar

More Related Content

Viewers also liked

Information needs and user studies
Information needs and user studiesInformation needs and user studies
Information needs and user studies
Chihwei Liu
Scenario Analysis Use Case: 3G/4G Wireless Data
Scenario Analysis Use Case: 3G/4G Wireless DataScenario Analysis Use Case: 3G/4G Wireless Data
Scenario Analysis Use Case: 3G/4G Wireless Data
August Jackson
內容分析法(Content Analysis)
內容分析法(Content Analysis)內容分析法(Content Analysis)
內容分析法(Content Analysis)
Chihwei Liu
Beckett Hsieh
Sentiment Analysis Training Guide [Simplified Chinese]
Sentiment Analysis Training Guide [Simplified Chinese]Sentiment Analysis Training Guide [Simplified Chinese]
Sentiment Analysis Training Guide [Simplified Chinese]
Michael Ding
Critical discourse analysis and an application
Critical discourse analysis and an applicationCritical discourse analysis and an application
Critical discourse analysis and an application
Suaad Zahawi
Analytical Thinking Training
Analytical Thinking TrainingAnalytical Thinking Training
Analytical Thinking Training
M Furqan Aslam
Machine Learning Introduction
Machine Learning IntroductionMachine Learning Introduction
Machine Learning Introduction
Mark Chang
客戶數據分析四大難題一次解決: IBM 數據分析解決方案
客戶數據分析四大難題一次解決:  IBM 數據分析解決方案客戶數據分析四大難題一次解決:  IBM 數據分析解決方案
客戶數據分析四大難題一次解決: IBM 數據分析解決方案
Randy Lin
Wanju Wang
Stockflare 强大的股票筛选工具嘉维证券合作伙伴独享
Stockflare 强大的股票筛选工具嘉维证券合作伙伴独享Stockflare 强大的股票筛选工具嘉维证券合作伙伴独享
Stockflare 强大的股票筛选工具嘉维证券合作伙伴独享
Shane Leonard, CFA
20161017 R語言資料分析實務 (2)
20161017 R語言資料分析實務 (2)20161017 R語言資料分析實務 (2)
20161017 R語言資料分析實務 (2)
Wanju Wang
[系列活動] 手把手教你R語言資料分析實務
[系列活動] 手把手教你R語言資料分析實務[系列活動] 手把手教你R語言資料分析實務
[系列活動] 手把手教你R語言資料分析實務
Chen-Pan Liao
[SDX2016] 網站分析工作的領悟 / 鍾喬后 Isobar 安索帕 資料分析經理
[SDX2016] 網站分析工作的領悟 / 鍾喬后 Isobar 安索帕 資料分析經理[SDX2016] 網站分析工作的領悟 / 鍾喬后 Isobar 安索帕 資料分析經理
[SDX2016] 網站分析工作的領悟 / 鍾喬后 Isobar 安索帕 資料分析經理
綠生活 GreenLife
NT150 Com

Viewers also liked (20)

Information needs and user studies
Information needs and user studiesInformation needs and user studies
Information needs and user studies
Scenario Analysis Use Case: 3G/4G Wireless Data
Scenario Analysis Use Case: 3G/4G Wireless DataScenario Analysis Use Case: 3G/4G Wireless Data
Scenario Analysis Use Case: 3G/4G Wireless Data
內容分析法(Content Analysis)
內容分析法(Content Analysis)內容分析法(Content Analysis)
內容分析法(Content Analysis)
Sentiment Analysis Training Guide [Simplified Chinese]
Sentiment Analysis Training Guide [Simplified Chinese]Sentiment Analysis Training Guide [Simplified Chinese]
Sentiment Analysis Training Guide [Simplified Chinese]
Critical discourse analysis and an application
Critical discourse analysis and an applicationCritical discourse analysis and an application
Critical discourse analysis and an application
Analytical Thinking Training
Analytical Thinking TrainingAnalytical Thinking Training
Analytical Thinking Training
Machine Learning Introduction
Machine Learning IntroductionMachine Learning Introduction
Machine Learning Introduction
客戶數據分析四大難題一次解決: IBM 數據分析解決方案
客戶數據分析四大難題一次解決:  IBM 數據分析解決方案客戶數據分析四大難題一次解決:  IBM 數據分析解決方案
客戶數據分析四大難題一次解決: IBM 數據分析解決方案
Stockflare 强大的股票筛选工具嘉维证券合作伙伴独享
Stockflare 强大的股票筛选工具嘉维证券合作伙伴独享Stockflare 强大的股票筛选工具嘉维证券合作伙伴独享
Stockflare 强大的股票筛选工具嘉维证券合作伙伴独享
20161017 R語言資料分析實務 (2)
20161017 R語言資料分析實務 (2)20161017 R語言資料分析實務 (2)
20161017 R語言資料分析實務 (2)
[系列活動] 手把手教你R語言資料分析實務
[系列活動] 手把手教你R語言資料分析實務[系列活動] 手把手教你R語言資料分析實務
[系列活動] 手把手教你R語言資料分析實務
[SDX2016] 網站分析工作的領悟 / 鍾喬后 Isobar 安索帕 資料分析經理
[SDX2016] 網站分析工作的領悟 / 鍾喬后 Isobar 安索帕 資料分析經理[SDX2016] 網站分析工作的領悟 / 鍾喬后 Isobar 安索帕 資料分析經理
[SDX2016] 網站分析工作的領悟 / 鍾喬后 Isobar 安索帕 資料分析經理

Similar to The art of data analysis

7 steps to master problem solving
7 steps to master problem solving7 steps to master problem solving
7 steps to master problem solving
Yuri Kaminski
Developing a Project Plan
Developing a Project PlanDeveloping a Project Plan
Developing a Project Plan
Problem Solving & Critical Thinking
Problem Solving & Critical ThinkingProblem Solving & Critical Thinking
Problem Solving & Critical Thinking
TKMG, Inc.
Basic tool for improvement.pdf
Basic tool for improvement.pdfBasic tool for improvement.pdf
Basic tool for improvement.pdf
Twelve Heuristics for Solving Tough Problems—Faster and Better
Twelve Heuristics for Solving Tough Problems—Faster and BetterTwelve Heuristics for Solving Tough Problems—Faster and Better
Twelve Heuristics for Solving Tough Problems—Faster and Better
TechWell - Structured Problem Solving & Hypothesis Generation - Structured Problem Solving & Hypothesis - Structured Problem Solving & Hypothesis Generation - Structured Problem Solving & Hypothesis Generation
David Tracy
Is it worth it agile2012 0
Is it worth it agile2012 0Is it worth it agile2012 0
Is it worth it agile2012 0drewz lin
How to Start Thinking Like a Data Scientist
How to Start Thinking Like a Data ScientistHow to Start Thinking Like a Data Scientist
How to Start Thinking Like a Data Scientist
Decision making & problem solving
Decision making & problem solvingDecision making & problem solving
Decision making & problem solvingashish1afmi
Strategic Planning & Deployment Using The X Matrix W225
Strategic Planning & Deployment Using The X Matrix W225Strategic Planning & Deployment Using The X Matrix W225
Strategic Planning & Deployment Using The X Matrix W225
Robert Mitchell
T4 case analysis_workbook_may_2011
T4 case analysis_workbook_may_2011T4 case analysis_workbook_may_2011
T4 case analysis_workbook_may_2011Alex
T4 case analysis_workbook_may_2011
T4 case analysis_workbook_may_2011T4 case analysis_workbook_may_2011
T4 case analysis_workbook_may_2011
Dr. Sudarshan Rao K
ExactTarget & Crown Audience Builder
ExactTarget & Crown Audience BuilderExactTarget & Crown Audience Builder
ExactTarget & Crown Audience Builder
Is It Worth It? Using A Business Value Model To Guide Decisions
Is It Worth It?  Using A Business Value Model To Guide DecisionsIs It Worth It?  Using A Business Value Model To Guide Decisions
Is It Worth It? Using A Business Value Model To Guide Decisions
Kent McDonald
Analytical thinking training
Analytical thinking trainingAnalytical thinking training
Analytical thinking trainingras1215


The art of data analysis

  • 1. The Art Of Data Analysis Karthik Shashidhar Quant Consultant © Karthik Shashidhar
  • 2. Introduction, Six-step process, Case Study, Common Pitfalls
  • 3. Why do you need this workshop? We are moving to an increasingly data-driven world. The ability to use data for day-to-day decision-making can prove to be a massive competitive advantage. This workshop equips managers with basic tools for dealing with data.
  • 4. Who needs this workshop?
      • Sales Managers: What is the optimal level of sales commissions in order to maximize profitability?
      • Production Managers: How do we set daily production targets given probabilities of line shutdowns?
      • HR Managers: What are the factors that determine employee attrition?
      This workshop is suitable for personnel in middle to senior management roles across functions.
  • 5. Introduction, Six-step process, Case Study, Common Pitfalls
  • 6. A structured, iterative approach to data-driven decision making:
      1. Frame a clear and concise problem statement
      2. Break down your problem into smaller problems, and then use those to generate hypotheses
      3. Gather, clean and prepare data
      4. Test hypotheses. In the process, generate additional hypotheses
      5. Consolidate results to solve the main problem
      6. Make the data tell a story
  • 7. Introduction, Six-step process, Case Study, Common Pitfalls
  • 8. The Rs. 32 Poverty Line. Based on data from the 66th NSSO Survey, the Planning Commission fixed the “Poverty Line” at Rs. 32 per person per day for people living in urban areas. This has led to much controversy and protests. The Prime Minister has asked for your inputs. What do you recommend?
  • 9. For your reference, the six steps: frame a clear and concise problem statement; break down your problem into smaller problems, and then use those to generate hypotheses; gather, clean and prepare data; test hypotheses and, in the process, generate additional hypotheses; consolidate results to solve the main problem; make the data tell a story.
  • 10. How would you frame the problem statement for this one?
      • Your client may not have framed the question precisely. You need to do that job and frame a precise problem statement
      • “Solving this problem” should tell you everything you want to know from your analysis
      • Be concise, so that you remain focused on answering your question
      • Frame your question such that it has an objective answer. Yes/No questions or questions with numerical answers are preferred
  • 11. Has the poverty line been set too low at Rs. 32 per day?
      • This problem statement has an objective answer (yes/no)
      • The solution to this will be necessary and sufficient to answer the question our client (the PM) demands
      • The question directly addresses the situation (people complaining that the poverty line has been set too low)
      • This problem statement is to the point and doesn’t take on additional responsibilities (such as defining an alternate poverty line)
  • 12. What problems do we need to solve in order to solve the main problem?
      • The set of “level two problems” must be precise and complete, in that:
          • The combination of the solutions of all level two problems leads to the solution of the main problem
          • The solution of each level two problem directly impacts the main problem
      • Once again, it is key to frame problems concisely and with objective answers
      • We need not stop at two levels. Some level two problems might require solution of deeper problems. Add them to the list of sub-problems
  • 13. What do we need to know to answer “Has the poverty line been set too low at Rs. 32 per day?”
      • How is “poverty line” defined?
      • What are the implications of the poverty line?
      • What is the distribution of income in India?
      • Does the distribution of income vary across states? If it varies significantly, does it make sense to have a state-wise poverty line?
      • What are the essential goods that most people need?
      • For a given income level, what essential goods can a person afford?
  • 14. Problems generate sub-problems, and some of these will lead to hypotheses.
      • Hypothesis 1: There is significant difference in income level across states
      • Hypothesis 2: Essential goods are those that the poorest people consume. Also, their use flattens out as income goes up
  • 15. Some problems, however, are direct, and don’t need hypotheses. Some are qualitative while others need data
      • Question 1: How is “poverty line” defined?
          • Poverty line is the minimum income level that is deemed adequate
          • If a family is “below poverty line” it qualifies for additional state benefits
      • Question 2: What is the distribution of incomes in each state?
      • Question 3: Is there some kind of a threshold about the proportion of population that can be below poverty line?
  • 16. What data do you need here?
      • It is important to frame the problem and break it down into components before listing data requirements, else the data could bias you
      • Define data requirements in a general fashion, to allow you to easily access proxies
      • Remember to gather data that both answers your questions and will allow you to test your hypotheses
  • 17. Once you’ve identified data requirements, identify sources and gather data
      • Here we need
          • Distribution of a measure of income for India
          • Distribution of a measure of income for each state
          • Spending patterns for different income levels
          • Data on household sizes in different states
  • 18. Once you’ve identified data requirements, identify sources and gather data
      • The National Sample Survey Organization (NSSO) conducts surveys every 5 years about income and expenditure, so we could perhaps use this
      • However, income data gathered from surveys are notoriously unreliable
      • The poor have little savings, so their total consumption is a better indicator of income than the income data
  • 19. Data cleaning is an ugly but important step
      • It is important to make sure names in data procured from different sources match
          • For example, some government sites say “AndhraPradesh”, while others say “Andhra Pradesh”. A join on the state name fails unless the spellings are reconciled
      • If the data set is small, go through it once to check numbers for consistency. For example, if you have data on percentages, make sure they add up to 100%
      • For larger data sets, try writing scripts to do basic cleaning
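The two checks named on slide 19 (matching names across sources, and percentages that add up to 100%) could be scripted roughly as below. This is only a sketch: the function names, the tolerance and the numbers in `source_a` and `source_b` are made up for illustration and are not from the workshop.

```python
import re

def normalize_state(name: str) -> str:
    """Collapse spelling variants ('AndhraPradesh', 'Andhra Pradesh') to one join key."""
    # Lowercase, then drop everything that is not a letter (spaces, dots, hyphens).
    return re.sub(r"[^a-z]", "", name.lower())

def shares_sum_to_100(shares, tolerance=0.5):
    """True when percentage shares add up to 100 within a rounding tolerance (assumed 0.5)."""
    return abs(sum(shares) - 100.0) <= tolerance

# Hypothetical records from two sources that spell the state names differently.
source_a = {"AndhraPradesh": 10, "Tamil Nadu": 20}
source_b = {"Andhra Pradesh": 1.5, "TamilNadu": 2.5}

# Re-key both sources on the normalized name, then join on the shared keys.
a = {normalize_state(k): v for k, v in source_a.items()}
b = {normalize_state(k): v for k, v in source_b.items()}
joined = {key: (a[key], b[key]) for key in a.keys() & b.keys()}

print(joined)
print(shares_sum_to_100([40, 35, 25]), shares_sum_to_100([40, 35, 20]))
```

Without the normalization step the raw keys share nothing in common, so the join would come back empty.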
  • 20. Understand and prepare data before you dive into analysis
      • Get a general feel for the numbers before getting into the analysis
      • Simple visualization techniques such as scatter plots and density plots help
      • Use simple summary statistics (mean, median, SD, quartiles) to get a better feel for the data
      • Check out what different functional forms of your data look like
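The summary statistics listed on slide 20 are available in Python's standard `statistics` module. A minimal sketch follows; the `summarize` helper and the sample values are illustrative assumptions, not workshop data.

```python
import statistics

def summarize(values):
    """Mean, median, sample standard deviation and quartiles of a list of numbers."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "q1": q1,
        "q3": q3,
    }

# Made-up sample, just to exercise the helper.
summary = summarize([1, 2, 3, 4, 5, 6, 7])
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A mean that sits far from the median, or quartiles that are lopsided around it, is a cue to look at a plot before fitting anything.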
  • 21. While testing hypotheses, be on the lookout for anything interesting/unusual
      • It is impossible to generate all possible hypotheses before you begin the analysis
      • Usually, as you test out some hypotheses, something in the data will stand out which will lead to further hypotheses
      • It is ok to generate these hypotheses, which is what makes it an iterative process
      • However, one needs to be careful not to stray from the original objective. Each new hypothesis should directly tie in to the original question
  • 22. Consolidate results
      • Build up your case in a bottom-up manner
      • Sometimes different pieces of analysis can throw up contradictory inferences. Check, and reconcile before you integrate
      • Make sure all components of the solution that you required are available
      • Don’t include a result in the final analysis unless it makes a definite contribution to the final solution
  • 23. Use graphics intelligently!
      • A picture is worth a thousand words, so use clear and easy-to-read visualizations to communicate your findings
      • Use visualizations that make the solution self-evident, rather than something that requires a lot of explanation
      • Use your graphics to communicate, not to confuse. If the intent of a graphic is to confuse, it is better to leave out that graphic
      • Sometimes all it takes to solve the problem is to visualize the data from a different perspective!
  • 24. This graphic shows the decile in which Rs. 32 per day (Rs. 960 per month) would fall in each state
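The decile lookup behind slide 24 can be sketched as follows. The conversion of Rs. 32 per day to Rs. 960 per month (30 days) is from the slide; the `decile_of` helper and the expenditure sample are hypothetical stand-ins for one state's survey data.

```python
import statistics
from bisect import bisect_left

def decile_of(value, sample):
    """Return the decile (1 = lowest tenth, 10 = highest tenth) in which `value` falls."""
    cuts = statistics.quantiles(sample, n=10)  # nine cut points between the deciles
    return bisect_left(cuts, value) + 1

monthly_line = 32 * 30  # Rs. 32 per day expressed per month

# Hypothetical monthly per-capita expenditure for one state: Rs. 500, 600, ..., 2400.
state_sample = list(range(500, 2500, 100))

print(monthly_line, decile_of(monthly_line, state_sample))
```

Running the same lookup once per state would give the state-by-state comparison that the slide's graphic is described as showing.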
  • 25. Introduction, Six-step process, Case Study, Common Pitfalls
  • 26. Data-driven inference is fraught with pitfalls. Drawing the wrong conclusion out of data is easier than drawing the right conclusion.
      • Correlation does not imply causality
      • Beware of anecdotal evidence
      • Beware of outliers
      • Don’t overfit models
      • Contradictory inferences from the same data
      • Start with getting a feel for the data
      • Don’t simply throw everything into the mix
      • Don’t over-complicate graphics
      • Models can misbehave
      • Graphics can deceive
  • 27. Outliers can significantly distort inferences
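A small illustration of the point on slide 27, using made-up numbers: a single extreme value drags the mean a long way while the median barely moves.

```python
import statistics

def mean_and_median(values):
    """Return (mean, median) so the two can be compared side by side."""
    return statistics.mean(values), statistics.median(values)

clean = [10, 11, 12, 13, 14]   # illustrative values only
with_outlier = clean + [1000]  # the same data plus one extreme observation

print(mean_and_median(clean))
print(mean_and_median(with_outlier))
```

Comparing the two measures is a cheap first check for outliers before any inference is drawn from an average.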
  • 28. “Throwing everything into the mix” may not always produce an accurate model
  • 29. It could lead to multicollinearity, for example. According to this regression, the tallest person should have an extremely large right foot and a tiny left foot! That makes no sense!
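One simple guard against the left-foot/right-foot problem on slide 29 is to check pairwise correlations between predictors before fitting. The sketch below assumes a 0.95 cut-off and uses invented measurements; neither comes from the slides, and a high pairwise correlation is only one symptom of multicollinearity.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def collinear_pairs(predictors, threshold=0.95):
    """Names of predictor pairs whose absolute correlation reaches the threshold."""
    return [
        (name1, name2)
        for (name1, x1), (name2, x2) in combinations(predictors.items(), 2)
        if abs(pearson(x1, x2)) >= threshold
    ]

# Invented data: foot lengths in cm and ages in years.
predictors = {
    "left_foot": [24.0, 25.1, 26.3, 27.0, 28.2],
    "right_foot": [24.1, 25.0, 26.4, 27.1, 28.1],
    "age": [31, 22, 45, 27, 38],
}

print(collinear_pairs(predictors))
```

When a pair is flagged, keeping only one of the two predictors (or their average) is a common way to stop the regression from assigning them large, offsetting coefficients.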
  • 30. Over-fitting can lead to spurious models. It helps to keep your models as simple as possible. A simple rule of thumb: a good model is one that can be easily explained in simple English
  • 31. Diving into model fitting without first understanding the data can lead to suboptimal results. People are prone to doing regressions without actually looking at the data. Here, a simple linear regression gives a reasonable fit (R^2 = 42%). However, a simple scatter plot would suggest a clear Y = 1/X kind of relationship, which the regression completely misses out on
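The effect described on slide 31 can be reproduced on synthetic data. The sketch below generates points that follow Y = 1/X exactly, then compares the R^2 of a straight-line fit on X with the R^2 after transforming the predictor to 1/X. The data and the `r_squared` helper are illustrative, so the numbers will not match the 42% quoted on the slide.

```python
def r_squared(x, y):
    """R^2 of the least-squares line y = a + b*x (equal to the squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)

# Synthetic data that follows Y = 1/X exactly.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1 / v for v in x]

raw_fit = r_squared(x, y)                           # straight line in X
transformed_fit = r_squared([1 / v for v in x], y)  # straight line in 1/X

print(f"R^2 on X:   {raw_fit:.2f}")
print(f"R^2 on 1/X: {transformed_fit:.2f}")
```

The straight line in X still reports a fit that looks acceptable on its own, which is why the slide recommends a scatter plot before any regression.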
  • 32. Contradictory inferences can be derived from the same data
  • 33. Choice of axes and scales can have a significant impact on the message your graphic conveys
  • 34. Correlation does not imply causality
  • 35. Mistaking correlation for causality can lead to hilarious conclusions
  • 36. Readers get turned off by overly complicated graphics
  • 37. Anecdotal/insufficient data can lead to false conclusions
  • 38. A model is just that: a model. It is not a substitute for reality
  • 39. The Art of Data Analysis will be further illustrated by means of a detailed Case Study relevant to your company/industry. For a half-day workshop on The Art of Data Analysis (including a case study), contact Karthik Shashidhar.