Data Visualization
Introduction – Data Basics- Variables – Sampling
Shruti Nigam-iNurture Education Solutions Pvt Ltd. 1
∗ Introduction to Data Visualization
∗ Data Basics
∗ Variables and their types
∗ Relationship between Variables
∗ Population and Sample
∗ Observational studies and Experiments
∗ Sampling Methods
Objective
∗ History of Data Visualization:
Introduction to Data Visualization
History of DV
∗ Effective presentation and description of data is a
first step in most analyses.
∗ Collecting, analyzing and visualizing data, as well as
making data-based decisions.
∗ Making sense of data using statistical tools in order to
explore relation between variables.
∗ Making informed decisions.
Why study Data Basics
∗ Let’s take an example:
∗ What is represented by these columns individually?
Data
∗ Each row in the table represents a single email or
case.
∗ The columns represent characteristics, called
variables, for each of the emails.
∗ The data in the table represent a data matrix, which is a
common way to organize data.
∗ These observations will be referred to as the email50
data set.
∗ They are a random sample from a larger data set.
Terminologies
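As a minimal sketch of the data matrix described above, the Python snippet below stores each case as one row and each variable as one column. The variable names and values are made up for illustration and are not taken from the email50 data set.

```python
# Hypothetical data matrix: each dict is one case (row) and each key is one
# variable (column). Names and values are illustrative, not real email50 data.
emails = [
    {"is_spam": False, "num_characters": 1200, "line_breaks": 30, "format": "html"},
    {"is_spam": True, "num_characters": 350, "line_breaks": 8, "format": "text"},
    {"is_spam": False, "num_characters": 5400, "line_breaks": 112, "format": "html"},
]


def column(rows, variable):
    """Return the value of one variable for every case (one column of the matrix)."""
    return [row[variable] for row in rows]


n_cases = len(emails)               # number of rows
variables = list(emails[0].keys())  # column names

print(n_cases, variables)
print(column(emails, "num_characters"))
```

Reading down a column gives one variable across all cases; reading across a row gives all variables for one case.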
Types of Variables
Exercise – Identify type of variables
∗ A social scientist may like to answer some of the following
questions from the previous data set:
∗ Is federal spending, on average, higher or lower in counties
with high rates of poverty?
∗ If homeownership is lower than the national average in one
county, will the percent of multi-unit structures in that county
likely be above or below the national average?
∗ Which counties have a higher average income: those that
enact one or more smoking bans or those that do not?
Relationship between Variables
∗ To answer these questions, data must be collected,
such as the county data set.
∗ Examining summary statistics could provide insights
for each of the three questions about counties.
∗ Additionally, graphs can be used to visually
summarize data and are useful for answering such
questions as well.
∗ Scatterplots are one type of graph used to study the
relationship between two numerical variables.
∗ The following figure compares the variables fed_spend
and poverty. Each point on the plot represents a
single county.
The fed_spend and poverty variables are said to be associated because the plot shows a discernible pattern. When two variables show some connection with one another, they are called associated variables. Associated variables can also be called dependent variables and vice-versa.
Because there is a downward trend in the figure (counties with more units in multi-unit structures are associated with lower homeownership), these variables are said to be negatively associated. A positive association is shown in the relationship between the poverty and fed_spend variables, where counties with higher poverty rates tend to receive more federal spending per capita.
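The slides judge association by looking at a scatterplot. One common numeric summary of the same idea, introduced here as an assumption and not something the slides use, is the Pearson correlation coefficient, whose sign gives the direction of a linear association. The sketch below uses made-up county values, not the real county data set.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values for five hypothetical counties (illustration only).
poverty = [8.0, 12.0, 15.0, 20.0, 25.0]          # percent in poverty
fed_spend = [5.0, 6.5, 7.0, 9.0, 11.0]           # federal spending per capita
multi_unit = [5.0, 10.0, 20.0, 35.0, 50.0]       # percent of units in multi-unit structures
homeownership = [80.0, 76.0, 70.0, 60.0, 48.0]   # percent homeownership

r_pos = pearson_r(poverty, fed_spend)          # both rise together
r_neg = pearson_r(multi_unit, homeownership)   # one rises as the other falls

print("positive association:", r_pos > 0)
print("negative association:", r_neg < 0)
```

A coefficient near zero would suggest no linear association, although two variables can still be related in a non-linear way.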
∗ If two variables are not associated, then they are said
to be independent. That is, two variables are
independent if there is no evident relationship
between the two.
Note: Associated or independent, not both
A pair of variables are either related in some way (associated) or not
(independent).
No pair of variables is both associated and independent.
∗ Step 1. Collect Data
∗ Step 2. Organize Data
∗ Step 3. Determine types of Variables in the dataset
Then proceed with statistical tools.
Steps to find an answer to a research
question
∗ How will you answer the following questions:
∗ What is the average mercury content in swordfish in the Atlantic
Ocean?
∗ Over the last 3 years, what is the average time to complete a
degree for Jagran undergraduate students?
∗ Does a new drug reduce the number of deaths in patients with
severe heart disease?
∗ Each research question refers to a target population. In
the first question, the target population is all swordfish
in the Atlantic Ocean, and each fish represents a case.
∗ Often, it is too expensive to collect data for every
case in a population. Instead, a sample is taken.
∗ A sample represents a subset of the cases and is often a
small fraction of the population. For instance, 60
swordfish (or some other number) in the population
might be selected, and this sample data may be used to
provide an estimate of the population average and
answer the research question.
Data Collection Basics and
terminologies
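The idea of estimating a population average from a sample can be sketched in Python as follows. The population here is simulated, and its size, units and distribution are assumptions made only for illustration; the sample size of 60 is the one mentioned above.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated population: 10,000 hypothetical mercury readings (arbitrary units).
population = [rng.gauss(1.0, 0.3) for _ in range(10_000)]

# Draw 60 cases without replacement, as in the swordfish example.
sample = rng.sample(population, 60)

population_mean = mean(population)  # usually unknown in practice
sample_mean = mean(sample)          # the estimate we can actually compute

print(round(population_mean, 3), round(sample_mean, 3))
```

In a real study only the sample mean is available; the population mean is printed here just to show how close the estimate lands in this simulation.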
∗ Over the last 3 years, what is the average time to
complete a degree for Jagran undergraduate students?
∗ Does a new drug reduce the number of deaths in
patients with severe heart disease?
Exercise - For the second and third questions above, identify
the target population and what represents an individual case.
∗ A man on the news got mercury poisoning from eating
swordfish, so the average mercury concentration in
swordfish must be dangerously high.
∗ I met two students who took more than 4 years to
graduate from Jagran, so it must take longer to graduate
at Jagran than at many other colleges.
∗ My friend's dad had a heart attack and died after they gave
him a new heart disease drug, so the drug must not work.
∗ Each conclusion is based on data.
∗ Can you identify the problem with this data?
Consider the following possible responses to
the three research questions:
∗ Notice that the question about time to complete a degree is only
relevant to students who complete their degree; the average cannot be
computed using a student who never finished her degree.
Thus, only Jagran undergraduate students who have
graduated in the last 3 years represent cases in the
population under consideration.
∗ Each such student would represent an individual case.
∗ A person with severe heart disease represents a case. The
population includes all people with severe heart disease.
∗ First, the data only represent one or two cases.
∗ Second, and more importantly, it is unclear whether
these cases are actually representative of the
population.
∗ Data collected in this haphazard fashion are called
anecdotal evidence.
∗ A census includes everyone; in effect, it “samples” the entire
population.
∗ Cons:
∗ Lots of resources are needed, i.e., it is expensive.
∗ Some remote populations/persons are hard to locate.
∗ Unique characteristics of persons can make the data very unreliable.
∗ Populations rarely stay constant.
Census Vs. Sample
∗ How do you pick a sample from the population of Jagran
graduates who have passed in the last 3 years?
∗ (Determine population and sample) All graduates in
the last 3 years represent the population, and
graduates who are selected for review are collectively
called the sample.
∗ In general, we always seek to randomly select a
sample from a population, e.g., by a lottery.
∗ Why do we prefer to select a sample randomly?
Sampling from a population
∗ If someone was permitted to pick and choose exactly
which graduates were included in the sample, it is
entirely possible that the sample could be skewed to
that person's interests, which may be entirely
unintentional. This introduces bias into a sample.
Sampling randomly helps resolve this problem.
The most basic random sample is called a simple
random sample.
“The sample should be representative of the entire
population”
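A simple random sample can be sketched with Python's standard `random.sample`, which draws distinct items so that every case has the same chance of being chosen, much like a lottery. The list of graduate ID numbers and the sample size of 25 are hypothetical.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Pick k distinct cases, each case being equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)


# Hypothetical sampling frame: ID numbers standing in for 500 graduates.
graduates = list(range(1, 501))

chosen = simple_random_sample(graduates, k=25, seed=42)
print(sorted(chosen))
```

Because the selection is left to the random number generator, nobody's preferences, intentional or not, can steer which graduates end up in the sample.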
∗ Even when people are picked at random, e.g. for surveys,
caution must be exercised if the non-response is high. For
instance, if only 30% of the people randomly sampled for a
survey actually respond, then it is unclear whether the results
are representative of the entire population. This non-response
bias can skew results. E.g., if a whole city is sampled for a survey about
water purifiers, poor residents may be less likely to respond.
∗ Another common downfall is a convenience sample, where
individuals who are easily accessible are more likely to be
included in the sample. E.g., a product survey conducted at a single
shopping mall.
∗ Voluntary response occurs when the sample consists of
people who volunteer to respond because they have strong
opinions about the subject. E.g., online polls and surveys.
A few sources of bias
∗ If the sample is not representative of the whole
population, then no matter how large the sample size
is, the results will be incorrect owing to the bias of
the sample.
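The point that a large biased sample can be worse than a small random one can be illustrated with a small simulation. Everything below is assumed for the sake of the sketch: the population of household incomes, the way the bias arises (only the richer half can be reached), and the two sample sizes.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 20,000 household incomes in arbitrary units.
population = [rng.uniform(10, 100) for _ in range(20_000)]
true_mean = mean(population)

# Large but biased sample: 5,000 households drawn only from the richer half.
richer_half = sorted(population)[len(population) // 2:]
biased_sample = rng.sample(richer_half, 5_000)

# Small simple random sample drawn from the whole population.
random_sample = rng.sample(population, 100)

biased_error = abs(mean(biased_sample) - true_mean)
random_error = abs(mean(random_sample) - true_mean)

print("error of large biased sample:", round(biased_error, 1))
print("error of small random sample:", round(random_error, 1))
```

Increasing the size of the biased sample does not help, because every extra case comes from the same unrepresentative part of the population.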
∗ Explanatory and response variables
∗ Do you remember this question from a previous slide?
∗ Is federal spending, on average, higher or lower in counties
with high rates of poverty?
∗ If we suspect poverty might affect spending in a county,
then poverty is the explanatory variable and federal
spending is the response variable in the relationship.
∗ If there are many variables, it may be possible to
consider a number of them as explanatory variables.
Observational studies and Experiments
∗ Observational Study: Data are collected in a way that does
not directly interfere with how the data arise. Researchers merely
observe.
∗ E.g., surveys, or reviews of medical or company records.
∗ Experimental Study: Researchers randomly assign the subjects to
treatments and then collect the data.
∗ E.g., each heart attack patient in the drug trial could be
randomly assigned, perhaps by flipping a coin, into one of two
groups: the first group receives a placebo (fake treatment)
and the second group receives the drug.
There are two primary types of data collection:
Observational Studies and Experiments
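The coin-flip assignment in the drug-trial example can be sketched as below. The patient labels, the group names and the seed are hypothetical; the only idea taken from the text is that each subject's group is decided by a fair coin and not by the subject or the researcher's preference.

```python
import random


def assign_groups(subjects, seed=None):
    """Flip a fair virtual coin per subject: heads -> drug, tails -> placebo."""
    rng = random.Random(seed)
    groups = {"placebo": [], "drug": []}
    for subject in subjects:
        arm = "drug" if rng.random() < 0.5 else "placebo"
        groups[arm].append(subject)
    return groups


# Hypothetical list of 20 trial participants.
patients = [f"patient_{i}" for i in range(1, 21)]

groups = assign_groups(patients, seed=7)
print(len(groups["placebo"]), len(groups["drug"]))
```

Independent coin flips do not guarantee equal group sizes. A design that needs exactly balanced groups could shuffle the list and split it in half, which is a different technique from the one described above.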
Observational Study
∗ Can only establish an association/correlation between the explanatory and response variables.
∗ Can be a retrospective study or a prospective study.
∗ Observes people from the population.
∗ E.g., comparing the energy levels of persons who work out with those of persons who do not.
∗ For the most part, allows only a ‘correlation statement’.
Experimental Study
∗ Seeks out a sample from the population.
∗ Randomly assigns subjects to two groups, namely control and treatment.
∗ The decision is not left to the subjects; it is imposed by the researcher.
∗ Conditions are controlled in the experiment.
∗ For the most part, allows a ‘causation statement’.
∗ The conclusions that can be drawn from the research are
determined by the type of study chosen.
∗ ‘Correlation does not imply causation’
Main Difference between Observational Studies and
Experiments
RANDOM ASSIGNMENT
Random Sampling: Generalizability
No Random Sampling: No Generalizability
1. Air pollution and birth outcomes, study components. Researchers collected data to
examine the relationship between air pollutants and preterm births in Southern
California. During the study air pollution levels were measured by air quality
monitoring stations. Specifically, levels of carbon monoxide were recorded in parts per
million, nitrogen dioxide and ozone in parts per hundred million, and coarse
particulate matter (PM10) in μg/m³.
Length of gestation data were collected on 143,196 births between the years 1989 and
1993, and air pollution exposure during gestation was calculated for each birth. The
analysis suggested that increased ambient PM10 and, to a lesser degree, CO
concentrations may be associated with the occurrence of preterm births.
Identify
(a) the cases,
(b) the variables and their types, and
(c) the main research question
in this study.
Exercise
2. Cheaters, study components. Researchers studying the relationship between
honesty, age and self-control conducted an experiment on 160 children between the
ages of 5 and 15.
Participants reported their age, sex, and whether they were an only child or not. The
researchers asked each child to toss a fair coin in private and to record the outcome
(white or black) on a paper sheet, and said they would only reward children who
report white.
Half the students were explicitly told not to cheat and the others were not given any
explicit instructions. In the no instruction group probability of cheating was found to
be uniform across groups based on child's characteristics. In the group that was
explicitly told to not cheat, girls were less likely to cheat, and while rate of cheating
didn't vary by age for boys, it decreased with age for girls. Identify
(a) the cases,
(b) the variables and their types, and
(c) the main research question
in this study.
3. Smoking habits of UK residents. A survey was conducted to study the smoking habits
of UK residents. Below is a data matrix displaying a portion of the data collected in
this survey.
(a) What does each row of the data matrix represent?
(b) How many participants were included in the survey?
(c) Indicate whether each variable in the study is numerical or
categorical. If numerical, identify as continuous or discrete. If
categorical, indicate if the variable is ordinal.
∗ http://www.openintro.org
∗ https://www.youtube.com/watch?v=7NhNeADL8fA&list=PLkIselvEzpM
∗ OpenIntro Statistics, Third Edition, David M Diez,
Christopher D Barr, Mine Çetinkaya-Rundel.
References and Sources
Shruti Nigam-iNurture Education Solutions Pvt Ltd. 32
Shruti Nigam-iNurture Education Solutions Pvt Ltd. 33

More Related Content

Similar to Data Visualization Guide for Beginners

Q3-M2_3Is_Identifying the Problem and Asking the QuestionsV4.pptx
Q3-M2_3Is_Identifying the Problem and Asking the QuestionsV4.pptxQ3-M2_3Is_Identifying the Problem and Asking the QuestionsV4.pptx
Q3-M2_3Is_Identifying the Problem and Asking the QuestionsV4.pptxArthurLegaspina3
 
Lect 1 Introduction Statistics.pdf
Lect 1 Introduction Statistics.pdfLect 1 Introduction Statistics.pdf
Lect 1 Introduction Statistics.pdfMuhammadAhsan227852
 
Lesson 1 05 measuring central tendency
Lesson 1 05 measuring central tendencyLesson 1 05 measuring central tendency
Lesson 1 05 measuring central tendencyPerla Pelicano Corpez
 
MGT assignment 1.docx
MGT assignment 1.docxMGT assignment 1.docx
MGT assignment 1.docxAliMahesa
 
Sampling and Sample Size
Sampling and Sample SizeSampling and Sample Size
Sampling and Sample SizeDr. Keerti Jain
 
The role of statistics and the data analysis process.ppt
The role of statistics and the data analysis process.pptThe role of statistics and the data analysis process.ppt
The role of statistics and the data analysis process.pptJakeCuenca10
 
Chapter 3 part2- Sampling Design
Chapter 3 part2- Sampling DesignChapter 3 part2- Sampling Design
Chapter 3 part2- Sampling Designnszakir
 
Future Choices December 2015
Future Choices December 2015Future Choices December 2015
Future Choices December 2015Debbie Scott
 

Similar to Data Visualization Guide for Beginners (12)

1.1 statistical and critical thinking
1.1 statistical and critical thinking1.1 statistical and critical thinking
1.1 statistical and critical thinking
 
PSPP software application
PSPP software applicationPSPP software application
PSPP software application
 
week 8-3iS.pptx
week 8-3iS.pptxweek 8-3iS.pptx
week 8-3iS.pptx
 
Q3-M2_3Is_Identifying the Problem and Asking the QuestionsV4.pptx
Q3-M2_3Is_Identifying the Problem and Asking the QuestionsV4.pptxQ3-M2_3Is_Identifying the Problem and Asking the QuestionsV4.pptx
Q3-M2_3Is_Identifying the Problem and Asking the QuestionsV4.pptx
 
Lect 1 Introduction Statistics.pdf
Lect 1 Introduction Statistics.pdfLect 1 Introduction Statistics.pdf
Lect 1 Introduction Statistics.pdf
 
Data visualization intro2
Data visualization intro2Data visualization intro2
Data visualization intro2
 
Lesson 1 05 measuring central tendency
Lesson 1 05 measuring central tendencyLesson 1 05 measuring central tendency
Lesson 1 05 measuring central tendency
 
MGT assignment 1.docx
MGT assignment 1.docxMGT assignment 1.docx
MGT assignment 1.docx
 
Sampling and Sample Size
Sampling and Sample SizeSampling and Sample Size
Sampling and Sample Size
 
The role of statistics and the data analysis process.ppt
The role of statistics and the data analysis process.pptThe role of statistics and the data analysis process.ppt
The role of statistics and the data analysis process.ppt
 
Chapter 3 part2- Sampling Design
Chapter 3 part2- Sampling DesignChapter 3 part2- Sampling Design
Chapter 3 part2- Sampling Design
 
Future Choices December 2015
Future Choices December 2015Future Choices December 2015
Future Choices December 2015
 

More from Shruti Nigam (CWM, AFP)

More from Shruti Nigam (CWM, AFP) (10)

Morph transition 1.pptx
Morph transition 1.pptxMorph transition 1.pptx
Morph transition 1.pptx
 
Morph transition 2.pptx
Morph transition 2.pptxMorph transition 2.pptx
Morph transition 2.pptx
 
Statistics.pdf
Statistics.pdfStatistics.pdf
Statistics.pdf
 
Business forecasting project border
Business forecasting project borderBusiness forecasting project border
Business forecasting project border
 
Data analysis property area analysis via powerbi
Data analysis property area analysis via powerbiData analysis property area analysis via powerbi
Data analysis property area analysis via powerbi
 
Actionable results to enhance Employee satisfaction score analysis via Tableau
Actionable results to enhance Employee satisfaction score analysis via TableauActionable results to enhance Employee satisfaction score analysis via Tableau
Actionable results to enhance Employee satisfaction score analysis via Tableau
 
Finanacial institutions nature and role
Finanacial institutions nature and roleFinanacial institutions nature and role
Finanacial institutions nature and role
 
Mutual funds
Mutual fundsMutual funds
Mutual funds
 
Fs unit-i nbfc
Fs unit-i nbfcFs unit-i nbfc
Fs unit-i nbfc
 
NBFC MBA I SEMESTER
NBFC MBA I SEMESTERNBFC MBA I SEMESTER
NBFC MBA I SEMESTER
 

Recently uploaded

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 

Recently uploaded (20)

E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 

Data Visualization Guide for Beginners

  • 1. Data Visualization Introduction – Data Basics- Variables – Sampling Shruti Nigam-iNurture Education Solutions Pvt Ltd. 1
  • 2. ∗ Introduction to Data Visualization ∗ Data Basics ∗ Variables and their types ∗ Relationship between Variables ∗ Population and Sample ∗ Observation studies and Experiments ∗ Sampling Methods Objective Shruti Nigam-iNurture Education Solutions Pvt Ltd. 2
  • 3. ∗ History of Data Visualization: Introduction to Data Visualization History of DV Shruti Nigam-iNurture Education Solutions Pvt Ltd. 3
  • 4. ∗ Effective presentation and description of data is a first step in most analyses. ∗ Collecting, Analyzing and Visualizaing data as well as Data base decisions. ∗ Making sense of data using statistical tools in order to explore relation between variables. ∗ Making informed decisions. Why to study Data Basics Shruti Nigam-iNurture Education Solutions Pvt Ltd. 4
  • 5. ∗ Let’s take an example: ∗ What is represented by these columns individually? Data Shruti Nigam-iNurture Education Solutions Pvt Ltd. 5
  • 6. ∗ Each row in the table represents a single email or case. ∗ The columns represent characteristics, called variables, for each of the emails. ∗ The data in table represent a data matrix, which is a common way to organize data. ∗ These observations will be referred to as the email50 data set. ∗ They are a random sample from a larger data set. Terminologies Shruti Nigam-iNurture Education Solutions Pvt Ltd. 6
  • 7. Types of Variables Shruti Nigam-iNurture Education Solutions Pvt Ltd. 7
  • 8. Exercise – Identify type of variables Shruti Nigam-iNurture Education Solutions Pvt Ltd. 8
  • 9. ∗ A social scientist may like to answer some of the following questions from the previous data set: ∗ Is federal spending, on average, higher or lower in counties with high rates of poverty? ∗ If homeownership is lower than the national average in one county, will the percent of multi-unit structures in that county likely be above or below the national average? ∗ Which counties have a higher average income: those that enact one or more smoking bans or those that do not? Relationship between Variables Shruti Nigam-iNurture Education Solutions Pvt Ltd. 9
  • 10. ∗ To answer these questions, data must be collected, such as the county data set. ∗ Examining summary statistics could provide insights for each of the three questions about counties. ∗ Additionally, graphs can be used to visually summarize data and are useful for answering such questions as well. ∗ Scatterplots are one type of graph used to study the relationship between two numerical variables. ∗ Following figure compares the variables fed spend and poverty. Each point on the plot represents a single county. Shruti Nigam-iNurture Education Solutions Pvt Ltd. 10
  • 11. The fed_spend and poverty variables are said to be associated because the plot shows a discernible pattern. When two variables show some connection with one another, they are called associated variables. Associated variables can also be called dependent variables and vice-versa. Because there is a downward trend in the figure (counties with more units in multi-unit structures are associated with lower homeownership), these variables are said to be negatively associated. A positive association is shown in the relationship between the poverty and fed_spend variables, where counties with higher poverty rates tend to receive more federal spending per capita.
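One rough way to check the direction of an association numerically is the sign of the Pearson correlation coefficient. The sketch below is an assumption-laden illustration: the county values are invented, the `threshold` of 0.3 is an arbitrary cutoff chosen here, and Pearson's r only captures linear trends, whereas the slides treat any discernible pattern as an association.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def describe_association(xs, ys, threshold=0.3):
    """Label the linear trend; the threshold is an arbitrary illustrative cutoff."""
    r = pearson_r(xs, ys)
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "no clear linear association"


# Invented county-level values, for illustration only.
multi_unit = [5, 10, 20, 35, 50]      # percent of units in multi-unit structures
homeownership = [80, 75, 68, 55, 40]  # percent homeownership
poverty = [8, 12, 15, 20, 25]         # percent in poverty
fed_spend = [6, 7, 9, 12, 14]         # federal spending per capita

print(describe_association(multi_unit, homeownership))
print(describe_association(poverty, fed_spend))
```

A scatterplot remains the better first look, since it also reveals curved patterns that a single coefficient can miss.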
  • 12. ∗ If two variables are not associated, then they are said to be independent. That is, two variables are independent if there is no evident relationship between the two. Note (associated or independent, not both): A pair of variables is either related in some way (associated) or not (independent). No pair of variables is both associated and independent.
  • 13. Steps to find an answer to a research question: ∗ Step 1. Collect data ∗ Step 2. Organize data ∗ Step 3. Determine the types of variables in the data set. Then proceed with statistical tools. ∗ How will you answer the following questions: ∗ What is the average mercury content in swordfish in the Atlantic Ocean? ∗ Over the last 3 years, what is the average time to complete a degree for Jagran undergraduate students? ∗ Does a new drug reduce the number of deaths in patients with severe heart disease?
  • 14. Data Collection Basics and terminologies: ∗ Each research question refers to a target population. In the first question, the target population is all swordfish in the Atlantic Ocean, and each fish represents a case. ∗ Often, it is too expensive to collect data for every case in a population. Instead, a sample is taken. ∗ A sample represents a subset of the cases and is often a small fraction of the population. For instance, 60 swordfish (or some other number) in the population might be selected, and this sample data may be used to provide an estimate of the population average and answer the research question.
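The relationship between a population average and a sample estimate can be sketched with a simulation. Everything numeric here is an assumption made for illustration: the population of 10,000 "mercury readings" is generated from a normal distribution with invented parameters, and only the sample size of 60 comes from the slide.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Simulated population of mercury readings; the mean and spread are invented.
population = [rng.gauss(1.0, 0.3) for _ in range(10_000)]

# A sample is a small subset of the cases, drawn here at random.
sample = rng.sample(population, 60)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"population mean: {population_mean:.3f}")
print(f"sample estimate: {sample_mean:.3f}")
```

In practice the population mean is unknown, which is why the sample estimate is needed; the simulation only makes both visible so they can be compared.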
  • 15. ∗ Over the last 3 years, what is the average time to complete a degree for Jagran undergraduate students? ∗ Does a new drug reduce the number of deaths in patients with severe heart disease? Exercise: For the second and third questions above, identify the target population and what represents an individual case.
  • 16. Consider the following possible responses to the three research questions: ∗ A man on the news got mercury poisoning from eating swordfish, so the average mercury concentration in swordfish must be dangerously high. ∗ I met two students who took more than 4 years to graduate from Jagran, so it must take longer to graduate at Jagran than at many other colleges. ∗ My friend's dad had a heart attack and died after they gave him a new heart disease drug, so the drug must not work. ∗ Each conclusion is based on data. ∗ Can you identify the problem with this data?
  • 17. ∗ Notice that the degree-completion question is only relevant to students who complete their degree; the average cannot be computed using a student who never finished her degree. Thus, only Jagran undergraduate students who have graduated in the last 3 years represent cases in the population under consideration. ∗ Each such student would represent an individual case. ∗ A person with severe heart disease represents a case. The population includes all people with severe heart disease. ∗ First, the data only represent one or two cases. ∗ Second, and more importantly, it is unclear whether these cases are actually representative of the population. ∗ Data collected in this haphazard fashion are called anecdotal evidence.
  • 18. Census vs. Sample: ∗ A census includes everyone: it “samples” the entire population. ∗ Cons: ∗ Lots of resources are needed, i.e. it is expensive. ∗ Some remote populations or persons are hard to locate. ∗ People who are hard to reach may have unique characteristics, so data that misses them can be very unreliable. ∗ Populations rarely stay constant.
  • 19. Sampling from a population: ∗ How do you pick a sample of Jagran graduates who have passed in the last 3 years? ∗ (Determine population and sample) All graduates in the last 3 years represent the population, and graduates who are selected for review are collectively called the sample. ∗ In general, we always seek to randomly select a sample from a population, e.g. by lottery. ∗ Why do we prefer to select a sample randomly?
  • 20. ∗ If someone were permitted to pick and choose exactly which graduates were included in the sample, it is entirely possible that the sample could be skewed to that person's interests, which may be entirely unintentional. This introduces bias into a sample. Sampling randomly helps resolve this problem. The most basic random sample is called a simple random sample. “The sample should be representative of the entire population.”
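A simple random sample works like a lottery: every case has the same chance of being drawn. The sketch below checks this empirically under stated assumptions: the 20 graduate IDs are hypothetical, the sample size of 5 and the 10,000 repetitions are arbitrary choices, and `simple_random_sample` is a helper name introduced here.

```python
import random
from collections import Counter


def simple_random_sample(frame, n, rng):
    """Draw n distinct cases from the sampling frame, each equally likely."""
    return rng.sample(frame, n)


rng = random.Random(42)  # fixed seed for reproducibility

# Hypothetical sampling frame of 20 graduate IDs.
graduates = [f"G{i:03d}" for i in range(20)]

# Repeat the lottery many times and count how often each graduate is picked.
trials = 10_000
counts = Counter()
for _ in range(trials):
    counts.update(simple_random_sample(graduates, 5, rng))

# Each graduate should be included in about 5/20 = 25% of the samples.
inclusion_rates = {g: counts[g] / trials for g in graduates}
print(min(inclusion_rates.values()), max(inclusion_rates.values()))
```

Because no person decides who gets in, no one's preferences can tilt the sample, which is the point the slide makes about avoiding bias.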
  • 21. A few sources of bias: ∗ Even when people are picked at random, e.g. for surveys, caution must be exercised if the non-response rate is high. For instance, if only 30% of the people randomly sampled for a survey actually respond, then it is unclear whether the results are representative of the entire population. This non-response bias can skew results. E.g. a water purifier survey samples a whole city, but poor people may not respond. ∗ Another common downfall is a convenience sample, where individuals who are easily accessible are more likely to be included in the sample. E.g. a product survey run in a single shopping mall. ∗ Voluntary response occurs when the sample consists of people who volunteer to respond because they have strong opinions about the subject. E.g. online polls and surveys.
  • 22. ∗ If the sample is not representative of the whole population, then no matter how large the sample size is, the results will be incorrect owing to the bias of the sample.
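The claim that sample size cannot repair a biased sample can be illustrated with a small simulation. All numbers are invented assumptions: a population of 100,000 people with an "opinion score", of whom roughly 20% are mall shoppers who score higher on average. A convenience sample of 10,000 shoppers is compared with a simple random sample of only 200 people.

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility


def mean(values):
    return sum(values) / len(values)


# Invented population: (is_mall_shopper, opinion_score) pairs.
population = []
for _ in range(100_000):
    is_shopper = rng.random() < 0.2
    score = rng.gauss(8.0, 1.0) if is_shopper else rng.gauss(4.0, 1.0)
    population.append((is_shopper, score))

true_mean = mean([score for _, score in population])

# Large convenience sample: only people found at the mall.
shoppers = [score for is_shopper, score in population if is_shopper]
convenience_sample = rng.sample(shoppers, 10_000)

# Small simple random sample from the whole population.
random_sample = [score for _, score in rng.sample(population, 200)]

print(f"true mean:          {true_mean:.2f}")
print(f"convenience (10k):  {mean(convenience_sample):.2f}")
print(f"random sample (200): {mean(random_sample):.2f}")
```

Under these assumptions the small random sample lands near the true mean while the much larger convenience sample stays far from it, because its error comes from who was sampled rather than from how many.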
  • 23. Observation studies and Experiments: ∗ Explanatory and response variables ∗ Do you remember this question from a previous slide? ∗ Is federal spending, on average, higher or lower in counties with high rates of poverty? ∗ If we suspect poverty might affect spending in a county, then poverty is the explanatory variable and federal spending is the response variable in the relationship. ∗ If there are many variables, it may be possible to consider a number of them as explanatory variables.
  • 25. There are two primary types of data collection, observational studies and experiments: ∗ Observational study: Researchers collect data in a way that does not directly interfere with how the data arise. They merely observe. ∗ E.g. surveys, or reviews of medical or company records. ∗ Experimental study: Researchers randomly assign the subjects to treatments and then collect the data. ∗ E.g. each heart attack patient in the drug trial could be randomly assigned, perhaps by flipping a coin, into one of two groups: the first group receives a placebo (fake treatment) and the second group receives the drug.
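The coin-flip assignment described for the drug trial can be sketched as follows. The patient IDs are hypothetical and `assign_groups` is a helper name introduced here; a real trial would follow a documented randomization protocol.

```python
import random


def assign_groups(subjects, rng):
    """Flip a fair coin for each subject: heads -> drug, tails -> placebo."""
    groups = {"placebo": [], "drug": []}
    for subject in subjects:
        key = "drug" if rng.random() < 0.5 else "placebo"
        groups[key].append(subject)
    return groups


rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical patient IDs.
patients = [f"P{i:03d}" for i in range(100)]
groups = assign_groups(patients, rng)

print(len(groups["placebo"]), len(groups["drug"]))
```

Independent coin flips rarely give two groups of exactly equal size. Shuffling the list and splitting it in half is a common alternative when balanced groups matter.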
  • 26. Observational Study: ∗ Only establishes an association/correlation between the explanatory and response variables. ∗ Retrospective study ∗ Prospective study ∗ Observe people from the population ∗ E.g. energy levels of persons who work out vs. those who don't work out. ∗ For the most part, allows only a ‘Correlation Statement’. Experimental Study: ∗ Seek out a sample from the population ∗ Randomly assign to two groups, namely control and treatment. ∗ The decision is not left to the population; instead it is imposed by the researcher. ∗ Conditions are controlled in the experiment. ∗ For the most part, allows a ‘Causation Statement’.
  • 27. Main Difference between Observational Studies and Experiments: RANDOM ASSIGNMENT ∗ The conclusion of the research will be determined by the type of study that is chosen. ∗ ‘Correlation does not imply Causation’
  • 28. Random Sampling: Generalizability. No Random Sampling: No Generalizability.
  • 29. Exercise: 1. Air pollution and birth outcomes, study components. Researchers collected data to examine the relationship between air pollutants and preterm births in Southern California. During the study air pollution levels were measured by air quality monitoring stations. Specifically, levels of carbon monoxide were recorded in parts per million, nitrogen dioxide and ozone in parts per hundred million, and coarse particulate matter (PM10) in μg/m³. Length of gestation data were collected on 143,196 births between the years 1989 and 1993, and air pollution exposure during gestation was calculated for each birth. The analysis suggested that increased ambient PM10 and, to a lesser degree, CO concentrations may be associated with the occurrence of preterm births. Identify (a) the cases, (b) the variables and their types, and (c) the main research question in this study.
  • 30. 2. Cheaters, study components. Researchers studying the relationship between honesty, age and self-control conducted an experiment on 160 children between the ages of 5 and 15. Participants reported their age, sex, and whether they were an only child or not. The researchers asked each child to toss a fair coin in private and to record the outcome (white or black) on a paper sheet, and said they would only reward children who report white. Half the students were explicitly told not to cheat and the others were not given any explicit instructions. In the no-instruction group, the probability of cheating was found to be uniform across groups based on the child's characteristics. In the group that was explicitly told not to cheat, girls were less likely to cheat, and while the rate of cheating didn't vary by age for boys, it decreased with age for girls. Identify (a) the cases, (b) the variables and their types, and (c) the main research question in this study.
  • 31. 3. Smoking habits of UK residents. A survey was conducted to study the smoking habits of UK residents. Below is a data matrix displaying a portion of the data collected in this survey. (a) What does each row of the data matrix represent? (b) How many participants were included in the survey? (c) Indicate whether each variable in the study is numerical or categorical. If numerical, identify as continuous or discrete. If categorical, indicate if the variable is ordinal.
  • 32. References and Sources: ∗ http://www.openintro.org ∗ https://www.youtube.com/watch?v=7NhNeADL8fA&list=PLkIselvEzpM ∗ OpenIntro Statistics, Third Edition, David M Diez, Christopher D Barr, Mine Çetinkaya-Rundel.

Editor's Notes

  1. *most part: There are more advanced methods that can infer causal statements from observational studies.
  2. Ideal experiment; typical observational studies; unaided observational study; most experiments