Sreejith Aravindakshan,
Consultant, CIMMYT and Wageningen University, Netherlands
A date with DATA:
Getting to know more about data analysis and models
“Data is the new oil”
• Data is a collection of facts, such as numbers, words, measurements, observations or
even just descriptions of things
• Data is all around us. But what exactly is it?
Data is a value assigned to a thing. Color, Shape, Number,
Condition, Size
QUALITATIVE DATA : is everything that refers to the
quality of something: A description of colours, texture and
feel of an object, a description of experiences, and
interviews are all qualitative data.
QUANTITATIVE DATA : is data that refers to a number.
E.g. the number of golf balls, the size, the price, a score
on a test etc.
• Categorical data is qualitative in nature
• Numerical (quantitative) data, whether discrete or continuous, can be further classified as interval or ratio data
• Interval data has ordered values with equal differences between them but lacks a true zero value, e.g. temperature in °C, pH.
• Ratio data also has ordered values with equal differences but has a true zero value, e.g. height, weight.
Categorical Data : puts the item you are describing
into a category: For example, the condition “used”
would be categorical and also categories such as
“new”, “used”, ”broken” etc.
Discrete Data : is numerical data that has gaps in
it: e.g. the count of golf balls. There can only be
whole numbers of golf balls (there is no such thing
as 0.3 golf balls).
Continuous Data : is numerical data with a
continuous range: e.g. the size of a golf ball can be
any value (e.g. 10.55 mm or 10.61 mm but also
10.536 mm). In continuous data, all values are
possible with no gaps in between.
Primary Data
Secondary Data
Hypothesis
Sampling
Data Collection
Data Entry
Data Cleaning
Theory
Research Design
Data storage
What are the steps?
Sampling
Probability (Random)
Non-probability (purposive)
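A minimal sketch of the two sampling approaches, assuming a hypothetical sampling frame of 100 household IDs. The names `households`, `random_sample` and `purposive_sample`, and the selection rule itself, are illustrative and not taken from the slides.

```python
import random

# Hypothetical sampling frame: household IDs 1..100 (illustrative only).
households = list(range(1, 101))

# Probability (simple random) sampling: every household has the same
# chance of being selected. A fixed seed makes the draw reproducible.
rng = random.Random(42)
random_sample = rng.sample(households, k=10)

# Non-probability (purposive) sampling: the researcher picks units by a
# chosen criterion. The rule used here (IDs divisible by 10) is an
# assumption made purely for illustration.
purposive_sample = [h for h in households if h % 10 == 0]

print(sorted(random_sample))
print(purposive_sample)
```

Only the random draw supports the usual probability statements (confidence intervals, margins of error) about the whole population.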
• From researchers’ experience
  * Can result in a wide confidence interval
    or measurement error
• Using some formula
For instance, Cochran’s formula for sample size
calculation:
$$n_0 = \frac{Z^2 \, p \, q}{e^2}$$
Where:
  * Z is the z-value for the chosen confidence level (e.g. 1.96 for 95%),
  * e is the desired level of precision (i.e.
    the margin of error, the half-width of the confidence interval),
  * p is the (estimated) proportion of the
    population which has the attribute in question,
  * q is 1 - p.
Determining the ideal sample size
Example
• Suppose we are doing a study on the inhabitants of a large town or village, and want to
find out what proportion of households serve breakfast in the mornings. We don’t have much
information on the subject to begin with, so we’re going to assume that half of the
families serve breakfast: this gives us maximum variability. So p = 0.5. Now let’s say we
want a 95% confidence level and a precision of ±5 percent. A 95%
confidence level gives us a Z value of 1.96, from the table values, so we get
• $n_0 = \frac{(1.96)^2 \, (0.5) \, (0.5)}{(0.05)^2} = 384.16$, rounded up to 385.
• So a random sample of 385 households in our target population should be enough to
give us the confidence level and precision we need.
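The calculation above can be sketched in a few lines of Python. The function name `cochran_sample_size` is an assumption made for illustration; rounding up reflects that a sample must contain whole units.

```python
import math


def cochran_sample_size(z: float, p: float, e: float) -> int:
    """Cochran's n0 = Z^2 * p * q / e^2, rounded up to a whole unit."""
    q = 1.0 - p
    n0 = (z ** 2) * p * q / (e ** 2)
    return math.ceil(n0)


# Values from the breakfast example: 95% confidence, p = 0.5, +/-5% precision.
n = cochran_sample_size(z=1.96, p=0.5, e=0.05)
print(n)
```

Note that this is the formula for a large population; it does not apply a finite population correction.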
[Figure: four targets showing measurements that are (1) both accurate and precise, (2) accurate but not precise, (3) precise but not accurate, (4) neither accurate nor precise]
• Accuracy refers to how close measurements are to the "true" value
• Precision refers to how close measurements are to each other
Data accuracy vs. precision
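One way to put numbers on the two ideas, as a sketch: treat accuracy as the bias of the mean relative to the true value, and precision as the spread (standard deviation) of repeated measurements. The function name, the "true" value and the measurement lists are all made up for illustration.

```python
import statistics


def accuracy_and_precision(measurements, true_value):
    """Return (bias, spread).

    bias   = mean of the measurements minus the true value (accuracy)
    spread = sample standard deviation of the measurements (precision)
    """
    bias = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return bias, spread


true_length = 10.0  # assumed "true" value, for illustration only

precise_not_accurate = [12.0, 12.1, 11.9, 12.0]  # tight cluster, off target
accurate_not_precise = [8.0, 12.0, 9.0, 11.0]    # centred on target, scattered

print(accuracy_and_precision(precise_not_accurate, true_length))
print(accuracy_and_precision(accurate_not_precise, true_length))
```

A small bias with a large spread corresponds to "accurate, not precise"; a large bias with a small spread corresponds to "not accurate, but precise".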
Independent Variable: The variable that is
varied or observed as the presumed cause of the
outcome in the study.
Dependent Variable: The variable being
affected by the independent variable; the
outcome (effect) measured in the study.
y = f(x)
Which is which here?
Principles of Data Collection
• Understand and know what types of data are required
• Collect only relevant data
• Determine methods of data collection
  * Survey/questionnaire
  * Observation (participatory)
  * Focus groups
  * Standard instruments
  * Content analysis
  * Experiments/observations
  * Personal interviews
  * Literature search (meta-analysis)
Principles of Data Collection (cont.)
• Decide where, from whom, how, and when to collect
* Research design
* Sampling procedure
* Prepare field work schedule/data plan
* Conduct a preliminary investigation (surveys)
• Assess the situation and prepare further strategies
• Enter the data in MS-Excel.
• Top row with variable labels in each cell.
• Save the entered data as a .csv file in MS-Excel
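A sketch of why the top row of variable labels matters: most tools use it to name the columns when the .csv file is read back. The example below uses Python's standard `csv` module on an in-memory stand-in for the saved file; the column names and values are invented for illustration.

```python
import csv
import io

# Stand-in for a .csv file saved from MS-Excel. The column names and
# values are made up for illustration.
csv_text = "farm_id,crop,yield_t_ha\n1,wheat,3.2\n2,rice,4.1\n3,wheat,2.9\n"

# DictReader treats the top row as variable labels and returns one
# dictionary per data row, keyed by those labels.
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

print(reader.fieldnames)
print(rows[0]["crop"])

# Values are read as text, so numeric columns need explicit conversion.
yields = [float(r["yield_t_ha"]) for r in rows]
print(yields)
```

With a real file, `io.StringIO(csv_text)` would be replaced by `open(path, newline="")`.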
Data analysis has been around for a while…
R.A. Fisher
Howard Dresner
Peter Luhn
W.E. Deming
Robert Gentleman
Ross Ihaka
Knowing your data
Descriptive/summary statistics: Mean, median, mode, standard deviation, frequencies, standard error
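As a sketch, these summary statistics can be computed with Python's standard library. The data values are illustrative, and the standard error is taken here as the sample standard deviation divided by the square root of n.

```python
import math
import statistics
from collections import Counter

# Illustrative data values.
data = [1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16, 19]

n = len(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)      # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)           # standard error of the mean
frequencies = Counter(data)      # how often each value occurs

print(f"n = {n}, mean = {mean:.2f}, sd = {sd:.2f}, se = {se:.2f}")
print(frequencies.most_common(3))
```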
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Mean
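• Consider the set
• 1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16, 19
• In this case there are 13 values (n is odd), so the median is the middle
value, at position (n+1) / 2
• (13+1) / 2 = 7, so the median is the 7th value, which is 7
• Consider the set
• 1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16
• In the second case there are 12 values (n is even), so the median is the mean
of the two middle values, on either side of position (n+1) / 2
• (12 + 1) / 2 = 6.5, so the median is the mean of the 6th and 7th values: (6+7) / 2 = 6.5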
Median
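The position rule can be sketched as a small function and checked against the standard library. The name `median_by_position` is an assumption for illustration.

```python
import statistics


def median_by_position(values):
    """Median via the (n + 1) / 2 position rule on the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle value (1-based position (n + 1) / 2).
        return ordered[mid]
    # Even n: the mean of the two values around position (n + 1) / 2.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_set = [1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16, 19]
even_set = [1, 1, 2, 2, 3, 6, 7, 11, 11, 13, 14, 16]

print(median_by_position(odd_set), statistics.median(odd_set))
print(median_by_position(even_set), statistics.median(even_set))
```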
The most frequent value in a data set
• Consider the set
• 1, 1, 1, 1, 2, 2, 3, 6, 11, 11, 11, 13, 14, 16, 19
• In this case the mode is 1 because it is the most common value.
• This is a case of a unimodal distribution
• There may be cases with more than one mode, as in the next set
• Consider the set
• 1, 1, 1, 1, 2, 2, 3, 6, 11, 11, 11, 11, 13, 14, 16, 19
• In this case there are two modes (bimodal) : 1 and 11 because both
occur 4 times in the data set.
Mode
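A short sketch of both cases using `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for the highest count:

```python
import statistics

unimodal_set = [1, 1, 1, 1, 2, 2, 3, 6, 11, 11, 11, 13, 14, 16, 19]
bimodal_set = [1, 1, 1, 1, 2, 2, 3, 6, 11, 11, 11, 11, 13, 14, 16, 19]

# multimode returns a list, so one mode and several modes are handled
# the same way.
modes_first = statistics.multimode(unimodal_set)
modes_second = statistics.multimode(bimodal_set)

print(modes_first)
print(modes_second)
```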
Data distributions
Data visualization with R
R is just super cool for data analytics
Visualizing my scientific career using data in R
R package “ggplot2” is amazing!!
Basic regression models
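$y$ = Dependent variable (Response variable)
$x$ = Independent variable (Explanatory or predictor variable)
$\varepsilon$ = random error component
$\beta_0$ = intercept
$\beta_1$ = slope (coefficient of $x$) in the simple linear model; $\beta_1, \ldots, \beta_k$ = coefficients of $x_1, \ldots, x_k$ in the multiple regression model
Multiple regression: $$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
Simple linear regression: $$y = \beta_0 + \beta_1 x + \varepsilon$$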
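As a sketch of how the intercept and slope of the simple linear model are estimated, the ordinary least squares (OLS) solution has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The function name `fit_simple_ols` and the data are assumptions for illustration.

```python
def fit_simple_ols(x, y):
    """Estimate (beta0, beta1) for y = beta0 + beta1 * x + error by OLS."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta1 = sxy / sxx                 # slope
    beta0 = mean_y - beta1 * mean_x   # intercept
    return beta0, beta1


# Made-up observations scattered around a straight line.
x = [1, 2, 3, 4, 5]
y = [5.1, 6.9, 9.2, 10.8, 13.0]

beta0, beta1 = fit_simple_ols(x, y)
print(f"intercept = {beta0:.2f}, slope = {beta1:.2f}")
```

Multiple regression follows the same least-squares principle but is usually solved with matrix routines rather than by hand.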
[Figure: frontier diagram with points O, F, Y, X and I, comparing OLS Regression, SFA and DEA]
Output Efficiency of F: FO/YO
Input Efficiency of F: XI/XF
Symbol   Meaning       Level of significance
ns       P > 0.05      Not applicable
*        P ≤ 0.05      At 5% level
**       P ≤ 0.01      At 1% level
***      P ≤ 0.001     At 0.1% level
****     P ≤ 0.0001    At 0.01% level
"p-value offers a first defense line against being fooled by randomness,
separating signal from noise"
Statistical significance and p-value
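The symbol convention can be sketched as a lookup from a p-value to its symbol, using the P thresholds given in the table. The function name `significance_symbol` is an assumption for illustration.

```python
def significance_symbol(p: float) -> str:
    """Map a p-value to the conventional significance symbol."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    # Check the strictest threshold first so the most stars win.
    if p <= 0.0001:
        return "****"
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return "ns"


for p in (0.2, 0.03, 0.004, 0.0005, 0.00001):
    print(p, significance_symbol(p))
```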
Chance (Random Error; Sampling Error)
Bias (Systematic Errors [inaccuracies])
* Selection bias
* Loss to follow-up bias
Information bias
• Nondifferential (e.g. simple misclassification)
• Differential (e.g. recall bias, interviewer bias)
Confounding (Imbalance in Other Factors)
A situation in which the effects of two processes
are not separated.
Errors affecting validity. A
systematic error (caused by the
investigator or the subjects) that
causes an incorrect (over- or
under-) estimate of an association.
What is bias?
A word of caution:
“Interpretation can, however, be subjective”
I don’t have any strong opinion about SPSS since I am not an avid user of it.
R or others – The fight is on
Many more documents indexed in Google Scholar still use
SPSS than R, while the reverse is true in Scopus.
What Is R?
• a programming “environment”
• object-oriented
• similar to S-Plus
• freeware
• provides calculations on matrices
• excellent graphics capabilities
• supported by a large user network
What is R Not?
• a statistics software package
• menu-driven
• quick to learn
• a program with a complex graphical interface
Installing R
• www.r-project.org/
• download from CRAN
• select a download site
• download the base package at a minimum
• download contributed packages as needed
Tutorials cont.
• Textbooks
* The Art of R Programming by Norman Matloff
* Hands-On Programming with R by Garrett Grolemund
DATA in
Disclaimer: Many of the image files used in this presentation have been downloaded from the internet. Any copyright holders who are not
duly acknowledged here may contact me for proper citation.
Contact : sreejiagriman@gmail.com

Data in science
