Is cross-fertilization good or bad?: An
analysis of Darwin’s Zea Mays Data
By Jamie Chatman
and
Charlotte Hsieh
Outline
 Short biography of Charles Darwin and
Ronald Fisher
 Description of the Zea Mays data
 Analysis of the data
 Parametric tests (t-test, confidence intervals)
 Nonparametric test (i.e. Wilcoxon signed rank)
 Bootstrap tests
 Conclusion
Short Biography of Charles Darwin
 Darwin was born in 1809 in Shrewsbury,
England
 At 16 he went to Edinburgh University to study
medicine, but did not finish
 He then went to Cambridge University, where he
received his degree while studying to become a
clergyman.
 Darwin worked as an unpaid naturalist on a five-year
scientific expedition to South America that began in 1831.
 Darwin’s research led to his book, On the Origin of
Species by Means of Natural Selection, published in
1859.
1809-1882
Short Biography of Ronald Fisher
 Fisher was born in East Finchley,
London in 1890.
 Fisher went to Cambridge University and
received a degree in mathematics.
 Fisher made many contributions to statistics,
including maximum likelihood, analysis of
variance, and sufficiency, and pioneered the
design of experiments.
1890-1962
Darwin’s Zea Mays Data
Hypothesis
 Null Hypothesis:
 Ho: There is no difference in stalk height between
the cross-fertilized and self-fertilized plants.
 Alternative Hypothesis:
 HA (two-sided): Cross-fertilized stalk heights are not equal to
self-fertilized heights
 HA (one-sided): Cross-fertilization leads to increased stalk
height
Galton’s Approach to the Data
Original Data
          Crossed   Self-Fert.
Pot I     23.500    17.375
          12.000    20.375
          21.000    20.000
Pot II    22.000    20.000
          19.125    18.375
          21.500    18.625
Pot III   22.125    18.625
          20.375    15.250
          18.250    16.500
          21.625    18.000
          23.250    16.250
Pot IV    21.000    18.000
          22.125    12.750
          23.000    15.500
          12.000    18.000

Galton’s Approach (each column sorted separately, then differenced by rank)
Crossed   Self-Fert.   Difference
23.500    20.375        3.125
23.250    20.000        3.250
23.000    20.000        3.000
22.125    18.625        3.500
22.125    18.625        3.500
22.000    18.375        3.625
21.625    18.000        3.625
21.500    18.000        3.500
21.000    18.000        3.000
21.000    17.375        3.625
20.375    16.500        3.875
19.125    16.250        2.875
18.250    15.500        2.750
12.000    15.250       -3.250
12.000    12.750       -0.750
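The two layouts of the data can be reproduced with a short Python sketch. It is an illustration only: the variable names are hypothetical, and the fifth crossed height is taken as 19.125 (19 1/8 in), the value consistent with the sum of differences of 39.25 quoted later in the slides.

```python
# Heights in inches, grouped by pot: (crossed, self-fertilized).
# Assumption: the second crossed plant in Pot II is 19.125 (19 1/8 in).
POTS = {
    "I":   ([23.500, 12.000, 21.000], [17.375, 20.375, 20.000]),
    "II":  ([22.000, 19.125, 21.500], [20.000, 18.375, 18.625]),
    "III": ([22.125, 20.375, 18.250, 21.625, 23.250],
            [18.625, 15.250, 16.500, 18.000, 16.250]),
    "IV":  ([21.000, 22.125, 23.000, 12.000],
            [18.000, 12.750, 15.500, 18.000]),
}

crossed = [h for c, _ in POTS.values() for h in c]
selfed = [h for _, s in POTS.values() for h in s]

# Original pairing: difference between the two plants grown side by side.
paired_diffs = [c - s for c, s in zip(crossed, selfed)]

# Galton's approach: sort each group separately, then difference by rank.
galton_diffs = [c - s for c, s in zip(sorted(crossed, reverse=True),
                                      sorted(selfed, reverse=True))]

print("paired differences:", paired_diffs)
print("Galton differences:", galton_diffs)
print("sums:", sum(paired_diffs), sum(galton_diffs))
```

Both sets of differences have the same total, because sorting only rearranges which heights are subtracted from which; what changes is the spread of the individual differences.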
Parametric Test
 Fisher assumed that the stalk heights
were normally distributed
 Crossed: $X \sim N(\mu_X, \sigma_X^2)$
 Self-fertilized: $Y \sim N(\mu_Y, \sigma_Y^2)$
 Difference: $X - Y = d \sim N(\mu_X - \mu_Y, \sigma_X^2 + \sigma_Y^2)$
 $\bar{d} = 2.6167$, $s_d^2 = 22.26$, d.f. = 14
 $t = \dfrac{2.6167 - 0}{\sqrt{22.26/15}} = 2.148$
 p-value: 0.0497
 Reject the null hypothesis that $\mu_X = \mu_Y$ at the .05 level
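The t statistic and p-value on this slide can be checked with a minimal Python sketch. It uses only the standard library, so the two-sided p-value is obtained by numerically integrating the Student t density with Simpson's rule; the helper names `t_pdf` and `two_sided_p` are hypothetical, and a statistics package would normally be used instead.

```python
import math

# Paired differences (crossed - self-fertilized), in inches.
diffs = [6.125, -8.375, 1.0, 2.0, 0.75, 2.875, 3.5, 5.125,
         1.75, 3.625, 7.0, 3.0, 9.375, 7.5, -6.0]

n = len(diffs)
d_bar = sum(diffs) / n                                  # sample mean
s2 = sum((d - d_bar) ** 2 for d in diffs) / (n - 1)     # sample variance
df = n - 1
t_stat = (d_bar - 0.0) / math.sqrt(s2 / n)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=20000):
    """P(|T| >= |t|) via Simpson's rule on [0, |t|]; steps must be even."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3      # area between -|t| and |t|
    return 1 - central


p_value = two_sided_p(t_stat, df)
print(f"mean = {d_bar:.4f}, s^2 = {s2:.2f}, t = {t_stat:.3f}, p = {p_value:.4f}")
```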
Parametric Test
 95% confidence interval
$\bar{d} - t_{.025}\,(s_d/\sqrt{n}) \le \mu_d \le \bar{d} + t_{.025}\,(s_d/\sqrt{n})$
$2.6167 - 2.145 \cdot 4.7181/\sqrt{15} \le \mu_d \le 2.6167 + 2.145 \cdot 4.7181/\sqrt{15}$
$0.00364 \le \mu_d \le 5.2298$
Since zero is not in the interval, the null hypothesis that the mean difference is 0
(equivalently, that the two means are equal) is rejected
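As a check on the arithmetic, the interval can be recomputed in a few lines of Python. This is a sketch under one stated assumption: the critical value 2.145 for 14 degrees of freedom is hard-coded from the slide, not computed.

```python
import math

# Paired differences (crossed - self-fertilized), in inches.
diffs = [6.125, -8.375, 1.0, 2.0, 0.75, 2.875, 3.5, 5.125,
         1.75, 3.625, 7.0, 3.0, 9.375, 7.5, -6.0]

n = len(diffs)
d_bar = sum(diffs) / n
s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))

T_CRIT = 2.145   # t_{.025} with 14 d.f., taken from the slide
half_width = T_CRIT * s_d / math.sqrt(n)
ci = (d_bar - half_width, d_bar + half_width)

print(f"s_d = {s_d:.4f}")
print(f"95% CI for the mean difference: ({ci[0]:.4f}, {ci[1]:.4f})")
```

The lower limit sits very close to zero, which matches a t-test p-value only just under .05.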
Fisher’s Non-Parametric Approach
 If Ho is true, and the heights of the crossed and self-
fertilized are equal, then there should be an equal
chance that each one of the pairs came from the
self-fert. or the crossed
 If we look at all possible swaps in each pair there are
$2^{15}$ = 32,768 possibilities
 The sum of the differences is 39.25
 But only 863 of these cases have sums of the differences as
great as 39.25
 So the null hypothesis would be rejected at the
$2 \times 863 / 32{,}768 = 0.0526$ level
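Fisher's enumeration is small enough to run directly. The sketch below is an illustration, not Fisher's own procedure: it stores the differences as integer eighths of an inch (an implementation choice made here so that sums are exact) and visits all 2^15 sign assignments.

```python
from itertools import product

# Paired differences in eighths of an inch, e.g. 6.125 in = 49/8 in.
diffs_eighths = [49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48]

observed = sum(diffs_eighths)            # 314 eighths = 39.25 in
magnitudes = [abs(d) for d in diffs_eighths]

total = 0
count_ge = 0
# Under Ho each pair is equally likely to show its difference with either sign.
for signs in product((1, -1), repeat=len(magnitudes)):
    total += 1
    s = sum(sign * m for sign, m in zip(signs, magnitudes))
    if s >= observed:                    # "as great as" the observed sum
        count_ge += 1

p_two_sided = 2 * count_ge / total
print(f"{count_ge} of {total} assignments reach the observed sum")
print(f"two-sided p-value = {p_two_sided:.4f}")
```

Doubling the one-sided count is valid because the distribution of the sum is symmetric about zero under Ho.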
Fisher’s Nonparametric Approach
 The results of the nonparametric test agreed with
the results of the t-test
 Fisher was happy with this
 However, Fisher believed that removing the
assumption of normality in the nonparametric test
would result in a less powerful test than the t-test
 “[Nonparametric tests] assume less knowledge, or
more ignorance, of the experimental material than
does the standard test…”
 We disagree
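Non-Parametric Test
 Wilcoxon Signed Rank Test
Diff. (in pair order): 6.125, -8.375, 1, 2, 0.75, 2.875, 3.5, 5.125, 1.75, 3.625, 7, 3, 9.375, 7.5, -6

Diff.     Rank (by absolute value)
 0.75      1
 1         2
 1.75      3
 2         4
 2.875     5
 3         6
 3.5       7
 3.625     8
 5.125     9
-6        10
 6.125    11
 7        12
 7.5      13
-8.375    14
 9.375    15

Each rank takes the sign of its difference, giving the signed ranks $R_i$:
$W = \sum_{i=1}^{n} R_i = 72$
Under Ho each rank is equally likely to carry either sign, so for a signed rank $R$:
$E(R) = -\frac{1}{2}\left(\frac{n+1}{2}\right) + \frac{1}{2}\left(\frac{n+1}{2}\right) = 0$
$Var(R) = \frac{1^2 + 2^2 + \dots + n^2}{n} = \frac{(n+1)(2n+1)}{6}$
$E(W) = E\left(\sum_{i=1}^{n} R_i\right) = 0$
$Var(W) = \frac{n(n+1)(2n+1)}{6}$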
Non-Parametric Test
 Wilcoxon Signed Rank Test
 When n is large W~N(0, Var(W))
$\dfrac{W - 0}{\sqrt{Var(W)}} = \dfrac{72 - 0}{\sqrt{15(15+1)(30+1)/6}} = 2.045$
 This gives a p-value of 0.0409. Thus we reject the
null hypothesis.
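The signed-rank statistic and its normal approximation can be reproduced with the Python standard library. This is a minimal sketch with hypothetical variable names; it relies on the fact that these data contain no tied absolute differences, so no tie correction is included.

```python
import math

# Paired differences (crossed - self-fertilized), in inches.
diffs = [6.125, -8.375, 1.0, 2.0, 0.75, 2.875, 3.5, 5.125,
         1.75, 3.625, 7.0, 3.0, 9.375, 7.5, -6.0]
n = len(diffs)

# Rank the absolute differences from smallest (1) to largest (n),
# then give each rank the sign of its difference.
order = sorted(range(n), key=lambda i: abs(diffs[i]))
signed_ranks = [0] * n
for rank, i in enumerate(order, start=1):
    signed_ranks[i] = rank if diffs[i] > 0 else -rank

W = sum(signed_ranks)
var_W = n * (n + 1) * (2 * n + 1) / 6
z = (W - 0) / math.sqrt(var_W)

# Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"W = {W}, Var(W) = {var_W:.0f}, z = {z:.3f}, p = {p_value:.4f}")
```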
Bootstrap Methods
 Introduced by Bradley Efron (1979)
 44 years after Fisher’s analysis
 "If statistics had evolved at a time when computers existed,
it wouldn't be what it is today." (Efron)
 Uses repeated re-samples of the data
 Allows the use of computer sampling approaches
that are asymptotically equivalent to tests where
exact significance levels require complicated
manipulations
 A sampling simulation approximation to Fisher’s
nonparametric approach
The data “pull themselves up by their own bootstraps” by
generating new data sets through which their reliability can be
determined.
Bootstrap: Random Sign Change
 If Ho is true, there is an equal chance that the
plants in each pair are cross-fertilized or self-fertilized
 Method:
 1. Randomly swap the crossed and self-fertilized labels in each
pair (equivalently, flip the sign of each difference with probability 1/2)
 2. Compute sum of differences
 3. Repeat 5,000 times
 4. Plot histogram of summed differences
 5. Find the number of summed differences > 39.25
Bootstrap: Random Sign Change
[Figure: Histogram of 5000 Resampled Sums of (Sign) Randomized Zea Mays Differences. x-axis: Total of Differences; y-axis: Frequency]
Results
 124/5000 are >39.25.
 The p-value is
2*(124/5000)=0.0496.
 Compare to exact
combinatorial p-value of
0.0526
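The random sign-change procedure listed above can be sketched in Python as follows. The seed is an arbitrary choice made here for reproducibility, so the count will not match the 124 reported on the slide exactly; the strict `>` comparison follows step 5 of the method.

```python
import random

# Paired differences (crossed - self-fertilized), in inches.
diffs = [6.125, -8.375, 1.0, 2.0, 0.75, 2.875, 3.5, 5.125,
         1.75, 3.625, 7.0, 3.0, 9.375, 7.5, -6.0]
observed = sum(diffs)          # 39.25

N_REPS = 5000
rng = random.Random(0)         # assumption: fixed seed, any value would do

sums = []
for _ in range(N_REPS):
    # Flip the sign of each difference with probability 1/2,
    # i.e. swap the crossed / self-fertilized labels within the pair.
    sums.append(sum(d if rng.random() < 0.5 else -d for d in diffs))

count_greater = sum(s > observed for s in sums)
p_value = 2 * count_greater / N_REPS

print(f"{count_greater}/{N_REPS} resampled sums exceed {observed}")
print(f"two-sided p-value = {p_value:.4f}")
```

Using `>=` instead of `>` would also count sums that tie with 39.25, which moves the estimate toward the exact combinatorial value.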
Bootstrap: Resample Within Pots
 Experimenters will tend to present data in such a way as
to get significant results
 In order to be sure that pairings in each pot are random,
we can resample within pots
 We assume equality of heights in each pot
 Method:
 1. Sample 3 crossed plants in pot 1 with replacement
 2. Sample 3 self-fert. plants in pot 1 with replacement
 3. Repeat for pots 2-4
 4. Compute sum of differences
 5. Repeat 5,000 times
 6. Plot histogram of summed differences
 7. Find the number of summed differences <0
Bootstrap: Resample Within Pots
[Figure: Histogram of Sums of Differences in 5000 Resamplings with Resampling Within Pots. x-axis: Value of Sum of Differences; y-axis: Frequency]
Results
 27/5000 are <0
 The p-value is
2*(27/5000)=0.0108
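A minimal Python sketch of the within-pot resampling follows. Two assumptions are made here and are not stated on the slides: each pot is resampled to its own size (3, 3, 5 and 4 plants), and the resampled crossed and self-fertilized plants are paired in the order drawn. The seed is arbitrary, so the count will differ somewhat from the 27 reported.

```python
import random

# (crossed, self-fertilized) heights in inches for pots I-IV.
POTS = [
    ([23.500, 12.000, 21.000], [17.375, 20.375, 20.000]),
    ([22.000, 19.125, 21.500], [20.000, 18.375, 18.625]),
    ([22.125, 20.375, 18.250, 21.625, 23.250],
     [18.625, 15.250, 16.500, 18.000, 16.250]),
    ([21.000, 22.125, 23.000, 12.000],
     [18.000, 12.750, 15.500, 18.000]),
]

N_REPS = 5000
rng = random.Random(0)   # assumption: fixed seed for reproducibility


def resampled_sum(rng):
    """Sum of crossed-minus-self differences for one within-pot resample."""
    total = 0.0
    for crossed, selfed in POTS:
        k = len(crossed)                    # assumption: keep each pot's size
        c = rng.choices(crossed, k=k)       # sample with replacement
        s = rng.choices(selfed, k=k)
        total += sum(c) - sum(s)
    return total


sums = [resampled_sum(rng) for _ in range(N_REPS)]
n_negative = sum(s < 0 for s in sums)
p_value = 2 * n_negative / N_REPS

print(f"{n_negative}/{N_REPS} resampled sums are below 0")
print(f"two-sided p-value = {p_value:.4f}")
```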
Resampling-Based Sign Test
 Disregard size of difference and look only at the sign of the
difference
 If Ho is true, the probability of any difference being positive or
negative is 0.5, and we can use a binomial approach, where we
would expect half out of 15 pairs to have a positive difference
and half to have a negative difference
 We can count the number of positive differences in resampled
pairs of size 15
 Method:
 1. Sample 3 crossed plants in pot 1 with replacement
 2. Sample 3 self-fert. plants in pot 1 with replacement
 3. Repeat for pots 2-4
 4. Count the number of positive differences
 5. Repeat 5,000 times
Resampling-Based Sign Test
Results
 Almost every time out of
5,000, we get over 8
positive differences out of
15.
 #pos diff < 6: 0/5000
 #pos diff < 8: 2/5000
 p-value is essentially 0
[Figure: Histogram of Number of Positive Differences Between Crossed and Self-Fertilized in 5000 Resamplings of Size 15 from the Zea Mays Data with Randomization Within Pots. x-axis: Number of Positive Differences; y-axis: Frequency]
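The resampling-based sign test can be sketched the same way, counting positive differences instead of summing them. As before, resampling each pot to its own size, pairing plants in the order drawn, and the fixed seed are assumptions made for this illustration.

```python
import random

# (crossed, self-fertilized) heights in inches for pots I-IV.
POTS = [
    ([23.500, 12.000, 21.000], [17.375, 20.375, 20.000]),
    ([22.000, 19.125, 21.500], [20.000, 18.375, 18.625]),
    ([22.125, 20.375, 18.250, 21.625, 23.250],
     [18.625, 15.250, 16.500, 18.000, 16.250]),
    ([21.000, 22.125, 23.000, 12.000],
     [18.000, 12.750, 15.500, 18.000]),
]

N_REPS = 5000
rng = random.Random(0)   # assumption: fixed seed for reproducibility


def positive_count(rng):
    """Number of positive crossed-minus-self differences in one resample."""
    count = 0
    for crossed, selfed in POTS:
        k = len(crossed)                    # assumption: keep each pot's size
        c = rng.choices(crossed, k=k)
        s = rng.choices(selfed, k=k)
        count += sum(ci > si for ci, si in zip(c, s))
    return count


counts = [positive_count(rng) for _ in range(N_REPS)]
fewer_than_6 = sum(c < 6 for c in counts)
fewer_than_8 = sum(c < 8 for c in counts)

print(f"#pos diff < 6: {fewer_than_6}/{N_REPS}")
print(f"#pos diff < 8: {fewer_than_8}/{N_REPS}")
```

Under Ho about half of the 15 differences would be positive, so resamples with fewer than 8 positives being this rare is what drives the p-value toward 0.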
Randomization Within Pots
 Disregard information about cross or self-fertilized
 Find the distribution of summed differences by
resampling from pooled data
 Method:
 1. Pool plants in pot 1
 2. Sample 3 plants from the pool w/replacement, treat as crossed
 3. Sample 3 plants from the pool w/replacement, treat as self-fert.
 4. Repeat for pots 2-4
 5. Compute sum of differences
 6. Repeat 5,000 times
 7. Plot histogram of summed differences (=distribution of null
hypothesis)
 8. Find the number of summed differences >39.25
Randomization Within Pots
Results
 38/5000 are >39.25
 The p-value is
2*(38/5000)= 0.0152
[Figure: Histogram of Null Hypothesis Randomization Test Distribution (resample of 5000). x-axis: Sum of Differences; y-axis: Frequency]
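The pooled randomization within pots can be sketched in Python as below. It assumes that each pot is resampled to its own size and uses an arbitrary fixed seed, so the count will not reproduce the 38 on the slide exactly.

```python
import random

# (crossed, self-fertilized) heights in inches for pots I-IV.
POTS = [
    ([23.500, 12.000, 21.000], [17.375, 20.375, 20.000]),
    ([22.000, 19.125, 21.500], [20.000, 18.375, 18.625]),
    ([22.125, 20.375, 18.250, 21.625, 23.250],
     [18.625, 15.250, 16.500, 18.000, 16.250]),
    ([21.000, 22.125, 23.000, 12.000],
     [18.000, 12.750, 15.500, 18.000]),
]
OBSERVED = 39.25         # observed sum of differences

N_REPS = 5000
rng = random.Random(0)   # assumption: fixed seed for reproducibility


def null_sum(rng):
    """One draw of the summed difference when labels are ignored within pots."""
    total = 0.0
    for crossed, selfed in POTS:
        pool = crossed + selfed             # disregard the treatment labels
        k = len(crossed)                    # assumption: keep each pot's size
        as_crossed = rng.choices(pool, k=k)
        as_selfed = rng.choices(pool, k=k)
        total += sum(as_crossed) - sum(as_selfed)
    return total


sums = [null_sum(rng) for _ in range(N_REPS)]
count_greater = sum(s > OBSERVED for s in sums)
p_value = 2 * count_greater / N_REPS

print(f"{count_greater}/{N_REPS} null sums exceed {OBSERVED}")
print(f"two-sided p-value = {p_value:.4f}")
```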
Resampling Approach to Confidence
Intervals
 Using Darwin’s original
differences:
 1. Sample 15 differences
with replacement
 2. Compute the sum of
differences
 3. Repeat 5,000 times
 4. Plot histogram of
summed differences
 5. Take the 125th and 4875th
sorted summed differences
 Divide by sample size = 15
[Figure: Histogram of 5000 Sums of 15 Resampled Differences in Galton's Zea Mays Data. x-axis: Sum of 15 Differences; y-axis: Frequency]
We get 95% CI: (0.1749, 4.817),
which is shorter than the t-interval
(.0036, 5.230)
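The percentile interval described in the steps above can be sketched in Python. The seed is an arbitrary choice for this illustration, so the endpoints will be close to, but not identical with, the (0.1749, 4.817) reported.

```python
import random

# Paired differences (crossed - self-fertilized), in inches.
diffs = [6.125, -8.375, 1.0, 2.0, 0.75, 2.875, 3.5, 5.125,
         1.75, 3.625, 7.0, 3.0, 9.375, 7.5, -6.0]
n = len(diffs)

N_REPS = 5000
rng = random.Random(0)   # assumption: fixed seed for reproducibility

# Steps 1-3: resample 15 differences with replacement and sum them, 5,000 times.
sums = sorted(sum(rng.choices(diffs, k=n)) for _ in range(N_REPS))

# Step 5: the 125th and 4875th sorted sums (2.5% in each tail),
# divided by the sample size to give an interval for the mean difference.
lower = sums[125 - 1] / n
upper = sums[4875 - 1] / n

print(f"bootstrap 95% percentile interval: ({lower:.4f}, {upper:.4f})")
```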
Resampling Approach to Confidence
Intervals
 In the resampling approaches, “95% of the
resampled average differences were between
0.1749 and 4.817.”
 This is not equivalent to the t-procedure,
where “with probability 95%, the true value of
the mean difference lies between 0.0036
and 5.230.”
Conclusion
 We can conclude from our tests that cross-fertilization
leads to increased stalk heights
 Despite Fisher’s concern that tests without the
normality assumption are less intelligible
than the t-test, nonparametric resampling-based
methods are powerful and efficient
Is there anything else to consider?
 Not using randomization, which might lead to
environmental advantages and disadvantages
 Soil conditions or fertility
 Lighting
 Air currents
 Irrigation/evaporation
References
 Fisher, R.A. (1935). The Design of Experiments. Edinburgh:
Oliver & Boyd, 29-49.
 Thompson, J.R. (2000). Simulation: A Modeler’s Approach.
New York: Wiley-International Publication, 199-210.
 http://www.fact-index.com/r/ro/ronald_fisher.html
 http://www.lib.virginia.edu/science/parshall/darwin.html
 http://www.mste.uiuc.edu/stat/bootarticle.html
 http://www.psych.usyd.edu.au/difference5/scholars/galton.html

Zea mays

  • 1. Is cross-fertilization good or bad?: An analysis of Darwin’s Zea Mays Data By Jamie Chatman and Charlotte Hsieh
  • 2. Outline  Short biography of Charles Darwin and Ronald Fisher  Description of the Zea Mays data  Analysis of the data  Parametric tests (t-test, confidence intervals)  Nonparametric test (i.e. Wilcoxon signed rank)  Bootstrap tests  Conclusion
  • 3. Short Biography of Charles Darwin  Darwin was born in 1809 in Shrewsbury, England  At 16 went to Edinburgh University to study medicine, but did not finish  He went to Cambridge University, where he received his degree studying to become a clergyman.  Darwin worked as an unpaid naturalist on a five-year scientific expedition to South America 1831.  Darwin’s research led to his book, On the Origin of Species by Means of Natural Selection, published in 1859. 1809-1882
  • 4. Short Biography of Ronald Fisher  Fisher was born in East Finchley, London in 1890.  Fisher went to Cambridge University and received a degree in mathematics.  Fisher made many discoveries in statistics including maximum likelihood, analysis of variance, sufficiency, and was a pioneer for design of experiments. 1890-1962
  • 6. Hypothesis  Null Hypothesis:  Ho: There is no difference in stalk height between the cross-fertilized and self-fertilized plants.  Alternative Hypothesis:  HA: Cross-fertilized stalk heights are not equal to self-fertilized heights  HA: Cross-fertilization leads to increased stalk height
  • 7. Galton’s Approach to the Data Crossed Self-Fert. Pot I 23.500 17.375 12.000 20.375 21.000 20.000 Pot II 22.000 20.000 19.124 18.375 21.500 18.625 Pot III 22.125 18.625 20.375 15.250 18.250 16.500 21.625 18.000 23.250 16.250 Pot IV 21.000 18.000 22.125 12.750 23.000 15.500 12.000 18.000 Original Data Crossed Self-Fert. Difference 23.500 20.375 3.175 23.250 20.000 3.250 23.000 20.000 3.000 22.125 18.625 3.500 22.125 18.625 3.500 22.000 18.375 3.625 21.625 18.000 3.625 21.500 18.000 3.500 21.000 18.000 3.000 21.000 17.375 3.625 20.375 16.500 3.875 19.124 16.250 2.874 18.250 15.500 2.750 12.000 15.250 -3.250 12.000 12.750 -0.750 Galton’s Approach
  • 8. Parametric Test  Fisher made an assumption that the stalk heights were normally distributed  Crossed: X ~  Self-fertilized Y~  Difference: X-Y=d ~    p-value : 0.0497  Reject the null hypothesis that at the .05 level ),( 2 XXN σµ ),( 2 YYN σµ ),( 22 XYxYN σσµµ +− 26.22 6166.2 2 = = d s d d.f.= 14 148.2 06166.2 15 26.22 = − =t yx µµ =
  • 9. Parametric Test  95% confidence interval )15/7181.4*145.26167.215/7181.4*145.26167.2( +≤≤− d ))/()/(( 025.025. nstxdnstx +≤≤− )2298.500364(. ≤≤ d Since zero is not in the interval, the null hypothesis that the differences =0, (or that the means) are equal is rejected
  • 10. Fisher’s Non-Parametric Approach  If Ho is true, and the heights of the crossed and self- fertilized are equal, then there should be an equal chance that each one of the pairs came from the self-fert. or the crossed  If we look at all possible swaps in each pair there are 215 = 32,768 possibilities  The sum of the differences is 39.25  But only 863 of these cases have sums of the difference as great as 39.25  So the null hypothesis would be rejected at the 0526. 768,32 863*2 = level
  • 11. Fisher’s Nonparametric Approach  The results of the nonparametric test agreed with the results of the t-test  Fisher was happy with this  However, Fisher believed that removing the assumption of normality in the nonparametric test would result in a less powerful test than the t-test  “[Nonparametric tests] assume less knowledge, or more ignorance, of the experimental material than does the standard test…”  We disagree
  • 12. Non-Parametric Test  Wilcoxon Signed Rank Test Diff. 6.125 -8.375 1 2 0.749 2.875 3.5 5.125 1.75 3.625 7 3 9.375 7.5 -6 Diff. Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 72 1 == ∑= n i iRW 0.749 1 1.75 2 2.875 3 3.5 3.625 5.125 6 6.125 7 7.59.375 8.375 - - 6 )12)(1(...21 )( 0 2 1 2 1 2 1 2 1 )( 222 1 1 ++ = +++ = = +       + +       −= nn n n RVar nn RE 6 )12)(1( )( 0)( 1 ++ = =      = ∑= nnn WVar REWE n i i
  • 13. Non-Parametric Test  Wilcoxon Signed Rank Test  When n is large W~N(0, Var(W))  This gives a p-value of 0.0409. Thus we reject the null hypothesis. 045.2 072 )( 0 6 )130)(115(15 = − = − ++ WVar W
  • 14. Bootstrap Methods
     Introduced by Bradley Efron (1979)
     44 years after Fisher’s analysis
     "If statistics had evolved at a time when computers existed, it wouldn't be what it is today." (Efron)
     Uses repeated re-samples of the data
     Allows the use of computer sampling approaches that are asymptotically equivalent to tests where exact significance levels require complicated manipulations
     A sampling simulation approximation to Fisher’s nonparametric approach
     The data “pull themselves up by their own bootstraps” by generating new data sets through which their reliability can be determined.
  • 15. Bootstrap: Random Sign Change
     If Ho is true, there is an equal chance that the plants in each pair are cross-fertilized or self-fertilized
     Method:
     1. Randomly swap the crossed and self-fertilized labels within each pair (i.e., randomly change the sign of each difference)
     2. Compute the sum of differences
     3. Repeat 5,000 times
     4. Plot a histogram of the summed differences
     5. Find the number of summed differences > 39.25
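The five steps can be sketched as a small simulation. This is a minimal Python version under stated assumptions: a fixed seed for repeatability, 0.75 for the 6/8-inch difference, and illustrative names throughout. The histogram step is left out to keep the example free of plotting libraries.

```python
import random

# Crossed-minus-self differences in inches (assumption: 0.75 for the 6/8-inch pair).
diffs = [6.125, -8.375, 1, 2, 0.75, 2.875, 3.5, 5.125,
         1.75, 3.625, 7, 3, 9.375, 7.5, -6]
observed = sum(diffs)            # 39.25

N_REPS = 5000
rng = random.Random(0)           # fixed seed, chosen only for repeatability

sums = []
for _ in range(N_REPS):
    # Steps 1-2: flip each difference's sign with probability 1/2, then sum.
    sums.append(sum(d if rng.random() < 0.5 else -d for d in diffs))

# Step 5: count resampled sums strictly greater than the observed sum.
n_greater = sum(s > observed for s in sums)
p_two_sided = 2 * n_greater / N_REPS

print(f"{n_greater}/{N_REPS} sums exceed {observed}; two-sided p = {p_two_sided:.4f}")
```

Because every difference is a multiple of 1/8 inch, the floating-point sums are exact and the strict comparison with 39.25 behaves as intended. The count will vary from run to run and from seed to seed.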
  • 16. Bootstrap: Random Sign Change
    [Figure: Histogram of 5000 Resampled Sums of (Sign) Randomized Zea Mays Differences; x-axis: Total of Differences; y-axis: Frequency]
    Results
     124/5000 are > 39.25.
     The p-value is 2*(124/5000) = 0.0496.
     Compare to the exact combinatorial p-value of 0.0526
  • 17. Bootstrap: Resample Within Pots
     Experimenters will tend to present data in such a way as to get significant results
     In order to be sure that pairings in each pot are random, we can resample within pots
     We assume equality of heights in each pot
     Method:
     1. Sample 3 crossed plants in pot 1 with replacement
     2. Sample 3 self-fert. plants in pot 1 with replacement
     3. Repeat for pots 2-4
     4. Compute the sum of differences
     5. Repeat 5,000 times
     6. Plot a histogram of the summed differences
     7. Find the number of summed differences < 0
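A minimal Python sketch of this within-pot resampling follows. It makes several assumptions: the pot contents are taken from the original data table, with the 19.124 entry read as 19.125 (so that the paired difference is 6/8 inch). Each pot is resampled at its own size (3, 3, 5, and 4 plants per group). The seed and all names are illustrative.

```python
import random

# (crossed, self-fertilized) heights per pot, in inches.
# Assumption: 19.125 is used where the data table shows 19.124.
pots = {
    "I":   ([23.5, 12.0, 21.0], [17.375, 20.375, 20.0]),
    "II":  ([22.0, 19.125, 21.5], [20.0, 18.375, 18.625]),
    "III": ([22.125, 20.375, 18.25, 21.625, 23.25],
            [18.625, 15.25, 16.5, 18.0, 16.25]),
    "IV":  ([21.0, 22.125, 23.0, 12.0], [18.0, 12.75, 15.5, 18.0]),
}

N_REPS = 5000
rng = random.Random(0)  # fixed seed, chosen only for repeatability

sums = []
for _ in range(N_REPS):
    total = 0.0
    for crossed, selfed in pots.values():
        # Resample each group with replacement inside its own pot.
        c = rng.choices(crossed, k=len(crossed))
        s = rng.choices(selfed, k=len(selfed))
        total += sum(c) - sum(s)
    sums.append(total)

n_below = sum(t < 0 for t in sums)
p_two_sided = 2 * n_below / N_REPS
print(f"{n_below}/{N_REPS} sums are below 0; two-sided p = {p_two_sided:.4f}")
```

The resampled sums centre near the observed total of 39.25, so the fraction falling below zero measures how often resampling alone would reverse the direction of the effect.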
  • 18. Bootstrap: Resample Within Pots
    [Figure: Histogram of Sums of Differences in 5000 Resamplings with Resampling Within Pots; x-axis: Value of Sum of Differences; y-axis: Frequency]
    Results
     27/5000 are < 0
     The p-value is 2*(27/5000) = 0.0108
  • 19. Resampling-Based Sign Test
     Disregard the size of each difference and look only at its sign
     If Ho is true, the probability of any difference being positive or negative is 0.5, and we can use a binomial approach, where we would expect about half of the 15 pairs to have a positive difference and half to have a negative difference
     We can count the number of positive differences in resampled sets of 15 pairs
     Method:
     1. Sample 3 crossed plants in pot 1 with replacement
     2. Sample 3 self-fert. plants in pot 1 with replacement
     3. Repeat for pots 2-4
     4. Count the number of positive differences
     5. Repeat 5,000 times
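The counting procedure can be sketched in Python as below. The sketch pairs the k-th resampled crossed plant with the k-th resampled self-fertilized plant inside each pot, which is one reasonable reading of the method and is an assumption. The 19.125 entry, the seed, and the names are also assumptions made for illustration.

```python
import random

# (crossed, self-fertilized) heights per pot, in inches.
# Assumption: 19.125 is used where the data table shows 19.124.
pots = [
    ([23.5, 12.0, 21.0], [17.375, 20.375, 20.0]),
    ([22.0, 19.125, 21.5], [20.0, 18.375, 18.625]),
    ([22.125, 20.375, 18.25, 21.625, 23.25],
     [18.625, 15.25, 16.5, 18.0, 16.25]),
    ([21.0, 22.125, 23.0, 12.0], [18.0, 12.75, 15.5, 18.0]),
]

N_REPS = 5000
rng = random.Random(0)  # fixed seed, chosen only for repeatability

positive_counts = []
for _ in range(N_REPS):
    n_pos = 0
    for crossed, selfed in pots:
        c = rng.choices(crossed, k=len(crossed))
        s = rng.choices(selfed, k=len(selfed))
        # Pair the resampled plants in order and keep only the sign.
        n_pos += sum(ci > si for ci, si in zip(c, s))
    positive_counts.append(n_pos)

frac_below_8 = sum(k < 8 for k in positive_counts) / N_REPS
mean_pos = sum(positive_counts) / N_REPS
print(f"mean positive differences = {mean_pos:.2f} of 15; "
      f"fraction with fewer than 8 = {frac_below_8:.4f}")
```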
  • 20. Resampling-Based Sign Test
    Results
     Almost every time out of 5,000, we get over 8 positive differences out of 15.
     #pos diff < 6: 0/5000
     #pos diff < 8: 2/5000
     p-value is essentially 0
    [Figure: Histogram of Number of Positive Differences Between Crossed and Self-Fertilized in 5000 Resamplings of Size 15 from the Zea Mays Data with Randomization Within Pots; x-axis: Number of Positive Differences; y-axis: Frequency]
  • 21. Randomization Within Pots
     Disregard information about whether a plant was crossed or self-fertilized
     Find the distribution of summed differences by resampling from the pooled data
     Method:
     1. Pool the plants in pot 1
     2. Sample 3 plants from the pool with replacement, treat as crossed
     3. Sample 3 plants from the pool with replacement, treat as self-fert.
     4. Repeat for pots 2-4
     5. Compute the sum of differences
     6. Repeat 5,000 times
     7. Plot a histogram of the summed differences (= distribution under the null hypothesis)
     8. Find the number of summed differences > 39.25
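The pooled procedure can be sketched in Python along the following lines. The sketch rests on stated assumptions: each pot is resampled at its own group size (3, 3, 5, and 4), 19.125 is used for the entry shown as 19.124 in the data table, and the seed and names are illustrative.

```python
import random

# (crossed, self-fertilized) heights per pot, in inches.
# Assumption: 19.125 is used where the data table shows 19.124.
pots = [
    ([23.5, 12.0, 21.0], [17.375, 20.375, 20.0]),
    ([22.0, 19.125, 21.5], [20.0, 18.375, 18.625]),
    ([22.125, 20.375, 18.25, 21.625, 23.25],
     [18.625, 15.25, 16.5, 18.0, 16.25]),
    ([21.0, 22.125, 23.0, 12.0], [18.0, 12.75, 15.5, 18.0]),
]
OBSERVED = 39.25        # observed sum of differences

N_REPS = 5000
rng = random.Random(0)  # fixed seed, chosen only for repeatability

null_sums = []
for _ in range(N_REPS):
    total = 0.0
    for crossed, selfed in pots:
        pool = crossed + selfed                       # ignore the labels
        fake_crossed = rng.choices(pool, k=len(crossed))
        fake_selfed = rng.choices(pool, k=len(selfed))
        total += sum(fake_crossed) - sum(fake_selfed)
    null_sums.append(total)

n_greater = sum(t > OBSERVED for t in null_sums)
p_two_sided = 2 * n_greater / N_REPS
print(f"{n_greater}/{N_REPS} null sums exceed {OBSERVED}; "
      f"two-sided p = {p_two_sided:.4f}")
```

Unlike the within-pot resampling that keeps the labels, these sums centre near zero, because both groups are drawn from the same pool.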
  • 22. Randomization Within Pots
    Results
     38/5000 are > 39.25
     The p-value is 2*(38/5000) = 0.0152
    [Figure: Histogram of Null Hypothesis Randomization Test Distribution (resample of 5000); x-axis: Sum of Differences; y-axis: Frequency]
  • 23. Resampling Approach to Confidence Intervals
     Using Darwin’s original differences:
     1. Sample 15 differences with replacement
     2. Compute the sum of differences
     3. Repeat 5,000 times
     4. Plot a histogram of the summed differences
     5. Sort the sums and take the 125th and 4875th summed differences
     6. Divide by the sample size (15)
    [Figure: Histogram of 5000 Sums of 15 Resampled Differences in Galton's Zea Mays Data; x-axis: Sum of 15 Differences; y-axis: Frequency]
    We get a 95% CI of (0.1749, 4.817), which is shorter than the t-interval (.0036, 5.230)
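The percentile steps can be sketched in Python as below. This is an illustration under assumptions, not the original computation: the seed is arbitrary, 0.75 stands in for the 6/8-inch difference, and the names are made up. The endpoints will therefore differ slightly from the ones on the slide.

```python
import random

# Crossed-minus-self differences in inches (assumption: 0.75 for the 6/8-inch pair).
diffs = [6.125, -8.375, 1, 2, 0.75, 2.875, 3.5, 5.125,
         1.75, 3.625, 7, 3, 9.375, 7.5, -6]
n = len(diffs)

N_REPS = 5000
rng = random.Random(0)  # fixed seed, chosen only for repeatability

# Steps 1-3: resample 15 differences with replacement and sum, 5,000 times.
sums = sorted(sum(rng.choices(diffs, k=n)) for _ in range(N_REPS))

# Steps 5-6: the 125th and 4875th sorted sums (1-based), divided by n.
lower = sums[125 - 1] / n
upper = sums[4875 - 1] / n

print(f"bootstrap 95% percentile interval: ({lower:.4f}, {upper:.4f})")
```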
  • 24. Resampling Approach to Confidence Intervals
     In the resampling approach, “95% of the resampled average differences were between 0.1749 and 4.817.”
     This is not equivalent to the t-procedure, where “with 95% confidence, the true mean difference lies between 0.0036 and 5.230.”
  • 25. Conclusion
     We can conclude from our tests that cross-fertilization leads to increased stalk heights
     Despite Fisher’s concern that tests which drop the normality assumption are less intelligible than the t-test, nonparametric resampling-based methods are powerful and efficient
  • 26. Is there anything else to consider?
     Lack of randomization, which might lead to environmental advantages and disadvantages:
     Soil conditions or fertility
     Lighting
     Air currents
     Irrigation/evaporation
  • 27. References
     Fisher, R.A. (1935). The Design of Experiments. Edinburgh: Oliver & Boyd, 29-49.
     Thompson, J.R. (2000). Simulation: A Modeler’s Approach. New York: Wiley-International Publication, 199-210.
     http://www.fact-index.com/r/ro/ronald_fisher.html
     http://www.lib.virginia.edu/science/parshall/darwin.html
     http://www.mste.uiuc.edu/stat/bootarticle.html
     http://www.psych.usyd.edu.au/difference5/scholars/galton.html