Richard Gill, Mathematical Institute, Leiden University. 16 November, 2022
The Dutch new herring scandals
Repeated measurements with unintended feedback
At some point in the latter half of the 14th century Willem Beukelszoon began to go about things differently to other
fishermen or fish curers. He is credited with discovering something called kaken, or “gibbing”: a method of gutting and
deboning herring that left parts of its stomach and another of its internal organs, the pyloric caecae, intact. Removing the
guts and bones took away the bits that would begin to rot first, whilst the remaining pyloric caecae would continue to emit
an enzyme called trypsin. The herring would then be thrown into brine and basically, pickle in its own juices.
Smell them and weep
Netherlands fishmongers accuse herring-tasters of erring
Can the Dutch still trust their herring-tasters?
Print edition | Europe | Nov 23rd 2017 | AMSTERDAM
HERRING (genus Clupea, with four species found in the Baltic and North Seas) have been vital to northern Europe’s economy since the Middle Ages,
when fishermen worked out how to preserve them in brine. Every north European country maintains that there is a right way to eat the fish, but they
differ as to what it is. In Sweden Baltic surströmming are fermented until slightly rancid. In Denmark the sill are pickled, or cooked and eaten in long
strips. In the Netherlands haring must be lightly salted for preservation but otherwise raw, dipped in minced onion and accompanied with a pickle. No
food is more loved.
So the Dutch were shocked when accusations surfaced in November that there was something rotten about the national herring test. The test, sponsored
by the Algemeen Dagblad, a newspaper, is carried out by two expert tasters, who each year rate the herring at over a hundred shops and stands across the
country. Ben Vollaard, an economist at Tilburg University, was surprised when his respected local fishmonger scored zero. The merchant told Mr Vollaard
that one judge routinely tipped the scales, giving higher scores to stores that get their fish from the Atlantic Group, a distributor in Scheveningen. The
judge happened to be a consultant for Atlantic, giving courses on how to slice and serve herring.
“I saw how much damage a low rating could do. The judges act like God,” says Mr Vollaard, who specialises in using statistics to detect crime. He
decided to run the numbers. The ratings include objective criteria, like weight and fattiness, and subjective ones such as taste and appearance. The
economist contacted 85% of the shops surveyed in the past two years and asked who their distributors were. He found that whereas the overall average
score was 5.5, the average for those supplied by Atlantic was 8.7. The extra boost for the Atlantic stores came mainly from the subjective scores.
Mr Vollaard’s study has blown the lid off the sealed world of Dutch herring. Fishmongers who long suspected the judge of bias towards Atlantic now say
the test is rotten. Two who received low ratings have vowed to sue the Algemeen Dagblad for defamation.
The judge and Atlantic say they have been smeared, and that the statistical evidence is a red herring. They say Mr Vollaard’s figures are off, and that their
high scores are due to their superior fish. But the charges of belangenverstrengeling (conflict of interest) have left the test’s reputation for impartiality
gutted.
This article appeared in the Europe section of the print edition under the headline "Failing the smell test"
• Celebrating many years of collaboration with Per Kragh Andersen
• and Ørnulf Borgan, whose retirement party I had to miss
(pandemic)
• In loving memory of the leader of the pack, Niels Keiding
• May the royalty checks for “the weightlifter’s guide to counting
processes” (ABGK) continue to come in, for many years to
come!
ABGK for ever!
Dutch New Herring
Longitudinal and spatial data with enormous societal impact
• Pitfalls of amateur regression: The Dutch New Herring controversies
• Fengnan Gao & Richard D. Gill
• https://arxiv.org/abs/2104.00333
• Submitted to SJS
• Previously rejected by Statistica Neerlandica, Lifetime Data Analysis, and JRSS(A)
• SJS: “The journal specializes in statistical modelling showing particular
appreciation of the underlying substantive research problems”
• “The emergence of specialized methods for analysing longitudinal and
spatial data is just one example of an area of important methodological
development in which the Scandinavian Journal of Statistics has a particular
niche”
The substantive research problem
Was the AD Herring Test biased?
• AD: Algemeen Dagblad, a popular Dutch daily newspaper, based in
Rotterdam
• The Herring Test: a yearly ranking of consumer outlets of “Dutch
New Herring”
• Dutch New Herring (“Hollandse maatjes”): An EU protected
designation for herring prepared according to ancient Dutch tradition
• Ordinary consumers (readers of AD) nominated their local favourite
food market stalls, supermarkets, fish shops; highly ranked outlets
were automatically included in the next year’s test
• A high ranking was like a Michelin star. A low ranking was the kiss of
death.
Before we continue, introducing my co-author
Fengnan Gao, Shanghai (photographed in Venice)
The person who started this all
“I heard something strange about the AD Herring Test. I pulled the data together to see how it all
worked. Things came out of that that I thought, ‘this can't be true’. The test panel had a business
interest in a herring supplier, Atlantic. Coincidence or not, whoever got the fish from Atlantic,
scored significantly higher. I am proud of the fact that my research has helped put an end to this.”
Ben Vollaard (“BV”), Tilburg
How did BV’s work reach The Economist?
Answer: PR activities of his university
• BV published statistical analyses as two “working papers” on his
university web pages
• Tilburg University put out a press release, both times
• BV appeared on current affairs chat shows on national TV, and the
results were reported in the newspapers
• The AD cancelled the “AD Herring Test”, brought in lawyers,
and made complaints to BV and to his university. Aggrieved herring
outlets started legal action against AD.
• The AD’s lawyer hired me …
Coincidence?
No: Atlantic’s Dutch New Herring is the best!
• Atlantic’s “Dutch New Herring” is objectively the best by the well-known
criteria that the AD Herring Testers use to evaluate Dutch New Herring
• Well-cleaned
• Prepared on site
• Neither too “ripe” [matured] nor too “unripe” (cf. venison, whisky, cheese)
• Temperature not too cold, not too warm (cf. hygiene regulations)
• No microbiological contamination
• High fat percentage, high weight (per fish)
• Atlantic’s Dutch New Herring is also the most expensive! (Euros per gram).
• Acknowledgement of conflict of interest: I was paid by AD to help them
in their complaint of violation of scientific integrity against Ben Vollaard
• I claim that I gave them my independent scientific opinion on their case
• I argued that econometrician Vollaard was incompetent. It is not for me
to judge his integrity
• The Dutch national body for evaluating complaints of violations of
scientific integrity ruled that Vollaard was innocent, but that
the scientific issues needed further research
• Hence I continued my research with the essential collaboration of
Fengnan Gao (replication of the entire research project, including data
extraction from original sources)
Vollaard’s basic analysis

                      Estimate    Std. Error   t value   Pr(>|t|)
Intercept             4.139005    0.727812      5.687    3.31e-08   ***
weight (grams)        0.039137    0.009726      4.024    7.41e-05   ***
temp
  < 7 deg             reference category
  7 -- 10            -0.685962    0.193448     -3.546    0.000460   ***
  > 10 deg           -1.793139    0.223113     -8.037    2.77e-14   ***
fat
  < 10%               reference category
  10--14%             0.172845    0.197387      0.876    0.381978
  > 14%               0.581602    0.250033      2.326    0.020743   *
fresh                 1.817081    0.200335      9.070    < 2e-16    ***
micro
  very good           reference category
  adequate           -0.161412    0.315593     -0.511    0.609443
  bad                -0.618397    0.448309     -1.379    0.168897
  warning            -0.151143    0.291129     -0.519    0.604067
  reject             -2.279099    0.683553     -3.334    0.000973   ***
ripeness
  mild                reference category
  average            -0.377860    0.336139     -1.124    0.261947
  strong             -1.930692    0.386549     -4.995    1.05e-06   ***
  rotten             -4.598752    0.503490     -9.134    < 2e-16    ***
cleaning
  very good           reference category
  good               -0.983911    0.210504     -4.674    4.64e-06   ***
  poor               -1.716668    0.223459     -7.682    2.79e-13   ***
  bad                -2.761112    0.439442     -6.283    1.30e-09   ***
yr2017                0.208296    0.174740      1.192    0.234279
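To make the table concrete, here is a minimal sketch of how a fitted score follows from the coefficients: add the intercept, the weight term, and the effect of each category the outlet falls in (reference categories contribute nothing). The coefficients are copied from the table; the two outlets and the function name `fitted_score` are hypothetical, invented only for illustration.

```python
# Coefficients copied from the regression table above; reference categories contribute 0.
INTERCEPT = 4.139005
WEIGHT_PER_GRAM = 0.039137
FRESH = 1.817081
YEAR_2017 = 0.208296
TEMP = {"<7": 0.0, "7-10": -0.685962, ">10": -1.793139}
FAT = {"<10%": 0.0, "10-14%": 0.172845, ">14%": 0.581602}
MICRO = {"very good": 0.0, "adequate": -0.161412, "bad": -0.618397,
         "warning": -0.151143, "reject": -2.279099}
RIPENESS = {"mild": 0.0, "average": -0.377860, "strong": -1.930692, "rotten": -4.598752}
CLEANING = {"very good": 0.0, "good": -0.983911, "poor": -1.716668, "bad": -2.761112}


def fitted_score(weight, temp, fat, fresh, micro, ripeness, cleaning, year):
    """Linear predictor of the final score for one outlet."""
    return (INTERCEPT
            + WEIGHT_PER_GRAM * weight
            + TEMP[temp]
            + FAT[fat]
            + (FRESH if fresh else 0.0)
            + MICRO[micro]
            + RIPENESS[ripeness]
            + CLEANING[cleaning]
            + (YEAR_2017 if year == 2017 else 0.0))


# Hypothetical outlets: the covariate values are made up, not taken from the data.
good = fitted_score(80, "<7", ">14%", True, "very good", "mild", "very good", 2016)
bad = fitted_score(70, ">10", "<10%", False, "reject", "rotten", "bad", 2016)
print(round(good, 2), round(bad, 2))
```

The second hypothetical outlet gets a negative fitted value, although the test score lives on a 0 to 10 scale. A purely additive model can do that, which is one hint that it may describe the worst outlets poorly.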
Vollaard’s first paper
Add indicator variable “< 30 km from Rotterdam”
• Positive effect, significant at 5% level, size of effect ca. 0.5
• Strangely, he also had a covariate “among top 10”
• It was also rather significant, positive
Vollaard’s second paper
New variable “outlet supplied by Atlantic”
• He did not add it as a covariate
• Instead, he showed that, according to his fitted model, the
difference between “Atlantic” outlets and the rest is mainly due to
the subjective variables “ripeness” and “cleaning”. Two
objective variables “temperature” and “being freshly served”
have some importance, but less; and the other objective
variables are not important at all.
From report 2
[Outlets with] Atlantic as supplier score an average of 9.1. The other outlets score a fail, on average just under 5.5.
We see little difference in average score between the other suppliers.
Figure 1. Average score of outlets on the herring test 2016/17, by supplier
This difference in final score may be inflated if the group that does not have Atlantic as supplier contains relatively many
rotten apples: businesses that really do sell herring that should not be sold. These rotten apples
could pull the average down sharply. To investigate this, we use the
microbiological condition of the herring as determined by the laboratory and reported by the …
[Figure 1: two bar charts of score on the herring test (0 to 10), Atlantic as supplier vs. other supplier. Left: all outlets (n1 + n2 = 292), difference 3.6 points. Right: only outlets with herring of adequate microbiological condition (n1 + n2 = 292 − 37), difference 3.3 points.]
Table 1. Why outlets that do not have Atlantic as supplier score lower on the herring test

Explanatory factor                    Difference (*)   Unit     Effect on final score (0-10)
Weight                                -2.17            gram     -0.09
Microbiological condition
  (very) good                         reference category
  adequate                             0.07            share    -0.01
  bad                                  0.03            share    -0.02
  warning stage                        0.09            share    -0.01
  rejected                             0.02            share    -0.04
  joint                                                         -0.07
Fat percentage
  below 10%                           reference category
  between 10 and 14%                  -0.21            share    -0.04
  above 14%                           -0.02            share    -0.01
  joint                                                         -0.05
Temperature
  below 7°C                           reference category
  between 7 and 10°C                   0.29            share    -0.17
  above 10°C                           0.21            share    -0.36
  joint                                                         -0.52
Fresh from the knife
  no                                  reference category
  yes                                 -0.29            share    -0.51
Cleaning
  very good                           reference category
  good                                 0.27            share    -0.22
  poor                                 0.25            share    -0.40
  bad                                  0.04            share    -0.12
  joint                                                         -0.73
Ripening
  mild                                reference category
  average                             -0.24            share     0.09
  strong                               0.34            share    -0.63
  rotten                               0.06            share    -0.25
  joint                                                         -0.80
Region
  within 30 km of Rotterdam           reference category
  more than 30 km from Rotterdam       0.60            share    -0.18
Top 10
  top 10                              reference category

(*) Difference between outlets with and without Atlantic as supplier
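The logic of Table 1 appears to be: the contribution of a factor is the difference in its mean between the two groups, multiplied by its regression coefficient. The sketch below is a rough check of that reading, not a reconstruction of Vollaard's calculation. It borrows the weight coefficient from the first paper's regression table, which is an assumption because the second paper's model has extra covariates, and it adds up the joint effects exactly as printed. The Top 10 row is cut off in the source, so it is left out.

```python
def contribution(mean_difference, coefficient):
    """Contribution of one factor: difference in group means times regression coefficient."""
    return mean_difference * coefficient


# Weight: difference of -2.17 g (Table 1); coefficient 0.039137 borrowed from the
# first regression table (assumption: the second model's coefficient may differ).
weight_effect = contribution(-2.17, 0.039137)

# Joint effects on the final score, copied from Table 1 (the Top 10 row is incomplete).
JOINT_EFFECTS = {
    "weight": -0.09,
    "microbiology": -0.07,
    "fat": -0.05,
    "temperature": -0.52,
    "fresh from the knife": -0.51,
    "cleaning": -0.73,
    "ripening": -0.80,
    "region": -0.18,
}
SUBJECTIVE = ("cleaning", "ripening")  # the two variables the slides call subjective

subjective_total = sum(JOINT_EFFECTS[name] for name in SUBJECTIVE)
listed_total = sum(JOINT_EFFECTS.values())

print(round(weight_effect, 3))     # close to the -0.09 reported in the table
print(round(subjective_total, 2))  # part attributed to ripening and cleaning
print(round(listed_total, 2))      # total over the rows that survive in the source
```

The borrowed coefficient gives a weight effect close to, but not identical with, the printed value, which is what one would expect if the second model was refitted with the extra covariates.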
But how good is the model?
[Figure: standard R diagnostic plots for lm(finalscore ~ weight + factor(temp_cat) + factor(fat_cat) + fresh_num ...): Residuals vs Fitted, Normal Q−Q, Scale−Location, and Residuals vs Leverage with Cook's distance contours. Labelled observations: 229, 82, 116 (229, 224, 206 in the leverage plot).]
[Figure: histogram of final test score (0 to 10) against number of outlets.]
The model was wrong
• The testers gave a score of zero if the fish should not have
been sold:
  • temperature > 10 deg C
  • microbiological contamination = health danger
  • maturity = rotten
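The rule above amounts to a two-stage scoring process: first decide whether the herring is disqualified, and only otherwise grade it. A minimal sketch follows, under two assumptions that the slide does not spell out: any one of the three conditions is enough, and "health danger" corresponds to the microbiology category "reject". The function names are hypothetical.

```python
def is_disqualified(temp_celsius, micro, ripeness):
    """True if the fish should not have been sold.

    Assumptions: any single condition suffices, and the microbiology
    category "reject" is the one that signals a health danger.
    """
    return temp_celsius > 10 or micro == "reject" or ripeness == "rotten"


def final_score(panel_score, temp_celsius, micro, ripeness):
    """Score zero for disqualified fish, otherwise the panel's grade."""
    if is_disqualified(temp_celsius, micro, ripeness):
        return 0.0
    return panel_score


print(final_score(8.0, 12.0, "very good", "mild"))  # too warm, so disqualified
print(final_score(8.0, 5.0, "very good", "mild"))   # graded normally
```

Under such a rule the zeros are produced by a different mechanism from the other scores, which a single linear regression on all outlets cannot represent.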
More difficulties
• Two years’ data combined; some (many?) outlets appear in both years’
data
• Misclassification of “Atlantic”-supplied outlets (deduced from
inconsistency of summary statistics: one outlet got a “6” in
2016)
• Vollaard tested the effect of “near to Rotterdam” with a dummy
variable, but the effect of “supplied by Atlantic” with a faulty
subject-matter argument (according to him, maturity and microbiology
should have been the same). But “ripening” is a chemical
process, whereas “contamination” is biological
• Dummy variable for “Atlantic” was not significant!
• Vollaard would not give us the un-anonymised data
• AD gave us their classification of “Atlantic”
• We went back to the original data
Better model
• Score zero = “disqualified”
• We study outlets with score ≥ 1
• We omit an outlet’s 2nd observation if it was tested in both
years
• Result: much better fit, much smaller standard error of
residuals
• Overall picture unchanged
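The two data-preparation steps can be sketched as follows. This is one reading of the bullets, not the authors' code: it keeps each outlet's first tested year and then drops the disqualified outlets (score below 1). Doing the steps in the other order would keep slightly different records. The record layout and the toy data are hypothetical.

```python
def first_observation_scored(records):
    """Keep each outlet's first test, then drop disqualified outlets (score < 1)."""
    seen = set()
    firsts = []
    # sorted() is stable, so records from the same year keep their input order
    for rec in sorted(records, key=lambda r: r["year"]):
        if rec["outlet"] not in seen:
            seen.add(rec["outlet"])
            firsts.append(rec)
    return [rec for rec in firsts if rec["score"] >= 1]


# Hypothetical toy data: outlets A and B were tested in both years.
records = [
    {"outlet": "A", "year": 2016, "score": 7.5},
    {"outlet": "A", "year": 2017, "score": 8.0},
    {"outlet": "B", "year": 2016, "score": 0.0},
    {"outlet": "B", "year": 2017, "score": 6.0},
    {"outlet": "C", "year": 2017, "score": 0.0},
    {"outlet": "D", "year": 2017, "score": 5.0},
]
kept = first_observation_scored(records)
print([(rec["outlet"], rec["year"]) for rec in kept])
```

Keeping one observation per outlet removes the dependence between repeated measurements of the same outlet, at the cost of discarding some data.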
More discoveries
• BV had incorrectly merged the lowest (“green”) and the
highest (“rotten”) category of “ripening”
• BV had mapped scores like “7–” and “7+” to 7. We tried
“7 +/– epsilon” for various choices of epsilon (0.01, 0.1).
Tiny changes had dramatic effects on the statistical
significance of “distance from Rotterdam” and “Atlantic”
• “Rotterdam” went out, “Atlantic” came in!
• Attempts to improve the model (e.g., interaction between
the most important variables) failed due to multicollinearity
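The grade recoding described above can be sketched like this. The function name `parse_grade` is hypothetical, and the sketch assumes grades arrive as strings such as "7", "7+" or "7–". With epsilon = 0 it reproduces the mapping attributed to BV, in which "7–" and "7+" both become 7.

```python
def parse_grade(text, epsilon):
    """Turn a grade such as "7", "7+" or "7–" into a number.

    A trailing plus or minus moves the grade by epsilon; epsilon = 0
    collapses "7–" and "7+" onto 7.
    """
    # Normalise en dash and minus sign to a plain hyphen
    text = text.strip().replace("\u2013", "-").replace("\u2212", "-")
    if text.endswith("+"):
        return float(text[:-1]) + epsilon
    if text.endswith("-"):
        return float(text[:-1]) - epsilon
    return float(text)


for eps in (0.0, 0.01, 0.1):
    print(eps, [parse_grade(g, eps) for g in ("7–", "7", "7+")])
```

Refitting the regression on the recoded scores for each epsilon is then a cheap sensitivity check on the conclusions.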
Conclusions for
Dutch new herring
• The data was not a random sample. It was self-selected.
Most participants were new participants and often came
from areas where the AD herring test was little known
• These newcomers often performed poorly
• There is absolutely no evidence of bias toward Atlantic
outlets
• There is little evidence for a regional bias
• “Atlantic” is the best!
Take home messages
• In linear regression analysis where multicollinearity is present,
regression estimates are highly sensitive to small perturbations in
model specification
• Too much applied statistics applies a conventional method-of-choice
to a data-set without any thought about the data-generation
mechanism
• What I learned from ABGK: doing science is fun and it’s a social activity
• Scandinavian statistics has found the right balance between theory and
application
• Thanks Per (and also Niels and Ørnulf!)
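The first take-home message can be illustrated with a toy calculation that has nothing to do with the herring data. Two nearly identical predictors are fitted by least squares without an intercept, and then every response is moved by 0.05, about the size of the “7 +/– epsilon” recoding. All numbers and names below are made up for the illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def ols_two_predictors(x1, x2, y):
    """Least squares for y ~ b1*x1 + b2*x2 (no intercept), via the normal equations."""
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    t1, t2 = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 * s12  # close to zero when x1 and x2 are nearly collinear
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.01, 1.99, 2.99, 4.01]         # x1 plus a tiny wiggle: almost collinear
y = [2.01, 3.99, 5.99, 8.01]          # exactly x1 + x2, so the true coefficients are (1, 1)
y_shifted = [2.06, 3.94, 5.94, 8.06]  # every response moved by only 0.05

b_original = ols_two_predictors(x1, x2, y)
b_shifted = ols_two_predictors(x1, x2, y_shifted)
print([round(b, 3) for b in b_original])
print([round(b, 3) for b in b_shifted])
```

The first fit recovers coefficients of about (1, 1). After the small shift they become about (−4, 6), so the sign of the first coefficient flips while the fitted values barely move.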
[Map labels: Delft, Leiden, Apeldoorn, Schiphol, Rotterdam, Utrecht, plus Scheveningen]
Vollaard’s anonymised & combined data set
AD Herring Test websites 2016, 2017
AD identification of Vollaard’s vendors
R packages: tidyverse, dplyr, ggplot2, cbsodataR, ggmap
[Figures: maps titled “Vendors of 2016 and 2017, Dutch New Herring”, drawn over population density by municipality (legend breaks 54.6, 148.4, 403.4, 1096.6, 2981.0), with legends for year (2016, 2017), score^2 (0 to 100) and (11 − score)^2 (30 to 120).]
[Figures: semivariance (about 2 to 10) plotted against distance (0.5 to 1.0), and a map over x from 4 to 7 and y from 51.0 to 53.0.]
https://rpubs.com/nabilabd/118172
Introduction to Kriging in R
Nabil A.
October 14, 2015
Variogram (empirical, smooth); result of kriging fitted model
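For readers unfamiliar with the empirical variogram, here is a minimal sketch of the classical estimator: for each distance bin, take half the average squared difference between values at pairs of locations whose separation falls in that bin. It is not the code behind the slides, which followed the R tutorial cited above. The function name, the plain Euclidean distance on raw coordinates, and the toy data are all assumptions.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, values, bin_width):
    """Semivariance per distance bin: half the mean squared difference over pairs.

    Bin k collects the pairs whose distance h satisfies
    k * bin_width <= h < (k + 1) * bin_width.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])  # Euclidean distance, Python 3.8+
            k = int(h // bin_width)
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return {k: sums[k] / (2 * counts[k]) for k in sorted(counts)}


# Toy data: three locations on a line with values 1, 2 and 4.
gamma = empirical_semivariogram([(0, 0), (1, 0), (2, 0)], [1.0, 2.0, 4.0], bin_width=1.0)
print(gamma)
```

A semivariance that grows with distance suggests that nearby outlets score more alike than distant ones. Kriging then uses a smooth model fitted to these binned values.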

  • 1. Richard Gill, Mathematical Institute, Leiden University. 16 November, 2022 The Dutch new herring scandals Repeated measurements with unintended feedback
  • 2.
  • 3. At some point in the latter half of the 14th century Willem Beukelszoon began to go about things differently to other fi shermen or fi sh curers. He is credited with discovering something called kaken, or “gibbing”: a method of gutting and deboning herring that left parts of its stomach and another of its internal organs, the pyloric caecae, intact. Removing the guts and bones took away the bits that would begin to rot fi rst, whilst the remaining pyloric caecae would continue to emit an enzyme called trypsin. The herring would then be thrown into brine and basically, pickle in its own juices.
  • 4. Smell them and weep Netherlands fi shmongers accuse herring-tasters of erring Can the Dutch still trust their herring-tasters? Print edition | Europe Nov 23rd 2017 | AMSTERDAM HERRING (genus Clupea, with four species found in the Baltic and North Seas) have been vital to northern Europe’s economy since the Middle Ages, when fi shermen worked out how to preserve them in brine. Every north European country maintains that there is a right way to eat the fi sh, but they differ as to what it is. In Sweden Baltic surströmming are fermented until slightly rancid. In Denmark the sill are pickled, or cooked and eaten in long strips. In the Netherlands haring must be lightly salted for preservation but otherwise raw, dipped in minced onion and accompanied with a pickle. No food is more loved. So the Dutch were shocked when accusations surfaced in November that there was something rotten about the national herring test. The test, sponsored by the Algemeen Dagblad, a newspaper, is carried out by two expert tasters, who each year rate the herring at over a hundred shops and stands across the country. Ben Vollaard, an economist at Tilburg University, was surprised when his respected local fi shmonger scored zero. The merchant told Mr Vollaard that one judge routinely tipped the scales, giving higher scores to stores that get their fi sh from the Atlantic Group, a distributor in Scheveningen. The judge happened to be a consultant for Atlantic, giving courses on how to slice and serve herring. “I saw how much damage a low rating could do. The judges act like God,” says Mr Vollaard, who specialises in using statistics to detect crime. He decided to run the numbers. The ratings include objective criteria, like weight and fattiness, and subjective ones such as taste and appearance. The economist contacted 85% of the shops surveyed in the past two years and asked who their distributors were. 
He found that whereas the overall average score was 5.5, the average for those supplied by Atlantic was 8.7. The extra boost for the Atlantic stores came mainly from the subjective scores. Mr Vollaard’s study has blown the lid off the sealed world of Dutch herring. Fishmongers who long suspected the judge of bias towards Atlantic now say the test is rotten. Two who received low ratings have vowed to sue the Algemeen Dagblad for defamation. The judge and Atlantic say they have been smeared, and that the statistical evidence is a red herring. They say Mr Vollaard’s fi gures are off, and that their high scores are due to their superior fi sh. But the charges of belangenverstrengeling (con fl ict of interest) have left the test’s reputation for impartiality gutted. This article appeared in the Europe section of the print edition under the headline "Failing the smell test" Print edition | Europe Nov 23rd 2017 | AMSTERDAM
  • 5. • Celebrating many years of collaboration with Per Kragh Andersen and Ørnulf Borgan, whose retirement party I had to miss (pandemic) • In loving memory of the leader of the pack, Niels Keiding • May the royalty checks for “the weightlifter’s guide to counting processes” (ABGK) continue to come in for many years to come! ABGK for ever!
  • 11. • Pitfalls of amateur regression: The Dutch New Herring controversies • Fengnan Gao & Richard D. Gill • https://arxiv.org/abs/2104.00333 • Submitted to SJS • Previously rejected by Statistica Neerlandica, Life Data Analysis, and JRSS(A) • SJS: “The journal specializes in statistical modelling showing particular appreciation of the underlying substantive research problems” • “The emergence of specialized methods for analysing longitudinal and spatial data is just one example of an area of important methodological development in which the Scandinavian Journal of Statistics has a particular niche” Longitudinal and spatial data with enormous societal impact Dutch New Herring
  • 12. • AD: Algemeen Dagblad, a popular Dutch daily newspaper, based in Rotterdam • The Herring Test: a yearly ranking of consumer outlets of “Dutch New Herring” • Dutch New Herring (“Hollandse maatjes”): an EU protected designation for herring prepared according to ancient Dutch tradition • Ordinary consumers (readers of AD) nominated their local favourite food market stalls, supermarkets, fish shops; highly ranked outlets were automatically included in the next year’s test • A high ranking was like a Michelin star. A low ranking was the kiss of death. Was the AD Herring Test biased? The substantive research problem
  • 13. Before we continue, introducing my co-author Fengnan Gao, Shanghai
  • 14. Fengnan Gao, Shanghai (photographed in Venice)
  • 15. The person who started this all “I heard something strange about the AD Herring Test. I pulled the data together to see how it all worked. Things came out of that that I thought, ‘this can't be true’. The test panel had a business interest in a herring supplier, Atlantic. Coincidence or not, whoever got the fish from Atlantic, scored significantly higher. I am proud of the fact that my research has helped put an end to this.” Ben Vollaard (“BV”), Tilburg
  • 16. • BV published statistical analyses as two “working papers” on his university web pages • Tilburg University put out a press release, both times • BV appeared on current affairs chat shows on national TV; the results were reported in the newspapers • The AD cancelled the “AD Herring Test” and brought in lawyers, making complaints to BV and to his university. Aggrieved herring outlets started legal action against AD. • The AD’s lawyer hired me … Answer: PR activities of his university How did BV’s work reach The Economist?
  • 17. • Atlantic’s “Dutch New Herring” is objectively the best, by the well-known criteria that the AD Herring Testers use to evaluate Dutch new herring • Well-cleaned • Prepared on site • Neither too “ripe” [matured] nor too “unripe” (cf. venison, whisky, cheese) • Temperature not too cold, not too warm (cf. hygiene regulations) • No microbiological contamination • High fat percentage, high weight (per fish) • Atlantic’s Dutch New Herring is also the most expensive! (Euros per gram). Coincidence? Coincidence? No: Atlantic’s Dutch New Herring is the best!
  • 18. No: Atlantic’s Dutch New Herring is the best! Coincidence? • Acknowledgement of conflict of interest: I was paid by AD to help them in their complaint of violation of scientific integrity against Ben Vollaard • I claim that I gave them my independent scientific opinion on their case • I argued that econometrician Vollaard was incompetent. It is not for me to judge his integrity • The Netherlands national organ for evaluation of complaints of violation of scientific integrity ruled that Vollaard was innocent, but that the scientific issues needed further research • Hence I continued my research with essential collaboration of Fengnan Gao (replication of entire research project, including data extraction from original sources)
  • 19. Vollaard’s basic analysis
                           Estimate   Std.Error  t value  Pr(>|t|)
      Intercept            4.139005   0.727812    5.687   3.31e-08 ***
      weight (grams)       0.039137   0.009726    4.024   7.41e-05 ***
      temp
        < 7 deg            reference category
        7 -- 10           -0.685962   0.193448   -3.546   0.000460 ***
        > 10 deg          -1.793139   0.223113   -8.037   2.77e-14 ***
      fat
        < 10%              reference category
        10--14%            0.172845   0.197387    0.876   0.381978
        > 14%              0.581602   0.250033    2.326   0.020743 *
      fresh                1.817081   0.200335    9.070   < 2e-16  ***
      micro
        very good          reference category
        adequate          -0.161412   0.315593   -0.511   0.609443
        bad               -0.618397   0.448309   -1.379   0.168897
        warning           -0.151143   0.291129   -0.519   0.604067
        reject            -2.279099   0.683553   -3.334   0.000973 ***
      ripeness
        mild               reference category
        average           -0.377860   0.336139   -1.124   0.261947
        strong            -1.930692   0.386549   -4.995   1.05e-06 ***
        rotten            -4.598752   0.503490   -9.134   < 2e-16  ***
      cleaning
        very good          reference category
        good              -0.983911   0.210504   -4.674   4.64e-06 ***
        poor              -1.716668   0.223459   -7.682   2.79e-13 ***
        bad               -2.761112   0.439442   -6.283   1.30e-09 ***
      yr2017               0.208296   0.174740    1.192   0.234279
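To make the coefficient table concrete, here is a minimal sketch of how a fitted score follows from it. The coefficients are copied from the table above; the two example outlets (their weights of 80 g and 60 g and their category values) are hypothetical, and treating "fresh" and "yr2017" as 0/1 indicators is an assumption based on the variable names.

```python
# Coefficients copied from the regression table on slide 19.
# Reference categories carry an effect of 0.
COEF = {
    "intercept": 4.139005,
    "weight": 0.039137,  # per gram
    "temp": {"<7": 0.0, "7-10": -0.685962, ">10": -1.793139},
    "fat": {"<10%": 0.0, "10-14%": 0.172845, ">14%": 0.581602},
    "fresh": 1.817081,  # assumed 0/1 indicator
    "micro": {"very good": 0.0, "adequate": -0.161412, "bad": -0.618397,
              "warning": -0.151143, "reject": -2.279099},
    "ripeness": {"mild": 0.0, "average": -0.377860,
                 "strong": -1.930692, "rotten": -4.598752},
    "cleaning": {"very good": 0.0, "good": -0.983911,
                 "poor": -1.716668, "bad": -2.761112},
    "yr2017": 0.208296,  # assumed 0/1 indicator
}


def predicted_score(weight, temp, fat, fresh, micro, ripeness, cleaning, year):
    """Linear predictor: intercept plus the effect of each covariate."""
    return (COEF["intercept"]
            + COEF["weight"] * weight
            + COEF["temp"][temp]
            + COEF["fat"][fat]
            + COEF["fresh"] * (1 if fresh else 0)
            + COEF["micro"][micro]
            + COEF["ripeness"][ripeness]
            + COEF["cleaning"][cleaning]
            + COEF["yr2017"] * (1 if year == 2017 else 0))


# Hypothetical outlets, for illustration only.
best = predicted_score(80, "<7", ">14%", True, "very good", "mild", "very good", 2016)
worst = predicted_score(60, ">10", "<10%", False, "reject", "rotten", "bad", 2016)
print(round(best, 3), round(worst, 3))
```

For the second hypothetical outlet the linear predictor is negative, although the test score lives on a 0 to 10 scale.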
  • 20. • Positive effect, significant at 5% level, size of effect ca. 0.5 • Strangely, he also had a covariate “among top 10” • It was also rather significant, positive Add indicator variable “< 30 km from Rotterdam” Vollaard’s first paper
  • 21. • He did not add it as a covariate • Instead, he showed that, according to his fitted model, the difference between “Atlantic” outlets and the rest is mainly due to the subjective variables “ripeness” and “cleaning”. Two objective variables, “temperature” and “being freshly served”, have some importance, but less; and the other objective variables are not important at all. New variable “outlet supplied by Atlantic” Vollaard’s second paper
  • 22. From report 2 (translated from Dutch): “… Atlantic as supplier score 9.1 on average. The other outlets score a fail, on average just under 5.5. We see little difference in average score among the other suppliers. Figure 1. Average score of outlets on the herring test 2016/17, by supplier. This difference in final score may be amplified if the group that does not have Atlantic as supplier contains relatively many rotten apples: businesses that really do sell herring that should not be sold. These rotten apples could pull the average down sharply. To investigate this we use the microbiological condition of the herring as determined by the laboratory and reported by the …” [Figure 1: two bar charts of score on the herring test (0 to 10), Atlantic as supplier versus other supplier. All outlets (n1 + n2 = 292): 3.6 points difference. Only outlets with herring of adequate microbiological condition (n1 + n2 = 292 − 37): 3.3 points difference.]
  • 23. Table 1. Why outlets that do not have Atlantic as supplier score lower on the herring test (translated from Dutch; the table is cut off after the last row shown)
      Explanatory factor                     Difference*   Unit    Effect on final score (0-10)
      Weight                                   -2.17       gram     -0.09
      Microbiological condition
        (very) good                            reference category
        adequate                                0.07       share    -0.01
        bad                                     0.03       share    -0.02
        warning phase                           0.09       share    -0.01
        rejected                                0.02       share    -0.04
        jointly                                                     -0.07
      Fat percentage
        below 10%                              reference category
        between 10 and 14%                     -0.21       share    -0.04
        above 14%                              -0.02       share    -0.01
        jointly                                                     -0.05
      Temperature
        below 7 °C                             reference category
        between 7 and 10 °C                     0.29       share    -0.17
        above 10 °C                             0.21       share    -0.36
        jointly                                                     -0.52
      Freshly served (“fresh from the knife”)
        no                                     reference category
        yes                                    -0.29       share    -0.51
      Cleaning
        very good                              reference category
        good                                    0.27       share    -0.22
        poor                                    0.25       share    -0.40
        bad                                     0.04       share    -0.12
        jointly                                                     -0.73
      Ripening
        mild                                   reference category
        average                                -0.24       share     0.09
        strong                                  0.34       share    -0.63
        rotten                                  0.06       share    -0.25
        jointly                                                     -0.80
      Region
        within 30 km of Rotterdam              reference category
        more than 30 km from Rotterdam          0.60       share    -0.18
      Top 10
        top 10                                 reference category
      * Difference between outlets with and without Atlantic as supplier
  • 24. But how good is the model? [Figure: the four standard diagnostic plots for lm(finalscore ~ weight + (factor(temp_cat)) + (factor(fat_cat)) + fresh_num ...): Residuals vs Fitted, Normal Q−Q, Scale−Location, and Residuals vs Leverage (with Cook's distance contours). Observations 229, 82 and 116 are flagged in the first three panels; 229, 224 and 206 in the last.]
  • 25. [Figure: histogram of final test score (0 to 10); vertical axis: number of outlets]
  • 26. The model was wrong • The testers gave a score “zero” if the fish should not have been sold • temperature > 10 deg C • microbiological contamination = health danger • maturity = rotten
  • 27. More difficulties • Two years’ data combined, some? many? outlets in both years’ data • Misclassification of “Atlantic” supplied outlets (deduced from inconsistency of summary statistics: one outlet got a “6” in 2016) • Vollaard tested the effect of “near to Rotterdam” with a dummy variable, but the effect of “supplied by Atlantic” with a faulty subject matter argument (according to him, maturity and microbiology should have been the same). But: “ripening” is a chemical process, “contamination” is biological • Dummy variable for “Atlantic” was not significant!
  • 28. • Vollaard would not give us the un-anonymised data • AD gave us their classification of “Atlantic” • We went back to the original data
  • 29. Better model • Score zero = “disqualified” • We study outlets with score ≥ 1 • We omit outlets’ 2nd observation if they are tested in both years • Result: much better fit, much smaller standard error of residuals • Overall picture unchanged
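The data selection behind the "better model" can be sketched as follows. This is only an illustration: the record layout (`outlet`, `year`, `score`) and the example records are hypothetical, and the order of the two rules is an assumption (here an outlet's second observation is dropped even when its first one was a disqualifying zero).

```python
def select_for_refit(records):
    """Keep first observations only, and among those only scores >= 1."""
    seen = set()
    kept = []
    # Visit earlier years first so the first observation per outlet wins.
    for rec in sorted(records, key=lambda r: r["year"]):
        if rec["outlet"] in seen:
            continue  # second observation of an outlet tested in both years
        seen.add(rec["outlet"])
        if rec["score"] >= 1:  # a score of zero means "disqualified"
            kept.append(rec)
    return kept


# Hypothetical records, for illustration only.
records = [
    {"outlet": "A", "year": 2016, "score": 8.0},
    {"outlet": "A", "year": 2017, "score": 9.0},
    {"outlet": "B", "year": 2016, "score": 0.0},
    {"outlet": "C", "year": 2017, "score": 6.5},
    {"outlet": "B", "year": 2017, "score": 5.0},
]
kept = select_for_refit(records)
print([(r["outlet"], r["year"]) for r in kept])
```

The regression would then be refitted on `kept` only, with the disqualified outlets treated as a separate outcome rather than as points on the 0 to 10 scale.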
  • 30. More discoveries • BV had incorrectly merged the lowest (“green”) and the highest (“rotten”) categories of “ripening” • BV had mapped scores like “7–” and “7+” to 7. We tried “7 +/– epsilon” for various choices of epsilon (0.01, 0.1). Tiny changes had dramatic effects on statistical significance of “distance from Rotterdam” and “Atlantic” • “Rotterdam” went out, “Atlantic” came in! • Attempts to improve model (e.g., interaction between most important variables) failed due to multicollinearity
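The "7 +/– epsilon" recoding can be sketched as a small parsing function. The function name and the exact label spellings handled are assumptions for illustration; setting `eps=0` reproduces the mapping of "7–" and "7+" to 7.

```python
def score_to_number(label, eps=0.1):
    """Map labels such as '7', '7+' and '7-' to 7, 7 + eps and 7 - eps."""
    text = label.strip()
    if text.endswith("+"):
        return float(text[:-1]) + eps
    if text.endswith(("-", "–", "−")):  # hyphen, en dash or minus sign
        return float(text[:-1]) - eps
    return float(text)


for eps in (0.0, 0.01, 0.1):
    print(eps, [score_to_number(s, eps) for s in ("7–", "7", "7+")])
```

Refitting the model once per value of `eps` and comparing the p-values of the covariates of interest is the sensitivity check the slide describes.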
  • 31. Conclusions for Dutch new herring • The data was not a random sample. It was self-selected. Most participants were new participants and often came from areas where the AD herring test was little known • These newcomers often performed poorly • There is absolutely no evidence of bias toward Atlantic outlets • There is little evidence for a regional bias • “Atlantic” is the best!
  • 32. Take home messages • In linear regression analysis where multicollinearity is present, regression estimates are highly sensitive to small perturbations in model specifications • Too much applied statistics applies a conventional method-of-choice to a data-set without any thought about the data-generation mechanism • What I learned from ABGK: doing science is fun and it’s a social activity • Scandinavian statistics has found the right balance between theory and application • Thanks Per (and also Niels and Ørnulf!)
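The first take-home message can be illustrated with a toy example, unrelated to the herring data. The sketch fits y = b1*x1 + b2*x2 by least squares, then shifts a single response by 0.05 (the same kind of tiny change as the epsilon recoding of scores) and reports how far the coefficients move. All numbers are made up for illustration.

```python
def ols_two_predictors(x1, x2, y):
    """Least-squares fit of y = b1*x1 + b2*x2 (no intercept), normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * v for a, v in zip(x1, y))
    t2 = sum(b * v for b, v in zip(x2, y))
    det = s11 * s22 - s12 * s12  # close to zero when x1 and x2 are collinear
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


def coefficient_shift(x1, x2, bump=0.05):
    """Largest change in a coefficient when the first response moves by `bump`."""
    y = [a + b for a, b in zip(x1, x2)]  # exact model with b1 = b2 = 1
    y_bumped = [y[0] + bump] + y[1:]
    before = ols_two_predictors(x1, x2, y)
    after = ols_two_predictors(x1, x2, y_bumped)
    return max(abs(p - q) for p, q in zip(before, after))


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_collinear = [1.01, 1.99, 3.01, 3.99, 5.01]  # almost a copy of x1
x2_distinct = [5.0, 3.0, 1.0, 4.0, 2.0]        # clearly different from x1

shift_collinear = coefficient_shift(x1, x2_collinear)
shift_distinct = coefficient_shift(x1, x2_distinct)
print(shift_collinear, shift_distinct)
```

With the nearly collinear pair, the one small change moves the coefficients by roughly 1 (from about (1, 1) to about (0, 2)), while with the clearly distinct pair they move by less than 0.01.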
  • 37. Vollaard’s anonymised & combined data set; AD Herring Test websites 2016, 2017; AD identification of Vollaard’s vendors. R packages: tidyverse, dplyr, ggplot2, cbsodataR, ggmap. [Figure: map “Vendors of 2016 and 2017, Dutch New Herring”; points coloured by year (2016, 2017) on population density by municipality, legend ticks 54.6, 148.4, 403.4, 1096.6, 2981.0.]
  • 40. [Figure: map “Vendors of 2016 and 2017, Dutch New Herring”; population density by municipality (legend ticks 54.6, 148.4, 403.4, 1096.6, 2981.0) and a legend for (11 − score)^2 with values 30, 60, 90, 120.]
  • 42. Variogram (empirical, smooth); result of kriging fitted model. [Figure: semivariance against distance, and a map of the kriging result with axes x (4 to 7) and y (51.0 to 53.0).] Reference: Nabil A., “Introduction to Kriging in R”, October 14, 2015, https://rpubs.com/nabilabd/118172
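For readers unfamiliar with the empirical variogram, here is a minimal sketch of the standard estimator: within each distance bin, half the average squared difference between values at pairs of locations. The function name, the equal-width binning, the plain Euclidean distance and the toy data are assumptions for illustration; the analysis in the talk was done in R.

```python
import math


def empirical_variogram(points, values, bin_width, n_bins):
    """Semivariance per distance bin: sum of squared differences / (2 * pairs)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])  # Euclidean distance
            k = int(h // bin_width)              # bin [k*width, (k+1)*width)
            if k < n_bins:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    # Bins without any pair of points have no estimate.
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


# Toy data: three locations on a line, for illustration only.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
values = [1.0, 2.0, 4.0]
gamma = empirical_variogram(points, values, bin_width=1.0, n_bins=3)
print(gamma)
```

A smooth variogram model is then fitted to these binned values, and kriging uses the fitted model to interpolate between the observed locations.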