REVIEW
Validation of the MDS Clinical Diagnostic Criteria
for Parkinson’s Disease
Ronald B. Postuma, MD, MSc,1* Werner Poewe, MD,2
Irene Litvan, MD,3
Simon Lewis, MD,4
Anthony E. Lang, OC, MD, FRCPC,5
Glenda Halliday, PhD,4
Christopher G. Goetz, MD,6
Piu Chan, MD, PhD,7
Elizabeth Slow, MD FRCPC,5
Klaus Seppi, MD,2
Eva Schaffer, MD,8
Silvia Rios-Romenets, MD,1
Taomian Mi, MD PhD,7
Corina Maetzler, PhD,8
Yuan Li, MD PhD,7
Beatrice Heim, MD,2
Ian O. Bledsoe, MD9
and Daniela Berg, MD8*
1 Department of Neurology, Montreal General Hospital, Montreal, Quebec, Canada
2 Department of Neurology, Innsbruck Medical University, Innsbruck, Austria
3 Department of Neurosciences, UC San Diego, La Jolla, California, USA
4 Brain and Mind Centre, Sydney Medical School, Camperdown, Australia
5 Morton and Gloria Shulman Movement Disorders Clinic and the Edmond J. Safra Program in Parkinson’s Disease, Toronto Western Hospital, Toronto, Ontario, Canada
6 Rush University Medical Center, Chicago, Illinois, USA
7 Department of Neurobiology and Neurology, Xuanwu Hospital of Capital Medical University, Beijing, People’s Republic of China
8 Klinik für Neurologie, UKSH, Campus Kiel, Christian-Albrechts-Universität, Kiel, Germany
9 University of California, San Francisco, CA, USA
ABSTRACT: Background: In 2015, the International Parkinson and Movement Disorder Society published clinical diagnostic criteria for Parkinson’s disease (PD). These criteria aimed to codify/reproduce the expert clinical diagnostic process and to help standardize diagnosis in research and clinical settings. Their accuracy compared with expert clinical diagnosis has not been tested. The objectives of this study were to validate the International Parkinson and Movement Disorder Society diagnostic criteria against a gold standard of expert clinical diagnosis, and to compare the concordance/accuracy of the International Parkinson and Movement Disorder Society criteria with that of the 1988 United Kingdom Brain Bank criteria.
Methods: From 8 centers, we recruited 626 parkinsonism patients (434 PD, 192 non-PD). An expert neurologist diagnosed each patient as having PD or non-PD, regardless of International Parkinson and Movement Disorder Society criteria (gold standard, clinical diagnosis). Then a second neurologist evaluated the presence/absence of each individual item from the International Parkinson and Movement Disorder Society criteria. The overall accuracy/concordance rate, sensitivity, and specificity of the International Parkinson and Movement Disorder Society criteria compared with the expert gold standard were calculated.
Results: Of 434 patients diagnosed with PD, 94.5% met the International Parkinson and Movement Disorder Society criteria for probable PD (5.5% false-negative rate). Of 192 non-PD patients, 88.5% were identified as non-PD by the criteria (11.5% false-positive rate). The overall accuracy for probable PD was 92.6%. In addition, 59.3% of PD patients and only 1.6% of non-PD patients met the International Parkinson and Movement Disorder Society criteria for clinically established PD. In comparison, United Kingdom Brain Bank criteria had lower sensitivity (89.4%, P = 0.008), specificity (79.2%, P = 0.018), and overall accuracy (86.4%, P < 0.001). Diagnostic accuracy did not differ according to age or sex. Specificity improved as disease duration increased.
Conclusions: The International Parkinson and Movement Disorder Society criteria demonstrated high sensitivity and specificity compared with the gold standard, expert diagnosis, with sensitivity and specificity both higher than those of the United Kingdom Brain Bank criteria. © 2018 International Parkinson and Movement Disorder Society
Key Words: Parkinson’s disease; diagnosis; criteria
*Correspondence to: Dr. Daniela Berg, Klinik für Neurologie, UKSH, Campus Kiel, Christian-Albrechts-Universität, Kiel, Germany; daniela.berg@uksh.de, or Dr. Ronald B. Postuma, Department of Neurology, L7-305 Montreal General Hospital, 1650 Cedar Avenue, Montreal, Canada H3G1A4; ron.postuma@mcgill.ca
Relevant conflicts of interest/financial disclosures: The authors have
no financial disclosures in relationship to this article.
Funding agencies: This study was funded by the Michael J. Fox Foundation. The study design was discussed with members of the Michael J. Fox Foundation.
Received: 27 October 2017; Revised: 22 December 2017; Accepted: 5 February 2018
Published online 00 Month 2018 in Wiley Online Library
(wileyonlinelibrary.com). DOI: 10.1002/mds.27362
Movement Disorders, Vol. 00, No. 00, 2018 1
In 2015, a task force from the International Parkinson and Movement Disorder Society (MDS) published “Clinical Diagnostic Criteria for Parkinson’s disease” (MDS-PD-CDC).1–3 The goal of the criteria was to help standardize clinical diagnosis, both for research (eg, clinical trials) and for clinical practice. In designing the criteria, the task force members noted that no reliable objective test for PD is currently available. Therefore, expert opinion remains the gold standard for diagnosis during life. Accordingly, the clinical criteria were designed to mimic and codify the diagnostic process of an expert clinician.
Although data from published studies were used in the design of the criteria, the specific decisions about which criteria should be included, how they should be defined, how they should be weighted, and so forth were made in an iterative process based primarily on expert opinion. Therefore, these criteria require validation. In this multicenter study, we determined to what degree the MDS clinical criteria accurately reproduced the gold standard, expert clinical diagnosis. We also compared the concordance/accuracy of the MDS criteria and the 1988 UK Brain Bank criteria.4
Methods
PD Criteria Description
The MDS PD criteria define a 2-step diagnostic process.1 First, parkinsonism is identified, defined as bradykinesia with decrement (“fatiguing”) on repetitive movement, in combination with rigidity, rest tremor, or both. Second, PD is determined as the cause of the parkinsonism. The second step uses 3 classes of diagnostic features: supportive criteria (features typical of PD, which are seen in many but not all patients), absolute exclusions (features that rule out a diagnosis of probable PD), and red flags (features that argue against PD but are not sufficient to rule out probable PD). Red flags can be balanced by supportive criteria, such that documentation of a supportive criterion can offset a red flag.
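The two-step logic can be summarized as a short decision function. This is a sketch, not an implementation of the published checklist: the class and field names are hypothetical, and the exact counting thresholds (no red flags and at least 2 supportive criteria for clinically established PD; no more than 2 red flags, each counterbalanced by a supportive criterion, for probable PD) come from our reading of the 2015 criteria publication (reference 1) rather than from this section.

```python
from dataclasses import dataclass


@dataclass
class ChecklistResult:
    # Hypothetical field names for one patient's item checklist.
    has_parkinsonism: bool    # step 1: bradykinesia plus rigidity and/or rest tremor
    absolute_exclusions: int  # number of absolute exclusions present
    red_flags: int            # number of red flags present
    supportive: int           # number of supportive criteria present


def classify(r: ChecklistResult) -> str:
    """Return the highest certainty level met: 'clinically established PD',
    'probable PD', or 'not PD'. Clinically established PD also satisfies
    probable PD, so counts of 'met probable' should include both levels."""
    # Step 1: the PD-specific rules apply only once parkinsonism is documented.
    if not r.has_parkinsonism:
        return "not PD"
    # Step 2a: any absolute exclusion rules out PD.
    if r.absolute_exclusions > 0:
        return "not PD"
    # Step 2b: highest certainty requires no red flags and >= 2 supportive criteria.
    if r.red_flags == 0 and r.supportive >= 2:
        return "clinically established PD"
    # Step 2c (assumed balancing rule): at most 2 red flags, each offset
    # by at least one supportive criterion.
    if r.red_flags <= 2 and r.supportive >= r.red_flags:
        return "probable PD"
    return "not PD"


print(classify(ChecklistResult(True, 0, 1, 1)))  # prints: probable PD
```

Returning only the highest level met keeps the two certainty categories nested, which matches how the Results report them (clinically established PD is a subset of probable PD).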
Two levels of certainty were prespecified. The primary level, probable PD, aims to balance sensitivity and specificity, such that at least 80% of PD patients would meet criteria (sensitivity ≥ 80%) and at least 80% of non-PD patients would be excluded by the criteria (specificity ≥ 80%). An additional category, clinically established PD, was created to maximize specificity during life (ie, specificity ≥ 90%), without regard to sensitivity.
All movement disorders specialists who contributed to the MDS clinical criteria program1 were eligible to participate in the validation study. Prerequisites for participation were availability of an expert in the clinical diagnosis of PD, a second neurologist to administer the diagnostic criteria checklist, access to a sufficient number of individuals with parkinsonism, and ethics approval.
Patients
All patients with a diagnosis of parkinsonism1 were eligible for this study. Exclusion criteria were minimized to enhance generalizability and representativeness. Centers were instructed to recruit the entire range of parkinsonism seen in their clinics, with a PD:non-PD ratio of approximately 2:1 (convenience sampling allowed). Centers were instructed not to exclude patients whose diagnosis was uncertain. Exclusion criteria included:
1. Patients without documented parkinsonism (or with a single isolated cardinal sign, such as isolated rest tremor): although nonparkinsonian conditions such as dystonia or essential tremor can occasionally be incorrectly diagnosed as PD, the primary focus of this validation was the differential diagnosis of parkinsonism (PD vs non-PD).
2. Patients with dementia severe enough to possibly preclude appropriate informed consent (here defined as a Mini-Mental State Examination [MMSE] score < 22): patients with mild-moderate dementia were included if they had a spouse or caregiver present to ensure that the patient’s interests were protected and to prevent inaccurate data collection. Note that according to the task force definition, a diagnosis of dementia with Lewy bodies does not exclude a potential PD diagnosis.3,5
Each center’s local research ethics board approved
the study protocol, and informed written consent was
obtained from all patients.
For each patient, 2 study evaluations were conducted by separate investigators, who did not share their information or conclusions:
1. Gold-standard diagnostic evaluation: This determination was made by a neurologist with >10 years’ experience in PD diagnosis, who was generally the patient’s treating physician. All evaluators were familiar with the definition of PD as outlined by the task force.3 This evaluator made the gold-standard diagnosis using all relevant clinical information available over the entire disease course, including any ancillary tests that were available (without using the MDS criteria). In addition to making the primary diagnosis, the gold-standard evaluators also estimated their percent certainty of PD versus non-PD status.
2. Item criteria evaluation: This assessment was conducted by a second neurologist (generally a movement disorders fellow or junior faculty member) who was trained in the application of the criteria. All criteria from the MDS-PD-CDC and from the UK Brain Bank4 were assessed. These evaluators conducted
the assessments without being required to refer to the expert neurologist’s diagnosis and were not asked to provide their diagnostic opinion; rather, they were simply instructed to evaluate whether each individual criterion (from a checklist) was met. The evaluator used both interview and in-person examination to document criteria. In addition, chart review could be performed to document features not available on current history and examination. Based on this review, each individual criterion was scored by the rater as present or absent. Completing the checklist took approximately 15 minutes for each set of criteria.
As olfactory testing is included as a supportive criterion in the MDS-PD criteria, it was tested systematically in this study (10 patients also had metaiodobenzylguanidine [MIBG] scintigraphy results available). Centers used either the 12-item Brief Smell Identification Test version A (BSIT)6,7 or the Sniffin Sticks.8 Age-standardized normal values were taken from the published literature.6,8 Abnormal olfaction was defined as a score below the 10th percentile for age. The Sniffin Sticks normative data combine all patients > 55 years into 1 category; therefore, for consistency, the same standardization was applied to the BSIT (ie, cutoff of 6). In a secondary analysis, the >55 age group was further subdivided for the BSIT. Blood pressure was measured in the supine position and then reassessed after 3 minutes of standing.
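The supine and standing blood pressures feed the orthostatic hypotension component of the severe autonomic failure red flag (see Table 4). A minimal sketch of that check follows. It assumes the thresholds given in the 2015 criteria publication (a fall of at least 30 mm Hg systolic or 15 mm Hg diastolic within 3 minutes of standing), which are not restated in this section, and the function name is hypothetical. The red flag itself also requires that the autonomic failure occur within the first 5 years of disease, which the function does not check.

```python
def orthostatic_hypotension(supine_sys: float, supine_dia: float,
                            standing_sys: float, standing_dia: float,
                            sys_drop: float = 30.0, dia_drop: float = 15.0) -> bool:
    """Return True if the supine-to-standing fall meets either threshold.

    All pressures are in mm Hg, with standing values taken after 3 minutes.
    The default thresholds are an assumption based on the 2015 criteria
    wording; clinical exclusions (dehydration, medication effects, other
    explanatory disease) are not modeled here.
    """
    systolic_fall = supine_sys - standing_sys
    diastolic_fall = supine_dia - standing_dia
    return systolic_fall >= sys_drop or diastolic_fall >= dia_drop


print(orthostatic_hypotension(140, 85, 105, 80))  # True: 35 mm Hg systolic fall
```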
Data Analysis
The primary outcome was the accuracy/concordance of the MDS criteria relative to the gold standard, expert clinical diagnosis. Sensitivity and specificity, as well as kappa agreement, were calculated for each set of criteria. Comparisons between MDS and UK Brain Bank criteria were tested with the Fisher exact test. The prevalence of each individual criterion was assessed in the PD and non-PD groups. Prespecified secondary analyses included analysis of those with high (>80%) versus low (≤80%) certainty of PD and those with short (<5 years) versus longer (≥5 years) disease duration. Additional secondary analyses assessed the effect of age and sex, and analyses testing different permutations of the criteria were performed (see Results section). Statistical tests were performed with SPSS software, version 22.
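All of these agreement measures can be computed from the 2 × 2 table of gold-standard diagnosis against criteria classification. The sketch below assumes Cohen’s kappa computed from the table margins, and uses counts back-calculated from the percentages reported in the Results (an assumption: 410/434 PD and 170/192 non-PD correctly classified by MDS probable PD criteria; 388/434 and 152/192 by UK Brain Bank criteria). With these counts the kappa values come out at about 0.828 and 0.680, in line with those reported; accuracy may differ from the reported figures in the last digit because of rounding.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity, specificity, accuracy, and Cohen's kappa from a 2x2 table.

    Rows are the gold-standard diagnosis (PD / non-PD); columns are whether
    the criteria classify the patient as PD.
    """
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n  # observed agreement, p_o
    # Chance agreement p_e from the row (gold standard) and column (criteria) totals.
    gold_pd, gold_non = tp + fn, tn + fp
    crit_pd, crit_non = tp + fp, tn + fn
    p_e = (gold_pd * crit_pd + gold_non * crit_non) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return sensitivity, specificity, accuracy, kappa


# Counts back-calculated from the reported percentages (assumption).
mds = diagnostic_metrics(tp=410, fn=24, tn=170, fp=22)
ukbb = diagnostic_metrics(tp=388, fn=46, tn=152, fp=40)
for name, (se, sp, acc, k) in (("MDS", mds), ("UK Brain Bank", ukbb)):
    print(f"{name}: sens={se:.1%} spec={sp:.1%} acc={acc:.1%} kappa={k:.3f}")
```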
Results
Patient Characteristics
Overall, we recruited 626 patients: 434 with PD and 192 with non-PD parkinsonism (Table 1). Age and sex were similar between groups. As expected, disease duration from diagnosis was shorter in the non-PD group (2.8 vs 6.3 years). In the non-PD group, the commonest conditions were multiple system atrophy (38%), progressive supranuclear palsy (33%), and corticobasal syndrome (10%); see Supplementary Table 1.
Overall Performance of MDS Criteria
Of the 434 patients diagnosed with PD by the gold-standard expert, 94.5% met MDS criteria for probable PD (ie, 5.5% false-negative rate). Of the non-PD patients, 88.5% were identified as non-PD by the criteria (ie, 11.5% false-positive rate). Combining all patients, the overall accuracy for probable PD was 92.6% compared with the gold standard, expert clinical diagnosis. In addition, 59.3% of PD patients met MDS criteria for clinically established PD. Only 1.6% of non-PD patients met clinically established criteria.
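Because overall accuracy pools both groups, it is the sample-weighted average of sensitivity (Se) and specificity (Sp), and therefore depends on the roughly 2:1 PD:non-PD case mix. Using counts back-calculated from the reported percentages (an assumption: 410 of 434 PD patients and 170 of 192 non-PD patients classified correctly):

$$
\text{Accuracy} = \frac{\text{Se}\cdot N_{\text{PD}} + \text{Sp}\cdot N_{\text{non-PD}}}{N_{\text{PD}} + N_{\text{non-PD}}}
= \frac{410 + 170}{434 + 192} = \frac{580}{626} \approx 0.9265
$$

A sample with a different case mix would shift overall accuracy toward whichever of sensitivity or specificity applies to the larger group.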
In comparison, the UK Brain Bank criteria had both lower sensitivity and lower specificity. Of PD patients, 89.4% met UK Brain Bank criteria for PD (10.6% false-negatives; P = 0.008 vs MDS criteria), and 79.2% of non-PD patients were identified as non-PD by UK Brain Bank criteria (20.8% false-positives; P = 0.018). Overall accuracy was 86.4%, significantly lower than the MDS rate (P < 0.001). Removing the dementia exclusion from the UK Brain Bank criteria (to account for definition changes3,5) made little difference to the results (false-negatives, 11.4%; false-positives, 21.4%). A high proportion of PD patients (83.9%) met UK Brain Bank criteria for definite PD, with a false-positive rate of 8.9% (P = 0.002 compared with MDS clinically established criteria). Kappa agreement for the MDS probable PD criteria was 0.828 (scored as “excellent” according to Fleiss criteria9), and 0.680 (“good”9) for UK Brain Bank criteria.
TABLE 1. Patient characteristics and overall criteria performance
Characteristic | PD (n = 434) | Non-PD parkinsonism (n = 192)
Age (years) | 66.6 ± 9.7 | 68.0 ± 9.9
Sex (% female) | 64 | 60
Disease duration from symptom onset (years) | 7.5 ± 6.0 | 4.3 ± 2.9
Disease duration from diagnosis (years) | 6.3 ± 5.8 | 2.8 ± 2.4
Meets MDS probable criteria (%) | 94.5 | 11.5
Meets UK Brain Bank probable (%) | 89.4 | 20.8
Meets MDS clinically established (%) | 60.4 | 1.6
Meets UK Brain Bank definite (%) | 83.9 | 8.9
Continuous variables are presented as mean ± standard deviation.
PD, Parkinson’s disease; MDS, International Parkinson and Movement Disorder Society; UK, United Kingdom.
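The MDS versus UK Brain Bank comparisons above used the Fisher exact test. A minimal two-sided implementation using only the standard library is sketched below, applied to counts back-calculated from the reported percentages (an assumption). Like any plain Fisher test, it treats the two sets of criteria as independent samples; the study’s SPSS procedure may differ in detail, so the resulting P values are meant for comparison with, not as a replacement for, the published ones.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same row
    and column totals that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, margins held fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    p_observed = prob(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    probs = [prob(x) for x in support]
    # A small relative tolerance keeps floating-point ties in the sum.
    p_value = sum(p for p in probs if p <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)


# Rows: criteria set (MDS, UK Brain Bank); columns: correct, incorrect.
# Counts are back-calculated from the reported percentages (assumption).
p_sensitivity = fisher_exact_two_sided(410, 24, 388, 46)  # PD patients
p_specificity = fisher_exact_two_sided(170, 22, 152, 40)  # non-PD patients
print(f"sensitivity: P = {p_sensitivity:.3f}; specificity: P = {p_specificity:.3f}")
```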
Analysis of Individual MDS Criteria
Supportive Criteria
Overall, PD patients had an average of 2.1 ± 1.1 supportive criteria, significantly more than the non-PD group (0.7 ± 0.8 supportive criteria; P < 0.001; Table 2). Supportive criteria were originally designed to have a benchmark specificity of ≥ 80%. Three of the 4 criteria met this benchmark: dyskinesia occurred in 7.0% of non-PD patients, asymmetric rest tremor in 13.6%, and excellent levodopa response in 19.9%. However, although positive ancillary testing differed between groups (P < 0.001), the specificity of the ancillary testing criterion was low, at 61.7%. Of the non-PD population, 36.6% had abnormal olfaction, and 44.4% (4 of 9 tested) had abnormal MIBG scintigraphy. Making the olfactory cutoff stricter (ie, < 10th percentile within each age subgroup above 55 years on the BSIT) improved specificity only slightly (to 68.1%).
Absolute Exclusions
The occurrence of MDS absolute exclusions in the PD group was low; only 3.2% had any absolute exclusion, compared with 64.1% of the non-PD group (P < 0.001; Table 3). Absolute exclusions were originally designed with a benchmark specificity of >95% (ie, each should occur in fewer than 5% of PD patients); all absolute exclusion criteria met this benchmark, with the most frequent single absolute exclusion in PD (absent levodopa response) occurring in only 1.3% of PD-diagnosed cases. In the non-PD group, the prevalence of absolute exclusions varied considerably, from a low of 1.1% (frontotemporal dementia) to a high of 33.8% (absent levodopa response).
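Both benchmarks reduce to a specificity-style check on a reference group: for supportive criteria, the share of non-PD patients without the feature must be at least 80%; for absolute exclusions, the share of PD patients without the feature must exceed 95%. The sketch below applies this to the prevalences in Tables 2 and 3; the function and variable names are hypothetical.

```python
def meets_benchmark(prevalence_pct: float, benchmark_pct: float, strict: bool = False) -> bool:
    """Specificity-style check: share of the reference group WITHOUT the feature.

    For supportive criteria the reference group is non-PD patients (target >= 80%);
    for absolute exclusions it is PD patients (target > 95%).
    """
    share_without = 100.0 - prevalence_pct
    return share_without > benchmark_pct if strict else share_without >= benchmark_pct


# Prevalence (%) of each supportive criterion among non-PD patients (Table 2).
supportive_in_non_pd = {
    "excellent levodopa response": 19.9,
    "dyskinesia": 7.0,
    "asymmetric rest tremor": 13.6,
    "positive ancillary test": 38.3,
}
# Prevalence (%) of each absolute exclusion among PD patients (Table 3).
exclusions_in_pd = {
    "cerebellar abnormalities": 0.2,
    "vertical supranuclear gaze palsy": 1.2,
    "frontotemporal dementia": 0.2,
    "leg-only parkinsonism": 0.5,
    "dopamine blocker/depleter": 0.5,
    "absent levodopa response": 1.3,
    "cortical sensory loss": 0.0,
    "normal dopamine imaging": 0.0,
}

supportive_ok = {k: meets_benchmark(v, 80.0) for k, v in supportive_in_non_pd.items()}
exclusions_ok = {k: meets_benchmark(v, 95.0, strict=True) for k, v in exclusions_in_pd.items()}
print([k for k, ok in supportive_ok.items() if not ok])  # ['positive ancillary test']
```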
Red Flags
Red flags were included in the criteria because many “atypical” features are seen in >5% of the true PD population, too often for them to serve as absolute exclusions. We found that 19.1% of PD patients had at least 1 red flag, compared with 84.4% of non-PD patients (P < 0.001; Table 4). In general, the prevalence of individual red flags in PD cases was relatively low; the highest was severe autonomic dysfunction, at 9.7%. Among the non-PD group, recurrent falls had the highest prevalence (51.3%). As anticipated, red flags targeting parkinsonism mimics (eg, essential/dystonic tremor) were uncommon, as these patients were not included in this study (in non-PD patients: nonprogression, 0.6%; absent nonmotor features, 1.6%).
TABLE 2. Supportive criteria
Criterion | PD (n = 434) | Non-PD parkinsonism (n = 192)
Total number of supportive criteria | 2.14 ± 1.05 | 0.71 ± 0.80
Excellent levodopa response (%) | 73.4 (n = 364) | 19.9 (n = 151)
Dyskinesia (%) | 34.0 (n = 382) | 7.0 (n = 157)
Asymmetric rest tremor (%) | 56.5 | 13.6
Positive ancillary test (%) | 67.6 | 38.3
  Olfaction (%) | 67.4 | 36.6
  MIBG scintigraphy (%) | 100 (n = 1) | 44.4 (n = 9)
Continuous variables are presented as mean ± standard deviation. N is provided whenever a variable has >3% missing values (eg, diagnostic tests not required for the criteria).
PD, Parkinson’s disease; MIBG, metaiodobenzylguanidine.
TABLE 3. Absolute exclusion criteria
Criterion | PD (n = 434) | Non-PD parkinsonism (n = 192)
Total number of absolute exclusions | 0.037 ± 0.22 | 0.93 ± 0.88
Any absolute exclusion (%) | 3.2 | 64.1
Cerebellar abnormalities (%) | 0.2 | 14.2
Vertical supranuclear gaze palsy (%) | 1.2 | 29.2
Frontotemporal dementia (%) | 0.2 | 1.1
Leg-only parkinsonism (%) | 0.5 | 2.6
Dopamine blocker/depleter (%) | 0.5 | 4.2
Absent levodopa response (%) | 1.3 (n = 393) | 33.8 (n = 157)
Cortical sensory loss (%) | 0 | 13.1
Normal dopamine imaging (%) | 0 (n = 42) | 13.0 (n = 23)
Continuous variables are presented as mean ± standard deviation. N is provided whenever a variable has >3% missing values (eg, diagnostic tests not required for the criteria).
PD, Parkinson’s disease.
Secondary Analyses
We conducted several sensitivity and secondary analyses:
1. Center effects: Overall, there was no significant heterogeneity between centers in sensitivity for PD diagnosis (range between centers, 88%-98%). However, for the non-PD group, there was 1 outlier center (4.9 × interquartile range; P = 0.008), in which 6 of 19 non-PD patients (31.6%) met MDS criteria (range in the other centers, 6%-13%). If this outlier center were removed, specificity would improve to 90.8%, with no change in sensitivity (94.4%).
2. Effect of diagnosis: We examined the diagnostic distribution of the 22 false-positive non-PD cases. We found no clear differences in the underlying diagnoses of correctly classified versus false-positive cases; 10 of 22 false-positives (45%) had been diagnosed with multiple system atrophy (MSA), and 7 (32%) with progressive supranuclear palsy (PSP) (Supplementary Table 1).
3. Disease duration (Supplementary Table 2): Because diagnosis is generally more difficult with shorter disease duration and because clinical trials increasingly focus on this population, we assessed criteria performance in those with disease duration of 5 years or more versus < 5 years. Sensitivity was relatively similar (95.9% for ≥ 5 years, 92.8% for < 5 years). However, specificity improved considerably with longer duration: 95.0% for duration ≥ 5 years versus 86.8% for < 5 years.
4. Stratification by certainty (Supplementary Table 3): This study included all patients with parkinsonism, even cases in which the gold-standard diagnosis was uncertain. To assess concordance in higher-certainty cases, we separated out patients for whom the certainty (PD vs non-PD) was quantified as ≤ 80% by the gold-standard diagnostician. Sensitivity did not differ according to certainty (94.7% for > 80% vs 93.4% for ≤ 80%), and there was only a small change in specificity (90.4% vs 87.3%). The sensitivity advantage of the MDS criteria compared with UK Brain Bank criteria was more notable in the lower-certainty cases (93.4% vs 81.3%, P = 0.024), and specificity was also higher, although not significantly (87.3% vs 76.3%, P = 0.06).
5. Effect of sex: Accuracy was not significantly different in men versus women (sensitivity, 96.1% in men vs 91.6% in women, P = 0.078; specificity, 86.0% in men vs 92.3% in women, P = 0.24).
6. Effect of age: Age is a potential confounder in diagnosis, as comorbid conditions and multiple neurodegenerative pathologies are more common in older patients. We divided each group according to whether age was above or below the group median. Sensitivity did not differ (93.4% in younger vs 95.4% in older patients), but specificity was lower in older patients (93.8% vs 83.3%, P = 0.039).
TABLE 4. Red flags
Red flag | PD (n = 434) | Non-PD parkinsonism (n = 192)
Total number of red flags | 0.24 ± 0.54 | 2.02 ± 1.56
Any red flag (%) | 19.1 | 84.4
>2 red flags (%) | 0.7 | 33.3
Red flags > supportive criteria (%) | 3.0 | 66.7
Either >2 red flags or red flags > supportive criteria (%) | 3.0 | 66.7
1. Rapid progression of gait impairment (%) | 1.4 | 22.7
2. Absence of progression > 5 years (%) | 1.2 | 0.6
3. Severe bulbar dysfunction < 5 years (%) | 0 | 18.0
4. Inspiratory stridor (%) | 0.2 | 9.0
5. Severe autonomic failure < 5 years (%) | 9.7 | 40.1
  Orthostatic hypotension (%) | 4.8 | 21.6
  Severe urinary dysfunction (%) | 5.3 | 30.6
6. Recurrent falls < 3 years (%) | 3.7 | 51.3
7. Anterocollis or contractures (%) | 2.6 | 10.2
8. Absent nonmotor features > 5 years (%) | 1.6 | 1.6
  Sleep maintenance insomnia (%) | 44.5 | 39.6
  Somnolence (%) | 32.4 | 31.3
  REM sleep behavior disorder (%) | 45.5 | 32.4
  Constipation (%) | 61.2 | 59.3
  Urinary urge (%) | 48.4 | 60.6
  Symptomatic orthostasis (%) | 24.9 | 38.8
  Hyposmia, self-reported (%) | 67.1 | 26.8
  Depression (%) | 32.2 | 41.5
  Anxiety (%) | 37.6 | 36.1
  Visual hallucinations (%) | 17.4 | 6.7
9. Pyramidal tract signs (%) | 1.8 | 25.8
10. Bilateral symmetric parkinsonism (%) | 1.9 | 25.1
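The center-effect analysis among the secondary analyses is simple arithmetic: the outlier center’s non-PD patients are dropped from both numerator and denominator. A minimal sketch, assuming the 22 overall false positives implied by the 11.5% false-positive rate (the function name is hypothetical), reproduces the reported 90.8%:

```python
def specificity(non_pd_total: int, false_positives: int) -> float:
    """Share of non-PD patients correctly NOT classified as PD."""
    return (non_pd_total - false_positives) / non_pd_total


# Counts back-calculated from the reported figures (assumption):
# 22 false positives among 192 non-PD patients overall,
# of which 6 came from the 19 non-PD patients at the outlier center.
all_non_pd, all_fp = 192, 22
outlier_non_pd, outlier_fp = 19, 6

spec_all = specificity(all_non_pd, all_fp)
spec_without_outlier = specificity(all_non_pd - outlier_non_pd, all_fp - outlier_fp)
print(f"specificity: {spec_all:.1%} overall, {spec_without_outlier:.1%} without the outlier center")
```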
Testing Potential Modifications
1. Although the ancillary testing supportive criterion did not meet the 80% specificity threshold in our study, removing it from the diagnostic criteria did not improve overall accuracy (91.3% without ancillary testing vs 92.7% with testing). Sensitivity dropped from 94.5% to 91.9%, whereas specificity rose modestly, from 88.5% to 90.1%.
2. To optimize specificity, we tested the effect of treating all red flags as absolute exclusions for probable PD (making supportive criteria irrelevant). This lowered the false-positive rate from 11.5% to 4.7% (still higher than the clinically established false-positive rate of 1.6%). However, sensitivity dropped below the 80% target (to 79.3%). Conversely, removing all the red flags (ie, using only a simple rule based on the 8 absolute exclusions) would provide poor specificity (64%).
3. To enhance the accuracy of diagnosis in early PD (ie, to develop a high-certainty early-PD category for future clinical trials), we examined this group in more detail. For short-duration disease or untreated PD, it is very difficult to meet criteria for clinically established PD (eg, dyskinesia and levodopa response often do not apply), leaving no clear high-specificity option for early PD. Therefore, we analyzed the effect of treating red flags as absolute exclusion criteria (and removing duration components) in cases with disease duration < 5 years. This yielded 95.4% specificity and 68.9% sensitivity compared with the gold standard, clinical diagnosis (by comparison, the existing clinically established category in this group had 98.7% specificity and 47.1% sensitivity).
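The rule changes tested above can be expressed as variants of the probable-PD decision. In this sketch the function and variant names are hypothetical, parkinsonism is assumed to be already established, and the standard balancing rule (at most 2 red flags, each offset by a supportive criterion) is our reading of the 2015 criteria publication. The early-PD variant in item 3 is not modeled, because removing the duration components of individual red flags cannot be represented with simple counts.

```python
from typing import Literal

Variant = Literal["standard", "red_flags_exclude", "no_red_flags"]


def probable_pd(absolute_exclusions: int, red_flags: int, supportive: int,
                variant: Variant = "standard") -> bool:
    """Probable-PD decision under the standard rule and two tested modifications."""
    # Absolute exclusions rule out PD under every variant.
    if absolute_exclusions > 0:
        return False
    if variant == "red_flags_exclude":
        # Item 2: any red flag acts as an absolute exclusion;
        # supportive criteria no longer matter.
        return red_flags == 0
    if variant == "no_red_flags":
        # Item 2 (alternative): ignore red flags entirely,
        # keeping only the absolute exclusions.
        return True
    # Standard rule: at most 2 red flags, each offset by a supportive criterion.
    return red_flags <= 2 and supportive >= red_flags


print(probable_pd(0, 1, 1), probable_pd(0, 1, 1, "red_flags_exclude"))  # True False
```

Treating red flags as exclusions can only remove patients from the probable-PD group, which is why that variant trades sensitivity for specificity, while ignoring red flags does the reverse.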
Discussion
The MDS clinical diagnostic criteria for PD were designed to mimic an expert clinician’s diagnostic process and to codify and standardize diagnosis for use in clinical research and for clinical diagnosis by those with less expertise in PD. This study was designed to test how well this codification was done. We found an overall accuracy/concordance rate of 92.6%, with sensitivity of 94.5% and specificity of 88.5%. The UK Brain Bank criteria had a discordance/error rate approximately double that of the MDS criteria, and both sensitivity and specificity were lower than with MDS criteria. Specificity was higher in younger patients and in those with longer disease duration.
Although not the primary purpose of this study, our study does provide important information on the utility of olfactory testing in PD. Most studies have suggested that olfactory testing has approximately 80% sensitivity and >80% specificity for the diagnosis of PD.10 We found much lower diagnostic performance in this very large and systematically assessed sample. It should be noted that shorter versions of olfactory tests were used, and it is possible that longer tests (eg, the 40-item University of Pennsylvania Smell Identification Test) would perform better. However, many previous studies also used shorter versions. Cultural factors, such as exposure to different odors, may also be important; for example, the sensitivity of olfactory testing was lower in Beijing (49%) than in the other centers (70%). However, specificity did not substantially differ according to site (eg, 59% in Beijing vs 65% in the other centers), and specificity was <80% in every single center. Therefore, the diagnostic utility of olfactory testing for distinguishing PD from other causes of parkinsonism remains unclear.
Why did both the sensitivity and specificity of the MDS criteria exceed those of the UK Brain Bank criteria? For specificity, no clear pattern was detected. Of 27 false-positives caught by MDS criteria (UK Brain Bank positive for PD, but MDS criteria and gold standard negative), 14 (52%) were excluded by absolute exclusions (commonest: cortical sensory loss [5], absent levodopa response [5]), and 13 by unbalanced red flags (commonest red flags overall: frequent falls [15], bilateral symmetric parkinsonism [8]). In contrast, most of the improvement in sensitivity related to clear advances in the field. Of 37 false-negatives caught by MDS criteria (UK Brain Bank negative, but MDS criteria and gold standard positive for PD), 12 (32%) had been excluded by the UK Brain Bank criteria for > 1 affected family member (underscoring advances in the genetics of PD), 5 (14%) for head injury (now well recognized as a PD risk factor11), 4 (11%) for early autonomic involvement (now recognized as a common prodromal marker12), 2 (5%) for dementia (now recognized in early PD13), and 1 (3%) for neuroleptic exposure (reflecting the development of atypical neuroleptics and/or increasing recognition that drug-induced parkinsonism is a potential prodromal marker of true PD14).
This underscores the need to revise the MDS criteria periodically to reflect advances in our understanding of PD. This will become especially critical if a reliable diagnostic biomarker (eg, neuroimaging, tissue diagnosis, blood/cerebrospinal fluid markers) becomes firmly established.
Overall, the sensitivity and specificity of the probable PD criteria clearly exceeded the 80% target threshold. Based on our subanalyses, should the MDS criteria be modified? Because sensitivity exceeded specificity, tweaking the criteria might help to balance these slightly (eg,
changing time courses of atypical features to shorter disease durations, changing some red flags to absolute exclusions). However, post hoc modifications entail a considerable risk of overfitting, leading to less reliable generalization. Moreover, if optimal specificity is desired, the clinically established criteria had very high specificity (98.4%) with a reasonable sensitivity of 59%. With regard to the ancillary testing supportive criterion (olfaction and MIBG scintigraphy), note that ancillary testing was always considered optional in the MDS criteria. In general, we found that support for this criterion’s inclusion is uncertain (sensitivity improves slightly, specificity decreases slightly). Therefore, studies can omit it without substantial loss in accuracy, and during future revisions, consideration might be given to removing the criterion.
Based on our analysis and on considerations of utility, we propose one specific addition: a diagnostic category that specifically targets a high-specificity diagnosis in untreated/very early PD patients (eg, for an early-PD disease modification trial, in which specificity must be high, but sensitivity need not exceed the 80% benchmark of probable PD). This will be described in an accompanying submission.
Some limitations of this study should be noted. First,
it must be emphasized that this is not a clinicopatho-
logic or clinicogenetic study. The gold standard used
here is expert clinician diagnosis, using all modalities
available to them in their clinical practice. Obviously,
diagnosis, the gold standard, will be wrong in some
cases. This would have unpredictable effects on sensitiv-
ity/specificity estimates. Cases in which the gold stan-
dard is incorrect but diagnostic criteria correct would
have biased estimates of MDS criteria accuracy down-
ward (eg, removal of a single outlier site increased spec-
ificity to 92%), whereas cases in which both were
incorrect would have biased accuracy upward. Indeed,
no diagnostic criteria have ever had full pathologic vali-
dation (the UK Brain Bank had an evaluation of posi-
tive predictive value that did not include non-PD
clinical diagnoses, and could not assess sensitivity or
specificity15–17
). A definitive clinicopathologic and
genetic study would be ideal, but this would be at least
several years away because it would require documen-
tation of all MDS criteria during life, then waiting until
the death of the majority of participants (the majority
needed to avoid selection bias from more severe/atypi-
cal cases or the absence of very long-standing cases). It
would also require additional genetic testing to ensure
that false-negative patients without pathology do not
have clinicogenetic PD without synuclein3
(parkin,
LRRK-2, etc.3
). Second, it is possible that knowing the
criteria were about to be applied might have influenced
the gold-standard diagnosis opinion, thereby biasing
the MDS criteria estimates upward. However, note that
diagnosis was made before criteria were applied (ie,
results of the criteria evaluation were not available to
gold-standard evaluators). Residual confounding from
this issue would be most likely to occur in the centers
of the 2 primary criteria authors, who would be most
familiar with each individual criterion; however, the
overall concordance in the Tubingen/Montreal centers
was not clearly higher than the other centers (94.1% vs
92.4% overall). Third, there is subjectivity and interpre-
tation built into some criteria (eg, excellent levodopa
response, drug-induced parkinsonism). Raters with less
expertise may incorrectly estimate these, further reduc-
ing accuracy. Our raters were generally junior neurolo-
gists currently in movement disorders fellowships; we
cannot assess how individual criteria may be inter-
preted by those with less (or more) expertise/interest in
PD diagnosis. It is also possible that because junior neu-
rologists were aware of the diagnosis, bias in identify-
ing exclusionary criteria might be seen (note again that
these neurologists only rated criteria as present/absent
and did not provide their diagnostic opinion). More-
over, if centers had available extensive ancillary tests
beyond those in the criteria (eg, MRI, other neuroimag-
ing), this may have increased reliability of the gold-
standard diagnosis compared with other centers. There-
fore, additional validation outside subspecialty centers
would be useful. Fourth, the overall occurrence of vas-
cular parkinsonism and drug-induced parkinsonism
was low (4.7% and 2.1%, respectively), perhaps reflec-
tive of the subspecialty centers in this study; in general
neurology practice, these may be more common, and
we cannot state how reliably the criteria exclude these
diagnoses. Fifth, MIBG scintigraphy was not required
for this study (and only 10 patients had scintigraphy
results on file); therefore, we cannot reliably
comment on its specificity. Finally, this study did
not include parkinsonism mimics or those with only a
single parkinsonism sign (eg, isolated rest tremor),
because all patients were required to fulfill the opera-
tional definition of parkinsonism for entry. Therefore,
performance of those criteria that were designed to
catch these mimics (dopaminergic functional imaging,
nonprogression, absence of nonmotor features) cannot
be fully assessed here. Designing a study of parkinsonism
mimics would be difficult, because it is unclear
what entry criteria could generalize to the types
of errors encountered in diagnosing parkinsonism itself.
On the other hand, our study has notable strengths.
Primary among these are a large sample size (including
almost 200 patients with atypical parkinsonism),
systematic sampling of all criteria at the same visit in
the same patients, and a direct head-to-head comparison
of criteria. Patients were evaluated on 4 continents, so
results should generalize across different ethnicities.
Finally, this sampling included a systematic evaluation
of olfaction, allowing a comprehensive assessment of the
accuracy of olfactory testing in parkinsonism.
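All of the concordance figures discussed above (sensitivity, specificity, overall accuracy, and kappa agreement) derive from a single 2×2 table of criteria result versus gold-standard diagnosis. The Python sketch below is illustrative only: its counts are back-calculated assumptions, not the study data set. It assumes 410 true positives (94.5% of the 434 PD patients, rounded) and the 22 false-positive cases among the 192 non-PD patients reported in the Results; the `TwoByTwo` class is a hypothetical helper, not part of any study software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical helper: criteria result versus gold-standard diagnosis."""
    tp: int  # gold-standard PD, meets criteria
    fn: int  # gold-standard PD, does not meet criteria
    fp: int  # gold-standard non-PD, meets criteria
    tn: int  # gold-standard non-PD, does not meet criteria

    @property
    def n(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def sensitivity(self) -> float:
        # Proportion of gold-standard PD patients who meet the criteria.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of gold-standard non-PD patients excluded by the criteria.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Overall concordance: criteria and gold standard agree.
        return (self.tp + self.tn) / self.n

    def kappa(self) -> float:
        # Cohen's kappa: observed agreement corrected for the agreement
        # expected by chance from each rater's marginal totals.
        po = self.accuracy()
        gold_pos, gold_neg = self.tp + self.fn, self.fp + self.tn
        crit_pos, crit_neg = self.tp + self.fp, self.fn + self.tn
        pe = (gold_pos * crit_pos + gold_neg * crit_neg) / self.n ** 2
        return (po - pe) / (1 - pe)


# Assumed, back-calculated counts for MDS probable PD (434 PD, 192 non-PD).
mds_probable = TwoByTwo(tp=410, fn=24, fp=22, tn=170)

for name, value in [
    ("sensitivity", mds_probable.sensitivity()),
    ("specificity", mds_probable.specificity()),
    ("accuracy", mds_probable.accuracy()),
    ("kappa", mds_probable.kappa()),
]:
    print(f"{name}: {value:.3f}")
```

Run as a script, it prints values of about 0.945, 0.885, 0.927, and 0.828, close to the reported sensitivity (94.5%), specificity (88.5%), overall accuracy (92.6%), and kappa (0.828). Kappa sits below raw accuracy because it discounts the agreement that two raters would reach by chance given how often each calls a patient PD.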
Acknowledgments: We thank Majid Al O’taibi, Deborah A. Hammond,
Elie Matar, and Paul D. Clouston, who helped with rating patients.
References
1. Postuma RB, Berg D, Stern M, et al. MDS clinical diagnostic criteria for Parkinson’s disease. Mov Disord 2015;30:1591-1600.
2. Berg D, Postuma RB, Adler CH, et al. MDS research criteria for prodromal Parkinson’s disease. Mov Disord 2015;30:1600-1611.
3. Berg D, Postuma RB, Bloem B, et al. Time to redefine PD? Introductory statement of the MDS Task Force on the definition of Parkinson’s disease. Mov Disord 2014;29:454-462.
4. Gibb WR, Lees AJ. The relevance of the Lewy body to the pathogenesis of idiopathic Parkinson’s disease. J Neurol Neurosurg Psychiatry 1988;51:745-752.
5. Postuma RB, Berg D, Stern M, et al. Abolishing the 1-year rule: how much evidence will be enough? Mov Disord 2016;31:1623-1627.
6. Doty RL, Shaman P, Dann M. Development of the University of Pennsylvania Smell Identification Test: a standardized microencapsulated test of olfactory function. Physiol Behav 1984;32:489-502.
7. Doty RL, Marcus A, Lee WW. Development of the 12-item Cross-Cultural Smell Identification Test (CC-SIT). Laryngoscope 1996;106:353-356.
8. Hummel T, Sekinger B, Wolf SR, Pauli E, Kobal G. ‘Sniffin’ sticks’: olfactory performance assessed by the combined testing of odor identification, odor discrimination and olfactory threshold. Chem Senses 1997;22:39-52.
9. Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York: John Wiley; 1981.
10. Rahayel S, Frasnelli J, Joubert S. The effect of Alzheimer’s disease and Parkinson’s disease on olfaction: a meta-analysis. Behav Brain Res 2012;231:60-74.
11. Jafari S, Etminan M, Aminzadeh F, Samii A. Head injury and risk of Parkinson disease: a systematic review and meta-analysis. Mov Disord 2013;28:1222-1229.
12. Postuma RB, Berg D. Advances in markers of prodromal Parkinson disease. Nat Rev Neurol 2016;12:622-634.
13. Fereshtehnejad SM, Zeighami Y, Dagher A, Postuma RB. Clinical criteria for subtyping Parkinson’s disease: biomarkers and longitudinal progression. Brain 2017;140:1959-1976.
14. Brigo F, Erro R, Marangi A, Bhatia K, Tinazzi M. Differentiating drug-induced parkinsonism from Parkinson’s disease: an update on non-motor symptoms and investigations. Parkinsonism Relat Disord 2014;20:808-814.
15. Hughes AJ, Daniel SE, Kilford L, Lees AJ. Accuracy of clinical diagnosis of idiopathic Parkinson’s disease: a clinico-pathological study of 100 cases. J Neurol Neurosurg Psychiatry 1992;55:181-184.
16. Hughes AJ, Ben-Shlomo Y, Daniel SE, Lees AJ. What features improve the accuracy of clinical diagnosis in Parkinson’s disease: a clinicopathologic study. Neurology 1992;42:1142-1146.
17. Hughes AJ, Daniel SE, Lees AJ. Improved accuracy of clinical diagnosis of Lewy body Parkinson’s disease. Neurology 2001;57:1497-1499.
Supporting Data
Additional Supporting Information may be found in
the online version of this article at the publisher’s
web-site.
8 Movement Disorders, Vol. 00, No. 00, 2018
P O S T U M A E T A L

Postuma2018

  • 1.
    R E VI E W Validation of the MDS Clinical Diagnostic Criteria for Parkinson’s Disease Ronald B. Postuma, MD, MSc,1* Werner Poewe, MD,2 Irene Litvan, MD,3 Simon Lewis, MD,4 Anthony E. Lang, OC, MD, FRCPC,5 Glenda Halliday, PhD,4 Christopher G. Goetz, MD,6 Piu Chan, MD, PhD,7 Elizabeth Slow, MD FRCPC,5 Klaus Seppi, MD,2 Eva Schaffer, MD,8 Silvia Rios-Romenets, MD,1 Taomian Mi, MD PhD,7 Corina Maetzler, PhD,8 Yuan Li, MD PhD,7 Beatrice Heim, MD,2 Ian O. Bledsoe, MD9 and Daniela Berg, MD8* 1 Department of Neurology, Montreal General Hospital, Montreal, Quebec, Canada 2 Department of Neurology, Innsbruck Medical University, Innsbruck, Austria 3 Department of Neurosciences, UC San Diego, La Jolla, California, USA 4 Brain and Mind Centre, Sydney Medical School, Camperdown, Australia 5 Morton and Gloria Shulman Movement Disorders Clinic and the Edmond J. Safra Program in Parkinson’s Disease, Toronto Western Hospital, Toronto, Ontario, Canada 6 Rush University Medical Center, Chicago, Illinois, USA 7 Department of Neurobiology and Neurology, Xuanwu Hospital of Capital Medical University, Beijing, People’s Republic of China 8 Klinik für Neurologie, UKSH, Campus Kiel, Christian-Albrechts-Universität, Kiel, Germany 9 University of California, San Francisco, CA, USA ABSTRACT: Background: In 2015, the International Parkinson and Movement Disorder Society published clinical diagnostic criteria for Parkinson’s disease. These criteria aimed to codify/reproduce the expert clinical diagnostic process and to help standardize diagnosis in research and clinical settings. Their accuracy compared with expert clinical diagnosis has not been tested. The objectives of this study were to validate the International Parkinson and Movement Disorder Society diagnostic criteria against a gold standard of expert clinical diagno- sis, and to compare concordance/accuracy of the Inter- national Parkinson and Movement Disorder Society criteria to 1988 United Kingdom Brain Bank criteria. 
Methods: From 8 centers, we recruited 626 parkinsonism patients (434 PD, 192 non-PD). An expert neurologist diagnosed each patient as having PD or non-PD, regard- less of International Parkinson and Movement Disorder Society criteria (gold standard, clinical diagnosis). Then a second neurologist evaluated the presence/absence of each individual item from the International Parkinson and Movement Disorder Society criteria. The overall accu- racy/concordance rate, sensitivity, and specificity of the International Parkinson and Movement Disorder Society criteria compared with the expert gold standard were calculated. Results: Of 434 patients diagnosed with PD, 94.5% met the International Parkinson and Movement Disorder Society cri- teria for probable PD (5.5% false-negative rate). Of 192 non- PD patients, 88.5% were identified as non-PD by the criteria (11.5% false-positive rate). The overall accuracy for probable PD was 92.6%. In addition, 59.3% of PD patients and only 1.6% of non-PD patients met the International Parkinson and Movement Disorder Society criteria for clinically established PD. In comparison, United Kingdom Brain Bank criteria had lower sensitivity (89.2%, P = 0.008), specificity (79.2%, P = 0.018), and overall accuracy (86.4%, P < 0.001). Diag- nostic accuracy did not differ according to age or sex. Speci- ficity improved as disease duration increased. Conclusions: The International Parkinson and Movement Disorder Society criteria demonstrated high sensitivity and specificity compared with the gold standard, expert diagnosis, with sensitivity and specificity both higher than United Kingdom Brain Bank criteria. © 2018 Interna- tional Parkinson and Movement Disorder Society Key Words: Parkinson’s disease; diagnosis; criteria ---------------------------------------------------------*Correspondence to: Dr. Daniela Berg, Klinik für Neurologie, UKSH, Campus Kiel, Christian-Albrechts-Universität, Kiel, Germany; daniela. berg@uksh.de, or Dr. Ronald B. 
Postuma, Department of Neurology, L7-305 Montreal General Hospital, 1650 Cedar Avenue, Montreal, Canada H3G1A4; ron.postuma@mcgill.ca Relevant conflicts of interest/financial disclosures: The authors have no financial disclosures in relationship to this article. Funding agencies: This study was funded by the Michael J Fox Foundation. The study design was discussed with members of the Michael J. Fox Foundation. Received: 27 October 2017; Revised: 22 December 2017; Accepted: 5 February 2018 Published online 00 Month 2018 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/mds.27362 Movement Disorders, Vol. 00, No. 00, 2018 1
  • 2.
    In 2015, atask force from the International Parkin- son and Movement Disorder Society (MDS) published “Clinical Diagnostic Criteria for Parkinson’s disease” (MDS-PD-CDC).1–3 The goal of the criteria was to help to standardize clinical diagnosis, both for research (eg, clinical trials) and for clinical practice. In designing the criteria, the task force members noted that no reliable objective test for PD is currently available. Therefore, expert opinion remains the gold standard for diagnosis during life. Accordingly, the clinical criteria were designed to mimic and codify the diagnostic process of an expert clinician. Although data from published studies were used in the design of the criteria, the specific decisions about which criteria should be included, how they should be defined, how they should be weighted, and so forth were done in an iterative process primarily through expert opinion. Therefore, these criteria require valida- tion. In this multicenter study, we determined to what degree the MDS clinical criteria accurately reproduced the gold standard, expert clinical diagnosis. We also compared concordance/accuracy of the MDS criteria and the 1988 UK Brain Bank criteria.4 Methods PD Criteria Description The MDS PD criteria include a 2-step process for diagnosis.1 First, parkinsonism is identified, which is defined as bradykinesia with fatiguing, plus one of rigidity or rest tremor. Second, PD is determined as the cause of the parkinsonism. The second step uses 3 clas- ses of diagnostic features: supportive criteria (features typical for PD, which are seen in many but not all patients), absolute exclusions (features that rule out diagnosis of probable PD), and red flags (features that argue against PD, but are not sufficient to rule out probable PD). Red flags can be balanced by supportive criteria, such that documentation of a supportive crite- rion can offset a red flag. Two levels of certainty were prespecified. 
The pri- mary one is probable PD, with the goal to balance sen- sitivity and specificity, such that 80% of PD patients would meet criteria (sensitivity ≥ 80%), and 80% of non-PD patients would be excluded by the criteria (specificity ≥ 80%). An additional category of clinically established PD was created to maximize specificity dur- ing life (ie, specificity ≥ 90%), without regard to sensitivity. All movement disorders specialists who contributed to the MDS clinical criteria program1 were eligible to participate in the validation study. Prerequisites for par- ticipation were availability of an expert in the clinical diagnosis of PD, a second neurologist to administer the diagnostic criteria checklist, access to a sufficient number of individuals with parkinsonism, and ethics approval. Patients All patients with a diagnosis of parkinsonism1 were eligible for this study. Exclusion criteria were mini- mized to enhance generalizability and representative- ness. Centers were instructed to recruit the entire range of parkinsonism in clinics, with a PD:non-PD ratio of approximately 2:1 (convenience sampling allowed). Centers were instructed to not exclude patients for whom diagnosis was uncertain. Exclusion criteria included: 1. Patients without documented parkinsonism (or with a single isolated cardinal sign, such as isolated rest tremor): although nonparkinsonian conditions such as dystonia or essential tremor can occasionally be incorrectly diagnosed as PD, the primary focus of this validation was the differential diagnosis of parkin- sonism (PD vs non-PD). 2. Patients with dementia severe enough to possibly pre- clude appropriate informed consent (here defined as MMSE < 22): patients with mild-moderate dementia were included if they had a spouse or caregiver present to ensure the patient’s interests were protected and to prevent inaccurate data collection. 
Note that according to the task force definition, the diagnosis of dementia with Lewy bodies does not exclude potential PD diagnosis.3,5 Each center’s local research ethics board approved the study protocol, and informed written consent was obtained from all patients. For each patient, there were 2 study evaluations con- ducted by separate investigators. Investigators did not share their information or conclusions: 1. Gold-standard diagnostic evaluation: This determi- nation was conducted by a neurologist with >10 years’ experience in PD diagnosis who was gen- erally the patient’s treating physician. All evaluators were familiar with the definition of PD as outlined by the task force.3 This evaluator made the gold- standard diagnosis using all relevant clinical infor- mation available over the entire disease course, including any ancillary tests that were available (without using the MDS criteria). In addition to making the primary diagnosis, the gold-standard evaluators also estimated their percent certainty of PD versus non-PD status. 2. Item criteria evaluation: This assessment was con- ducted by a second neurologist (generally a move- ment disorders fellow or junior faculty) who was trained in the application of the criteria. All criteria from the MDS-PD-CDC and from the UK Brain Bank4 were assessed. These evaluators conducted 2 Movement Disorders, Vol. 00, No. 00, 2018 P O S T U M A E T A L
  • 3.
    the assessments withoutrequired reference to the expert neurologist’s diagnosis and were not asked to provide their diagnostic opinion; rather, they were simply instructed to evaluate whether each individ- ual criterion (from a checklist) was met. The evalua- tor used both interview and in-person examination to document criteria. In addition, chart review could be performed to document features not available on current history and examination. Based on the review, each individual criterion was scored by the rater as present or absent. The approximate time taken to complete the checklist was 15 minutes for each scale. As olfactory testing is included as a supportive crite- rion in the MDS-PD criteria, it was systematically tested in this study (10 patients also had metaiodo- benzylguanidine [MIBG] scintigraphy results available). Centers used either the 12-item Brief Smell- Identification test version A (BSIT) 6,7 or the Sniffin Sticks.8 Age-standardized normal values were taken from the published literature.6,8 Abnormal olfaction was defined as a score below the 10th percentile for age. The Sniffin Sticks combine all patients > 55 years together in 1 category; therefore, for consistency, the same standardization was performed for the BSIT (ie, cutoff of 6). On secondary analysis, a second adjust- ment subdivided the >55 age group on BSIT. Blood pressure was measured in the supine position and then reassessed standing after 3 minutes. Data Analysis The primary outcome was the accuracy/concordance of MDS criteria according to the gold standard, clinical diagnosis. Sensitivity and specificity, as well as kappa agreement, were calculated for each measure. Compari- sons between MDS and UK Brain Bank criteria were tested with the Fisher exact test. The prevalence of each individual criterion was assessed in the PD and non-PD groups. 
Prespecified secondary analyses included analy- sis of those with high (>80%) versus low (≤80%) cer- tainty of PD and those with short (<5 years) versus longer (≥5 years) disease duration. Additional second- ary analyses assessed the effect of age and sex, and ana- lyses testing different permutations of criteria were performed (see Results section). Statistical tests were performed with SPSS software, version 22. Results Patient Characteristics Overall, we recruited 626 patients: 434 with PD and 192 with non-PD parkinsonism (Table 1). Age and sex were similar between groups. As expected, disease duration was shorter in the non-PD group (2.8 vs 6.3 years). In the non-PD group, the commonest condi- tions were multiple system atrophy (38%), progressive supranuclear palsy (33%), and corticobasal syndrome (10%); see Supplemental Table 1. Overall Performance of MDS Criteria Of the 434 patients diagnosed with PD by the gold- standard expert, 94.5% met MDS criteria for probable PD (ie, 5.5% false-negative rate). Of the non-PD patients, 88.5% were identified as non-PD by the cri- teria (ie, 11.5% false-positive rate). Combining all patients, the overall accuracy for probable PD was 92.6% compared with the gold standard, expert clinical diagnosis. In addition, 59.3% of PD patients met MDS criteria for clinically established PD. Only 1.6% of non-PD patients met clinically established criteria. In comparison, the UK Brain Bank had both lower sensitivity and specificity. 89.4% of PD patients met UK Brain Bank criteria for PD (10.6% false-negatives, P = 0.008 to MDS criteria), and 79.2% of non-PD patients were identified as non-PD by UK Brain Bank criteria (20.8% false-positives, P = 0.018). Overall accuracy was 86.4%, significantly lower than the MDS rate (P < 0.001). Removing the dementia exclusion from the UK Brain Bank criteria (to account for defini- tion changes3,5 ) made no difference to the results (false- negatives, 11.4%; false-positives, 21.4%). 
A high pro- portion of PD patients (83.9%) met UK Brain Bank cri- teria for definite PD, with a false-positive rate of 8.9% TABLE 1. Patient characteristics and overall criteria performance PD (n = 434) non-PD parkinsonism (n = 192) Age 66.6 ± 9.7 68.0 ± 9.9 Sex (% female) 64 60 Disease duration from symptom onset (years) 7.5 ± 6.0 4.3 ± 2.9 Disease duration from diagnosis (years) 6.3 ± 5.8 2.8 ± 2.4 Meets MDS probable criteria (%) 94.5 11.5 Meets UK Brain Bank probable (%) 89.4 20.8 Meets MDS clinically established (%) 60.4 1.6 Meets UK Brain Bank definite (%) 83.9 8.9 Continuous variables are presented as mean ± standard deviation. PD, Parkinson’s disease; MDS, International Parkinson and Movement Disorder Society; UK, United Kingdom. Movement Disorders, Vol. 00, No. 00, 2018 3 V A L I D A T I O N O F M D S C R I T E R I A
  • 4.
    (P = 0.002compared with MDS clinically established criteria). Kappa agreement for the MDS probable PD criteria was 0.828 (scored as “excellent” according to Fleiss criteria9 ), and 0.680 (“good”9 ) for UK Brain Bank criteria. Analysis of Individual MDS Criteria Supportive Criteria Overall, PD patients had an average 2.1 ± 1.1 support- ive criteria, significantly more than the non-PD group (0.7 ± 0.5 supportive criteria; P < 0.001; Table 2). Supportive criteria were originally designed to have benchmark specificity ≥ 80%. Three of 4 criteria met this benchmark; dyskinesia occurred in 7% of non-PD patients, rest tremor in 13.6%, and excellent levodopa response in 19.9%. However, although positive ancillary testing differed between groups (P < 0.001) specificity of the ancillary testing criterion was low, at 61.7%. Thirty- six percent of the non-PD population had abnormal olfaction, and 44.4% (n = 9) had abnormal MIBG scin- tigraphy. Adjusting the olfactory cutoff to a stricter cutoff (ie, < 10% for all age groups > 55 years on BSIT) improved specificity only slightly (ie, 68.1%). Absolute Exclusions The occurrence of MDS absolute exclusions in the PD group was low; only 3.2% had any absolute exclu- sion, compared with 64.1% of the non-PD group (P < 0.001; Table 3). Absolute exclusions were origi- nally designed with a benchmark specificity of >95%; all absolute exclusion criteria met this benchmark, with the highest single absolute exclusion (absent levodopa response) occurring in only 1.3% of PD-diagnosed cases. In the non-PD group, the prevalence of absolute exclusions varied considerably, from a low of 1.1% (frontotemporal dementia) to a high of 33.8% (absent levodopa response). Red Flags Red flags were included in the criteria to account for many “atypical” features being seen in >5% of the true PD population. We found that 19.1% of PD patients had at least 1 red flag, compared with 84.1% of non-PD patients (P < 0.001; Table 4). 
In general, the prevalence of individual red flags in PD cases was relatively low; the highest prevalence was severe autonomic dysfunction, at 9.7%. Among the non-PD group, recurrent falls had the highest prevalence (51.3%). As anticipated, red flags TABLE 2. Supportive criteria PD (n = 434) non-PD parkinsonism (n = 192) Total number supportive criteria 2.14 ± 1.05 0.71 ± 0.80 Excellent levodopa response (%) 73.4 (n = 364) 19.9 (n = 151) Dyskinesia (%) 34.0 (n = 382) 7.0 (n = 157) Asymmetric rest tremor (%) 56.5 13.6 Positive test (%) 67.6 38.3 Olfaction (%) 67.4 36.6 MIBG scintigraphy (%) 100 (n = 1) 44.4 (n = 9) Continuous variables are presented as mean ± standard deviation. N is provided whenever a variable has >3% missing values (eg, diagnostic tests not required for the criteria). PD, Parkinson’s disease; MIBG, metaiodobenzylguanidine. TABLE 3. Absolute exclusion criteria PD (n = 434) Non-PD parkinsonism (n = 192) Total number absolute exclusions 0.037 ± 0.22 0.93 ± 0.88 Any absolute exclusion (%) 3.2 64.1 Cerebellar abnormalities (%) 0.2 14.2 Vertical supranuclear gaze palsy (%) 1.2 29.2 Frontotemporal dementia (%) 0.2 1.1 Leg-only parkinsonism (%) 0.5 2.6 Dopamine blocker/depleter (%) 0.5 4.2 Absent levodopa response (%) 1.3 (n = 393) 33.8 (n = 157) Cortical sensory loss (%) 0 13.1 Normal dopamine imaging (%) 0 (n = 42) 13.0 (n = 23) Continuous variables are presented as mean ± standard deviation. N is provided whenever a variable has >3% missing values (eg, diagnostic tests not required for the criteria), PD, Parkinson’s disease. 4 Movement Disorders, Vol. 00, No. 00, 2018 P O S T U M A E T A L
  • 5.
    targeting parkinsonism mimics(eg, essential/dystonic tremor) were uncommon, as these patients were not included in this study (nonprogression, 0.6%; absent nonmotor, 1.6%). Secondary Analyses We conducted several sensitivity and secondary analyses: 1. Center effects: Overall, there was no significant het- erogeneity between centers in sensitivity for PD diag- nosis (range between centers, 88%-98%). However, for the non-PD patient group, there was 1 outlier center (4.9 × interquartile range; P = 0.008), in which 6 of 19 non-PD patients (31.6%) met MDS criteria (range in the other centers, 6-13%). If this outlier center were removed, specificity improved to 90.8%, with no change in sensitivity (94.4%). 2. Effect of diagnosis: We examined the diagnostic dis- tribution of the 22 false-positive non-PD cases. We found no clear differences in the underlying diagnosis of accurate versus false-positive cases; 10 of 22 false- positives (45%) had been diagnosed with MSA, and 7 (33%) had PSP (Supplementary Table 1). 3. Disease duration (Supplementary Table 2): Because diagnosis is generally more difficult with shorter disease duration and because clinical trials focus increasingly on this population, we assessed criteria performance in those with disease of 5 years or more versus <5 years. Sensitivity was relatively similar (95.9% for > 5 years, 92.8% for < 5 years). How- ever, specificity improved considerably with longer duration; 95.0% for duration ≥ 5 years, versus 86.8% for < 5 years. 4. Stratification to certainty (Supplementary Table 3): This study included all patients with parkinsonism, even cases in which the gold standard, diagnosis, was uncertain. To assess concordance in higher-certainty cases, we removed patients for whom the certainty (PD vs not-PD) was quantified as < 80% by the gold- standard diagnostician. Sensitivity did not differ according to certainty (94.7% for > 80% vs 93.4% for ≤ 80%). 
Moreover, there was only a small change in specificity (90.4% vs. 87.3%). The sensitivity advantage of the MDS criteria compared with UK Brain Bank criteria was more notable in the lower- certainty cases (93.4% vs 81.3%, P = 0.024), also with better specificity (87.3% vs 76.3%, P = 0.06). 5. Effect of sex: Accuracy was not significantly differ- ent in men versus women (sensitivity, 96.1% in men, 91.6% in women; P = 0.078; specificity, 86.0% in men vs 92.3% in women; P = 0.24). TABLE 4. Red flags PD (n = 434) Non-PD parkinsonism (n = 192) Total number red flags 0.24 ± 0.54 2.02 ± 1.56 Any red flag (%) 19.1 84.4 >2 Red flags (%) 0.7 33.3 Red flags > supportive criteria (%) 3.0 66.7 Either > 2 red flags or > supportive (%) 3.0 66.7 1. Rapid progression gait (%) 1.4 22.7 2. Absence of progression > 5 years (%) 1.2 0.6 3. Severe bulbar dysfunction < 5 years (%) 0 18.0 4. Inspiratory stridor (%) 0.2 9.0 5. Severe autonomic < 5 years (%) 9.7 40.1 Orthostatic hypotension (%) 4.8 21.6 Severe urinary dysfunction (%) 5.3 30.6 6. Recurrent falls < 3 years (%) 3.7 51.3 7. Anterocollis or contractures (%) 2.6 10.2 8. Absent Nonmotor > 5 years (%) 1.6 1.6 Sleep maintenance insomnia (%) 44.5 39.6 Somnolence (%) 32.4 31.3 REM sleep behavior disorder (%) 45.5 32.4 Constipation (%) 61.2 59.3 Urinary urge (%) 48.4 60.6 Symptomatic orthostasis (%) 24.9 38.8 Hyposmia, self-reported (%) 67.1 26.8 Depression (%) 32.2 41.5 Anxiety (%) 37.6 36.1 Visual hallucinations (%) 17.4 6.7 9. Pyramidal tract signs (%) 1.8 25.8 10. Bilateral symmetric parkinsonism (%) 1.9 25.1 Movement Disorders, Vol. 00, No. 00, 2018 5 V A L I D A T I O N O F M D S C R I T E R I A
  • 6.
    6. Effect ofage: Age is a potential confounder in diagno- sis, as comorbid conditions and multiple neurodegen- erative pathologies are more common. We divided each group according to age above or below the group median. Sensitivity did not differ (93.4% young vs 95.4% old), but specificity was lower in older patients (93.8% vs 83.3%, P = 0.039). Testing Potential Modifications 1. Although the ancillary testing supportive criterion did not meet the 80% specificity threshold in our study, removing this from the diagnostic criteria did not improve overall accuracy (91.3% without ancil- lary testing vs 92.7% with testing). Sensitivity dropped from 94.5% to 91.9%, whereas specificity rose modestly, from 88.5% to 90.1%. 2. To optimize specificity, we tested the effect of treating all red flags as absolute exclusions for probable PD (leaving supportive criteria irrelevant). Doing this, the false-positive rate lowered from 11.5% to 4.7% (still higher than the clinically established false- positive rate of 1.6%). However, sensitivity dropped below the target 80% threshold (ie, 79.3%). Alterna- tively, removing all the red flags (ie, using only a sim- ple criterion with the 8 absolute exclusions) would provide poor specificity (ie, 64%). 3. To enhance the accuracy of PD diagnosis in early PD (ie, to develop a high-certainty early-PD category for future clinical trials), we examined this group in more detail. For short-duration disease or untreated PD, it is very difficult to meet criteria for clinically established PD criteria (eg, dyskinesia and levodopa response often do not apply), leaving no clear high- specificity option for early PD. Therefore, we ana- lyzed the effect of treating red flags as absolute exclusion criteria (and removing duration compo- nents) in cases with disease duration < 5 years. 
This yielded 95.4% specificity and 68.9% sensitivity compared with the gold standard, clinical diagnosis (by comparison, the existing clinically established category in this group had 98.7% specificity and 47.1% sensitivity). Discussion The MDS clinical diagnostic criteria for PD were designed to mimic an expert clinician’s diagnostic pro- cess and to codify and standardize diagnosis for use in clinical research and for clinical diagnosis by those with less expertise in PD. This study was designed to test how well this codification was done. We found an overall accuracy/concordance rate of 92.4%, with sensitivity of 94.5% and specificity of 88.5%. The UK Brain Bank cri- teria had a discordance/error rate approximately double the MDS criteria, and both sensitivity and specificity were lower than with MDS criteria. Accuracy/concor- dance was higher in those who were younger and who had longer disease duration. Although not the primary purpose of this study, our study does provide important information on the util- ity of olfactory testing in PD. Most studies have sug- gested that olfactory testing has approximately 80% sensitivity and >80% specificity for the diagnosis of PD.10 We found much lower diagnostic performance in this very large and systematically assessed sample. It should be noted that the shorter versions of olfactory tests were used, and it is possible that longer tests (eg, the 40-item University of Pennsylvania Smell Identifi- cation test) would perform better. However, many other previous studies also used shorter versions. It is possible that cultural factors from exposure to differ- ent odors may be important; for example, the sensitiv- ity of olfactory testing was lower in Beijing (49%) than in the other centers (70%). However, specificity did not substantially differ according to site (eg, 59% in Beijing vs 65% in the other centers), and specificity was <80% in every single center. 
Therefore, olfactory testing’s diagnostic utility remains unclear. Why did both the sensitivity and specificity of the MDS criteria exceed UK Brain Bank criteria? For speci- ficity, no clear pattern was detected. Of 27 false- positives caught by MDS criteria (UK Brain Bank posi- tive for PD, but MDS criteria and gold standard nega- tive), 14 (52%) were excluded for absolute exclusions (commonest: cortical sensory loss [5], absent levodopa response [5]), and 13 for unbalanced red flags (com- monest red flags overall: frequent falls [15], bilateral symmetric parkinsonism [8]). However, regarding the improved sensitivity, most related to clear advances in the field. Of 37 false-negative diagnoses detected by MDS criteria, 12 (32%) were excluded for > 1 affected family member (underscoring advances in the genetics of PD), 5 (14%) for head injury (now well recognized as a PD risk factor11 ), 4 (11%) for early autonomic involvement (now recognized as a common prodromal marker12 ), 2 (5%) for dementia (now recognized in early PD13 ), and 1 (3%) for neuroleptic exposure (reflecting the development of atypical neuroleptics and/or increasing recognition that drug-induced parkin- sonism is a potential prodromal marker of true PD14 ). This underscores the need to revise the MDS criteria periodically in the future to reflect advances in our understanding of PD. This will become especially criti- cal if a reliable diagnostic biomarker (eg, neuroimaging, tissue diagnosis, blood/cerebrospinal markers) becomes firmly established. Overall, the sensitivity and specificity of probable pro- dromal PD clearly exceeded the 80% target threshold. Based on our subanalysis, should the MDS criteria be modified? Because sensitivity exceeded specificity, tweak- ing criteria might help to balance these slightly (eg, 6 Movement Disorders, Vol. 00, No. 00, 2018 P O S T U M A E T A L
changing the time courses of atypical features to shorter disease durations, changing some red flags to absolute exclusions). However, post hoc modifications entail a considerable risk of overfitting, leading to less reliable generalization. Moreover, if optimal specificity is desired, the clinically established criteria had very high specificity (98.4%) with a reasonable sensitivity of 59%.

With regard to the ancillary testing supportive criterion (olfaction and MIBG scintigraphy), note that ancillary testing was always considered optional in the MDS criteria. In general, we found that support for this criterion’s inclusion is uncertain (sensitivity improves slightly, specificity decreases slightly). Therefore, studies can omit this criterion without substantial loss in accuracy, and consideration might be given to removing it during future revisions.

Based on our analysis and on considerations of utility, we propose one specific addition: a diagnostic category that specifically targets a high-specificity diagnosis in untreated/very early PD patients (eg, for an early-PD disease-modification trial, in which specificity must be high, but sensitivity need not exceed the 80% benchmark of probable PD). This will be described in an accompanying submission.

Some limitations of this study should be noted. First, it must be emphasized that this is not a clinicopathologic or clinicogenetic study. The gold standard used here is expert clinician diagnosis, using all modalities available to the clinicians in their practice. Clinical diagnosis, the gold standard, will inevitably be wrong in some cases. This would have unpredictable effects on sensitivity/specificity estimates. Cases in which the gold standard is incorrect but the diagnostic criteria are correct would bias estimates of MDS criteria accuracy downward (eg, removal of a single outlier site increased specificity to 92%), whereas cases in which both are incorrect would bias accuracy upward.
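To make the direction of these two biases concrete, the following minimal Python sketch (an illustration only, not an analysis performed in this study) compares "apparent" accuracy, measured against clinical diagnosis, with "true" accuracy in a hypothetical cohort in which true disease status is assumed to be known. The cohorts and the names `Case`, `accuracy`, `BY_GOLD`, and `BY_TRUTH` are hypothetical.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass(frozen=True)
class Case:
    """One patient in a hypothetical validation cohort."""
    true_pd: bool      # assumed "true" status (eg, from autopsy); unknown in this study
    gold_pd: bool      # expert clinical diagnosis, used as the gold standard
    criteria_pd: bool  # result of applying the diagnostic criteria


def accuracy(cases, reference):
    """Fraction of cases in which the criteria agree with the chosen reference."""
    return sum(c.criteria_pd == reference(c) for c in cases) / len(cases)


BY_GOLD = attrgetter("gold_pd")   # what a study like this one can measure
BY_TRUTH = attrgetter("true_pd")  # what one would ideally like to know

# Eight illustrative cases in which truth, gold standard, and criteria all agree.
agreeing = [Case(True, True, True)] * 6 + [Case(False, False, False)] * 2

# Gold standard wrong, criteria right: apparent accuracy is biased downward.
cohort_down = agreeing + [Case(true_pd=False, gold_pd=True, criteria_pd=False)]

# Gold standard and criteria both wrong: apparent accuracy is biased upward.
cohort_up = agreeing + [Case(true_pd=True, gold_pd=False, criteria_pd=False)]

for name, cohort in [("down", cohort_down), ("up", cohort_up)]:
    print(f"{name}: apparent={accuracy(cohort, BY_GOLD):.3f} "
          f"true={accuracy(cohort, BY_TRUTH):.3f}")
```

Running it prints `down: apparent=0.889 true=1.000` and `up: apparent=1.000 true=0.889`. A single gold-standard error can move apparent accuracy in either direction, depending on whether the criteria were right or wrong, which is why the net effect on the estimates cannot be predicted.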
Indeed, no diagnostic criteria have ever had full pathologic validation (the UK Brain Bank criteria had an evaluation of positive predictive value that did not include non-PD clinical diagnoses, and could not assess sensitivity or specificity15–17). A definitive clinicopathologic and genetic study would be ideal, but this would be at least several years away, because it would require documentation of all MDS criteria during life, then waiting until the death of the majority of participants (a majority is needed to avoid selection bias toward more severe/atypical cases or the absence of very long-standing cases). It would also require additional genetic testing to ensure that false-negative patients without pathology do not have clinicogenetic PD without synuclein pathology (parkin, LRRK-2, etc.3). Second, it is possible that knowing the criteria were about to be applied might have influenced the gold-standard diagnostic opinion, thereby biasing the MDS criteria estimates upward. However, note that the diagnosis was made before the criteria were applied (ie, results of the criteria evaluation were not available to gold-standard evaluators). Residual bias from this issue would be most likely to occur in the centers of the 2 primary criteria authors, who would be most familiar with each individual criterion; however, the overall concordance in the Tubingen/Montreal centers was not clearly higher than in the other centers (94.1% vs 92.4% overall). Third, there is subjectivity and interpretation built into some criteria (eg, excellent levodopa response, drug-induced parkinsonism). Raters with less expertise may incorrectly estimate these, further reducing accuracy. Our raters were generally junior neurologists currently in movement disorders fellowships; we cannot assess how individual criteria may be interpreted by those with less (or more) expertise/interest in PD diagnosis.
It is also possible that, because the junior neurologists were aware of the diagnosis, bias in identifying exclusionary criteria might be seen (note again that these neurologists only rated criteria as present/absent and did not provide their diagnostic opinion). Moreover, if some centers had extensive ancillary tests available beyond those in the criteria (eg, MRI, other neuroimaging), this may have increased the reliability of the gold-standard diagnosis compared with other centers. Therefore, additional validation outside subspecialty centers would be useful. Fourth, the overall occurrence of vascular parkinsonism and drug-induced parkinsonism was low (4.7% and 2.1%, respectively), perhaps reflective of the subspecialty centers in this study; in general neurology practice, these may be more common, and we cannot state how reliably the criteria exclude these diagnoses. Fifth, MIBG scintigraphy was not required for this study (and only 10 patients had scintigraphy results on file); therefore, we cannot reliably comment on its specificity. Finally, this study did not include parkinsonism mimics or those with only a single parkinsonism sign (eg, isolated rest tremor), because all patients were required to fulfill the operational definition of parkinsonism for entry. Therefore, the performance of those criteria designed to catch these mimics (dopaminergic functional imaging, nonprogression, absence of nonmotor features) cannot be fully assessed here. Designing a study of parkinsonism mimics would be difficult, because it is very unclear what entry criteria could be used that would generalize to the types of errors encountered in diagnosing parkinsonism itself.

On the other hand, our study has notable strengths.
Primary among these are a large sample size (including almost 200 patients with atypical parkinsonism), systematic assessment of all criteria at the same visit and in the same patients, and a direct head-to-head comparison of criteria. Patients were evaluated on 4 continents, which should make the results more generalizable to different ethnicities. Finally, this sampling included a systematic evaluation of olfaction, allowing a comprehensive assessment of the accuracy of olfactory testing in parkinsonism.
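The sensitivity, specificity, and accuracy/concordance figures used throughout this Discussion all come from a 2×2 comparison of criteria against the gold standard, and the false-positive and false-negative breakdowns are simple shares of a total. The minimal Python sketch below shows both calculations. The 2×2 counts are hypothetical and are not the study's data; the function names `diagnostic_metrics` and `percent_breakdown` are illustrative; the exclusion counts (12, 5, 4, 2, and 1 of 37 false negatives; 14 and 13 of 27 false positives) are those reported in the Discussion.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and accuracy (concordance) vs a reference diagnosis.

    tp: criteria positive, reference positive   fp: criteria positive, reference negative
    fn: criteria negative, reference positive   tn: criteria negative, reference negative
    """
    sensitivity = tp / (tp + fn)                 # reference-positive cases called positive
    specificity = tn / (tn + fp)                 # reference-negative cases called negative
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # all cases on which the two agree
    return sensitivity, specificity, accuracy


def percent_breakdown(counts, total):
    """Each reason's share of `total`, rounded to whole percent as in the text."""
    return {reason: round(100 * n / total) for reason, n in counts.items()}


# Hypothetical 2x2 counts for illustration only; NOT the study's data.
sens, spec, acc = diagnostic_metrics(tp=95, fp=10, fn=5, tn=40)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")

# UK Brain Bank exclusions behind the 37 false negatives detected by the MDS criteria
# (counts as reported in the Discussion; only 24 of the 37 are itemized there).
false_negative_reasons = {
    ">1 affected family member": 12,
    "head injury": 5,
    "early autonomic involvement": 4,
    "dementia": 2,
    "neuroleptic exposure": 1,
}
print(percent_breakdown(false_negative_reasons, total=37))

# False positives: 14 of 27 caught by absolute exclusions, 13 by unbalanced red flags.
print(percent_breakdown({"absolute exclusions": 14, "red flags": 13}, total=27))
```

With the made-up counts, sensitivity (0.95) exceeds specificity (0.80), the same pattern that motivates the suggestion to rebalance the criteria; the breakdown reproduces the rounded percentages given in the text (32%, 14%, 11%, 5%, 3%, and 52%).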
Acknowledgments: We thank Majid Al O’taibi, Deborah A. Hammond, Elie Matar, and Paul D Clouston, who helped with rating patients.

References

1. Postuma RB, Berg D, Stern M, et al. MDS Clinical Diagnostic Criteria for Parkinson’s Disease. Mov Disord 2015;30:1591-1600.
2. Berg D, Postuma RB, Adler CH, et al. MDS Research Criteria for Prodromal Parkinson’s Disease. Mov Disord 2015;30:1600-1611.
3. Berg D, Postuma RB, Bloem B, et al. Time to redefine PD? Introductory statement of the MDS Task Force on the definition of Parkinson’s disease. Mov Disord 2014;29:454-462.
4. Gibb WR, Lees AJ. The relevance of the Lewy body to the pathogenesis of idiopathic Parkinson’s disease. J Neurol Neurosurg Psychiatry 1988;51:745-752.
5. Postuma RB, Berg D, Stern M, et al. Abolishing the 1-year rule: How much evidence will be enough? Mov Disord 2016;31:1623-1627.
6. Doty RL, Shaman P, Dann M. Development of the University of Pennsylvania Smell Identification Test: a standardized microencapsulated test of olfactory function. Physiol Behav 1984;32:489-502.
7. Doty RL, Marcus A, Lee WW. Development of the 12-item Cross-Cultural Smell Identification Test (CC-SIT). Laryngoscope 1996;106:353-356.
8. Hummel T, Sekinger B, Wolf SR, Pauli E, Kobal G. ‘Sniffin’ sticks’: olfactory performance assessed by the combined testing of odor identification, odor discrimination and olfactory threshold. Chem Senses 1997;22:39-52.
9. Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York: John Wiley; 1981.
10. Rahayel S, Frasnelli J, Joubert S. The effect of Alzheimer’s disease and Parkinson’s disease on olfaction: a meta-analysis. Behav Brain Res 2012;231:60-74.
11. Jafari S, Etminan M, Aminzadeh F, Samii A. Head injury and risk of Parkinson disease: a systematic review and meta-analysis. Mov Disord 2013;28:1222-1229.
12. Postuma RB, Berg D. Advances in markers of prodromal Parkinson disease. Nat Rev Neurol 2016;12:622-634.
13. Fereshtehnejad SM, Zeighami Y, Dagher A, Postuma RB. Clinical criteria for subtyping Parkinson’s disease: biomarkers and longitudinal progression. Brain 2017;140:1959-1976.
14. Brigo F, Erro R, Marangi A, Bhatia K, Tinazzi M. Differentiating drug-induced parkinsonism from Parkinson’s disease: an update on non-motor symptoms and investigations. Parkinsonism Relat Disord 2014;20:808-814.
15. Hughes AJ, Daniel SE, Kilford L, Lees AJ. Accuracy of clinical diagnosis of idiopathic Parkinson’s disease: a clinico-pathological study of 100 cases. J Neurol Neurosurg Psychiatry 1992;55:181-184.
16. Hughes AJ, Ben-Shlomo Y, Daniel SE, Lees AJ. What features improve the accuracy of clinical diagnosis in Parkinson’s disease: a clinicopathologic study. Neurology 1992;42:1142-1146.
17. Hughes AJ, Daniel SE, Lees AJ. Improved accuracy of clinical diagnosis of Lewy body Parkinson’s disease. Neurology 2001;57:1497-1499.

Supporting Data

Additional Supporting Information may be found in the online version of this article at the publisher’s web-site.