Researchers developed a computational method to predict which strains of influenza A (H3N2) virus should be included in seasonal flu vaccines. They analyzed phylogenetic and antigenic data from previous years to identify strains with alleles that were increasing in prevalence and antigenic impact. Their method accurately predicted the three most prevalent strains with 12% higher accuracy than the World Health Organization. This demonstrated that computational analysis of epidemiological data can help improve vaccine strain selection and influenza prediction.
1. Erik Blixt
Computational Prediction of Vaccine Strains for Human Influenza A (H3N2) Virus
Introduction
Influenza pandemics have caused over 50 million deaths worldwide, with an average of 23 thousand deaths per year in the United States (1). Although a vaccine is available, it only protects the recipient from three or four strains of the virus. The strains chosen for the vaccine are selected based on research into the predominant strains of the previous season.
The influenza virus is in the family Orthomyxoviridae, meaning that it has a negative-sense RNA genome with 6-8 linear segments of RNA. Each of these RNAs encodes specific proteins needed for the virus’s life cycle (2). Glycoproteins on the influenza virus’s surface bind to specific receptors on the surface of the target cell, and the virus is endocytosed into the cell. The viral RNP segments are released from the capsid and then brought into the nucleus to be transcribed into mRNA segments. These mRNA segments are exported to the cytoplasm to be translated into viral proteins, including the viral surface glycoproteins and the proteins needed for viral genome replication. Once the viral RNA has been replicated and enclosed by nucleocapsids, the virus is released from the cell by budding. As the virus exits the cell, the envelope of the host cell becomes the envelope of the virus, with the viral glycoproteins on the surface.
The surface glycoproteins hemagglutinin and neuraminidase (HA and NA, respectively) are used to selectively bind and target specific regions of the host’s cell membrane for entry (3). Genetic changes in the sequence of the glycoproteins lead to antigenic differences. As a result, HA and NA are grouped into different subtypes. HA is grouped into subtypes 1-18 and NA into subtypes 1-11, and various combinations of these two subtypes can occur in the viral genome.
If a host is infected with multiple strains of the virus with differing glycoprotein subtypes, the opportunity for antigenic shift arises. Two or more different strains of the same virus combine to form a new subtype by mixing surface antigens of the original strains. For example, if a host cell were infected with both H1N1 and H3N2 subtypes, the resulting progeny could be any combination of those subtypes (H1N1, H1N2, H3N1, or H3N2).
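The combinations in this example can be enumerated mechanically. The following is a minimal sketch, not part of the paper under review; the function name `reassortant_subtypes` is hypothetical, and it assumes subtypes are written in the usual "HxNy" form and that only the HA and NA segments are being mixed.

```python
from itertools import product

def reassortant_subtypes(parent_subtypes):
    """Return every HA/NA pairing that could arise from the given parents.

    Assumption: each subtype is a string such as "H3N2".
    """
    ha_types = set()
    na_types = set()
    for subtype in parent_subtypes:
        # "H3N2" -> "H3" and "2"; restore the "N" prefix on the second part.
        ha, na_number = subtype.split("N")
        ha_types.add(ha)
        na_types.add("N" + na_number)
    # Every HA type can in principle pair with every NA type.
    return sorted(ha + na for ha, na in product(ha_types, na_types))

print(reassortant_subtypes(["H1N1", "H3N2"]))
# ['H1N1', 'H1N2', 'H3N1', 'H3N2']
```

With two parent subtypes that differ in both glycoproteins, the sketch returns the same four subtypes listed in the example.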
The varying allelic sequences of each of the glycoprotein RNA segments increase genetic diversity within the population. Antigenic shift increases the difficulty of accurately establishing a vaccine. Because of this, the World Health Organization (WHO) collects and evaluates data to predict which strains of the virus will be most prevalent in the subsequent season so that an effective vaccine can be developed.
In this paper, researchers from the Steinbrück lab at the Heinrich Heine University in Düsseldorf, Germany proposed a method to overcome the challenge of selecting strains to use. They hypothesized that if they could develop a computational method for selecting the circulating predominant and emerging strains of Influenza A from the previous years’ data, then this information could be used to predict which strains of the virus should be used for a seasonal flu vaccine. This paper was of interest to me because, as an applicant to the epidemiology program, I see that it potentially presents a new, and more accurate, process for selecting which strains to use in the flu vaccine each season.
Results and Discussion
The German team sought to determine which strains of the virus would achieve dominance each season by quantifying the number of infections of each strain of the virus and comparing these results to previous years to determine the rate of increase for each strain. These data would show which of the strains had a specific genetic advantage and could be targeted by a vaccine before an epidemic occurred. Data on the strains were obtained from the Influenza Virus Resource, using samples from 1995-2007 (35).
Steinbrück’s team developed a method to better determine which strains should be produced for a seasonal Influenza A (H3N2) virus vaccine. They were able to determine which unique HA alleles were on the rise to future predominance using information gathered from previous seasons, using a four-step method. First, they developed a phylogenetic tree of the various strains of the H3N2 virus and its serotypes. Second, an AD plot was developed from this tree to determine which 3 strains of the virus were most likely to become dominant in future seasons. Third, an antigenic tree was developed from the phylogenetic tree, and HI distances were used to identify the antigenic variety of the 3 most prevalent HA alleles. Finally, a unique HA allele was determined to become dominant if it had an antigenic weight of 0.5 antigenic units or greater, indicating an increase in frequency of 5%. If the allele was not already predominant in the population (frequency above 50%), then it would be proposed for addition to the next season’s vaccine.
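The final decision step described above can be written as a simple rule. This is a minimal sketch under stated assumptions, not the authors’ implementation: the function name `propose_for_vaccine` and its parameters are hypothetical, the 0.5 antigenic unit cutoff is treated as inclusive, and "predominant" is taken to mean a frequency above 50% in the earlier data.

```python
def propose_for_vaccine(antigenic_weight, previous_frequency,
                        weight_threshold=0.5, predominance_threshold=0.5):
    """Decide whether a top-ranked HA allele should trigger a vaccine update.

    antigenic_weight: antigenic weight of the allele, in antigenic units.
    previous_frequency: fraction (0.0 to 1.0) of isolates that already
        carried the allele in the earlier data.
    Both thresholds are assumptions taken from the summary above.
    """
    is_antigenically_novel = antigenic_weight >= weight_threshold
    was_already_predominant = previous_frequency > predominance_threshold
    return is_antigenically_novel and not was_already_predominant

# Hypothetical values, for illustration only.
print(propose_for_vaccine(0.8, 0.10))  # True: novel and still rare
print(propose_for_vaccine(0.8, 0.70))  # False: already predominant
print(propose_for_vaccine(0.3, 0.10))  # False: too little antigenic change
```

Keeping the two thresholds as parameters makes it easy to see how sensitive the recommendation is to either cutoff.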
To determine the differences between the viral strains, a hemagglutination inhibition (HI) assay is performed each year on human sera from a sample of the population (4). This assay determines the phenotypic properties of each strain by comparing the antigenic similarities between two strains of the virus based on the agglutination of red blood cells. From these data Smith et al. developed an “antigenic cartography” to visualize data from HI and determine antigenic differences (5). This information was also used in combination with allele dynamics (AD) plots, which were used to indicate which alleles were most affected by directional selection and thus correlated with the virus’s dominance in a population (6). The allelic frequency was determined by the number of isolates with that allele on a specific phylogenetic branch compared to the entire population of that season.
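The frequency calculation in the last sentence can be illustrated as follows. This is a simplified sketch rather than the AD plot method itself: it assumes each isolate is labeled with a single allele name, whereas the original method assigns alleles to branches of a phylogenetic tree. The function names and the sample data are hypothetical.

```python
from collections import Counter

def allele_frequencies(isolate_alleles):
    """Fraction of a season's isolates carrying each allele.

    isolate_alleles: one allele label per isolate (a simplifying assumption).
    """
    total = len(isolate_alleles)
    if total == 0:
        return {}
    counts = Counter(isolate_alleles)
    return {allele: n / total for allele, n in counts.items()}

def frequency_change(previous_season, current_season):
    """Change in frequency of each allele between two seasons."""
    prev = allele_frequencies(previous_season)
    curr = allele_frequencies(current_season)
    alleles = set(prev) | set(curr)
    return {a: curr.get(a, 0.0) - prev.get(a, 0.0) for a in alleles}

# Hypothetical isolates, for illustration only.
season_1 = ["A", "A", "B", "B"]
season_2 = ["A", "B", "B", "B"]
print(frequency_change(season_1, season_2))
```

An allele whose frequency rises from one season to the next, such as "B" here, is the kind of candidate the AD plot is meant to rank highly.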
The German team used this information to develop an “antigenic tree”, which mapped antigenic differences onto a phylogenetic tree (7). This method used antigenic distances to establish antigenic weights, which could then be used to determine sets of coding changes in HA. Steinbrück used this information, along with the AD plots, to identify antigenically distinct alleles of HA, and correlated these data with the emergence of dominant viral strains.
These data demonstrated how they were able to predict which viral strains would have the greatest predominance in a population. Their studies were based on data from 2002-2007 and, for each season, only used samples from the 2 preceding years. Their data were used to determine which strains would have been most effective in a vaccine, and the results were compared to those of the WHO’s Influenza Surveillance and Response System.
Their data were used to determine which vaccine would have been most effective for the previous season in question and were compared with the data collected from the WHO. A vaccine update would have been recommended if a novel strain was predicted positive to become predominant in the subsequent season, based on whether the allele had an antigenic impact greater than 0.5 antigenic units, the amount for one antigen to bind to a target receptor. Their results were sorted into 4 categories when compared with the WHO’s data: true positives (positive alleles that were accurately predicted positive), true negatives (negative alleles that were accurately predicted negative), false positives (negative alleles that were incorrectly predicted positive), and false negatives (positive alleles that were incorrectly predicted negative). A false positive result would have been most detrimental, as the vaccine would have been incorrectly updated, replacing a truly positive strain with the false positive.
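The four categories above follow the standard confusion-matrix definitions. A minimal sketch of how each prediction could be labeled and tallied is shown below; the function names and the sample predictions are hypothetical and are not data from the paper.

```python
from collections import Counter

def classify_prediction(predicted_positive, became_predominant):
    """Label one allele prediction with its confusion-matrix category."""
    if predicted_positive and became_predominant:
        return "true positive"
    if predicted_positive and not became_predominant:
        return "false positive"
    if not predicted_positive and became_predominant:
        return "false negative"
    return "true negative"

def tally(predictions):
    """Count categories over (predicted_positive, became_predominant) pairs."""
    return Counter(classify_prediction(p, actual) for p, actual in predictions)

# Hypothetical predictions, for illustration only.
sample = [(True, True), (False, False), (False, True), (False, False)]
print(tally(sample))
```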
From the 2003 season to the 2004-2005 season, the FU02 strain was predominant, with HA allele coding changes at 156H, 75Q, and 155T (8). The allele was ranked first for the 2002-2003 season and was predicted as a true positive candidate for the vaccine strain. The WHO also recommended the FU02 strain in that same year. FU02 was replaced by CA04 in the 2004-2005 season, with HA allele changes at 145N, 159F, and 226I; this allele ranked first in the AD plot in the 2005 season, one year after predominance (9). This allele was given a false negative report, but this is due to the fact that samples were not obtained in the 2004-2005 season, a year before its predominance. The WHO included CA04 in their vaccine 2 seasons too late, generating a false positive as well. In the 2006 season WI05 replaced CA04 as the dominant strain, with an HA allele change at 193F, and was correctly predicted as a true positive (10). The allele also ranked very highly in the 2006-2007 season, but was not re-predicted for the vaccine as it had already been selected in the season before. The WHO only added WI05 in the 2006-2007 season, making it a false negative for them. Finally, BR07 became predominant in the 2007 season with HA allelic changes at 50E and 140I (11). It was noticed in the 2006-2007 season but only had a small 4% increase in the AD plot; therefore it was not recommended for the vaccine in the 2007 season, nor in 2007-2008. Both the German team and the WHO failed to recognize this strain, giving a false negative for each. This is probably due to the fact that the strain appeared very late in the 2006-2007 season, after most sampling had concluded (12). Every other test resulted in a true negative for Steinbrück’s team.
With the exception of the false positive in 2004 with FU02 and the false negative with BR07 in 2007, Steinbrück’s team accurately predicted the top-ranked alleles from the AD plot 25/27 times, for 93% accuracy. When seasons were scored on predicting the most prevalent strain of the season, Steinbrück’s team scored 78% accuracy. In comparison, the WHO only scored 66% accuracy (6).
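The arithmetic behind these figures is straightforward to check. The sketch below only reuses the numbers quoted in this paragraph; the helper name `accuracy_percent` is hypothetical.

```python
def accuracy_percent(correct, total):
    """Accuracy as a percentage of correct predictions."""
    return 100.0 * correct / total

# 25 correct calls out of 27 top-ranked alleles.
top_ranked_accuracy = accuracy_percent(25, 27)
print(round(top_ranked_accuracy))  # 93

# Season-level scores quoted above, in percent.
team_score = 78
who_score = 66
print(team_score - who_score)  # 12
```

Note that the difference between 78% and 66% is 12 percentage points, which is the sense in which the method is described as 12% more accurate than the WHO.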
The group accurately predicted the 3 most prevalent strains of Influenza A with 12% higher accuracy than the WHO. By using a combination of AD plots and antigenic trees to compare the antigenic weight of strains, the emergence of strains was accurately predicted a year in advance, using only data from the previous 2 seasons. Any error in their findings would be largely due to factors out of their control. For example, the group’s only false positive was due to a lack of available data from that season. This problem is easily rectified as more data become available and record keeping improves over time.
This paper illustrates how large-scale computer analysis can aid dramatically in epidemiological research. Steinbrück’s team processed information from two data sets, using AD plots to identify HA alleles with the largest increase in prevalence and antigenic trees to determine the antigenic impact of those alleles, to accurately predict the effects of antigenic drift. But Steinbrück’s team is not the only group attempting to develop new models for predicting the flu’s spread. Łuksza & Lässig at Columbia University created a fitness function that predicted the growth rate of viral strains based on a susceptible-infected-recovered (SIR) model and antigen changes to measure pathogen-host interaction (13). In another research study, Du et al. used HI analysis to determine a sequence of antigenic similarity of viral strains (14).
Steinbrück’s paper shows the beginning of a new period of epidemiological data processing in the computer age. As a next experiment, these approaches could be combined with other new methods from Łuksza & Lässig’s lab or Dr. Du’s lab and their accuracy compared to the current WHO model; this could lead to a dramatic revolution in how aid is provided worldwide. Steinbrück’s method has been tested and resulted in a higher accuracy than the current model used. These methods should be applied by the WHO as part of the standard prediction process.
References
1) Tognotti E. 2009. Influenza pandemics: a historical retrospect. J. Infect. Dev. Ctries. 3:331–334. http://dx.doi.org/10.3855/jidc.239.
2) Fouchier RAM, Munster V, Wallensten A, Bestebroer TM, Herfst S, Smith D, Rimmelzwaan GF, Olsen B, Osterhaus ADME. 2005. Characterization of a novel influenza A virus hemagglutinin subtype (H16) obtained from black-headed gulls. J. Virol. 79:2814–2822. http://dx.doi.org/10.1128/JVI.79.5.2814-2822.2005.
3) Tong S, Zhu X, Li Y, Shi M, Zhang J, Bourgeois M, Yang H, Chen X, Recuenco S, Gomez J, Chen LM, Johnson A, Tao Y, Dreyfus C, Yu W, McBride R, Carney PJ, Gilbert AT, Chang J, Guo Z, Davis CT, Paulson JC, Stevens J, Rupprecht CE, Holmes EC, Wilson IA, Donis RO. 2013. New world bats harbor diverse influenza A viruses. PLoS Pathog. 9:e1003657. http://dx.doi.org/10.1371/journal.ppat.1003657.
4) Hirst GK. 1943. Studies of antigenic differences among strains of influenza A by means of red cell agglutination. J. Exp. Med. 78:407–423. http://dx.doi.org/10.1084/jem.78.5.407.
5) Smith DJ, Lapedes AS, de Jong JC, Bestebroer TM, Rimmelzwaan GF, Osterhaus ADME, Fouchier RAM. 2004. Mapping the antigenic and genetic evolution of influenza virus. Science 305:371–376. http://dx.doi.org/10.1126/science.1097211.
6) Steinbrück L, McHardy AC. 2011. Allele dynamics plots for the study of evolutionary dynamics in viral populations. Nucleic Acids Res. 39:e4. http://dx.doi.org/10.1093/nar/gkq909.
7) Bao Y, Bolotov P, Dernovoy D, Kiryutin B, Zaslavsky L, Tatusova T, Ostell J, Lipman D. 2008. The influenza virus resource at the National Center for Biotechnology Information. J. Virol. 82:596–601. http://dx.doi.org/10.1128/JVI.02005-07.
8) WHO. 2003. Recommended composition of influenza virus vaccines for use in the 2004 influenza season. Wkly. Epidemiol. Rec. 78:375–379.
9) WHO. 2005. Recommended composition of influenza virus vaccines for use in the 2005-2006 influenza season. Wkly. Epidemiol. Rec. 80:66–71.
10) WHO. 2006. Recommended composition of influenza virus vaccines for use in the 2006-2007 influenza season. Wkly. Epidemiol. Rec. 81:82–86.
11) WHO. 2007. Recommended composition of influenza virus vaccines for use in the 2008 influenza season. Wkly. Epidemiol. Rec. 82:351–356.
12) WHO. 2007. Recommended composition of influenza virus vaccines for use in the 2007-2008 influenza season. Wkly. Epidemiol. Rec. 82:69–74.
13) Łuksza M, Lässig M. 2014. A predictive fitness model for influenza. Nature 507:57–61. http://dx.doi.org/10.1038/nature13087.
14) Du X, Dong L, Lan Y, Peng Y, Wu A, Zhang Y, Huang W, Wang D, Wang M, Guo Y, Shu Y, Jiang T. 2012. Mapping of H3N2 influenza antigenic evolution in China reveals a strategy for vaccine strain recommendation. Nat. Commun. 3:709. http://dx.doi.org/10.1038/ncomms1710.