Predicting peptide/MHC interactions: Application to epitope identification and vaccine design
1. Predicting peptide/MHC interactions: Application to epitope identification and vaccine design
Or: Finding the needle in the haystack
Morten Nielsen
Center for Biological Sequence Analysis
Department of Systems biology
Technical University of Denmark
mniel@cbs.dtu.dk
Bridging between two worlds
“To be honest, I do not believe in this
bioinformatic approach to
immunology, ...”
“Talking about detecting epitopes from the genome of an entire
bacterium seems very complicated to me. It seems impracticable and
"misleading", in the sense that it can take funds, effort and
attention away from the slow but sure routes to this goal
through experimental methods.”
FG, 2006
2. Vaccines have been
made for 36 of >400
human pathogens
+HPV & Rotavirus
Immunological Bioinformatics, The MIT press.
Deaths from
infectious diseases
in the world in 2002
www.who.int/entity/whr/2004/annex/topic/en/annex_2_en.pdf
4. MHC Class I pathway
Finding the needle in the haystack
1 in 200 peptides makes it
to the surface
Figure by Eric A.J. Reits
MHC-I molecules present peptides
on the surface of most cells
Figure courtesy Mette Voldby Larsen
5. CTL response
Figure courtesy Mette Voldby Larsen
The death of an infected cell
6. Antigen Discovery
Lauemøller et al., 2000
Influenza A virus (A/Goose/Guangdong/1/96(H5N1))
>Segment 1
Genome
agcaaaagcaggtcaattatattcaatatggaaagaataaaagaactaagagatctaatg
tcgcagtcccgcactcgcgagatactaacaaaaaccactgtggatcatatggccataatc
aagaaatacacatcaggaagacaagagaagaaccctgctctcagaatgaaatggatgatg
gcaatgaaatatccaatcacagcagacaagagaataatggagatgattcctgaaaggaat
and 13350 other nucleotides on 8 segments
Proteins
>polymerase
MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITAD
KRIMEMIPERNEQGQTLWSKTNDAGSDRVMVSPLAVTWWNRNGPTTSTVHYPKVYKTYFE
KVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSE
SQLTITKEKKEELQDCKIAPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCW
EQMYTPGGEVRNDDVDQSLIIAARNIVRRATVSADPLASLLEMCHSTQIGGIRMVDILRQ
NPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTNGSSVKKEEEVLTGNLQTLKIKVHEGY
EEFTMVGRRATAILRKATRRLIQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNF
...
and 9 other proteins
9mer peptides
MERIKELRD
ERIKELRDL
RIKELRDLM
IKELRDLMS
KELRDLMSQ
ELRDLMSQS
LRDLMSQSR
RDLMSQSRT
DLMSQSRTR
LMSQSRTRE
and 4376 other 9mers
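The step from a protein sequence to its candidate 9mer peptides is a plain sliding window. A minimal sketch, using the first 60 residues of the polymerase sequence from this slide; the function name `kmers` is made up for illustration and is not part of any CBS tool:

```python
def kmers(sequence, k=9):
    """Return every overlapping k-mer of `sequence`, ordered by start position."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# First line of the polymerase sequence shown on the slide (60 residues).
polymerase_start = "MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITAD"

peptides = kmers(polymerase_start)

# A sequence of length L yields L - k + 1 overlapping k-mers.
print(len(peptides))
print(peptides[:3])
```

Each of these peptides is then a candidate to be scored against an MHC binding predictor.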
9. HLA polymorphism
• Few human beings will share the same set
of HLA alleles
– Different persons will react to a pathogen
infection in a different manner
• A T cell based vaccine must include
epitopes specific for each HLA allele in a
population
– A peptide based vaccine must consist of many
100 HLA class I epitopes
– (and ~1000 class II epitopes)
11. HLA binding specificity
High information
positions
If we have binding data, we can accurately describe
the binding specificity!
HLA specificity clustering
A0201
A6802
A0101 B0702
12. Coverage of HLA alleles
Supertype     Selected allele
A1            A*0101
A2            A*0201
A3            A*1101
A24           A*2401
A26 (new*)    A*2601
B7            B*0702
B8 (new*)     B*0801
B27           B*2705
B39 (new*)    B*3901
B44           B*4001
B58           B*5801
B62           B*1501
Clustering in: O Lund et al., Immunogenetics. 2004 55:797-810
Data
• Alleles characterized with 5 or more data points
• 3% covered
17. SYFPEITHI benchmark
(1400 ligands restricted to 46 HLA molecules)
Predicting primate MHCs
• Can we predict binding specificities for
non-human primates using the NetMHCpan
method trained on human specificity data
only?
18. Yes. Monkeys are just like humans
Patr B*0101
Patr A*0101
Sidney et al. (2006) Sidney et al. (2006)
And even Pigs and Cows are (somewhat)
like humans
19. So, we can find the needle in the haystack
• Given a protein sequence and an HLA molecule, we can
accurately predict which peptides will bind (70-95%)
• 15-80% of these will in turn be epitopes
But, can we find the haystack?
20. Epitope based vaccines and diagnostics
• Challenges
• Identify epitopes in pathogen genome
• A small viral genome contains >> 1000 potential CTL
epitopes
• HLA diversity
• No two humans will induce the same reaction to a
pathogen infection
• Viral escape and viral genomic diversity
• No two viral strains will “host” the same set of T
cell epitopes
Viral escape and pathogen variability
The virus of today is different from the virus of
tomorrow (Viral escape)
Figure courtesy Mette Voldby Larsen
21. Pathogen variability
HIV Gag phylogenetic tree
Clade C
Few peptides
conserved
between all
viral strains
Clade D
Clade AE
Clade A Clade B
22. Immuno-dominance
Dominance
• Highly immunogenic peptides
• High variability = easily escaped
• Immune response useless
Sub-dominance
• Weakly immunogenic peptides
• Low variability = not escapable
• Immune response highly effective = good vaccine candidates
Polyvalent vaccines
• The equivalent of this in epitope based
vaccines is to select epitopes in a way so
that they together cover all strains.
Uneven coverage, Average coverage = 2
Epitope
Strain 1
Strain 2
Even coverage, Average coverage = 2
Strain 1
Strain 2
23. EpiSelect
$$S^j_G = \sum_i \frac{P^j_i}{\lambda + C_i}$$
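The EpiSelect score lends itself to a greedy loop: score every candidate epitope, pick the best, raise the coverage count of each strain it hits, and repeat. The sketch below is an illustration under stated assumptions, not the published implementation: the function name `episelect`, the input layout (epitope mapped to the set of strains containing it), the offset value 0.1 and the alphabetical tie-break are all hypothetical choices.

```python
def episelect(presence, n_select, offset=0.1):
    """Greedy epitope selection favouring even strain coverage.

    presence: dict mapping epitope -> set of strain ids that contain it.
    offset:   small constant in the denominator (assumed value, not from the slides).
    """
    strains = set().union(*presence.values())
    hits = {s: 0 for s in strains}      # C_i: times strain i is already targeted
    candidates = set(presence)
    selected = []

    while candidates and len(selected) < n_select:
        def score(epitope):
            # Strains that are already well covered contribute little to the score.
            return sum(1.0 / (offset + hits[s]) for s in presence[epitope])

        # sorted() makes ties resolve alphabetically, so results are reproducible.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        for s in presence[best]:
            hits[s] += 1

    return selected, hits


# Toy data: three strains, four candidate epitopes.
toy_presence = {
    "E1": {"strain1", "strain2"},
    "E2": {"strain1"},
    "E3": {"strain2"},
    "E4": {"strain3"},
}
chosen, strain_hits = episelect(toy_presence, n_select=2)
print(chosen)
print(strain_hits)
```

In the toy data, E1 is picked first because it covers two strains; after that, E4 outscores E2 and E3 because strain3 is still uncovered, which is the even-coverage behaviour the polyvalent-vaccine slide argues for.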
Cross-clade immunogens
Table 3. Highly immunogenic epitopes and their cross-clade recognition. 21 HLA-supertype
restricted epitopes were highly immunogenic and induced a CTL response in at least four subjects.
The table shows the subtype each responding subject was infected with and the frequency at which the
epitope sequence is found among the HIV-1 subtype reference strains.
Epitope sequence   HLA-supertype & protein region   Subtypes of the responders
QVPLRPMTY          A1-nef                           B, B, C, D, AE, nd
LTDTTNQKT          A1-pol                           B, B, B, C, C, AE
KIQNFRVYY          A1-pol                           B, D, AE, nd
FLGKIWPSHK         A2-gag                           A1, A1, A1, B, B, B, B, C, AE, nd
SLYNTVATL          A2-gag                           A1, B, B, B, C, C, C
GALDLSHFL          A2-nef, var. 1 [2]               A1, B, B, B, C, AG
AAVDLSHFL          A2-nef, var. 2                   A1, B, B, B, AG
ILKEPVHGV          A2-pol                           B, B, B, B, C, C, nd
QLTEAVQKI          A2-pol                           B, C
AVDLSHFLK          A3-nef, var. 1                   A1, B, D, nd
ALDLSHFLK          A3-nef, var. 2                   A1, B, D, nd
AFDLSFFLK          A3-nef, var. 3                   B, C, C, C, C, AE, AE
WYIKIFIII          A24-env                          B, B, B, C, C
HYMLKHLVW          A24-gag                          A1, B, B, C
IPRRIRQGL          B7-env, var. 1                   A1, B, C, AE
IPRRIRQGF          B7-env, var. 2                   A1, B, AE, CPX06
HPVHAGPVA          B7-gag                           A1, B, C, D
RALGPGATL          B7-gag                           A1, B, C, D
TPQDLNTML          B7-pol                           A1, B, C, C
SPAIFQSSM          B7-pol                           A1, A1, B, C, C, D, AE
QEILDLWVY          B44-nef                          A1, A1, B, B, B, C
Frequency of the epitope sequence in each subtype [1]: columns A, B, C, D, AE (color-coded cells in the original slide).
[1] The color represents the frequency of the exact epitope sequence in the different subtypes; blue:
0%, light blue: 1-24%, orange: 25-49% and red: >50%.
[2] Subtype variants of the same epitope.
nd: not determined
Perez et al., JI, 2008
24. All HIV responsive patients respond to at
least one of nine peptides
Perez et al., JI, 2008
PopCover - Searching in two dimensions.
HIV class II case story
• Data
– 396 full-length genomes with annotated tat, nef, gag and
pol proteins covering A(50), B(104), C(156), D(40) and
AE(46) strains
• HLA-DR frequencies taken from
– 43 (allele frequency in at least one population > 2.5%) HLA
class II alleles
• 36 HLA-DRB1, HLA-DR3,4,5, and 4 HLA-DQ alleles
• Select predicted peptide binders
– 5608(tat), 20961(nef), 31848(gag), 42748(pol)
• Select peptides from each protein with optimal
genomic and HLA coverage
– tat(4), nef(15), gag(15) and pol(15)
25. EpiSelect and PopCover
• EpiSelect
$$S^j_G = \sum_i \frac{P^j_i}{\lambda + C_i}$$
The sum is over all genomes i. $P^j_i$ is 1 if epitope j is present in genome i. $C_i$ is
the number of times genome i has been targeted by the already selected
set of epitopes.
• PopCover
$$S^j_{A+G} = \sum_i \sum_k \frac{R^j_{ki} \cdot f_k \cdot g_i}{\beta + E_{ki}}$$
The sum is over all genomes i and HLA alleles k. $R^j_{ki}$ is 1 if epitope j is present
in genome i and is presented by allele k, $E_{ki}$ is the number of times
allele k has been targeted by epitopes in genome i in the already selected
set of epitopes, $f_k$ is the frequency of allele k, and $g_i$ is the frequency of genome i.
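PopCover extends the same greedy idea to two dimensions: each (genome, allele) pair keeps its own coverage count, and each term is weighted by the allele and genome frequencies. A minimal sketch follows, assuming a hypothetical input layout (peptide mapped to the set of (genome, allele) pairs where it is present and predicted to bind) and an assumed denominator constant of 0.01; neither is taken from the slides.

```python
def popcover(binders, allele_freq, genome_freq, n_select, beta=0.01):
    """Greedy peptide selection balancing genome and HLA allele coverage.

    binders:     dict peptide -> set of (genome, allele) pairs it covers.
    allele_freq: dict allele -> population frequency.
    genome_freq: dict genome -> frequency among the viral genomes.
    beta:        small denominator constant (assumed value).
    """
    covered = {}                      # times each (genome, allele) pair is targeted
    candidates = set(binders)
    selected = []

    while candidates and len(selected) < n_select:
        def score(peptide):
            return sum(
                allele_freq[allele] * genome_freq[genome]
                / (beta + covered.get((genome, allele), 0))
                for genome, allele in binders[peptide]
            )

        best = max(sorted(candidates), key=score)   # alphabetical tie-break
        selected.append(best)
        candidates.remove(best)
        for pair in binders[best]:
            covered[pair] = covered.get(pair, 0) + 1

    return selected


# Toy data: two genomes, two HLA-DR alleles, three candidate peptides.
toy_alleles = {"DR1": 0.6, "DR4": 0.4}
toy_genomes = {"g1": 0.5, "g2": 0.5}
toy_binders = {
    "P1": {("g1", "DR1"), ("g2", "DR1")},
    "P2": {("g1", "DR1")},
    "P3": {("g1", "DR4"), ("g2", "DR4")},
}
picked = popcover(toy_binders, toy_alleles, toy_genomes, n_select=2)
print(picked)
```

In the toy run, P1 wins the first round because it covers the common allele in both genomes. In the second round P3 is preferred over P2, since P2 only adds to a (genome, allele) pair that is already covered.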
Benchmark
• Create 10,000 virtual patients with a given
HIV genomic sequence and HLA alleles as
defined by the HLA allele frequencies and
HIV genomic data
• Test how many of these patients are
targeted by at least one of the selected
peptides
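The benchmark above can be sketched as a small Monte Carlo simulation. The version below is an illustration only: it assumes each virtual patient carries one viral genome and two HLA alleles drawn independently from the frequency tables, and it reuses the hypothetical peptide-to-(genome, allele) layout; the real benchmark may have sampled patients differently.

```python
import random


def patient_coverage(selected, binders, allele_freq, genome_freq,
                     n_patients=10_000, alleles_per_patient=2, seed=0):
    """Fraction of virtual patients targeted by at least one selected peptide."""
    rng = random.Random(seed)                     # fixed seed for reproducibility
    genomes, genome_weights = zip(*genome_freq.items())
    alleles, allele_weights = zip(*allele_freq.items())

    targeted = 0
    for _ in range(n_patients):
        genome = rng.choices(genomes, weights=genome_weights)[0]
        patient_alleles = set(
            rng.choices(alleles, weights=allele_weights, k=alleles_per_patient)
        )
        # A patient counts as covered if any selected peptide is present in the
        # infecting genome and presented by one of the patient's alleles.
        if any((genome, allele) in binders[peptide]
               for peptide in selected
               for allele in patient_alleles):
            targeted += 1
    return targeted / n_patients


# Toy data (hypothetical frequencies and binders).
toy_alleles = {"DR1": 0.6, "DR4": 0.4}
toy_genomes = {"g1": 0.5, "g2": 0.5}
toy_binders = {
    "P1": {("g1", "DR1"), ("g2", "DR1")},
    "P3": {("g1", "DR4"), ("g2", "DR4")},
}
full = patient_coverage(["P1", "P3"], toy_binders, toy_alleles, toy_genomes)
partial = patient_coverage(["P1"], toy_binders, toy_alleles, toy_genomes)
print(full, partial)
```

With both peptides every toy patient is covered; with P1 alone, only patients carrying at least one DR1 allele are, which under these assumptions is expected to be about 1 - 0.4² = 0.84 of the population.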
26. HIV patient coverage
•Selected peptide pools
–tat(4), nef(15), gag(15) and pol(15)
So, have we found the haystack?
27. MTB (Mycobacterium tuberculosis)
• Bacterial genome coding for more than
4000 proteins
• 700 known epitopes, found in only 30
proteins (ORFs)
• Is this biology, or history?
– More than 500,000 unique 9mer peptides
– Where to start?
• Each HLA allele will bind ~5000 of these
peptides
28. Functional bias in TB epitope proteins
29. Where are the epitopes?
So no, we cannot find the haystack?
But, this is the same problem faced by
experimental methods!
30. Conclusions
• Rational epitope discovery is feasible
– Prediction methods are an important guide for
epitope identification
– Given a protein sequence and an HLA molecule, we can
predict the peptide binders (find the needle in the
haystack)
• Pan-specific MHC prediction methods can deal
with the immense MHC polymorphism
• Epitope selection strategies can deal with
pathogen diversity
• For large pathogens, we still have no handle on
how to select immunogenic proteins (we cannot
find the haystack)
CBS immunology web servers
www.cbs.dtu.dk/services
31. Acknowledgements
•Immunological Bioinformatics group,
CBS, DTU
– Ole Lund - Group leader
– Claus Lundegaard - Databases, HLA
binding predictions
• Collaborators
– IMMI, University of Copenhagen
• Søren Buus: MHC binding
– La Jolla Institute of Allergy and
Infectious Diseases
• A. Sette, B. Peters: Epitope
database
• and many, many more