Quantifiable predictive features define
epitope-specific T cell receptor repertoires
Thi Nguyen, Ph.D. Candidate
Graduate Biomedical Sciences | Immunology Theme
University of Alabama at Birmingham (UAB)
kimthi@uab.edu
Summer Journal Club
August 29th, 2018
Outline
1. T cells background -TCR diversity
2. Experiment workflow
3. CD8+ epitope specific TCR repertoire general biochemical
characteristics
4. Gene preference usage
5. TCRdist = measure difference between TCRs
6. CDR3 motif discovery
7. TCRdiv = measure TCR diversity
8. Nearest Neighbor Classifier
T cells
• A T cell (T lymphocyte) is a type of white blood cell that is critical for immune defense
• T cells can be distinguished from other leukocytes by the presence of a TCR
• derived from hematopoietic stem cells in bone marrow
• mature in the thymus (thymocyte)
• have many different subsets with distinct function (helper, killer, regulatory)
• unique ability to distinguish normal (self) from abnormal (non-self, cancerous, or
sick/dying) cells (cell-mediated immunity) through pMHC binding.
• Upon TCR-pMHC binding + costimulatory molecule binding, they become activated
• Depending on the cytokine cues from environment, they become differentiated
TCR diversity: V(D)J recombination
Immunobiology: The Immune System in Health and Disease. 5th edition.
Janeway CA Jr, Travers P, Walport M, et al.
New York: Garland Science; 2001.
germline
• Theoretical estimate: 10^15 to 10^61 TCRs
• Observed: 10^6 TCRs
Gene rearrangement
Experiment Workflow
Mice (n=78)
Influenza
(i.n.)
mCMV
(i.p.)
BAL
Stain with tetramers
sort
Epitope-specific
CD8+ T cells
spleen
Human (influenza, CMV, EBV)
(n=32)
Single cell
mRNA
Paired TCR amplification + sequence
N = 4635 paired TCR sequences
PBMC
Paired TCRαβ amplification and sequencing
Pradyot Dash et al
Methods Mol. Biol. 2015
TCRαβ sequence analysis
• V and J genes were assigned using BLAST against the IMGT database
• CDR3 nucleotide and aa sequences were assigned based on the location of the
conserved cysteine in the V region and the FGXG motif in the J region.
• The full CDR3 starts at C104 and ends at the F position (F118) of the FGXG motif
• The trimmed CDR3 starts at the 3rd position after C104 and ends at the 2nd position
before F118
• To handle degenerate J-gene FGXG motifs, J aa sequences were manually aligned to
define the F118 position before sequence analysis
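The CDR3 boundary rules above can be sketched in Python. This is only an illustration, not the authors' pipeline: the function name `extract_cdr3`, the literal `FG.G` regular expression, and the choice of the last cysteine before the motif as C104 are all assumptions, and degenerate J motifs (which the authors aligned manually) are not handled.

```python
import re

# Assumption: the J-region motif is matched literally as F-G-x-G.
FGXG = re.compile(r"FG.G")

def extract_cdr3(protein):
    """Return (full, trimmed) CDR3 for a translated V-J sequence, or None."""
    motif = FGXG.search(protein)
    if motif is None:
        return None  # degenerate motif: would need manual alignment
    f_pos = motif.start()                 # F118, first residue of the FGXG motif
    c_pos = protein.rfind("C", 0, f_pos)  # assumed to be the conserved C104
    if c_pos == -1:
        return None
    full = protein[c_pos:f_pos + 1]       # C104 through F118, inclusive
    trimmed = full[3:-2]                  # 3rd residue after C to 2nd before F
    return full, trimmed

# Hypothetical translated fragment used only for demonstration.
print(extract_cdr3("LSAAYFCASSIRSSYEQYFGPGTRL"))
```

The trimmed form drops the residues closest to the conserved anchors, which is the part of the CDR3 used later in the TCRdist loop comparison.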
Extended table. TCR repertoires characteristics
clonality = 1 – Simpson’s diversity index
(normalized by the size of repertoire)
Pshare = estimated rate at which a clone drawn
from one subject has an aa sequence
identical to a clone from another subject
Extended Fig.1. CDR3 region characteristics of 10 epitope-specific TCR repertoires
How do they quantify gene preference?
Jensen-Shannon divergence (JSD) (total divergence to the average)
• measures the similarity between two probability distributions, i.e. quantifies how
distinguishable two distributions are from each other.
• Based on the Kullback-Leibler divergence, but it is symmetric and always finite
• Basic form = entropy of the mixture minus the mixture of the entropies
Gene usage preference = a normalized JSD between gene frequencies of epitope-
specific repertoire and non-specific background from public dataset.
• This can be generalized to any number of distributions P_1..P_n with arbitrary weights w_i:
JSD = H(sum_i w_i P_i) - sum_i w_i H(P_i), where H is the Shannon entropy
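A minimal Python sketch of the entropy-of-mixture form of the JSD follows. The function names and the example gene-frequency vectors are hypothetical. The slides do not say how the authors normalize the JSD; the sketch assumes base-2 logarithms, under which the divergence between two equally weighted distributions lies between 0 and 1.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution (list of probabilities)."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def jsd(dists, weights=None):
    """Generalized JSD: H(sum_i w_i P_i) - sum_i w_i H(P_i)."""
    n = len(dists)
    if weights is None:
        weights = [1.0 / n] * n  # equal weights by default
    mixture = [sum(w * p[k] for w, p in zip(weights, dists))
               for k in range(len(dists[0]))]
    return entropy(mixture) - sum(w * entropy(p) for w, p in zip(weights, dists))

# Hypothetical gene-frequency vectors over the same three V genes.
epitope_specific = [0.80, 0.15, 0.05]
background = [0.30, 0.40, 0.30]
print(jsd([epitope_specific, background]))
```

Identical distributions give 0 and distributions with no shared genes give 1, so a larger value indicates a stronger gene usage preference relative to the background.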
Gene correlation analysis
• covariation in gene usage was quantified by adjusted mutual information
• this corrects for the number and frequencies of the observed genes that cluster by chance
• To set the lower significance threshold in Fig. 1c, they randomly shuffled the genes in each of
the 60 gene pairing lists 100 times and recomputed the adjusted mutual information.
• The largest value observed in these 6000 random trials = lower significance threshold
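The shuffling scheme above can be sketched as follows. To keep the example self-contained it scores pairings with plain (unadjusted) mutual information, which is a deliberate substitution for the adjusted mutual information used in the paper; the function names and gene labels are hypothetical.

```python
import math
import random
from collections import Counter

def mutual_information(pairs):
    """Plain (unadjusted) mutual information in bits between paired gene labels."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum((c / n) * math.log2(c * n / (left[a] * right[b]))
               for (a, b), c in joint.items())

def shuffle_threshold(pairing_lists, n_shuffles=100, seed=0):
    """Largest score seen after randomly re-pairing genes within each list."""
    rng = random.Random(seed)
    best = 0.0
    for pairs in pairing_lists:
        left = [a for a, _ in pairs]
        right = [b for _, b in pairs]
        for _ in range(n_shuffles):
            rng.shuffle(right)  # break any real association between the two genes
            best = max(best, mutual_information(list(zip(left, right))))
    return best

# Hypothetical, perfectly associated V-alpha / V-beta pairing list.
pairs = [("Va1", "Vb1")] * 10 + [("Va2", "Vb2")] * 10
print(mutual_information(pairs), shuffle_threshold([pairs]))
```

With 60 pairing lists and 100 shuffles each, the loop visits the 6000 random trials described above, and any observed score above the returned maximum is treated as exceeding the lower significance threshold.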
Fig.1: V and J gene segment usage and covariation in epitope-specific
responses
Extended Data Figure 2: V and J gene segment usage and covariation
in epitope-specific responses
Extended Data Figure 3. Schematic overview of the TCRdist
• Similarity between TCR = similarity between pMHC-
contacting loops.
• Loops are defined based on IMGT CDR
definition with modifications:
(1) Include CDR2.5
(2) Use trimmed CDR3
• AAdist (alignment score) is derived from the BLOSUM62 matrix
• TCRdist = sum of weighted AAdist over the loops
distance(a,a) = 0
distance(a,b) = min(4, 4 - BLOSUM62(a,b)), which reduces the
penalty for aa pairs with a positive BLOSUM62 score.
• A gap penalty of 4 (8 for CDR3) = distance between
a gap position and an aa.
• A weight of 3 is applied to mismatches in the CDR3.
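The distance rules above can be sketched in Python as shown below. Several things here are assumptions made for illustration: the loops are taken as already aligned to equal length with `-` marking gaps, the CDR3 weight of 3 is applied to substitution mismatches only (one reading of the slide), only a few BLOSUM62 entries are included, and the loop sequences are made up.

```python
# A few BLOSUM62 entries, enough for the demo below; a real run needs the full matrix.
BLOSUM62_SUBSET = {("I", "V"): 3, ("K", "R"): 2, ("D", "E"): 2,
                   ("S", "T"): 1, ("G", "L"): -4}

def aa_dist(a, b, blosum=BLOSUM62_SUBSET):
    """distance(a,a) = 0; distance(a,b) = min(4, 4 - BLOSUM62(a,b))."""
    if a == b:
        return 0
    score = blosum[(a, b)] if (a, b) in blosum else blosum[(b, a)]
    return min(4, 4 - score)

def loop_dist(x, y, gap_penalty=4, weight=1):
    """Distance between two pre-aligned loops of equal length ('-' marks a gap)."""
    if len(x) != len(y):
        raise ValueError("loops must be aligned to the same length")
    total = 0
    for a, b in zip(x, y):
        if a == b:
            continue                       # identical residues cost nothing
        if "-" in (a, b):
            total += gap_penalty           # gap versus amino acid
        else:
            total += weight * aa_dist(a, b)
    return total

def tcr_dist(loops_a, loops_b):
    """Sum loop distances for one chain; CDR3 uses gap penalty 8 and weight 3."""
    total = 0
    for name in loops_a:
        if name == "cdr3":
            total += loop_dist(loops_a[name], loops_b[name], gap_penalty=8, weight=3)
        else:
            total += loop_dist(loops_a[name], loops_b[name])
    return total

# Hypothetical aligned loops for two receptors (single chain).
tcr_a = {"cdr1": "KS", "cdr2": "ID", "cdr2.5": "T", "cdr3": "SI-E"}
tcr_b = {"cdr1": "RS", "cdr2": "VD", "cdr2.5": "S", "cdr3": "SVGD"}
print(tcr_dist(tcr_a, tcr_b))
```

In this toy pair the CDR3 contributes most of the total, which reflects the intent of weighting the loop that contacts the peptide most directly.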
BLOSUM62 matrix
BLOSUM = Block substitution matrix
• Scores alignments between protein sequences (based on local alignments, as opposed to PAM)
• Based on observed alignments
• Larger score = higher sequence similarity => smaller evolutionary distance
Clustering and dimensionality reduction
• The TCR with the largest number of neighbors within the distance threshold is chosen
as a cluster center.
→ It and all its neighbors are removed from the repertoire
→ The process is repeated until all TCRs have been clustered.
• The distance threshold was chosen to yield homogeneous clusters of sufficient size;
the same threshold was used for all repertoires
• The result of this clustering method was visualized by average-linkage hierarchical clustering
trees and TCR sequence logos
• They also use 2D kernel PCA (scikit-learn, KernelPCA function) to visualize the TCR
landscape. This attempts to preserve similarity structure of the input data while reducing
their dimensionality.
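The greedy clustering procedure described above can be sketched as follows. The function name, the inclusive `<=` comparison, and the tie-breaking rule (lowest index wins) are assumptions; the toy distance matrix stands in for real TCRdist values.

```python
def greedy_cluster(dist, threshold):
    """Greedy clustering on a symmetric distance matrix (list of lists).

    Returns a list of (center_index, sorted_member_indices) tuples.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        # Neighbors of i among unclustered TCRs; i itself counts since dist[i][i] == 0.
        neighbors = {i: [j for j in remaining if dist[i][j] <= threshold]
                     for i in remaining}
        # Pick the TCR with the most neighbors; ties go to the lowest index.
        center = max(sorted(remaining), key=lambda i: len(neighbors[i]))
        members = neighbors[center]
        clusters.append((center, sorted(members)))
        remaining -= set(members)  # remove the center and all its neighbors
    return clusters

# Toy example: five receptors placed on a line, distance = absolute difference.
positions = [0, 1, 2, 10, 11]
dist = [[abs(p - q) for q in positions] for p in positions]
print(greedy_cluster(dist, threshold=1.5))
```

Because the same threshold is applied everywhere, dense regions of the repertoire yield large clusters while isolated receptors end up as singletons.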
Fig.2: TCRdist analysis of the M45 repertoire identifies clusters of
related receptors
TCR logos
• summarize V and J gene usage, CDR3 aa sequence
and inferred rearrangement structure of the CDR3
• 4 components:
1. V-gene logos (left): V-gene names are scaled by
frequency and stacked top to bottom from most to least
common
2. CDR3 sequence logo where aa are scaled and ordered
by frequency and colored by chemical type.
3. a J-gene logo (right)
4. CDR3 where the genomic source regions for each
nucleotide column are represented by frequency-scaled
bars ordered top to bottom from V to D to J and colored
according to their frequency.
Extended Fig.4:
2D projections of mouse epitope-specific TCR repertoire
• kernel PCA applied to TCRdist
• Colored based on gene segment usage
kernel PCA ~ nonlinear form of PCA
CDR3 motif discovery
• Motifs = fixed-length patterns consisting of aa positions, wildcard positions, and aa group
positions (allowed groupings: (K,R), (D,E), (N,Q), (S,T), (F,Y,W,H), (A,G,S,P), (V,I,L,M))
• motif score = (observed - expected)^2 / expected
• observed = number of times the motif was observed in the epitope-specific TCR sequences
• expected = value from background TCRs (with V and J genes matched to the observed repertoire)
• Starting with two-position motifs scoring above a seed threshold, each motif was
iteratively extended by adding new specified positions that increase the motif score.
• motif scores were sorted and filtered for redundancy.
• motifs scoring above a threshold were extended to include near-neighbour TCRs using
a stringent distance threshold => capture additional patterns
• the final set of motifs for each repertoire was visualized using TCR logos.
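A small Python sketch of the motif score is given below. The motif representation (a list of single residues, `.` wildcards, or group strings), the function names, the rescaling of the background count to the size of the observed repertoire, and the example CDR3 sequences are all assumptions for illustration; the iterative extension step is not shown.

```python
import re

# Allowed amino acid groupings listed above.
GROUPS = ["KR", "DE", "NQ", "ST", "FYWH", "AGSP", "VILM"]

def motif_regex(motif):
    """motif: list of positions; each is one aa, '.' (wildcard), or a group like 'KR'."""
    parts = []
    for pos in motif:
        if len(pos) == 1:
            parts.append(pos)              # single residue or '.' wildcard
        elif pos in GROUPS:
            parts.append("[" + pos + "]")  # any residue from the group
        else:
            raise ValueError("unknown group: " + pos)
    return re.compile("".join(parts))

def count_matches(motif, cdr3s):
    """Number of CDR3 sequences containing the motif at least once."""
    rx = motif_regex(motif)
    return sum(1 for s in cdr3s if rx.search(s))

def motif_score(motif, observed_cdr3s, background_cdr3s):
    """(observed - expected)^2 / expected."""
    observed = count_matches(motif, observed_cdr3s)
    # Assumption: background count rescaled to the observed repertoire size.
    expected = (count_matches(motif, background_cdr3s)
                * len(observed_cdr3s) / len(background_cdr3s))
    if expected == 0:
        raise ValueError("motif absent from background; score undefined")
    return (observed - expected) ** 2 / expected

# Made-up sequences for demonstration only.
observed = ["CASGAKF", "CASGTRF", "CASSLYF", "CAGGDKF"]
background = ["CASGAKF", "CASSLYF", "CASSPEF", "CAWSVEF",
              "CASSQDF", "CASRLAF", "CASSYEF", "CASSTDF"]
print(motif_score(["G", ".", "KR"], observed, background))
```

Note that the score is symmetric in over- and under-representation, so a depleted motif can also score highly; a real search would additionally require observed to exceed expected.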
Amino Acid Groupings
https://en.wikipedia.org/wiki/Amino_acid
Fig.3: Enriched CDR3 sequence motifs define key features of epitope
specificity
TCRdiv metric to measure repertoire diversity
Simpson diversity index (D):
• takes into account both the
richness and evenness of the
population.
• measures the probability that 2
individuals randomly selected from
a population will belong to the same class.
• 0 ≤ D ≤ 1
• 1 means no diversity (all individuals belong to the same class)
• values near 0 mean high diversity
TCRdiv:
• Estimate the expected value of a Gaussian function of the inter-sample distance, which returns
1 if the samples are identical and exp(-(TCRdist(a,b)/s.d.)^2) otherwise.
• s.d. = 18.45 for single-chain distances and twice that for paired analyses, based on
empirical assessments of receptor distance distributions for multiple epitopes.
• TCRdiv = inverse of this estimate
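A minimal sketch of the TCRdiv calculation, under stated assumptions: the expected value is estimated as a plain average over all distinct pairs of receptors in a precomputed TCRdist matrix, and the function name `tcrdiv` is hypothetical. Identical receptors have distance 0, so the Gaussian term evaluates to 1 for them, matching the definition above.

```python
import math
from itertools import combinations

def tcrdiv(dist, sd=18.45):
    """Inverse of the mean Gaussian kernel over all distinct pairs of receptors.

    dist is a symmetric TCRdist matrix; sd = 18.45 is the single-chain value
    quoted above (twice that for paired-chain analyses).
    """
    n = len(dist)
    if n < 2:
        raise ValueError("need at least two receptors")
    kernel = [math.exp(-(dist[i][j] / sd) ** 2)
              for i, j in combinations(range(n), 2)]
    return len(kernel) / sum(kernel)

# A repertoire of identical receptors has the minimum diversity of 1.
print(tcrdiv([[0, 0, 0], [0, 0, 0], [0, 0, 0]]))
```

Compared with the Simpson index, which only asks whether two receptors are identical, this smoothed version gives partial credit to receptors that are similar but not identical.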
http://www.countrysideinfo.co.uk/simpsons.htm
Extended Fig.8. TCRdiv measures for each chain and paired chains
Nearest neighbor (NN) -distance classifier
• receptor density within a repertoire = sampling density near each receptor
• NN-distance = weighted average distance to the nearest-neighbor receptors in the repertoire
• a small NN-distance means higher local sampling density = many nearby neighbors
• They use the nearest 10% of the repertoire, with weights that decrease linearly from
the nearest to the farthest neighbor.
• To compute the AUROC score for the NN-distance classifier, epitope-specific TCRs (positives)
and background receptors (negatives) were sorted by NN-distance
• ROC curve:
sensitivity = fractional recovery of epitope-specific receptors
1 - specificity = fractional recovery of background receptors
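The NN-distance and its AUROC can be sketched as follows. The exact weighting scheme (weights k, k-1, ..., 1 over the k nearest neighbors) and the function names are assumptions, since the slides say only that the weights decrease linearly. The AUROC is computed in its rank form, as the probability that a randomly chosen epitope-specific receptor has a smaller NN-distance than a randomly chosen background receptor.

```python
def nn_distance(dists, fraction=0.10):
    """Weighted average distance from one receptor to its nearest neighbors.

    dists: distances from the receptor to every other receptor in the repertoire.
    """
    k = max(1, round(fraction * len(dists)))
    nearest = sorted(dists)[:k]
    weights = [k - rank for rank in range(k)]  # k, k-1, ..., 1 (nearest weighs most)
    return sum(w * d for w, d in zip(weights, nearest)) / sum(weights)

def auroc(positive_scores, negative_scores):
    """Probability that a positive has a smaller NN-distance than a negative.

    Ties count as one half.
    """
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p < q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Toy distances from one receptor to a repertoire of 20 others.
print(nn_distance(list(range(1, 21))))
# Toy NN-distances for epitope-specific (positive) and background (negative) receptors.
print(auroc([1.0, 3.0], [2.0, 4.0]))
```

An AUROC of 1 means every epitope-specific receptor sits closer to the repertoire than every background receptor, while 0.5 means the NN-distance carries no discriminating signal.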
Nearest Neighbor Classifiers
http://user.it.uu.se/~kostis/Teaching/DM-05/Slides/classification01.pdf
KNN algorithm
http://user.it.uu.se/~kostis/Teaching/DM-05/Slides/classification01.pdf
Fig.4 . Quantifying the defining features of epitope-specific populations
Extended Fig 9. Specificity and Avidity of the dispersed TCR
Modeling gene rearrangement
• Each nucleotide of the CDR3 is assigned to V, D, J, or an N-nucleotide insertion
so as to minimize the number of N nucleotides.
• They sampled V and J gene segments from the observed receptors but generated
the junctional sequences from the inferred probability distributions for the numbers of
insertions and deletions in the background TCRs (public data).
• They also used these probability distributions to generate the random receptors that
formed one of the two control sets for the CDR3 motif discovery algorithm.
extended Fig.8.
Summary
• characterize 10 epitope-specific TCR repertoires of CD8+ T cells from ~4,600 single-cell
paired TCR sequences:
→ gene segment usage
→ epitope selection
• develop TCRdist to quantify similarity among TCRs based on their pMHC-contacting loops
• develop TCRdiv to quantify TCR repertoire diversity
• develop a distance-based classifier that can assign unobserved TCRs to characterized
repertoires
Significance:
• potential application to the analysis of clinical TCR repertoire data where the target is
unknown, such as in tumor-infiltrating lymphocytes (TIL).
• proposes that, despite the tremendous diversity of TCRs, predictive models can be developed for
TCR-pMHC recognition.

More Related Content

What's hot

Basic Immunology 11-20
Basic Immunology 11-20Basic Immunology 11-20
Basic Immunology 11-20improvemed
 
T CELL RECEPTOR.pptx
T CELL RECEPTOR.pptxT CELL RECEPTOR.pptx
T CELL RECEPTOR.pptxBinteHawah1
 
Ch7BacterialGenetics.ppt
Ch7BacterialGenetics.pptCh7BacterialGenetics.ppt
Ch7BacterialGenetics.pptdawitg2
 
Organs of immune system
Organs of immune systemOrgans of immune system
Organs of immune systemMariam77865
 
Cancer genetics [autosaved]
Cancer genetics [autosaved]Cancer genetics [autosaved]
Cancer genetics [autosaved]prachiupadhyay8
 
T Cell Antigen Receptor
T Cell Antigen ReceptorT Cell Antigen Receptor
T Cell Antigen Receptorraj kumar
 
Bio108 Cell Biology lec 4 The Complexity of Eukaryotic Genomes
Bio108 Cell Biology lec 4 The Complexity of Eukaryotic GenomesBio108 Cell Biology lec 4 The Complexity of Eukaryotic Genomes
Bio108 Cell Biology lec 4 The Complexity of Eukaryotic GenomesShaina Mavreen Villaroza
 
Antigen processing & presentation
Antigen processing & presentationAntigen processing & presentation
Antigen processing & presentationDr Alok Tripathi
 

What's hot (20)

IMMUNE RESPONSE TO TUMORS
IMMUNE RESPONSE TO TUMORSIMMUNE RESPONSE TO TUMORS
IMMUNE RESPONSE TO TUMORS
 
Tumor immunity
Tumor immunityTumor immunity
Tumor immunity
 
Lecture 2
Lecture 2Lecture 2
Lecture 2
 
Cancer signaling pathway
Cancer signaling pathwayCancer signaling pathway
Cancer signaling pathway
 
Basic Immunology 11-20
Basic Immunology 11-20Basic Immunology 11-20
Basic Immunology 11-20
 
Cancer immunology
Cancer immunologyCancer immunology
Cancer immunology
 
Tumor immunity
Tumor immunityTumor immunity
Tumor immunity
 
T CELL RECEPTOR.pptx
T CELL RECEPTOR.pptxT CELL RECEPTOR.pptx
T CELL RECEPTOR.pptx
 
Protein Localization
Protein LocalizationProtein Localization
Protein Localization
 
T and b cells
T and b cellsT and b cells
T and b cells
 
Ch7BacterialGenetics.ppt
Ch7BacterialGenetics.pptCh7BacterialGenetics.ppt
Ch7BacterialGenetics.ppt
 
Organs of immune system
Organs of immune systemOrgans of immune system
Organs of immune system
 
microbial genetics
 microbial genetics microbial genetics
microbial genetics
 
Cancer genetics [autosaved]
Cancer genetics [autosaved]Cancer genetics [autosaved]
Cancer genetics [autosaved]
 
T Cell Antigen Receptor
T Cell Antigen ReceptorT Cell Antigen Receptor
T Cell Antigen Receptor
 
Bio108 Cell Biology lec 4 The Complexity of Eukaryotic Genomes
Bio108 Cell Biology lec 4 The Complexity of Eukaryotic GenomesBio108 Cell Biology lec 4 The Complexity of Eukaryotic Genomes
Bio108 Cell Biology lec 4 The Complexity of Eukaryotic Genomes
 
2. immune regulation
2. immune regulation2. immune regulation
2. immune regulation
 
T cells and b-cells
T cells and b-cellsT cells and b-cells
T cells and b-cells
 
Antigen processing & presentation
Antigen processing & presentationAntigen processing & presentation
Antigen processing & presentation
 
Dna binding motiffs
Dna binding motiffsDna binding motiffs
Dna binding motiffs
 

Similar to Predictive Features of TCR Repertoire

Sequencing the Human TCRβ Repertoire on the Ion S5TM
Sequencing the Human TCRβ Repertoire on the Ion S5TMSequencing the Human TCRβ Repertoire on the Ion S5TM
Sequencing the Human TCRβ Repertoire on the Ion S5TMThermo Fisher Scientific
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomicsajay301
 
Pooled Sequence Haplotype Estimator
Pooled Sequence Haplotype EstimatorPooled Sequence Haplotype Estimator
Pooled Sequence Haplotype EstimatorDevin Petersohn
 
SAGE- Serial Analysis of Gene Expression
SAGE- Serial Analysis of Gene ExpressionSAGE- Serial Analysis of Gene Expression
SAGE- Serial Analysis of Gene ExpressionAashish Patel
 
Sequencing the circulating and infiltrating T-cell repertoire on the Ion S5TM
Sequencing the circulating and infiltrating T-cell repertoire on the Ion S5TMSequencing the circulating and infiltrating T-cell repertoire on the Ion S5TM
Sequencing the circulating and infiltrating T-cell repertoire on the Ion S5TMThermo Fisher Scientific
 
Seminar Slides
Seminar SlidesSeminar Slides
Seminar Slidespannicle
 
Lab Presentation, Molecular Data Cluster Algorithms
Lab Presentation, Molecular Data Cluster AlgorithmsLab Presentation, Molecular Data Cluster Algorithms
Lab Presentation, Molecular Data Cluster AlgorithmsSean Maden
 
QTL MAPPING AND APPROACHES IN BIPARENTAL MAPPING POPULATIONS.pptx
QTL MAPPING AND APPROACHES IN BIPARENTAL MAPPING POPULATIONS.pptxQTL MAPPING AND APPROACHES IN BIPARENTAL MAPPING POPULATIONS.pptx
QTL MAPPING AND APPROACHES IN BIPARENTAL MAPPING POPULATIONS.pptxPABOLU TEJASREE
 
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedMicrobial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedJonathan Eisen
 
Comparing the early ciRNA papers
Comparing the early ciRNA papers Comparing the early ciRNA papers
Comparing the early ciRNA papers Darya Vanichkina
 
Improving the accuracy of k-means algorithm using genetic algorithm
Improving the accuracy of k-means algorithm using genetic algorithmImproving the accuracy of k-means algorithm using genetic algorithm
Improving the accuracy of k-means algorithm using genetic algorithmKasun Ranga Wijeweera
 

Similar to Predictive Features of TCR Repertoire (20)

TCRpower
TCRpowerTCRpower
TCRpower
 
Sequencing the Human TCRβ Repertoire on the Ion S5TM
Sequencing the Human TCRβ Repertoire on the Ion S5TMSequencing the Human TCRβ Repertoire on the Ion S5TM
Sequencing the Human TCRβ Repertoire on the Ion S5TM
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomics
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomics
 
20140711 2 j_willey_ercc2.0_workshop
20140711 2 j_willey_ercc2.0_workshop20140711 2 j_willey_ercc2.0_workshop
20140711 2 j_willey_ercc2.0_workshop
 
Microsatellites Markers
Microsatellites  MarkersMicrosatellites  Markers
Microsatellites Markers
 
Topological associated domains- Hi-C
Topological associated domains- Hi-CTopological associated domains- Hi-C
Topological associated domains- Hi-C
 
I010415255
I010415255I010415255
I010415255
 
Microarray Analysis
Microarray AnalysisMicroarray Analysis
Microarray Analysis
 
Pooled Sequence Haplotype Estimator
Pooled Sequence Haplotype EstimatorPooled Sequence Haplotype Estimator
Pooled Sequence Haplotype Estimator
 
SAGE- Serial Analysis of Gene Expression
SAGE- Serial Analysis of Gene ExpressionSAGE- Serial Analysis of Gene Expression
SAGE- Serial Analysis of Gene Expression
 
Sequencing the circulating and infiltrating T-cell repertoire on the Ion S5TM
Sequencing the circulating and infiltrating T-cell repertoire on the Ion S5TMSequencing the circulating and infiltrating T-cell repertoire on the Ion S5TM
Sequencing the circulating and infiltrating T-cell repertoire on the Ion S5TM
 
Seminar Slides
Seminar SlidesSeminar Slides
Seminar Slides
 
Lab Presentation, Molecular Data Cluster Algorithms
Lab Presentation, Molecular Data Cluster AlgorithmsLab Presentation, Molecular Data Cluster Algorithms
Lab Presentation, Molecular Data Cluster Algorithms
 
QTL MAPPING AND APPROACHES IN BIPARENTAL MAPPING POPULATIONS.pptx
QTL MAPPING AND APPROACHES IN BIPARENTAL MAPPING POPULATIONS.pptxQTL MAPPING AND APPROACHES IN BIPARENTAL MAPPING POPULATIONS.pptx
QTL MAPPING AND APPROACHES IN BIPARENTAL MAPPING POPULATIONS.pptx
 
Cancer Immunogenomics
Cancer ImmunogenomicsCancer Immunogenomics
Cancer Immunogenomics
 
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from UnculturedMicrobial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
Microbial Phylogenomics (EVE161) Class 17: Genomes from Uncultured
 
Gene expression profiling i
Gene expression profiling  iGene expression profiling  i
Gene expression profiling i
 
Comparing the early ciRNA papers
Comparing the early ciRNA papers Comparing the early ciRNA papers
Comparing the early ciRNA papers
 
Improving the accuracy of k-means algorithm using genetic algorithm
Improving the accuracy of k-means algorithm using genetic algorithmImproving the accuracy of k-means algorithm using genetic algorithm
Improving the accuracy of k-means algorithm using genetic algorithm
 

More from Thi K. Tran-Nguyen, PhD

IL-21 promotes pulmonary fibrosis through the induction of profibrotic CD8+ T...
IL-21 promotes pulmonary fibrosis through the induction of profibrotic CD8+ T...IL-21 promotes pulmonary fibrosis through the induction of profibrotic CD8+ T...
IL-21 promotes pulmonary fibrosis through the induction of profibrotic CD8+ T...Thi K. Tran-Nguyen, PhD
 
BiP-derived HLA-DR4 Epitopes Differentially Recognized by T cells in RA
BiP-derived HLA-DR4 Epitopes Differentially Recognized by T cells in RABiP-derived HLA-DR4 Epitopes Differentially Recognized by T cells in RA
BiP-derived HLA-DR4 Epitopes Differentially Recognized by T cells in RAThi K. Tran-Nguyen, PhD
 
Goblet Cells Deliver Luminal Antigen to CD103+ DCs
Goblet Cells Deliver Luminal Antigen to CD103+ DCsGoblet Cells Deliver Luminal Antigen to CD103+ DCs
Goblet Cells Deliver Luminal Antigen to CD103+ DCsThi K. Tran-Nguyen, PhD
 
Induction of Protective IgA by intestinal DC
Induction of Protective IgA by intestinal DCInduction of Protective IgA by intestinal DC
Induction of Protective IgA by intestinal DCThi K. Tran-Nguyen, PhD
 
Transcriptional Responses to Anti-cancer Drugs in vitro
Transcriptional Responses to Anti-cancer Drugs in vitroTranscriptional Responses to Anti-cancer Drugs in vitro
Transcriptional Responses to Anti-cancer Drugs in vitroThi K. Tran-Nguyen, PhD
 
Extract Stressors for Suicide from Twitter Using Deep Learning
Extract Stressors for Suicide from Twitter Using Deep LearningExtract Stressors for Suicide from Twitter Using Deep Learning
Extract Stressors for Suicide from Twitter Using Deep LearningThi K. Tran-Nguyen, PhD
 
Allogeneic IgG Enhances Antitumor T-cell Immunity
Allogeneic IgG Enhances Antitumor T-cell ImmunityAllogeneic IgG Enhances Antitumor T-cell Immunity
Allogeneic IgG Enhances Antitumor T-cell ImmunityThi K. Tran-Nguyen, PhD
 
Gut Microbiome Composition Influences Responses to immunotherapy
Gut Microbiome Composition Influences Responses to immunotherapyGut Microbiome Composition Influences Responses to immunotherapy
Gut Microbiome Composition Influences Responses to immunotherapyThi K. Tran-Nguyen, PhD
 

More from Thi K. Tran-Nguyen, PhD (20)

CHAMP1-family-conference-Oct-2022.pptx
CHAMP1-family-conference-Oct-2022.pptxCHAMP1-family-conference-Oct-2022.pptx
CHAMP1-family-conference-Oct-2022.pptx
 
IL-21 promotes pulmonary fibrosis through the induction of profibrotic CD8+ T...
IL-21 promotes pulmonary fibrosis through the induction of profibrotic CD8+ T...IL-21 promotes pulmonary fibrosis through the induction of profibrotic CD8+ T...
IL-21 promotes pulmonary fibrosis through the induction of profibrotic CD8+ T...
 
BiP-derived HLA-DR4 Epitopes Differentially Recognized by T cells in RA
BiP-derived HLA-DR4 Epitopes Differentially Recognized by T cells in RABiP-derived HLA-DR4 Epitopes Differentially Recognized by T cells in RA
BiP-derived HLA-DR4 Epitopes Differentially Recognized by T cells in RA
 
Fibrotic Diseases
Fibrotic DiseasesFibrotic Diseases
Fibrotic Diseases
 
Histology Exam
Histology ExamHistology Exam
Histology Exam
 
Goblet Cells Deliver Luminal Antigen to CD103+ DCs
Goblet Cells Deliver Luminal Antigen to CD103+ DCsGoblet Cells Deliver Luminal Antigen to CD103+ DCs
Goblet Cells Deliver Luminal Antigen to CD103+ DCs
 
Induction of Protective IgA by intestinal DC
Induction of Protective IgA by intestinal DCInduction of Protective IgA by intestinal DC
Induction of Protective IgA by intestinal DC
 
Fibrosis- Why and How?
Fibrosis- Why and How?Fibrosis- Why and How?
Fibrosis- Why and How?
 
Vietnam
VietnamVietnam
Vietnam
 
Transcriptional Responses to Anti-cancer Drugs in vitro
Transcriptional Responses to Anti-cancer Drugs in vitroTranscriptional Responses to Anti-cancer Drugs in vitro
Transcriptional Responses to Anti-cancer Drugs in vitro
 
CancerSeek
CancerSeekCancerSeek
CancerSeek
 
Deep Learning for EHR Data
Deep Learning for EHR DataDeep Learning for EHR Data
Deep Learning for EHR Data
 
PSN for Precision Medicine
PSN for Precision MedicinePSN for Precision Medicine
PSN for Precision Medicine
 
Extract Stressors for Suicide from Twitter Using Deep Learning
Extract Stressors for Suicide from Twitter Using Deep LearningExtract Stressors for Suicide from Twitter Using Deep Learning
Extract Stressors for Suicide from Twitter Using Deep Learning
 
Big Data Programming-Final Project
Big Data Programming-Final ProjectBig Data Programming-Final Project
Big Data Programming-Final Project
 
Cancer Immunotherapy
Cancer ImmunotherapyCancer Immunotherapy
Cancer Immunotherapy
 
Allogeneic IgG Enhances Antitumor T-cell Immunity
Allogeneic IgG Enhances Antitumor T-cell ImmunityAllogeneic IgG Enhances Antitumor T-cell Immunity
Allogeneic IgG Enhances Antitumor T-cell Immunity
 
CD28null T-cells in Autoimmune Disease
CD28null T-cells in Autoimmune DiseaseCD28null T-cells in Autoimmune Disease
CD28null T-cells in Autoimmune Disease
 
Gut Microbiome Composition Influences Responses to immunotherapy
Gut Microbiome Composition Influences Responses to immunotherapyGut Microbiome Composition Influences Responses to immunotherapy
Gut Microbiome Composition Influences Responses to immunotherapy
 
Single-Cell RNAseq in IPF
Single-Cell RNAseq in IPFSingle-Cell RNAseq in IPF
Single-Cell RNAseq in IPF
 

Recently uploaded

FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceAlex Henderson
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsOrtegaSyrineMay
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...Scintica Instrumentation
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformationAreesha Ahmad
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...Monika Rani
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Serviceshivanisharma5244
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)Areesha Ahmad
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxDiariAli
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Silpa
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
An introduction on sequence tagged site mapping
An introduction on sequence tagged site mappingAn introduction on sequence tagged site mapping
An introduction on sequence tagged site mappingadibshanto115
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfSumit Kumar yadav
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
Velocity and Acceleration PowerPoint.ppt
Velocity and Acceleration PowerPoint.pptVelocity and Acceleration PowerPoint.ppt
Velocity and Acceleration PowerPoint.pptRakeshMohan42
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY1301aanya
 

Recently uploaded (20)

FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 

Predictive Features of TCR Repertoire

  • 1. Quantifiable predictive features define epitope-specific T cell receptor repertoires Thi Nguyen, Ph.D. Candidate Graduate Biomedical Sciences | Immunology Theme University of Alabama at Birmingham (UAB) kimthi@uab.edu Summer Journal Club August 29th, 2018
  • 2. Outline 1. T cells background -TCR diversity 2. Experiment workflow 3. CD8+ epitope specific TCR repertoire general biochemical characteristics 4. Gene preference usage 5. TCRdist = measure difference between TCRs 6. CDR3 motif discovery 7. TCRdiv = measure TCR diversity 8. Nearest Neighbor Classifier
  • 3. T cells • T cell/T lymphocyte, is a type of white blood cell that is critical for immune defense • T cell can be distinguished from other leukocytes by the presence of TCR • derived from hematopoietic stem cells in bone marrow • mature in the thymus (thymocyte) • have many different subsets with distinct function (helper, killer, regulatory) • unique ability to recognize patterns between normal (self) vs abnormal (non-self or cancerous or sick/dying) cells (cell-mediated immunity) through pMHC binding. • Upon TCR-pMHC binding + costimulatory molecule binding, they become activated • Depending on the cytokine cues from environment, they become differentiated
  • 4. TCR diversity: V(D)J recombination Immunobiology: The Immune System in Health and Disease. 5th edition. Janeway CA Jr, Travers P, Walport M, et al. New York: Garland Science; 2001. germline • Theoretical estimate 10^15 to 10^61 TCRs • Observed 10^6 TCRs Gene rearrangement
  • 5. Experiment Workflow Mice (n=78) Influenza (i.n.) mCMV (i.p.) BAL Stain with tetramers sort Epitope-specific CD8+ T cells spleen Human (influenza, CMV, EBV) (n=32) Single cell mRNA Paired TCR amplification + sequence N = 4635 paired TCR sequences PBMC
  • 6. Paired TCRαβ amplification and sequencing Pradyot Dash et al Methods Mol. Biol. 2015
  • 7. TCRαβ sequence analysis • V and J genes were assigned using BLAST against the IMGT database • CDR3 nucleotide and aa sequences were assigned based on the location of the conserved cysteine in the V region and the FGXG motif in the J region. • The full CDR3 starts at C104 and ends at the F position of the FGXG motif • The trimmed CDR3 starts at the 3rd position after C104 and terminates at the 2nd position before F118 • To handle degenerate J-gene FGXG motifs, J aa sequences were manually aligned to define the F118 position before sequence analysis
  • 8. Extended table. TCR repertoire characteristics clonality = 1 – Simpson’s diversity index (normalized by the size of the repertoire) Pshare = estimated probability that a clone drawn from one subject has an aa sequence identical to that of a clone from another subject
  • 9. Extended Fig.1. CDR3 region characteristics of 10 epitope-specific TCR repertoires
  • 10. How do they quantify gene preference? Jensen-Shannon divergence (JSD) (total divergence to the average) • Measures the similarity between two probability distributions, i.e. quantifies how distinguishable two distributions are from each other. • Based on the Kullback-Leibler divergence, but it is symmetric and always has a finite value • Basic form = entropy of the mixture minus the mixture of the entropies: JSD(P, Q) = H((P + Q)/2) - (H(P) + H(Q))/2 Gene usage preference = a normalized JSD between the gene frequencies of the epitope-specific repertoire and a non-specific background from public datasets. • This can be generalized to any number of random variables with arbitrary weights π_i: JSD(P_1, ..., P_n) = H(Σ_i π_i P_i) - Σ_i π_i H(P_i)
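The two-distribution form of the JSD described on this slide can be sketched in a few lines of Python. This is only an illustration: the gene frequencies are made-up values, and dividing by the mean Shannon entropy is one possible reading of "normalized", not necessarily the exact normalization used in the paper.

```python
import math

def shannon_entropy(p):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def jensen_shannon_divergence(p, q):
    """Entropy of the mixture minus the mean of the entropies (in bits)."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(mixture) - (shannon_entropy(p) + shannon_entropy(q)) / 2

# Hypothetical usage frequencies of three V genes (illustrative values only).
epitope_specific = [0.70, 0.20, 0.10]
background = [0.30, 0.40, 0.30]

jsd = jensen_shannon_divergence(epitope_specific, background)

# Assumed normalization: divide by the mean Shannon entropy of the two inputs.
mean_entropy = (shannon_entropy(epitope_specific) + shannon_entropy(background)) / 2
normalized_jsd = jsd / mean_entropy
print(f"JSD = {jsd:.3f} bits, normalized = {normalized_jsd:.3f}")
```

With base-2 logarithms the JSD of two distributions lies between 0 (identical) and 1 (no overlap), which is what makes it easier to compare across epitopes than the unbounded Kullback-Leibler divergence.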
  • 11. Gene correlation analysis • Covariation between gene usage was quantified by adjusted mutual information • The adjustment corrects for the number and frequencies of the observed genes, which can produce apparent clustering by chance • To set the lower significance threshold in Fig. 1c, they randomly shuffled the genes in each of the 60 gene pairing lists 100 times and recomputed the adjusted mutual information. • The largest value observed in these 6000 random trials = lower significance threshold
  • 12. Fig.1: V and J gene segment usage and covariation in epitope-specific responses
  • 13. Extended Data Figure 2: V and J gene segment usage and covariation in epitope-specific responses
  • 14. Extended Data Figure 3. Schematic overview of TCRdist • Similarity between TCRs = similarity between pMHC-contacting loops. • Loops are defined based on the IMGT CDR definitions with modifications: (1) include CDR2.5 (2) use the trimmed CDR3 • AAdist (alignment score) is derived from the BLOSUM62 matrix • TCRdist = sum of the weighted AAdist values: distance(a,a) = 0; distance(a,b) = min(4, 4 - BLOSUM62(a,b)), which reduces the penalty for aa pairs with a positive BLOSUM62 score. • A gap penalty of 4 (8 for CDR3) = distance between a gap position and an aa. • A weight of 3 is applied to mismatches in the CDR3.
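A minimal sketch of the per-loop distance on this slide, assuming the two loops are already aligned to equal length with `-` marking gaps. The `BLOSUM62_EXCERPT` dictionary holds only the few BLOSUM62 entries needed for the example, the helper names are hypothetical, and leaving the CDR3 gap penalty unweighted is one reading of "weight of 3 is applied to mismatches". This is not the authors' implementation.

```python
GAP = "-"

# Tiny excerpt of the (symmetric) BLOSUM62 matrix; a real implementation
# would load the full 20 x 20 matrix.
BLOSUM62_EXCERPT = {
    ("L", "I"): 2,
    ("D", "E"): 2,
    ("S", "T"): 1,
    ("A", "W"): -3,
}

def blosum62(a, b):
    """Look up a substitution score; raises KeyError if the pair is not in the excerpt."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]

def aa_dist(a, b, gap_penalty):
    """AAdist for one aligned position."""
    if a == b:
        return 0
    if GAP in (a, b):
        return gap_penalty
    return min(4, 4 - blosum62(a, b))

def loop_dist(seq1, seq2, is_cdr3=False):
    """Sum of AAdist over one pair of aligned loops."""
    if len(seq1) != len(seq2):
        raise ValueError("loops must be aligned to the same length")
    gap_penalty = 8 if is_cdr3 else 4
    weight = 3 if is_cdr3 else 1
    total = 0
    for a, b in zip(seq1, seq2):
        d = aa_dist(a, b, gap_penalty)
        is_gap_position = a != b and GAP in (a, b)
        # Assumption: the CDR3 weight applies to mismatches, not to gaps.
        total += d if is_gap_position else weight * d
    return total

print(loop_dist("ALD", "AIE"))                # non-CDR3 loop
print(loop_dist("ALD", "AIE", is_cdr3=True))  # same loops, CDR3 weighting
```

The full TCRdist would then be the sum of `loop_dist` over every compared loop (CDR1, CDR2, CDR2.5 and CDR3) of the α and β chains.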
  • 15. BLOSUM62 matrix BLOSUM = Block substitution matrix • Scores alignments between protein sequences (locally, as opposed to PAM) • Based on observed alignments • Larger = higher sequence similarity => smaller evolutionary distance
  • 16. Clustering and dimensionality reduction • The TCR with the largest number of neighbors within the distance threshold is chosen as a cluster center. • It and all its neighbors are then removed from the repertoire, and the process is repeated until all TCRs have been clustered. • The distance threshold was chosen to yield homogeneous clusters of sufficient size; the same threshold was used for all repertoires • The result of this clustering method was visualized with average-linkage hierarchical clustering trees and TCR sequence logos • They also use 2D kernel PCA (scikit-learn, KernelPCA function) to visualize the TCR landscape. This attempts to preserve the similarity structure of the input data while reducing its dimensionality.
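The greedy clustering loop on this slide can be sketched as follows. The function name, the use of `<=` for "within the threshold", counting neighbors only among not-yet-clustered TCRs, and breaking ties by lowest index are all assumptions made for illustration.

```python
def greedy_cluster(dist, threshold):
    """Greedy neighbor-count clustering on a square, symmetric distance matrix.

    Returns a list of (center, members) tuples; members starts with the center.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        def neighbors(i):
            # Neighbors are counted among TCRs that are not yet clustered.
            return [j for j in remaining if j != i and dist[i][j] <= threshold]

        # Pick the TCR with the most neighbors (ties go to the lowest index).
        center = max(sorted(remaining), key=lambda i: len(neighbors(i)))
        members = [center] + sorted(neighbors(center))
        clusters.append((center, members))
        remaining -= set(members)
    return clusters

# Toy example: five "receptors" placed on a line, distance = absolute difference.
positions = [0, 1, 2, 10, 11]
dist = [[abs(a - b) for b in positions] for a in positions]
print(greedy_cluster(dist, threshold=1.5))
```

Because every TCR ends up in exactly one cluster, isolated receptors simply form singleton clusters at the end of the loop.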
  • 17. Fig.2: TCRdist analysis of the M45 repertoire identifies clusters of related receptors TCR logos • summarize V and J gene usage, CDR3 aa sequence and inferred rearrangement structure of the CDR3 • 4 components: 1. V-gene logos (left): V-gene names are scaled by frequency and stacked top to bottom from most to least common 2. CDR3 sequence logo where aa are scaled and ordered by frequency and colored by chemical type. 3. a J-gene logo (right) 4. CDR3 where the genomic source regions for each nucleotide column are represented by frequency-scaled bars ordered top to bottom from V to D to J and colored according to their frequency.
  • 18. Extended Fig.4: 2D projections of mouse epitope-specific TCR repertoire • kernel PCA applied to TCRdist • Colored based on gene segment usage kernel PCA ~ nonlinear form of PCA
  • 19. CDR3 motif discovery • Motifs = fixed-length patterns consisting of aa positions, wildcard positions and aa group positions (allowed groupings: (K,R), (D,E), (N,Q), (S,T), (F,Y,W,H), (A,G,S,P), (V,I,L,M)) • Motif score = (observed - expected)^2 / expected. • observed = number of times the motif was observed in the TCR sequences • expected = value from background TCRs (with V and J genes matching the observed repertoire) • Starting with two-position motifs scoring above a seed threshold, each motif was iteratively extended by adding new specified positions that increase the motif score. • Motif scores were sorted and filtered for redundancy. • Motifs scoring above a threshold were extended to include near-neighbour TCRs using a stringent distance threshold => capture additional patterns • The final set of motifs for each repertoire was visualized using TCR logos.
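As a rough illustration of the motif score, the sketch below counts how many CDR3 sequences contain a pattern and compares that count with a background expectation. The motif encoding (`"."` for a wildcard, a string such as `"KR"` for an aa group), the toy sequences, and scaling the background count by repertoire size to obtain `expected` are all assumptions, not the paper's procedure.

```python
def matches_at(seq, motif, start):
    """True if the motif matches seq beginning at index start."""
    if start + len(motif) > len(seq):
        return False
    return all(el == "." or aa in el for el, aa in zip(motif, seq[start:]))

def contains_motif(seq, motif):
    """True if the motif occurs anywhere in seq."""
    return any(matches_at(seq, motif, s) for s in range(len(seq) - len(motif) + 1))

def count_matches(seqs, motif):
    """Number of sequences that contain the motif at least once."""
    return sum(contains_motif(s, motif) for s in seqs)

def motif_score(observed, expected):
    """(observed - expected)^2 / expected."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (observed - expected) ** 2 / expected

# Toy data (made-up CDR3 sequences, for illustration only).
observed_cdr3s = ["CASSGRGAF", "CASSKRGTF", "CASSLDQYF"]
background_cdr3s = ["CASSGRGAF", "CASSPDTQYF", "CASRQGYEQYF", "CASSLGGNTLYF"]
motif = ["S", ".", "R", "G"]  # S, any residue, R, G

observed = count_matches(observed_cdr3s, motif)
# Assumed expectation: background match rate scaled to the repertoire size.
expected = count_matches(background_cdr3s, motif) * len(observed_cdr3s) / len(background_cdr3s)
print(observed, expected, motif_score(observed, expected))
```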
  • 21. Fig.3: Enriched CDR3 sequence motifs define key features of epitope specificity
  • 22. TCRdiv metric to measure repertoire diversity Simpson diversity index (D): • Takes into account both the richness and the evenness of the population. • Measures the probability that 2 individuals randomly selected from a population will belong to the same class. • 0 ≤ D ≤ 1 • 1 means the samples are identical • 0 means otherwise TCRdiv: • Estimates the expected value of a Gaussian function of the inter-sample distance that returns 1 if the samples are identical and exp(-(TCRdist(a,b)/s.d.)^2) otherwise. • s.d. = 18.45 for single-chain distances and twice that for paired analyses, based on empirical assessments of the receptor distance distribution for multiple epitopes. • TCRdiv = inverse of this estimate http://www.countrysideinfo.co.uk/simpsons.htm
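A minimal sketch of the TCRdiv calculation described on this slide, assuming a precomputed TCRdist matrix. Averaging the Gaussian term over all distinct receptor pairs is an assumption about how the expected value is estimated, and the function name is hypothetical.

```python
import math
from itertools import combinations

def tcrdiv(dist, sd=18.45):
    """Inverse of the mean Gaussian similarity over all distinct receptor pairs.

    dist is a square, symmetric TCRdist matrix; sd = 18.45 is the single-chain
    value quoted on the slide (twice that for paired-chain analyses).
    """
    n = len(dist)
    if n < 2:
        raise ValueError("need at least two receptors")
    pairs = list(combinations(range(n), 2))
    # exp(-(d / sd)^2) equals 1 for identical receptors (d = 0).
    mean_similarity = sum(math.exp(-(dist[i][j] / sd) ** 2) for i, j in pairs) / len(pairs)
    return 1 / mean_similarity

identical = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
spread = [[0, 40, 60], [40, 0, 50], [60, 50, 0]]
print(tcrdiv(identical), tcrdiv(spread))
```

A repertoire of identical receptors gives a TCRdiv of 1, and the value grows as receptors move further apart, so similar but non-identical receptors count as partially shared rather than as entirely distinct classes.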
  • 23. Extended Fig.8. TCRdiv measures for each chain and paired chains
  • 24. Nearest neighbor (NN)-distance classifier • Receptor density within a repertoire = sampling density near each receptor • = weighted average distance to the nearest-neighbor receptors in the repertoire • A small NN-distance means a higher local sampling density = many nearby neighbors • They use the nearest 10% of the repertoire, with a weight that decreases linearly from the nearest to the farthest neighbor. • To compute the AUROC score for the NN-distance classifier, epitope-specific TCRs (positive) and background receptors (negative) were sorted by NN-distance • ROC curve: sensitivity = fractional recovery of epitope-specific receptors; 1 - specificity = fractional recovery of background receptors
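The NN-distance and its AUROC can be sketched as below. The exact weighting scheme (weights k, k-1, ..., 1 over the k nearest neighbors) is an assumption consistent with "linearly decreasing", and the AUROC is computed with a simple pairwise comparison rather than any particular library routine.

```python
def nn_distance(dists_to_repertoire, fraction=0.10):
    """Weighted average distance from one receptor to its nearest neighbors.

    dists_to_repertoire holds the TCRdist values from the query receptor to
    every receptor in the repertoire (excluding the query itself).
    """
    ranked = sorted(dists_to_repertoire)
    k = max(1, round(fraction * len(ranked)))
    nearest = ranked[:k]
    weights = [k - i for i in range(k)]  # assumed linear decay: k, k-1, ..., 1
    return sum(w * d for w, d in zip(weights, nearest)) / sum(weights)

def auroc(positive_scores, negative_scores):
    """Probability that a positive has a smaller NN-distance than a negative."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Toy example: 20 neighbor distances, so the nearest 10% is 2 receptors.
print(nn_distance(list(range(1, 21))))
# Epitope-specific receptors (positives) should score lower than background.
print(auroc([12.0, 30.0, 45.0], [40.0, 80.0, 95.0]))
```

An AUROC of 1 would mean every epitope-specific receptor has a smaller NN-distance than every background receptor, while 0.5 corresponds to no discrimination.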
  • 27. Fig.4 . Quantifying the defining features of epitope-specific populations
  • 28. Extended Fig 9. Specificity and Avidity of the dispersed TCR
  • 29. Modeling gene rearrangement • Each nucleotide of the CDR3 is assigned to either V, D, J or N-nucleotide insertion so as to minimize number of N. • They sample V and J gene segments from the observed receptors but generated the junctional sequences based on the inferred probability distribution for number of insertion and deletion from the background TCR (public data). • They also use these probability distributions to generate the random receptors that formed one of the two control set for CDR3 motif discovery algorithm. extended Fig.8.
  • 30. Summary • Characterize 10 epitope-specific TCR repertoires of CD8+ T cells from ~4600 single-cell paired TCR sequences: gene segment usage, epitope selection • Develop TCRdist, a distance measure on the space of TCRs, to quantify similarity among TCRs • Develop TCRdiv to quantify TCR repertoire diversity • Develop a distance-based classifier that can assign unobserved TCRs to characterized repertoires Significance: • Potential application to the analysis of clinical TCR repertoire data where the target is unknown, such as in TIL. • Propose that, despite the tremendous diversity of TCRs, we can develop predictive models for TCR-pMHC recognition.

Editor's Notes

  1. The TCR is a heterodimeric surface receptor that mediates recognition of pathogen-associated epitopes through interaction with pMHC. TCRs are generated by genomic rearrangement of the germline TCR loci, a process called V(D)J recombination that has the potential to generate marked TCR diversity, with theoretical estimates of 10^15 to 10^61 receptors. However, only about 10^6 TCRs are observed, owing to several limitations (biological, experimental, technical). The cartoon on the left shows the germline organization of the human T-cell receptor α and β loci: a cluster of 61 Jα gene segments is located a considerable distance from the Vα gene segments. The Jα gene segments are followed by a single C gene, which contains separate exons for the constant and hinge domains and a single exon encoding the transmembrane and cytoplasmic regions. The TCRβ locus (chromosome 7) has a different organization, with a cluster of 52 functional Vβ gene segments located distantly from two separate clusters each containing a single D gene segment, together with six or seven J gene segments and a single C gene. Each TCRβ C gene has separate exons encoding the constant domain, the hinge, the transmembrane region, and the cytoplasmic region (not shown). The TCRα- and β-chain genes are composed of discrete segments that are joined by somatic recombination during development of the T cell. For the α chain, a Vα gene segment rearranges to a Jα gene segment to create a functional V-region exon. Transcription and splicing of the VJα exon to Cα generates the mRNA that is translated to yield the T-cell receptor α-chain protein. For the β chain, the variable domain is encoded in three gene segments, Vβ, Dβ, and Jβ. Rearrangement of these gene segments generates a functional VDJβ V-region exon that is transcribed and spliced to join to Cβ; the resulting mRNA is translated to yield the T-cell receptor β chain. The α and β chains pair soon after their biosynthesis to yield the α:β T-cell receptor heterodimer, which can bind a particular pMHC presented by an APC.
Now, TCRs that recognize the same pMHC molecule often have completely different sequences, but for the most part they also share certain similarities. This paper attempts to characterize the similarities and differences among the distinct TCRs within epitope-specific repertoires, in the hope that this will enable predictive modeling of TCR sequences given a known epitope and vice versa: for example, given a particular TCR sequence, predicting the pMHC it binds.
  2. Historically, studying the real paired TCR sequences in a repertoire was not possible with bulk sequencing, which does not preserve α/β chain pairing. With recent advances in single-cell sequencing, there are now many more possibilities for studying and modeling TCRs and the immune response. First, they use flow cytometry to sort CD8 T cells that are specific to a given viral epitope by staining the cells with pMHC tetramers. After the single-cell sort, they subject these cells to paired TCR amplification: they isolate RNA, make cDNA, and do 2 rounds of nested PCR to amplify CDR3α and CDR3β in parallel. They then sequence these PCR products, translate them to aa sequences, and combine α with β to obtain the αβ coexpression profile.
  3. Degeneracy of genetic code = different nucleotide sequence encode the same polypeptide.
  4. They first analyze this TCR repertoire data set using established features such as length, charge and hydrophobicity. a, TCR repertoires of 10 epitope-specific populations. Whereas substantial sharing (publicity) was observed at the single-chain level, less sharing was observed for paired chains, with 3 epitopes (F2, m139 and pp65) having no public receptor in this data set. b, Biophysical characteristics of the TCR repertoires of 10 epitope-specific populations.
  5. CDR3 length, charge, hydrophobicity, and inferred number of junctional nucleotide insertions for both single and paired chains are shown in the histograms. Different epitopes are colour-coded. Here they show that the mean values for CDR3 length, charge and hydrophobicity are tightly clustered for the majority of the epitopes, and all these features show largely overlapping ranges. B. Correlation between CDR3αβ and antigenic peptides for charge, hydrophobicity, length, and N-insertions observed in all 10 epitopes. They found a negative correlation between CDR3 charge and peptide charge, and between peptide length and CDR3 length, suggesting that charge and length complementarity may have a role in pMHC recognition.
  6. To quantify gene preference, they constructed a background, non-epitope-selected repertoire by combining public data, and compared gene frequencies in each epitope-specific repertoire with those seen in the background set.
  7. Fig 1 A. Gene segment usage and gene–gene pairing landscapes are illustrated graphically using four vertical stacks (one for each V and J segment) connected by curved segments with thickness proportional to the number of TCRs with the respective gene pairing (each panel is labelled with the four gene segments atop their respective colour stacks and the epitope identifier in the top middle). Genes are coloured by frequency within the repertoire with a fixed colour sequence used throughout the manuscript which begins red (most frequent), green (second most frequent), blue, cyan, magenta, and black. Clonally expanded TCRs were reduced to a single data point for this analysis. The number of clones is indicated to the left of each panel. The enrichment of gene segments relative to background frequencies is indicated by up or down arrows, with each successive arrowhead corresponding to an additional twofold deviation. A shows the degree of dominance of single genes and of pairwise gene associations. Each epitope-specific response is characterized by an overrepresentation of individual genes as well as gene pairing preferences. An example is the PB1 epitope, where Trav3-3, Tra26 and Trb2-3 are used in the single largest block of receptors. B. Jensen-Shannon divergence between the observed gene usage of the epitope-specific CD8+ T cell repertoire and the background, normalized by mean Shannon entropy. Higher values indicate stronger gene preference. C. Gene usage correlation between V and J segments within a chain or across chains.
  8. Gene segment usage and gene–gene pairing landscapes are illustrated graphically using four vertical stacks (one for each V and J segment) connected by curved segments with thickness proportional to the number of TCRs with the respective gene pairing (each panel is labelled with the four gene segments atop their respective colour stacks and the epitope identifier in the top middle). Genes are coloured by frequency within the repertoire with a fixed colour sequence used throughout the manuscript which begins red (most frequent), green (second most frequent), blue, cyan, magenta, and black. Clonally expanded TCRs were reduced to a single data point for this analysis. The number of clones is indicated to the left of each panel. The enrichment of gene segments relative to background frequencies is indicated by up or down arrows, with each successive arrowhead corresponding to an additional twofold deviation.
  9. To map the epitope-specific TCR landscape at high resolution and to quantify similarity between TCRs, they developed TCRdist = a distance measure on the space of TCRs, guided by structural information on pMHC binding. Each of the two TCRs being compared is first mapped to the amino acid sequence of its CDR loops (CDR1, CDR2, and CDR3 as well as an additional variable loop here labelled ‘CDR2.5’), as indicated by the black arrows leading from the coloured loop regions in the receptor structures to the corresponding amino acid sequences in the middle of the diagram. These CDR sequences are aligned based on the IMGT reference multiple sequence alignments, and a distance score (‘AAdist’) is computed for each position in the alignment using the BLOSUM62 similarity matrix according to the formula given in the box at the bottom left. The AAdist scores are weighted as shown in the ‘weight’ row (thereby increasing the contribution of the CDR3 regions) and summed to produce the final TCRdist score (shown at the right).
  10. 2A shows gene usage just like Fig. 1. 2B. They use the TCRdist values to do a 2D kernel principal components analysis (PCA). The clusters are then coloured by Vα (left panel) and Vβ (right panel) gene usage. Three groups of receptors that correspond to the TCR logos and clusters depicted in c are indicated with dashed ellipses. They thus obtain a coarse-grained visualization of each repertoire by mapping the high-dimensional TCR landscape into 2 dimensions, with each dot representing a TCR, clustered based on TCRdist. To complement these landscape projections, they also construct hierarchical trees and TCR logos. A TCR logo summarizes the gene frequencies, CDR3 aa sequences and inferred rearrangement structures to further annotate these clusters. Examination of these trees shows that each repertoire mostly contains a dominant cluster of receptors, defined by common V-J region usage but also by similarity of CDR3 motifs. In addition to the core clusters of similar receptors, each repertoire also has divergent regions that are clearly distinct from each other (for the dendrograms and TCR logos of the other TCR repertoires, see Extended Data Figures 5-6). Looking at the logos, many of the shared CDR3 residues are derived directly from genomic sequence, and thus reflect the biased gene usage.
  11. Inspection of these projected landscapes allows us to identify subregions of the TCR repertoires that are tightly clustered (similar), and to associate them with gene segment usage. Standard PCA always finds linear principal components to represent the data in lower dimensions. Sometimes we need non-linear principal components: if we apply standard PCA to data with non-linear structure, it will fail to find a good representative direction. Kernel PCA (KPCA) rectifies this limitation. Kernel PCA simply performs PCA in a new space: it uses the kernel trick to find principal components in a different (possibly high-dimensional) space. PCA finds new directions based on the covariance matrix of the original variables and can extract at most P (number of features) eigenvalues. KPCA finds new directions based on the kernel matrix and can extract n (number of observations) eigenvalues. PCA allows us to reconstruct the pre-image using a few of the P eigenvectors; this may not be possible in KPCA. Extracting principal components with KPCA also takes more computation time than standard PCA.
  12. They hypothesize that motifs that are not germline encoded are more likely to contribute to specificity. To identify these features directly, they performed a statistical analysis of overrepresented CDR3 motifs, taking into account the underlying sequence bias introduced by the rearrangement process.
  13. The grouping of aa was based on their similarity in charge and hydrophobicity.
  14. They hypothesize that motifs that are not germline encoded are more likely to contribute to specificity. To identify these features directly, they performed a statistical analysis of overrepresented CDR3 motifs, taking into account the underlying sequence bias introduced by the rearrangement process. In Fig. 3, they show the top-scoring motifs for both CDR3α and CDR3β for the 10 repertoires, along with the residues that are enriched relative to the background distribution. Their results are also supported by the solved ternary structures for PA, BMLF and M1, in which the enriched non-germline residues either directly contact pMHC or contribute to the stabilization of the CDR3 loop conformation.
  15. They develop a new diversity metric that generalizes Simpson’s diversity index by capturing the similarity among receptors in addition to exact identity.
  16. They develop a new diversity metric that generalizes Simpson’s diversity index by capturing the similarity among receptors in addition to exact identity. The figure shows the TCRdiv diversity measure for the 10 epitope-specific CD8+ T cell repertoires, for each chain as well as for paired chains. Examining the TCRdiv scores clarifies trends seen in the earlier analysis, e.g. the PB1 repertoire shows low diversity in the alpha chain but high diversity in the beta chain, whereas the opposite is true for the M38 repertoire.
  17. Their landscape analysis suggests that each repertoire is composed of one or more groups of clustered receptors sharing similar sequence features, together with more diverse, outlying receptors. To measure the receptor density within a repertoire and quantify the relative contributions of clustered and diverged TCRs, they develop a repertoire-specific NN score that captures the density of receptors surrounding each receptor.
  18. They developed a new diversity metric (TCRdiv) that generalizes Simpson’s diversity index by capturing similarity among receptors in addition to exact identity. TCRdiv diversity measures (a) and smoothed density profiles of the nearest-neighbours (NN) distance (b) are shown for each repertoire. B shows that the majority of TCR repertoires exhibit a bimodal distribution of nearest-neighbor distances, with one peak at low NN distance, representing the dominant and densely sampled cluster, and a 2nd peak at larger NN distances, reflecting the outlier receptors. They also confirm the antigen specificity of these non-clustered receptors by cloning the receptors into TCR-null cells and measuring their ability to bind tetramers, confirming that these outlier receptors indeed represent legitimate but unconventional epitope specificity. (c) To test the predictive power of TCRdist, they defined a TCR classifier that assigns a given receptor to the repertoire with the lowest NN distance. They first measure the sensitivity vs specificity of the classifier for identifying epitope-specific receptors among a pool of randomly generated background receptors. d, The area under these ROC curves (AUROC), a standard measure of classification success, is above 0.8 for all repertoires except pp65, which is also the one with the highest diversity. e, Indeed, TCRdiv is negatively correlated with AUROC, with the most diverged receptors being harder to discriminate from the background. f, To validate the TCR classifier, they generated an additional dataset using index-sorted cells stained with 4 tetramers (NP, PA, PB1 and F2) from the airways of influenza-infected mice. Cells were sorted without the index tetramer information, sequenced, and assigned to one of the 4 epitopes or to a non-specific response using the NN classifier.
The predictor correctly assigned most TCR sequences to the target epitope identified by tetramer staining, with AUROC = 0.9 for 3 epitopes; for F2, it is about 0.72 for single cells and 0.85 for clonotypes, possibly because this epitope has the fewest receptor sequences available to train the classifier. Importantly, 85% of the correctly classified receptors were not previously observed, demonstrating the power of this approach to classify novel antigen-specific receptors. A significant population of cells also fell below the threshold for tetramer positivity yet were assigned to a specific epitope by the NN classifier. They hypothesize that these cells may be specific for the predicted epitope yet could not be identified by tetramer staining. Assignment of TCR sequences from influenza-infected lung without prior knowledge of their tetramer specificity by the NN-distance classifier. Tetramer binding (mean fluorescence intensity (MFI), x axis) is plotted against NN-distance score (y axis) for a validation set of T cell receptors (n = 856 TCRs; 352 clones) collected after development of the classifier. The solid vertical lines indicate the MFI thresholds used to define epitope-positive receptors, which are plotted with the colours given in the legend (receptors negative for all four tetramers are shown in grey). Raw MFI values were scaled to align the threshold values across tetramers. Dotted horizontal lines indicating a fixed NN-distance score are provided for visual reference.
  19. A + B: They also confirm the antigen specificity of these non-clustered receptors by cloning the receptors into TCR-null cells and measuring their ability to bind tetramers, confirming that these outlier receptors indeed represent legitimate but unconventional epitope specificity. C: The distribution of the tested TCRs (numbered 1–5, corresponding to their left-to-right occurrence in A) on an NN-distance plot, and D: their V-J usage and CDR3 sequences with NN-distance scores, are shown. E. Analysis of the mean fluorescence intensities (MFI) of the clustered and dispersed groups of receptors (separated by a visual threshold of 135 NN-distance score) shows no consistent segregation by avidity. Mean and standard error of the mean are shown. f, PB1-specific TCRs derived from cells sorted by low, intermediate and high gating show overlapping distributions of NN-distance scores (n = 23 (low), 18 (intermediate), 23 (high) cells). So basically there is no correlation between binding avidity and NN distance.
  20. However, they found a strong correlation between receptor density and TCR generation probability, suggesting the ease of generation explains a portion of the variation in the landscape structure.