Mining for Novel TNF Ligands Using Unison, an Open Source Database for Target Discovery
http://unison-db.sourceforge.net/
Reece Hart <rkh@gene.com>
Departments of Bioinformatics and Protein Engineering, Genentech, Inc., San Francisco, CA 94080
Abstract

Tumor Necrosis Factor (TNF) ligands, acting through their cognate TNF receptors, are critical to numerous immunological responses, including B and T cell differentiation, apoptosis, and inflammation. Several “orphan” TNF receptors exist for which the corresponding ligands are unknown. Over the past several years, we have undertaken attempts to identify these unknown ligands from curated protein sequences, six-frame translations of the human genome, and from pathogenic sequences. This poster summarizes these efforts and introduces Unison, an Open Source database for organizing and mining complex proteomic data.

Tumor Necrosis Factor Ligand Family

TNF ligands are type II membrane proteins which belong to the C1q-TNF superfamily and signal through corresponding TNF receptors. Three putative TNF receptors have no known ligand, and this suggests that other ligands remain to be discovered. Most TNF domains are encoded by a single exon and bind one distinct TNF receptor, although there are exceptions to both rules. The currently known TNF ligand-receptor interactions and exon structures are shown below.

[Figure labels (PDB identifiers): 1c28, 1gr3; 1d2q, 1dg6; 1jh5, 1kxg; 1aly, 1i9r; 1iqa, 1jtz; 2tnf (mus); Others: 9sgh; 1tnf]

TNF Family Exon Structure
Most TNF domains are encoded within a single exon.

Mining Curated Sequence Databases

We've mined public and proprietary sequence sources using many methods, including hidden Markov models and PSI-BLAST profiles from Pfam, CDD, Superfamily, and custom sequence- and structure-based alignments, and threading using Prospect (Xu and Xu) and ProHit (Sippl). The figures below outline one way to integrate and analyze these data in Unison.

2. Review candidates
Clicking any of the classified results at left returns a list of distinct sequences with their “best” annotations.

Mining Pathogenic Sequences for TNF-like Structures

Because extensive expression cloning and computational prediction failed to identify a novel human TNF ligand which bound any of the orphan TNF receptors, we began to consider the possibility that these receptors might bind pathogenic proteins either as a surveillance mechanism or as an exploited “security hole” (as with herpes virus binding to HVEM, a TNF receptor). Recently, a new sequence appeared in Swiss-Prot which threads extremely well to TNF backbones and occurs in a virus known for its host evasion mechanisms.

1. Viral sequences sorted by the best TNF-C1q threading “raw” score.

VA28_MCV is one of a family of orthologous A28 proteins in poxviruses.

Mining Six-Frame Translations of the Human Genome

Most TNF ligands are encoded in the Human Genome with the majority of the TNF domain in a single exon. This suggests that it might be possible to detect novel TNFs by scanning naïve six-frame translations of ORFs. For calibration of scoring functions, we instead chose to scan fixed-length subsequences of 6-frame translations, as shown below.

Six-Frame Translation and Threading Method

[Figure: genome (3B bp; chromosomes 1 to 22, X, Y), 450 NT fragment, ≤150 AA six-frame translations]

● UCSC genome assembly (NHGD34)
● 450bp w/150bp overlap generates:
  – 10 M fragments
  – 60M 6-frame translations
  – ~500M ORF fragments
  – 27M fragments w/length ≥50AA
  – fragments <50AA (marked X) were discarded
● 27M fragments were threaded against 22 TNF superfamily members (TNF+C1q)
● 900K (of 27M) had score <=250; each was threaded against 3286 representative chains
● total time: 176 CPU-weeks (4 weeks on 22 2-cpu machines)
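The fixed-length six-frame scan listed under "Six-Frame Translation and Threading Method" can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline actually used: it assumes that "450bp w/150bp overlap" means consecutive windows start 300 bp apart (which reproduces the 10 M fragment count for a 3B bp genome), that ORF fragments are obtained by splitting each translation at stop codons, and that the input is upper-case ACGT. All function names are hypothetical.

```python
from typing import Iterator, List, Tuple

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

WINDOW_NT = 450   # fragment length quoted on the poster
OVERLAP_NT = 150  # overlap quoted on the poster
MIN_AA = 50       # shorter pieces are discarded, as on the poster


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate from position 0; codons with non-ACGT bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def windows(genome: str, size: int = WINDOW_NT, overlap: int = OVERLAP_NT) -> Iterator[Tuple[int, str]]:
    """Yield (start, fragment); starts are size - overlap apart (assumption)."""
    for start in range(0, len(genome), size - overlap):
        yield start, genome[start:start + size]


def six_frame(fragment: str) -> List[str]:
    """Three forward and three reverse-complement translations."""
    strands = (fragment, reverse_complement(fragment))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


def orf_fragments(protein: str, min_len: int = MIN_AA) -> List[str]:
    """Split at stop codons ('*'); keep pieces of at least min_len residues (assumption)."""
    return [piece for piece in protein.split("*") if len(piece) >= min_len]


# Bookkeeping check against the figures quoted on the poster.
GENOME_BP = 3_000_000_000
N_FRAGMENTS = GENOME_BP // (WINDOW_NT - OVERLAP_NT)  # 10 M fragments
N_TRANSLATIONS = 6 * N_FRAGMENTS                     # 60M six-frame translations
CPU_WEEKS = 4 * 22 * 2                               # 4 weeks on 22 2-cpu machines
```

A 450 nt window yields at most 150 residues per frame, which matches the "≤150 AA six-frame translations" label in the figure.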
TNF family exon structure figure. Adapted from Bodmer, Schneider, Tschopp, TiBS 27(1): 19-26 (2002). Horizontal axis: 0 to 250. Legend: Exon, TNF Domain. Row labels:

Ligand   No.
Lta      1
TNFa     2
Ltb      3
OX40L    4
CD40L    5
FasL     6
CD27L    7
CD30L    8
41BBL    9
TRAIL    10
RANKL    11
TWEAK    12
APRIL    13
BLyS     13B
LIGHT    14
VEGI     15
AITRL    18
EDA

[Figure labels (PDB identifiers): 1d0g, 1d4v, 1du3; 1tnr; 1bzi*]

Mining Curated Sequence Databases, figure 1:

1. Integrating multiple search methods
A single Unison page allows users to select and integrate results from HMMs, PSSMs, and Prospect2 threadings to any family of models (TNFs in this case). “Hits” are then classified into true positives, false negatives, and “unknown” positives (candidates) by reference to a curated list of known family members.

Unfortunately, only distinctly C1q-like proteins have been identified so far.

Mining Pathogenic Sequences, figure 2:

2. Threading results for VA28_MCV aligned to 3286 FSSP representative backbones. TNF and C1q family members are among the best fold recognition templates.

Six-Frame Translation and Threading, results:

Distribution of Prospect2 raw scores
The histogram shows the distribution of the best (lowest) “raw” score for the alignment of each 150AA six-frame translation fragment to TNF-C1q superfamily backbones. Fragment 8602 is highlighted and shown as an example below.
[Histogram: y-axis Frequency. Annotations: TBD: 166 w/score ≤ -120 (max TNF fragment score = -154); analyzed: 76 w/score ≤ -200]

Fragment threading identifies NP_848635.1
Screenshots showing ambiguous alignment to different regions on chr 13.
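Two of the steps described in the captions above are taking the best (lowest) raw score per fragment and classifying hits against a curated list of known family members. A minimal sketch of both, assuming scores are held in a nested dictionary keyed by sequence and template; the function names, data layout, and toy values are hypothetical and are not Unison's schema or API. The -200 cutoff is borrowed from the histogram annotation purely as an example.

```python
from typing import Dict, Set


def best_raw_scores(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Map each sequence id to its best (lowest) raw score over all templates."""
    return {seq: min(by_template.values()) for seq, by_template in scores.items() if by_template}


def select_hits(best: Dict[str, float], cutoff: float) -> Set[str]:
    """Sequences whose best score is at or below the cutoff (lower is better)."""
    return {seq for seq, score in best.items() if score <= cutoff}


def classify_hits(hits: Set[str], known_members: Set[str]) -> Dict[str, Set[str]]:
    """Split results by reference to a curated list of known family members."""
    return {
        "true_positives": hits & known_members,      # known members that were found
        "false_negatives": known_members - hits,     # known members that were missed
        "unknown_positives": hits - known_members,   # candidates for review
    }


# Toy data, invented for illustration only.
toy_scores = {
    "seq_a": {"1aly": -230.0, "1tnf": -180.0},
    "seq_b": {"1aly": -90.0, "1tnf": -130.0},
    "seq_c": {"1aly": -205.0},
}
toy_known = {"seq_a", "seq_z"}

best = best_raw_scores(toy_scores)
hits = select_hits(best, cutoff=-200.0)
classes = classify_hits(hits, toy_known)
```

Keeping the three classes as plain sets makes it straightforward to review only the "unknown" positives, which is the group the poster treats as candidates.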
Twenty-two TNF and C1q structures are known, all of which have profound structural similarity among the ligands despite very poor sequence similarity (average pairwise identity is between ~9 and ~30%). Identifying TNFs by sequence-based methods is difficult because of the poor sequence conservation and their similarity to C1q proteins, which are not relevant to our interest in ligands for the orphan receptors.

[Figure: structure-based alignment of two TNFs by CE, 1aly (CD40L) and 1tnf (TNFα); views labeled 120º and 90º; structural elements labeled A, A', B, B', C, D, E, F, G, H]
CE-generated alignment:
● 141 aligned residues
● 2.2 Å RMSD (backbone)
● 26% Identity (c.f. 19% by S-W)

Mining Curated Sequence Databases, figures 3 to 5:

3. Summary of features for Unison:8602.

4. Genomic map. Unison contains rudimentary protein-to-genome alignments using BLAT. This sequence has a high-quality orthologous C-terminal fragment from mouse. Clicking the map opens an in-house viewer with more extensive genomic mapping data.

5. On-the-fly re-threading of sequence 8602 to the TRAIL ligand viewed with RasMOL (PyMOL and JMol are also supported).

Mining Pathogenic Sequences, figures 3 and 4:

3. For comparison, the alignment of Apo2L/TRAIL to the same FSSP representatives. The raw score for the alignment of VA28_MCV to 1gr3a, a TNF-C1q family member, is denoted by the red triangle (▶) and is comparable to those for alignments of known TNFs to other TNF-C1q structures.

4. A28 aligned to CD40L.
Legend: blue=identity; cyan=similarity; red=dissimilarity; yellow=cysteine; yellow spacefill=conserved cysteine; grey=query gap/template insert; >nAA< = query insert/template gap

Reasons for hope:
● VA28_MCV has a signal peptide and is known to be on the viral coat; conditional mutants abolish entry
● MCV has numerous genes for host evasion, including homologs for a Death Effector Domain which inhibits caspase-8 (also found in HSV), IL18 BP, and MHC class I complex which may act as a decoy.
● There is a precedent for viral entry via TNFR: HSV enters via TNFRSF14/HveA/HVEM.
● MCV infects keratinocytes, which are known to express TNFR during their development

Reasons for doubt:
● threading alignment has a significant deletion (but is nearly as good as other intra-TNF family alignments)
● A28 doesn't thread as well to other TNF backbones
● other A28s don't thread well to TNFs
● some viral capsid proteins also have a similar fold (but in RNA viruses)
● VA28_MCV does not appear to stimulate any of the orphan receptors. Non-orphans have not been tested.

Six-Frame Translation and Threading, fragment 8602:

[Histogram x-axis: Best raw score to any TNF SF member (lower is better); fragment 8602 is marked]

Unison provides on-the-fly threading visualization via JMol, PyMOL, and RasMOL. (PyMOL is used below.)

Threading of Unison:8602 to 1c28a
Legend: blue=identity; cyan=similarity; red=dissimilarity; yellow=cysteine; yellow spacefill=conserved cysteine; grey=query gap/template insert; >nAA< = query insert/template gap

Threading Results for Fragment 8602
● looks more C1q-like than TNF-like, but close
● c.f. 0.71 Å RMSD / 65 AA; 0.78 Å RMSD / 48 AA 1aly-1c28a
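The alignment statistics quoted on this poster (for example 2.2 Å backbone RMSD over 141 aligned residues, and 26% identity) can be illustrated with the two small functions below. This is a sketch under assumptions: the RMSD function expects paired coordinates that have already been superimposed and does not reproduce the superposition that CE performs, and percent identity is computed over alignment columns where neither sequence has a gap, which may differ from the definition used for the poster's figures. Names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between paired, pre-superimposed coordinates."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(total / len(a))


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(p, q) for p, q in zip(aligned_a, aligned_b) if p != gap and q != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(p == q for p, q in pairs) / len(pairs)
```

Because RMSD depends on how many residue pairs are included, values are only comparable alongside the aligned length, which is why the poster quotes them together (e.g. "0.71 Å RMSD / 65 AA").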
About Unison

Unison is a database of non-redundant protein sequences, diverse computational predictions based on these sequences, and extensive auxiliary data which facilitate interpretations of the predictions. The intent is to provide an integrated resource for complex feature-based mining for target discovery and target elimination. Unison includes command line tools and a web interface. The schema, tools, web interface, and dumps of non-proprietary data have recently been released under the Academic Free License and are available at http://unison-db.sourceforge.net/ .

Unison Contents

● >5M distinct sequences from >40 reliable and speculative sources covering >9900 species
● features and alignments from BLAST, PSI-BLAST, HMMER, Prospect threading, GPI anchoring, TM detection, signal prediction, cellular localization, genomic localization, regular expressions, CE alignments, and secondary structure prediction
● external databases: NCBI taxonomy, HomoloGene, GO, PDB (w/enumerated seqres-resid mapping), SCOP, MINT, Derwent Patent Database

Conclusions and Directions

● We have identified several candidate TNF ligands among curated and speculative human sequence databases, six frame translations of the R34 release of the human genome, and pathogenic sequence, but none appear to bind the orphan TNF receptors.
● A large number of C1q-like sequences exist in the human genome.
● Unison has facilitated the management, update, and analysis of an enormous amount of diverse precomputed data.

Acknowledgments

Kiran Mukhyala and David Cavanaugh have contributed immensely to Unison.

The TNF mining effort was a multi-year collaboration within Genentech and included: Vishva Dixit, Wayne Fairbrother, Sarah Hymowitz, Nobuhiko Kayagaki, Nick Skelton, Minhong Yan, and Zemin Zhang.

Thanks to Genentech and William Wood for providing a great place to work.