Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family

Presented at the University of Nottingham, 2005

  • AA_DERWENT 1,226,302 sequences;

    1. 1. Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family <ul><li>Christopher Southan </li></ul><ul><li>Department of Molecular Pharmacology </li></ul><ul><li>AstraZeneca R&D, Mölndal, Sweden </li></ul>
    2. 2. Outline <ul><li>Introduction </li></ul><ul><li>Proteomic identification of novel secreted rat Ly6 proteins in EST data </li></ul><ul><li>Discovery of unknown homologues </li></ul><ul><li>Bioinformatic analysis of chimeric mRNAs </li></ul><ul><li>Database errors propagated by the chimeras </li></ul><ul><li>Delineating a large secreted Ly6 family on the rat genome </li></ul><ul><li>Discovery of mouse homologues but no clear orthologues </li></ul><ul><li>Equivocal biochemical results for homologues </li></ul><ul><li>Summary of bioinformatic pitfalls </li></ul>
    3. 3. Introduction: Quirks that Lurk in Databases <ul><li>The sequence deluge into the primary databases necessitates automated pipelines to produce 'value added' secondary databases </li></ul><ul><li>But, however sophisticated the data parsing or curation, anomalies will get through </li></ul><ul><li>Most things that could have gone wrong, have </li></ul><ul><li>Although the overall quirk frequency is low, they present pitfalls for the unwary </li></ul><ul><li>Responsibility for primary annotation and sequence quality lies solely with submitting authors </li></ul><ul><li>Few originating authors correct, update or withdraw their primary sequence entries </li></ul><ul><li>It is difficult to discriminate between in vitro artefacts and rare in vivo events </li></ul>
    4. 4. Rat Urine → HPLC → Intact MALDI → N-Terminal Sequence (high-speed microbore column)
    5. 5. Rat Urine → 2D-Gel → Trypsin → MS/MS → PepSea Search → EST hits <ul><li>Spot 1 gave two different peptide matches </li></ul><ul><li>CTSFDSTGFCHVGR contained within rat EST AA893514 </li></ul><ul><li>CESLDSTGLCR contained within rat EST AA800439 </li></ul>
    6. 6. EST AA893514 vs. dbEST: 30 Rat Hits at 95% to 100% Identity
    7. 7. Assembly of Rat Urinary Proteins 1 and 2 <ul><li>9 EST sequences, the MS/MS sequences, and the N-terminal Edman data, were consistent with two paralogous proteins </li></ul><ul><li>90% identical at the AA level and 96% identical at the DNA level </li></ul><ul><li>Highly represented in rat liver ESTs </li></ul><ul><li>One N-glycosylation site with 1.6 to 2.0 kDa glycan </li></ul><ul><li>Secreted forms abundant in male rat urine by HPLC </li></ul><ul><li>RUP1 independently verified as liver regeneration-related protein by full mRNA </li></ul>verified signal peptide  RUP1 MGKHILLLPLGLSLLMSSLLA LQ C FRCTSFDSTGFCHVGRQK C QTYP DEICAWVVVTTRD ||| ||||||||||||||||||||||| |:||||:|:|||: |||||||||||||||||| RUP2 MGKPILLLPLGLSLLMSSLLA LQ C FRCESLDSTGLCRVGRRI C QTYP DEICAWVVVTTRD RUP1 GKFVYG NQS CAECNATTVEHGSLIVSTNCCSATPFCNMVHR 101 ||||||||||||| :|||||||||:||||||||||||||| RUP2 GKFVYG NQS CAECIGTTVEHGSLIISTNCCSATPFCNMVHP 101
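
The amino-acid identity quoted on slide 7 can be re-derived from the alignment shown there. The following is a minimal sketch, assuming the two sequences are exactly as transcribed from the slide (spaces removed) and that no gaps are needed; the function name `percent_identity` is illustrative and not from the original talk.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions in two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (gaps are not handled)")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Sequences as labelled on slide 7, with the alignment spacing removed.
RUP1 = ("MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD"
        "GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR")
RUP2 = ("MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRD"
        "GKFVYGNQSCAECIGTTVEHGSLIISTNCCSATPFCNMVHP")

print(f"{percent_identity(RUP1, RUP2):.1f}% identical over {len(RUP1)} residues")
```

A position-by-position count like this only works because the two paralogues are the same length; anything with insertions or deletions would need a real alignment first.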
    8. 8. RUP3: Independent MS-based Identification by Wait et al. “Proteins of rat serum, urine and CSF:VI” Electrophoresis 22, 3043-3052 (2001) RUP1 MGKPILLLPLGLSLLMSSLLALQCFR CESLDSTGLCRVGR RICQTYPDEICAWVVVTTRD RUP2 MGKHILLLPLGLSLLMSSLLALQCFR CTSFDSTGFCHVGR QKCQTYPDEICAWVVVTTRD RUP3 MGKHILLLPLGLSLLMSSLLALQCFR CISFDSTGFCYVGR HICQTYPDEICAWVVVTTRD *** *********************** * **** * ***. ****************** RUP1 GKFVYGNQSCAECIGTTVEHGSLIISTNCCSATPFCNMVHP EST AA800439 RUP2 GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR EST AA893514 RUP3 GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR EST AA893518 ************* *********.***************
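
The assignment of the two MS/MS peptides from spot 1 to individual paralogues can be illustrated with a plain substring search. This is a sketch only: it uses the N-terminal halves of the three sequences as labelled on slide 8, and the helper `assign_peptides` is a hypothetical name, not the PepSea search used in the original work.

```python
# MS/MS peptides reported for 2D-gel spot 1.
PEPTIDES = ["CTSFDSTGFCHVGR", "CESLDSTGLCR"]

# First 60 residues of each protein, as labelled on slide 8.
PROTEINS = {
    "RUP1": "MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRD",
    "RUP2": "MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD",
    "RUP3": "MGKHILLLPLGLSLLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEICAWVVVTTRD",
}


def assign_peptides(peptides, proteins):
    """Map each peptide to {protein_name: 1-based start} for exact matches."""
    return {
        pep: {name: seq.find(pep) + 1 for name, seq in proteins.items() if pep in seq}
        for pep in peptides
    }


for peptide, matches in assign_peptides(PEPTIDES, PROTEINS).items():
    print(peptide, matches)
```

Each peptide matches exactly one of the three sequences, which is why a single gel spot could point to two distinct paralogues.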
    9. 9. RUP Paralogues Define a New Family of Secreted Ly-6 Proteins
    10. 10. A Quirky Result: Solid Matches Between RUP2 and Four Unrelated mRNAs <ul><li>Rat mitochondrial IF1 protein mRNA, L07806, 883 bp </li></ul><ul><li>Rat casein kinase II alpha subunit (CK2), L15618, 2180 bp </li></ul><ul><li>Rat mitochondrial succinyl-CoA synthetase alpha subunit, J03621, 1684 bp </li></ul><ul><li>Rat 3' non-translated beta-F1-ATPase mRNA-binding protein mRNA, AF368860, 1197 bp </li></ul><ul><li>Matches of 92% to 100% identity over 300-500 bases </li></ul><ul><li>Two in reverse-frame, two in forward frame </li></ul>
    11. 11. Three RUP-like Chimeras and a Pre-mRNA L07806 F1-ATPase inhibitor AF368860 UTR F1-ATPase inhib L15618 casein kinase II alpha J03621 mito succinyl-CoA synthase alpha
    12. 12. Translation Matches for the Chimeras Reveal a Cryptic Protein RUP-2 28 TSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 101 TSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 417 TSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 196 L07806 Rattus rattus mitochondrial IF1 protein mRNA RUP-2: 59 RDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 101 RDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 708 RDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 580 L15618 Rat casein kinase II alpha subunit (CK2) mRNA RUP-2 24 CFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMV 99 CF C + +S G C+ C +P E+CA V+T +DGKFVYGNQSCAEC+ TVEHGSLIVSTNCCSAT FCN+V 50 CFECGNLNSMGICNFRTAVCYAHPGEVCA-SVLTYKDGKFVYGNQSCAECSGRTVEHGSLIVSTNCCSATSFCNIV 274 J03621 Rat mitochondrial succinyl-CoA synthetase alpha subunit
    13. 13. RUP1 Gene Structure
    14. 14. Matching the Chimeras Against the Rat Genome <ul><li>SCORE START END QSIZE IDENTITY CHRO STRAND START END </li></ul><ul><li>------------------------------------------------------------ </li></ul><ul><li>L15618 Rat casein kinase II alpha subunit </li></ul><ul><li>1451 709 2177 2180 99.9% 3 + 142470350 142514932 </li></ul><ul><li>799 1091 2161 2180 90.2% 10 - 39567792 39568918 </li></ul><ul><li>313 392 711 2180 99.1% 8 - 36902949 36905031 </li></ul><ul><li>L07806 Rattus rattus mitochondrial IF1 protein </li></ul><ul><li>405 420 826 833 100.0% 5 + 152628418 152632060 </li></ul><ul><li>398 8 415 833 99.1% 8 - 36902399 36905032 </li></ul><ul><li>J03621 Rat mitochondrial succinyl-CoA synthetase subunit </li></ul><ul><li>1203 472 1684 1684 100.0% 4 + 106816653 106845979 </li></ul><ul><li>469 1 472 1684 100.0% 8 - 36133698 36137263 </li></ul><ul><li>AF368860 Rattus norvegicus 3' non-translated beta-F1-ATPase </li></ul><ul><li>1118 1 1120 1120 100.0% 8 + 37247995 37251530 </li></ul><ul><li>1016 1 1120 1120 96.9% 8 + 36688890 36905034 </li></ul><ul><li>1006 1 1120 1120 95.6% 8 + 36901482 37055697 </li></ul>
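
The genome hits on slide 14 suggest a simple rule of thumb: a cDNA is a chimera candidate when essentially non-overlapping parts of the query map to different chromosomes. The sketch below encodes that idea under stated assumptions. The `Hit` type, the `looks_chimeric` function and the 5-base overlap tolerance are all illustrative choices, not the procedure used in the talk; the query coordinates and chromosomes are copied from the slide.

```python
from itertools import combinations
from typing import NamedTuple


class Hit(NamedTuple):
    q_start: int  # start of the match on the query cDNA
    q_end: int    # end of the match on the query cDNA
    chrom: str    # rat chromosome carrying the match


def looks_chimeric(hits, max_overlap=5):
    """True if two hits on different chromosomes cover (almost) disjoint query segments."""
    for a, b in combinations(hits, 2):
        if a.chrom == b.chrom:
            continue
        overlap = min(a.q_end, b.q_end) - max(a.q_start, b.q_start) + 1
        if overlap <= max_overlap:  # assumed tolerance for ragged junctions
            return True
    return False


# Query coordinates and chromosomes from the slide 14 table.
HITS = {
    "L15618": [Hit(709, 2177, "3"), Hit(1091, 2161, "10"), Hit(392, 711, "8")],
    "L07806": [Hit(420, 826, "5"), Hit(8, 415, "8")],
    "J03621": [Hit(472, 1684, "4"), Hit(1, 472, "8")],
    "AF368860": [Hit(1, 1120, "8"), Hit(1, 1120, "8"), Hit(1, 1120, "8")],
}

for accession, hits in HITS.items():
    verdict = "chimera candidate" if looks_chimeric(hits) else "single chromosome"
    print(accession, verdict)
```

A rule this crude will also flag queries with a short repeat or pseudogene match elsewhere in the genome, so a positive result is a prompt for manual inspection rather than proof of a chimera.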
    15. 15. Multiple Loci on Rat Chromosome 8: Erroneous Mapping of the Chimeras L15618 casein kinase II alpha L07806 F1-ATPase inhibitor AF198441 Rat RUP2 AF198442 Rat spleen protein 1
    16. 16. What Caused the Chimeras? <ul><li>Each chimeric cDNA was submitted by a different research group between 1988 and 1993 </li></ul><ul><li>All were prepared from rat cDNA libraries </li></ul><ul><li>Two of these are nuclear genes encoding mitochondrial proteins </li></ul><ul><li>L07806-IF1 has 2 non-chimeric counterparts </li></ul><ul><li>Hits to rat genome data confirm the three 'host' transcripts are at different loci </li></ul><ul><li>The 5' insertions are different sequences, lengths and orientations </li></ul><ul><li>L15618 is a single-exon insert and maps to an unexpressed locus </li></ul><ul><li>Are these insertions of RUP2-like genes in vitro artefacts or rare translocation events in vivo ? </li></ul>
    17. 17. Protein Database Entries from the Chimera and Pre-mRNA <ul><li>The L07806-derived chimeric protein was chosen as the reference sequence by NCBI </li></ul><ul><li>NP_037047 ATPase inhibitor, mitochondrial precursor length = 107: </li></ul><ul><li>NP_037047 MTKSCRIEAST LGVWGMRVLQTRGFGSDS </li></ul><ul><li>M S + LGVWGMRVLQTRGFGSDS </li></ul><ul><li>Q03344 MAGSALAVRARLGVWGMRVLQTRGFGSDS </li></ul><ul><li>but Swiss-Prot Q03344 highlights the discrepancy and correctly chooses the “normal” sequence rather than the chimera </li></ul><ul><li>CONFLICT MAGSALAVRAR -> MTKSCRIEAST (IN REF. 1). </li></ul><ul><li>The L07806-derived chimeric protein, without the targeting sequence, was expressed as a maltose binding protein fusion in E. coli and was fully active! </li></ul><ul><li>tr Q91XP0 3' non-translated beta-F1-ATPase mRNA-binding protein: Length = 28 </li></ul><ul><li>The artefactual sequence includes an exon </li></ul><ul><li>Q91XP0 and AAK61874 MGKHILLLPLVLSLLMSSL QDSCGHEPS </li></ul><ul><li>RUP1 MGKHILLLPLGLSLLMSSLLLALQCFRCTSFDSTGFCHVGRQK... </li></ul>
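
The N-terminal discrepancy between NP_037047 and Q03344 on slide 17 can be located mechanically by trimming the shared C-terminal run from the two fragments. The following is a small sketch, assuming only the N-terminal fragments shown on the slide; `split_at_common_suffix` is a hypothetical helper name.

```python
def split_at_common_suffix(seq_a: str, seq_b: str):
    """Return (prefix_a, prefix_b, shared_suffix) for two sequences."""
    n = 0
    # Walk backwards from the C-terminal end while residues agree.
    while n < min(len(seq_a), len(seq_b)) and seq_a[-1 - n] == seq_b[-1 - n]:
        n += 1
    cut_a, cut_b = len(seq_a) - n, len(seq_b) - n
    return seq_a[:cut_a], seq_b[:cut_b], seq_a[cut_a:]


# N-terminal fragments as shown on slide 17.
NP_037047 = "MTKSCRIEASTLGVWGMRVLQTRGFGSDS"  # RefSeq entry derived from L07806
Q03344 = "MAGSALAVRARLGVWGMRVLQTRGFGSDS"     # Swiss-Prot entry

chimeric_nterm, normal_nterm, shared = split_at_common_suffix(NP_037047, Q03344)
print(f"CONFLICT {normal_nterm} -> {chimeric_nterm}; shared from {shared[:6]}...")
```

The two divergent prefixes recovered this way are the same pair recorded in the Swiss-Prot CONFLICT line quoted on the slide.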
    18. 18. The L07806 Chimera Caused Errors in UniGene
    19. 19. RUP Gene Family on Rat 8q21
    20. 20. Rat and Mouse RUP Homologues are Highly Diverged
    21. 21. Sequences Conserved in Rat but Divergent in Mouse
    22. 22. Homologues in Five Mammals but True Orthology Unclear
    23. 23. Remote Human Homologues but no Strict Orthologues <ul><li>>tr|AF462605|Q8WXA2|9AD752F00D901FFE PATE.[Homo sapiens] (expressed in prostate and testis) Length = 126 </li></ul><ul><li>Score = 31.2 bits (69), Expect = 3.3 </li></ul><ul><li>Identities = 21/79 (26%), Positives = 32/79 (39%), Gaps = 6/79 (7%) </li></ul><ul><li>RUP1 : 23 QCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRDGK----FVYGNQSCAECIGTTV </li></ul><ul><li>QC C C GR IC +E C + RDG F+ ++CA+ G + </li></ul><ul><li>PATE : 47 QCRMCHLQFPGEKCSRGRGICTATTEEACMVGRMFKRDGNPWLTFMGCLKNCADVKG--I </li></ul><ul><li>Query: 79 EHGSLIISTNCCSATPFCN 97 </li></ul><ul><li>+++ CC + CN </li></ul><ul><li>Sbjct: 105 RWSVYLVNFRCCRSHDLCN 123 </li></ul>
    24. 24. Threading Reveals Homology between RUP1, Lynx1 and Snake Toxin Structures. Lynx1, an Endogenous Toxin-like Modulator of AChRs in the CNS
    25. 25. Why so Few Apparent Orthologues?
    26. 26. P55000 : Antineoplastic Urinary Protein/Secreted Mammalian Ly-6/uPAR Related Protein – Equivocal Annotation
    27. 27. Linking Sequence to Function: the Lost Keyword Problem (PubMed Queries in red) <ul><li>Adermann et al. "Structural and phylogenetic characterisation of human SLURP-1, the first secreted mammalian member of the Ly-6 /uPAR protein superfamily" Protein Sci. 1999 … from blood and urine peptide libraries. SLURP-1 is encoded by the ARS (component B)-81/s locus, and appears to be the first mammalian member of the Ly-6/uPAR family lacking a GPI-anchoring signal sequence ... SLURP-1 (+) Ly-6 (+) ANUP (-) </li></ul><ul><li>Katz et al. "A partial catalogue of proteins secreted by epidermal keratinocytes in culture." J Invest Dermatol. 1999 … proteins secreted by adult human epidermal keratinocytes included anti-neoplastic urinary protein (+) ANUP (-) SLURP-1 (-) Ly-6 (-) </li></ul><ul><li>Fischer et al. "Mutations in the gene encoding SLURP-1 in Mal de Meleda". Hum Mol Genet. 2001 … Three different homozygous mutations (a deletion, a nonsense and a splice site mutation) were detected in 19 families of Algerian and Croatian origin … first instance of a secreted protein being involved in a palmoplantar keratoderma. SLURP-1 (+) Ly-6 (+) ANUP (-) </li></ul>
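
The lost keyword problem on slide 27 can be made concrete by treating the (+)/(-) marks as a small lookup table and asking which papers a given set of query terms would retrieve. This is an illustrative sketch only: the table holds just the marks shown on the slide, any term not marked for a paper is assumed to be a miss, and `papers_found` is a hypothetical helper rather than a real PubMed query.

```python
# (+)/(-) marks transcribed from slide 27; unlisted terms are assumed to miss.
KEYWORD_HITS = {
    "Adermann et al. 1999": {"SLURP-1": True, "Ly-6": True, "ANUP": False},
    "Katz et al. 1999": {
        "anti-neoplastic urinary protein": True,
        "ANUP": False,
        "SLURP-1": False,
        "Ly-6": False,
    },
    "Fischer et al. 2001": {"SLURP-1": True, "Ly-6": True, "ANUP": False},
}


def papers_found(terms):
    """Papers retrieved by at least one of the query terms."""
    return {
        paper
        for paper, marks in KEYWORD_HITS.items()
        if any(marks.get(term, False) for term in terms)
    }


print(sorted(papers_found(["SLURP-1"])))
print(sorted(papers_found(["SLURP-1", "anti-neoplastic urinary protein"])))
```

No single name on the slide reaches all three papers, so the query has to be expanded across synonyms before the sequence-to-literature links become complete.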
    28. 28. Mouse Ly-6-like Caltrin: Sequence Errors, Unverified Reported Function, New Name and New Function?
    29. 29. Confusion Over Caltrin: 5 Different Sequences in SwissProt; 22 PubMed Citations <ul><li>Caltrin = inhibition of Ca2+ uptake into spermatozoa </li></ul><ul><li>CALTRIN PRECURSOR (CALCIUM TRANSPORT INHIBITOR). - Mus musculus (a Ly-6 protein) </li></ul><ul><li>CALTRIN PRECURSOR (CALCIUM TRANSPORT INHIBITOR) (SEMINALPLASMIN) (SPLN). - Bos taurus (PYY-like) </li></ul><ul><li>CALTRIN-LIKE PROTEIN I. - Cavia porcellus (weak protease inhibitor match) </li></ul><ul><li>CALTRIN-LIKE PROTEIN II. - Cavia porcellus (elastase inhibitor like) </li></ul><ul><li>PANCREATIC SECRETORY TRYPSIN INHIBITOR II PRECURSOR (PSTI-II) (CALTRIN) (CALCIUM TRANSPORT INHIBITOR). - Rattus norvegicus (trypsin inhibitor identity) </li></ul>
    30. 30. Limited Knowledge for the Short Ly-6 Proteins <ul><li>Single domain proteins ~85-100 residues mostly with signal peptide </li></ul><ul><li>Probable ligands by inference from toxin structures? </li></ul><ul><li>Recently duplicated rodent paralogous family of 6-10 gene loci but very different evolutionary trajectories between mouse and rat </li></ul><ul><li>Liver and spleen expression in rat </li></ul><ul><li>Significant amounts of multiple gene products, probably glycosylated, secreted in male rat urine </li></ul><ul><li>Foetal expression for pig, bovine and horse orthologues </li></ul><ul><li>Rapid evolution in mammals </li></ul><ul><li>Mix of secreted and GPI anchored homologues in human </li></ul><ul><li>Human Lynx-1 modulating AChRs </li></ul><ul><li>SLURP linked to skin physiology </li></ul><ul><li>Caltrin/SVS VII Phospholipid binding </li></ul><ul><li>Homologues involved in myelopoiesis in Xenopus and liver acute phase in rainbow trout </li></ul>
    31. 31. Summary of the Bioinformatic Pitfalls <ul><li>The chimeric and pre-mRNAs lead to: </li></ul><ul><ul><li>Artefactual clustering of ESTs and non-homologous gene products in UniGene </li></ul></ul><ul><ul><li>Protein database conflicts and artefacts </li></ul></ul><ul><ul><li>Propagation of errors in RefSeq and the rat genome </li></ul></ul><ul><li>Loose ends and sequence errors in old data </li></ul><ul><li>Equivocal functional annotation transitively perpetuated </li></ul><ul><li>Sequence-literature links broken by gene name ambiguities </li></ul><ul><li>Incorrect signal peptide annotation </li></ul><ul><li>Similarity scores for Ly-6 homologues fall below those in domain databases </li></ul><ul><li>Rapid evolution made orthologue assignment difficult </li></ul>
    32. 32. Conclusions <ul><li>Bioinformatics can help a little bit of proteomics data go a long way </li></ul><ul><li>Finding quirks in database entries is definitely part of the fun but … </li></ul><ul><li>Sequence anomalies can seriously confound automated annotation </li></ul><ul><li>They can only be exposed or unravelled by </li></ul><ul><ul><li>transitive and broad sequence/keyword searching </li></ul></ul><ul><ul><li>detailed examination of sequence and literature links </li></ul></ul><ul><ul><li>understanding database building procedures </li></ul></ul><ul><ul><li>chimeras can be recognised by EST and genome matches </li></ul></ul><ul><li>Conflicting data links should ideally be resolved by new data, but judgment may have to be used </li></ul><ul><li>Difficult to discriminate between in vitro artefacts and rare in vivo events </li></ul><ul><li>Inferring biological meaning from database searches requires an understanding of the experiments and the in-silico analyses </li></ul><ul><li>Value of Swiss-Prot is significantly enhanced by community annotation </li></ul>
    33. 33. Acknowledgments, Reference and Database Entries <ul><li>Southan C, Cutler P, Birrell H, Connell J, Fantom KG, Sims M, Shaikh N, Schneider K. “The characterisation of novel secreted Ly-6 proteins from rat urine by the combined use of two-dimensional gel electrophoresis, microbore high performance liquid chromatography and expressed sequence tag data” Proteomics 2002 Feb;2(2):187-96. </li></ul><ul><li>AF198441 Rat RUP2 mRNA </li></ul><ul><li>UP1_RAT (P81827) Urinary protein 1 (RUP1) </li></ul><ul><li>UP2_RAT (P81828) Urinary protein 2 (RUP2) </li></ul><ul><li>UP3_RAT (P83125) Urinary protein 3 (RUP3) </li></ul><ul><li>RSP1_RAT (Q9QXN2) Spleen protein 1 </li></ul><ul><li>AF198442 Rat spleen protein 1 precursor, mRNA, complete cds </li></ul><ul><li>P83106 PIP1 protein (PIP1) - Sus scrofa </li></ul><ul><li>P83107 BOP1 protein (BOP1) - Bos taurus </li></ul><ul><li>Q9BZG9 Ly-6 neurotoxin-like protein Lynx1 - Homo sapiens </li></ul><ul><li>AF321824 Human Ly-6 neurotoxin-like protein Lynx1 mRNA, partial cds </li></ul>
    34. 34. Human Short Ly6 Proteins. Columns: Patents | InterPro | GPI | Sigpep | ESTs | Ens | Chrom | Size | Name. Rows: None - - 18 + - 11q24.2 113 LVLF31; Genset (sec), USDOH CyC PA2 - 21 + + 11q24.2 126 PATE; HGS, ARS, Biovision (partial) Ly6 - 22 + + 8q24.3 103 SLURP1; Genentech, HGS, Incyte Ly6 103 22 + - 8q24.3 125 RGTR43; Genentech (sec/tm) ZymoGenetics Ly6 - 22 + + 8q24.3 97 SLURP2; Curagen, Hyseq, HGS (sec), Incyte (sec) Genset (partial) Ly6 91 19 + + 8q24.3 115 LYNX1
    35. 35. Vertebrate Short Ly6 Proteins
    36. 36. Searches Against Rat ESTs Confirmed the Three mRNAs as Chimeras J03621 L07806 L15618
    37. 37. mRNA Anomaly No. 4: Unspliced? <ul><li>LOCUS AF368860 1197 bp mRNA 13-JUN-2001 </li></ul><ul><li>(CDS 10..96 "MGKHILLLPLVLSLLMSSLQ DSCGHEPS ") </li></ul><ul><li>Rattus norvegicus 3' non-translated beta-F1-ATPase mRNA-binding protein mRNA, complete cds. "Identification of a liver specific cDNA clone chaperoning the differential assembly of ribonucleoprotein complexes at the 3' UTR of the mRNAs of oxidative phosphorylation" </li></ul>BLAST vs Rat ESTs RUP-4? MGKHILLLPLVLSLLMSSLLALQCIQCARIDSRGICRHDIYICHADSDEVCSWVVATTRD MGKHILLLPL LSLLMSSLLALQC +C DS G C C DE+C+WVV TTRD RUP-2 MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD RUP-4? GKFVYGNQSCAECNATTVEQGSLIVSTNCCSASHFCNMVYR (ESTs AA945232,AA945121) GKFVYGNQSCAECNATTVE GSLIVSTNCCSA+ FCNMV+R RUP-2 GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 101
    38. 38. RUP Homologues Expand a New Sub-family of Secreted Ly-6 Proteins
    39. 39. 3D PSSM Fold Recognition Server
