Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family
 

Presented at the University of Nottingham, 2005

  • AA_DERWENT 1,226,302 sequences;


  • Discriminating Facts from Artefacts in the Secreted Ly-6 Protein Family
    • Christopher Southan
    • Department of Molecular Pharmacology
    • AstraZeneca R&D, Mölndal, Sweden
  • Outline
    • Introduction
    • Proteomic identification of novel secreted rat Ly6 proteins in EST data
    • Discovery of unknown homologues
    • Bioinformatic analysis of chimeric mRNAs
    • Database errors propagated by the chimeras
    • Delineating a large secreted Ly6 family on the rat genome
    • Discovery of mouse homologues but no clear orthologues
    • Equivocal biochemical results for homologues
    • Summary of bioinformatic pitfalls
  • Introduction: Quirks that Lurk in Databases
    • The sequence deluge into the primary databases necessitates automated pipelines to produce 'value added' secondary databases
    • But, however sophisticated the data parsing or curation, anomalies will get through
    • Most things that could have gone wrong, have
    • Although the overall quirk frequency is low, they present pitfalls for the unwary
    • Responsibility for primary annotation and sequence quality lies solely with submitting authors
    • Few originating authors correct, update or withdraw their primary sequence entries
    • It is difficult to discriminate between in vitro artifacts and rare in vivo events
  • Rat Urine → HPLC (high-speed microbore column) → Intact MALDI → N-Terminal Sequence
  • Rat Urine → 2D-Gel → Trypsin → MS/MS → PepSea Search → EST hits
    • Spot 1 gave two different peptide matches
    • CTSFDSTGFCHVGR contained within rat EST AA893514
    • CESLDSTGLCR contained within rat EST AA800439
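A peptide-to-EST match of this kind can be sketched as a six-frame translation search. This is a minimal illustration, not the PepSea algorithm: the helper names are hypothetical, and the toy EST is back-translated from the peptide purely so the example has something to find (the real EST sequences are not reproduced here).

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One arbitrary codon per residue, used only to build the toy EST below.
BACK_TABLE = {}
for codon, aa in CODON_TABLE.items():
    BACK_TABLE.setdefault(aa, codon)


def reverse_complement(dna):
    """Reverse-complement an upper-case A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def find_peptide(dna, peptide):
    """Return (frame, residue_index) for each of the six frames with a match.

    Frames are +1..+3 on the given strand and -1..-3 on the reverse strand.
    """
    dna = dna.upper()
    hits = []
    for sign, seq in ((1, dna), (-1, reverse_complement(dna))):
        for offset in range(3):
            index = translate(seq[offset:]).find(peptide)
            if index != -1:
                hits.append((sign * (offset + 1), index))
    return hits


peptide = "CTSFDSTGFCHVGR"  # MS/MS peptide from the slide
toy_est = "GG" + "".join(BACK_TABLE[aa] for aa in peptide)  # hypothetical EST
print(find_peptide(toy_est, peptide))
print(find_peptide(reverse_complement(toy_est), peptide))
```

Searching both strands matters here because EST orientation is often unknown, and the chimera matches later in the talk turn up in both forward and reverse frames.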
  • EST AA893514 vs. dbEST: 30 Rat Hits at 95% to 100% Identity
  • Assembly of Rat Urinary Proteins 1 and 2
    • Nine EST sequences, the MS/MS sequences and the N-terminal Edman data were consistent with two paralogous proteins
    • 90% identical at the AA level and 96% identical at the DNA level
    • Highly represented in rat liver ESTs
    • One N-glycosylation site with 1.6 to 2.0 kDa glycan
    • Secreted forms abundant in male rat urine by HPLC
    • RUP1 independently verified as liver regeneration-related protein by full mRNA
    • Alignment of the two paralogues (slide label: verified signal peptide):
      RUP1   1 MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD  60
               ||| ||||||||||||||||||||||| |:||||:|:|||: ||||||||||||||||||
      RUP2   1 MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRD  60
      RUP1  61 GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 101
               ||||||||||||| :|||||||||:|||||||||||||||
      RUP2  61 GKFVYGNQSCAECIGTTVEHGSLIISTNCCSATPFCNMVHP 101
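The "90% identical at the AA level" figure can be checked directly from the two precursor sequences shown on this slide. The sketch below is a plain position-by-position comparison (the sequences are the same length, so no gapped alignment is needed); the function names are illustrative only.

```python
# Precursor sequences exactly as labelled on this slide.
RUP1 = ("MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD"
        "GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR")
RUP2 = ("MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRD"
        "GKFVYGNQSCAECIGTTVEHGSLIISTNCCSATPFCNMVHP")


def identity(a, b):
    """Return (identical_positions, percent_identity) for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    same = sum(x == y for x, y in zip(a, b))
    return same, 100.0 * same / len(a)


def mismatches(a, b):
    """Return 1-based positions where the two sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


same, percent = identity(RUP1, RUP2)
print(f"{same}/{len(RUP1)} identical ({percent:.1f}%)")
print("differences at:", mismatches(RUP1, RUP2))
```

On these sequences the count is 90 identical residues out of 101, about 89%, which agrees with the rounded figure on the slide.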
  • RUP3: Independent MS-based Identification
    • Wait et al. “Proteins of rat serum, urine and CSF:VI” Electrophoresis 22, 3043-3052 (2001)
    • Alignment (the peptides set off on the slide span residues 27-40):
      RUP1 MGKPILLLPLGLSLLMSSLLALQCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRD
      RUP2 MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD
      RUP3 MGKHILLLPLGLSLLMSSLLALQCFRCISFDSTGFCYVGRHICQTYPDEICAWVVVTTRD
           *** *********************** * **** * ***. ******************
      RUP1 GKFVYGNQSCAECIGTTVEHGSLIISTNCCSATPFCNMVHP EST AA800439
      RUP2 GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR EST AA893514
      RUP3 GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR EST AA893518
           *************  *********.***************
  • RUP Paralogues Define a New Family of Secreted Ly-6 Proteins
  • A Quirky Result: Solid Matches Between RUP2 and Four Unrelated mRNAs
    • Rat mitochondrial IF1 protein mRNA, L07806, 883 bp
    • Rat casein kinase II alpha subunit (CK2), L15618, 2180 bp
    • Rat mitochondrial succinyl-CoA synthetase alpha subunit, J03621, 1684 bp
    • Rat 3' non-translated beta-F1-ATPase mRNA-binding protein mRNA, AF368860, 1197 bp
    • Matches of 92% to 100% identity over 300-500 bases
    • Two in reverse-frame, two in forward frame
  • Three RUP-like Chimeras and a Pre-mRNA
    • L07806 F1-ATPase inhibitor
    • AF368860 UTR F1-ATPase inhib
    • L15618 casein kinase II alpha
    • J03621 mito succinyl-CoA synthase alpha
  • Translation Matches for the Chimeras Reveal a Cryptic Protein
    • L07806 Rattus rattus mitochondrial IF1 protein mRNA
      RUP-2    28 TSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 101
                  TSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR
      L07806  417 TSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 196
    • L15618 Rat casein kinase II alpha subunit (CK2) mRNA
      RUP-2    59 RDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 101
                  RDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR
      L15618  708 RDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 580
    • J03621 Rat mitochondrial succinyl-CoA synthetase alpha subunit
      RUP-2    24 CFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRDGKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMV 99
                  CF C + +S G C+     C  +P E+CA  V+T +DGKFVYGNQSCAEC+  TVEHGSLIVSTNCCSAT FCN+V
      J03621   50 CFECGNLNSMGICNFRTAVCYAHPGEVCA-SVLTYKDGKFVYGNQSCAECSGRTVEHGSLIVSTNCCSATSFCNIV 274
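The coordinates in these translated matches can be sanity-checked with simple arithmetic: the nucleotide span on the mRNA should be three times the number of aligned residues, and a start coordinate larger than the end indicates a reverse-frame hit. The sketch below uses only the coordinates shown above; the function names are illustrative, and the single subject gap for J03621 is read off the alignment.

```python
def residues(q_start, q_end):
    """Number of protein residues covered by the query coordinates."""
    return q_end - q_start + 1


def codons(s_start, s_end):
    """Number of codons covered by the mRNA coordinates (either orientation)."""
    span = abs(s_end - s_start) + 1
    if span % 3:
        raise ValueError("nucleotide span is not a whole number of codons")
    return span // 3


def strand(s_start, s_end):
    """'-' when the mRNA coordinates run backwards, otherwise '+'."""
    return "-" if s_start > s_end else "+"


# (mRNA, RUP-2 start, RUP-2 end, mRNA start, mRNA end, gaps in the mRNA row)
MATCHES = [
    ("L07806", 28, 101, 417, 196, 0),
    ("L15618", 59, 101, 708, 580, 0),
    ("J03621", 24, 99, 50, 274, 1),
]

for name, qs, qe, ss, se, subject_gaps in MATCHES:
    consistent = residues(qs, qe) == codons(ss, se) + subject_gaps
    print(name, strand(ss, se), residues(qs, qe), codons(ss, se), consistent)
```

All three matches are internally consistent, with L07806 and L15618 on the reverse strand and J03621 on the forward strand.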
  • RUP1 Gene Structure
  • Matching the Chimeras Against the Rat Genome
    • Columns: SCORE, query START, query END, QSIZE, IDENTITY, CHRO, STRAND, genome START, genome END
    • L15618 Rat casein kinase II alpha subunit
    • 1451 709 2177 2180 99.9% 3 + 142470350 142514932
    • 799 1091 2161 2180 90.2% 10 - 39567792 39568918
    • 313 392 711 2180 99.1% 8 - 36902949 36905031
    • L07806 Rattus rattus mitochondrial IF1 protein
    • 405 420 826 833 100.0% 5 + 152628418 152632060
    • 398 8 415 833 99.1% 8 - 36902399 36905032
    • J03621 Rat mitochondrial succinyl-CoA synthetase subunit
    • 1203 472 1684 1684 100.0% 4 + 106816653 106845979
    • 469 1 472 1684 100.0% 8 - 36133698 36137263
    • AF368860 Rattus norvegicus 3' non-translated beta-F1-ATPase
    • 1118 1 1120 1120 100.0% 8 + 37247995 37251530
    • 1016 1 1120 1120 96.9% 8 + 36688890 36905034
    • 1006 1 1120 1120 95.6% 8 + 36901482 37055697
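One way to read the hit list above programmatically is to flag an mRNA as a candidate chimera when two high-identity hits cover essentially non-overlapping parts of the query but land on different chromosomes. The sketch below is a rough heuristic, not the method used in the talk: the 99% identity cut-off and the 10-base overlap tolerance are assumptions chosen for illustration, and the hit values are copied from the listing.

```python
from collections import defaultdict, namedtuple

Hit = namedtuple("Hit", "query q_start q_end identity chrom")

# Query coordinates, percent identity and chromosome from the listing above.
HITS = [
    Hit("L15618", 709, 2177, 99.9, "3"),
    Hit("L15618", 1091, 2161, 90.2, "10"),
    Hit("L15618", 392, 711, 99.1, "8"),
    Hit("L07806", 420, 826, 100.0, "5"),
    Hit("L07806", 8, 415, 99.1, "8"),
    Hit("J03621", 472, 1684, 100.0, "4"),
    Hit("J03621", 1, 472, 100.0, "8"),
    Hit("AF368860", 1, 1120, 100.0, "8"),
    Hit("AF368860", 1, 1120, 96.9, "8"),
    Hit("AF368860", 1, 1120, 95.6, "8"),
]


def query_overlap(a, b):
    """Number of query bases shared by two hits (0 if disjoint)."""
    return max(0, min(a.q_end, b.q_end) - max(a.q_start, b.q_start) + 1)


def looks_chimeric(hits, min_identity=99.0, max_overlap=10):
    """True if two strong hits split the query across different chromosomes."""
    strong = [h for h in hits if h.identity >= min_identity]
    for i, a in enumerate(strong):
        for b in strong[i + 1:]:
            if a.chrom != b.chrom and query_overlap(a, b) <= max_overlap:
                return True
    return False


by_query = defaultdict(list)
for hit in HITS:
    by_query[hit.query].append(hit)

for query, hits in by_query.items():
    print(query, "candidate chimera" if looks_chimeric(hits) else "single locus")
```

With these thresholds L15618, L07806 and J03621 are flagged, each with a segment on chromosome 8, while AF368860 maps to chromosome 8 over its full length and is not.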
  • Multiple Loci on Rat Chromosome 8: Erroneous Mapping of the Chimeras
    • L15618 casein kinase II alpha
    • L07806 F1-ATPase inhibitor
    • AF198441 Rat RUP2
    • AF198442 Rat spleen protein 1
  • What Caused the Chimeras?
    • Each of the chimeric cDNAs was submitted by a different research group between 1988 and 1993
    • All were prepared from rat cDNA libraries
    • Two of these genes are nuclear-encoded mitochondrial proteins
    • L07806-IF1 has 2 non-chimeric counterparts
    • Hits to rat genome data confirm the three 'host' transcripts are on different loci
    • The 5' insertions are different sequences, lengths and orientations
    • L15618 is a single-exon insert and maps to an unexpressed locus
    • Are these insertions of RUP2-like genes in vitro artefacts or rare translocation events in vivo?
  • Protein Database Entries from the Chimera and Pre-mRNA
    • The L07806-derived chimeric protein was chosen as the reference sequence by NCBI
    • NP_037047 ATPase inhibitor, mitochondrial precursor length = 107:
    • NP_037047 MTKSCRIEASTLGVWGMRVLQTRGFGSDS
    •           M  S     + LGVWGMRVLQTRGFGSDS
    • Q03344    MAGSALAVRARLGVWGMRVLQTRGFGSDS
    • but Swiss-Prot Q03344 highlights the discrepancy and correctly chooses “normal” rather than the chimera
    • CONFLICT MAGSALAVRAR -> MTKSCRIEAST (IN REF. 1).
    • The L07806-derived chimeric protein, without the targeting sequence, was expressed as a maltose binding protein fusion in E. coli and was fully active!
    • tr Q91XP0 3' non-translated beta-F1-ATPase mRNA-binding protein: Length = 28
    • The artefactual sequence includes an exon
    • Q91XP0 and AAK61874 MGKHILLLPLVLSLLMSSL QDSCGHEPS
    • RUP1 MGKHILLLPLGLSLLMSSLLLALQCFRCTSFDSTGFCHVGRQK...
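The Swiss-Prot CONFLICT line for Q03344 can be reproduced by locating where the two N-termini stop agreeing. A minimal sketch, using only the 29-residue N-terminal segments shown on this slide; the function name is illustrative.

```python
NP_037047 = "MTKSCRIEASTLGVWGMRVLQTRGFGSDS"  # RefSeq, derived from the chimera
Q03344 = "MAGSALAVRARLGVWGMRVLQTRGFGSDS"     # Swiss-Prot "normal" sequence


def split_at_divergence(a, b):
    """Return (unique_start_of_a, unique_start_of_b, shared_end).

    Walks backwards from the C-terminal end of both strings and stops at
    the first position where they differ.
    """
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[:len(a) - n], b[:len(b) - n], a[len(a) - n:]


chimeric_start, normal_start, shared = split_at_divergence(NP_037047, Q03344)
print(f"CONFLICT {normal_start} -> {chimeric_start}")
print("shared:", shared)
```

The eleven residues that differ are exactly the pair reported in the CONFLICT line, and everything downstream is identical, which is the pattern expected when a foreign 5' segment has been joined to an otherwise normal transcript.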
  • The L07806 Chimera Caused Errors in UniGene
  • RUP Gene Family on Rat 8q21
  • Rat and Mouse RUP Homologues are Highly Diverged
  • Sequences Conserved in Rat but Divergent in Mouse
  • Homologues in Five Mammals but True Orthology Unclear
  • Remote Human Homologues but no Strict Orthologues
    • >tr|AF462605|Q8WXA2|9AD752F00D901FFE PATE.[Homo sapiens] (expressed in prostate and testis) Length = 126
    • Score = 31.2 bits (69), Expect = 3.3
    • Identities = 21/79 (26%), Positives = 32/79 (39%), Gaps = 6/79 (7%)
    • Alignment (RUP1 as query, PATE as subject):
      RUP1:  23 QCFRCESLDSTGLCRVGRRICQTYPDEICAWVVVTTRDGK----FVYGNQSCAECIGTTV
                QC  C        C  GR IC    +E C    +  RDG     F+   ++CA+  G  +
      PATE:  47 QCRMCHLQFPGEKCSRGRGICTATTEEACMVGRMFKRDGNPWLTFMGCLKNCADVKG--I
      RUP1:  79 EHGSLIISTNCCSATPFCN 97
                     +++  CC +   CN
      PATE: 105 RWSVYLVNFRCCRSHDLCN 123
  • Threading Reveals Homology between RUP1, Lynx1 and Snake Toxin Structures
    • Lynx1, an endogenous toxin-like modulator of AChRs in the CNS
  • Why so Few Apparent Orthologues?
  • P55000: Antineoplastic Urinary Protein/Secreted Mammalian Ly-6/uPAR Related Protein – Equivocal Annotation
  • Linking Sequence to Function: the Lost Keyword Problem (PubMed Queries in red)
    • Adermann et al. "Structural and phylogenetic characterisation of human SLURP-1, the first secreted mammalian member of the Ly-6/uPAR protein superfamily" Protein Sci. 1999 … from blood and urine peptide libraries. SLURP-1 is encoded by the ARS (component B)-81/s locus, and appears to be the first mammalian member of the Ly-6/uPAR family lacking a GPI-anchoring signal sequence ... SLURP-1 (+) Ly-6 (+) ANUP (-)
    • Katz et al. "A partial catalogue of proteins secreted by epidermal keratinocytes in culture." J Invest Dermatol. 1999 … proteins secreted by adult human epidermal keratinocytes included anti-neoplastic urinary protein (+) ANUP (-) SLURP-1 (-) Ly-6 (-)
    • Fischer et al. "Mutations in the gene encoding SLURP-1 in Mal de Meleda". Hum Mol Genet. 2001 … Three different homozygous mutations (a deletion, a nonsense and a splice site mutation) were detected in 19 families of Algerian and Croatian origin … first instance of a secreted protein being involved in a palmoplantar keratoderma ... SLURP-1 (+) Ly-6 (+) ANUP (-)
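The lost-keyword problem above can be made concrete with a small lookup: each abstract matches only some of the names for the same protein, so any single-term query misses at least one paper, while an OR query over all synonyms retrieves them all. The (+)/(-) marks are taken from the slide; the dictionary layout and function names are illustrative, and the generated query string is just a quoted OR list, not a tested PubMed search.

```python
# Terms marked (+) for each abstract on the slide.
PAPERS = {
    "Adermann 1999": {"SLURP-1", "Ly-6"},
    "Katz 1999": {"anti-neoplastic urinary protein"},
    "Fischer 2001": {"SLURP-1", "Ly-6"},
}
TERMS = ["SLURP-1", "Ly-6", "ANUP", "anti-neoplastic urinary protein"]


def missed_by(term):
    """Papers whose abstract does not contain the given term."""
    return sorted(p for p, found in PAPERS.items() if term not in found)


def retrieved_by_any(terms):
    """Papers retrieved when the terms are combined with OR."""
    wanted = set(terms)
    return sorted(p for p, found in PAPERS.items() if found & wanted)


def or_query(terms):
    """Quoted OR list suitable for pasting into a literature search box."""
    return " OR ".join(f'"{t}"' for t in terms)


for term in TERMS:
    print(f"{term!r} misses: {missed_by(term)}")
print(or_query(TERMS))
print("OR query retrieves:", retrieved_by_any(TERMS))
```

No single term finds all three papers, and the abbreviation ANUP finds none of them, which is why broad synonym searching is needed to reconnect the sequence with its literature.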
  • Mouse Ly-6-like Caltrin: Sequence Errors, Unverified Reported Function, New Name and New Function?
  • Confusion Over Caltrin: 5 Different Sequences in SwissProt; 22 PubMed Citations
    • Caltrin = inhibition of Ca2+ uptake into spermatozoa
    • CALTRIN PRECURSOR (CALCIUM TRANSPORT INHIBITOR). - Mus musculus (a Ly-6 protein)
    • CALTRIN PRECURSOR (CALCIUM TRANSPORT INHIBITOR) (SEMINALPLASMIN) (SPLN). - Bos taurus (PYY-like)
    • CALTRIN-LIKE PROTEIN I. - Cavia porcellus (weak protease inhibitor match)
    • CALTRIN-LIKE PROTEIN II. - Cavia porcellus (elastase inhibitor like)
    • PANCREATIC SECRETORY TRYPSIN INHIBITOR II PRECURSOR (PSTI-II) (CALTRIN) (CALCIUM TRANSPORT INHIBITOR). - Rattus norvegicus (trypsin inhibitor identity)
  • Limited Knowledge for the Short Ly-6 Proteins
    • Single domain proteins ~85-100 residues mostly with signal peptide
    • Probable ligands by inference from toxin structures?
    • Recently duplicated rodent paralogous family of 6-10 gene loci, but with very different evolutionary trajectories in mouse and rat
    • Liver and spleen expression in rat
    • Significant amounts of multiple gene products, probably glycosylated, secreted in male rat urine
    • Foetal expression for pig, bovine and horse orthologues
    • Rapid evolution in mammals
    • Mix of secreted and GPI anchored homologues in human
    • Human Lynx-1 modulating AChRs
    • SLURP linked to skin physiology
    • Caltrin/SVS VII Phospholipid binding
    • Homologues involved in myelopoiesis in Xenopus and liver acute phase in rainbow trout
  • Summary of the Bioinformatic Pitfalls
    • The chimeric and pre-mRNAs lead to:
      • Artifactual clustering of ESTs and non-homologous gene products in UniGene
      • Protein database conflicts and artifacts
      • Propagation of errors in RefSeq and the rat genome
    • Loose ends and sequence errors in old data
    • Equivocal functional annotation transitively perpetuated
    • Sequence-literature links broken by gene name ambiguities
    • Incorrect signal peptide annotation
    • Similarity scores for Ly-6 homologues fall below those in domain databases
    • Rapid evolution made orthologue assignment difficult
  • Conclusions
    • Bioinformatics can help a little bit of proteomics data go a long way
    • Finding quirks in database entries is definitely part of the fun but …
    • Sequence anomalies can seriously confound automated annotation
    • They can only be exposed or unravelled by
      • transitive and broad sequence/keyword searching
      • detailed examination of sequence and literature links
      • understanding database building procedures
      • chimeras can be recognised by EST and genome matches
    • Conflicting data links should ideally be resolved by new data, but may have to be resolved by judgment
    • Difficult to discriminate between in vitro artefacts and rare in vivo events
    • Inferring biological meaning from database searches requires an understanding of the experiments and the in-silico analyses
    • Value of Swiss-Prot is significantly enhanced by community annotation
  • Acknowledgments, Reference and Database Entries
    • Southan C, Cutler P, Birrell H, Connell J, Fantom KG, Sims M, Shaikh N, Schneider K. “The characterisation of novel secreted Ly-6 proteins from rat urine by the combined use of two-dimensional gel electrophoresis, microbore high performance liquid chromatography and expressed sequence tag data” Proteomics 2002 Feb;2(2):187-96.
    • AF198441 Rat RUP2 mRNA
    • UP1_RAT (P81827) Urinary protein 1 (RUP1)
    • UP2_RAT (P81828) Urinary protein 2 (RUP2)
    • UP3_RAT (P83125) Urinary protein 3 (RUP3)
    • RSP1_RAT (Q9QXN2) Spleen protein 1
    • AF198442 Rat spleen protein 1 precursor, mRNA, complete cds
    • P83106 PIP1 protein (PIP1) - Sus scrofa
    • P83107 BOP1 protein (BOP1) - Bos taurus
    • Q9BZG9 Ly-6 neurotoxin-like protein Lynx1 - Homo sapiens
    • AF321824 Human Ly-6 neurotoxin-like protein Lynx1 mRNA, partial cds
  • Human Short Ly6 Proteins
    Name    Size  Chrom    Ens  ESTs  Sigpep  GPI  InterPro  Patents
    LYNX1   115   8q24.3   +    +     19      91   Ly6       Curagen, Hyseq, HGS (sec), Incyte (sec) Genset (partial)
    SLURP2  97    8q24.3   +    +     22      -    Ly6       Genentech (sec/tm) ZymoGenetics
    RGTR43  125   8q24.3   -    +     22      103  Ly6       Genentech, HGS, Incyte
    SLURP1  103   8q24.3   +    +     22      -    Ly6       HGS, ARS, Biovision (partial)
    PATE    126   11q24.2  +    +     21      -    CyC PA2   Genset (sec), USDOH
    LVLF31  113   11q24.2  -    +     18      -    -         None
  • Vertebrate Short Ly6 Proteins
  • Searches Against Rat ESTs Confirmed the Three mRNAs as Chimeras: J03621, L07806, L15618
  • mRNA Anomaly No. 4: Unspliced?
    • LOCUS AF368860 1197 bp mRNA 13-JUN-2001
    • (CDS 10..96 "MGKHILLLPLVLSLLMSSLQ DSCGHEPS ")
    • Rattus norvegicus 3' non-translated beta-F1-ATPase mRNA-binding protein mRNA, complete cds. "Identification of a liver specific cDNA clone chaperoning the differential assembly of ribonucleoprotein complexes at the 3' UTR of the mRNAs of oxidative phosphorylation"
    • BLAST vs Rat ESTs (RUP-4? from ESTs AA945232, AA945121):
      RUP-4?   1 MGKHILLLPLVLSLLMSSLLALQCIQCARIDSRGICRHDIYICHADSDEVCSWVVATTRD  60
                 MGKHILLLPL LSLLMSSLLALQC +C   DS G C      C    DE+C+WVV TTRD
      RUP-2    1 MGKHILLLPLGLSLLMSSLLALQCFRCTSFDSTGFCHVGRQKCQTYPDEICAWVVVTTRD  60
      RUP-4?  61 GKFVYGNQSCAECNATTVEQGSLIVSTNCCSASHFCNMVYR 101
                 GKFVYGNQSCAECNATTVE GSLIVSTNCCSA+ FCNMV+R
      RUP-2   61 GKFVYGNQSCAECNATTVEHGSLIVSTNCCSATPFCNMVHR 101
  • RUP Homologues Expand a New Sub-family of Secreted Ly-6 Proteins
  • 3D PSSM Fold Recognition Server