Predikin and PredikinDB: tools to predict protein kinase peptide specificity

    Presentation Transcript

    1. Outline of talk
      • Introduction to protein kinases
      • Prediction of substrate specificity
      • Predikin and PredikinDB
      • Evaluation
      Neil Saunders, School of Molecular and Microbial Sciences, University of Queensland
    2. Introduction to protein kinases
      Biochemistry: protein-OH + ATP → protein-OPi + ADP (catalysed by a kinase)
      • Two major (eukaryotic) types: (1) Ser/Thr; (2) Tyr
      • ~2% of human genes encode a protein kinase
      • At least 30-50% of human proteins are phosphorylated
      • Regulate essentially every cellular process
    3. Complex signalling networks
      How do protein kinases find their targets?
    4. Kinase specificity – substrate recruitment
      • Substrate recruitment: any process that brings substrate to kinase
      • - docking
      • - binding to scaffolding protein(s)
      • - colocalisation
      • - coregulation
      Figure (docking interactions): Remenyi et al. (2006) Docking interactions in protein kinase and phosphatase networks. Curr Opin Struct Biol 16: 676-685
      Figure (colocalisation): LOCATE entry for calcium/calmodulin-dependent protein kinase IV
    5. Kinase specificity - peptide specificity
      Figure: amino acid frequency in substrate sequences at X{7}[ST]X{7} sites (panels: CK-2, PKA, MAPK)
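      The per-position frequencies plotted on this slide can be approximated with a short script. The following is a minimal Python sketch, not the code used for the talk: the function names `site_windows` and `position_frequencies` are hypothetical, and it assumes substrate sequences are plain one-letter strings. Sites closer than seven residues to either terminus are skipped because the X{7}[ST]X{7} pattern needs full flanks.

```python
import re
from collections import Counter

# Lookahead so that overlapping 15-residue windows are all reported.
WINDOW = re.compile(r"(?=([A-Z]{7}[ST][A-Z]{7}))")


def site_windows(sequence):
    """Return (1-based position of the central S/T, 15-mer window) pairs."""
    return [(m.start() + 8, m.group(1)) for m in WINDOW.finditer(sequence)]


def position_frequencies(windows):
    """Relative amino acid frequency at each of the 15 window positions."""
    columns = [Counter() for _ in range(15)]
    for window in windows:
        for i, aa in enumerate(window):
            columns[i][aa] += 1
    return [{aa: n / len(windows) for aa, n in col.items()} for col in columns]


# Toy sequence (not real data): one Ser and one Thr site with full flanks.
toy = "AAAAAAASAAAAAAATAAAAAAA"
sites = site_windows(toy)
freqs = position_frequencies([w for _, w in sites])
```

      Index 7 of the returned list corresponds to the phosphorylated residue; indices 0-6 and 8-14 are the flanking positions -7 to -1 and +1 to +7.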
    6. Structural basis for peptide specificity
      Substrate heptapeptide binding to protein kinase A
      Figure: PKA surface + heptapeptide RRASIHD
      Figure: schematic of heptapeptide + PKA SDRs
    7. Accurate location of key residues using HMMER
      HMMER alignment of snf1p to the protein kinase HMM (consensus line above, snf1p below; match line not shown):
        HMM       *->YellkklGkGaFGkVylardkktgrlvAiKvik..........eril
        snf1p  55    YQIVKTLGEGSFGKVKLAYHTTTGQKVALKIINkkvlaksdmqGRIE 101

        HMM          rEikiLkk.dHPNIVkLydvfed.dklylVmEyceGdlGdLfdllkkrgr
        snf1p 102    REISYLRLlRHPHIIKLYDVIKSkDEIIMVIEYAGN---ELFDYIVQRDK 148

        HMM          rglrkvlsE.earfyfrQilsaLeYLHsqgIiHRDLKPeNiLLds..hvK
        snf1p 149    ------MSEqEARRFFQQIISAVEYCHRHKIVHRDLKPENLLLDEhlNVK 192

        HMM          laDFGlArql......ttfvGTpeYmAPEvl...gYgkpavDiWSlGcil
        snf1p 193    IADFGLSNIMtdgnflKTSCGSPNYAAPEVIsgkLYAGPEVDVWSCGVIL 242

        HMM          yElltGkpPFp..qldlifkkig..........SpeakdLikklLvkdPe
        snf1p 243    YVMLCRRLPFDdeSIPVLFKNISngvytlpkflSPGAAGLIKRMLIVNPL 292

        HMM          kRlta.eaLedeldikaHPff<-*
        snf1p 293    NRISIhEIMQ-------DDWF 306
      Anchor positions: GkG, AiK, GdL, DFG, APE
      Substrate heptapeptide: X X X [ST] X X X (positions -3 to +3)
    8. Predikin: components
      • PredikinDB: database of phosphorylation sites
      • Predikin.pm: Perl module to process kinases
      • Web server
    9. Why not phospho.ELM?
      A phospho.ELM entry:
      +--------+-------------+----------+------+---------+--------------+--------+------------------------+
      | acc    | sequence    | position | code | pmids   | kinases      | source | entry_date             |
      +--------+-------------+----------+------+---------+--------------+--------+------------------------+
      | P04083 | AMVSEFLK... | 20       | Y    | 2457390 | Abl;Src;EGFR | LTP    | 2004-12-31 00:00:00+01 |
      +--------+-------------+----------+------+---------+--------------+--------+------------------------+
      • Problems
      • Incorrect/missing accession numbers
      • Phosphoresidues not at given positions
      • Multiple kinase entries per substrate
      • Inconsistent names for kinase families
      • No way to link kinase name with kinase sequence
      phospho.ELM is derived from SwissProt entries (http://phospho.elm.eu.org), for example:
        FT   MOD_RES   26   26   Phosphoserine (by PKC).
    10. PredikinDB construction
      Substrate UniProt entry:
        ID   IF2A_MOUSE
        AC   Q6ZWX6; Q3TIQ0;
        OS   Mus musculus (Mouse).
        FT   MOD_RES   49   49   Phosphoserine (by HRI) (By similarity).
        FT   MOD_RES   52   52   Phosphoserine (by EIF2AK3, GCN2, HRI and PKR).
      Entries in table_kinases that match kinase name and species:
        Q9Z2R9   EIF2AK1   Eif2ak1; Hri
        Q9Z2B5   EIF2AK3   Eif2ak3; Pek; Perk
        Q9QZ05   EIF2AK4   Eif2ak4; Gcn2; Kiaa1338
        Q03963   EIF2AK2   Eif2ak2; Pkr; Prkr; Tik
      Entries in table_psites:
        substrate_ac   residue   posn   hepta     conf   kinase_name   kinase_ac
        Q6ZWX6         S         49     ILLSELS   2      HRI           Q9Z2R9
        Q6ZWX6         S         52     SELSRRR   1      EIF2AK3       Q9Z2B5
        Q6ZWX6         S         52     SELSRRR   1      GCN2          Q9QZ05
        Q6ZWX6         S         52     SELSRRR   1      HRI           Q9Z2R9
        Q6ZWX6         S         52     SELSRRR   1      PKR           Q03963
      PredikinDB links phosphorylation sites to their specific kinase sequences
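      The mapping from a MOD_RES feature line to table_psites rows can be sketched as follows. This is an illustrative Python sketch, not the PredikinDB build code: `parse_mod_res` and the `KINASE_AC` lookup are hypothetical names, the lookup holds only the four synonyms shown on the slide, and the confidence rule (2 for "By similarity", otherwise 1) is inferred from the single example above rather than stated in the talk. The heptapeptide column is omitted because it requires the substrate sequence.

```python
import re

# Kinase name/synonym -> kinase accession, copied from the slide's example.
KINASE_AC = {
    "HRI": "Q9Z2R9",
    "EIF2AK3": "Q9Z2B5",
    "GCN2": "Q9QZ05",
    "PKR": "Q03963",
}

RESIDUE = {"Phosphoserine": "S", "Phosphothreonine": "T", "Phosphotyrosine": "Y"}

FT_RE = re.compile(
    r"^FT\s+MOD_RES\s+(\d+)\s+\d+\s+(Phospho\w+)\s+\(by ([^)]+)\)(.*)$"
)


def parse_mod_res(line):
    """Return one (residue, position, confidence, kinase_name, kinase_ac)
    tuple per kinase named in a MOD_RES feature line; [] if no match."""
    m = FT_RE.match(line.strip())
    if m is None or m.group(2) not in RESIDUE:
        return []
    position = int(m.group(1))
    residue = RESIDUE[m.group(2)]
    # Kinase names are separated by commas and a final "and".
    kinases = [k for k in re.split(r",\s*|\s+and\s+", m.group(3)) if k]
    # Assumption: "By similarity" annotations get the lower confidence (2).
    confidence = 2 if "by similarity" in m.group(4).lower() else 1
    return [
        (residue, position, confidence, k, KINASE_AC.get(k.upper()))
        for k in kinases
    ]


rows_49 = parse_mod_res("FT   MOD_RES   49   49   Phosphoserine (by HRI) (By similarity).")
rows_52 = parse_mod_res("FT   MOD_RES   52   52   Phosphoserine (by EIF2AK3, GCN2, HRI and PKR).")
```

      A kinase name with no entry in the lookup yields `None` as its accession, which mirrors the problem raised on the phospho.ELM slide: a name alone does not identify a kinase sequence.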
    11. PredikinDB – table schema
      table_kinases: kinase_ac (AC), kinase_id (ID), domain, domain_seq, kinase_type, kinase_name (GN Name), kinase_syn (GN Synonyms), panther_name, panther_ac, panther_evalue, ksd_name, ksd_ac, ksd_evalue, species (OS), kingdom (OC), + 38 SDR-related residues
      table_psites: ID, substrate_ac (AC), residue (MOD_RES), position (MOD_RES), hepta, confidence (MOD_RES), kinase_name (MOD_RES), kinase_ac (AC)
      table_substrates: substrate_ac (AC), substrate_id (ID), species (OS), kingdom (OC)
    12. The Predikin Perl module
      • External tools
      • - HMMER + HMM libraries
      • - pantherScore
      • - DisEMBL, TMHMM (filters)
      • BioPerl libraries (http://www.bioperl.org)
      Workflow (flowchart steps): protein kinase sequence; find catalytic domains; assign kinase type; locate SDRs; assign KSD family; assign PANTHER family; find substrate XXX[STY]XXX sites; make kinase scoring matrix; score XXX[STY]XXX sites
    13. Scoring matrices: SDR method
      Query kinase: GEL+1 = E, GEL+3 = F, GEL+4 = S, Type = Ser/Thr
      SQL query for heptapeptide position -3:
        select hepta from psites, kinases
        where kinase_type = 'Ser/Thr'
        and psites.kinase_ac = kinases.kinase_ac
        and GELp1 rlike '[DEN]'
        and GELp3 rlike '[FWY]'
        and GELp4 rlike '[ANST]'
      Heptapeptides (position -3 is the first residue): QFSTVKG, EQFSTVK, RSVSEAA, RSGSSPN, RHDSGLD, RRMSDEF, ARGSFDA
      Repeat for positions -2 to +3 and corresponding SDRs
      Frequency matrix → PWM (weights) matrix → score substrates
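      The last step (frequency matrix, then weight matrix, then scoring) can be illustrated with a minimal Python sketch. Several things here are assumptions rather than Predikin's actual method: the talk does not give the weighting formula, so the sketch uses a log2-odds weight against a uniform background with a pseudocount of 1, and it builds all seven columns from the single heptapeptide set above, whereas the slide describes a separate database query per position. The function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Heptapeptides returned by the example query on the slide.
HEPTAPEPTIDES = [
    "QFSTVKG", "EQFSTVK", "RSVSEAA", "RSGSSPN",
    "RHDSGLD", "RRMSDEF", "ARGSFDA",
]


def frequency_matrix(heptapeptides):
    """Count amino acids at each of the seven positions (-3 to +3)."""
    return [Counter(pep[i] for pep in heptapeptides) for i in range(7)]


def weight_matrix(freqs, pseudocount=1.0):
    """Log2-odds weights versus a uniform background (an assumption)."""
    background = 1.0 / len(AMINO_ACIDS)
    weights = []
    for counts in freqs:
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        weights.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return weights


def score(weights, heptapeptide):
    """Sum the position weights; unknown residues contribute 0."""
    return sum(col.get(aa, 0.0) for col, aa in zip(weights, heptapeptide))


freqs = frequency_matrix(HEPTAPEPTIDES)
weights = weight_matrix(freqs)
```

      With these seven peptides, position -3 is dominated by arginine and position 0 by serine, so a candidate such as RRMSDEF scores well above an unrelated heptapeptide.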
    14. Scoring matrices: filters and cutoffs
      Residue   Phosphosites   Disordered [1]    TM helix [2]
      S         24 637         23 081            16
      T          5 405          4 898             5
      Y          4 285          3 318            12
      Total     34 327         31 297 (91.2%)    33 (0.1%)
      • Most sites disordered (DisEMBL prediction)
      • Most sites not in TM helix (TMHMM prediction)
    15. Evaluation of Predikin
      A brief area under ROC curve primer
      • Outline of evaluation procedure
      • Obtain kinase-substrate pairs from PredikinDB
      • Construct scoring matrix for kinase (don't include its substrates)
      • Score all XXX[ST]XXX sites in corresponding substrate
      • Label sites as 1 (known, annotated) or 0 (unknown, unannotated)
      • Generate AROC values using R package ROCR
      Figure: score distributions for unannotated and annotated sites (regions TN, FN, FP, TP) and the resulting ROC curve
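      The talk computed AROC values with the R package ROCR. As an illustration of what that number measures, the following Python sketch computes the same quantity directly: the probability that a randomly chosen annotated site (label 1) scores higher than a randomly chosen unannotated site (label 0), with ties counted as one half. The function name `aroc` and the toy labels and scores are hypothetical and are not data from the evaluation.

```python
def aroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of
    positive (label 1) and negative (label 0) site scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one site of each label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: two annotated and two unannotated sites.
example = aroc([1, 0, 0, 1], [0.9, 0.8, 0.1, 0.7])
```

      A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of annotated from unannotated sites. The pairwise loop is quadratic in the number of sites, which is adequate for a sketch but slower than a rank-based computation on large site sets.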
    16. Evaluation of comparable methods
      Comparison with existing methods is not easy
      • Existing tools take a substrate and score sites based on a kinase family
      • Predikin takes kinase(s) + substrate(s) and scores sites based on kinase sequence
      • Problems to solve
      • Determine the kinase families common to other tools
      • Relate families to kinase sequences in PredikinDB
      • Submit corresponding substrates to each server
      • - (no API, standalone tools, web services...)
      • Collate scored XXX[ST]XXX sites common to all methods
      • Format data for AROC analysis
      Example submission using HTML::Form (NetPhosK):
        use LWP::UserAgent;
        use HTML::Form;
        # $url holds the address of the NetPhosK submission page
        # get the form
        my $ua       = LWP::UserAgent->new;
        my $response = $ua->get($url);
        my @forms    = HTML::Form->parse($response);
        # set the values
        $forms[0]->value('SEQSUB', 'myfile.fa');
        $forms[0]->value('threshold', '0.00');
        # submit the form
        my $output = $ua->request($forms[0]->click);
        # parse output
    17. Evaluation results
      • Predikin performance is comparable to existing methods for Ser/Thr kinase substrates and equals or exceeds them for CMGC kinase substrates
      • Performance may depend on type of kinase
      Ser/Thr kinase substrates
        Method       Method score   Predikin SDR score   Sites used (+/-)
        NetPhosK     0.93           0.90                 75/5334
        KinasePhos   0.80           0.88                 76/5663
        GPS          0.88           0.87                 72/5307
        PPSP         0.92           0.87                 75/3778
        Scansite     0.96           0.87                 55/2936
      CMGC kinase substrates
        Method       Method score   Predikin SDR score   Sites used (+/-)
        NetPhosK     0.62           0.96                 211/9146
        KinasePhos   0.94           0.96                 211/9106
        GPS          0.93           0.96                 211/9146
        PPSP         0.94           0.96                 208/8158
        Scansite     0.95           0.97                 175/5039
    18. Usage cases
      Top-scoring substrates for CLA4:
        kinase            substrate                score
        CLA4 1            CLA4 727 KRATMVG         92.93
        CLA4 1            YOL113W 541 KRATMVG      92.93
        CLA4 1            YHL021C 129 KGSSFVS      91.87
        CLA4 1            YKR010C 527 KRNSITE      91.70
        CLA4 1            YNL049C 526 RATSFFG      90.14
        CLA4 1            YDL056W 477 KRKSTTP      88.70
        CLA4 1            YOL157C 527 KLFSFTK      88.25
        CLA4 1            YBR198C 157 RAYSMLK      87.71
        CLA4 1            YML076C 878 HRESMTG      87.62
        CLA4 1            YOR181W 619 KRKTKVG      87.37
      Top-scoring kinases for acetyl CoA carboxylase (COA1) S80:
        kinase            substrate                score
        NP_001547 1       COA1 80 SSMSGLH          85.49
        NP_001269 1       COA1 80 SSMSGLH          85.49
        XP_042066 1       COA1 80 SSMSGLH          75.77
        XP_001128827 1    COA1 80 SSMSGLH          75.77
        NP_001013725 1    COA1 80 SSMSGLH          74.72
        NP_004064 1       COA1 80 SSMSGLH          73.84
        NP_006613 1       COA1 80 SSMSGLH          73.84
        NP_001778 1       COA1 80 SSMSGLH          72.21
        XP_001128005 1    COA1 80 SSMSGLH          72.21
        NP_277021 1       COA1 80 SSMSGLH          72.21
      • Substrates for CLA4
      • A PAK/STE-20 kinase in S. cerevisiae
      • Phosphorylates own activation loop T727?
      • Evidence for this in literature
      • Kinases for acetyl CoA carboxylase
      • Known phosphorylation site on S80
      • Phosphorylated in AMPK knockout mice
      • Suggested alternate kinases: IKK α/β
      • Experimental evidence (Bruce Kemp)
    19. The Predikin webserver: implementation
      http://predikin.biosci.uq.edu.au
      Architecture diagram components: client (browser); Apache server with PHP and perl.so; Predikin.pm; MySQL (PredikinDB); external tools (DisEMBL, TMHMM, BLAST, pantherScore, HMMER)
    20. The Predikin webserver: screenshots
      Kinase sequence submission
    21. The Predikin webserver: screenshots
      Frequency and weight matrices
    22. The Predikin webserver: screenshots
      Scored sites
    23. Acknowledgements
      • Funding & advice (UQ): Bostjan Kobe (ARC, NHMRC), Thomas Huber
      • Predikin 1.0 (UQ): Ross Brinkworth, Robert Breinl
      • Testing: Bruce Kemp (SVI Melbourne), Brenda Andrews (U. Toronto)
      • General: Kobe Lab

    Neil Saunders, 2 years ago

    Talk given at the Bioinformatics Australia 2007 meeting

    CC Attribution-NonCommercial-NoDerivs License