Predikin and PredikinDB: tools to predict protein kinase peptide specificity


Talk given at the Bioinformatics Australia 2007 meeting in Brisbane. Note: the ROC analyses are now out of date, but the conclusions still hold.



  1. Outline of talk
     • Introduction to protein kinases
     • Prediction of substrate specificity
     • Predikin and PredikinDB
     • Evaluation
     Neil Saunders, School of Molecular and Microbial Sciences, University of Queensland
  2. Introduction to protein kinases
     Biochemistry: protein-OH + ATP → protein-OPi + ADP (catalysed by a kinase)
     • Two major (eukaryotic) types: (1) Ser/Thr; (2) Tyr
     • ~2% of human genes encode a protein kinase
     • At least 30-50% of human proteins phosphorylated
     • Regulate essentially every cellular process
  3. Complex signalling networks
     How do protein kinases find their targets?
  4. Kinase specificity: substrate recruitment
     • Substrate recruitment: any process that brings substrate to kinase
       - docking
       - binding to scaffolding protein(s)
       - colocalisation
       - coregulation
     [Figure panels: Docking interactions; Colocalisation. Figure labels: LOCATE, calcium/calmodulin-dependent protein kinase IV]
     Reference: Remenyi et al. (2006) Docking interactions in protein kinase and phosphatase networks. Curr Opin Struct Biol 16: 676-685
  5. Kinase specificity: peptide specificity
     [Figure: amino acid frequency in substrate sequences at X{7}[ST]X{7} sites. Panels: CK-2, PKA, MAPK]
  6. Structural basis for peptide specificity
     Substrate heptapeptide binding to protein kinase A (PKA)
     [Figure panels: PKA surface + heptapeptide RRASIHD; schematic of heptapeptide + PKA SDRs (substrate-determining residues)]
  7. Accurate location of key residues using HMMER
     HMMER alignment of snf1p (residues 55-306) to the kinase domain HMM consensus:

     HMM        *->YellkklGkGaFGkVylardkktgrlvAiKvik..........eril
     snf1p  55     YQIVKTLGEGSFGKVKLAYHTTTGQKVALKIINkkvlaksdmqGRIE 101

     HMM           rEikiLkk.dHPNIVkLydvfed.dklylVmEyceGdlGdLfdllkkrgr
     snf1p 102     REISYLRLlRHPHIIKLYDVIKSkDEIIMVIEYAGN---ELFDYIVQRDK 148

     HMM           rglrkvlsE.earfyfrQilsaLeYLHsqgIiHRDLKPeNiLLds..hvK
     snf1p 149     ------MSEqEARRFFQQIISAVEYCHRHKIVHRDLKPENLLLDEhlNVK 192

     HMM           laDFGlArql......ttfvGTpeYmAPEvl...gYgkpavDiWSlGcil
     snf1p 193     IADFGLSNIMtdgnflKTSCGSPNYAAPEVIsgkLYAGPEVDVWSCGVIL 242

     HMM           yElltGkpPFp..qldlifkkig..........SpeakdLikklLvkdPe
     snf1p 243     YVMLCRRLPFDdeSIPVLFKNISngvytlpkflSPGAAGLIKRMLIVNPL 292

     HMM           kRlta.eaLedeldikaHPff<-*
     snf1p 293     NRISIhEIMQ-------DDWF 306

     Anchor positions: GkG, AiK, GdL, DFG, APE
     Substrate heptapeptide (positions -3 to +3): X X X [ST] X X X
  8. Predikin: components
     • PredikinDB: database of phosphorylation sites
     • Predikin.pm: Perl module to process kinases
     • Web server
  9. Why not phospho.ELM?
     A phospho.ELM entry:
     +--------+-------------+----------+------+---------+--------------+--------+------------------------+
     | acc    | sequence    | position | code | pmids   | kinases      | source | entry_date             |
     +--------+-------------+----------+------+---------+--------------+--------+------------------------+
     | P04083 | AMVSEFLK... | 20       | Y    | 2457390 | Abl;Src;EGFR | LTP    | 2004-12-31 00:00:00+01 |
     +--------+-------------+----------+------+---------+--------------+--------+------------------------+
     Problems:
     • Incorrect/missing accession numbers
     • Phosphoresidues not at given positions
     • Multiple kinase entries per substrate
     • Inconsistent names for kinase families
     • No way to link kinase name with kinase sequence
     phospho.ELM (http://phospho.elm.eu.org) is derived from SwissProt entries such as:
     FT   MOD_RES   26   26   Phosphoserine (by PKC).
  10. PredikinDB construction
      Substrate UniProt entry:
        ID   IF2A_MOUSE
        AC   Q6ZWX6; Q3TIQ0;
        OS   Mus musculus (Mouse).
        FT   MOD_RES   49   49   Phosphoserine (by HRI) (By similarity).
        FT   MOD_RES   52   52   Phosphoserine (by EIF2AK3, GCN2, HRI and PKR).
      Entries in table_kinases that match kinase name and species:
        Q9Z2R9   EIF2AK1   Eif2ak1; Hri
        Q9Z2B5   EIF2AK3   Eif2ak3; Pek; Perk
        Q9QZ05   EIF2AK4   Eif2ak4; Gcn2; Kiaa1338
        Q03963   EIF2AK2   Eif2ak2; Pkr; Prkr; Tik
      Entries in table_psites:
        substrate_ac   residue   posn   hepta     conf   kinase_name   kinase_ac
        Q6ZWX6         S         49     ILLSELS   2      HRI           Q9Z2R9
        Q6ZWX6         S         52     SELSRRR   1      EIF2AK3       Q9Z2B5
        Q6ZWX6         S         52     SELSRRR   1      GCN2          Q9QZ05
        Q6ZWX6         S         52     SELSRRR   1      HRI           Q9Z2R9
        Q6ZWX6         S         52     SELSRRR   1      PKR           Q03963
      PredikinDB links phosphorylation sites to their specific kinase sequences.
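
The annotation-parsing step on slide 10 can be sketched roughly as follows. This is a Python illustration, not code from Predikin.pm (which is Perl); the function name `parse_mod_res` is hypothetical, and the confidence coding (1 for a plain annotation, 2 for "By similarity") is an assumption inferred from the two example rows on the slide.

```python
import re

# Hypothetical helper for illustration only; not part of Predikin.pm.
MOD_RES = re.compile(
    r"^FT\s+MOD_RES\s+(\d+)\s+\d+\s+"
    r"Phospho(serine|threonine|tyrosine)\s+\(by ([^)]+)\)(.*)$"
)
RESIDUE = {"serine": "S", "threonine": "T", "tyrosine": "Y"}


def parse_mod_res(line):
    """Return one (residue, position, confidence, kinase_name) tuple per kinase."""
    match = MOD_RES.match(line.strip())
    if match is None:
        return []
    position = int(match.group(1))
    residue = RESIDUE[match.group(2)]
    # Kinase names are separated by commas and a final "and".
    names = [n.strip() for n in re.split(r",|\band\b", match.group(3)) if n.strip()]
    # Assumed coding: 2 = "By similarity", 1 = no qualifier.
    confidence = 2 if "by similarity" in match.group(4).lower() else 1
    return [(residue, position, confidence, name) for name in names]


for row in parse_mod_res(
    "FT   MOD_RES   52   52   Phosphoserine (by EIF2AK3, GCN2, HRI and PKR)."
):
    print(row)
```

Each tuple corresponds to one candidate row of table_psites; resolving the kinase name to an accession would still require the lookup against table_kinases described on the slide.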
  11. PredikinDB: table schema (source UniProt fields in parentheses)
      table_kinases:
        kinase_ac (AC), kinase_id (ID), domain, domain_seq, kinase_type,
        kinase_name (GN Name), kinase_syn (GN Synonyms),
        panther_name, panther_ac, panther_evalue,
        ksd_name, ksd_ac, ksd_evalue,
        species (OS), kingdom (OC),
        + 38 SDR-related residues
      table_psites:
        ID, substrate_ac (AC), residue (MOD_RES), position (MOD_RES), hepta,
        confidence (MOD_RES), kinase_name (MOD_RES), kinase_ac (AC)
      table_substrates:
        substrate_ac (AC), substrate_id (ID), species (OS), kingdom (OC)
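
To make the link between table_psites and table_kinases concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration only: PredikinDB itself runs on MySQL, the column types are assumptions, only a subset of the columns listed on slide 11 is created, and the rows are the IF2A_MOUSE example from slide 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, assumed schema: column names follow slide 11, types are guesses.
conn.executescript("""
CREATE TABLE table_kinases (
    kinase_ac   TEXT PRIMARY KEY,
    kinase_name TEXT,
    kinase_syn  TEXT
);
CREATE TABLE table_psites (
    substrate_ac TEXT,
    residue      TEXT,
    position     INTEGER,
    hepta        TEXT,
    confidence   INTEGER,
    kinase_name  TEXT,
    kinase_ac    TEXT REFERENCES table_kinases(kinase_ac)
);
""")
conn.executemany("INSERT INTO table_kinases VALUES (?, ?, ?)", [
    ("Q9Z2R9", "EIF2AK1", "Eif2ak1; Hri"),
    ("Q9Z2B5", "EIF2AK3", "Eif2ak3; Pek; Perk"),
    ("Q9QZ05", "EIF2AK4", "Eif2ak4; Gcn2; Kiaa1338"),
    ("Q03963", "EIF2AK2", "Eif2ak2; Pkr; Prkr; Tik"),
])
conn.executemany("INSERT INTO table_psites VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("Q6ZWX6", "S", 49, "ILLSELS", 2, "HRI", "Q9Z2R9"),
    ("Q6ZWX6", "S", 52, "SELSRRR", 1, "EIF2AK3", "Q9Z2B5"),
    ("Q6ZWX6", "S", 52, "SELSRRR", 1, "GCN2", "Q9QZ05"),
    ("Q6ZWX6", "S", 52, "SELSRRR", 1, "HRI", "Q9Z2R9"),
    ("Q6ZWX6", "S", 52, "SELSRRR", 1, "PKR", "Q03963"),
])

# Resolve each annotated kinase name to its kinase record via the accession.
rows = conn.execute("""
    SELECT p.position, p.hepta, p.kinase_name, k.kinase_name
    FROM table_psites AS p
    JOIN table_kinases AS k ON p.kinase_ac = k.kinase_ac
    WHERE p.substrate_ac = ?
    ORDER BY p.position, k.kinase_ac
""", ("Q6ZWX6",)).fetchall()
for row in rows:
    print(row)
```

The join is what phospho.ELM could not offer: every phosphorylation site row carries an accession that leads to a specific kinase sequence record.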
  12. The Predikin Perl module
      • External tools
        - HMMER + HMM libraries
        - pantherScore
        - DisEMBL, TMHMM (filters)
      • BioPerl libraries (http://www.bioperl.org)
      Workflow: protein kinase sequence → find catalytic domains → assign kinase type → locate SDRs → assign KSD family → assign PANTHER family → find substrate XXX[STY]XXX sites → make kinase scoring matrix → score XXX[STY]XXX sites
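
The "find substrate XXX[STY]XXX sites" step of the workflow can be sketched with a regular expression. This is a Python illustration under stated assumptions, not the Predikin.pm implementation: the function name `find_heptapeptides` is hypothetical, and residues within three positions of either terminus are simply skipped because they lack a full heptapeptide.

```python
import re


def find_heptapeptides(sequence, residues="STY"):
    """Return (1-based position of central residue, heptapeptide) pairs.

    A lookahead is used so that overlapping sites are all reported.
    """
    pattern = re.compile(r"(?=([A-Z]{3}[%s][A-Z]{3}))" % residues)
    return [
        (match.start() + 4, match.group(1))
        for match in pattern.finditer(sequence.upper())
    ]


# The PKA heptapeptide from slide 6 contains a single candidate site.
print(find_heptapeptides("RRASIHD"))
```

Passing `residues="ST"` restricts the search to the XXX[ST]XXX sites used in the evaluation slides.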
  13. Scoring matrices: SDR method
      Query kinase: GEL+1 = E, GEL+3 = F, GEL+4 = S, Type = Ser/Thr
      SQL query for heptapeptide position -3:
          select hepta
          from psites, kinases
          where kinase_type = 'Ser/Thr'
            and psites.kinase_ac = kinases.kinase_ac
            and GELp1 rlike '[DEN]'
            and GELp3 rlike '[FWY]'
            and GELp4 rlike '[ANST]'
      Heptapeptides returned (position -3 is the first residue):
          QFSTVKG
          EQFSTVK
          RSVSEAA
          RSGSSPN
          RHDSGLD
          RRMSDEF
          ARGSFDA
      Repeat for positions -2 to +3 and corresponding SDRs
      Frequency matrix → PWM (weights) matrix → score substrates
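
The last line of slide 13 (frequency matrix, then weight matrix, then scoring) can be sketched as below. This is a simplified Python illustration, not the Predikin method itself: Predikin selects a separate heptapeptide set for each position from the matching SDRs, whereas this sketch uses the single set shown on the slide for all seven positions. The weighting scheme (log2 odds against a uniform background with a pseudocount of 1) is an assumption, since the slides do not state the formula, and all function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_matrix(heptapeptides):
    """Count residues at each of the seven positions (-3 to +3)."""
    return [Counter(pep[i] for pep in heptapeptides) for i in range(7)]


def weight_matrix(freqs, n_peptides, pseudocount=1.0):
    """Convert counts to log2-odds weights (assumed uniform background)."""
    background = 1.0 / len(AMINO_ACIDS)
    total = n_peptides + pseudocount * len(AMINO_ACIDS)
    return [
        {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        for counts in freqs
    ]


def score(heptapeptide, weights):
    """Sum the position-specific weights for one candidate site."""
    return sum(weights[i][aa] for i, aa in enumerate(heptapeptide))


peptides = ["QFSTVKG", "EQFSTVK", "RSVSEAA", "RSGSSPN",
            "RHDSGLD", "RRMSDEF", "ARGSFDA"]
freqs = frequency_matrix(peptides)
weights = weight_matrix(freqs, len(peptides))
print(freqs[0].most_common(1))       # most frequent residue at position -3
print(round(score("RRASIHD", weights), 2))
```

The scores printed here are on an arbitrary log-odds scale and are not comparable to the 0-100 scores shown on the usage slide.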
  14. Scoring matrices: filters and cutoffs
      Residue   Phosphosites   Disordered (DisEMBL)   TM helix (TMHMM)
      S         24,637         23,081                 16
      T          5,405          4,898                  5
      Y          4,285          3,318                 12
      Total     34,327         31,297 (91.2%)         33 (0.1%)
      • Most sites disordered (DisEMBL prediction)
      • Most sites not in TM helix (TMHMM prediction)
  15. Evaluation of Predikin
      A brief area under ROC curve (AROC) primer
      [Figure: score distributions of unannotated and annotated sites, showing TN, FN, FP and TP regions, and the resulting ROC curve]
      Outline of evaluation procedure:
      • Obtain kinase-substrate pairs from PredikinDB
      • Construct scoring matrix for the kinase (excluding its own substrates)
      • Score all XXX[ST]XXX sites in the corresponding substrate
      • Label sites as 1 (known, annotated) or 0 (unknown, unannotated)
      • Generate AROC values using the R package ROCR
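
For readers unfamiliar with the AROC statistic, the sketch below computes it directly from labelled scores. The talk used the R package ROCR; this Python version is a stand-in for illustration that uses the equivalent rank interpretation (the probability that a randomly chosen annotated site outscores a randomly chosen unannotated site, with ties counted as one half). The function name `area_under_roc` is hypothetical.

```python
def area_under_roc(scores, labels):
    """Area under the ROC curve from scores and 1/0 labels.

    Compares every annotated (1) site with every unannotated (0) site.
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one annotated and one unannotated site")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up scores for four sites, two of them annotated.
print(area_under_roc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

A value of 1.0 means every annotated site outscores every unannotated one, and 0.5 corresponds to random ranking. The pairwise loop is quadratic, which is acceptable for a sketch but slow for thousands of sites.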
  16. Evaluation of comparable methods
      Comparison with existing methods is not easy:
      • Existing tools take a substrate and score sites based on a kinase family
      • Predikin takes kinase(s) + substrate(s) and scores sites based on kinase sequence
      Problems to solve:
      • Determine the kinase families common to other tools
      • Relate families to kinase sequences in PredikinDB
      • Submit corresponding substrates to each server
        - (no API, standalone tools, web services...)
      • Collate scored XXX[ST]XXX sites common to all methods
      • Format data for AROC analysis
      Example submission using HTML::Form (NetPhosK):
          use strict;
          use warnings;
          use LWP::UserAgent;
          use HTML::Form;

          # URL of the server's submission page, supplied on the command line
          my $url = shift @ARGV;

          # get the form
          my $ua       = LWP::UserAgent->new;
          my $response = $ua->get($url);
          my @forms    = HTML::Form->parse($response);

          # set the values
          $forms[0]->value('SEQSUB', 'myfile.fa');
          $forms[0]->value('threshold', '0.00');

          # submit the form
          my $output = $ua->request($forms[0]->click);

          # parse output
  17. Evaluation results
      • Predikin performance equals or exceeds that of existing methods
      • Performance may depend on type of kinase

      Ser/Thr kinase substrates
      Method       Score   Predikin SDR score   Sites used (+/-)
      NetPhosK     0.93    0.90                 75/5334
      KinasePhos   0.80    0.88                 76/5663
      GPS          0.88    0.87                 72/5307
      PPSP         0.92    0.87                 75/3778
      Scansite     0.96    0.87                 55/2936

      CMGC kinase substrates
      Method       Score   Predikin SDR score   Sites used (+/-)
      NetPhosK     0.62    0.96                 211/9146
      KinasePhos   0.94    0.96                 211/9106
      GPS          0.93    0.96                 211/9146
      PPSP         0.94    0.96                 208/8158
      Scansite     0.95    0.97                 175/5039
  18. Usage cases
      Substrates for CLA4
      • A PAK/STE-20 kinase in S. cerevisiae
      • Phosphorylates own activation loop T727?
      • Evidence for this in literature

      kinase     substrate               score
      CLA4 1     CLA4     727 KRATMVG    92.93
      CLA4 1     YOL113W  541 KRATMVG    92.93
      CLA4 1     YHL021C  129 KGSSFVS    91.87
      CLA4 1     YKR010C  527 KRNSITE    91.70
      CLA4 1     YNL049C  526 RATSFFG    90.14
      CLA4 1     YDL056W  477 KRKSTTP    88.70
      CLA4 1     YOL157C  527 KLFSFTK    88.25
      CLA4 1     YBR198C  157 RAYSMLK    87.71
      CLA4 1     YML076C  878 HRESMTG    87.62
      CLA4 1     YOR181W  619 KRKTKVG    87.37

      Kinases for acetyl CoA carboxylase
      • Known phosphorylation site on S80
      • Phosphorylated in AMPK knockout mice
      • Suggested alternate kinases: IKK α/β
      • Experimental evidence (Bruce Kemp)

      kinase            substrate          score
      NP_001547 1       COA1 80 SSMSGLH    85.49
      NP_001269 1       COA1 80 SSMSGLH    85.49
      XP_042066 1       COA1 80 SSMSGLH    75.77
      XP_001128827 1    COA1 80 SSMSGLH    75.77
      NP_001013725 1    COA1 80 SSMSGLH    74.72
      NP_004064 1       COA1 80 SSMSGLH    73.84
      NP_006613 1       COA1 80 SSMSGLH    73.84
      NP_001778 1       COA1 80 SSMSGLH    72.21
      XP_001128005 1    COA1 80 SSMSGLH    72.21
      NP_277021 1       COA1 80 SSMSGLH    72.21
  19. The Predikin webserver: implementation
      http://predikin.biosci.uq.edu.au
      [Architecture diagram components: Client (browser); Apache server; PHP; perl.so; Predikin.pm; MySQL (PredikinDB); external tools: DisEMBL, TMHMM, BLAST, pantherScore, HMMER]
  20. The Predikin webserver: screenshots
      Kinase sequence submission
  21. The Predikin webserver: screenshots
      Frequency and weight matrices
  22. The Predikin webserver: screenshots
      Scored sites
  23. Acknowledgements
      Funding & advice (UQ)
      • Bostjan Kobe (ARC, NHMRC)
      • Thomas Huber
      Predikin 1.0 (UQ)
      • Ross Brinkworth
      • Robert Breinl
      Testing
      • Bruce Kemp (SVI Melbourne)
      • Brenda Andrews (U. Toronto)
      General
      • Kobe Lab
