Predikin and PredikinDB: tools to predict protein kinase peptide specificity

Talk given at Bioinformatics Australia 2007 meeting in Brisbane. Note: the ROC analyses are out of date now, but the conclusions still hold.

Published in: Economy & Finance, Technology

Transcript

  • 1. Outline of talk
      - Introduction to protein kinases
      - Prediction of substrate specificity
      - Predikin and PredikinDB
      - Evaluation
      Neil Saunders, School of Molecular and Microbial Sciences, University of Queensland
  • 2. Introduction to protein kinases
      Biochemistry: protein-OH + ATP → protein-OPi + ADP (catalysed by a kinase)
      - Two major (eukaryotic) types: (1) Ser/Thr; (2) Tyr
      - ~2% of human genes encode a protein kinase
      - At least 30-50% of human proteins are phosphorylated
      - Kinases regulate essentially every cellular process
  • 3. Complex signalling networks
      How do protein kinases find their targets?
  • 4. Kinase specificity – substrate recruitment
      Substrate recruitment: any process that brings substrate to kinase
      - docking
      - binding to scaffolding protein(s)
      - colocalisation
      - coregulation
      Figure panels:
      - Docking interactions: Remenyi et al. (2006) Docking interactions in protein kinase and phosphatase networks. Curr Opin Struct Biol 16: 676-685
      - Colocalisation: LOCATE, calcium/calmodulin-dependent protein kinase IV
  • 5. Kinase specificity - peptide specificity
      Figure: amino acid frequency in substrate sequences at X{7}[ST]X{7} sites (panels: CK-2, PKA, MAPK)
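The per-position frequencies plotted on slide 5 can be sketched as follows. This is a minimal illustration, assuming the 15-residue windows (seven residues either side of the phosphorylated S/T) have already been extracted; the three windows below are invented for the example and are not data from the talk.

```python
from collections import Counter

def position_frequencies(windows):
    """Return amino acid frequencies per column, keyed by offset (-7..+7 for 15-mers)."""
    width = len(windows[0])
    half = width // 2
    freqs = {}
    for col in range(width):
        counts = Counter(w[col] for w in windows)
        freqs[col - half] = {aa: n / len(windows) for aa, n in counts.items()}
    return freqs

# Invented 15-residue windows, each centred on S or T (illustration only).
toy_windows = [
    "HDAGGRRSSVEEAPL",
    "DAGGRRSSVEEAPLK",
    "LKRRASLTVDEGAPQ",
]
freqs = position_frequencies(toy_windows)
print(freqs[0])   # frequencies at the phosphorylated position
print(freqs[-1])  # frequencies one residue N-terminal of it
```

Plotting `freqs` for the known substrates of one kinase family would give one panel of the kind shown for CK-2, PKA and MAPK.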
  • 6. Structural basis for peptide specificity
      Substrate heptapeptide binding to protein kinase A
      Figure panels: PKA surface + heptapeptide RRASIHD; schematic of heptapeptide + PKA SDRs
  • 7. Accurate location of key residues using HMMER
      HMMER alignment of snf1p (upper line: HMM consensus; lower line: snf1p sequence, residues 55-306):
             *->YellkklGkGaFGkVylardkktgrlvAiKvik..........eril
      snf1p  55 YQIVKTLGEGSFGKVKLAYHTTTGQKVALKIINkkvlaksdmqGRIE 101

                rEikiLkk.dHPNIVkLydvfed.dklylVmEyceGdlGdLfdllkkrgr
      snf1p 102 REISYLRLlRHPHIIKLYDVIKSkDEIIMVIEYAGN---ELFDYIVQRDK 148

                rglrkvlsE.earfyfrQilsaLeYLHsqgIiHRDLKPeNiLLds..hvK
      snf1p 149 ------MSEqEARRFFQQIISAVEYCHRHKIVHRDLKPENLLLDEhlNVK 192

                laDFGlArql......ttfvGTpeYmAPEvl...gYgkpavDiWSlGcil
      snf1p 193 IADFGLSNIMtdgnflKTSCGSPNYAAPEVIsgkLYAGPEVDVWSCGVIL 242

                yElltGkpPFp..qldlifkkig..........SpeakdLikklLvkdPe
      snf1p 243 YVMLCRRLPFDdeSIPVLFKNISngvytlpkflSPGAAGLIKRMLIVNPL 292

                kRlta.eaLedeldikaHPff<-*
      snf1p 293 NRISIhEIMQ-------DDWF 306
      Anchor positions: GkG, AiK, GdL, DFG, APE
      Substrate heptapeptide: X X X [ST] X X X (positions -3 to +3)
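One way to read slide 7 is that the conserved anchor motifs in the HMM consensus line give fixed reference columns, from which residue numbers in the query kinase follow. The sketch below is an assumption about how such a lookup could be done, not the Predikin.pm code; `motif_positions` is a hypothetical helper, and the aligned strings are the DFG/APE and GdL blocks of the snf1p alignment.

```python
def motif_positions(model, seq, start, motif):
    """Map a motif found in the consensus line to (residue, number) pairs in the sequence.

    model and seq are equal-length aligned strings; '-' in seq marks a gap and
    start is the residue number of the first residue in this alignment block.
    """
    col = model.find(motif)
    if col < 0:
        return None
    numbers = []
    n = start
    for ch in seq:
        if ch == "-":
            numbers.append(None)  # gap: no residue at this column
        else:
            numbers.append(n)
            n += 1
    return [(seq[c], numbers[c]) for c in range(col, col + len(motif))]

# Block of the snf1p alignment containing the DFG and APE anchors (starts at residue 193).
model = "laDFGlArql......ttfvGTpeYmAPEvl...gYgkpavDiWSlGcil"
seq = "IADFGLSNIMtdgnflKTSCGSPNYAAPEVIsgkLYAGPEVDVWSCGVIL"
dfg = motif_positions(model, seq, 193, "DFG")
ape = motif_positions(model, seq, 193, "APE")

# Block containing the GdL anchor (starts at residue 102); snf1p has a gap here.
model2 = "rEikiLkk.dHPNIVkLydvfed.dklylVmEyceGdlGdLfdllkkrgr"
seq2 = "REISYLRLlRHPHIIKLYDVIKSkDEIIMVIEYAGN---ELFDYIVQRDK"
gdl = motif_positions(model2, seq2, 102, "GdL")
print(dfg, ape, gdl)
```

The search is case-sensitive, so `"GdL"` matches the anchor and not the preceding `Gdl` in the consensus.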
  • 8. Predikin: components
      - PredikinDB: database of phosphorylation sites
      - Predikin.pm: Perl module to process kinases
      - Web server
  • 9. Why not phospho.ELM?
      A phosphoELM entry:
      +--------+-------------+----------+------+---------+--------------+--------+------------------------+
      | acc    | sequence    | position | code | pmids   | kinases      | source | entry_date             |
      +--------+-------------+----------+------+---------+--------------+--------+------------------------+
      | P04083 | AMVSEFLK... | 20       | Y    | 2457390 | Abl;Src;EGFR | LTP    | 2004-12-31 00:00:00+01 |
      +--------+-------------+----------+------+---------+--------------+--------+------------------------+
      Problems
      - Incorrect/missing accession numbers
      - Phosphoresidues not at given positions
      - Multiple kinase entries per substrate
      - Inconsistent names for kinase families
      - No way to link kinase name with kinase sequence
      phospho.ELM (http://phospho.elm.eu.org) is derived from SwissProt entries, e.g.
      FT   MOD_RES   26   26   Phosphoserine (by PKC).
  • 10. PredikinDB construction
      Substrate UniProt entry:
      ID   IF2A_MOUSE
      AC   Q6ZWX6; Q3TIQ0;
      OS   Mus musculus (Mouse).
      FT   MOD_RES   49   49   Phosphoserine (by HRI) (By similarity).
      FT   MOD_RES   52   52   Phosphoserine (by EIF2AK3, GCN2, HRI and PKR).
      Entries in table_kinases that match kinase name and species:
      Q9Z2R9   EIF2AK1   Eif2ak1;Hri
      Q9Z2B5   EIF2AK3   Eif2ak3;Pek;Perk
      Q9QZ05   EIF2AK4   Eif2ak4;Gcn2;Kiaa1338
      Q03963   EIF2AK2   Eif2ak2;Pkr;Prkr;Tik
      Entry in table_psites:
      substrate_ac  residue  posn  hepta    conf  kinase_name  kinase_ac
      Q6ZWX6        S        49    ILLSELS  2     HRI          Q9Z2R9
      Q6ZWX6        S        52    SELSRRR  1     EIF2AK3      Q9Z2B5
      Q6ZWX6        S        52    SELSRRR  1     GCN2         Q9QZ05
      Q6ZWX6        S        52    SELSRRR  1     HRI          Q9Z2R9
      Q6ZWX6        S        52    SELSRRR  1     PKR          Q03963
      PredikinDB links phosphorylation sites to their specific kinase sequences
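The first step of slide 10, turning `FT MOD_RES` lines into (residue, position, kinase name) records plus a heptapeptide, might look like the sketch below. The regular expression and both helper functions are assumptions for illustration, not the PredikinDB build code, and the sequence is a placeholder padded with alanines so that positions 46-55 read ILLSELSRRR as on the slide.

```python
import re

# Matches lines such as: FT   MOD_RES   52   52   Phosphoserine (by A, B and C).
FT_LINE = re.compile(
    r"^FT\s+MOD_RES\s+(\d+)\s+\d+\s+Phospho(serine|threonine|tyrosine)\s+\(by ([^)]+)\)"
)
RESIDUE = {"serine": "S", "threonine": "T", "tyrosine": "Y"}

def parse_mod_res(line):
    """Return one (residue, position, kinase_name) tuple per kinase named on the line."""
    m = FT_LINE.match(line)
    if not m:
        return []
    position = int(m.group(1))
    residue = RESIDUE[m.group(2)]
    names = re.split(r",\s*|\s+and\s+", m.group(3))
    return [(residue, position, name.strip()) for name in names if name.strip()]

def heptapeptide(seq, position):
    """Residues position-3..position+3 (1-based), or None if too close to a terminus."""
    if position < 4 or position + 3 > len(seq):
        return None
    return seq[position - 4:position + 3]

# Placeholder sequence: only residues 46-55 (ILLSELSRRR) follow the slide.
toy_seq = "A" * 45 + "ILLSELSRRR" + "AAAA"
lines = [
    "FT   MOD_RES   49   49   Phosphoserine (by HRI) (By similarity).",
    "FT   MOD_RES   52   52   Phosphoserine (by EIF2AK3, GCN2, HRI and PKR).",
]
sites = [rec for line in lines for rec in parse_mod_res(line)]
heptas = {pos: heptapeptide(toy_seq, pos) for _, pos, _ in sites}
print(sites)
print(heptas)
```

Resolving each kinase name to an accession then needs a lookup against the kinase table by name or synonym and species, which is the part phospho.ELM could not provide.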
  • 11. PredikinDB – table schema
      table_kinases (UniProt source field in brackets):
      - kinase_ac (AC)
      - kinase_id (ID)
      - domain
      - domain_seq
      - kinase_type
      - kinase_name (GN Name)
      - kinase_syn (GN Synonyms)
      - panther_name
      - panther_ac
      - panther_evalue
      - ksd_name
      - ksd_ac
      - ksd_evalue
      - species (OS)
      - kingdom (OC)
      - + 38 SDR-related residues
      table_psites:
      - ID
      - substrate_ac (AC)
      - residue (MOD_RES)
      - position (MOD_RES)
      - hepta
      - confidence (MOD_RES)
      - kinase_name (MOD_RES)
      - kinase_ac (AC)
      table_substrates:
      - substrate_ac (AC)
      - substrate_id (ID)
      - species (OS)
      - kingdom (OC)
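To make the link between `table_psites` and `table_kinases` concrete, here is a small in-memory sketch using SQLite from Python. PredikinDB itself runs on MySQL; the column types, the reduced column set, and the reading of the slide-10 kinase rows as (kinase_ac, kinase_name, kinase_syn) are all assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_kinases (
    kinase_ac   TEXT PRIMARY KEY,
    kinase_name TEXT,
    kinase_syn  TEXT
);
CREATE TABLE table_psites (
    id           INTEGER PRIMARY KEY,
    substrate_ac TEXT,
    residue      TEXT,
    position     INTEGER,
    hepta        TEXT,
    confidence   INTEGER,
    kinase_name  TEXT,
    kinase_ac    TEXT REFERENCES table_kinases(kinase_ac)
);
""")

# Rows taken from the IF2A_MOUSE example on slide 10.
conn.executemany(
    "INSERT INTO table_kinases VALUES (?, ?, ?)",
    [
        ("Q9Z2R9", "EIF2AK1", "Eif2ak1;Hri"),
        ("Q9Z2B5", "EIF2AK3", "Eif2ak3;Pek;Perk"),
        ("Q9QZ05", "EIF2AK4", "Eif2ak4;Gcn2;Kiaa1338"),
        ("Q03963", "EIF2AK2", "Eif2ak2;Pkr;Prkr;Tik"),
    ],
)
conn.executemany(
    "INSERT INTO table_psites "
    "(substrate_ac, residue, position, hepta, confidence, kinase_name, kinase_ac) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Q6ZWX6", "S", 49, "ILLSELS", 2, "HRI", "Q9Z2R9"),
        ("Q6ZWX6", "S", 52, "SELSRRR", 1, "EIF2AK3", "Q9Z2B5"),
        ("Q6ZWX6", "S", 52, "SELSRRR", 1, "GCN2", "Q9QZ05"),
        ("Q6ZWX6", "S", 52, "SELSRRR", 1, "HRI", "Q9Z2R9"),
        ("Q6ZWX6", "S", 52, "SELSRRR", 1, "PKR", "Q03963"),
    ],
)

# Join each annotated site to the kinase record it was resolved to.
rows = conn.execute("""
    SELECT p.position, p.kinase_name, k.kinase_name
    FROM table_psites AS p
    JOIN table_kinases AS k ON p.kinase_ac = k.kinase_ac
    WHERE p.substrate_ac = 'Q6ZWX6'
    ORDER BY p.position, p.kinase_name
""").fetchall()
for row in rows:
    print(row)
```

The join shows why the annotation name (for example GCN2) and the stored gene name (EIF2AK4) can differ while still pointing at the same kinase sequence.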
  • 12. The Predikin Perl module
      External tools
      - HMMER + HMM libraries
      - pantherScore
      - DisEMBL, TMHMM (filters)
      Bioperl libraries (http://www.bioperl.org)
      Workflow steps (flowchart):
      - protein kinase sequence
      - find catalytic domains
      - assign kinase type
      - locate SDRs
      - assign KSD family
      - assign PANTHER family
      - find substrate XXX[STY]XXX
      - make kinase scoring matrix
      - score XXX[STY]XXX sites
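The "find substrate XXX[STY]XXX" step can be sketched with a lookahead regular expression, so that overlapping candidate sites are all reported. This is an illustrative Python stand-in, not code from Predikin.pm; `find_sites` is a hypothetical name and the sequence is a toy built around one heptapeptide from the usage-case slide.

```python
import re

# The lookahead makes every overlapping XXX[STY]XXX window a separate match.
SITE = re.compile(r"(?=([A-Z]{3}[STY][A-Z]{3}))")

def find_sites(seq):
    """Return (position, heptapeptide) pairs; position is the 1-based S/T/Y residue."""
    return [(m.start() + 4, m.group(1)) for m in SITE.finditer(seq)]

toy = "AAKRATMVGAA"  # toy sequence containing the heptapeptide KRATMVG
sites = find_sites(toy)
print(sites)
```

Residues within three positions of either terminus cannot form a full heptapeptide and are skipped by construction.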
  • 13. Scoring matrices: SDR method
      Query kinase: GEL+1 = E, GEL+3 = F, GEL+4 = S, Type = Ser/Thr
      SQL query for heptapeptide position -3:
          select hepta from psites, kinases
          where kinase_type = 'Ser/Thr'
          and psites.kinase_ac = kinases.kinase_ac
          and GELp1 rlike '[DEN]'
          and GELp3 rlike '[FWY]'
          and GELp4 rlike '[ANST]'
      Heptapeptides (first residue = position -3): QFSTVKG, EQFSTVK, RSVSEAA, RSGSSPN, RHDSGLD, RRMSDEF, ARGSFDA
      Repeat for positions -2 to +3 and corresponding SDRs
      Frequency matrix → PWM (weights) matrix → score substrates
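The step from retrieved heptapeptides to one column of the frequency and weight matrices can be sketched as below, using the seven heptapeptides from slide 13 for position -3. The slide does not state how weights are derived from frequencies, so the log-odds formula, the pseudocount and the uniform background are assumptions chosen only to make the example concrete.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Heptapeptides returned by the slide-13 query for position -3.
HEPTAPEPTIDES = [
    "QFSTVKG", "EQFSTVK", "RSVSEAA", "RSGSSPN",
    "RHDSGLD", "RRMSDEF", "ARGSFDA",
]

def frequency_column(peptides, col):
    """Frequency of each amino acid at one heptapeptide column (0 = position -3)."""
    counts = Counter(p[col] for p in peptides)
    return {aa: counts.get(aa, 0) / len(peptides) for aa in AMINO_ACIDS}

def weight_column(freqs, pseudocount=0.01, background=1 / 20):
    """Assumed weighting: log2 odds of smoothed frequency against a uniform background."""
    denom = 1 + pseudocount * len(AMINO_ACIDS)
    return {
        aa: math.log2(((f + pseudocount) / denom) / background)
        for aa, f in freqs.items()
    }

def score_site(hepta, weight_columns):
    """Sum the weights of a heptapeptide over the columns supplied."""
    return sum(col[aa] for aa, col in zip(hepta, weight_columns))

freq_m3 = frequency_column(HEPTAPEPTIDES, 0)
weights_m3 = weight_column(freq_m3)
print(round(freq_m3["R"], 3), round(weights_m3["R"], 2), round(weights_m3["W"], 2))
```

In the SDR method each heptapeptide position gets its own query and therefore its own peptide set, so a full matrix would call `frequency_column` once per position with the peptides retrieved for that position, then pass the resulting columns to `score_site`.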
  • 14. Scoring matrices: filters and cutoffs
      Residue  Phosphosites  Disordered [1]   TM helix [2]
      S        24 637        23 081           16
      T         5 405         4 898            5
      Y         4 285         3 318           12
      Total    34 327        31 297 (91.2%)   33 (0.1%)
      - [1] Most sites disordered (DisEMBL prediction)
      - [2] Most sites not in TM helix (TMHMM prediction)
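The totals and percentages on slide 14 follow directly from the per-residue counts; a quick check of the arithmetic, using only the numbers on the slide:

```python
# (phosphosites, predicted disordered, predicted in TM helix) per residue type
counts = {
    "S": (24637, 23081, 16),
    "T": (5405, 4898, 5),
    "Y": (4285, 3318, 12),
}

total_sites = sum(c[0] for c in counts.values())
total_disordered = sum(c[1] for c in counts.values())
total_tm = sum(c[2] for c in counts.values())

pct_disordered = round(100 * total_disordered / total_sites, 1)
pct_tm = round(100 * total_tm / total_sites, 1)
print(total_sites, total_disordered, pct_disordered, total_tm, pct_tm)
```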
  • 15. Evaluation of Predikin
      A brief area under ROC curve primer
      Figure labels: TN, TP, FP, FN; unannotated sites, annotated sites; scores; ROC curve
      Outline of evaluation procedure
      - Obtain kinase-substrate pairs from PredikinDB
      - Construct scoring matrix for kinase (don't include its substrates)
      - Score all XXX[ST]XXX sites in corresponding substrate
      - Label sites as 1 (known, annotated) or 0 (unknown, unannotated)
      - Generate AROC values using R package ROCR
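The talk computed AROC with the R package ROCR. As a language-neutral illustration of what that number means, the sketch below computes the same quantity directly as the probability that a randomly chosen annotated site outscores a randomly chosen unannotated one, counting ties as one half. The scores and labels are invented for the example.

```python
def area_under_roc(scores, labels):
    """AROC via pairwise comparison of annotated (1) and unannotated (0) sites."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one annotated and one unannotated site")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented site scores for one substrate (illustration only).
scores = [92.9, 60.1, 75.0, 40.2, 55.5]
perfect = area_under_roc(scores, [1, 0, 1, 0, 0])  # annotated sites score highest
mixed = area_under_roc(scores, [1, 0, 0, 0, 1])    # one annotated site scores low
print(perfect, round(mixed, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. Unannotated sites are treated as negatives, so an AROC computed this way is conservative if some of them are real but undiscovered phosphorylation sites.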
  • 16. Evaluation of comparable methods
      Comparison with existing methods is not easy
      - Existing tools take a substrate and score sites based on a kinase family
      - Predikin takes kinase(s) + substrate(s) and scores sites based on kinase sequence
      Problems to solve
      - Determine the kinase families common to other tools
      - Relate families to kinase sequences in PredikinDB
      - Submit corresponding substrates to each server
        (no API, standalone tools, web services...)
      - Collate scored XXX[ST]XXX sites common to all methods
      - Format data for AROC analysis
      Example submission using HTML::Form (NetPhosK):
          use LWP::UserAgent;
          use HTML::Form;

          # get the form ($url holds the address of the submission page)
          my $ua       = LWP::UserAgent->new;
          my $response = $ua->get($url);
          my @forms    = HTML::Form->parse($response);
          # set the values
          $forms[0]->value('SEQSUB', "myfile.fa");
          $forms[0]->value('threshold', '0.00');
          # submit the form
          my $output = $ua->request($forms[0]->click);
          # parse output
  • 17. Evaluation results
      - Predikin performance equals or exceeds that of existing methods
      - Performance may depend on type of kinase
      Ser/Thr kinase substrates
      Method      Score  Predikin SDR score  Sites used (+/-)
      NetPhosK    0.93   0.90                75/5334
      KinasePhos  0.80   0.88                76/5663
      GPS         0.88   0.87                72/5307
      PPSP        0.92   0.87                75/3778
      Scansite    0.96   0.87                55/2936
      CMGC kinase substrates
      Method      Score  Predikin SDR score  Sites used (+/-)
      NetPhosK    0.62   0.96                211/9146
      KinasePhos  0.94   0.96                211/9106
      GPS         0.93   0.96                211/9146
      PPSP        0.94   0.96                208/8158
      Scansite    0.95   0.97                175/5039
  • 18. Usage cases
      Substrates for CLA4
      - A PAK/STE-20 kinase in S. cerevisiae
      - Phosphorylates own activation loop T727?
      - Evidence for this in literature
      kinase   substrate              score
      CLA4 1   CLA4    727 KRATMVG    92.93
      CLA4 1   YOL113W 541 KRATMVG    92.93
      CLA4 1   YHL021C 129 KGSSFVS    91.87
      CLA4 1   YKR010C 527 KRNSITE    91.70
      CLA4 1   YNL049C 526 RATSFFG    90.14
      CLA4 1   YDL056W 477 KRKSTTP    88.70
      CLA4 1   YOL157C 527 KLFSFTK    88.25
      CLA4 1   YBR198C 157 RAYSMLK    87.71
      CLA4 1   YML076C 878 HRESMTG    87.62
      CLA4 1   YOR181W 619 KRKTKVG    87.37
      Kinases for acetyl CoA carboxylase
      - Known phosphorylation site on S80
      - Phosphorylated in AMPK knockout mice
      - Suggested alternate kinases: IKK α/β
      - Experimental evidence (Bruce Kemp)
      kinase           substrate          score
      NP_001547 1      COA1 80 SSMSGLH    85.49
      NP_001269 1      COA1 80 SSMSGLH    85.49
      XP_042066 1      COA1 80 SSMSGLH    75.77
      XP_001128827 1   COA1 80 SSMSGLH    75.77
      NP_001013725 1   COA1 80 SSMSGLH    74.72
      NP_004064 1      COA1 80 SSMSGLH    73.84
      NP_006613 1      COA1 80 SSMSGLH    73.84
      NP_001778 1      COA1 80 SSMSGLH    72.21
      XP_001128005 1   COA1 80 SSMSGLH    72.21
      NP_277021 1      COA1 80 SSMSGLH    72.21
  • 19. The Predikin webserver: implementation
      http://predikin.biosci.uq.edu.au
      Diagram components: client (browser), Apache server, PHP, perl.so, Predikin.pm, MySQL, PredikinDB, DisEMBL, TMHMM, BLAST, pantherScore, HMMER
  • 20. The Predikin webserver: screenshots
      Kinase sequence submission
  • 21. The Predikin webserver: screenshots
      Frequency and weight matrices
  • 22. The Predikin webserver: screenshots
      Scored sites
  • 23. Acknowledgements
      Funding & advice (UQ)
      - Bostjan Kobe (ARC, NHMRC)
      - Thomas Huber
      Predikin 1.0 (UQ)
      - Ross Brinkworth
      - Robert Breinl
      Testing
      - Bruce Kemp (SVI Melbourne)
      - Brenda Andrews (U. Toronto)
      General
      - Kobe Lab
