Slideshow transcript
Slide 1: Outline of talk
● Introduction to protein kinases
● Prediction of substrate specificity
● Predikin and PredikinDB
● Evaluation
Neil Saunders, School of Molecular and Microbial Sciences, University of Queensland
Slide 2: Introduction to protein kinases
Biochemistry: protein-OH + ATP → protein-OPi + ADP (catalysed by a kinase)
● Two major (eukaryotic) types: (1) Ser/Thr; (2) Tyr
● ~2% of human genes encode a protein kinase
● At least 30-50% of human proteins are phosphorylated
● Protein kinases regulate essentially every cellular process
Slide 3: Complex signalling networks
How do protein kinases find their targets?
Slide 4: Kinase specificity – substrate recruitment
Figures: docking interactions; colocalisation (LOCATE: calcium/calmodulin-dependent protein kinase IV)
Substrate recruitment
● Any process that brings substrate to kinase
  - docking
  - binding to scaffolding protein(s)
  - colocalisation
  - coregulation
Remenyi et al. (2006) Docking interactions in protein kinase and phosphatase networks. Curr Opin Struct Biol 16: 676-685
Slide 5: Kinase specificity - peptide specificity
Amino acid frequency in substrate sequences at X{7}[ST]X{7} sites
Figure panels: CK-2, PKA, MAPK
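The per-position frequencies plotted on this slide can be sketched as follows. This is a minimal illustration, not the code used for the talk; the function names are hypothetical, and the lookahead pattern is one assumed way of collecting every overlapping X{7}[ST]X{7} window.

```python
import re
from collections import Counter

# Lookahead so that overlapping 15-residue windows are all reported.
SITE = re.compile(r"(?=([A-Z]{7}[ST][A-Z]{7}))")

def site_windows(seq):
    """Return every 15-residue window that has S or T at its centre."""
    return [m.group(1) for m in SITE.finditer(seq)]

def position_frequencies(windows):
    """Amino acid frequencies per window position.

    Index 0 is position -7, index 7 is the phosphosite, index 14 is +7.
    """
    if not windows:
        return [{} for _ in range(15)]
    counts = [Counter() for _ in range(15)]
    for window in windows:
        for i, aa in enumerate(window):
            counts[i][aa] += 1
    n = len(windows)
    return [{aa: c / n for aa, c in col.items()} for col in counts]
```

Pooling the windows from all known substrates of one kinase (for example PKA) and passing them to `position_frequencies` gives one frequency column per position.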
Slide 6: Structural basis for peptide specificity
Substrate heptapeptide binding to protein kinase A
Figures: PKA surface + heptapeptide RRASIHD; schematic of heptapeptide + PKA SDRs
Slide 7: Accurate location of key residues using HMMER
           *->YellkklGkGaFGkVylardkktgrlvAiKvik..........eril
              Y+++k+lG+G+FGkV+la+++ tg++vA+K+i+++ +++ + ri+
  snf1p   55  YQIVKTLGEGSFGKVKLAYHTTTGQKVALKIINkkvlaksdmqGRIE 101

              rEikiLkk.dHPNIVkLydvfed.dklylVmEyceGdlGdLfdllkkrgr
              rEi+ L+ +HP+I+kLydv+ ++d++ +V Ey+++ +Lfd++++r +
  snf1p  102  REISYLRLlRHPHIIKLYDVIKSkDEIIMVIEYAGN---ELFDYIVQRDK 148

              rglrkvlsE.earfyfrQilsaLeYLHsqgIiHRDLKPeNiLLds..hvK
              +sE+ear++f+Qi+sa+eY+H+++I+HRDLKPeN+LLd++ +vK
  snf1p  149  ------MSEqEARRFFQQIISAVEYCHRHKIVHRDLKPENLLLDEhlNVK 192

              laDFGlArql......ttfvGTpeYmAPEvl...gYgkpavDiWSlGcil
              +aDFGl+ ++++++ +t +G+p+Y APEv++++ Y +p+vD+WS+G+il
  snf1p  193  IADFGLSNIMtdgnflKTSCGSPNYAAPEVIsgkLYAGPEVDVWSCGVIL 242

              yElltGkpPFp..qldlifkkig..........SpeakdLikklLvkdPe
              y +l+++ PF+++ + ++fk+i ++ ++ ++ Sp a Lik++L ++P
  snf1p  243  YVMLCRRLPFDdeSIPVLFKNISngvytlpkflSPGAAGLIKRMLIVNPL 292

              kRlta.eaLedeldikaHPff<-*
              +R++++e+++ + +f
  snf1p  293  NRISIhEIMQ-------DDWF 306
Anchor positions: GkG, AiK, GdL, DFG, APE
Substrate heptapeptide, positions -3 to +3: X X X [ST] X X X
Slide 8: Predikin: components
● PredikinDB: database of phosphorylation sites
● Predikin.pm: Perl module to process kinases
● Web server
Slide 9: Why not phospho.ELM?
phospho.ELM is derived from SwissProt entries
  FT   MOD_RES   26   26   Phosphoserine (by PKC).
A phospho.ELM entry
  +------+-----------+--------+----+-------+------------+------+----------------------+
  |acc   |sequence   |position|code|pmids  |kinases     |source|entry_date            |
  +------+-----------+--------+----+-------+------------+------+----------------------+
  |P04083|AMVSEFLK...|20      |Y   |2457390|Abl;Src;EGFR|LTP   |2004-12-31 00:00:00+01|
  +------+-----------+--------+----+-------+------------+------+----------------------+
http://phospho.elm.eu.org
Problems
● Incorrect/missing accession numbers
● Phosphoresidues not at given positions
● Multiple kinase entries per substrate
● Inconsistent names for kinase families
● No way to link kinase name with kinase sequence
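Two of the listed problems (phosphoresidues not at the given positions, multiple kinases per entry) are easy to detect programmatically. The sketch below assumes an entry has already been read into Python values; the function names and the short test sequence are hypothetical and are not taken from phospho.ELM.

```python
def check_site(sequence, position, code):
    """True if the 1-based position lies inside the sequence and holds the expected residue."""
    if not 1 <= position <= len(sequence):
        return False
    return sequence[position - 1] == code

def split_kinases(field):
    """Split a semicolon-separated kinases field such as 'Abl;Src;EGFR' into names."""
    return [name.strip() for name in field.split(";") if name.strip()]

# Hypothetical five-residue sequence, used only to exercise the check.
example_ok = check_site("MAYSK", 3, "Y")    # residue 3 is Y
example_bad = check_site("MAYSK", 4, "Y")   # residue 4 is S, so the entry would be flagged
```

Entries that fail `check_site`, or for which `split_kinases` returns more than one name, are the ones that need manual resolution before a site can be linked to a single kinase sequence.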
Slide 10: PredikinDB construction
PredikinDB links phosphorylation sites to their specific kinase sequences
Substrate UniProt entry
  ID   IF2A_MOUSE
  AC   Q6ZWX6; Q3TIQ0;
  OS   Mus musculus (Mouse).
  FT   MOD_RES   49   49   Phosphoserine (by HRI) (By similarity).
  FT   MOD_RES   52   52   Phosphoserine (by EIF2AK3, GCN2, HRI and PKR).
Entries in table_kinases that match kinase name and species
  Q9Z2R9   EIF2AK1   Eif2ak1;Hri
  Q9Z2B5   EIF2AK3   Eif2ak3;Pek;Perk
  Q9QZ05   EIF2AK4   Eif2ak4;Gcn2;Kiaa1338
  Q03963   EIF2AK2   Eif2ak2;Pkr;Prkr;Tik
Entry in table_psites
  substrate_ac  residue  posn  hepta    conf  kinase_name  kinase_ac
  Q6ZWX6        S        49    ILLSELS  2     HRI          Q9Z2R9
  Q6ZWX6        S        52    SELSRRR  1     EIF2AK3      Q9Z2B5
  Q6ZWX6        S        52    SELSRRR  1     GCN2         Q9QZ05
  Q6ZWX6        S        52    SELSRRR  1     HRI          Q9Z2R9
  Q6ZWX6        S        52    SELSRRR  1     PKR          Q03963
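The mapping from FT MOD_RES lines to table_psites rows can be sketched as below. This is an illustrative reconstruction, not the PredikinDB build code: the regular expression, the function names and the rule "confidence 2 when the line says By similarity, otherwise 1" are assumptions inferred from the example rows on this slide.

```python
import re

FT_LINE = re.compile(
    r"^FT\s+MOD_RES\s+(\d+)\s+\d+\s+Phospho(serine|threonine|tyrosine)"
    r"\s+\(by ([^)]+)\)(.*)$"
)
RESIDUE = {"serine": "S", "threonine": "T", "tyrosine": "Y"}

# Rows copied from the slide: accession, name, synonyms.
TABLE_KINASES = [
    ("Q9Z2R9", "EIF2AK1", "Eif2ak1;Hri"),
    ("Q9Z2B5", "EIF2AK3", "Eif2ak3;Pek;Perk"),
    ("Q9QZ05", "EIF2AK4", "Eif2ak4;Gcn2;Kiaa1338"),
    ("Q03963", "EIF2AK2", "Eif2ak2;Pkr;Prkr;Tik"),
]

def parse_mod_res(line):
    """Return (residue, position, confidence, kinase_name) tuples for one FT line."""
    m = FT_LINE.match(line.strip())
    if m is None:
        return []
    position = int(m.group(1))
    residue = RESIDUE[m.group(2)]
    kinases = [k for k in re.split(r",\s*|\s+and\s+", m.group(3)) if k]
    # Assumed rule: "By similarity" annotations get the weaker confidence value 2.
    confidence = 2 if "By similarity" in m.group(4) else 1
    return [(residue, position, confidence, k) for k in kinases]

def find_kinase_ac(name, rows=TABLE_KINASES):
    """Case-insensitive match of a kinase name against names and synonyms."""
    wanted = name.lower()
    for ac, kinase_name, synonyms in rows:
        if wanted in (n.lower() for n in [kinase_name] + synonyms.split(";")):
            return ac
    return None
```

Applied to the two FT lines of IF2A_MOUSE, `parse_mod_res` yields one tuple for position 49 and four for position 52, and `find_kinase_ac` resolves each kinase name to the accession shown in table_psites.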
Slide 11: PredikinDB – table schema
Source UniProt fields are given in parentheses.
table_psites
  ID
  substrate_ac (AC)
  residue (MOD_RES)
  position (MOD_RES)
  hepta
  confidence (MOD_RES)
  kinase_name (MOD_RES)
  kinase_ac (AC)
table_kinases
  kinase_ac (AC)
  kinase_id (ID)
  domain
  domain_seq
  kinase_type
  kinase_name (GN Name)
  kinase_syn (GN Synonyms)
  panther_name
  panther_ac
  panther_evalue
  ksd_name
  ksd_ac
  ksd_evalue
  species (OS)
  kingdom (OC)
  + 38 SDR-related residues
table_substrates
  substrate_ac (AC)
  substrate_id (ID)
  species (OS)
  kingdom (OC)
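A cut-down version of this schema can be tried out in an in-memory SQLite database, as sketched below. This is an assumption-laden illustration: PredikinDB itself runs on MySQL, only a few columns are reproduced, GELp1 is the SDR column name that appears in the SQL query on slide 13, and the GELp1 values inserted here are made up for the example. SQLite has no built-in regular-expression operator, so a REGEXP function is registered by hand.

```python
import re
import sqlite3

def build_db():
    """Create a tiny in-memory stand-in for two PredikinDB tables."""
    conn = sqlite3.connect(":memory:")
    # "X REGEXP Y" calls the user function as regexp(Y, X): pattern first, value second.
    conn.create_function(
        "REGEXP", 2,
        lambda pattern, value: int(value is not None and re.search(pattern, value) is not None),
    )
    conn.executescript("""
        CREATE TABLE table_kinases (
            kinase_ac TEXT PRIMARY KEY, kinase_name TEXT, kinase_type TEXT, GELp1 TEXT);
        CREATE TABLE table_psites (
            substrate_ac TEXT, residue TEXT, position INTEGER, hepta TEXT,
            confidence INTEGER, kinase_name TEXT,
            kinase_ac TEXT REFERENCES table_kinases(kinase_ac));
    """)
    # GELp1 values below are hypothetical placeholders, not real SDR assignments.
    conn.executemany("INSERT INTO table_kinases VALUES (?, ?, ?, ?)", [
        ("Q9Z2R9", "EIF2AK1", "Ser/Thr", "E"),
        ("Q9Z2B5", "EIF2AK3", "Ser/Thr", "K"),
    ])
    conn.executemany("INSERT INTO table_psites VALUES (?, ?, ?, ?, ?, ?, ?)", [
        ("Q6ZWX6", "S", 49, "ILLSELS", 2, "HRI", "Q9Z2R9"),
        ("Q6ZWX6", "S", 52, "SELSRRR", 1, "EIF2AK3", "Q9Z2B5"),
    ])
    return conn

def heptapeptides_for(conn, kinase_type, gelp1_pattern):
    """Heptapeptides whose kinase has the given type and a GELp1 residue matching the pattern."""
    rows = conn.execute(
        "SELECT hepta FROM table_psites p JOIN table_kinases k ON p.kinase_ac = k.kinase_ac "
        "WHERE k.kinase_type = ? AND k.GELp1 REGEXP ?",
        (kinase_type, gelp1_pattern),
    )
    return [r[0] for r in rows]
```

Joining on kinase_ac is what lets a phosphorylation site be filtered by properties of the kinase sequence rather than by kinase name.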
Slide 12: The Predikin Perl module
Workflow (input: protein kinase sequence)
  - find catalytic domains
  - assign kinase type
  - locate SDRs
  - assign KSD family
  - assign PANTHER family
  - make kinase scoring matrix
  - find substrate XXX[STY]XXX sites
  - score XXX[STY]XXX sites
● External tools
  - HMMER + HMM libraries
  - pantherScore
  - DisEMBL, TMHMM (filters)
● Bioperl libraries (http://www.bioperl.org)
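The "find substrate XXX[STY]XXX sites" step can be illustrated with a short sketch. Predikin.pm is written in Perl; the Python below is only an assumed equivalent, and the function name is hypothetical. Sites with fewer than three flanking residues on either side are skipped, because no complete heptapeptide exists for them.

```python
import re

# Lookahead so that overlapping heptapeptides are all found.
HEPTA = re.compile(r"(?=([A-Z]{3}[STY][A-Z]{3}))")

def find_heptapeptides(seq):
    """Return (position, heptapeptide) pairs; position is the 1-based index of the S/T/Y."""
    return [(m.start() + 4, m.group(1)) for m in HEPTA.finditer(seq)]
```

For a substrate sequence, each returned heptapeptide is what the kinase scoring matrix is then applied to.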
Slide 13: Scoring matrices: SDR method
Query kinase: GEL+1 = E; GEL+3 = F; GEL+4 = S; Type = Ser/Thr
SQL query for heptapeptide position -3:
  select hepta from psites, kinases
  where kinase_type = 'Ser/Thr'
  and psites.kinase_ac = kinases.kinase_ac
  and GELp1 rlike '[DEN]'
  and GELp3 rlike '[FWY]'
  and GELp4 rlike '[ANST]'
Heptapeptides: QFSTVKG EQFSTVK RSVSEAA RSGSSPN RHDSGLD RRMSDEF ARGSFDA
Repeat for positions -2 to +3 and corresponding SDRs
Heptapeptides → Frequency matrix → PWM (weights) matrix → score substrates
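The path from heptapeptides to a frequency matrix, a weights matrix and a score can be sketched as below. Several simplifications are assumed: one set of heptapeptides is used for all seven columns (on the slide each position has its own query), the weights are plain log-odds against a uniform background with a pseudocount, and the resulting scores are not on the scale Predikin reports. None of these choices is claimed to be Predikin's actual formula.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def frequency_matrix(heptapeptides):
    """One Counter of residue counts per heptapeptide position (-3 to +3)."""
    return [Counter(h[i] for h in heptapeptides) for i in range(7)]

def weight_matrix(freqs, n, pseudocount=1.0, background=0.05):
    """Log2-odds weights per position; pseudocount and background are assumed values."""
    total = n + pseudocount * len(AMINO_ACIDS)
    return [
        {aa: math.log2(((col[aa] + pseudocount) / total) / background) for aa in AMINO_ACIDS}
        for col in freqs
    ]

def score(pwm, hepta):
    """Sum of position weights; residues outside the 20 standard ones contribute 0."""
    return sum(pwm[i].get(aa, 0.0) for i, aa in enumerate(hepta))

# Heptapeptides listed on the slide, used here as a single training set.
TRAINING = ["QFSTVKG", "EQFSTVK", "RSVSEAA", "RSGSSPN", "RHDSGLD", "RRMSDEF", "ARGSFDA"]
FREQS = frequency_matrix(TRAINING)
PWM = weight_matrix(FREQS, len(TRAINING))
```

With this training set, arginine dominates position -3, so candidate sites with R at -3 receive a higher weight in that column than sites with a residue never seen there.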
Slide 14: Scoring matrices: filters and cutoffs
  Residue  Phosphosites  Disordered (1)   TM helix (2)
  S        24 637        23 081           16
  T         5 405         4 898            5
  Y         4 285         3 318           12
  Total    34 327        31 297 (91.2%)   33 (0.1%)
(1) Most sites disordered (DisEMBL prediction)
(2) Most sites not in TM helix (TMHMM prediction)
Slide 15: Evaluation of Predikin
A brief area under ROC curve primer
Figure: score distributions for unannotated and annotated sites (TN, FN, FP, TP) and the resulting ROC curve
Outline of evaluation procedure
● Obtain kinase-substrate pairs from PredikinDB
● Construct scoring matrix for kinase (don't include its substrates)
● Score all XXX[ST]XXX sites in corresponding substrate
● Label sites as 1 (known, annotated) or 0 (unknown, unannotated)
● Generate AROC values using R package ROCR
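The AROC value for a set of scored and labelled sites can be sketched in a few lines. The talk used the R package ROCR; the Python function below is only a stand-in that computes the same quantity through the rank interpretation of the area under the ROC curve (the probability that an annotated site outscores an unannotated one, ties counted as half). The function name is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve for labels 1 (annotated) and 0 (unannotated)."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one annotated and one unannotated site")
    # Count annotated/unannotated pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A value of 1.0 means every annotated site scores above every unannotated site, and 0.5 corresponds to random ranking.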
Slide 16: Evaluation of comparable methods
Comparison with existing methods is not easy
● Existing tools take a substrate and score sites based on a kinase family
● Predikin takes kinase(s) + substrate(s) and scores sites based on kinase sequence
Problems to solve
● Determine the kinase families common to other tools
● Relate families to kinase sequences in PredikinDB
● Submit corresponding substrates to each server
  - (no API, standalone tools, web services...)
● Collate scored XXX[ST]XXX sites common to all methods
● Format data for AROC analysis
Example submission using HTML::Form (NetPhosK)
  use LWP::UserAgent;
  use HTML::Form;
  # $url must hold the address of the submission page
  # get the form
  my $ua = LWP::UserAgent->new;
  my $response = $ua->get($url);
  my @forms = HTML::Form->parse($response);
  # set the values
  $forms[0]->value('SEQSUB', "myfile.fa");
  $forms[0]->value('threshold', '0.00');
  # submit the form
  my $output = $ua->request($forms[0]->click);
  # parse output
Slide 17: Evaluation results
Ser/Thr kinase substrates
  Method      Score  Predikin SDR score  sites used +/-
  NetPhosK    0.93   0.90                75/5334
  KinasePhos  0.80   0.88                76/5663
  GPS         0.88   0.87                72/5307
  PPSP        0.92   0.87                75/3778
  Scansite    0.96   0.87                55/2936
CMGC kinase substrates
  Method      Score  Predikin SDR score  sites used +/-
  NetPhosK    0.62   0.96                211/9146
  KinasePhos  0.94   0.96                211/9106
  GPS         0.93   0.96                211/9146
  PPSP        0.94   0.96                208/8158
  Scansite    0.95   0.97                175/5039
● Predikin performance equals or exceeds that of existing methods
● Performance may depend on type of kinase
Slide 18: Usage cases
Substrates for CLA4
● A PAK/STE-20 kinase in S. cerevisiae
● Phosphorylates own activation loop T727?
● Evidence for this in literature
  kinase      substrate               score
  CLA4    1   CLA4      727  KRATMVG  92.93
  CLA4    1   YOL113W   541  KRATMVG  92.93
  CLA4    1   YHL021C   129  KGSSFVS  91.87
  CLA4    1   YKR010C   527  KRNSITE  91.70
  CLA4    1   YNL049C   526  RATSFFG  90.14
  CLA4    1   YDL056W   477  KRKSTTP  88.70
  CLA4    1   YOL157C   527  KLFSFTK  88.25
  CLA4    1   YBR198C   157  RAYSMLK  87.71
  CLA4    1   YML076C   878  HRESMTG  87.62
  CLA4    1   YOR181W   619  KRKTKVG  87.37
Kinases for acetyl CoA carboxylase
● Known phosphorylation site on S80
● Phosphorylated in AMPK knockout mice
● Suggested alternate kinases: IKK α/β
● Experimental evidence (Bruce Kemp)
  kinase            substrate           score
  NP_001547     1   COA1  80  SSMSGLH   85.49
  NP_001269     1   COA1  80  SSMSGLH   85.49
  XP_042066     1   COA1  80  SSMSGLH   75.77
  XP_001128827  1   COA1  80  SSMSGLH   75.77
  NP_001013725  1   COA1  80  SSMSGLH   74.72
  NP_004064     1   COA1  80  SSMSGLH   73.84
  NP_006613     1   COA1  80  SSMSGLH   73.84
  NP_001778     1   COA1  80  SSMSGLH   72.21
  XP_001128005  1   COA1  80  SSMSGLH   72.21
  NP_277021     1   COA1  80  SSMSGLH   72.21
Slide 19: The Predikin webserver: implementation
Architecture diagram: Client (browser); Server
Server components: Apache, MySQL, PHP, PredikinDB, HMMER, pantherScore, perl.so, Predikin.pm, BLAST, DisEMBL, TMHMM
http://predikin.biosci.uq.edu.au
Slide 20: The Predikin webserver: screenshots Kinase sequence submission
Slide 21: The Predikin webserver: screenshots Frequency and weight matrices
Slide 22: The Predikin webserver: screenshots Scored sites
Slide 23: Acknowledgements
Funding & advice (UQ)
● Bostjan Kobe (ARC, NHMRC)
● Thomas Huber
Predikin 1.0 (UQ)
● Ross Brinkworth
● Robert Breinl
Testing
● Bruce Kemp (SVI Melbourne)
● Brenda Andrews (U. Toronto)
General
● Kobe Lab


