Prediction of peptidyl prolyl residues in cis/trans configuration using machine learning algorithms. Jae-Hyung Lee; Genetics, Development and Cell Biology; Bioinformatics and Computational Biology Program
What is a protein?
Proteins are polypeptide chains
Torsional angles: Although the peptide bond is planar and fixed, rotation can and does occur about the two single bonds on either side of the α carbon: Φ (phi), the bond between N and Cα, and Ψ (psi), the bond between Cα and C.
Two different peptide configurations: cis, ω = -20° to 20°; trans, ω = -180° to -160° or ω = 160° to 180°
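The ω ranges above translate directly into a small labelling rule. The following is a minimal sketch, not the author's code: the function name `classify_omega` and the "undefined" label for angles outside both ranges are assumptions made for illustration.

```python
def classify_omega(omega_deg: float) -> str:
    """Label a peptide bond from its omega torsion angle, given in degrees."""
    # Wrap the angle into [-180, 180) so inputs such as 350 are handled.
    omega = (omega_deg + 180.0) % 360.0 - 180.0
    if -20.0 <= omega <= 20.0:
        return "cis"
    if omega <= -160.0 or omega >= 160.0:
        return "trans"
    # Angles outside both ranges are not covered by the slide;
    # returning "undefined" is an assumption of this sketch.
    return "undefined"


if __name__ == "__main__":
    for angle in (0.0, 175.0, -170.0, 90.0):
        print(angle, classify_omega(angle))
```

Bonds labelled "undefined" would presumably be excluded from a training set, but the slides do not say how such cases were treated.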
Importance in isomerization of prolyl peptide bond
Datasets (1)
Datasets (2)
a. Window size: 15 and no ss information
   amino acid sequence information: K, A, I, I, S, E, N, P, C, I, K, H, Y, H, I
   class label: t
b. Window size: 15 and ss information
   amino acid sequence information: K, A, I, I, S, E, N, P, C, I, K, H, Y, H, I
   secondary structure information: E, E, C, C, C, C, C, C, E, E, E, E, E, C, C
   class label: t
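The two example instances above can be produced by sliding a window centred on each proline. This is a sketch under stated assumptions: the function name `proline_windows`, the "-" padding symbol, and the choice to pad windows that run past the chain ends are all hypothetical, since the slides do not say how terminal prolines were handled.

```python
def proline_windows(sequence, window_size=15, ss=None, pad="-"):
    """Return a list of (index, features) pairs, one per proline in `sequence`.

    `features` holds the residues in the window and, when `ss` is given,
    the secondary-structure states of the same positions appended after them.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the proline sits in the centre")
    if ss is not None and len(ss) != len(sequence):
        raise ValueError("ss must have one state per residue")
    half = window_size // 2
    # Padding at the chain ends is an assumption of this sketch.
    padded_seq = pad * half + sequence + pad * half
    padded_ss = (pad * half + ss + pad * half) if ss is not None else None
    windows = []
    for i, residue in enumerate(sequence):
        if residue != "P":
            continue
        features = list(padded_seq[i:i + window_size])
        if padded_ss is not None:
            features += list(padded_ss[i:i + window_size])
        windows.append((i, features))
    return windows


if __name__ == "__main__":
    seq = "KAIISENPCIKHYHI"   # sequence from the slide
    sec = "EECCCCCCEEEEECC"   # secondary structure from the slide
    print(proline_windows(seq, 15))
    print(proline_windows(seq, 15, ss=sec))
```

With the slide's sequence, the single proline at position 8 yields exactly the 15-residue window of example a, and passing the secondary-structure string yields the 30 features of example b.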
Naïve Bayes Classifier (1)
Naïve Bayes Classifier (2)
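The body of these two slides did not survive extraction, so the following is only a generic sketch of a naïve Bayes classifier over categorical window features, not the author's implementation. The class name `NaiveBayes`, the Laplace smoothing constant `alpha`, and the class labels "c" and "t" in the usage lines are assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes: each window position is an independent feature."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed, not from the slides)

    def fit(self, windows, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # (label, position) -> symbol counts
        self.vocab = [set() for _ in windows[0]]     # symbols seen at each position
        for features, label in zip(windows, labels):
            for pos, symbol in enumerate(features):
                self.feature_counts[(label, pos)][symbol] += 1
                self.vocab[pos].add(symbol)
        return self

    def log_posteriors(self, features):
        """Unnormalised log P(class) + sum of log P(symbol at position | class)."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for pos, symbol in enumerate(features):
                count = self.feature_counts[(label, pos)][symbol]
                # One extra slot in the denominator reserves mass for unseen symbols.
                denom = n + self.alpha * (len(self.vocab[pos]) + 1)
                score += math.log((count + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, features):
        scores = self.log_posteriors(features)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    train = [list("AP"), list("AP"), list("GP"), list("GP")]
    model = NaiveBayes().fit(train, ["c", "c", "t", "t"])
    print(model.predict(list("AP")), model.predict(list("GP")))
```

Working in log space avoids underflow when many per-position probabilities are multiplied, which matters for the wider windows and for windows that also carry secondary-structure features.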
Support Vector Machine (SVM) (1) [Figure: two classes of points, o and x, shown in the input space and after mapping to the feature space]
Support Vector Machine (SVM) (2) [Figure: A maximal margin hyperplane with its support vectors highlighted in the 2-dimensional feature space, separating classes o and x]
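The results slides report an SVM with a third-degree polynomial kernel. As a hedged illustration of what such a kernel computes on window data, the sketch below one-hot encodes a window and evaluates K(x, z) = (x·z + coef0)^3. The encoding, the function names `one_hot` and `poly_kernel`, and the value `coef0 = 1.0` are assumptions; the slides give neither the feature encoding nor the kernel parameters.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(window, alphabet=AMINO_ACIDS):
    """Encode each symbol as a 0/1 vector over `alphabet` and concatenate them."""
    vector = []
    for symbol in window:
        vector.extend(1.0 if symbol == letter else 0.0 for letter in alphabet)
    return vector


def poly_kernel(x, z, degree=3, coef0=1.0):
    """Polynomial kernel (x . z + coef0) ** degree; coef0 is an assumed value."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + coef0) ** degree


if __name__ == "__main__":
    x = one_hot("KAP")
    z = one_hot("KAG")
    print(poly_kernel(x, x), poly_kernel(x, z))
```

With one-hot vectors the dot product simply counts positions where two windows carry the same residue, so the kernel grows with the cube of the number of matching positions (plus the offset).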
Performance evaluation
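The results tables report Accuracy and CC. The definitions on this slide were lost in extraction, so the sketch below assumes that CC is the Matthews correlation coefficient computed from the binary confusion matrix, a common choice for two-class prediction tasks. The function names and the label "c" for the positive (cis) class are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred, positive="c"):
    """Return (tp, tn, fp, fn) treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def correlation_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient (assumed meaning of CC); 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    truth = list("cccctttt")
    guess = list("ccctttct")
    counts = confusion_counts(truth, guess)
    print(counts, accuracy(*counts), correlation_coefficient(*counts))
```

Unlike accuracy, this coefficient stays near zero for a classifier that always predicts the majority class, which makes it a useful companion measure.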
Result – Naïve Bayes Classifier (1)

Window size   No ss information      ss information
              Accuracy   CC          Accuracy   CC
3             0.596      0.190       0.628      0.276
5             0.604      0.204       0.635      0.296
7             0.599      0.195       0.637      0.302
9             0.606      0.210       0.629      0.282
11            0.602      0.201       0.625      0.270
13            0.598      0.194       0.618      0.254
15            0.606      0.209       0.614      0.245
17            0.602      0.202       0.610      0.235
19            0.601      0.199       0.609      0.229
21            0.602      0.201       0.610      0.230
Result – Naïve Bayes Classifier (2) [Figure panels: no ss information; ss information incorporated]
Result – SVM (1) Polynomial kernel – third degree

Window size   No ss information      ss information
              Accuracy   CC          Accuracy   CC
3             0.5826     0.1625      0.63       0.2594
5             0.575      0.1499      0.6233     0.246
7             0.5817     0.1613      0.653      0.305
9             0.5921     0.1815      0.6515     0.3021
11            0.6034     0.2036      0.66       0.3192
13            0.6047     0.2059      0.6631     0.3253
15            0.593      0.1821      0.6662     0.3312
17            0.6001     0.1966      0.6704     0.3399
19            0.6028     0.2027      0.6649     0.3289
21            0.6037     0.2053      0.6634     0.3258
Result – SVM (2) Polynomial kernel – third degree [Figure panels: no ss information; ss information incorporated]
Result – SVM (3) Window size 17 and ss information incorporated
Discussion (1)
Discussion (2) [Figure panels: Ligand: Phosphopeptide; Ligand: SH3 domain of ITK]
Summary
