Improved Prediction of Protein-Protein Binding Sites Using SVM
Presented by
Malavika Vidyarthi, Siddhant Gawsane
1001398623, 1001231597
BIOINFORMATICS PAPER
• The authors of this paper are James R. Bradford and David R. Westhead, research scholars at the School of Biochemistry and Molecular Biology, University of Leeds, UK
Contents
• Keywords
• Understanding Proteins, their Structure, Interactions and Patches
• Purpose of Paper - Introduction
• Dataset used
• Surface generation and interface definitions
• Properties of Protein Patches
• Why SVMs?
• What is an SVM and how does it work?
• Experimental set-up and Methodology
• Validation and Results
• References
Keywords
• Complexes
• Transient Interfaces
• Obligate Interfaces
• Docking Algorithms
• Residues
• Resolution of a Protein (higher resolution)
Proteins 101
• Chains of amino acids that fold into a stable, compact three-dimensional structure
• Required for the structure, function, and regulation of the body’s tissues and organs
Protein Structure [figure slide]
Protein Interactions
• Proteins often form temporary bonds with each other, called protein-protein interactions.
• Protein interactions are responsible for most processes in the body, for example:
1. Signal Transduction
2. Transport Across Membranes
3. Cell Metabolism
4. Muscle Contraction
• Protein surfaces can be split into 'patches'.
• These patches can be either interacting or non-interacting.
PROTEIN PATCHES [figure slide]
Purpose of the Paper
• Many protein structures are being produced whose functions are unknown.
• Identifying interacting surfaces gives important clues to the function of a protein.
• Binding sites share important properties that differentiate them from the rest of the protein.
• However, no single property makes an absolute distinction.
• Multiple physical-chemical properties are therefore combined for this purpose.
• The authors applied Support Vector Machines to predict these binding sites.
Dataset Used
• The authors manually produced their own high-quality, non-redundant dataset.
• A comprehensive set of complexes was chosen from the Protein Data Bank.
• The complexes were subjected to a number of stringent filters, including:
  ➢ Proteins sharing >20% sequence identity with higher-resolution proteins were eliminated
  ➢ Interfaces that resulted from crystal packing were eliminated, retaining only dimers
  ➢ Dimers containing <20% residue were eliminated
• A total of 180 proteins taken from 149 complexes survived the filtering process.
• Of the 180, 36 were involved in enzyme-inhibitor interactions, 27 in hetero-obligate interactions, 87 in homo-obligate interactions and 30 in non-enzyme-inhibitor transient (NEIT) interactions.
Surface Generation & Interface Definition
• All protein surfaces used were solvent-excluded surfaces.
• An atom became part of the interface if it lost >99% of its surface in complex formation.
• Surface Patch Generation – The radius of the sphere needed to produce the patch is calculated, and the sphere is centered on the surface vertex chosen to be the center of the patch.
• Because of the irregular topography of the protein surface, the sphere can capture one large surface region along with several small ones. Only the largest one is retained.
• Patch Size – The size of the patch is determined by the size of the interacting proteins in the complex and the size of the interface.
• Using linear regression, it was found that the interface size was equivalent to 13% of the smallest protein in the complex and 12% of the size of the parent protein.
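The patch-generation step described on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy coordinates, and the use of a simple distance threshold (`link_dist`) as a stand-in for surface-mesh connectivity are all hypothetical.

```python
import math
from collections import deque

def sphere_patch(vertices, center_idx, radius):
    """Indices of surface vertices lying within `radius` of the chosen centre vertex."""
    center = vertices[center_idx]
    return [i for i, v in enumerate(vertices) if math.dist(center, v) <= radius]

def largest_connected_region(vertices, patch, link_dist):
    """Keep only the largest connected group of patch vertices.

    Two vertices count as connected when they lie within `link_dist` of each
    other (an assumed stand-in for real surface-mesh adjacency).
    """
    unvisited = set(patch)
    best = []
    while unvisited:
        start = unvisited.pop()
        region, queue = [start], deque([start])
        while queue:
            cur = queue.popleft()
            near = [j for j in unvisited
                    if math.dist(vertices[cur], vertices[j]) <= link_dist]
            for j in near:
                unvisited.remove(j)
                region.append(j)
                queue.append(j)
        if len(region) > len(best):
            best = region
    return sorted(best)

# Toy surface: four vertices in a row, one isolated vertex, one far away.
verts = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (0, 4, 0), (10, 0, 0)]
patch = sphere_patch(verts, center_idx=0, radius=4.5)
region = largest_connected_region(verts, patch, link_dist=1.5)
print(patch, region)
```

In this toy case the sphere captures five vertices, but the isolated one is dropped because it does not belong to the largest connected region.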
Patches and Interfaces [figure slide]
Properties of Protein Patches
Every surface vertex is labelled with 6 surface properties. They are:
• Surface Shape – Two parameters called "Shape Index" and "Curvedness" are calculated.
  ➢ Shape Index – Describes the shape of the local surface at any given point and is independent of the scale of the surface.
  ➢ Curvedness – A measure of how strongly the surface is curved.
• Conservation – The rate of evolution among amino acids. A BLAST search was performed, the resulting homologous sequences were aligned with the query sequence, and a conservation score was determined.
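The slides do not give formulas for the two shape parameters. A common choice is Koenderink's definition in terms of the two principal curvatures, and the sketch below assumes that definition; the function names are illustrative, and which sign means "convex" depends on the surface-normal convention.

```python
import math

def shape_index(k1, k2):
    """Koenderink-style shape index in [-1, 1] from two principal curvatures.

    Returns None at a perfectly flat point (both curvatures zero), where the
    index is undefined.
    """
    k_max, k_min = max(k1, k2), min(k1, k2)
    if k_max == 0 and k_min == 0:
        return None
    return (2.0 / math.pi) * math.atan2(k_max + k_min, k_max - k_min)

def curvedness(k1, k2):
    """Magnitude of curvature: sqrt((k1^2 + k2^2) / 2)."""
    return math.sqrt((k1 * k1 + k2 * k2) / 2.0)

# Scaling both curvatures by 10 leaves the shape index unchanged
# but multiplies the curvedness by 10.
print(shape_index(1.0, 0.5), shape_index(10.0, 5.0))
print(curvedness(1.0, 0.5), curvedness(10.0, 5.0))
```

The example illustrates the point made on the slide: the shape index describes the kind of local shape regardless of scale, while curvedness captures how pronounced it is.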
• Electrostatic Potential – Essentially the charge on the protein. For example, a DNA-binding protein has a pocket of positive charge so that it can bind DNA, which carries negative charge from all its phosphates. The electrostatic potential of each individual protein was computed.
• Hydrophobicity – The tendency of non-polar substances to aggregate in an aqueous solution and exclude water molecules. The hydrophobic regions of each protein surface were determined.
• Residue and Interface Propensity – The interactions of all proteins in the entire protein family are calculated.
• Solvent Accessible Surface Area – The solvent accessible surface area for each atom on the protein was taken from previous calculations.
Why Support Vector Machines?
• SVMs demonstrate high prediction accuracy whilst avoiding over-fitting.
• They also handle large feature spaces and condense the information given by the training dataset using support vectors.
• SVMs have also been applied to other similar molecular biology applications such as gene expression classification, protein classification, protein fold recognition, prediction of protein solvent accessibility, etc.
How does SVM work?
• Classification – Separating data points in the given feature space with a hyperplane (the separator)
• Distance from example $x_i$ to the separator $w^T x + b = 0$ is $r = \frac{|w^T x_i + b|}{\lVert w \rVert}$
• Examples closest to the hyperplane are support vectors.
• Margin ρ of the separator is the distance between support vectors.
• It now reduces to a maximization problem, where ρ needs to be maximized
• For data points that aren't linearly separable, we use kernel methods
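As a small worked example of the distance and margin described on the SVM slide, the sketch below assumes the standard hard-margin formulation, in which the separator is the hyperplane w·x + b = 0 and the support vectors satisfy |w·x + b| = 1, so that the margin is 2/||w||. The symbols `w` and `b` and the toy numbers are assumptions; they do not appear in the slides.

```python
import math

def distance_to_separator(w, b, x):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm

def margin(w):
    """Margin width 2/||w||, assuming support vectors satisfy |w.x + b| = 1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

w, b = (3.0, 4.0), -5.0                       # toy separator, ||w|| = 5
d = distance_to_separator(w, b, (3.0, 4.0))   # |9 + 16 - 5| / 5
rho = margin(w)                               # 2 / 5
print(d, rho)
```

Because the margin is 2/||w||, maximizing ρ is the same as minimizing ||w|| subject to every training example lying on the correct side of the margin.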
The Experimental Set-up
• Kernel: Gaussian Radial Basis Function (RBF)
• Regularization parameter C = 1.0
• Kernel coefficient γ = 0.01
• Dataset of 180 patches hand-annotated as interacting or non-interacting
• Varied evaluation and validation techniques
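To make the kernel settings concrete, here is a minimal pure-Python sketch of the Gaussian RBF kernel with γ = 0.01 and of how a trained SVM would score a new patch. The support vectors, coefficients, and bias below are made-up toy values, not results from the paper, and the helper names are hypothetical.

```python
import math

GAMMA = 0.01  # kernel coefficient from the slide
C = 1.0       # regularization parameter from the slide (used in training, not in the kernel)

def rbf_kernel(x, z, gamma=GAMMA):
    """Gaussian RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

def decision_value(x, support_vectors, dual_coefs, bias, gamma=GAMMA):
    """f(x) = sum_i a_i * K(sv_i, x) + bias, where a_i = alpha_i * y_i.

    A positive value would be read as 'interacting', a negative one as
    'non-interacting'.
    """
    return sum(a * rbf_kernel(sv, x, gamma)
               for sv, a in zip(support_vectors, dual_coefs)) + bias

# Hypothetical toy model with one support vector per class.
svs = [(0.0, 0.0), (10.0, 0.0)]
coefs = [1.0, -1.0]
score = decision_value((0.0, 0.0), svs, coefs, bias=0.0)
print(rbf_kernel((0.0,), (10.0,)), score)
```

With a library such as scikit-learn, the equivalent settings would be `SVC(kernel="rbf", C=1.0, gamma=0.01)`; the slides do not say which SVM software was used.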
Methodology [figure slide]
Validation & Results
• Primary Validation: Leave-one-out cross-validation run 5 times
• Patch Creation Evaluation:
  ➢ Specificity: number of interface residues in patch / number of patch residues
  ➢ Sensitivity: number of interface residues in patch / number of interface residues
  ➢ Success if a patch with over 50% specificity and 20% sensitivity was ranked in the top three
• The method predicted the location of the interface on 76% (136/180) of the proteins in the dataset. In 60% (81/136) of these instances, a patch with over 50% specificity and 20% sensitivity was the top-ranked patch.
• P-value significance tests were used to show that this method performed at least twice as well as random sampling.
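The patch evaluation criteria above can be written out as a short sketch. The function names and the toy residue numbers are hypothetical, and treating both thresholds as strict ("over") is an assumption based on the wording of the slide.

```python
def patch_metrics(patch_residues, interface_residues):
    """Return (specificity, sensitivity) of a patch as defined on the slide."""
    patch, interface = set(patch_residues), set(interface_residues)
    overlap = len(patch & interface)
    specificity = overlap / len(patch) if patch else 0.0
    sensitivity = overlap / len(interface) if interface else 0.0
    return specificity, sensitivity

def is_success(ranked_patches, interface_residues,
               top_n=3, min_spec=0.5, min_sens=0.2):
    """True if any of the top_n ranked patches exceeds both thresholds."""
    for patch in ranked_patches[:top_n]:
        spec, sens = patch_metrics(patch, interface_residues)
        if spec > min_spec and sens > min_sens:
            return True
    return False

# Toy example: the true interface is residues 1..10.
interface = range(1, 11)
ranked = [
    [20, 21, 22, 23, 24, 25],   # rank 1: no overlap with the interface
    [1, 2, 3, 4, 30, 31],       # rank 2: 4 of 6 patch residues are interface
    [5, 40, 41, 42, 43, 44],    # rank 3: 1 of 6
]
print(patch_metrics(ranked[1], interface), is_success(ranked, interface))
```

Here the second-ranked patch has specificity 4/6 and sensitivity 4/10, so the prediction counts as a success under the top-three rule but would not count as a top-ranked success.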
• Other Dataset:
  ➢ Jones and Thornton achieved a success rate of 64% (30/47) with their patch analysis tool
  ➢ Our method achieved 72% (34/47) on the same dataset
• Secondary Heterogeneous Cross-validation: training the SVM on the proteins involved in obligate interactions and predicting on the transient (enzyme-inhibitor and NEIT) complex types gave a success rate of 64% [42/66; Table 2]; the reverse, predicting on obligate interfaces after training on transients, gave a success rate of 83% [95/114; Table 2]
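The reported percentages can be cross-checked against the dataset composition given on the Dataset slide. The short script below only redoes that arithmetic; the variable names are illustrative.

```python
# Dataset composition from the "Dataset Used" slide.
counts = {
    "enzyme-inhibitor": 36,
    "hetero-obligate": 27,
    "homo-obligate": 87,
    "NEIT": 30,
}
transient = counts["enzyme-inhibitor"] + counts["NEIT"]
obligate = counts["hetero-obligate"] + counts["homo-obligate"]

# (successes, total, reported percentage) for each result quoted in the slides.
reported = {
    "leave-one-out": (136, 180, 76),
    "top-ranked among successes": (81, 136, 60),
    "Jones and Thornton": (30, 47, 64),
    "this method, same dataset": (34, 47, 72),
    "train obligate, test transient": (42, transient, 64),
    "train transient, test obligate": (95, obligate, 83),
}

def percent(successes, total):
    """Success rate rounded to the nearest whole percent."""
    return round(100 * successes / total)

for name, (successes, total, quoted) in reported.items():
    print(f"{name}: {successes}/{total} = {percent(successes, total)}% (quoted {quoted}%)")
```

The denominators 66 and 114 in the heterogeneous cross-validation match the numbers of transient and obligate proteins in the dataset, and each quoted percentage agrees with its fraction after rounding.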
• Unbound Proteins: 10 unbound proteins that have >70% sequence identity with proteins in the dataset were selected
• All nine of these predictions reached >50% sensitivity, which suggested that our patch sizes, calculated as 6% of the whole protein surface, were an accurate estimate of interface size
• No patch was ranked below two and five were ranked first
• CAPRI (Critical Assessment of PRediction of Interactions): A significant prediction of the interface was made in 11 of the 15 cases where the P-value for random predictions was <0.25
References
• http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.911.9822&rep=rep1&type=pdf
• https://en.wikipedia.org/wiki/Radial_basis_function_kernel
Questions or Comments?
THANK YOU!!

Predicting protein binding sites using svm

  • 1. Improved Prediction of Protein-Protein binding sites using SVM. Presented by Malavika Vidyarthi and Siddhant Gawsane (1001398623, 1001231597)
  • 2. BIOINFORMATICS PAPER • The authors of this paper are James R. Bradford and David R. Westhead, research scholars at the School of Biochemistry and Molecular Biology, University of Leeds, UK
  • 3. Contents • Keywords • Understanding Proteins, their Structure, Interactions and Patches • Purpose of Paper - Introduction • Dataset used • Surface generation and interface definitions • Properties of Protein Patches • Why SVMs? • What is an SVM and how does it work? • Experimental set-up and Methodology • Validation and Results • References
  • 4. Keywords • Complexes • Transient Interfaces • Obligate Interfaces • Docking Algorithms • Residues • Resolution of a Protein (higher resolution)
  • 5. Proteins 101 • Complex Amino-acids that fold into a highly stable spherical structure • Structure, function, and regulation of the body’s tissues and organs
  • 7. Protein Interactions • Proteins often form temporary bonds with each other, called Protein-Protein Interactions • Protein interactions are responsible for most processes in your body: 1. Signal Transduction 2. Transport Across Membranes 3. Cell Metabolism 4. Muscle Contraction • Protein surfaces can be split into 'patches'. • These patches can either be interacting or non-interacting.
  • 9. Purpose of the Paper • Multiple protein-protein structures are produced, with unknown functions • By identifying interacting surfaces, important clues to the function of proteins can be determined • Binding sites share important properties, which differentiate them from the rest of the protein • However, no single property can make an absolute distinction • Multiple such physical-chemical properties are combined for this purpose • The authors have applied Support Vector Machines to predict these binding sites
  • 10. Dataset Used • The authors manually produced their own high-quality, non-redundant dataset. • A comprehensive set of complexes was chosen from the Protein Data Bank. • They were subjected to a number of stringent filters, such as the following: • Proteins sharing >20% surface identity with higher resolution proteins were eliminated • Interfaces that were a result of crystal packing were eliminated, retaining only dimers • Dimers containing <20% residue were eliminated • A total of 180 proteins taken from 149 complexes survived the filtering process. • Of the 180, 36 were involved in enzyme-inhibitor interactions, 27 in hetero-obligate interactions, 87 in homo-obligate interactions and 30 in non-enzyme-inhibitor transient (NEIT) interactions.
  • 11. Surface Generation & Interface Definition • All protein surfaces used were solvent excluded surfaces • An atom became a part of the surface if it lost >99% of its surface in complex formation • Surface Patch Generation – The radius of the sphere needed to produce the patch is calculated, and the sphere is placed on the surface vector chosen to be the center of the patch • Due to the irregular topography of the protein surface, a large surface connected by several small surfaces is generated. Only the largest one is retained. • Patch Size – The size of the patch is determined by the size of the interacting proteins in the complex and the size of the interface. • Using Linear Regression, it was found that the interface size was equivalent to 13% of the smallest protein in the complex and 12% of the size of the parent protein.
  • 13. Properties of Protein Patches • Every surface vertex is labelled with 6 surface properties. They are: • Surface Shape – Two parameters called "Shape Index" and "Curvedness" are calculated. • Shape Index – Describes the shape of the local surface at any given point and is independent of the scale of the surface. • Curvedness – The measure of the curvature of the surface. • Conservation – The rate of evolution among amino acids. A BLAST search was performed, the resulting homologous sequences and the query sequence were aligned, and the conservation score was determined.
  • 14. • Electrostatic Potential – It is basically the charge on the protein. For example, a DNA-binding protein has a pocket of positive charge so that it can bind with DNA, which has negative charge from all the phosphates. The electrostatic potential of each individual protein was computed. • Hydrophobicity – The tendency of non-polar substances to aggregate in an aqueous solution and exclude water molecules. All hydrophobic protein molecule surfaces were determined. • Residue and Interface Propensity – The interactions of all proteins in the entire protein family are calculated. • Solvent Accessible Surface Area – The solvent accessible surface area for each atom on the protein was taken from the previously calculated studies.
  • 15. Why Support Vector Machines? • SVMs demonstrate high prediction accuracy whilst avoiding over-fitting. • They also handle large feature spaces and condense the information given by the training dataset using support vectors. • SVMs have also been applied to other similar molecular biology applications such as gene expression classification, protein classification, protein fold recognition, prediction of protein solvent accessibility, etc.
  • 16. How does SVM work? • Classification – Visibly separating data points in the given feature space • Distance from example x_i (with label y_i) to the separator is r = y_i (w · x_i + b) / ||w|| • Examples closest to the hyperplane are support vectors. • Margin ρ of the separator is the distance between support vectors. • It now reduces to a maximization problem, where ρ needs to be maximized • For data points that aren't linearly separable, we use kernel methods
  • 17. How does SVM work?
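The geometry on the two SVM slides can be illustrated with a small sketch. It assumes the standard linear-SVM formulas: the distance from a labelled example to the hyperplane w · x + b = 0 is r = y (w · x + b) / ||w||, and for a separator scaled so that the support vectors satisfy y (w · x + b) = 1, the margin is ρ = 2 / ||w||. The weight vector, bias and points below are made up for illustration and do not come from the paper.

```python
import math

def signed_distance(w, b, x, y):
    """Distance r = y * (w . x + b) / ||w|| from labelled example (x, y) to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin(w):
    """Margin rho = 2 / ||w||, assuming support vectors satisfy y * (w . x + b) = 1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical 2-D separator and toy data (illustrative values only).
w, b = (1.0, 1.0), -3.0
points = [((1.0, 1.0), -1), ((2.0, 2.0), +1), ((3.0, 3.0), +1)]

distances = [signed_distance(w, b, x, y) for x, y in points]
rho = margin(w)

# The two closest examples, one per class, are the support vectors;
# their distances to the hyperplane add up to the margin.
for (x, y), r in zip(points, distances):
    print(x, y, round(r, 4))
print("margin:", round(rho, 4))
```

Training an SVM amounts to choosing w and b so that this margin is as large as possible while every example stays on the correct side.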
  • 18. The Experimental Set-up • Kernel: (Gaussian) Radial Basis Function • Regularization Parameter C = 1.0 • Kernel Coefficient γ = 0.01 • Dataset of 180 patches hand-annotated as interacting or non-interacting • Varied evaluation and validation techniques
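To make the kernel setting concrete, here is a minimal sketch of the Gaussian RBF kernel, k(x, z) = exp(-γ ||x - z||²), using the γ = 0.01 quoted on the slide. The feature vectors are placeholders, not patch properties from the paper, and the function name is an assumption for illustration.

```python
import math

GAMMA = 0.01  # kernel coefficient quoted on the slide

def rbf_kernel(x, z, gamma=GAMMA):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Placeholder feature vectors (illustrative only).
a = (0.0, 0.0)
b = (10.0, 0.0)

same = rbf_kernel(a, a)                 # identical points give similarity 1
far = rbf_kernel(a, b)                  # squared distance 100, so exp(-1)
far_high_gamma = rbf_kernel(a, b, 1.0)  # larger gamma: similarity decays much faster
```

A small γ keeps distant points similar, which gives a smoother decision boundary; a large γ makes the kernel very local. If the deck's experiments were run with scikit-learn (an assumption, suggested only by the wording of the editor's notes), the equivalent classifier would be `SVC(kernel="rbf", C=1.0, gamma=0.01)`.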
  • 20. Validation & Results • Primary Validation: Leave-one-out cross-validation run 5 times • Patch Creation Evaluation: • Specificity: number of interface residues in patch / number of patch residues • Sensitivity: number of interface residues in patch / number of interface residues • Success if a patch with over 50% specificity and 20% sensitivity was ranked in the top three • The method was able to predict the location of the interface on 76% (136/180) of the proteins in the dataset. In 60% (81/136) of these instances, a patch with over 50% specificity and 20% sensitivity was the top-ranked patch • P-value significance tests were used to show that this method performed at least twice as well as random sampling
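The patch evaluation criteria on this slide can be sketched as follows. The definitions of specificity, sensitivity and success are taken from the slide; the function names and the residue numbers are hypothetical, and patches are assumed to be non-empty and already ranked by the classifier.

```python
def patch_scores(patch_residues, interface_residues):
    """Return (specificity, sensitivity) of a patch as defined on the slide."""
    patch = set(patch_residues)
    interface = set(interface_residues)
    overlap = len(patch & interface)
    specificity = overlap / len(patch)      # interface residues in patch / patch residues
    sensitivity = overlap / len(interface)  # interface residues in patch / interface residues
    return specificity, sensitivity

def is_success(ranked_patches, interface_residues, top_n=3):
    """Success: a patch with >50% specificity and >20% sensitivity is ranked in the top three."""
    for patch in ranked_patches[:top_n]:
        spec, sens = patch_scores(patch, interface_residues)
        if spec > 0.5 and sens > 0.2:
            return True
    return False

# Hypothetical residue numbers, for illustration only.
interface = range(1, 11)                # 10 interface residues
patch_a = [1, 2, 3, 4, 5, 6, 20, 21]    # 6 of 8 residues lie in the interface
patch_b = [1, 30, 31, 32]               # 1 of 4 residues lies in the interface

spec_a, sens_a = patch_scores(patch_a, interface)
found = is_success([patch_b, patch_a], interface)
```

Here `patch_a` has specificity 0.75 and sensitivity 0.6, so the protein counts as a success even though the top-ranked patch misses the interface.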
  • 22. Validation & Results • Other Dataset: • Jones and Thornton achieved a success rate of 64% (30/47) with their patch analysis tool • Our method achieved 72% (34/47) on the same dataset • Secondary Heterogeneous Cross-validation: training the SVM on the proteins involved in obligate interactions and predicting on the transient (enzyme-inhibitor and NEIT) complex types gave a success rate of 64% [42/66; Table 2]; the reverse, predicting obligate interfaces after training on transients, gave a success rate of 83% [95/114; Table 2]
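As a quick consistency check, the percentages on this slide can be recomputed from the counts in parentheses. The sketch below also compares the cross-validation denominators with the class sizes from the dataset slide (36 enzyme-inhibitor plus 30 NEIT transient proteins, 27 hetero-obligate plus 87 homo-obligate proteins). The dictionary labels are descriptive names chosen here, not terms from the paper.

```python
def success_rate(successes, total):
    """Success rate as a percentage, rounded to the nearest integer."""
    return round(100 * successes / total)

# (successes, total) pairs as quoted on the slide.
results = {
    "Jones and Thornton, 47-protein dataset": (30, 47),
    "SVM method, same 47-protein dataset": (34, 47),
    "train on obligate, predict transient": (42, 66),
    "train on transient, predict obligate": (95, 114),
}
rates = {name: success_rate(s, n) for name, (s, n) in results.items()}

# Class sizes from the dataset slide.
n_transient = 36 + 30   # enzyme-inhibitor + NEIT
n_obligate = 27 + 87    # hetero-obligate + homo-obligate

for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

The denominators 66 and 114 match the transient and obligate class sizes, which is how the two cross-validation figures map onto the two training directions.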
  • 24. Validation & Results • Unbound Proteins: Selected 10 unbound proteins that have >70% sequence identity within the dataset • All nine of these predictions reached >50% sensitivity, which suggested that our patch sizes, calculated as 6% of the whole protein surface, were an accurate estimate of interface size • No patch was ranked below two and five were ranked first
  • 26. Validation & Results • Unbound Proteins: Selected 10 unbound proteins that have >70% sequence identity within the dataset • All nine of these predictions reached >50% sensitivity, which suggested that our patch sizes, calculated as 6% of the whole protein surface, were an accurate estimate of interface size • No patch was ranked below two and five were ranked first • CAPRI (Critical Assessment of PRediction of Interactions): A significant prediction of the interface was made in 11 of the 15 cases where the P-value for random predictions was <0.25

Editor's Notes

  1. https://jeremykun.files.wordpress.com/2017/06/svm_solve_by_hand-e1496076457793.gif?w=1800 gamma: Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. The higher the value of gamma, the more closely the model tries to fit the training data set exactly, which increases generalization error and can cause over-fitting.
  2. gamma: Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. The higher the value of gamma, the more closely the model tries to fit the training data set exactly, which increases generalization error and can cause over-fitting. C: Penalty parameter C of the error term. It controls the trade-off between a smooth decision boundary and classifying the training points correctly.
  3. https://en.wikipedia.org/wiki/Matthews_correlation_coefficient
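
For reference, the Matthews correlation coefficient linked in note 3 is computed from the confusion-matrix counts as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is a minimal implementation; the counts are invented for illustration and are not results from the paper. Returning 0.0 when the denominator is zero is a common convention, adopted here as an assumption.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: an undefined MCC (zero denominator) is reported as 0.0.
    return numerator / denominator if denominator else 0.0

# Hypothetical counts (illustrative only).
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)    # every prediction correct
chance = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)   # no better than random
inverted = matthews_corrcoef(tp=0, tn=0, fp=50, fn=50)   # every prediction wrong
```

The score ranges from -1 (total disagreement) through 0 (random) to +1 (perfect prediction), which makes it useful when interacting and non-interacting patches are unevenly represented.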