Use of PCA
(Principal Component Analysis)
InSilico Seminar Slides: Interpret PCA plots
Picture from http://www.nlpca.org/pca_principal_component_analysis.html
General information on PCA
[Figure: the data matrix X decomposed into score vectors t1, t2, … (score matrix T), loading vectors P1, P2, … (loading matrix P) and the residual E (noise).]
➢ Approximation of the data matrix: X = TPᵀ + E
➢ General steps of PCA:
* Pretreatment of data: scaling
* Calculate the covariance / correlation matrix
* Calculate the eigenvalues and eigenvectors
(PC1, PC2, … which constitute the loading matrix)
* Calculate the scores: T = X(Pᵀ)⁻¹, which equals XP because the loading vectors are orthonormal
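[Figure: scatter of two variables X1 and X2 centred on the origin (0, 0) and enclosed by an ellipse; the major axis is PC1 (labelled λ1, slope Q1) and the minor axis is PC2 (labelled λ2, slope Q2).]
• PC2 is orthogonal to PC1
• The eigenvalues (λ1 and λ2) determine the lengths of the major and minor axes of the ellipse
• Q1 (slope of the major axis): ratio of the elements of the eigenvector belonging to the largest eigenvalue, λ1
• Q2 (slope of the minor axis): ratio of the elements of the eigenvector belonging to the second-largest eigenvalue, λ2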
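The steps listed on this slide can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic two-variable data, not the code behind the slide; the function name `pca` and the toy data are assumptions made for the example.

```python
import numpy as np

def pca(X, scale=True):
    """PCA by eigen-decomposition of the covariance (or correlation) matrix."""
    X = np.asarray(X, dtype=float)
    # Pretreatment: mean-centre, optionally scale each column to unit variance
    Xc = X - X.mean(axis=0)
    if scale:
        Xc = Xc / Xc.std(axis=0, ddof=1)
    # Covariance matrix of the pretreated data (the correlation matrix if scaled)
    C = np.cov(Xc, rowvar=False)
    # Eigenvalues and eigenvectors; eigh returns them in ascending order
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, P = eigvals[order], eigvecs[:, order]
    # Scores: T = Xc P (P is orthonormal, so its inverse is its transpose)
    T = Xc @ P
    return T, P, eigvals

# Synthetic, correlated two-variable data (illustration only)
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + 0.3 * rng.normal(size=200)
X = np.column_stack([x1, x2])

T, P, eigvals = pca(X, scale=False)
explained = eigvals / eigvals.sum()
# Slopes of the ellipse axes: ratio of the elements of each eigenvector
q1 = P[1, 0] / P[0, 0]
q2 = P[1, 1] / P[0, 1]
print("explained variance:", explained)
print("slopes Q1, Q2:", q1, q2)
```

Because PC1 and PC2 are orthogonal, the two slopes multiply to -1, and the variance of each score column equals the corresponding eigenvalue.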
General information on PCA
➢ Generates a few informative plots, suitable for a data overview.
➢ PCA rotates the data points to capture maximum variability.
Use of PCA: outlier detection, prediction, classification, variable selection
List of articles considered…
➢ Conformational diversity
* Mapping the nucleotide and isoform-dependent structural and dynamical features of Ras proteins. Structure (2008), 16(6):885-896.
* The distinct conformational dynamics of K-Ras and H-Ras A59G. PLOS Computational Biology (2010), 6(9).
➢ Exploring enzyme-ligand interactions
* Exploration of enzyme-ligand interactions in CYP2D6 & 3A4 homology models and crystal structures using a novel computational approach. Journal of Chemical Information and Modeling (2007), 47(3):1234-1247.
➢ SAR of peptides
* Quantitative structure-activity relationship of peptides binding to the class II major histocompatibility complex molecule Aq associated with autoimmune arthritis. Journal of Medicinal Chemistry (2007), 50(9):2049-2059.
Conformational Diversity
Structure (2008),16(6):885-896 and PLOS Computational Biology (2010),6(9)
Provides a means to visualize the existence of distinct conformational groupings.
Conformational Diversity: Structural insight
Structure (2008),16(6):885-896 and PLOS Computational Biology (2010),6(9)
➢ GTPase H-Ras: a conformational switch involved in regulating cell division in response to growth factor stimulation.
➢ Experiment: to understand the conformational transition between the inactive GDP-bound and active GTP-bound states.
Structural insight:
➢ Based on mutational studies:
* Magenta residues: associated with a large number of cancers.
* Brown residues: associated with various cancers and developmental diseases.
➢ Ras catalytic domain: composed of a six-stranded central β-sheet surrounded by five α-helices (bottom left).
➢ The nucleotide binds to the conserved phosphate-binding loop (P-loop), shown in green in the figures.
➢ The two switch loop regions (switch 1, blue; switch 2, red) and the loop 3 region (orange) are also highlighted in the figures.
Conformational Diversity: PCA plot
Structure (2008),16(6):885-896 and PLOS Computational Biology (2010),6(9)
➢ Inter-conformer analysis: 46 chains from 41 H-Ras crystal structures, including both GDP- and GTP-bound forms.
➢ PCA was used to examine the major conformational differences between structures.
➢ Covariance matrix: built from the Cartesian coordinates of the aligned Cα atoms.
➢ Over 57.4% of the variance was captured in two dimensions (PC1 and PC2).
➢ The figure (left) shows the relationship between structures (conformational differences) captured by the first two PCs (PC1 and PC2).
[Figure: score plot of PC1 vs PC2 with two clusters labelled GDP and GTP.]
* Two major clusters are evident along PC1, corresponding to distinct GTP- and GDP-bound conformations, with the exception of PDB 6q21.
* GTP, GTP-analog and GDP structures with mutations in the P-loop or switch regions lie outside the GTP and GDP clusters.
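The analysis described above (covariance matrix of aligned Cα coordinates, projection onto PC1 and PC2) can be sketched as follows. The coordinates are synthetic and the two "states" are invented for illustration; only the number of chains (46) is taken from the slide, and the structures are assumed to be superposed already.

```python
import numpy as np

rng = np.random.default_rng(1)
n_structures, n_atoms = 46, 20   # 46 chains as on the slide; 20 atoms is a toy size

# Hypothetical, already-superposed C-alpha coordinates: two states that differ
# by a displacement of atoms 5-9, plus small coordinate noise
reference = rng.normal(scale=5.0, size=(n_atoms, 3))
shift = np.zeros((n_atoms, 3))
shift[5:10] = [2.0, 0.0, 0.0]
state = np.array([0] * 23 + [1] * 23)        # 0 = "GDP-like", 1 = "GTP-like"
coords = (reference + state[:, None, None] * shift
          + rng.normal(scale=0.2, size=(n_structures, n_atoms, 3)))

# One row per structure, one column per Cartesian coordinate (x1, y1, z1, x2, ...)
X = coords.reshape(n_structures, 3 * n_atoms)
Xc = X - X.mean(axis=0)

# Covariance matrix of the coordinates and its eigen-decomposition
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs[:, :2]                  # projection onto PC1 and PC2
captured = eigvals[:2].sum() / eigvals.sum()  # fraction of variance in two dimensions
print(f"variance captured by PC1+PC2: {captured:.1%}")
print("mean PC1 score per state:",
      scores[state == 0, 0].mean(), scores[state == 1, 0].mean())
```

In this toy ensemble the two states separate along PC1, which is the kind of clustering the score plot on the slide is read for.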
Conformational Diversity: Analysis
Structure (2008),16(6):885-896 and PLOS Computational Biology (2010),6(9)
➢ The contribution of each residue to the first three PCs is displayed in the figure (left).
➢ The Ras catalytic domain, with displacements scaled along the first PC (PC1), is shown in the figure (top).
➢ The height of each bar represents the relative displacement of each residue.
* Dominant feature described by PC1: displacement of the switch region.
* Dominant features described by PC2 and PC3: displacement of the switch region, the α3-β5 loop region and the β2-β3 loop.
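One common way to obtain a per-residue bar height from a coordinate-based PC is to take the length of each residue's (x, y, z) loading triple. The sketch below assumes that convention and uses a hand-made eigenvector; it is not taken from the cited papers.

```python
import numpy as np

def residue_contributions(pc):
    """Per-residue displacement magnitude for one PC of length 3 * n_residues."""
    xyz = np.asarray(pc, dtype=float).reshape(-1, 3)   # one (x, y, z) triple per residue
    return np.linalg.norm(xyz, axis=1)

# Toy unit-length PC for four residues: motion concentrated on residues 1 and 2
pc1 = np.array([0.0, 0.0, 0.0,
                0.6, 0.0, 0.0,
                0.0, 0.8, 0.0,
                0.0, 0.0, 0.0])
contrib = residue_contributions(pc1)
print(contrib)                 # one bar height per residue
print(int(contrib.argmax()))   # index of the residue that moves most
```

Since the eigenvector has unit length, the squared contributions sum to 1, so the bars can be compared directly between PCs.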
Enzyme-Ligand Interactions
Journal of Chemical Information and Modeling (2007), 47(3):1234-1247.
Separates protein structures on the basis of
the amino acids relevant for the interaction
with the ligand
Enzyme-Ligand Interactions : Introduction
Journal of Chemical Information and Modeling (2007), 47(3):1234-1247.
[Figure: flow chart of the experiment.]
➢ Aim of the experiment:
* Compare the homology model with the crystal structures.
* Identify the sites of interaction.
➢ Area of focus: consensus PCA (CPCA) and PCA performed on CYP3A4.
➢ Data set:
* Four structures
• PDB without inhibitor: 1TQN & 1WOE
• PDB with inhibitor: 2J0D (erythromycin)
• Homology model: J. Comput.-Aided Mol. Des. (2000), 14:93-116
* Compounds for the interaction study
• 25 opioid analgesics
• 15 well-known CYP3A4 inhibitors
Enzyme-Ligand Interactions : PCA based workflow
Journal of Chemical Information and Modeling (2007), 47(3):1234-1247.
Table: GRID probes used for calculation of molecular interaction fields (MIFs) in CPCA and in dockings with GLUE
Probe      Chemical group                      Used in
OH2        water                               CPCA/dockings
DRY        hydrophobic                         CPCA/dockings
H          neutral hydrogen                    dockings
N1         neutral flat NH (e.g., amide)       CPCA/dockings
N1/2/3+    sp3 amine cation                    CPCA/dockings
N:         sp3 N with lone pair                dockings
O          sp2 carbonyl oxygen                 CPCA/dockings
O-         sp2 phenolate oxygen                CPCA
O::        sp2 carboxy oxygen                  dockings
O1         alkyl hydroxy OH group              dockings
OC1        aromatic/aliphatic ether oxygen     dockings
Flowchart of the energy calculation: docking pose → define atom types and assign appropriate GRID probes → energy calculation → identify the amino acids responsible for the interaction with each probe → accumulate all energy values for each amino acid and docking pose → spreadsheet with energy values → PCA
Docking pose filter: poses within 6 Å of any atom of the heme were selected.
Enzyme-Ligand Interactions : CPCA based workflow
Journal of Chemical Information and Modeling (2007), 47(3):1234-1247.
[Figure: layout of the CPCA data. Blocks (Block-1 = Target 1, Block-2 = Target 2); the variables are the K grid points (X1,Y1,Z1) … (Xi,Yi,Zi), given for Probe 1, Probe 2, …, Probe n.]
➢ CPCA (consensus PCA): a two-level PCA (block level and super level).
➢ Super level: captures the influence of each probe on the whole model.
➢ The super level yields a super-weight matrix, which gives the contribution of each probe to the overall scores.
[Figure: workflow. Block level: PCA on each block (Block-1 = Target 1, Block-2 = Target 2) → extract the score matrix of each block → super level (consensus of blocks): PCA on the combined score matrix.]
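The two-level workflow drawn on this slide (PCA per block, then PCA on the combined score matrices) can be sketched as below. It follows the slide's flowchart literally and is a simplification, not the iterative CPCA algorithm used in chemometrics software; the blocks are random placeholders and are assumed to share the same rows (objects).

```python
import numpy as np

def pca_scores(X, n_components):
    """Scores and loadings of mean-centred X via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components], Vt[:n_components].T

rng = np.random.default_rng(2)
n_objects = 12
# Hypothetical blocks sharing the same rows (objects) but with different variables
blocks = {"block_1": rng.normal(size=(n_objects, 50)),
          "block_2": rng.normal(size=(n_objects, 80))}

# Block level: one PCA per block, keep a few score vectors from each
n_block_pcs = 3
block_scores = {name: pca_scores(X, n_block_pcs)[0] for name, X in blocks.items()}

# Super level: PCA on the combined (column-wise concatenated) block scores
combined = np.hstack(list(block_scores.values()))
super_scores, super_loadings = pca_scores(combined, 2)

# Share of each block in the super-level loadings (squared loadings sum to 1 per PC)
share = {name: (super_loadings[i * n_block_pcs:(i + 1) * n_block_pcs] ** 2).sum(axis=0)
         for i, name in enumerate(block_scores)}
print(share)
```

The `share` values play the role of the super weights here: they show how much each block contributes to each super-level component.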
Enzyme-Ligand Interactions : PCA based analysis
Journal of Chemical Information and Modeling (2007), 47(3):1234-1247.
[Figure: PCA results on energy calculations from dockings of opioid analgesics in the CYP3A4 homology model (▵) and crystal structures 2j0d (gray □), 1tqn (▪), and 1woe (*). (a) PCA score plot; (b) PCA loading plot.]
➢ The score plot and the loading plot are related:
* Variables that influence an observation are positioned in the same place in the loading plot as the observation in the score plot.
* Here, however, the variables are negative energies, so they appear at the same coordinates but with opposite signs in the loading plot.
➢ Along PC1 of the score plot, 1TQN and the homology model are best separated.
➢ Most discriminative interactions (loading plot):
* Homology model: Phe304, Thr309 and heme
* 1TQN: Arg212, Phe215, Ala370 and Glu374
* 2J0D: Ser119 and Phe304
* 1WOE: no pronounced differences in interactions compared with the other structures.
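The sign flip between score and loading plots for negative energies can be checked on a small synthetic example. The residue names and energy values below are invented for illustration; only the idea (favourable interactions are negative numbers) comes from the slide.

```python
import numpy as np

rng = np.random.default_rng(3)
variables = ["res_A", "res_B", "res_C"]        # hypothetical amino-acid columns
# Interaction energies (negative = favourable) for poses in two structures
struct_1 = np.column_stack([rng.normal(-5.0, 0.3, 10),   # strong interaction with res_A
                            rng.normal(0.0, 0.3, 10),
                            rng.normal(-1.0, 0.3, 10)])
struct_2 = np.column_stack([rng.normal(0.0, 0.3, 10),
                            rng.normal(-5.0, 0.3, 10),   # strong interaction with res_B
                            rng.normal(-1.0, 0.3, 10)])
X = np.vstack([struct_1, struct_2])
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores_pc1 = U[:, 0] * s[0]
loadings_pc1 = dict(zip(variables, Vt[0]))

print("mean PC1 score, structure 1:", scores_pc1[:10].mean())
print("mean PC1 score, structure 2:", scores_pc1[10:].mean())
print("PC1 loadings:", loadings_pc1)
```

Structure 1 interacts strongly with `res_A`, yet its PC1 score and the PC1 loading of `res_A` have opposite signs, because a strong interaction is a large negative value of that variable.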
Enzyme-Ligand Interactions : CPCA based analysis
Journal of Chemical Information and Modeling (2007), 47(3):1234-1247.
[Figure: CPCA on all CYP3A4 structures based on molecular interaction fields. (a) Super-weights plot describing the influence of the different probes. (b) PCA score plot showing the inter-correlation between the structures.]
Table: interactions found for each structure (columns: Homology; ligand complex 2J0D; ligand free 1TQN; ligand free 1WOE)
Common:
A305, E308, T309, R372   -   -
-   A370, M371   -
-   -   R212
L483   L483
I369   I369
-   F108, F213
F304, E374, G481, L482
Uncommon:
N104, R105, V111, T310, S312, V313, R372, Heme
R106, F241, G306, Y307, L373
D76, I120, D214, Q484
F215
* Along PC2 of the super-weights plot: the hydrophobic probe (DRY) differs from the rest.
* Along PC1 of the score plot: the homology model is separated from the crystal structures.
* Along PC2 of the score plot: the erythromycin-bound structure (2J0D) is separated from the substrate-free structures (1WOE and 1TQN).
SAR of peptides
Journal of Medicinal Chemistry (2007), 50(9):2049-2059.
Provides a means to study molecular property preferences for peptides binding to Aq
SAR of peptides: Structural insight
Journal of Medicinal Chemistry (2007), 50(9):2049-2059.
➢ Rheumatoid arthritis (RA), an autoimmune inflammatory disease, is linked to the major histocompatibility complex (MHC) class II molecules DR1 and DR4.
➢ The autoimmune response in RA is directed against type II collagen (CII).
➢ Animal model of RA: collagen-induced arthritis (CIA), linked to the mouse MHC class II molecule Aq.
➢ The octapeptide CII260-267 is required for binding to Aq and for inducing a T-cell response.
[Figure: peptide scaffold used to study molecular property preferences for peptide binding to Aq; one label reads "Aqueous solubility".]
SAR of peptides: Statistical Molecular Design (SMD)
Journal of Medicinal Chemistry (2007), 50(9):2049-2059.
➢ Amino acids indicated in red (Met, Ala, Thr) and green (Val, Ser) were chosen as building blocks for the variations at positions 1−3.
➢ The building blocks in blue (Arg, Asn, Tyr, Asp) and green (Val, Ser) were used for positions 4 and 5.
[Figure: score (a) and loading (b) plots resulting from PCA of the 20 coded amino acids described by 28 molecular descriptors. The legend groups the descriptors as: size; electronic property/polarity; lipophilicity; solubility; size/polarity; size/lipophilicity; H-bonding; shape & flexibility; flexibility; saturation. One group in the score plot is labelled "Aromatic".]
* The first three principal components (t1 to t3) described 65% of the variation.
* t1 separated the amino acids (Aa) by size: Gly & Ala have high scores, while Arg & Try have low scores.
* t2 separated them by lipophilicity and flexibility (a similar trend was observed for t3).
* Three groups of Aa could be distinguished in the t1 vs t2 score plot.
SAR of peptides: Library design
Journal of Medicinal Chemistry (2007), 50(9):2049-2059.
➢ Virtual library of 4500 peptides (5³ × 6²): generated by varying the selected amino acids (Aa) at five positions.
➢ D-optimal design was applied to reduce the size of the library to 22 peptides.
* It maximizes the volume spanned in the principal property space.
➢ Principal property space:
* Each Aa at the five altered positions was represented by the three values of the scaled principal properties (t1 to t3).
* Each peptide is therefore represented by 15 values, which together form the principal property space for the D-optimal design.
Data matrix: 4500 rows × 15 columns
Rows: Peptide-1 … Peptide-4500
Columns: Pos1-t1, Pos1-t2, Pos1-t3, …, Pos5-t1, Pos5-t2, Pos5-t3
[Figure: score plot extracted for the D-optimal design.]
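The construction of the 4500 × 15 matrix can be reproduced in a few lines. The building blocks per position are those named on the previous slide; the t1 to t3 values are random placeholders (not the published principal properties), and the subset search is a crude stand-in for a real D-optimal exchange algorithm.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
pos_1_to_3 = ["Met", "Ala", "Thr", "Val", "Ser"]            # 5 building blocks
pos_4_to_5 = ["Arg", "Asn", "Tyr", "Asp", "Val", "Ser"]     # 6 building blocks

# Placeholder principal properties (t1, t2, t3); NOT the published values
amino_acids = sorted(set(pos_1_to_3 + pos_4_to_5))
t_scores = {aa: rng.normal(size=3) for aa in amino_acids}

# Enumerate every peptide: 5^3 * 6^2 = 4500 combinations
peptides = list(itertools.product(pos_1_to_3, pos_1_to_3, pos_1_to_3,
                                  pos_4_to_5, pos_4_to_5))
# Each peptide becomes 5 positions x 3 principal properties = 15 values
X = np.array([np.concatenate([t_scores[aa] for aa in pep]) for pep in peptides])
print(X.shape)

def log_det_criterion(rows):
    """log det(X'X) of a candidate design; larger means more volume spanned."""
    D = X[rows] - X.mean(axis=0)
    sign, logdet = np.linalg.slogdet(D.T @ D)
    return logdet if sign > 0 else -np.inf

# Crude stand-in for a D-optimal search: best of many random 22-peptide subsets
candidates = [rng.choice(len(X), size=22, replace=False) for _ in range(200)]
best = max(candidates, key=log_det_criterion)
print([" ".join(peptides[i]) for i in best[:3]])
```

A dedicated design package would replace the random search with an exchange algorithm, but the criterion being maximized is the same determinant.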
SAR of peptides: Partial Least Squares (PLS) model
Journal of Medicinal Chemistry (2007), 50(9):2049-2059.
➢ Main contributors at positions 1 & 2:
* Small, rigid groups (positive weights for t1−t3)
➢ Main contributors at position 3:
* Flexible, hydrophobic groups preferred (negative weights for t2 and t3)
➢ Main contributors at position 4:
* Large, flexible groups (t1 & t3 negatively correlated with the response)
* H-bond donors/acceptors preferred (positive weight for t2)
➢ Main contributors at position 5:
* Large, flexible groups (similar to position 4)
* Hydrophobic groups preferred (negative weight for t2)
[Figure: PLS weight values (w × c) for the QSAR model based on 15 principal property values (t1−t3 at positions 1−5) and three biological responses represented as % inhibition at three different peptide concentrations (Y2: 250 μM, Y3: 83 μM, Y4: 28 μM).]
Discrimination
➢ t1: Size
➢ t2: Hydrophilicity
➢ t3: Rigidity
SAR of peptides: Scope of use
Journal of Medicinal Chemistry (2007), 50(9):2049-2059.
Scope of use: Let's discuss!
Protein sequence alignment: Nature Structural & Molecular Biology (1995), 2(2):171-178
➢ PCA was applied to multiple sequence alignments to identify possible functional residues.
➢ Each column in the alignment: a vector of 20 binary variables representing the absence/presence of each amino acid at that position.
Small-molecule statistical molecular design (SMD) based SAR analysis: Bioorganic & Medicinal Chemistry (2010), 18(7):2686-2703.
➢ PCA score vectors were extracted for the different substitutions (i.e. salicylic aldehydes and hydrazides).
➢ A hierarchical partial least squares (Hi-PLS) model was built to interpret the SAR.
Software that could be used: R language, PanelCheck, Cimpl2, Codessa, Canvas
Take home message…
➢ Use of PCA
* Discriminate between observations (e.g. active / inactive) based on the influence of different variables (e.g. descriptors)
* Data reduction: visualize high-dimensional data
* Generate informative plots
➢ Limitations of PCA
* Assumes linear relationships between variables
* Requires a preprocessing step
* Unsupervised method
➢ Points of caution
* Relate the score and loading plots to each other
• Observations with a high score on a given PC (principal component) are positively correlated with variables that have a high positive loading on that PC [beware of variables with negatively signed values (e.g. negative energies)]
Let your conscience be your guide: check your results against the raw data
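Why the preprocessing step matters can be seen by comparing PC1 with and without unit-variance scaling. The two descriptors below are synthetic and deliberately put on very different numerical scales; the helper name `first_loading` is an assumption for this sketch.

```python
import numpy as np

def first_loading(X, autoscale):
    """Loading vector of PC1 after mean-centring (and optional unit-variance scaling)."""
    Xc = X - X.mean(axis=0)
    if autoscale:
        Xc = Xc / Xc.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(5)
# Two hypothetical descriptors on very different numerical scales
big = rng.normal(400.0, 50.0, 100)                          # e.g. a mass-like descriptor
small = 0.01 * (big - 400.0) + rng.normal(0.0, 0.5, 100)    # related, but small in scale
X = np.column_stack([big, small])

raw = first_loading(X, autoscale=False)
scaled = first_loading(X, autoscale=True)
print("PC1 loadings, centred only:", raw)
print("PC1 loadings, autoscaled:  ", scaled)
```

Without scaling, PC1 is almost entirely the large-valued descriptor; after autoscaling both descriptors carry equal weight, so the choice of pretreatment changes what the plots show.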

More Related Content

What's hot

Numerical Simulation of Gaseous Microflows by Lattice Boltzmann Method
Numerical Simulation of Gaseous Microflows by Lattice Boltzmann MethodNumerical Simulation of Gaseous Microflows by Lattice Boltzmann Method
Numerical Simulation of Gaseous Microflows by Lattice Boltzmann MethodIDES Editor
 
ESQC 2013 Poster - Anders Christensen
ESQC 2013 Poster - Anders ChristensenESQC 2013 Poster - Anders Christensen
ESQC 2013 Poster - Anders ChristensenAnders S. Christensen
 
On selection of periodic kernels parameters in time series prediction
On selection of periodic kernels parameters in time series predictionOn selection of periodic kernels parameters in time series prediction
On selection of periodic kernels parameters in time series predictioncsandit
 
Compensation of Data-Loss in Attitude Control of Spacecraft Systems
 Compensation of Data-Loss in Attitude Control of Spacecraft Systems  Compensation of Data-Loss in Attitude Control of Spacecraft Systems
Compensation of Data-Loss in Attitude Control of Spacecraft Systems rinzindorjej
 
Signature PSO: A novel inertia weight adjustment using fuzzy signature for LQ...
Signature PSO: A novel inertia weight adjustment using fuzzy signature for LQ...Signature PSO: A novel inertia weight adjustment using fuzzy signature for LQ...
Signature PSO: A novel inertia weight adjustment using fuzzy signature for LQ...journalBEEI
 

What's hot (7)

Numerical Simulation of Gaseous Microflows by Lattice Boltzmann Method
Numerical Simulation of Gaseous Microflows by Lattice Boltzmann MethodNumerical Simulation of Gaseous Microflows by Lattice Boltzmann Method
Numerical Simulation of Gaseous Microflows by Lattice Boltzmann Method
 
01 05 j_chem_phys_123_074102
01 05 j_chem_phys_123_07410201 05 j_chem_phys_123_074102
01 05 j_chem_phys_123_074102
 
ESQC 2013 Poster - Anders Christensen
ESQC 2013 Poster - Anders ChristensenESQC 2013 Poster - Anders Christensen
ESQC 2013 Poster - Anders Christensen
 
On selection of periodic kernels parameters in time series prediction
On selection of periodic kernels parameters in time series predictionOn selection of periodic kernels parameters in time series prediction
On selection of periodic kernels parameters in time series prediction
 
แนวทางเชิงประจักษ์สำหรับการประมาณค่าความหนาแน่นและความหนืดของกรดไขมันเอทิลเอส...
แนวทางเชิงประจักษ์สำหรับการประมาณค่าความหนาแน่นและความหนืดของกรดไขมันเอทิลเอส...แนวทางเชิงประจักษ์สำหรับการประมาณค่าความหนาแน่นและความหนืดของกรดไขมันเอทิลเอส...
แนวทางเชิงประจักษ์สำหรับการประมาณค่าความหนาแน่นและความหนืดของกรดไขมันเอทิลเอส...
 
Compensation of Data-Loss in Attitude Control of Spacecraft Systems
 Compensation of Data-Loss in Attitude Control of Spacecraft Systems  Compensation of Data-Loss in Attitude Control of Spacecraft Systems
Compensation of Data-Loss in Attitude Control of Spacecraft Systems
 
Signature PSO: A novel inertia weight adjustment using fuzzy signature for LQ...
Signature PSO: A novel inertia weight adjustment using fuzzy signature for LQ...Signature PSO: A novel inertia weight adjustment using fuzzy signature for LQ...
Signature PSO: A novel inertia weight adjustment using fuzzy signature for LQ...
 

Viewers also liked

A new wavelet feature for fault diagnosis
A new wavelet feature for fault diagnosisA new wavelet feature for fault diagnosis
A new wavelet feature for fault diagnosisIAEME Publication
 
"Principal Component Analysis - the original paper" presentation @ Papers We ...
"Principal Component Analysis - the original paper" presentation @ Papers We ..."Principal Component Analysis - the original paper" presentation @ Papers We ...
"Principal Component Analysis - the original paper" presentation @ Papers We ...Adrian Florea
 
Basics of process fault detection and diagnostics
Basics of process fault detection and diagnosticsBasics of process fault detection and diagnostics
Basics of process fault detection and diagnosticsRahul Dey
 
PPT_Final_Presentation
PPT_Final_PresentationPPT_Final_Presentation
PPT_Final_PresentationSleeba Paul
 
Fault diagnosis notes
Fault diagnosis notesFault diagnosis notes
Fault diagnosis notesiteclearners
 
Fault Diagnosis Guide
Fault Diagnosis GuideFault Diagnosis Guide
Fault Diagnosis GuideMike Lacsa
 
5 data analysis case study
5  data analysis case study5  data analysis case study
5 data analysis case studyDmitry Grapov
 
Introduction to principal component analysis (pca)
Introduction to principal component analysis (pca)Introduction to principal component analysis (pca)
Introduction to principal component analysis (pca)Mohammed Musah
 
SIGNAL PROCESSING TECHNIQUES USED FOR GEAR FAULT DIAGNOSIS
SIGNAL PROCESSING TECHNIQUES USED FOR GEAR FAULT DIAGNOSISSIGNAL PROCESSING TECHNIQUES USED FOR GEAR FAULT DIAGNOSIS
SIGNAL PROCESSING TECHNIQUES USED FOR GEAR FAULT DIAGNOSISJungho Park
 
3 principal components analysis
3  principal components analysis3  principal components analysis
3 principal components analysisDmitry Grapov
 
Introduction to Wavelet Transform with Applications to DSP
Introduction to Wavelet Transform with Applications to DSPIntroduction to Wavelet Transform with Applications to DSP
Introduction to Wavelet Transform with Applications to DSPHicham Berkouk
 
FAULT DETECTION AND FAULT DIAGNOSIS
FAULT DETECTION AND FAULT DIAGNOSIS FAULT DETECTION AND FAULT DIAGNOSIS
FAULT DETECTION AND FAULT DIAGNOSIS Anand Kumar
 
Introduction to wavelet transform
Introduction to wavelet transformIntroduction to wavelet transform
Introduction to wavelet transformRaj Endiran
 

Viewers also liked (16)

Principal Component Analysis
Principal Component AnalysisPrincipal Component Analysis
Principal Component Analysis
 
A new wavelet feature for fault diagnosis
A new wavelet feature for fault diagnosisA new wavelet feature for fault diagnosis
A new wavelet feature for fault diagnosis
 
"Principal Component Analysis - the original paper" presentation @ Papers We ...
"Principal Component Analysis - the original paper" presentation @ Papers We ..."Principal Component Analysis - the original paper" presentation @ Papers We ...
"Principal Component Analysis - the original paper" presentation @ Papers We ...
 
Basics of process fault detection and diagnostics
Basics of process fault detection and diagnosticsBasics of process fault detection and diagnostics
Basics of process fault detection and diagnostics
 
PPT_Final_Presentation
PPT_Final_PresentationPPT_Final_Presentation
PPT_Final_Presentation
 
Fault diagnosis notes
Fault diagnosis notesFault diagnosis notes
Fault diagnosis notes
 
Fault Diagnosis Guide
Fault Diagnosis GuideFault Diagnosis Guide
Fault Diagnosis Guide
 
5 data analysis case study
5  data analysis case study5  data analysis case study
5 data analysis case study
 
Introduction to principal component analysis (pca)
Introduction to principal component analysis (pca)Introduction to principal component analysis (pca)
Introduction to principal component analysis (pca)
 
SIGNAL PROCESSING TECHNIQUES USED FOR GEAR FAULT DIAGNOSIS
SIGNAL PROCESSING TECHNIQUES USED FOR GEAR FAULT DIAGNOSISSIGNAL PROCESSING TECHNIQUES USED FOR GEAR FAULT DIAGNOSIS
SIGNAL PROCESSING TECHNIQUES USED FOR GEAR FAULT DIAGNOSIS
 
3 principal components analysis
3  principal components analysis3  principal components analysis
3 principal components analysis
 
Introduction to Wavelet Transform with Applications to DSP
Introduction to Wavelet Transform with Applications to DSPIntroduction to Wavelet Transform with Applications to DSP
Introduction to Wavelet Transform with Applications to DSP
 
Pca ppt
Pca pptPca ppt
Pca ppt
 
FAULT DETECTION AND FAULT DIAGNOSIS
FAULT DETECTION AND FAULT DIAGNOSIS FAULT DETECTION AND FAULT DIAGNOSIS
FAULT DETECTION AND FAULT DIAGNOSIS
 
Introduction to wavelet transform
Introduction to wavelet transformIntroduction to wavelet transform
Introduction to wavelet transform
 
07. PCA
07. PCA07. PCA
07. PCA
 

Similar to PCA-CompChem_seminar

Data Mining Protein Structures' Topological Properties to Enhance Contact Ma...
Data Mining Protein Structures' Topological Properties  to Enhance Contact Ma...Data Mining Protein Structures' Topological Properties  to Enhance Contact Ma...
Data Mining Protein Structures' Topological Properties to Enhance Contact Ma...jaumebp
 
Asymptotic features of Hessian Matrix in Receding Horizon Model Predictive Co...
Asymptotic features of Hessian Matrix in Receding Horizon Model Predictive Co...Asymptotic features of Hessian Matrix in Receding Horizon Model Predictive Co...
Asymptotic features of Hessian Matrix in Receding Horizon Model Predictive Co...TELKOMNIKA JOURNAL
 
Multisite UTE 31P Rosette MRSI(PETALUTE)
Multisite UTE 31P Rosette MRSI(PETALUTE)Multisite UTE 31P Rosette MRSI(PETALUTE)
Multisite UTE 31P Rosette MRSI(PETALUTE)Uzay Emir
 
Graphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials ScienceGraphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials Scienceaimsnist
 
An accessibility-incorporated method for accurate prediction of RNA-RNA inter...
An accessibility-incorporated method for accurate prediction of RNA-RNA inter...An accessibility-incorporated method for accurate prediction of RNA-RNA inter...
An accessibility-incorporated method for accurate prediction of RNA-RNA inter...Richard Schäfer
 
ISFragkopoulos - Seminar on Electrochemical Promotion
ISFragkopoulos - Seminar on Electrochemical PromotionISFragkopoulos - Seminar on Electrochemical Promotion
ISFragkopoulos - Seminar on Electrochemical PromotionIoannis S. Fragkopoulos
 
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...IJRESJOURNAL
 
A New Enhanced Method of Non Parametric power spectrum Estimation.
A New Enhanced Method of Non Parametric power spectrum Estimation.A New Enhanced Method of Non Parametric power spectrum Estimation.
A New Enhanced Method of Non Parametric power spectrum Estimation.CSCJournals
 
DISTRIBUTION LOAD FLOW ANALYSIS FOR RDIAL & MESH DISTRIBUTION SYSTEM
DISTRIBUTION LOAD FLOW ANALYSIS FOR RDIAL & MESH DISTRIBUTION SYSTEMDISTRIBUTION LOAD FLOW ANALYSIS FOR RDIAL & MESH DISTRIBUTION SYSTEM
DISTRIBUTION LOAD FLOW ANALYSIS FOR RDIAL & MESH DISTRIBUTION SYSTEMIAEME Publication
 
A novel particle swarm optimization for papr reduction of ofdm systems
A novel particle swarm optimization for papr reduction of ofdm systemsA novel particle swarm optimization for papr reduction of ofdm systems
A novel particle swarm optimization for papr reduction of ofdm systemsaliasghar1989
 
CBE_Symposium_Poster_Aparajita - sjp
CBE_Symposium_Poster_Aparajita - sjpCBE_Symposium_Poster_Aparajita - sjp
CBE_Symposium_Poster_Aparajita - sjpAparajita Dasgupta
 
CONFERENCE POSTER v2(1)
CONFERENCE POSTER v2(1)CONFERENCE POSTER v2(1)
CONFERENCE POSTER v2(1)Amber Harding
 
Optimal population size of particle swarm optimization for photovoltaic syst...
Optimal population size of particle swarm optimization for  photovoltaic syst...Optimal population size of particle swarm optimization for  photovoltaic syst...
Optimal population size of particle swarm optimization for photovoltaic syst...IJECEIAES
 
2017 - Plausible Bioindicators of Biological Nitrogen Removal Process in WWTPs
2017 - Plausible Bioindicators of Biological Nitrogen Removal Process in WWTPs2017 - Plausible Bioindicators of Biological Nitrogen Removal Process in WWTPs
2017 - Plausible Bioindicators of Biological Nitrogen Removal Process in WWTPsWALEBUBLÉ
 
consensus superiority of the pharmacophore based alignment, over maximum comm...
consensus superiority of the pharmacophore based alignment, over maximum comm...consensus superiority of the pharmacophore based alignment, over maximum comm...
consensus superiority of the pharmacophore based alignment, over maximum comm...Deepak Rohilla
 
Poster presentat a les jornades doctorals de la UAB
Poster presentat a les jornades doctorals de la UABPoster presentat a les jornades doctorals de la UAB
Poster presentat a les jornades doctorals de la UABElisabeth Ortega
 
LSSC2011 Optimization of intermolecular interaction potential energy paramete...
LSSC2011 Optimization of intermolecular interaction potential energy paramete...LSSC2011 Optimization of intermolecular interaction potential energy paramete...
LSSC2011 Optimization of intermolecular interaction potential energy paramete...Dragan Sahpaski
 

Similar to PCA-CompChem_seminar (20)

Swaati pro sa web
Swaati pro sa webSwaati pro sa web
Swaati pro sa web
 
Data Mining Protein Structures' Topological Properties to Enhance Contact Ma...
Data Mining Protein Structures' Topological Properties  to Enhance Contact Ma...Data Mining Protein Structures' Topological Properties  to Enhance Contact Ma...
Data Mining Protein Structures' Topological Properties to Enhance Contact Ma...
 
Asymptotic features of Hessian Matrix in Receding Horizon Model Predictive Co...
Asymptotic features of Hessian Matrix in Receding Horizon Model Predictive Co...Asymptotic features of Hessian Matrix in Receding Horizon Model Predictive Co...
Asymptotic features of Hessian Matrix in Receding Horizon Model Predictive Co...
 
acs.jpca.9b08723.pdf
acs.jpca.9b08723.pdfacs.jpca.9b08723.pdf
acs.jpca.9b08723.pdf
 
Multisite UTE 31P Rosette MRSI(PETALUTE)
Multisite UTE 31P Rosette MRSI(PETALUTE)Multisite UTE 31P Rosette MRSI(PETALUTE)
Multisite UTE 31P Rosette MRSI(PETALUTE)
 
MLMM_16_08_2022.pdf
MLMM_16_08_2022.pdfMLMM_16_08_2022.pdf
MLMM_16_08_2022.pdf
 
Graphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials ScienceGraphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials Science
 
An accessibility-incorporated method for accurate prediction of RNA-RNA inter...
An accessibility-incorporated method for accurate prediction of RNA-RNA inter...An accessibility-incorporated method for accurate prediction of RNA-RNA inter...
An accessibility-incorporated method for accurate prediction of RNA-RNA inter...
 
ISFragkopoulos - Seminar on Electrochemical Promotion
ISFragkopoulos - Seminar on Electrochemical PromotionISFragkopoulos - Seminar on Electrochemical Promotion
ISFragkopoulos - Seminar on Electrochemical Promotion
 
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
 
A New Enhanced Method of Non Parametric power spectrum Estimation.
A New Enhanced Method of Non Parametric power spectrum Estimation.A New Enhanced Method of Non Parametric power spectrum Estimation.
A New Enhanced Method of Non Parametric power spectrum Estimation.
 
DISTRIBUTION LOAD FLOW ANALYSIS FOR RDIAL & MESH DISTRIBUTION SYSTEM
DISTRIBUTION LOAD FLOW ANALYSIS FOR RDIAL & MESH DISTRIBUTION SYSTEMDISTRIBUTION LOAD FLOW ANALYSIS FOR RDIAL & MESH DISTRIBUTION SYSTEM

PCA-CompChem_seminar

  • 1. Use of PCA (Principal Component Analysis)
       InSilico Seminar Slides: Interpret PCA plots
       Picture from http://www.nlpca.org/pca_principal_component_analysis.html
  • 2. General information on PCA
       Diagram: data matrix X, T (Scoring Matrix, columns t1, t2, ...), P (Loading Matrix, rows P1, P2, ...) and E (Noise).
       - Approximation of the data matrix: X = TP + E
       - General steps of PCA:
         * Pretreat the data (scaling).
         * Calculate the covariance or correlation matrix.
         * Calculate its eigenvalues and eigenvectors (PC1, PC2, ..., which constitute the loading matrix).
         * Calculate the scores: T = X P^T (the loadings are orthonormal, so P P^T = I).
       Plot: ellipse of data points on axes X1 and X2, with principal axes PC1 and PC2 (labels λ1, λ2, Q1, Q2).
       - PC2 is orthogonal to PC1.
       - The eigenvalues (λ1 and λ2) determine the lengths of the major and minor axes of the ellipse.
       - Q1 (slope of the major axis): ratio of the elements of the eigenvector belonging to the largest eigenvalue, λ1.
       - Q2 (slope of the minor axis): ratio of the elements of the eigenvector belonging to the second-largest eigenvalue, λ2.
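The general steps on slide 2 can be traced by hand for the two-variable case drawn there. The following is a minimal sketch, not the presenter's code: the function name `pca_2d` and the four data points are made up for illustration, only mean-centring is used as pretreatment, and the closed-form eigen-decomposition works only for a 2x2 covariance matrix.

```python
import math

def pca_2d(points):
    """PCA for two variables (X1, X2) via the closed-form eigen-decomposition
    of the 2x2 covariance matrix. Returns (eigenvalues, loadings, scores)."""
    n = len(points)
    # Pretreatment: mean-centre each variable (variance scaling is omitted here).
    mean1 = sum(p[0] for p in points) / n
    mean2 = sum(p[1] for p in points) / n
    centred = [(p[0] - mean1, p[1] - mean2) for p in points]

    # Covariance matrix [[a, b], [b, c]] with the usual n - 1 denominator.
    a = sum(x * x for x, _ in centred) / (n - 1)
    b = sum(x * y for x, y in centred) / (n - 1)
    c = sum(y * y for _, y in centred) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (a + c) / 2.0
    root = math.hypot((a - c) / 2.0, b)
    eigenvalues = (half_trace + root, half_trace - root)

    # Eigenvector for lambda1: (b, lambda1 - a) solves (C - lambda1*I) v = 0.
    if abs(b) > 1e-12:
        v1 = (b, eigenvalues[0] - a)
    else:  # uncorrelated variables: the PCs coincide with the original axes
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(v1[0], v1[1])
    pc1 = (v1[0] / norm, v1[1] / norm)
    pc2 = (-pc1[1], pc1[0])  # PC2 is orthogonal to PC1
    loadings = (pc1, pc2)    # rows of the loading matrix P

    # Scores T = X P^T: project each centred observation onto PC1 and PC2.
    scores = [tuple(x * pc[0] + y * pc[1] for pc in loadings) for x, y in centred]
    return eigenvalues, loadings, scores

# Made-up data lying exactly on the line X2 = 2 * X1.
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
eigenvalues, loadings, scores = pca_2d(points)
q1 = loadings[0][1] / loadings[0][0]  # slope of the major axis (Q1)
```

Because the toy points are perfectly collinear, all variance falls on PC1, the second eigenvalue is zero, and Q1 equals the slope of the line. For more than two variables a numerical eigen-solver would be needed instead of the closed form.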
  • 3. General information on PCA
       - PCA generates a few informative plots, suitable for a data overview.
       - PCA rotates the data points to capture maximum variability.
       Use of PCA: outlier detection, prediction, classification, variable selection.
  • 4. List of articles considered…
       - Conformation Diversity
         * Mapping the nucleotide and isoform-dependent structural and dynamical features of Ras proteins. Structure (2008), 16(6):885-896.
         * The distinct conformational dynamics of K-Ras and H-Ras A59G. PLOS Computational Biology (2010), 6(9).
       - Explore Enzyme-Ligand Interactions
         * Exploration of enzyme-ligand interactions in CYP2D6 & 3A4 homology models and crystal structures using a novel computational approach. Journal of Chemical Information and Modeling (2007), 47(3):1234-1247.
       - SAR of peptides
         * Quantitative structure-activity relationship of peptides binding to the class II major histocompatibility complex molecule Aq associated with autoimmune arthritis. Journal of Medicinal Chemistry (2007), 50(9):2049-2059.
  • 5. Conformational Diversity
       Structure (2008), 16(6):885-896 and PLOS Computational Biology (2010), 6(9)
       Provide means to visualize the existence of distinct conformational groupings.
  • 6. Conformational Diversity: Structural insight
       Structure (2008), 16(6):885-896 and PLOS Computational Biology (2010), 6(9)
       - GTPase H-Ras: a conformational switch involved in regulating cell division in response to growth factor stimulation.
       - Experiment: to understand the conformational transition between the inactive GDP-bound and active GTP-bound states.
       Structural insight:
       - Based on mutational studies:
         * Magenta residues: associated with a large number of cancers.
         * Brown residues: associated with various cancers and developmental diseases.
       - Ras catalytic domain: composed of a six-stranded central β-sheet surrounded by five α-helices (bottom left).
       - The nucleotide binds to the conserved phosphate-binding loop (P-loop), shown in green in the figures.
       - Two switch loop regions (switch 1, blue, and switch 2, red) and the loop 3 region (orange) are also highlighted in the figures.
  • 7. Conformational Diversity: PCA plot
       Structure (2008), 16(6):885-896 and PLOS Computational Biology (2010), 6(9)
       - Inter-conformer analysis: 46 chains from 41 H-Ras crystal structures, including both GDP- and GTP-bound forms.
       - PCA was used to examine the major conformational differences between the structures.
       - Covariance matrix: built from the Cartesian coordinates of the aligned Cα atoms.
       - Over 57.4% of the variance was captured in two dimensions (PC1 and PC2).
       - The figure (left) shows the relationship between the structures (conformational differences) captured by the first two PCs (PC1 and PC2); the clusters are labelled GDP and GTP.
         * Two major clusters are evident along PC1, corresponding to distinct GTP-bound and GDP-bound conformations, with the exception of PDB 6q21.
         * The GTP, GTP-analog and GDP structures with mutations in the P-loop or switch regions lie outside the GTP and GDP clusters.
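The "variance captured in two dimensions" on slide 7 is the sum of the two largest eigenvalues of the covariance matrix divided by the sum of all eigenvalues. The sketch below illustrates that arithmetic and how one aligned structure becomes one row of the data matrix; the helper names and the toy numbers are hypothetical and are not taken from the cited papers.

```python
def flatten_coordinates(ca_atoms):
    """Turn [(x, y, z), ...] for one aligned structure into one row of 3N values."""
    return [value for atom in ca_atoms for value in atom]

def cumulative_variance_fraction(eigenvalues, n_components):
    """Fraction of the total variance captured by the n_components largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:n_components]) / sum(ordered)

# Hypothetical two-atom structure: one data-matrix row with 3 * 2 = 6 variables.
row = flatten_coordinates([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)])

# Hypothetical eigenvalues of a covariance matrix (illustration only).
toy_eigenvalues = [4.0, 2.0, 1.0, 1.0]
fraction = cumulative_variance_fraction(toy_eigenvalues, 2)  # (4 + 2) / 8
```

With real data each of the 46 chains would contribute one such row, and the eigenvalues would come from the covariance matrix of those rows.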
  • 8. Conformational Diversity: Analysis
       Structure (2008), 16(6):885-896 and PLOS Computational Biology (2010), 6(9)
       - The contribution of each residue to the first three PCs is displayed in the figure (left).
       - The Ras catalytic domain, with displacements scaled along the first PC (PC1), is shown in the figure (top).
       - The height of each bar represents the relative displacement of each residue.
         * Dominant feature described by PC1: displacement of the switch region.
         * Dominant features described by PC2 and PC3: displacement of the switch region, the α3-β5 loop region and the β2-β3 loop.
  • 9. Enzyme-Ligand Interactions
       Journal of Chemical Information and Modeling (2007), 47(3):1234-1247.
       Separates protein structures on the basis of the amino acids relevant for the interaction with the ligand.
  • 10. Enzyme-Ligand Interactions: Introduction
       Journal of Chemical Information and Modeling (2007), 47(3):1234-1247.
       Figure: flow chart of the experiment.
       - Aim of the experiment:
         * Compare the homology model with the crystal structures.
         * Identify the sites of interaction.
       - Area of focus: consensus PCA (CPCA) and PCA performed on CYP3A4.
       - Data set:
         * Four structures
           - PDB without inhibitor: 1TQN and 1W0E
           - PDB with inhibitor: 2J0D (erythromycin)
           - Homology model: J. Comput.-Aided Mol. Des. (2000), 14:93-116
         * Compounds for the interaction study
           - 25 opioid analgesics
           - 15 well-known CYP3A4 inhibitors
  • 11. Enzyme-Ligand Interactions: PCA based workflow
       Journal of Chemical Information and Modeling (2007), 47(3):1234-1247.
       Table: GRID probes used for calculation of Molecular Interaction Fields (MIFs) in CPCA and in dockings with GLUE.
         Probe     Chemical group                     Used in
         OH2       water                              CPCA/dockings
         DRY       hydrophobic                        CPCA/dockings
         H         neutral hydrogen                   dockings
         N1        neutral flat NH (e.g., amide)      CPCA/dockings
         N1/2/3+   sp3 amine cation                   CPCA/dockings
         N:        sp3 N with lone pair               dockings
         O         sp2 carbonyl oxygen                CPCA/dockings
         O-        sp2 phenolate oxygen               CPCA
         O::       sp2 carboxy oxygen                 dockings
         O1        alkyl hydroxy OH group             dockings
         OC1       aromatic/aliphatic ether oxygen    dockings
       Flowchart of the energy calculation:
         * Define atom types and assign appropriate GRID probes.
         * Docking pose (filter: poses within 6 Å of any atom of the heme were selected).
         * Energy calculation.
         * Identify the amino acid responsible for the interaction with each probe.
         * Accumulate all energy values for each amino acid and docking pose.
         * Spreadsheet with energy values.
         * PCA.
  • 12. Enzyme-Ligand Interactions: CPCA based workflow
       Journal of Chemical Information and Modeling (2007), 47(3):1234-1247.
       Diagram: Block-1 (Target 1) and Block-2 (Target 2); for each of Probe 1, Probe 2, ..., Probe n, the K grid points (X1,Y1,Z1) ... (Xi,Yi,Zi) are the variables.
       - CPCA (consensus PCA): PCA at two levels (block level and super level).
       - Super level: captures the influence of each probe on the whole model.
       - The super level yields a super-weight matrix, which gives the contribution of each probe to the overall scores.
       Workflow:
         * Block level: PCA on Block-1 (Target 1) and Block-2 (Target 2); extract the scoring matrix of each block.
         * Super level (consensus of blocks): PCA on the combined scoring matrix.
  • 13. Enzyme-Ligand Interactions: PCA based analysis
       Journal of Chemical Information and Modeling (2007), 47(3):1234-1247.
       Figure: PCA results on energy calculations from dockings of opioid analgesics in the CYP3A4 homology model (▵) and crystal structures 2j0d (gray □), 1tqn (▪), and 1w0e (*). (a) PCA score plot; (b) PCA loading plot.
       - The score plot and the loading plot are related:
         * Variables that influence an observation are positioned in the same place in the loading plot as that observation in the score plot.
         * Here the variables are negative energies, so they appear in the loading plot at the same coordinates but with opposite signs.
       - Along PC1 of the score plot, 1TQN and the homology model are best separated.
       - Most discriminative interactions (loading plot):
         * Homology model: Phe304, Thr309 and heme
         * 1TQN: Arg212, Phe215, Ala370 and Glu374
         * 2J0D: Ser119 and Phe304
         * 1W0E: no pronounced differences in interactions compared to the other structures.
  • 14. Enzyme-Ligand Interactions: CPCA based analysis
       Journal of Chemical Information and Modeling (2007), 47(3):1234-1247.
       Figure: CPCA on all CYP3A4 structures based on molecular interaction fields. (a) Super-weights plot describing the influence of the different probes. (b) PCA score plot showing the inter-correlation between the structures.
       Table of interactions (columns: Homology; Ligand complex 2J0D; Ligand free 1TQN and 1W0E):
         Common: A305, E308, T309, R372 - - - A370, M371 - - - R212 L483 L483 I369 I369 - F108, F213 F304, E374, G481, L482
         Uncommon: N104, R105, V111, T310, S312, V313, R372, Heme R106, F241, G306, Y307, L373 D76, I120, D214, Q484 F215
       * Along PC2 of the super-weights plot: the hydrophobic probe (DRY) differs from the rest.
       * Along PC1 of the score plot: the homology model is separated from the crystal structures.
       * Along PC2 of the score plot: the erythromycin-bound structure (2J0D) is separated from the substrate-free structures (1W0E and 1TQN).
  • 15. SAR of peptides
       Journal of Medicinal Chemistry (2007), 50(9):2049-2059.
       Provides means to study molecular property preferences for peptides binding to Aq.
  • 16. SAR of peptides: Structural insight
       Journal of Medicinal Chemistry (2007), 50(9):2049-2059.
       - Rheumatoid arthritis (RA), an autoimmune inflammatory disease, is linked to the major histocompatibility complex (MHC) class II molecules DR1 and DR4.
       - RA is directed against type II collagen (CII).
       - Animal model of RA: collagen-induced arthritis (CIA), linked to the mouse MHC class II molecule Aq.
       - The octapeptide CII260-267 is required for binding to Aq and for inducing a T-cell response.
       Figure: peptide scaffold used to study molecular property preferences for peptide binding to Aq (label: aqueous solubility).
  • 17. SAR of peptides: Statistical Molecular Design (SMD)
       Journal of Medicinal Chemistry (2007), 50(9):2049-2059.
       - Amino acids indicated in red (Met, Ala, Thr) and green (Val, Ser) were chosen as building blocks for the variations at positions 1−3.
       - The building blocks in blue (Arg, Asn, Tyr, Asp) and green (Val, Ser) were used for positions 4 and 5.
       Figure: score (a) and loading (b) plots resulting from PCA of the 20 coded amino acids described by 28 molecular descriptors. Plot label: Aromatic.
       Legend of descriptor classes: size; electronic property/polarity; lipophilicity; solubility; size/polarity; size/lipophilicity; H-bonding; shape & flexibility; flexibility; saturation.
         * The principal components t1 to t3 described 65% of the variation.
         * t1 separated the amino acids (Aa) by size: Gly and Ala have high scores, while Arg and Try have low scores.
         * t2 separated them by lipophilicity and flexibility (a similar trend was observed for t3).
         * Three groups of Aa could be distinguished in the t1 vs t2 score plot.
  • 18. SAR of peptides: Library design
       Journal of Medicinal Chemistry (2007), 50(9):2049-2059.
       - Virtual library of 4500 peptides (5^3 × 6^2): generated by varying the selected amino acids (Aa) at five positions.
       - D-optimal design was applied to reduce the size of the library to 22 peptides.
         * It maximizes the volume spanned in the principal property space.
       - Principal property space:
         * Each Aa at the five altered positions was represented by the three values of the scaled principal properties (t1 to t3).
         * Each peptide is therefore represented by 15 values, which span the principal property space for the D-optimal design.
       Data matrix: 4500 rows × 15 columns (rows Peptide-1 ... Peptide-4500; columns Pos1-t1, Pos1-t2, Pos1-t3, ..., Pos5-t1, Pos5-t2, Pos5-t3).
       Figure: scoring plot extracted for the D-optimal design.
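The library size and the 4500 × 15 data matrix on slide 18 follow directly from the building blocks listed on slide 17. The sketch below enumerates the virtual library and builds the matrix. The building-block lists come from the slides, but the `principal_properties` values are placeholders invented for illustration (the real t1 to t3 scores are not given in this transcript), and the D-optimal selection step itself is not shown.

```python
from itertools import product

# Building blocks as listed on the slides.
POS_1_TO_3 = ["Met", "Ala", "Thr", "Val", "Ser"]         # 5 choices at positions 1-3
POS_4_TO_5 = ["Arg", "Asn", "Tyr", "Asp", "Val", "Ser"]  # 6 choices at positions 4-5

# PLACEHOLDER principal properties (t1, t2, t3) per amino acid: made-up numbers,
# standing in for the scaled PCA scores of the 20 coded amino acids.
amino_acids = sorted(set(POS_1_TO_3 + POS_4_TO_5))
principal_properties = {
    aa: (float(i), 0.5 * i, -float(i)) for i, aa in enumerate(amino_acids)
}

# Every combination of building blocks over the five varied positions.
library = list(product(POS_1_TO_3, POS_1_TO_3, POS_1_TO_3, POS_4_TO_5, POS_4_TO_5))

def describe(peptide):
    """Concatenate (t1, t2, t3) of the five varied positions into 15 values."""
    row = []
    for aa in peptide:
        row.extend(principal_properties[aa])
    return row

data_matrix = [describe(peptide) for peptide in library]
```

Here `len(library)` is 5 * 5 * 5 * 6 * 6 = 4500 and every row of `data_matrix` has 15 entries, matching the dimensions stated on the slide.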
  • 19. SAR of peptides: Partial Least Squares (PLS) model
       Journal of Medicinal Chemistry (2007), 50(9):2049-2059.
       Figure: PLS weight values (w × c) for the QSAR model based on 15 principal property values (t1−t3 at positions 1−5) and three biological responses, represented as % inhibition at three different peptide concentrations (Y2: 250 μM, Y3: 83 μM, Y4: 28 μM).
       Discrimination: t1, size; t2, hydrophilicity; t3, rigidity.
       - Main contributors at positions 1 and 2:
         * Small, rigid groups (positive weights for t1-t3).
       - Main contributors at position 3:
         * Flexible, hydrophobic groups preferred (negative weights for t2 and t3).
       - Main contributors at position 4:
         * Large, flexible groups (t1 and t3 negatively correlated with the response).
         * H-bond donors/acceptors preferred (positive weight for t2).
       - Main contributors at position 5:
         * Large, flexible groups (similar to position 4).
         * Hydrophobic groups preferred (negative weight for t2).
  • 20. SAR of peptides: Scope of use
       Journal of Medicinal Chemistry (2007), 50(9):2049-2059.
       Scope of use: Let's discuss!
       Protein sequence alignment: Nature Structural & Molecular Biology (1995), 2(2):171-178
       - PCA was used on multiple sequence alignments to identify possible functional residues.
       - Each column in the alignment was encoded as a vector of 20 binary variables, representing the absence or presence of each amino acid at that position.
       Small-molecule SAR analysis based on statistical molecular design (SMD): Bioorganic & Medicinal Chemistry (2010), 18(7):2686-2703.
       - Extracted PCA score vectors for the different substituents (i.e. salicylic aldehydes and hydrazides).
       - Built a hierarchical partial least squares (Hi-PLS) model to interpret the SAR.
       Software that could be used: R language, PanelCheck, Cimpl2, Codessa, Canvas
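The binary encoding of alignment positions mentioned on slide 20 can be sketched as follows. This is an illustration under stated assumptions rather than the cited paper's procedure: the toy alignment is made up, the function names are hypothetical, and gaps are encoded as all-zero vectors, which is a choice of this sketch.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 coded amino acids, one-letter codes

def encode_residue(residue):
    """Binary vector of length 20: 1 where the amino acid is present, else 0.
    Gaps or unknown symbols give an all-zero vector (an assumption of this sketch)."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]

def encode_sequence(aligned_sequence):
    """Concatenate the 20-element vectors of all alignment positions."""
    vector = []
    for residue in aligned_sequence:
        vector.extend(encode_residue(residue))
    return vector

# Made-up alignment of three sequences over five positions.
toy_alignment = ["MKV-A", "MRV-A", "MKLGA"]
encoded = [encode_sequence(sequence) for sequence in toy_alignment]
```

Each sequence becomes a row of 20 × 5 = 100 binary variables, and the rows of `encoded` form the data matrix on which PCA would then be run.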
  • 21. Take home message…
       - Use of PCA
         * Discriminate the observations (e.g. active / inactive) based on the influence of different variables (e.g. descriptors).
         * Data reduction: visualize high-dimensional data.
         * Generate informative plots.
       - Limitations of PCA
         * Assumes linear relationships between variables.
         * Requires a preprocessing step.
         * Unsupervised method.
       - Points to be cautious about
         * Relate the score and loading plots to each other.
           - Observations with a high score on a given PC (principal component) are positively correlated with variables that have a high positive loading on it. Beware of variables with negatively signed values (e.g. negative energies).
       Let your conscience be your guide: check your results against the raw data.