Talk by Tim Schäfer at the German Conference on Bioinformatics 2012. Jena, Germany.
Ligand information is of great interest for understanding protein function. Protein structure topology can be modeled as a graph, with secondary structure elements as vertices and spatial contacts between them as edges. Meaningful 2D representations of such graphs are required for the visual inspection, comparison and analysis of protein folds, but their automatic visualization is still challenging.
We present an approach which solves this task, supports several graph types and includes ligands. Our method generates a mathematically unique representation and high quality 2D plots of the secondary structure of proteins based on a protein ligand graph. This graph is computed from 3D atom coordinates in the Protein Data Bank (PDB) and the corresponding secondary structure element (SSE) assignments of the DSSP algorithm.
The Visualization of Protein-Ligand Graphs (VPLG) software enables rapid visualization of protein structures and exports graphs in various standard formats for further analysis.
Computation and visualization of protein topology graphs including ligands
1. GCB 2012
Computation and visualization of protein topology graphs including ligands
Tim Schäfer, Patrick May, Ina Koch
Goethe-University Frankfurt am Main
Institute for Computer Science
Department for Molecular Bioinformatics
2. Contents
● Introduction
  ● Protein structure, ligands and graphs
● Methods
  ● Types of protein ligand graphs
  ● Computing protein ligand graphs from PDB and DSSP data
  ● Visualization of protein ligand graphs
● Results
  ● The VPLG software
  ● Case studies
3. Protein structure
● Primary structure
  ● String over an alphabet of 20 amino acids (AAs)
  ● 3D: peptide bonds, steric constraints, Ramachandran
● Secondary structure
  ● local, H-bonds, SSE types
  ● super-secondary structure
● Tertiary structure
  ● protein domains
  ● atom coordinates
● Quaternary structure
  ● multi-chain proteins
4. Modeling protein structure as a graph
● Graph G = (V, E)
● Pervasive data structure in computer science and bioinformatics
● Usage of graphs to model proteins
  ● CATH [Orengo et al.]
  ● SCOP [Murzin et al.]
  ● TOPS+ [Gilbert, Veeramalai]
  ● PTGL [May, Koch]
5. Protein ligand interactions
● Proteins interact with their environment to fulfill their biological function: other proteins, ligands, ...
● Knowledge of protein-ligand interactions (PLI) is important in many applications, e.g. drug design, molecular medicine
● > 10,000 different ligands occur in PDB files
[PDB Ligand Expo, 1AVD]
6. A graph model of proteins on the supersecondary structure level
● Secondary Structure Elements (SSEs)
  ● Few: 40,000 atoms => 400 residues => 40 SSEs
  ● Automated assignment from 3D data possible (e.g., DSSP)
● Protein model
  ● undirected, labeled graph for each chain of a protein
● Protein graph
  ● vertices: SSEs or ligands
  ● edges: contacts and spatial relations between them
  ● graph types: alpha-beta-graph, alpha-graph or beta-graph
[Koch 1997, May et al. 2004]
7. Protein ligand graphs
● Extension of protein graph model
● Protein ligand graph G = (V, E)
  ● Vertex: SSE or ligand
  ● Edge: Spatial contact (parallel, antiparallel, mixed, ligand, backbone)
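The graph model above can be sketched as a small data structure. This is only an illustration under assumptions: the class name, the field names and the one-letter vertex codes (H, E, L) are chosen here and are not taken from VPLG, and the example edges are invented. Only the edge labels come from the slide.

```python
from dataclasses import dataclass, field

# Edge labels as listed on the slide; vertex codes are an assumption for this sketch.
VERTEX_TYPES = {"H", "E", "L"}  # helix, beta strand, ligand
EDGE_TYPES = {"parallel", "antiparallel", "mixed", "ligand", "backbone"}


@dataclass
class ProteinLigandGraph:
    """Undirected, labeled graph for one protein chain (hypothetical helper class)."""
    vertex_types: list = field(default_factory=list)  # index = sequential order in the chain
    edges: dict = field(default_factory=dict)         # frozenset({i, j}) -> edge label

    def add_vertex(self, vertex_type):
        if vertex_type not in VERTEX_TYPES:
            raise ValueError(f"unknown vertex type: {vertex_type}")
        self.vertex_types.append(vertex_type)
        return len(self.vertex_types) - 1

    def add_edge(self, i, j, label):
        if label not in EDGE_TYPES:
            raise ValueError(f"unknown edge label: {label}")
        n = len(self.vertex_types)
        if i == j or not (0 <= i < n and 0 <= j < n):
            raise IndexError("edge needs two distinct, existing vertices")
        # frozenset makes (i, j) and (j, i) the same key, i.e. the edge is undirected
        self.edges[frozenset((i, j))] = label

    def neighbors(self, i):
        return sorted(next(iter(pair - {i})) for pair in self.edges if i in pair)


# Invented toy chain: helix, strand, helix, strand, ligand
g = ProteinLigandGraph()
h1, e1, h2, e2, lig = (g.add_vertex(t) for t in "HEHEL")
g.add_edge(e1, e2, "antiparallel")
g.add_edge(h1, e1, "mixed")
g.add_edge(lig, h2, "ligand")
```

Storing each edge under a `frozenset` key keeps the graph undirected without having to insert both directions.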
8-12. Computing protein ligand graphs
1. Obtain sequence from PDB file
2. Obtain secondary structure assignments from DSSP file
3. Add ligand information from PDB file
4. Build graph: add a vertex for each SSE
5. Compute spatial contacts and add edges between SSEs
[Figure, built up step by step over slides 8 to 12: the sequence SKVVVPAQGKKITLQNGKLNVPENPIIPYIEGDGIGVDVTPAM, which gains a trailing J in step 3; below it the DSSP segments HHHHHHHHHH, EEEEEEE, HHHHHH, EEEEEEE and a ligand segment L; from step 4 on, the graph with the vertices H, E, H, E, L]
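Steps 1 to 4 can be sketched in a few lines. The sequence is the one shown on the slides; the positions of the helix and strand segments are invented for this example, because the alignment of the DSSP line was lost in the transcript. The function name `sse_segments`, the reduction to the two codes H and E, and the use of "-" for non-SSE residues are assumptions of this sketch, not VPLG code.

```python
from itertools import groupby


def sse_segments(dssp_string, sse_codes=("H", "E")):
    """Return (type, start, end) for every maximal run of an SSE code; end is exclusive."""
    segments = []
    pos = 0
    for code, run in groupby(dssp_string):
        length = len(list(run))
        if code in sse_codes:
            segments.append((code, pos, pos + length))
        pos += length
    return segments


# Step 1: sequence as shown on the slides
sequence = "SKVVVPAQGKKITLQNGKLNVPENPIIPYIEGDGIGVDVTPAM"

# Step 2: per-residue secondary structure string; segment lengths follow the slide,
# segment positions are made up for illustration
dssp = "--" + "H" * 10 + "---" + "E" * 7 + "---" + "H" * 6 + "---" + "E" * 7 + "--"
if len(dssp) != len(sequence):
    raise ValueError("need one secondary structure code per residue")

# Step 4: one vertex per SSE, in sequential order
vertices = sse_segments(dssp)

# Step 3: the ligand becomes an extra vertex behind the last residue
vertices.append(("L", len(sequence), len(sequence) + 1))

for sse_type, start, end in vertices:
    print(sse_type, start, end, (sequence + "J")[start:end])
```

Real DSSP output distinguishes more states than H and E, so a production parser would need a mapping from DSSP codes to SSE types before this segmentation step.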
13. VPLG – an open source implementation in Java
15. Contact computation
● Parser for PDB and DSSP files
● Atom level contacts
  ● vdW radius overlap, radius 2 Å
  ● complexity for n atoms is O(n²)
● Residue level contacts
  ● collision spheres, CA is center
● Protein-ligand contacts
  ● collision spheres, min max center
● SSE level contacts
  ● rules depending on SSE type, difficult
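The contact scheme on this slide might look roughly as follows. The sketch assumes that every atom is a sphere of radius 2 Å and that two atoms are in contact when their spheres overlap, that a residue's collision sphere is centered at its CA atom, and that "min max center" means the midpoint of a ligand's bounding box. All function names are hypothetical, and the coordinates are invented.

```python
import math
from itertools import combinations

ATOM_RADIUS = 2.0  # Å, from the slide; applied to every atom here (assumption)


def atoms_in_contact(a, b, radius=ATOM_RADIUS):
    """Two atoms are in contact if their spheres overlap."""
    return math.dist(a, b) < 2 * radius


def collision_sphere_radius(atoms, center):
    """Radius of the smallest sphere around `center` that contains all atom centers."""
    return max(math.dist(center, atom) for atom in atoms)


def min_max_center(atoms):
    """Midpoint of the axis-aligned bounding box (assumed meaning of 'min max center')."""
    return tuple((min(axis) + max(axis)) / 2 for axis in zip(*atoms))


def may_touch(center_a, r_a, center_b, r_b, radius=ATOM_RADIUS):
    """Pre-filter: if the padded spheres do not overlap, no atom pair can be in contact."""
    return math.dist(center_a, center_b) < r_a + r_b + 2 * radius


def residue_contacts(residues):
    """Map (i, j) -> number of atom contacts; residues are dicts with 'ca' and 'atoms'."""
    spheres = [(r["ca"], collision_sphere_radius(r["atoms"], r["ca"])) for r in residues]
    contacts = {}
    for i, j in combinations(range(len(residues)), 2):
        (ci, ri), (cj, rj) = spheres[i], spheres[j]
        if not may_touch(ci, ri, cj, rj):
            continue  # skips the quadratic atom-by-atom comparison for distant residues
        n = sum(atoms_in_contact(a, b)
                for a in residues[i]["atoms"] for b in residues[j]["atoms"])
        if n:
            contacts[(i, j)] = n
    return contacts


# Invented coordinates: residues 0 and 1 are close, residue 2 is far away
residues = [
    {"ca": (0, 0, 0), "atoms": [(0, 0, 0), (1.5, 0, 0)]},
    {"ca": (5, 0, 0), "atoms": [(5, 0, 0), (3.8, 0, 0)]},
    {"ca": (30, 0, 0), "atoms": [(30, 0, 0), (31, 0, 0)]},
]
result = residue_contacts(residues)
ligand_center = min_max_center([(0, 0, 0), (2, 4, 6)])
```

The pre-filter never discards a true contact, by the triangle inequality, but the loop over residue pairs is still quadratic; only the more expensive atom-level comparison is avoided for residues that are far apart.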
16. Types of protein graphs
● Alpha
● Beta
● Alpha-beta
● Ligands
● Not necessarily connected
● Connected components
● Backbone option
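One way to read this slide: each graph type is the subgraph induced by a subset of vertex types, and because such a subgraph need not be connected, its connected components are computed afterwards. The sketch below follows that reading. The `with_ligands` switch, the vertex codes H, E and L, and the toy graph are assumptions for illustration; the backbone option is not covered.

```python
GRAPH_TYPES = {"alpha": {"H"}, "beta": {"E"}, "alpha-beta": {"H", "E"}}


def graph_of_type(vertex_types, edges, graph_type, with_ligands=False):
    """Keep the vertices of the requested types and the edges between kept vertices."""
    keep = set(GRAPH_TYPES[graph_type])
    if with_ligands:
        keep.add("L")
    kept = {i for i, t in enumerate(vertex_types) if t in keep}
    return kept, {(i, j) for i, j in edges if i in kept and j in kept}


def connected_components(vertices, edges):
    """Depth-first search; returns a list of sorted vertex lists."""
    adjacency = {v: set() for v in vertices}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, components = set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            v = stack.pop()
            component.append(v)
            for w in adjacency[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(sorted(component))
    return components


# Invented toy chain: helix, strand, helix, strand, ligand
vertex_types = ["H", "E", "H", "E", "L"]
edges = {(0, 1), (1, 3), (2, 4)}

beta = connected_components(*graph_of_type(vertex_types, edges, "beta"))
alpha = connected_components(*graph_of_type(vertex_types, edges, "alpha"))
full = connected_components(*graph_of_type(vertex_types, edges, "alpha-beta", with_ligands=True))
```

In the toy chain, the alpha graph falls apart into isolated helices, while the alpha-beta graph with ligands has two components.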
17. The VPLG software
● Command line tool
● GUI in Java/Swing
● Screenshots
● Graph file formats
● Database support
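The slide does not say which graph file formats are supported. As an example of what exporting to a standard format could look like, here is a minimal writer for the Graphviz DOT language; the choice of DOT, the function name `to_dot` and the vertex naming scheme are assumptions of this sketch.

```python
def to_dot(vertex_types, edges, name="protein_ligand_graph"):
    """Serialize an undirected, labeled graph as Graphviz DOT text."""
    lines = [f"graph {name} {{"]
    for i, sse_type in enumerate(vertex_types):
        lines.append(f'  v{i} [label="{i}-{sse_type}"];')
    for (i, j), label in sorted(edges.items()):
        lines.append(f'  v{i} -- v{j} [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


# Invented toy graph: helix, strand, helix, strand, ligand
dot_text = to_dot(
    ["H", "E", "H", "E", "L"],
    {(1, 3): "antiparallel", (0, 1): "mixed", (2, 4): "ligand"},
)
print(dot_text)
```

A text format like this can be read by general-purpose graph tools, which is the point of exporting graphs for further analysis.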
18. Protein graphs for the whole PDB: some statistics
● 62,364 proteins consisting of 151,025 chains
● 1.4 million helices, 1.2 million beta strands, 300k ligands
● Contacts
  ● Total non-ligand: 2,000,000 => 0.76 contacts per SSE
    ● Parallel: 450,000
    ● Antiparallel: 870,000
    ● Mixed: 710,000
  ● Ligand: 760,000 => 2.50 contacts per ligand
● High number of isolated ligands (~30 %) at first run
  ● Atom radius too small?
  ● Contact definition?
  ● Small ligands (few atoms?)
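The two ratios on this slide can be checked against the rounded counts. The sketch assumes that "contacts per SSE" means non-ligand contacts divided by the number of helices plus strands, and that "contacts per ligand" means ligand contacts divided by the number of ligands. Because the inputs are rounded, the results only approximate the 0.76 and 2.50 given on the slide, which were presumably computed from exact counts.

```python
# Rounded counts as given on the slide
helices = 1_400_000
strands = 1_200_000
ligands = 300_000
non_ligand_contacts = {"parallel": 450_000, "antiparallel": 870_000, "mixed": 710_000}
ligand_contacts = 760_000

# Sum of the three rounded contact classes; the slide reports the total as 2,000,000
non_ligand_total = sum(non_ligand_contacts.values())

contacts_per_sse = non_ligand_total / (helices + strands)
contacts_per_ligand = ligand_contacts / ligands

print(f"non-ligand contacts: {non_ligand_total:,}")
print(f"contacts per SSE:    {contacts_per_sse:.2f}")
print(f"contacts per ligand: {contacts_per_ligand:.2f}")
```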
19. Outlook and possible improvements
● VPLG software
  ● New spatial relation 'close to' for ligands
  ● Post-processing of DSSP output (short SSEs)
  ● RNA / DNA as ligands
● Backend
  ● Get Web server online
  ● Protein structure comparison (not necessarily graph-based)
  ● Use VPLG to get ligands into the PTGL
20. Summary
● Ligands are part of protein graph
● Visualization based on primary structure ordering of SSEs
● VPLG software is open source implementation
  ● Visualization of protein graphs from PDB and DSSP files
  ● http://www.bioinformatik.uni-frankfurt.de
  ● https://sourceforge.net/projects/vplg/
● Evaluation
  ● Statistics
  ● Hemoglobin as an example
21. Acknowledgments
● Supervisor Prof. Ina Koch
● Molecular Bioinformatics group at Goethe University Frankfurt am Main (MolBI)
22. Thanks for your attention!
Questions?