Group Members and Content
- Introduction: biological aspects of protein-protein interaction (Zhenli Su)
- Protein-protein interaction databases
  - BIND (Xin Hong)
  - DIP (Xiang Zhou)
- Pathway databases and algorithms (Paul Ma)
- Visualization tools (James Coleman), presented by Xiang Zhou
Biological Aspects of Protein-Protein Interaction Zhenlu Su
• Introduction to protein-protein interactions
• The importance of the interactions
• Impact of protein interaction technologies on other fields
• The types of protein interactions
• Experimental methods for studying protein interactions
Introduction to protein-protein interactions
• Proteins control and mediate many of the biological activities of cells
• A cell is not static: changes in shape, division, metabolism
• Not all cells are equivalent: lymphoid, neural
Why are protein-protein interactions so important?
The binding of one signaling protein to another can have a number of consequences:
• Such binding can serve to recruit a signaling protein to a location where it is activated and/or where it is needed to carry out its function.
• The binding of one protein to another can induce conformational changes that affect activity or accessibility of additional binding domains, permitting additional protein interactions.
Why are protein-protein interactions so important? Imagine a cell in which, suddenly, the specific interactions between proteins would disappear. This unfortunate cell would become deaf and blind, paralytic and finally would disintegrate, because specific interactions are involved in almost any physiological process.
Impact on other fields Cancer Biology The study of protein-protein interactions has provided important insights into the functions of many of the known oncogenes, tumor suppressors, and DNA repair proteins. Pharmacogenetics Pharmacogenetic research has expanded to include the study of drug transporters, drug receptors, and drug targets.
The types of protein interactions
• Binary protein-protein interactions
• Scaffolding proteins
http://www.udel.edu/chem/bahnson/chem667/crotty/scaffolding_proteins.html#scaffolding
The types of protein interactions - another classification
• Metabolic and signaling (genetic) pathways
• Morphogenic pathways, in which groups of proteins participate in the same cellular function during a developmental process
• Structural complexes and molecular machines, in which numerous macromolecules are brought together
Experimental methods
• The first level comprises an ‘atomic observation’, in which the protein interaction is detected using, for example, X-ray crystallography. These experiments can yield specific information on the atoms or residues involved in the interaction.
• The second is a ‘direct interaction observation’, where a protein interaction between two partners can be detected, as in a two-hybrid experiment.
• At a third level of observation, multi-protein complexes can be detected using methods such as immuno-precipitation or mass spectrometry analysis. This type of experiment does not unveil the chemical detail of the interactions or even reveal which proteins are in direct contact, but it gives information as to which proteins are found in a complex at a given time.
• The fourth category comprises measurements at the cellular level, where an ‘activity bioassay’ is used to observe an interaction; for example, proliferation assays of cells driven by a receptor-ligand interaction.
Protein-Protein Interaction Databases: BIND (Biomolecular Interaction Network Database) Xin Hong
Introduction to BIND
• Background
• What is BIND
• MCODE Algorithm
• How to use BIND
• Reference
Background
Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery. Many web resources are available for protein-protein interactions; a few are listed here:
• BIND (Biomolecular Interaction Network Database)
• DIP (Database of Interacting Proteins)
• Protein-Protein Interaction Server
• Protein-Protein Interface
What is BIND
The Biomolecular Interaction Network Database is a database designed to store full descriptions of interactions, molecular complexes and pathways. Development of the BIND 2.0 data model has led to the incorporation of virtually all components of molecular mechanisms, including interactions between any two molecules composed of proteins, nucleic acids and small molecules. Chemical reactions, photochemical activation and conformational changes can also be described. Everything from small molecule biochemistry to signal transduction is abstracted in such a way that graph theory methods may be applied for data mining.
The database can be used to study networks of interactions, to map pathways across taxonomic branches and to generate information for kinetic simulations. BIND anticipates the coming large influx of interaction information from high-throughput proteomics efforts, including detailed information about post-translational modifications from mass spectrometry.
What kinds of data are stored in BIND?
• INTERACTION: the interaction between two molecules, as well as any chemical reactions that occur as a direct result of the interaction.
  Example: P-P, P-n, P-s (phosphorylation of P, methylation of D, hydrolysis of sugar)
• COMPLEX: describes a molecular complex by listing the series of interaction records that are present in the complex.
  Example: multi-subunit enzyme, actin fiber, ribosome
• PATHWAY: describes a cellular process as a sequential list of interaction records and their associated Chemical Action data.
  Example: cell-signaling pathway, synthesis of an amino acid, transcription and splicing of a pre-messenger RNA
What BIND can and cannot do right now
• The BIND database structure is a robust design built to accept data from all cell systems. The interface that you see is NOT the data structure, and it does not accurately reflect all of the potentialities of the database. Tools are being built to implement these potentials, and changes are constantly being made to the interface to make the database easier to use and understand.
• BIND is currently able to accept records that describe protein-protein and protein-nucleic acid interactions.
• The BIND data specification is available as ASN.1 and XML DTD. ASN.1 data can describe details underlying biochemical and genetic networks. XML versions of all data with accompanying DTDs are supported through the use of the NCBI programming toolkit.
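The three record types above can be pictured as a small data model. The following is only an illustrative sketch: the class names, field names and the helper `molecules_in` are hypothetical and are not taken from the actual BIND ASN.1 or XML specification. It shows the one structural idea stated on the slide, namely that complexes and pathways are built by referring to interaction records.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Interaction:
    """Two molecules plus any chemical actions resulting from their interaction."""
    interaction_id: int
    molecule_a: str
    molecule_b: str
    chemical_actions: List[str] = field(default_factory=list)


@dataclass
class MolecularComplex:
    """A complex is described by the interaction records present in it."""
    name: str
    interaction_ids: List[int]


@dataclass
class Pathway:
    """A pathway is a sequential list of interaction records (order matters)."""
    name: str
    interaction_ids: List[int]


def molecules_in(record, interactions: Dict[int, Interaction]) -> List[str]:
    """Collect the molecules referenced by a complex or pathway, in first-seen order."""
    seen: List[str] = []
    for interaction_id in record.interaction_ids:
        interaction = interactions[interaction_id]
        for molecule in (interaction.molecule_a, interaction.molecule_b):
            if molecule not in seen:
                seen.append(molecule)
    return seen


# Hypothetical toy records; the molecule names are placeholders.
interactions = {
    1: Interaction(1, "A", "B", ["phosphorylation of B"]),
    2: Interaction(2, "B", "C"),
}
toy_pathway = Pathway("toy pathway", [1, 2])
pathway_molecules = molecules_in(toy_pathway, interactions)
```

Storing only interaction identifiers in the complex and pathway records keeps each interaction described once, which is what makes the data usable as a graph.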
Demonstrating the use of Binding sites and Binding Site Pairs for a protein-protein interaction The grey shapes represent autonomous domains in proteins A and B that mediate a protein-protein interaction. The black lines in these grey shapes represent polypeptide chains that continue outside of these domains to make up the rest of proteins A and B. The protein-protein interaction between these two domains is mediated by two Binding Site pairs. The first pair (a salt bridge) consists of a single amino acid on molecule A (SLID 0) and a single amino acid on B (SLID 0). These two amino acids form the first Binding Site Pair. The second pair consists of a range of amino acids on A (SLID 1) and a range of amino acids on B (SLID 1). These two ranges of amino acids form the second Binding Site Pair.
The Algorithm
MCODE - An automated method for finding molecular complexes in large protein interaction networks.
• The MCODE algorithm operates in three stages: vertex weighting, complex prediction, and optionally post-processing to filter or add proteins in the resulting complexes by certain connectivity criteria.
Background
Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks.
The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/4/2
The Algorithm
MCODE - An automated method for finding molecular complexes in large protein interaction networks.
Results
The algorithm has the advantage over other graph clustering methods of having a directed mode that allows fine-tuning of clusters of interest without considering the rest of the network and allows examination of cluster interconnectivity, which is relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation.
Conclusion
Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The algorithm is not affected by a known high rate of false positives in data from high-throughput interaction techniques. The program is available from ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE
http://www.biomedcentral.com/1471-2105/4/2
How to use BIND
Pathway: The INAD Pathway in Drosophila Photoreceptors - A Tutorial
http://bind.ca/index2.phtml?site=tutor
How to use BIND
BIND Interaction Viewer Java Applet
BIND Interaction Viewer Java applet showing how molecules can be connected in the database from molecular complex to small molecule. Yellow, protein; purple, small molecule; white, molecular complex; red, a square is fixed in place and will not be moved by the graph layout algorithm. This session was seeded by the interaction between human LAT and Grb2 proteins involved in cell signaling in the T-cell.
Reference
• Gary D. Bader et al. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 2003 Jan 13;4(1):2
• Gary D. Bader. BIND - The Biomolecular Interaction Network Database. Nucleic Acids Research, 2001, Vol. 29, No. 1, 242-245
• http://bind.ca/
• http://nar.oupjournals.org/cgi/content/full/29/1/242
What is DIP?
• Established in 1999 at UCLA
• Primary goal: extract and integrate protein-protein interaction information and build a user-friendly environment
• The usage of DIP
The usage of DIP
Study
• Protein function
• Protein-protein relationships
• Evolution of protein-protein interactions
• The network of interacting proteins
• The environments of protein-protein interactions
Predict
• Unknown protein-protein interactions
• The best interaction conditions
The structure of DIP
• Protein Table
• Method Table
• Interaction Table
• Reference Table
Protein Table
• DIP accession number: <DIP:nnnN>
• Identification numbers from: SWISS-PROT, GenBank, PIR
• Protein name and description
• Cross references
• Graph
The current status of DIP
• Number of proteins: 6978
• Number of organisms: 101
• Number of interactions: 18260
• Number of distinct experiments describing an interaction: 22229
• Number of articles: 2203
Other satellite databases
• DLRP (http://dip.doe-mbi.ucla.edu/dip/DLRP.cgi): Database of Ligand-Receptor Partners
• LiveDIP (http://dip.doe-mbi.ucla.edu/ldipc/tmpl/livedip.cgi): data on protein states and state transitions in protein-protein interactions
• JDIP: a stand-alone Java application that provides a graphical, browser-independent interface to the DIP database
1) KEGG (Kyoto Encyclopedia of Genes and Genomes)
• Represents higher-order functions in terms of the network of interacting molecules
• Has three databases: GENES, PATHWAY and LIGAND
• The GENES database contains 240,943 entries from the published genomes, including bacteria, mouse and human
• Each entry has the form database:entry or organism:gene, for example:
  - EC:22.214.171.124 : enzyme
  - genbank:DROALPC : gene
  - D.melanogaster:dpp : organism-specific gene
• By matching genes in the genome and gene products in the pathway, KEGG can be used to predict protein interaction networks and associated cellular functions.
• The data object stored in the PATHWAY database is called the generalized protein interaction network, which is a network of gene products with three types of interactions or relations: enzyme-enzyme relations (between enzymes that catalyze successive reaction steps in a metabolic pathway), direct protein-protein interactions, and gene expression relations. Currently, only enzyme-enzyme relations are maintained.
• The PATHWAY database contains 5761 entries, including 201 pathway diagrams with 14,960 enzyme-enzyme relations.
An example of a pathway entry in KEGG: Glycolysis
2) WIT database (Oak Ridge National Laboratory)
• Similar to KEGG
3) EcoCyc (E. coli Encyclopedia)
• Covers the genome and gene products of E. coli, its metabolic and signal transduction pathways, and its RNAs
• Contains 4391 genes, 904 metabolic reactions and 129 metabolic pathways
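The database:entry and organism:gene naming form described above can be split mechanically at the first colon. The following is a minimal sketch; the function name `parse_kegg_identifier` is hypothetical and this is not part of any KEGG software. The sample identifiers are the ones listed on the slide.

```python
def parse_kegg_identifier(identifier):
    """Split 'database:entry' or 'organism:gene' into its two parts.

    Only the first colon is treated as the separator, so an entry name
    that itself contains a colon is kept intact.
    """
    prefix, separator, entry = identifier.partition(":")
    prefix, entry = prefix.strip(), entry.strip()
    if not separator or not prefix or not entry:
        raise ValueError(f"not of the form prefix:entry: {identifier!r}")
    return prefix, entry


gene_reference = parse_kegg_identifier("genbank:DROALPC")
organism_gene = parse_kegg_identifier("D.melanogaster:dpp")
```

Whether the prefix names a database or an organism cannot be decided from the string alone; a caller would need a list of known database names for that.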
Graph theoretical algorithm for finding the molecular complex
• Small-world networks: how to identify a set of central metabolites
• MCODE: as used in the BIND database
• Many biological networks have small-world characteristics
ex) Erdos number
Paul Erdos: a prominent Hungarian graph theorist, and the center of mathematical collaboration. Coauthors of a paper with Erdos are one step from Erdos and have Erdos number 1. Coauthors of a paper with a mathematician with Erdos number 1 have Erdos number 2. Most mathematicians active in this century have a small Erdos number.
ex) Kevin Bacon game
It aims at connecting an arbitrary actor with the actor Kevin Bacon by the shortest sequence of actor pairs who have appeared together in a film. The average Bacon number for an arbitrary actor turns out to be 2.87. (However, Kevin Bacon is not the center of this small world of film actor collaboration. The center turns out to be Christopher Lee, with a mean path length of 2.60.)
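An Erdos number or a Bacon number is simply a shortest-path distance in a collaboration graph, which a breadth-first search computes. The sketch below uses a hypothetical toy co-authorship graph (the names "A" to "D" are placeholders, not real collaboration data), and the function name `collaboration_distance` is illustrative.

```python
from collections import deque


def collaboration_distance(graph, source):
    """Breadth-first search: number of edges on a shortest path from source.

    graph maps each person to the set of people they have collaborated with.
    People unreachable from source are absent from the result.
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for collaborator in graph.get(person, ()):
            if collaborator not in distance:
                distance[collaborator] = distance[person] + 1
                queue.append(collaborator)
    return distance


# Hypothetical toy co-authorship graph (undirected, stored symmetrically).
coauthors = {
    "Erdos": {"A", "B"},
    "A": {"Erdos", "C"},
    "B": {"Erdos"},
    "C": {"A", "D"},
    "D": {"C"},
}
erdos_numbers = collaboration_distance(coauthors, "Erdos")
```

In this toy graph, "A" and "B" are direct coauthors (distance 1), "C" is reached through "A" (distance 2), and "D" through "C" (distance 3).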
Small-world networks lie between two extremes of graphs: the completely regular graph and the completely random graph. Regular networks have long path lengths and are clustered, while random graphs have short path lengths but show little clustering. Small-world networks have short path lengths but are highly clustered. The metabolic network of E. coli is a small-world network. The center of the map is glutamate, with a mean path length of 2.46, followed by pyruvate with a value of 2.59.
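The "center" of a network in this sense is the vertex with the smallest mean shortest-path length to all other vertices. A minimal sketch of that calculation follows, assuming a connected, undirected, unweighted graph; the toy graph and the names `mean_path_lengths` and `network_center` are hypothetical, and no real metabolic data is used.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to every reachable vertex."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbor in graph[vertex]:
            if neighbor not in distance:
                distance[neighbor] = distance[vertex] + 1
                queue.append(neighbor)
    return distance


def mean_path_lengths(graph):
    """Mean distance from each vertex to all other reachable vertices."""
    means = {}
    for vertex in graph:
        distances = bfs_distances(graph, vertex)
        others = [d for other, d in distances.items() if other != vertex]
        if others:
            means[vertex] = sum(others) / len(others)
    return means


def network_center(graph):
    """Vertex with the smallest mean path length."""
    means = mean_path_lengths(graph)
    return min(means, key=means.get)


# Hypothetical toy network: a hub with three neighbors and one tail vertex.
toy_network = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "d"},
    "d": {"c"},
}
toy_means = mean_path_lengths(toy_network)
toy_center = network_center(toy_network)
```

Here "hub" is one step from three vertices and two steps from "d", giving a mean of 1.25, the lowest in the graph.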
MCODE (Molecular Complex Detection) in the BIND database
• Algorithms for finding clusters are an active area of computer science, often based on network flow/minimum cut theory or spectral clustering.
• MCODE uses a vertex-weighting scheme based on the clustering coefficient $C_i$, which measures the ‘cliquishness’ of the neighborhood of a vertex.
• $C_i = \frac{2n}{k_i (k_i - 1)}$, where $k_i$ is the vertex size of the neighborhood of vertex $i$ and $n$ is the number of edges in the neighborhood.
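As a small worked example of the formula, the sketch below computes the clustering coefficient on a toy graph. It assumes the common convention that the neighborhood of a vertex consists of its neighbors only (the vertex itself is excluded), and that the graph is simple and undirected; the function name and the toy graph are illustrative.

```python
from itertools import combinations


def clustering_coefficient(graph, vertex):
    """C_i = 2n / (k_i * (k_i - 1)) for one vertex.

    k_i is the number of neighbors of the vertex and n is the number of
    edges among those neighbors. Returns 0.0 when k_i < 2, where the
    formula is undefined.
    """
    neighbors = graph[vertex]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Count each edge between two neighbors exactly once.
    n = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return 2.0 * n / (k * (k - 1))


# Toy graph: v has neighbors a, b, c; only a and b are connected to each other.
toy_graph = {
    "v": {"a", "b", "c"},
    "a": {"v", "b"},
    "b": {"v", "a"},
    "c": {"v"},
}
coefficient_v = clustering_coefficient(toy_graph, "v")
coefficient_a = clustering_coefficient(toy_graph, "a")
```

For "v", one of the three possible edges among its neighbors is present, so the coefficient is 1/3; for "a", its two neighbors are connected, so the coefficient is 1.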
• The density of a subgraph is the number of edges divided by the maximum possible number of edges, so it ranges from 0.0 to 1.0.
• A k-core is a subgraph of minimum degree k, i.e., every vertex in it has degree >= k. The highest k-core of a graph is therefore its central, most densely connected subgraph.
• We define the core-clustering coefficient of a vertex v to be the density of the highest k-core of the immediate neighborhood of v, including v.
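The definitions above can be made concrete with a short sketch. It assumes a simple undirected graph without self-loops, stored as a dictionary of neighbor sets; the function names and the toy graph are illustrative and are not taken from the MCODE program itself.

```python
def density(graph):
    """Edges divided by the maximum possible number of edges (0.0 to 1.0)."""
    n = len(graph)
    if n < 2:
        return 0.0
    edges = sum(len(neighbors) for neighbors in graph.values()) / 2
    return edges / (n * (n - 1) / 2)


def k_core(graph, k):
    """Repeatedly delete vertices of degree < k; return what remains."""
    core = {v: set(neighbors) for v, neighbors in graph.items()}
    low = [v for v in core if len(core[v]) < k]
    while low:
        for v in low:
            for u in core[v]:
                core[u].discard(v)
            del core[v]
        low = [v for v in core if len(core[v]) < k]
    return core


def highest_k_core(graph):
    """Return (k, subgraph) for the largest k with a non-empty k-core."""
    k, best = 0, {v: set(neighbors) for v, neighbors in graph.items()}
    while True:
        candidate = k_core(best, k + 1)
        if not candidate:
            return k, best
        k, best = k + 1, candidate


def core_clustering_coefficient(graph, v):
    """Density of the highest k-core of the neighborhood of v, including v."""
    members = graph[v] | {v}
    neighborhood = {u: graph[u] & members for u in members}
    _, core = highest_k_core(neighborhood)
    return density(core)


# Toy graph: a triangle a-b-c with a pendant vertex d attached to a.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
top_k, top_core = highest_k_core(toy_graph)
core_cc_a = core_clustering_coefficient(toy_graph, "a")
```

In this toy graph the pendant vertex d is peeled away when forming the 2-core, leaving the triangle, so the core-clustering coefficient of "a" is 1.0 even though only one of the three possible edges among its neighbors exists.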
• The core-clustering coefficient amplifies the weighting of the heavily interconnected graph regions while removing the many less connected vertices that are characteristic of biomolecular interaction networks.
• The weight of a vertex is the product of the vertex core-clustering coefficient and the highest k-core level, $k_{max}$, of the immediate neighborhood of the vertex.
• Complex prediction then seeds a complex with the highest-weighted vertex and recursively moves outward from this vertex, including vertices whose weight is above a threshold set relative to the weight of the seed vertex. In this way the densest regions of the network are identified.
• The time complexity is $O(nmh^3)$, where n is the number of vertices, m is the number of edges and h is the vertex size of the average neighborhood in the graph.
• MCODE is slower than the fastest min-cut graph clustering algorithm, which has $O(n^2 \log n)$ time complexity, but it has a number of advantages. Weighting is done only once and accounts for most of the execution time, so many parameter settings can be tried cheaply afterwards. MCODE is also relatively easy to implement.
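The weighting and complex-prediction stages described above can be sketched as follows. This is a simplified illustration, not the published MCODE implementation: it assumes a simple undirected graph, omits the optional post-processing stage, drops single-vertex results, and uses a hypothetical parameter `weight_cutoff` for the allowed fractional drop below the seed weight.

```python
def density(graph):
    """Edges divided by the maximum possible number of edges."""
    n = len(graph)
    edges = sum(len(neighbors) for neighbors in graph.values()) / 2
    return edges / (n * (n - 1) / 2) if n > 1 else 0.0


def highest_k_core(graph):
    """Return (k, subgraph) for the largest k with a non-empty k-core."""
    core = {v: set(neighbors) for v, neighbors in graph.items()}
    best_k, best = 0, {v: set(neighbors) for v, neighbors in graph.items()}
    k = 1
    while True:
        # Peel away vertices of degree < k until none remain.
        low = [v for v in core if len(core[v]) < k]
        while low:
            for v in low:
                for u in core[v]:
                    core[u].discard(v)
                del core[v]
            low = [v for v in core if len(core[v]) < k]
        if not core:
            return best_k, best
        best_k, best = k, {v: set(neighbors) for v, neighbors in core.items()}
        k += 1


def vertex_weights(graph):
    """Weight = k_max * core-clustering coefficient of the neighborhood."""
    weights = {}
    for v in graph:
        members = graph[v] | {v}
        neighborhood = {u: graph[u] & members for u in members}
        k_max, core = highest_k_core(neighborhood)
        weights[v] = k_max * density(core)
    return weights


def find_complexes(graph, weight_cutoff=0.2):
    """Grow complexes outward from seeds in order of decreasing weight."""
    weights = vertex_weights(graph)
    seen, complexes = set(), []
    for seed in sorted(graph, key=lambda v: (-weights[v], v)):
        if seed in seen:
            continue
        threshold = weights[seed] * (1 - weight_cutoff)
        members, frontier = {seed}, [seed]
        seen.add(seed)
        while frontier:
            v = frontier.pop()
            for u in graph[v]:
                # Each vertex is checked at most once, so complexes do not overlap.
                if u not in seen and weights[u] > threshold:
                    seen.add(u)
                    members.add(u)
                    frontier.append(u)
        if len(members) > 1:
            complexes.append(members)
    return complexes


# Toy network: two 4-cliques joined through a weakly connected vertex x.
toy_graph = {
    "a": {"b", "c", "d"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "x"}, "x": {"d", "e"},
    "e": {"f", "g", "h", "x"}, "f": {"e", "g", "h"},
    "g": {"e", "f", "h"}, "h": {"e", "f", "g"},
}
toy_weights = vertex_weights(toy_graph)
toy_complexes = find_complexes(toy_graph)
```

On this toy network every clique member gets weight 3 (a 3-core of density 1), while the bridging vertex x gets weight 2/3, so the search started from "a" stops at x and the two cliques are reported as separate complexes. Because `vertex_weights` does not depend on `weight_cutoff`, a fuller implementation could compute the weights once and reuse them across many cutoff values.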
Structure Visualization Tools Written by James Coleman Presented by Xiang Zhou
Structure Visualization
One of the primary activities in proteomics R&D is determining and visualizing the 3D structure of proteins in order to find where drugs might modulate their activity. Other activities include identifying all of the proteins produced by a given cell or tissue and determining how these proteins interact.
BIOINFORMATICS COMPUTING, p.186, Bryan Bergeron, M.D., Prentice Hall 2002
Structure Visualization
It’s generally understood by the molecular biology research community that the sequencing of the human genome, which will likely take several more years to complete, is relatively trivial compared to definitively characterizing the interactions within the proteome.
BIOINFORMATICS COMPUTING, p.186, Bryan Bergeron, M.D., Prentice Hall 2002
Non-Static Structure Visualization
Unlike a nucleotide sequence, which is a relatively static structure, proteins are dynamic entities that change their shape and association with other molecules as a function of temperature, chemical interactions, pH, and other changes in the environment.
BIOINFORMATICS COMPUTING, p.186, Bryan Bergeron, M.D., Prentice Hall 2002
Primary vs. Secondary and Tertiary Structure In contrast to visualizing the sequence of nucleotides on a strand of DNA, visualizing the primary structure of a protein adds little to the knowledge of protein function. More interesting and relevant are the higher-order structures.
Why Visualize?
In each area of bioinformatics, the rationale for using graphics instead of tables or strings of data is to shift the user’s mental processing from reading and mathematical, logical interpretation to faster pattern recognition.
BIOINFORMATICS COMPUTING, p.180, Bryan Bergeron, M.D., Prentice Hall 2002
Pattern recognition is an area where humans are much more efficient than computers.
Some Common Tools
• Hundreds of visualization tools have been developed in bioinformatics.
• Many are specific to hardware such as microarray devices.
• Shareware utilities for PCs: PDB Viewer, WebMol, RasMol, Protein Explorer, Cn3D, VMD, MolMol, MidasPlus, PyMol, Chime, Chimera
Application Feature Summary
Feature               | RasMol                   | Cn3D                 | PyMol                | SWISS-PDBViewer      | Chimera
Architecture          | Stand-alone              | Plug-in              | Web-enabled          | Web-enabled          | Web-enabled
Manipulation Power    | Low                      | High                 | High                 | High                 | High
Hardware Requirements | Low/Moderate             | High                 | High                 | Moderate             | High
Ease of Use           | High; command line       | Moderate             | Moderate             | High                 | Moderate; GUI + command line
Special Features      | Small size; easy install | Powerful GUI         | GUI; ray tracing     | Powerful GUI         | GUI; collaboration
Output Quality        | Moderate                 | Very high            | High                 | High                 | Very high
Documentation         | Good                     | Good                 | Limited              | Good                 | Very good
Support               | Online; users groups     | Online; users groups | Online; users groups | Online; users groups | Online; users groups
Speed                 | High                     | Moderate             | Moderate             | Moderate             | Moderate/Slow
OpenGL Support        | Yes                      | Yes                  | Yes                  | Yes                  | Yes
Molecule Representations
Wireframe                     | Shows bonds and bond angles
Ball and stick                | Shows atoms, bonds and bond angles
Ribbon diagrams               | Shows secondary structure
Van der Waals surface diagram | Shows atomic volumes
Backbone                      | Shows overall molecular structure
Other properties that can be Visualized MolMol supports the display of electrostatic potentials across a protein molecule. MidasPlus (a predecessor of Chimera) allows for the editing of sequences visually to see the effects of point mutations.
HCI and Protein-Protein Interaction Creating a suitable metaphor to transform data into a form that means something to the user. Large volumes of complex data require more complex metaphors than, for example, the pie chart used in business graphics. Different users require different levels of complexity – and therefore different metaphors. The desktop, folder, trashcan metaphor could be replaced by a chromosome, gene, protein, pathway metaphor.
For protein interactions, we need a metaphor that reveals dynamics
• Haptic joystick: provides force feedback when the user manipulates a molecule near another one.
• Stereo view of the interaction of two proteins. Scripting allows for the movement of individual molecules, creating a movie.
• 3D goggles combined with haptic gloves to feel electrostatic potentials and see tertiary structure dynamics.
• PyMol provides scripting that can produce a 3D movie of the geometrical relationship between multiple proteins.
The field is wide open. To definitively characterize the interactions within the proteome, we need more tools. We need new metaphors for managing this complex data. We need tools to reveal dynamic relationships.