
  1. 1. Protein-Protein Interaction L519 presentation
  2. 2. Group Members and Content: Introduction, biological aspects of protein-protein interaction (Zhenli Su); Protein-protein interaction databases: BIND (Xin Hong), DIP (Xiang Zhou); Pathway databases and algorithms (Paul Ma); Visualization tools (James Coleman, presented by Xiang Zhou)
  3. 3. Biological Aspects of Protein-Protein Interaction Zhenlu Su
  4. 4. Introduction to protein-protein interactions; the importance of the interactions; impact of protein interaction technologies on other fields; the types of protein interactions; the methods for studying protein interactions
  5. 5. Introduction to protein-protein interactions: Proteins control and mediate many of the biological activities of cells. A cell is not static: changes in shape, division, metabolism. Not all cells are equivalent: lymphoid, neural.
  6. 6. Why are protein-protein interactions so important? The binding of one signaling protein to another can have a number of consequences. Such binding can serve to recruit a signaling protein to a location where it is activated and/or where it is needed to carry out its function. The binding of one protein to another can induce conformational changes that affect activity or accessibility of additional binding domains, permitting additional protein interactions.
  7. 7. Why are protein-protein interactions so important? Imagine a cell in which, suddenly, the specific interactions between proteins would disappear. This unfortunate cell would become deaf and blind, paralytic and finally would disintegrate, because specific interactions are involved in almost any physiological process.
  8. 8. Impact on other fields Cancer Biology The study of protein-protein interactions has provided important insights into the functions of many of the known oncogenes, tumor suppressors, and DNA repair proteins. Pharmacogenetics Pharmacogenetic research has expanded to include the study of drug transporters, drug receptors, and drug targets.
  9. 9. The types of protein interactions: binary protein-protein interactions; scaffolding proteins. http://www.udel.edu/chem/bahnson/chem667/crotty/scaffolding_proteins.html#scaffolding
  10. 10. The types of protein interactions, another classification: metabolic and signaling (genetic) pathways; morphogenic pathways, in which groups of proteins participate in the same cellular function during a developmental process; structural complexes and molecular machines, in which numerous macromolecules are brought together.
  11. 11. Signaling pathways
  12. 12. Morphogenic pathways
  13. 13. Structural complexes and molecular machines. Chaperones: protein refolding machines. http://www-cryst.bioc.cam.ac.uk/cgi-bin/cgiwrap/hom http://www.nature.com/nsb/web_specials/movies/sa
  14. 14. Experimental methods: tagged fusion proteins; coimmunoprecipitation; yeast two-hybrid; Biacore; atomic force microscopy (AFM); fluorescence resonance energy transfer (FRET); X-ray diffraction
  15. 15. Experimental methods: Observations of protein interactions fall into four categories. The first comprises an ‘atomic observation’, in which the protein interaction is detected using, for example, X-ray crystallography. These experiments can yield specific information on the atoms or residues involved in the interaction. The second is a ‘direct interaction observation’, where a protein interaction between two partners can be detected, as in a two-hybrid experiment. At a third level of observation, multi-protein complexes can be detected using methods such as immunoprecipitation or mass-spectrometric analysis. This type of experiment does not unveil the chemical detail of the interactions or even reveal which proteins are in direct contact, but gives information as to which proteins are found in a complex at a given time. The fourth category comprises measurements at the cellular level, where an ‘activity bioassay’ is used to observe an interaction; for example, proliferation assays of cells driven by a receptor-ligand interaction.
  16. 16. Protein-Protein Interaction Databases: BIND (Biomolecular Interaction Network Database) Xin Hong
  17. 17. Introduction of BIND Background What is BIND MCODE Algorithm How to use BIND Reference
  18. 18. Background: Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery. Many websites cover protein-protein interactions; a few are listed here: BIND (Biomolecular Interaction Network Database); DIP (Database of Interacting Proteins); Protein-Protein Interaction Server; Protein-Protein Interface
  19. 19. What is BIND? The Biomolecular Interaction Network Database is a database designed to store full descriptions of interactions, molecular complexes and pathways. Development of the BIND 2.0 data model has led to the incorporation of virtually all components of molecular mechanisms, including interactions between any two molecules composed of proteins, nucleic acids and small molecules. Chemical reactions, photochemical activation and conformational changes can also be described. Everything from small molecule biochemistry to signal transduction is abstracted in such a way that graph theory methods may be applied for data mining. The database can be used to study networks of interactions, to map pathways across taxonomic branches and to generate information for kinetic simulations. BIND anticipates the coming large influx of interaction information from high-throughput proteomics efforts, including detailed information about post-translational modifications from mass spectrometry.
  20. 20. What kind of data is stored in BIND? • INTERACTION: the interaction between two molecules, as well as any chemical reactions that occur as a direct result of the interaction. Examples: protein-protein (P-P), protein-nucleic acid (P-n), protein-small molecule (P-s); phosphorylation of a protein, methylation of DNA, hydrolysis of a sugar. • COMPLEX: describes a molecular complex by listing the series of interaction records that are present in the complex. Examples: multi-subunit enzyme, actin fiber, ribosome. • PATHWAY: describes a cellular process as a sequential list of interaction records and their associated Chemical Action data. Examples: cell-signaling pathway, synthesis of an amino acid, transcription and splicing of a pre-messenger RNA.
  21. 21. Current BIND Database Statistics (database: record count). Interaction Database: 15145; Biomolecular Pathway Database: 8; Molecular Complex Database: 1306; Organisms represented: 14; GI Database: 4961; DI Database: 0; Publication Database: 454
  22. 22. What BIND can and cannot do right now: The BIND database structure is a robust design built to accept data from all cell systems. The interface that you see is NOT the data structure, and it does not accurately reflect all of the potential of the database. Tools are being built to realize this potential, and changes are constantly being made to the interface to make the database easier to use and understand. BIND is currently able to accept records that describe protein-protein and protein-nucleic acid interactions. The BIND data specification is available as ASN.1 and XML DTD. ASN.1 data can describe details underlying biochemical and genetic networks. XML versions of all data with accompanying DTDs are supported through the use of the NCBI programming toolkit.
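The three record types on this slide can be sketched as plain data structures. This is a minimal illustration only: the class and field names below are hypothetical and are not taken from the BIND ASN.1 or XML specification. It shows the one relationship the slide does state, namely that complexes and pathways are built from lists of interaction records.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interaction:
    """Two interacting molecules plus any chemical actions that result."""
    record_id: int
    molecule_a: str
    molecule_b: str
    chemical_actions: List[str] = field(default_factory=list)


@dataclass
class Complex:
    """A molecular complex, described by the interaction records it contains."""
    name: str
    interaction_ids: List[int]


@dataclass
class Pathway:
    """A cellular process, described by an ordered list of interaction records."""
    name: str
    interaction_ids: List[int]  # order is significant for a pathway


def molecules_in(record, interactions):
    """Return the molecules referenced by a Complex or Pathway, in first-seen order."""
    by_id = {i.record_id: i for i in interactions}
    seen = []
    for interaction_id in record.interaction_ids:
        interaction = by_id[interaction_id]
        for molecule in (interaction.molecule_a, interaction.molecule_b):
            if molecule not in seen:
                seen.append(molecule)
    return seen


# Hypothetical records: A binds B (and phosphorylates it), B binds C.
records = [
    Interaction(1, "A", "B", ["phosphorylation of B"]),
    Interaction(2, "B", "C"),
]
demo_complex = Complex("demo complex", [1, 2])
print(molecules_in(demo_complex, records))
```

Storing only record identifiers in `Complex` and `Pathway` keeps each interaction described once, which matches the slide's description of complexes and pathways as lists of interaction records.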
  23. 23. Demonstrating the use of Binding sites and Binding Site Pairs for a protein-protein interaction  The grey shapes represent autonomous domains in proteins A and B that mediate a protein-protein interaction. The black lines in these grey shapes represent polypeptide chains that continue outside of these domains to make up the rest of proteins A and B.  The protein-protein interaction between these two domains is mediated by two Binding Site pairs. The first pair (a salt bridge) consists of a single amino acid on molecule A (SLID 0) and a single amino acid on B (SLID 0). These two amino acids form the first Binding Site Pair. The second pair consists of a range of amino acids on A (SLID 1) and a range of amino acids on B (SLID 1). These two ranges of amino acids form the second Binding Site Pair.
  24. 24. The Algorithm MCODE: an automated method for finding molecular complexes in large protein interaction networks. • The MCODE algorithm operates in three stages: vertex weighting, complex prediction and, optionally, post-processing to filter or add proteins in the resulting complexes by certain connectivity criteria. Background: Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/4/2
  25. 25. The Algorithm MCODE: an automated method for finding molecular complexes in large protein interaction networks. Results: The algorithm has the advantage over other graph clustering methods of having a directed mode that allows fine-tuning of clusters of interest without considering the rest of the network and allows examination of cluster interconnectivity, which is relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation. Conclusion: Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The algorithm is not affected by a known high rate of false positives in data from high-throughput interaction techniques. The program is available from ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE http://www.biomedcentral.com/1471-2105/4/2
  26. 26. How to use BIND: Pathway. The INAD Pathway in Drosophila Photoreceptors, A Tutorial. http://bind.ca/index2.phtml?site=tutor
  27. 27. How to use BIND: BIND Interaction Viewer Java applet. The applet shows how molecules can be connected in the database, from molecular complex to small molecule. Yellow: protein; purple: small molecule; white: molecular complex; red: a square that is fixed in place and will not be moved by the graph layout algorithm. This session was seeded by the interaction between human LAT and Grb2 proteins involved in cell signaling in the T-cell.
  28. 28. References • Gary D. Bader et al., An automated method for finding molecular complexes in large protein interaction networks, BMC Bioinformatics 2003 Jan 13;4(1):2 • Gary D. Bader, BIND: The Biomolecular Interaction Network Database, Nucleic Acids Research, 2001, Vol. 29, No. 1, 242-245 • http://bind.ca/ • http://nar.oupjournals.org/cgi/content/full/29/1/242
  29. 29. Protein-Protein Interaction Databases DIP (Database of Interacting Proteins) Xiang Zhou
  30. 30. What is DIP? Established in 1999 at UCLA. Primary goal: extract and integrate protein-protein interaction information and build a user-friendly environment. The usage of DIP
  31. 31. The usage of DIP. Study: protein function; protein-protein relationships; evolution of protein-protein interactions; the network of interacting proteins; the environments of protein-protein interactions. Predict: unknown protein-protein interactions; the best interaction conditions
  32. 32. The structure of DIP: Protein Table; Method Table; Interaction Table; Reference Table
  33. 33. Protein Table: DIP accession number: <DIP:nnnN>; identification numbers from SWISS-PROT, GenBank, PIR; protein name and description; cross references; graph
  34. 34. A sample DIP protein table
  35. 35. Interaction Table Interacting proteins Links to - Methods - Original papers
  36. 36. A sample interaction table
  37. 37. The current status of DIP: Number of proteins: 6978; Number of organisms: 101; Number of interactions: 18260; Number of distinct experiments describing an interaction: 22229; Number of articles: 2203
  38. 38. Other satellite databases: DLRP (http://dip.doe-mbi.ucla.edu/dip/DLRP.cgi), the Database of Ligand-Receptor Partners; LiveDIP (http://dip.doe-mbi.ucla.edu/ldipc/tmpl/livedip.cgi), data on protein states and state transitions in protein-protein interactions; JDIP, a stand-alone Java application that provides a graphical, browser-independent interface to the DIP database.
  39. 39. Document types and annotations Document types - XIN and tab-delimited formats Annotations - Node: <DIP: nnnN> - Edge: <DIP: nnnE>
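The node and edge annotations on this slide can be told apart by their trailing letter. The following is a small sketch of a parser for identifiers written as `DIP:nnnN` (node, that is, a protein) or `DIP:nnnE` (edge, that is, an interaction), optionally wrapped in angle brackets as on the slide. The function name and the tolerance for a space after the colon are assumptions made for illustration; this is not part of any DIP tool.

```python
import re

# Matches e.g. "DIP:123N" or "DIP: 45E"; digits followed by N (node) or E (edge).
DIP_ID = re.compile(r"^DIP:\s*(\d+)([NE])$")


def parse_dip_id(text):
    """Split a DIP annotation into (number, kind), where kind is 'node' or 'edge'."""
    cleaned = text.strip().strip("<>").strip()
    match = DIP_ID.match(cleaned)
    if match is None:
        raise ValueError(f"not a DIP identifier: {text!r}")
    number, kind = match.groups()
    return int(number), "node" if kind == "N" else "edge"


print(parse_dip_id("<DIP:123N>"))  # (123, 'node')
print(parse_dip_id("DIP: 45E"))    # (45, 'edge')
```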
  40. 40. Search DIP: http://dip.doe-mbi.ucla.edu/dip/Search.cgi
  41. 41. BIND and DIP Comparison. BIND: data stored: interactions, molecular complexes, pathways; data formats: ASN.1, XML. DIP: data stored: interactions, protein information; data formats: XIN, tab-delimited.
  42. 42. BIND and DIP Comparison: size of the databases. BIND: 15145 interactions, number of proteins unknown, 14 organisms. DIP: 18260 interactions, 6978 proteins, 101 organisms.
  43. 43. BIND and DIP Comparison Graphic tools Data display layout
  44. 44. Pathway Databases and Algorithms Paul Ma
  45. 45. 1) KEGG (Kyoto Encyclopedia of Genes and Genomes): Represents higher-order functions in terms of the network of interacting molecules. The GENES database contains 240,943 entries from the published genomes, including bacteria, mouse and human. KEGG has three databases: GENES, PATHWAY and LIGAND. Each entry has the form database:entry or organism:gene. Examples: EC: (enzyme); genbank:DROALPC (gene); D.melanogaster:dpp (organism-specific gene).
  46. 46. By matching genes in the genome and gene products in the pathway, KEGG can be used to predict protein interaction networks and associated cellular functions. The data object stored in the PATHWAY database is called the generalized protein interaction network, a network of gene products with three types of interactions or relations: enzyme-enzyme relations (between enzymes that catalyze successive reaction steps in a metabolic pathway), direct protein-protein interactions, and gene expression relations. Currently, only enzyme-enzyme relations are maintained. The PATHWAY database contains 5761 entries, including 201 pathway diagrams with 14,960 enzyme-enzyme relations.
  47. 47. An example of a pathway entry in KEGG: Glycolysis
  48. 48. 2) WIT database, Oak Ridge National Laboratory: similar to KEGG. 3) EcoCyc, the E. coli Encyclopedia: covers the genome and gene products of E. coli, its metabolic and signal transduction pathways, and its RNAs. Contains 4391 genes, 904 metabolic reactions and 129 metabolic pathways.
  49. 49. Graph-theoretical algorithms for finding molecular complexes. Small-world networks: how to identify a set of central metabolites. MCODE: as used in the BIND database. Many biological networks have small-world characteristics. Example: the Erdős number. Paul Erdős: a prominent Hungarian graph theorist. He is the center of mathematical collaboration. Coauthors of a paper with Erdős are one step from Erdős and have Erdős number 1. Coauthors of a paper with mathematicians with Erdős number 1 have Erdős number 2. Most mathematicians active in this century have a small Erdős number. Example: the Kevin Bacon game. It aims at connecting an arbitrary actor with the actor Kevin Bacon by the shortest sequence of actor pairs who have appeared together in a film. The average Bacon number for an arbitrary actor turns out to be 2.87. (However, Kevin Bacon is not the center of this small world of film actor collaboration. The center turns out to be Christopher Lee, with a mean path length of 2.60.)
  50. 50. Small-world networks lie between two extremes of graphs: completely regular and completely random. Regular networks have long path lengths and are clustered, while random graphs have short path lengths but show little clustering. Small-world networks have short path lengths but are highly clustered. The metabolic network of E. coli is a small-world network. The center of the map is glutamate, with a mean path length of 2.46, followed by pyruvate with a value of 2.59.
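The two-part identifier form described on this slide (`database:entry` or `organism:gene`) can be split at the first colon. The helper below is a hypothetical sketch, not a KEGG API; it only illustrates the naming scheme, using the two complete examples from the slide.

```python
def parse_kegg_id(identifier):
    """Split 'database:entry' (or 'organism:gene') into its two parts."""
    prefix, separator, entry = identifier.partition(":")
    prefix, entry = prefix.strip(), entry.strip()
    if not separator or not prefix or not entry:
        raise ValueError(f"expected 'database:entry', got {identifier!r}")
    return prefix, entry


print(parse_kegg_id("genbank:DROALPC"))     # ('genbank', 'DROALPC')
print(parse_kegg_id("D.melanogaster:dpp"))  # ('D.melanogaster', 'dpp')
```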
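The notion of a network "center" used for glutamate, Erdős and Christopher Lee can be sketched as the vertex with the smallest mean shortest-path length to all other vertices. The code below is a minimal illustration on a made-up five-vertex graph (the vertex names are hypothetical), assuming an undirected, connected, unweighted network stored as an adjacency dictionary; it is not the procedure used in the cited studies.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path length (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_path_length(graph, source):
    """Average distance from source to all other reachable vertices."""
    others = [d for v, d in bfs_distances(graph, source).items() if v != source]
    return sum(others) / len(others) if others else 0.0


def center(graph):
    """Vertex with the smallest mean path length (assumes a connected graph)."""
    return min(graph, key=lambda v: mean_path_length(graph, v))


# Toy network: B is directly linked to three of the other four vertices.
graph = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B"},
    "D": {"B", "E"},
    "E": {"D"},
}
print(center(graph), mean_path_length(graph, "B"))  # B 1.25
```

In this picture, an Erdős or Bacon number is simply `bfs_distances(graph, center_vertex)[other_vertex]`.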
  51. 51. Three Cases of Networks
  52. 52. MCODE (Molecular Complex Detection) in the BIND database. Algorithms for finding clusters are an active area of computer science and are often based on network flow/minimum cut theory or spectral clustering. MCODE uses a vertex-weighting scheme based on the clustering coefficient, $C_i$, which measures the ‘cliquishness’ of the neighborhood of a vertex: $C_i = 2n / (k_i (k_i - 1))$, where $k_i$ is the vertex size of the neighborhood of vertex $i$ and $n$ is the number of edges in the neighborhood.
  53. 53. The density of a subgraph is the number of edges divided by the maximum possible number of edges, so it ranges from 0.0 to 1.0. A k-core is a subgraph of minimum degree k, i.e., every vertex in it has degree >= k. So the highest k-core of a graph is the central, most densely connected subgraph. We define the core-clustering coefficient of a vertex v to be the density of the highest k-core of the immediate neighborhood of v, including v.
  54. 54. The core-clustering coefficient amplifies the weighting of the heavily interconnected graph regions while removing the many less connected vertices that are characteristic of biomolecular interaction networks. The weight of a vertex is then the product of the vertex's core-clustering coefficient and the highest k-core level, $k_{max}$, of the immediate neighborhood of the vertex. MCODE then seeds a complex with the highest-weighted vertex and recursively moves outward from this vertex, including vertices whose weight is above a given threshold, set as a percentage of the seed vertex's weight. In this way the densest regions of the network are identified. The time complexity is $O(nmh^3)$, where n is the number of vertices, m is the number of edges and h is the vertex size of the average neighborhood in the graph.
  55. 55. It is slower than the fastest min-cut graph clustering algorithm, which has $O(n^2 \log n)$ time complexity. But MCODE has a number of advantages. Since weighting is done only once and accounts for most of the execution time, many parameter settings can be tried cheaply. Another is that MCODE is relatively easy to implement.
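As a worked illustration of the clustering coefficient on this slide, the sketch below computes 2n / (k(k - 1)) for a vertex of a small undirected graph. It assumes the common reading in which the neighborhood of a vertex is its set of immediate neighbors (excluding the vertex itself), k is the number of those neighbors and n is the number of edges among them; vertices with fewer than two neighbors are given a coefficient of 0.0 by convention here. The graph and function name are made up for the example.

```python
def clustering_coefficient(graph, v):
    """C = 2n / (k (k - 1)) over the immediate neighbors of v."""
    neighbors = graph[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention chosen for this sketch
    # Each edge between two neighbors is seen from both ends, so halve the count.
    n = sum(1 for u in neighbors for w in graph[u] if w in neighbors) / 2
    return 2 * n / (k * (k - 1))


# Triangle A-B-C with a pendant vertex D attached to A.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(clustering_coefficient(graph, "B"))  # 1.0: B's two neighbors are linked
print(clustering_coefficient(graph, "A"))  # one of three possible neighbor pairs is linked
```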
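The two definitions on this slide, subgraph density and the highest k-core, can be sketched as follows. This is a minimal illustration that assumes a simple undirected graph (no self-loops or repeated edges) stored as a dictionary of neighbor sets; the function names are hypothetical. The k-core is found by repeatedly peeling away vertices whose remaining degree is below k.

```python
def density(graph, vertices):
    """Edges among `vertices` divided by the maximum possible number of edges."""
    vs = set(vertices)
    if len(vs) < 2:
        return 0.0
    edges = sum(len(graph[v] & vs) for v in vs) / 2  # each edge counted twice
    return edges / (len(vs) * (len(vs) - 1) / 2)


def k_core(graph, k):
    """Vertices of the largest subgraph in which every vertex has degree >= k."""
    alive = set(graph)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(graph[v] & alive) < k:
                alive.discard(v)
                changed = True
    return alive


def highest_k_core(graph):
    """Return (k, vertices) for the largest k with a non-empty k-core."""
    k, core = 0, set(graph)
    while True:
        candidate = k_core(graph, k + 1)
        if not candidate:
            return k, core
        k, core = k + 1, candidate


# Triangle A-B-C with a pendant vertex D attached to A.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
k, core = highest_k_core(graph)
print(k, sorted(core), density(graph, core))  # 2 ['A', 'B', 'C'] 1.0
```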
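The weighting and seeding steps on this slide can be sketched end to end on a toy graph. This is a simplified illustration under stated assumptions, not the published MCODE program: the graph is simple and undirected, `fraction` is a hypothetical parameter giving the inclusion threshold as a fraction of the seed's weight, a vertex joins a complex only if its weight is strictly above that threshold, and the optional post-processing stage is omitted.

```python
def highest_k_core(graph):
    """Return (k, vertices) of the highest non-empty k-core, by repeated peeling."""
    k, core = 0, set(graph)
    while True:
        candidate = set(core)  # the (k+1)-core lies inside the k-core
        changed = True
        while changed:
            changed = False
            for v in list(candidate):
                if len(graph[v] & candidate) < k + 1:
                    candidate.discard(v)
                    changed = True
        if not candidate:
            return k, core
        k, core = k + 1, candidate


def density(graph, vertices):
    """Edges among `vertices` divided by the maximum possible number of edges."""
    vs = set(vertices)
    if len(vs) < 2:
        return 0.0
    edges = sum(len(graph[v] & vs) for v in vs) / 2
    return edges / (len(vs) * (len(vs) - 1) / 2)


def vertex_weight(graph, v):
    """k_max times the density of the highest k-core of v's neighborhood (with v)."""
    hood_vertices = graph[v] | {v}
    hood = {u: graph[u] & hood_vertices for u in hood_vertices}
    k_max, core = highest_k_core(hood)
    return k_max * density(hood, core)


def find_complexes(graph, fraction=0.8):
    """Seed at the heaviest unseen vertex and grow outward over heavy neighbors."""
    weights = {v: vertex_weight(graph, v) for v in graph}
    seen, complexes = set(), []
    for seed in sorted(graph, key=weights.get, reverse=True):
        if seed in seen:
            continue
        limit = fraction * weights[seed]
        members, stack = {seed}, [seed]
        seen.add(seed)
        while stack:
            u = stack.pop()
            for w in graph[u]:
                if w not in seen and weights[w] > limit:
                    seen.add(w)
                    members.add(w)
                    stack.append(w)
        if len(members) > 1:  # ignore single-vertex results
            complexes.append(members)
    return weights, complexes


# Toy network: a 4-clique A-B-C-D with a tail D-E-F.
graph = {
    "A": {"B", "C", "D"}, "B": {"A", "C", "D"}, "C": {"A", "B", "D"},
    "D": {"A", "B", "C", "E"}, "E": {"D", "F"}, "F": {"E"},
}
weights, complexes = find_complexes(graph)
print(complexes)  # the clique A, B, C, D is the only complex found
```

Because the weights are computed once and reused, the threshold can be varied and the search rerun without repeating the expensive weighting step.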
  56. 56. Structure Visualization Tools Written by James Coleman Presented by Xiang Zhou
  57. 57. Structure Visualization: One of the primary activities in proteomics R&D is determining and visualizing the 3D structure of proteins in order to find where drugs might modulate their activity. Other activities include identifying all of the proteins produced by a given cell or tissue and determining how these proteins interact. BIOINFORMATICS COMPUTING, p.186, Bryan Bergeron, M.D., Prentice Hall 2002
  58. 58. Structure Visualization: It’s generally understood by the molecular biology research community that the sequencing of the human genome, which will likely take several more years to complete, is relatively trivial compared to definitively characterizing the interactions within the proteome. BIOINFORMATICS COMPUTING, p.186, Bryan Bergeron, M.D., Prentice Hall 2002
  59. 59. Non-Static Structure Visualization: Unlike a nucleotide sequence, which is a relatively static structure, proteins are dynamic entities that change their shape and association with other molecules as a function of temperature, chemical interactions, pH, and other changes in the environment. BIOINFORMATICS COMPUTING, p.186, Bryan Bergeron, M.D., Prentice Hall 2002
  60. 60. Primary vs. Secondary and Tertiary Structure In contrast to visualizing the sequence of nucleotides on a strand of DNA, visualizing the primary structure of a protein adds little to the knowledge of protein function. More interesting and relevant are the higher-order structures.
  61. 61. Why Visualize? In each area of bioinformatics, the rationale for using graphics instead of tables or strings of data is to shift the user’s mental processing from reading and mathematical, logical interpretation to faster pattern recognition. BIOINFORMATICS COMPUTING, p.180, Bryan Bergeron, M.D., Prentice Hall 2002. Pattern recognition is an area where humans are much more efficient than computers.
  62. 62. Some Common Tools: Hundreds of visualization tools have been developed in bioinformatics. Many are specific to hardware such as microarray devices. Shareware utilities for PCs: PDB Viewer, WebMol, RasMol, Protein Explorer, Cn3D, VMD, MolMol, MidasPlus, Pymol, Chime, Chimera
  63. 63. Application Feature Summary
      | Feature | RasMol | Cn3D | PyMol | SWISS-PDBViewer | Chimera |
      | Architecture | Stand-alone | Plug-in | Web-enabled | Web-enabled | Web-enabled |
      | Manipulation power | Low | High | High | High | High |
      | Hardware requirements | Low/Moderate | High | High | Moderate | High |
      | Ease of use | High; command line | Moderate | Moderate | High | Moderate; GUI + command line |
      | Special features | Small size; easy install | Powerful GUI | GUI; ray tracing | Powerful GUI | GUI; collaboration |
      | Output quality | Moderate | Very high | High | High | Very high |
      | Documentation | Good | Good | Limited | Good | Very good |
      | Support | Online; users groups | Online; users groups | Online; users groups | Online; users groups | Online; users groups |
      | Speed | High | Moderate | Moderate | Moderate | Moderate/Slow |
      | OpenGL support | Yes | Yes | Yes | Yes | Yes |
  64. 64. Molecule Representations. Wireframe: shows bonds and bond angles. Ball and stick: shows atoms, bonds and bond angles. Ribbon diagrams: show secondary structure. Van der Waals surface diagram: shows atomic volumes. Backbone: shows overall molecular structure.
  65. 65. Wireframe used to show individual chains:
  66. 66. Stick view showing atoms and bonds:
  67. 67. Surface View showing surface fields:
  68. 68. Ribbon view of secondary structure:
  69. 69. Distinct geometrical features by color:
  70. 70. Other properties that can be Visualized MolMol supports the display of electrostatic potentials across a protein molecule. MidasPlus (a predecessor of Chimera) allows for the editing of sequences visually to see the effects of point mutations.
  71. 71. HCI and Protein-Protein Interaction Creating a suitable metaphor to transform data into a form that means something to the user. Large volumes of complex data require more complex metaphors than, for example, the pie chart used in business graphics. Different users require different levels of complexity – and therefore different metaphors. The desktop, folder, trashcan metaphor could be replaced by a chromosome, gene, protein, pathway metaphor.
  72. 72. For protein interactions, we need a metaphor that reveals dynamics. Haptic joystick: provides force feedback when the user manipulates a molecule near another one. Stereo view of the interaction of two proteins: scripting allows for the movement of individual molecules, creating a movie. 3D goggles combined with haptic gloves to feel electrostatic potentials and see tertiary structure dynamics. PyMol provides scripting that can produce a 3D movie of the geometrical relationship between multiple proteins.
  73. 73. The field is wide open. To definitively characterize the interactions within the proteome, we need more tools. We need new metaphors for managing this complex data. We need tools to reveal dynamic relationships.