Introduction to Network Medicine

I prepared these slides for an introductory class on Network Medicine given at UPV (Valencia) in October 2017. The fundamental principle behind Network Medicine is that disease phenotypes emerge from genotypes via the network properties of interactions between the underlying biological components. These phenotypes are best conceptualized as consequences of perturbations to disease modules of the biological networks in the cell, whether at the node level (disease genes) or the link level (disease edgotypes). With further analysis of drug-disease and drug-target association data, one can investigate the effects, both therapeutic and undesired, of the associated medication. Understanding molecular-level networks makes it possible to understand the connections between different diseases and the effects of drugs designed to target them, paving the way for personalized treatments based on one's own interactome.

Published in: Science

  1. 1. INTRODUCTION TO NETWORK MEDICINE Marc Santolini Center for Complex Network Research (CCNR)
  2. 2. Reductionism, which has dominated biological research for over a century, has provided a wealth of knowledge about individual cellular components and their functions. Despite its enormous success, it is increasingly clear that a discrete biological function can only rarely be attributed to an individual molecule. Instead, most biological characteristics arise from complex interactions between the cell’s numerous constituents, such as proteins, DNA, RNA and small molecules (refs 1–8). Therefore, a key challenge for biology in the twenty-first century is to understand the structure and the dynamics of the complex intercellular web of interactions that contribute to the structure and function of a living cell. The development of high-throughput data-collection techniques, as epitomized by the widespread use of microarrays, allows for the simultaneous interrogation of the status of a cell’s components at any given time. In turn, new technology platforms, such as protein chips or semi-automated yeast two-hybrid screens, help to determine how and when these molecules interact with each other. Various types of interaction webs, or networks (including protein–protein interaction, metabolic, signalling and transcription-regulatory networks), emerge from the sum of these interactions. None of these networks are independent; instead they form a ‘network of networks’ that is responsible for the behaviour of the cell. A major challenge of contemporary biology is to […] programme to map out, understand and model in quantifiable terms the topological and dynamic properties of the various networks that control the behaviour of the cell. Help along the way is provided by the rapidly developing theory of complex networks that, in the past few years, has made advances towards uncovering the organizing principles that govern the formation and evolution of various complex technological and social networks (refs 9–12). This research is already making an impact on cell biology.
It has led to the realization that the architectural features of molecular interaction networks within a cell are shared to a large degree by other complex systems, such as the Internet, computer chips and society. This unexpected universality indicates that similar laws may govern most complex networks in nature, which allows the expertise from large and well-mapped non-biological systems to be used to characterize the intricate interwoven relationships that govern cellular functions. In this review, we show that the quantifiable tools of network theory offer unforeseen possibilities to understand the cell’s internal organization and evolution, fundamentally altering our view of cell biology. The emerging results are forcing the realization that, notwithstanding the importance of individual molecules, cellular function is a contextual attribute of strict and quantifiable patterns of interactions between the myriad of cellular constituents. Although uncovering […]
NETWORK BIOLOGY: UNDERSTANDING THE CELL’S FUNCTIONAL ORGANIZATION. Albert-László Barabási* & Zoltán N. Oltvai‡. A key aim of postgenomic biomedical research is to systematically catalogue all molecules and their interactions within a living cell. There is a clear need to understand how these molecules and the interactions between them determine the function of this enormously complex machinery, both in isolation and when surrounded by other cells. Rapid advances in network biology indicate that cellular networks are governed by universal laws and offer a new conceptual framework that could potentially revolutionize our view of biology and disease pathologies in the twenty-first century.
Barabási et al., Nat Rev Genet 2004. NATURE REVIEWS | GENETICS, VOLUME 5, FEBRUARY 2004, 105.
In the Barabási–Albert model of a scale-free network, at each time point a node with M links is added to the network, which connects to an already existing node I with probability $\Pi_I = k_I / \sum_J k_J$, where $k_I$ is the degree of node I (FIG. 3) and J is the index denoting the sum over network nodes. The network that is generated by this growth process has a power-law degree distribution that is characterized by the degree exponent γ = 3. Such distributions are seen as a straight line on a log–log plot (see figure, part Bb). The network that is created by the Barabási–Albert model does not have an inherent modularity, so C(k) is independent of k (see figure, part Bc). Scale-free networks with degree exponents 2 < γ < 3, a range that is observed in most biological and non-biological networks, are ultra-small (refs 34,35), with the average path length following $\ell \sim \log \log N$, which is significantly shorter than the log N that characterizes random small-world networks.
Hierarchical networks. To account for the coexistence of modularity, local clustering and scale-free topology in many real systems it has to be assumed that clusters combine in an iterative manner, generating a hierarchical network (refs 47,53) (see figure, part C). The starting point of this construction is a small cluster of four densely linked nodes (see the four central nodes in figure, part Ca). Next, three replicas of this module are generated and the three external nodes of the replicated clusters connected to the central node of the old cluster, which produces a large 16-node module. Three replicas of this 16-node module are then generated and the 16 peripheral nodes connected to the central node of the old module, which produces a new module of 64 nodes.
The hierarchical network model seamlessly integrates a scale-free topology with an inherent modular structure by generating a network that has a power-law degree distribution with degree exponent $\gamma = 1 + \ln 4 / \ln 3 = 2.26$ (see figure, part Cb) and a large, system-size independent average clustering coefficient <C> ~ 0.6. The most important signature of hierarchical modularity is the scaling of the clustering coefficient, which follows $C(k) \sim k^{-1}$, a straight line of slope –1 on a log–log plot (see figure, part Cc). A hierarchical architecture implies that sparsely connected nodes are part of highly clustered areas, with communication between the different highly clustered neighbourhoods being maintained by a few hubs (see figure, part Ca).
[Figure: panels A (random network), B (scale-free network) and C (hierarchical network), each with sub-panels a–c; plotted quantities are P(k) versus k and C(k) versus k.]
SCALE-FREE NETWORKS
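The growth rule quoted on this slide (a new node attaches to an existing node I with probability proportional to its degree) can be illustrated with a short simulation. The following is a minimal sketch, not the authors' code: the function names, the fully connected starting core of m + 1 nodes, and the choice to draw m distinct targets per step are all assumptions made for illustration.

```python
import random
from collections import Counter


def barabasi_albert(n_nodes, m, seed=None):
    """Grow a network by preferential attachment.

    Returns a list of (new_node, existing_node) edges.
    Assumption: growth starts from a fully connected core of m + 1 nodes.
    """
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `stubs` once per link end, so a uniform draw
    # from this list selects node I with probability k_I / sum_J k_J.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:  # assumption: m distinct targets per step
            targets.add(rng.choice(stubs))
        for old in sorted(targets):
            edges.append((new, old))
            stubs.extend((new, old))
    return edges


def degrees(edges):
    """Count how many link ends each node has."""
    return Counter(node for edge in edges for node in edge)


edges = barabasi_albert(2000, m=2, seed=1)
deg = degrees(edges)
print("mean degree:", sum(deg.values()) / len(deg))
print("largest degree (a hub):", max(deg.values()))
```

Running it shows a mean degree close to 2m while a few early nodes collect many more links, which is the hub behaviour the slide describes. Checking the exponent γ = 3 would need a log–log fit of the degree histogram, which this sketch leaves out.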
  3. 3. […] mathematical properties of random networks (ref. 14). Their much-investigated random network model assumes that a fixed number of nodes are connected randomly to each other (BOX 2). The most remarkable property of the model is its ‘democratic’ or uniform character, characterizing the degree, or connectivity (k; BOX 1), of the individual nodes. Because, in the model, the links are placed randomly among the nodes, it is expected that some nodes collect only a few links whereas others collect many more. In a random network, the node degrees follow a Poisson distribution, which indicates that most nodes have roughly the same number of links, approximately equal to the network’s average degree, <k> (where <> denotes the average); nodes that have significantly more or fewer links than <k> are absent or very rare (BOX 2). Despite its elegance, a series of recent findings indicate that the random network model cannot explain the topological properties of real networks. The deviations from the random model have several key signatures, the most striking being the finding that, in contrast to the Poisson degree distribution, for many social and technological networks the number of nodes with a given degree follows a power law. That is, the probability that a chosen node has exactly k links follows $P(k) \sim k^{-\gamma}$, where γ is the degree exponent, with its value for most networks being between 2 and 3 (ref. 15). Networks that are characterized by a power-law degree distribution are highly non-uniform: most of the nodes have only a few links. A few nodes with a very large number of links, which are often called hubs, hold these nodes together. Networks with a power degree […]
Figure 2 | Yeast protein interaction network. A map of protein–protein interactions (ref. 18) in Saccharomyces cerevisiae, which is based on early yeast two-hybrid measurements (ref. 23), illustrates that a few highly connected nodes (which are also known as hubs) hold the network together. The largest cluster, which contains ~78% of all proteins, is shown. The colour of a node indicates the phenotypic effect of removing the corresponding protein (red = lethal, green = non-lethal, orange = slow growth, yellow = unknown). Reproduced with permission from ref. 18.
© Jeong et al., “Lethality and centrality in protein networks”, Nature 2001
THE YEAST INTERACTOME
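The 'democratic' character of the random network model on this slide can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the variant in which every pair of nodes is linked independently with probability p (the slide only says that nodes are connected randomly), and the values of n and p are arbitrary.

```python
import random
from collections import Counter


def random_network(n_nodes, p, seed=None):
    """Link every pair of nodes independently with probability p (assumed variant)."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n_nodes) for j in range(i) if rng.random() < p]


n, p = 1000, 0.006  # illustrative values, giving an expected degree of about 6
edges = random_network(n, p, seed=7)

deg = Counter({i: 0 for i in range(n)})  # include nodes that end up with no links
for i, j in edges:
    deg[i] += 1
    deg[j] += 1

mean_k = sum(deg.values()) / n
max_k = max(deg.values())
print("average degree <k>:", mean_k)
print("largest degree:", max_k)
```

In such a run the largest degree stays within a small multiple of <k>, so no hubs appear. This is the contrast with the power-law case that the slide draws.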
  4. 4. FABRICATING HUBS. […] major engineer of the genomic landscape, it is likely to be a key mechanism for generating the scale-free topology. Two further results offer direct evidence that network growth is responsible for the observed topological features. The scale-free model (BOX 2) predicts that the nodes that appeared early in the history of the network are the most connected ones (ref. 15). Indeed, an inspection of the metabolic hubs indicates that the remnants of the RNA world, such as coenzyme A, NAD and GTP, are among the most connected substrates of the metabolic network, as are elements of some of the most ancient metabolic pathways, such as glycolysis and the tricarboxylic acid cycle (ref. 17). In the context of the protein interaction networks, cross-genome comparisons have found that, on average, the evolutionarily older proteins have more links to other proteins than their younger counterparts (refs 45,46). This offers direct empirical evidence for preferential attachment.
Motifs, modules and hierarchical networks. Cellular functions are likely to be carried out in a highly modular manner (ref. 1). In general, modularity refers to a group of physically or functionally linked molecules (nodes) that work together to achieve a (relatively) distinct function (refs 1,6,8,47). Modules are seen in many systems, for example, circles of friends in social networks or websites that are devoted to similar topics on the World Wide Web. Similarly, in many complex engineered systems, from a modern aircraft to a computer chip, a highly modular structure is a fundamental design […]
[Figure 3, panels a and b: proteins and genes before and after duplication.]
Figure 3 | The origin of the scale-free topology and hubs in biological networks. The origin of the scale-free topology […]
  5. 5. NETWORK MOTIFS. Superfamilies of Evolved and Designed Networks. Ron Milo, Shalev Itzkovitz, Nadav Kashtan, Reuven Levitt, Shai Shen-Orr, Inbal Ayzenshtat, Michal Sheffer, Uri Alon*. Complex biological, technological, and sociological networks can be of very different sizes and connectivities, making it difficult to compare their structures. Here we present an approach to systematically study similarity in the local structure of networks, based on the significance profile (SP) of small subgraphs in the network compared to randomized networks. We find several superfamilies of previously unrelated networks with very similar SPs. One superfamily, including transcription networks of microorganisms, represents “rate-limited” information-processing networks strongly constrained by the response time of their components. A distinct superfamily includes protein signaling, developmental genetic networks, and neuronal wiring. Additional superfamilies include power grids, protein-structure networks and geometric networks, World Wide Web links and social networks, and word-adjacency networks from different languages.
Many networks in nature share global properties (1, 2). Their degree sequences (the number of edges per node) often follow a long-tailed distribution, in which some nodes are much more connected than the average (3). In addition, natural networks often show the small-world property of short paths between nodes and highly clustered connections (1, 2, 4). Despite these global similarities, networks from different fields can have very different local structure (5). It was recently found that networks display certain patterns, termed “network motifs,” at much higher frequency than expected in randomized networks (6, 7). In biological networks, these motifs were suggested to be recurring circuit elements that carry out key information-processing tasks (6, 8–10).
Departments of Molecular Cell Biology, Physics of Complex Systems, and Computer Science, Weizmann Institute of Science, Rehovot 76100, Israel. *To whom correspondence should be addressed at Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel.
E-mail: urialon@weizmann.ac.il
5 MARCH 2004 VOL 303 SCIENCE www.sciencemag.org
To understand the design principles of complex networks, it is important to compare the local structure of networks from different fields. The main difficulty is that these networks can be of vastly different sizes [for example, World Wide Web (WWW) hyperlink networks with millions of nodes and social networks with tens of nodes] and degree sequences. Here, we present an approach for comparing network local structure, based on the significance profile (SP). To calculate the SP of a network, the network is compared to an ensemble of randomized networks with the same degree sequence. The comparison to randomized networks compensates for effects due to network size and degree sequence. For each subgraph i, the statistical significance is described by the Z score (11):
$Z_i = (N\mathrm{real}_i - \langle N\mathrm{rand}_i \rangle) / \mathrm{std}(N\mathrm{rand}_i)$
where $N\mathrm{real}_i$ is the number of times the subgraph appears in the network, and $\langle N\mathrm{rand}_i \rangle$ and $\mathrm{std}(N\mathrm{rand}_i)$ are the mean and standard deviation of its appearances in the randomized network ensemble. The SP is the vector of Z scores normalized to length 1:
$SP_i = Z_i / (\sum_i Z_i^2)^{1/2}$
The normalization emphasizes the relative significance of subgraphs, rather than the absolute significance.
This is important for comparison of networks of different sizes, because motifs (subgraphs that occur much more often than expected at random) in large networks tend to display higher Z scores than motifs in small networks (7). We present in Fig. 1 the SP of the 13 possible directed connected triads (triad significance profile, TSP) for networks from different fields (12). The TSP of these networks is almost always insensitive to removal of 30% of the edges or to addition of 50% new edges at random, demonstrating that it is robust to missing data or random data errors (SOM Text). Several superfamilies of networks with similar TSPs emerge from this analysis. One superfamily includes sensory transcription networks that control gene expression in bacteria and yeast in response to external stimuli. In these transcription networks, the nodes represent genes or operons and the edges represent direct transcriptional regulation (6, 13–15). Networks from three microorganisms, the bacteria Escherichia coli (6) and Bacillus subtilis (14) and the yeast Saccharomyces cerevisiae (7, 15), were analyzed. The networks have very similar TSPs (correlation coefficient c > 0.99). They show one strong motif, triad 7, termed “feedforward loop.” The feedforward loop has been theoretically and experimentally shown […]
Fig. 1. The triad significance profile (TSP) of networks from various disciplines. The TSP shows the normalized significance level (Z score) for each of the 13 triads. Networks with similar characteristic profiles are […] URCHIN N = 45, E = 83), and synaptic connections between neurons in C. elegans (NEURONS N = 280, E = 2170). (iii) WWW hyperlinks between Web pages in the www.nd.edu site (3) (WWW-1 N = 325729, […]
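The Z score and significance profile defined on this slide reduce to a few lines of arithmetic once subgraph counts are available. The sketch below assumes the counts have already been obtained. The function names and the toy counts are hypothetical, and the use of the population standard deviation is an assumption, since the slide does not say which estimator is meant.

```python
import math
from statistics import mean, pstdev


def z_scores(n_real, n_rand):
    """Z_i = (Nreal_i - <Nrand_i>) / std(Nrand_i) for each subgraph i.

    n_real[i]: count of subgraph i in the real network.
    n_rand[i]: counts of subgraph i across the randomized ensemble.
    Assumption: std is the population standard deviation and is non-zero.
    """
    return [(real - mean(rand)) / pstdev(rand) for real, rand in zip(n_real, n_rand)]


def significance_profile(zs):
    """SP_i = Z_i / sqrt(sum_i Z_i^2), i.e. the Z vector scaled to length 1."""
    norm = math.sqrt(sum(z * z for z in zs))
    return [z / norm for z in zs]


# Hypothetical counts for three subgraphs, for illustration only.
n_real = [40, 5, 12]
n_rand = [[10, 12, 8, 10], [5, 7, 3, 5], [20, 18, 22, 20]]

zs = z_scores(n_real, n_rand)
sp = significance_profile(zs)
print("Z scores:", zs)
print("SP:", sp)
```

In this toy input the first subgraph is over-represented (positive SP entry), the second matches the randomized ensemble (zero), and the third is under-represented (negative). Because the SP has unit length, profiles from networks of very different sizes can be compared directly, for example with a correlation coefficient.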
  6. 6. The human disease network. Kwang-Il Goh*†‡§, Michael E. Cusick†‡¶, David Valle‖, Barton Childs‖, Marc Vidal†‡¶**, and Albert-László Barabási*†‡**. *Center for Complex Network Research and Department of Physics, University of Notre Dame, Notre Dame, IN 46556; †Center for Cancer Systems Biology (CCSB) and ¶Department of Cancer Biology, Dana–Farber Cancer Institute, 44 Binney Street, Boston, MA 02115; ‡Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115; §Department of Physics, Korea University, Seoul 136-713, Korea; and ‖Department of Pediatrics and the McKusick–Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205. Edited by H. Eugene Stanley, Boston University, Boston, MA, and approved April 3, 2007 (received for review February 14, 2007).
A network of disorders and disease genes linked by known disorder–gene associations offers a platform to explore in a single graph-theoretic framework all known phenotype and disease gene associations, indicating the common genetic origin of many diseases. Genes associated with similar disorders show both higher likelihood of physical interactions between their products and higher expression profiling similarity for their transcripts, supporting the existence of distinct disease-specific functional modules. We find that essential human genes are likely to encode hub proteins and are expressed widely in most tissues. This suggests that disease genes also would play a central role in the human interactome. In contrast, we find that the vast majority of disease genes are nonessential and show no tendency to encode hub proteins, and their expression pattern indicates that they are localized in the functional periphery of the network. A selection-based model explains the observed difference between essential and disease genes and also suggests that diseases caused by somatic mutations should not be peripheral, a prediction we confirm for cancer genes.
biological networks | complex networks | human genetics | systems biology | diseasome
Decades-long efforts to map human disease loci, at first genetically and later physically (1), followed by recent positional cloning of many disease genes (2) and genome-wide association studies (3), have generated an impressive list of disorder–gene association pairs (4, 5). In addition, recent efforts to map the protein–protein interactions in humans (6, 7), together with efforts to curate an extensive map of human metabolism (8) and regulatory networks offer increasingly detailed maps of the relationships between different disease genes. Most of the successful studies building on these new approaches have focused, however, on a single disease, using network-based tools to gain a better understanding of the relationship between the genes implicated in a selected disorder (9). Here we take a conceptually different approach, exploring whether human genetic disorders and the corresponding disease genes might be related to each other at a higher level of cellular and organismal organization. Support for the validity of this approach is provided by examples of genetic disorders that arise from mutations in more than a single gene (locus heterogeneity). For example, Zellweger syndrome is caused by mutations in any of at least 11 genes, all associated with peroxisome biogenesis (10). Similarly, there are many examples of different mutations in the same gene (allelic heterogeneity) giving rise to phenotypes currently classified as different disorders. For example, mutations in TP53 have been linked to 11 clinically distinguishable cancer-related disorders (11). Given the highly interlinked internal organization of the cell (12–17), it should be possible to improve the single gene–single disorder approach by developing a conceptual framework to link systematically all genetic disorders (the human “disease phenome”) with the complete list of disease genes (the “disease genome”), resulting in a global view of the “diseasome,” the combined set of all known disorder/disease gene associations.
Results. Construction of the Diseasome. We constructed a bipartite graph consisting of two disjoint sets of nodes. One set corresponds to all known genetic disorders, whereas the other set corresponds to all known disease genes in the human genome (Fig. 1). A disorder and a gene are then connected by a link if mutations in that gene are implicated in that disorder. The list of disorders, disease genes, and associations between them was obtained from the Online Mendelian Inheritance in Man (OMIM; ref. 18), a compendium of human disease genes and phenotypes. As of December 2005, this list contained 1,284 disorders and 1,777 disease genes. OMIM initially focused on monogenic disorders but in recent years has expanded to include complex traits and the associated genetic mutations that confer susceptibility to these common disorders (18). Although this history introduces some biases, and the disease gene record is far from complete, OMIM represents the most complete and up-to-date repository of all known disease genes and the disorders they confer. We manually classified each disorder into one of 22 disorder classes based on the physiological system affected [see supporting information (SI) Text, SI Fig. 5, and SI Table 1 for details]. Starting from the diseasome bipartite graph we generated two biologically relevant network projections (Fig. 1). In the “human disease network” (HDN) nodes represent disorders, and two disorders are connected to each other if they share at least one gene in which mutations are associated with both disorders (Figs. 1 and 2a). In the “disease gene network” (DGN) nodes represent disease genes, and two genes are connected if they are associated with the same disorder (Figs. 1 and 2b). Next, we discuss the potential of these networks to help us understand and represent in a single framework all known disease gene and phenotype associations.
Properties of the HDN. If each human disorder tends to have a distinct and unique genetic origin, then the HDN would be disconnected into many single nodes corresponding to specific disorders or grouped into small clusters of a few closely related disorders. In contrast, the obtained HDN displays many connections between both individual disorders and disorder classes (Fig. 2a). Of 1,284 disorders, 867 have at least one link to other disorders, and 516 disorders form a giant component, suggesting that the genetic origins of most diseases, to some extent, are shared with other diseases. The number of genes associated with a disorder, s, has a broad distribution (see SI Fig. 6a), indicating that most disorders relate to a few disease genes, whereas a handful of phenotypes, such as deafness (s = 41), leukemia (s = 37), and colon cancer (s = 34), relate to dozens of genes (Fig. 2a). The degree (k) distribution of HDN (SI Fig. 6b) indicates that most disorders are linked to only […]
Author contributions: D.V., B.C., M.V., and A.-L.B. designed research; K.-I.G. and M.E.C. performed research; K.-I.G. and M.E.C. analyzed data; and K.-I.G., M.E.C., D.V., M.V., and A.-L.B. wrote the paper. The authors declare no conflict of interest. This article is a PNAS Direct Submission. Abbreviations: DGN, disease gene network; HDN, human disease network; GO, Gene Ontology; OMIM, Online Mendelian Inheritance in Man; PCC, Pearson correlation coefficient.
**To whom correspondence may be addressed. E-mail: alb@nd.edu or marc_vidal@dfci.harvard.edu. This article contains supporting information online at www.pnas.org/cgi/content/full/0701361104/DC1. © 2007 by The National Academy of Sciences of the USA. www.pnas.org/cgi/doi/10.1073/pnas.0701361104. PNAS | May 22, 2007 | vol. 104 | no. 21 | 8685–8690. APPLIED PHYSICAL SCIENCES.
[Figure: the DISEASOME bipartite network linking the disease phenome to the disease genome, shown with its two projections, the Human Disease Network (HDN) and the Disease Gene Network (DGN). Genes shown: AR, ATM, BRCA1, BRCA2, CDH1, GARS, HEXB, KRAS, LMNA, MSH2, PIK3CA, TP53, MAD1L1, RAD54L, VAPB, CHEK2, BSCL2, ALS2, BRIP1. Disorders shown: Androgen insensitivity, Breast cancer, Perineal hypospadias, Prostate cancer, Spinal muscular atrophy, Ataxia-telangiectasia, Lymphoma, T-cell lymphoblastic leukemia, Ovarian cancer, Papillary serous carcinoma, Fanconi anemia, Pancreatic cancer, Wilms tumor, Charcot-Marie-Tooth disease, Sandhoff disease, Lipodystrophy, Amyotrophic lateral sclerosis, Silver spastic paraplegia syndrome, Spastic ataxia/paraplegia.]
Fig. 1. Construction of the diseasome bipartite network. (Center) A small subset of OMIM-based disorder–disease gene associations (18), where circles and rectangles […]
Goh et al., PNAS 2007
GENES AND DISEASES
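The two projections described on this slide (HDN: disorders linked when they share a gene; DGN: genes linked when they share a disorder) follow mechanically from the bipartite list of associations. Here is a minimal sketch of that construction. The function name and the placeholder labels D1–D3 and g1–g3 are hypothetical and do not correspond to OMIM entries.

```python
from collections import defaultdict
from itertools import combinations


def project(associations):
    """Project a bipartite disorder-gene graph onto its two node sets.

    associations: iterable of (disorder, gene) pairs.
    Returns (hdn, dgn), each a set of frozenset({a, b}) edges.
    """
    genes_of = defaultdict(set)      # disorder -> genes implicated in it
    disorders_of = defaultdict(set)  # gene -> disorders it is implicated in
    for disorder, gene in associations:
        genes_of[disorder].add(gene)
        disorders_of[gene].add(disorder)
    # HDN: two disorders are connected if they share at least one gene.
    hdn = {frozenset(pair)
           for ds in disorders_of.values()
           for pair in combinations(sorted(ds), 2)}
    # DGN: two genes are connected if they are associated with the same disorder.
    dgn = {frozenset(pair)
           for gs in genes_of.values()
           for pair in combinations(sorted(gs), 2)}
    return hdn, dgn


# Placeholder associations, for illustration only.
associations = [("D1", "g1"), ("D1", "g2"), ("D2", "g2"), ("D3", "g3")]
hdn, dgn = project(associations)
print("HDN edges:", hdn)
print("DGN edges:", dgn)
```

In this toy input D1 and D2 are linked in the HDN through the shared gene g2, g1 and g2 are linked in the DGN through D1, and D3 stays isolated. Isolated disorders of this kind are what the slide says would dominate if every disorder had a unique genetic origin.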
  7. 7. [Figure: (a) Human Disease Network and (b) Disease Gene Network. Node colour encodes disorder class: Bone, Cancer, Cardiovascular, Connective tissue, Dermatological, Developmental, Ear/Nose/Throat, Endocrine, Gastrointestinal, Hematological, Immunological, Metabolic, Muscular, Neurological, Nutritional, Ophthalmological, Psychiatric, Renal, Respiratory, Skeletal, multiple, Unclassified. Node size scale runs from 1 to 41. Labelled nodes include disorders such as Asthma, Atherosclerosis, Breast cancer, Colon cancer, Deafness, Diabetes mellitus, Leukemia, Parkinson disease and Zellweger syndrome, and genes such as APC, KRAS, PTEN and TP53.]
  8. 8. Leading Edge Review Interactome Networks and Human Disease Marc Vidal,1,2,* Michael E. Cusick,1,2 and Albert-La´ szlo´ Baraba´ si1,3,4,* 1Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA 2Department of Genetics, Harvard Medical School, Boston, MA 02115, USA 3Center for Complex Network Research (CCNR) and Departments of Physics, Biology and Computer Science, Northeastern University, Boston, MA 02115, USA 4Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA *Correspondence: marc_vidal@dfci.harvard.edu (M.V.), alb@neu.edu (A.-L.B.) DOI 10.1016/j.cell.2011.02.016 Complex biological systems and cellular networks may underlie most genotype to phenotype relationships. Here, we review basic concepts in network biology, discussing different types of interactome networks and the insights that can come from analyzing them. We elaborate on why interactome networks are important to consider in biology, how they can be mapped and integrated with each other, what global properties are starting to emerge from interactome network models, and how these properties may relate to human disease. Introduction Since the advent of molecular biology, considerable progress has been made in the quest to understand the mechanisms that underlie human disease, particularly for genetically inherited disorders. Genotype-phenotype relationships, as summarized in the Online Mendelian Inheritance in Man (OMIM) database (Am- berger et al., 2009), include mutations in more than 3000 human genes known to be associated with one or more of over 2000 human disorders. 
This is a truly astounding number of genotype-phenotype relationships considering that a mere three decades have passed since the initial description of Restriction Fragment Length Polymorphisms (RFLPs) as molecular markers to map genetic loci of interest (Botstein et al., 1980), only two decades since the announcement of the first positional cloning experiments of disease-associated genes using RFLPs (Amberger et al., 2009), and just one decade since the release of the first reference sequences of the human genome (Lander et al., 2001; Venter et al., 2001). For complex traits, the information gathered by recent genome-wide association studies suggests high-confidence genotype-phenotype associations between close to 1000 genomic loci and one or more of over [...] phenotypic associations, there would still be major problems to fully understand and model human genetic variations and their impact on diseases.
To understand why, consider the "one-gene/one-enzyme/one-function" concept originally framed by Beadle and Tatum (Beadle and Tatum, 1941), which holds that simple, linear connections are expected between the genotype of an organism and its phenotype. But the reality is that most genotype-phenotype relationships arise from a much higher underlying complexity. Combinations of identical genotypes and nearly identical environments do not always give rise to identical phenotypes. The very coining of the words "genotype" and "phenotype" by Johannsen more than a century ago derived from observations that inbred isogenic lines of bean plants grown in well-controlled environments give rise to pods of different size (Johannsen, 1909). Identical twins, although strikingly similar, nevertheless often exhibit many differences (Raser and O’Shea, 2005). 
Likewise, genotypically indistinguishable bacterial or yeast cells grown side by side can express different subsets of transcripts and gene products at any given moment (Elowitz et al., 2002; Blake et al., 2003; Taniguchi et al., 2010). Even straightforward [...]
Mapping Interactome Networks
Network science deals with complexity by "simplifying" complex systems, summarizing them merely as components (nodes) and interactions (edges) between them. In this simplified approach, the functional richness of each node is lost. Despite or even perhaps because of such simplifications, useful discoveries can be made. As regards cellular systems, the nodes are metabolites and macromolecules such as proteins, RNA molecules and gene sequences, while the edges are physical, biochemical and functional interactions that can be identified with a plethora of technologies. One challenge of network biology is to provide maps of such interactions using systematic and standardized approaches and assays that are as unbiased as possible. The resulting "interactome" networks, the networks of interactions between cellular components, can serve as scaffold information to extract global or local graph theory properties [...] et al., 2010).
Computational prediction maps are fast and efficient to implement, and usually include satisfyingly large numbers of nodes and edges, but are necessarily imperfect because they use indirect information (Plewczynski and Ginalski, 2009). While high-throughput maps attempt to describe unbiased, systematic, and well-controlled data, they were initially more difficult to establish, although recent technological advances suggest that near completion can be reached within a few years for highly reliable, comprehensive protein-protein [...] were discovered in and are being applied genome-wide for these model organisms (Mohr et al., 2010). 
Metabolic Networks
Metabolic network maps attempt to comprehensively describe all possible biochemical reactions for a particular cell or organism (Schuster et al., 2000; Edwards et al., 2001). In many representations of metabolic networks, nodes are biochemical metabolites and edges are either the reactions that convert [...]
Figure 2. Networks in Cellular Systems
To date, cellular networks are most available for the "super-model" organisms (Davis, 2004) yeast, worm, fly, and plant. High-throughput interactome mapping relies upon genome-scale resources such as ORFeome resources. Several types of interactome networks discussed are depicted. In a protein interaction network, nodes represent proteins and edges represent physical interactions. In a transcriptional regulatory network, nodes represent transcription factors (circular nodes) or putative DNA regulatory elements (diamond nodes), and edges represent physical binding between the two. In a disease network, nodes represent diseases, and edges represent gene mutations that are associated with the linked diseases. In a virus-host network, nodes represent viral proteins (square nodes) or host proteins (round nodes), and edges represent physical interactions between the two. In a metabolic network, nodes represent enzymes, and edges represent metabolites that are products or substrates of the enzymes. The network depictions seem dense, but they represent only small portions of available interactome network maps, which themselves constitute only a few percent of the complete interactomes within cells.
Cell 2011
DISEASES AS NETWORK PERTURBATIONS
  9. 9. Network medicine: a network-based approach to human disease
Albert-László Barabási*‡§, Natali Gulbahce*‡|| and Joseph Loscalzo§
Abstract | Given the functional interdependencies between the molecular components in a human cell, a disease is rarely a consequence of an abnormality in a single gene, but reflects the perturbations of the complex intracellular and intercellular network that links tissue and organ systems. The emerging tools of network medicine offer a platform to explore systematically not only the molecular complexity of a particular disease, leading to the identification of disease modules and pathways, but also the molecular relationships among apparently distinct (patho)phenotypes. Advances in this direction are essential for identifying new disease genes, for uncovering the biological significance of disease-associated mutations identified by genome-wide association studies and full-genome sequencing, and for identifying drug targets and biomarkers for complex diseases.
Most cellular components exert their functions through interactions with other cellular components, which can be located either in the same cell or across cells, and even across organs. In humans, the potential complexity of the resulting network (the human interactome) is daunting: with ~25,000 protein-coding genes, ~1,000 metabolites and an undefined number of distinct proteins [1] and functional RNA molecules, the number of cellular components that serve as the nodes of the interactome easily exceeds 100,000. The number of functionally relevant interactions between the components of this network, representing the links of the interactome, is expected to be much larger [2].
This inter- and intracellular interconnectivity implies that the impact of a specific genetic abnormality is not restricted to the activity of the gene product that carries it, but can spread along the links of the network and alter the activity of gene products that otherwise carry no defects. Therefore, an understanding of a gene’s network context is essential in determining the phenotypic impact of defects that affect it [3,4]. Following on from this principle, a key hypothesis underlying this Review is that a disease phenotype is rarely a consequence of an abnormality in a single effector gene product, but reflects various pathobiological processes that interact in a complex network. A corollary of this widely held hypothesis is that the interdependencies among a cell’s molecular components lead to deep functional, molecular and causal relationships among apparently distinct phenotypes.
Network-based approaches to human disease have multiple potential biological and clinical applications. A better understanding of the effects of cellular interconnectedness on disease progression may lead to the identification of disease genes and disease pathways, which, in turn, may offer better targets for drug development. These advances may also lead to better and more accurate biomarkers to monitor the functional integrity of networks that are perturbed by diseases as well as to better disease classification. Here we present an overview of the organizing principles that govern cellular networks and the implications of these principles for understanding disease. These principles and the tools and methodologies that are derived from them are facilitating the emergence of a body of knowledge that is increasingly referred to as network medicine [5–7].
The human interactome
Although much of our understanding of cellular networks is derived from model organisms, the past decade has seen an exceptional growth in human-specific molecular interaction data [8]. Most attention has been directed towards molecular networks, including protein interaction networks, whose nodes are proteins that are linked to each other by physical (binding) interactions [9,10]; metabolic networks, whose nodes are metabolites that are linked if they participate in the same biochemical reactions [11–13]; regulatory networks, whose directed links represent either regulatory relationships between a transcription factor and a gene [14], or post-translational [...]
Predicting disease genes
Disease-associated genes have generally been identified using linkage mapping or, more recently, genome-wide association (GWA) studies [53]. Both methodologies can suggest large numbers of disease-gene candidates, but identifying the particular gene and the causal mutation remains difficult. Recently, a series of increasingly sophisticated network-based tools have been developed to predict potential disease genes; these tools can be loosely grouped into three categories, as discussed below (FIG. 4).
Linkage methods. These methods assume that the direct interaction partners of a disease protein are likely to be associated with the same disease phenotype [45,54–56]. Indeed, for one disease locus, the set of genes within the locus whose products interacted with a known [...]
[...] genes in the interactome. a | Of the approximately [...] associated with specific diseases. The figure shows the [...] associated genes that were known [42] in 2007 and [...] essential, that is, their absence is associated with [...] diagram of the differences between essential and [...]essential disease genes (shown as blue nodes) are [...] periphery, whereas in utero essential genes (shown [...]
Figure 2 | Disease modules. Schematic diagram of the three modularity concepts that are discussed in this Review. a | Topological modules correspond to locally dense neighbourhoods of the interactome, such that the nodes of [...]
[...] 111 Dana Research Center, Boston, Massachusetts 02115, USA. ‡ Center for Cancer Systems Biology, Dana-Farber Cancer Institute, 44 Binney Street, Boston, Massachusetts 02115, USA. § Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis Street, Boston, Massachusetts 02115, USA. || Department of Cellular and Molecular Pharmacology, University of California, 1700 4th Street, Byers Hall 309, Box 2530, San Francisco, California 94158, USA. Correspondence to A.-L.B. e-mail: alb@neu.edu doi:10.1038/nrg2918
REVIEWS 56 | JANUARY 2011 | VOLUME 12 www.nature.com/reviews/genetics © 2011 Macmillan Publishers Limited. All rights reserved
NETWORK MEDICINE
  10. 10. RESEARCH ARTICLE SUMMARY
DISEASE NETWORKS
Uncovering disease-disease relationships through the incomplete interactome
Jörg Menche, Amitabh Sharma, Maksim Kitsak, Susan Dina Ghiassian, Marc Vidal, Joseph Loscalzo, Albert-László Barabási*
INTRODUCTION: A disease is rarely a straightforward consequence of an abnormality in a single gene, but rather reflects the interplay of multiple molecular processes. The relationships among these processes are encoded in the interactome, a network that integrates all physical interactions within a cell, from protein-protein to regulatory protein–DNA and metabolic interactions. The documented propensity of disease-associated proteins to interact with each other suggests that they tend to cluster in the same neighborhood of the interactome, forming a disease module, a connected subgraph that contains all molecular determinants of a disease. The accurate identification of the corresponding disease module represents the first step toward a systematic understanding of the molecular mechanisms underlying a complex disease. Here, we present a network-based framework to identify the location of disease modules within the interactome and use the overlap between the modules to predict disease-disease relationships.
RATIONALE: Despite impressive advances in high-throughput interactome mapping and disease gene identification, both the interactome and our knowledge of disease-associated genes remain incomplete. This incompleteness prompts us to ask to what extent the current data are sufficient to map out the disease modules, the first step toward an integrated approach toward human disease. To make progress, we must formulate mathematically the impact of network incompleteness on the identifiability of disease modules, quantifying the predictive power and limitations of the current interactome.
RESULTS: Using the tools of network science, we show that we can only uncover disease modules for diseases whose number of associated genes exceeds a critical threshold determined by the network incompleteness. We find that disease proteins associated with 226 diseases are clustered in the same network neighborhood, displaying a statistically significant tendency to form identifiable disease modules. The higher the degree of agglomeration of the disease proteins within the interactome, the higher the biological and functional similarity of the corresponding genes. These findings indicate that many local neighborhoods of the interactome represent the observable part of the true, larger and denser disease modules. If two disease modules overlap, local perturbations causing one disease can disrupt pathways of the other disease module as well, resulting in shared clinical and pathobiological characteristics. To test this hypothesis, we measure the network-based separation of each disease pair, observing a direct relation between the pathobiological similarity of diseases and their relative distance [...]
Read the full article at http://dx.doi.org/10.1126/science.1257601
Menche et al., Science 2015
DISEASES AS NETWORK NEIGHBORHOODS
  11. 11. The Interactome as a Map
  12. 12. Diseases As Local Neighborhoods
  13. 13. Diseases As Local Neighborhoods
Network Clustering Means Explainable Biology
[Figure: interactome neighborhoods labeled Asthma, Parkinson’s, Leukemia, MS, Hypertension, Rheumatoid arthritis, Crohn’s disease, Type 2 diabetes, Glioblastoma, Ulcerative colitis and Heart failure; highlighted genes AIF1, ZBTB12, NFKBIZ, MERTK, HHEX, CFB.]
  14. 14. Interactome and disease genes
[Figure: observable module for multiple sclerosis within the interactome. Legend: multiple sclerosis genes (GWAS, OMIM); other disease genes (GWAS & OMIM); gene with multiple disease associations; molecular interactions by source (signalling, complexes, kinase-substrate, metabolic, literature, regulatory, yeast two-hybrid); interaction with multiple lines of evidence. Other disease groups shown: immunologic deficiency syndromes, hematologic diseases, blood protein disorders, connective tissue diseases, autoimmune diseases, joint diseases, musculoskeletal diseases, rheumatoid arthritis. Labeled genes: AKT1, HLA-B, HLA-C, STAT3, TAP2, NFKBIZ, IL2RA, TNFRSF1A, EHMT2, PTK2, IL7R, MAPK1.]
• The interactome contains 141,296 physical interactions between 13,460 proteins
• We study 299 diseases with at least 20 gene associations
Menche et al., Science 2015
  15. 15. Measures of network localization
[Figure: multiple sclerosis proteins in the interactome, with a connected component of size S = 11 and example shortest distances d = 2 and d = 3 marked. Two histograms compare data with random expectation: frequency vs. size of largest component, and frequency vs. shortest distance d.]
• We use two measures to quantify the interactome-based localization of a disease
• 226 out of 299 diseases are significantly localized according to both measures
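The two localization measures on this slide can be illustrated with a small sketch. It assumes the measures are (i) the size S of the largest connected component formed by the disease proteins and (ii) the mean shortest distance from each disease protein to its closest other disease protein, as the figure labels suggest. The toy graph, the node names and the function names are hypothetical and are not taken from the slides; the comparison against random gene sets is left out.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def largest_component_size(adj, genes):
    """Size S of the largest connected component of the subgraph induced by `genes`."""
    genes = set(genes)
    seen = set()
    best = 0
    for start in genes:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                # Only walk along links whose both ends are disease genes.
                if nb in genes and nb not in component:
                    component.add(nb)
                    queue.append(nb)
        seen |= component
        best = max(best, len(component))
    return best


def mean_closest_distance(adj, genes):
    """Mean, over disease genes, of the distance to the closest *other* disease gene."""
    genes = set(genes)
    closest = []
    for g in genes:
        dist = bfs_distances(adj, g)
        others = [dist[h] for h in genes if h != g and h in dist]
        if others:
            closest.append(min(others))
    return sum(closest) / len(closest)


# Toy interactome with hypothetical node names (not real data).
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("c", "f")]
toy_adj = build_adjacency(toy_edges)
toy_disease_genes = {"a", "b", "e"}

S = largest_component_size(toy_adj, toy_disease_genes)
d_s = mean_closest_distance(toy_adj, toy_disease_genes)
print(S, round(d_s, 3))  # 2 1.667
```

To judge significance as on the slide, both values would be compared with the same quantities computed for randomly chosen gene sets of equal size.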
  16. 16. Relations between Diseases
[Figure: two example disease pairs with their modules drawn in the interactome and the distributions P(d) of the shortest distance d (within-module d_AA, d_BB and between-module d_AB). Pairwise separated modules: peroxisomal disorders (PD) and multiple sclerosis (MS), s_AB = 1.3. Pairwise overlapping modules: multiple sclerosis (MS) and rheumatoid arthritis (RA), s_AB = -0.2. Legend: s_AB < 0 and s_AB ≥ 0.]
• We introduce a network-based measure to quantify the overlap/separation of two diseases
• Most disease pairs are well separated on the Interactome
Menche et al., Science 2015
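A minimal sketch of a separation measure of this kind follows. It assumes the definition s_AB = <d_AB> - (<d_AA> + <d_BB>) / 2, where <d_AA> and <d_BB> are the mean closest distances within each disease and <d_AB> is the mean closest distance between the two; the slide only shows the symbols, so the exact formula here is an assumption taken from the cited paper as I recall it. The path graph and all function names are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def closest_distances(adj, sources, targets, exclude_self):
    """For each source, the distance to its closest target (optionally ignoring itself)."""
    values = []
    for s in sources:
        dist = bfs_distances(adj, s)
        reachable = [
            dist[t] for t in targets
            if t in dist and not (exclude_self and t == s)
        ]
        if reachable:
            values.append(min(reachable))
    return values


def separation(adj, genes_a, genes_b):
    """s_AB = <d_AB> - (<d_AA> + <d_BB>) / 2 (assumed definition, see lead-in).

    Assumes each gene set has at least two mutually reachable genes.
    """
    a, b = set(genes_a), set(genes_b)
    within_a = closest_distances(adj, a, a, exclude_self=True)
    within_b = closest_distances(adj, b, b, exclude_self=True)
    # Between-disease distances are collected in both directions; a gene
    # shared by A and B contributes a distance of 0.
    between = (closest_distances(adj, a, b, exclude_self=False)
               + closest_distances(adj, b, a, exclude_self=False))
    d_aa = sum(within_a) / len(within_a)
    d_bb = sum(within_b) / len(within_b)
    d_ab = sum(between) / len(between)
    return d_ab - (d_aa + d_bb) / 2


# Toy path graph 1-2-3-4-5-6 (hypothetical, not real data).
path_adj = build_adjacency([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)])

s_separated = separation(path_adj, {1, 2}, {5, 6})
s_overlapping = separation(path_adj, {2, 3}, {3, 4})
print(s_separated, s_overlapping)  # 2.5 -0.5
```

With this sign convention, the two toy cases mirror the slide: a positive value for modules sitting in different neighborhoods and a negative value for modules that share a gene.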
  17. 17. Network Distance vs. Biomedical Similarity
[Figure: six panels plotting biomedical similarity against network separation s_AB: GO term similarity for biological process, molecular function and cellular component; co-expression; symptom similarity; and comorbidity (relative risk RR). Each panel shows the mean, and an expectation curve is marked.]
• Diseases that are close in the Interactome have similar biomedical properties
  18. 18. The Disease Space
[Figure: 3D disease space with 14 numbered example diseases grouped by class. Immune System Diseases: Type 1 diabetes, Rheumatoid arthritis, Autoimmune diseases of the nervous system, Demyelinating autoimmune diseases. Eye Diseases: Retinitis pigmentosa, Retinal degeneration, Graves disease, Macular degeneration. Respiratory Tract Diseases: Asthma, Respiratory hypersensitivity. Cardiovascular Diseases: Cerebrovascular disorders, Myocardial infarction, Coronary artery disease, Myocardial ischemia.]
• Diseases and their network-based relationships can be represented in a 3D Disease Space
• Diseases belonging to the same class agglomerate
Menche et al., Science 2015
  19. 19. Overlapping diseases
• Examples of unexpected disease relationships uncovered using the disease space
[Figure: overlapping modules of celiac disease and asthma, with the shared pathway "Intestinal immune network for IgA production" highlighted. Other diseases labeled: atherosclerosis, coronary artery disease, biliary tract diseases, hepatic cirrhosis.]
  20. 20. ORIGINAL ARTICLE A disease module in the interactome explains disease heterogeneity, drug response and captures novel pathways and genes in asthma Amitabh Sharma1,2,3,†, Jörg Menche1,2,4,8,†, C. Chris Huang5, Tatiana Ort5, Xiaobo Zhou3, Maksim Kitsak1,2, Nidhi Sahni2, Derek Thibault3, Linh Voung3, Feng Guo3, Susan Dina Ghiassian1,2, Natali Gulbahce6, Frédéric Baribaud5, Joel Tocker5, Radu Dobrin5, Elliot Barnathan5, Hao Liu5, Reynold A. Panettieri Jr7, Kelan G. Tantisira3, Weiliang Qiu3, Benjamin A. Raby3, Edwin K. Silverman3, Marc Vidal2,9, Scott T. Weiss3 and Albert-László Barabási1,2,3,4,8,* 1 Center for Complex Networks Research, Department of Physics, Northeastern University, Boston, MA 02115, USA, 2 Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA, 3 Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA, 4 Department of Theoretical Physics, Budapest University of Technology and Economics, H1111, Budapest, Hungary, 5 Janssen Research & Development, Inc., 1400 McKean Road, Spring House, PA 19477, USA, 6 Department of Cellular and Molecular Pharmacology, University of California 1700, 4th Street, Byers Hall 308D, San Francisco, CA 94158, USA, 7 Pulmonary Allergy and Critical Care Division, Department of Medicine, University of Pennsylvania, 125 South 31st Street, TRL Suite 1200, Philadelphia, PA 19104, USA, 8 Center for Network Science, Central European University, Nador u. 9, 1051 Budapest, Hungary and 9 Department of Genetics, Harvard Medical School, Boston, MA 02115, USA *To whom correspondence should be addressed at: Center for Complex Networks Research, Department of Physics, Northeastern University, Boston, MA 02115, USA. 
Email: barabasi@gmail.com Abstract Recent advances in genetics have spurred rapid progress towards the systematic identification of genes involved in complex diseases. Still, the detailed understanding of the molecular and physiological mechanisms through which these genes affect disease phenotypes remains a major challenge. Here, we identify the asthma disease module, i.e. the local neighborhood of the interactome whose perturbation is associated with asthma, and validate it for functional and pathophysiological relevance, using both computational and experimental approaches. We find that the asthma disease module is enriched with modest GWAS P-values against the background of random variation, and with differentially expressed genes from normal and asthmatic fibroblast cells treated with an asthma-specific drug. The asthma module also contains immune response mechanisms that are shared with other immune-related disease modules. Further, using diverse omics (genomics, † These authors contributed equally to this work. Received: September 1, 2014. Revised: November 19, 2014. Accepted: January 5, 2015 © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com Human Molecular Genetics, 2015, Vol. 24, No. 
11, 3005–3020 doi: 10.1093/hmg/ddv001 Advance Access Publication Date: 12 January 2015 Original Article
RESEARCH ARTICLE A DIseAse MOdule Detection (DIAMOnD) Algorithm Derived from a Systematic Analysis of Connectivity Patterns of Disease Proteins in the Human Interactome Susan Dina Ghiassian1,2☯ , Jörg Menche1,2,3☯ , Albert-László Barabási1,2,3,4 * 1 Center for Complex Networks Research and Department of Physics, Northeastern University, Boston, Massachusetts, United States of America, 2 Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America, 3 Center for Network Science, Central European University, Budapest, Hungary, 4 Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America ☯ These authors contributed equally to this work. * barabasi@gmail.com
Abstract The observation that disease associated proteins often interact with each other has fueled the development of network-based approaches to elucidate the molecular mechanisms of human disease. Such approaches build on the assumption that protein interaction networks can be viewed as maps in which diseases can be identified with localized perturbation within a certain neighborhood. The identification of these neighborhoods, or disease modules, is therefore a prerequisite of a detailed investigation of a particular pathophenotype. While numerous heuristic methods exist that successfully pinpoint disease associated modules, [...]
OPEN ACCESS Citation: Ghiassian SD, Menche J, Barabási A-L (2015) A DIseAse MOdule Detection (DIAMOnD) Algorithm Derived from a Systematic Analysis of Connectivity Patterns of Disease Proteins in the Human Interactome. PLoS Comput Biol 11(4): e1004120. 
doi:10.1371/journal.pcbi.1004120 Editor: Andrey Rzhetsky, University of Chicago, UNITED STATES Received: August 25, 2014 Accepted: January 9, 2015 Sharma et al., HMG 2015 Ghiassian et al., PLoS Comp Biol 2015 BUILDING DISEASE MODULES
  21. 21. Disease Module Detection and Analysis
The general workflow of a detailed analysis for a disease of interest:
I Interactome construction - Binary interactions, metabolic couplings, regulatory interactions ...
Seed gene selection - OMIM, GWAS, literature
II Disease Module Identification - DIAMOnD: Disease Module Detection Algorithm
III Validation - Gene expression data - Gene Ontologies - Pathways - Comorbidity
IV Biological interpretation - Pathway prioritization - Molecular mechanism
  22. 22. DIAM DISEASE MODULES VS COMMUNITIES
  23. 23. DIAMOnD: Disease Module Detection Algorithm
[Figure: (A) initial seeds followed by iterations 1, 2 and 3, with a p-value shown next to each candidate gene (0.18, 0.46, 0.46, 0.07, 0.53, 0.46, 0.21, 0.46, 0.29). (B) Genes connected to a seed gene. (C) Disease module: proto-module plus DIAMOnD genes. Legend: original seed genes; gene selected at iteration i; DIAMOnD genes.]
• purely topological method
• all genes in the network are prioritized according to their potential relevance for the disease
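The iterative ranking on this slide can be sketched as follows. This is a simplified illustration and not the published DIAMOnD code: it assumes that each candidate gene is scored with a hypergeometric p-value for having at least k_s of its k links attached to the current module, and that the gene with the lowest p-value is added at each iteration. The toy network and all names are hypothetical.

```python
from math import comb


def build_adjacency(edges):
    """Undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def connectivity_pvalue(n_nodes, n_seeds, degree, links_to_seeds):
    """P(at least `links_to_seeds` of `degree` links hit a seed) if links were placed at random.

    Hypergeometric tail: sum_i C(s, i) * C(N - s, k - i) / C(N, k).
    """
    upper = min(degree, n_seeds)
    tail = sum(
        comb(n_seeds, i) * comb(n_nodes - n_seeds, degree - i)
        for i in range(links_to_seeds, upper + 1)
    )
    return tail / comb(n_nodes, degree)


def diamond_sketch(adj, seeds, n_iterations):
    """Greedy module expansion: add the most significantly connected neighbour each round."""
    module = set(seeds)
    added = []
    n_nodes = len(adj)
    for _ in range(n_iterations):
        # Candidates are all genes with at least one link into the current module.
        candidates = {nb for gene in module for nb in adj[gene]} - module
        if not candidates:
            break
        best = min(
            sorted(candidates),  # sorted so that ties are broken reproducibly
            key=lambda c: connectivity_pvalue(
                n_nodes, len(module), len(adj[c]), len(adj[c] & module)
            ),
        )
        added.append(best)
        module.add(best)
    return added


# Toy network (hypothetical): "x" links only to the three seeds, while the
# hub "h" has six links, only one of which reaches a seed.
toy_edges = [("s1", "x"), ("s2", "x"), ("s3", "x"), ("s1", "h")]
toy_edges += [("h", f"o{i}") for i in range(1, 6)]
toy_adj = build_adjacency(toy_edges)

ranking = diamond_sketch(toy_adj, {"s1", "s2", "s3"}, n_iterations=2)
print(ranking)  # ['x', 'h']
```

The toy case shows why significance rather than raw link count drives the ranking: the low-degree gene with all its links in the module is picked before the hub.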
  24. 24. DIAMOnD and Disease Modules within the Human BUILDING A DISEASE MODULE
  25. 25. ARTICLE Received 7 May 2015 | Accepted 29 Nov 2015 | Published 1 Feb 2016 Network-based in silico drug efficacy screening Emre Guney1,2, Jörg Menche1,3, Marc Vidal2,4 & Albert-László Barabási1,2,3,5 The increasing cost of drug development together with a significant drop in the number of new drug approvals raises the need for innovative approaches for target identification and efficacy prediction. Here, we take advantage of our increasing understanding of the network-based origins of diseases to introduce a drug-disease proximity measure that quantifies the interplay between drug targets and diseases. By correcting for the known biases of the interactome, proximity helps us uncover the therapeutic effect of drugs, as well as to distinguish palliative from effective treatments. Our analysis of 238 drugs used in 78 diseases indicates that the therapeutic effect of drugs is localized in a small network neighborhood of the disease genes and highlights efficacy issues for drugs used in Parkinson and several inflammatory disorders. Finally, network-based proximity allows us to predict novel drug-disease associations that offer unprecedented opportunities for drug repurposing and the detection of adverse effects. DOI: 10.1038/ncomms10331 OPEN Guney et al., Nature Comm 2015 DRUGS
  26. 26. DRUGS
[Figure 1 | Network-based drug-disease proximity. (a) Illustration of the closest distance (dc) of a drug T with targets t1 and t2 to the proteins s1, s2 and s3: the shortest path from each drug target to the closest disease gene is averaged, d = (2 + 3)/2, and a z-score is obtained by comparison with random gene sets with the same degrees. (b, c) Drug targets and disease genes for Gliclazide and Daunorubicin against type 2 diabetes and acute myeloid leukaemia; values shown: dc = 2.5, zc = 1.3; dc = 1.0, zc = –1.6; zc = 1.0; zc = –3.3; dc = 2.0; dc = 1.0. Labeled genes: ABCC8, VEGFA, RUNX1, INS, KAT6A, TOP2A, IRS1, TOP2B, CAPN10, NPM1.]
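The proximity calculation in the figure can be sketched as below. It assumes that dc is the mean, over drug targets, of the shortest distance to the nearest disease gene, and that the z-score compares dc with the same quantity for random gene sets of matching degree. The slide only says "random gene sets with the same degrees"; sampling nodes of exactly equal degree is a simplifying assumption of this sketch. The toy graph reproduces the d = (2 + 3)/2 example; node and function names are hypothetical.

```python
import random
import statistics
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def closest_distance(adj, targets, disease_genes):
    """d_c: mean over drug targets of the distance to the nearest disease gene."""
    per_target = []
    for t in targets:
        dist = bfs_distances(adj, t)
        per_target.append(min(dist[s] for s in disease_genes if s in dist))
    return sum(per_target) / len(per_target)


def degree_matched_sample(adj, nodes, rng):
    """Random node set with the same size and the same degrees as `nodes`."""
    by_degree = {}
    for n in adj:
        by_degree.setdefault(len(adj[n]), []).append(n)
    wanted = {}
    for n in nodes:
        wanted[len(adj[n])] = wanted.get(len(adj[n]), 0) + 1
    sample = []
    for degree, count in wanted.items():
        sample.extend(rng.sample(sorted(by_degree[degree]), count))
    return sample


def proximity_z(adj, targets, disease_genes, n_random=1000, seed=0):
    """z = (d_c - mean of random d_c) / std of random d_c."""
    rng = random.Random(seed)
    observed = closest_distance(adj, targets, disease_genes)
    null = [
        closest_distance(
            adj,
            degree_matched_sample(adj, targets, rng),
            degree_matched_sample(adj, disease_genes, rng),
        )
        for _ in range(n_random)
    ]
    mu = statistics.mean(null)
    sigma = statistics.pstdev(null)
    # Guard against a degenerate null distribution on very small graphs.
    return (observed - mu) / sigma if sigma > 0 else 0.0


# Toy graph (hypothetical): t1 is 2 steps from s1, t2 is 3 steps from s2.
toy_adj = build_adjacency([
    ("s1", "a"), ("a", "t1"), ("s1", "s2"), ("s2", "s3"),
    ("s2", "b"), ("b", "c"), ("c", "t2"),
])
drug_targets = ["t1", "t2"]
disease_genes = ["s1", "s2", "s3"]

d_c = closest_distance(toy_adj, drug_targets, disease_genes)
z_c = proximity_z(toy_adj, drug_targets, disease_genes, n_random=200, seed=0)
print(d_c)  # 2.5
```

On a graph this small the z-score is not meaningful; it is computed only to show where the degree-matched null model enters.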
  27. 27. PROXIMITY TO DISEASE MODULES
[Figure: (a) distance measures between a drug and a disease module: closest (dc), shortest (ds), kernel (dk), center (dcc) and separation (dss). (b, c) AUC (%) for the measures, with annotations R2 = 0.175 and R2 = 0.003. Source: Nature Communications, DOI: 10.1038/ncomms10331]
  28. 28. GOING FURTHER Full text: http://barabasi.com/networksciencebook/
