National Resource for Network Biology Annual Report May 2011
NRNB
Annual Progress Report - Research Highlights 2011
National Resource for Network Biology P41 RR031228-01

Contents:
● NRNB Study Published in PNAS: Correlated Genotypes in Friendship Networks
● NRNB Collaboration Producing Results: Synthetic Genetic Analysis of Budding Yeast
● NRNB Collaboration Connects Networks and Disease: Genetic Networks Underlying DNA Damage
● Cytoscape 3.0: Development Proceeding at Full Speed
● New NRNB Services, Training and Outreach: Year One

NRNB Study Published in PNAS: Correlated Genotypes in Friendship Networks (Fowler)

In their book CONNECTED, Nicholas Christakis and NRNB investigator James Fowler argued that "social networks are in our nature." Then last year they published a paper showing that genes influence our social network position -- how central we are, and how likely it is that our friends know one another. In the NRNB study published in PNAS this year [1], we examine another important social network process called "homophily" -- it's a word that literally means "love of like" and it refers to the idea that we tend to make friends with people who resemble us -- "birds of a feather flock together."

Humans are unusual as a species in that we form long-term, non-reproductive unions with other members of our species. But why do we choose the friends we do? We hypothesize that we choose friends who are not only socially similar but also biologically, even genetically, similar to us.

In the NRNB study published in PNAS we find just that -- there are some gene variants that we share in common with our friends and other gene variants that differ between friends (opposites attract).

The results have a number of important implications:
• This is the first study to identify specific genes involved in these social network processes.
• This is a first step towards understanding the biology of "chemistry" -- that feeling you have about a person that you will like or dislike them.
We may choose our friends not just because of the social features we consciously notice about them, but because of the biological features we unconsciously notice. Some specific genotypes may be more compatible than others.
• What happens to us may depend not only on our own genes but also on the genes of our friends. This has been shown already in hens, whose feathers change depending on the genetic constitution of the hens that are caged near them. But something similar may happen in humans. We each live in a sea of the genes of others. In fact, we are metagenomic.
• There can be feedback effects -- our genes not only influence us, but they bias our choice of friends based on their genes, which in turn has an additional effect on us. For example, the DRD2 gene variant we study has been associated with alcoholism, and if you have this gene variant, your friends are likely to have it, too. So you are not only more susceptible to alcoholism yourself, but you are likely to be surrounded by friends who are susceptible, too.
• Correlated genotypes mean that it makes even more sense for us to treat outcomes like alcohol abuse as social, group-level problems. And anything that spreads in networks -- from obesity to happiness to the flu -- may spread more easily in some parts of the human population. There is a patchwork of localized susceptibility within networks, created by our genes and the genes of those around us.

References
1. James H. Fowler, Jaime E. Settle, Nicholas A. Christakis. Correlated Genotypes in Friendship Networks. PNAS 108 (5): 1993–1997 (1 February 2011). PMID: 21245293, PMC3033315.

NRNB Collaboration Producing Results: Synthetic Genetic Analysis of Budding Yeast (Bader)

The Bader lab has been collaborating with the Boone and Andrews labs since 2001, including analysis and visualization of the budding yeast genetic interaction network. Drs. Andrews and Boone are working to finish the first complete genetic interaction network for a cell and to decipher the general principles that govern this network. This reference map provides a model for expanding genetic network analysis to higher organisms, and it will stimulate valuable insights into gene function, drug target and mode-of-action analysis. The resulting complete map of genetic interactions for budding yeast, with ~6000 genes, will contain 36 million quantitative interaction pairs (18 million unique pairs). The most recent publication, covering roughly 20% of the complete map, was among the 30 most cited of 2.2 million papers in 2010 [1]. The map currently is 75% complete and continues to be analyzed.

The fundamental principle underlying this work is that we need to discover the rules governing how genes interact with one another in order to be able to predict which rare combinations of gene mutations cause human disease or other significant phenotypes.

Their approach is paying off. In the last five months, NRNB investigator Gary Bader has published three new papers with Drs.
Boone and Andrews, extracting knowledge about protein complexes [2], regions of protein disorder [3] and physiological fitness [4] from comparisons of genetic interactions on a genome scale. All of these projects required Cytoscape, the open-source network analysis and visualization engine promoted by NRNB investigators (see below).

References
1. Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JL, Toufighi K, Mostafavi S, Prinz J, St Onge RP, VanderSluis B, Makhnevych T, Vizeacoumar FJ, Alizadeh S, Bahr S, Brost RL, Chen Y, Cokol M, Deshpande R, Li Z, Lin ZY, Liang W, Marback M, Paw J, San Luis BJ, Shuteriqi E, Tong AH, van Dyk N, Wallace IM, Whitney JA, Weirauch MT, Zhong G, Zhu H, Houry WA, Brudno M, Ragibizadeh S, Papp B, Pál C, Roth FP, Giaever G, Nislow C, Troyanskaya OG, Bussey H, Bader GD, Gingras AC, Morris QD, Kim PM, Kaiser CA, Myers CL, Andrews BJ, Boone C. The genetic landscape of a cell. Science. 2010 Jan 22;327(5964):425-31. PubMed PMID: 20093466.
2. Michaut M, Baryshnikova A, Costanzo M, Myers CL, Andrews BJ, Boone C, Bader GD. Protein complexes are central in the yeast genetic landscape. PLoS Comput Biol. 2011 Feb;7(2):e1001092. Epub 2011 Feb 24. PMID: 21390331; PMCID: PMC3044758.
3. Bellay J, Han S, Michaut M, Kim T, Costanzo M, Andrews BJ, Boone C, Bader GD, Myers CL, Kim PM. Bringing order to protein disorder through comparative genomics and genetic interactions. Genome Biol. 2011 Feb 16;12(2):R14. PMID: 21324131.
4. Baryshnikova A, Costanzo M, Kim Y, Ding H, Koh J, Toufighi K, Youn JY, Ou J, San Luis BJ, Bandyopadhyay S, Hibbs M, Hess D, Gingras AC, Bader GD, Troyanskaya OG, Brown GW, Andrews B, Boone C, Myers CL. Quantitative analysis of fitness and genetic interactions in yeast on a genome scale. Nat Methods. 2010 Dec;7(12):1017-24. Epub 2010 Nov 14. PMID: 21076421.

NRNB Collaboration Connects Networks and Disease: Genetic Networks Underlying DNA Damage (Ideker)

Although cellular behaviors are dynamic, the networks that govern these behaviors have been mapped primarily as static snapshots. To explore network dynamics, the Ideker laboratory has been collaborating with the laboratory of Nevan Krogan at UCSF to analyze interaction networks as they are remodeled by different cellular stresses and stimuli. This year, they developed a new approach called differential epistasis mapping (dE-MAP), which creates a genetic network based on the changes in interaction strength observed between two static conditions. Using this approach, they have mapped widespread changes in genetic interaction among yeast kinases, phosphatases, and transcription factors as the cell responds to DNA damage [1]. Differential interactions uncover many gene functions that go undetected in static conditions. In the published study, differential interactions proved very effective at identifying DNA repair pathways, highlighting new damage-dependent roles for the Slt2 kinase, Pph3 phosphatase, and histone variant Htz1. Their analysis also reveals that protein complexes are generally stable in response to perturbation, but the functional relations between these complexes are substantially reorganized.
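
The core idea of a differential network, scoring each gene pair by how much its interaction strength changes between two conditions, can be illustrated with a minimal Python sketch. The gene names, scores, and the cutoff of 2.0 are made up for illustration, and the plain subtraction is an assumption that stands in for the statistical scoring of the published dE-MAP study, which is not reproduced here.

```python
# Illustrative sketch only: gene names, scores and threshold are hypothetical.

def pair(a, b):
    """Order-independent key for a gene pair."""
    return frozenset((a, b))


def differential_interactions(untreated, treated, threshold=2.0):
    """Return gene pairs whose interaction score changes by at least `threshold`.

    Both inputs map a gene-pair key to a static interaction score measured
    in one condition. Only pairs measured in both conditions are compared.
    """
    shared_pairs = untreated.keys() & treated.keys()
    diffs = {p: treated[p] - untreated[p] for p in shared_pairs}
    return {p: d for p, d in diffs.items() if abs(d) >= threshold}


# Made-up static maps for two conditions (not data from the study).
untreated = {
    pair("geneA", "geneB"): -0.5,
    pair("geneA", "geneC"): -3.0,
    pair("geneB", "geneC"): 0.2,
}
treated = {
    pair("geneA", "geneB"): -4.0,
    pair("geneA", "geneC"): -2.8,
    pair("geneB", "geneC"): 2.5,
}

changed = differential_interactions(untreated, treated)
for genes, delta in sorted(changed.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(genes), round(delta, 2))
```

In this toy example the geneA-geneC pair has the strongest static interaction in the untreated map, yet it is not reported because its score barely changes. The geneA-geneB and geneB-geneC pairs look unremarkable in one condition but shift substantially between the two, which is the kind of signal a static snapshot would miss.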

This proof-of-principle work suggests that differential networks chart a new type of genetic landscape that will be invaluable for mapping many different cellular responses to stimuli. We are now applying the dE-MAP procedure to examine the interaction dynamics among yeast genes involved in cellular processes such as autophagy, aging, and the response to chemotherapeutic compounds. This research is highly complementary to the work of the Bader, Boone and Andrews laboratories described above (see Synthetic Genetic Analysis of Budding Yeast), which seeks to map the entire genetic network in yeast for a single condition.

References
1. Bandyopadhyay S, Mehta M, Kuo D, Sung MK, Chuang R, Jaehnig EJ, Bodenmiller B, Licon K, Copeland W, Shales M, Fiedler D, Dutkowski J, Guénolé A, van Attikum H, Shokat KM, Kolodner RD, Huh WK, Aebersold R, Keogh MC, Krogan NJ, Ideker T. Rewiring of genetic networks in response to DNA damage. Science. 2010 Dec 3;330(6009):1385-9. Erratum in: Science. 2011 Jan 21;331(6015):284. PMID: 21127252; PMCID: PMC3006187.

Cytoscape 3.0: Proceeding at Full Speed

A recent New York Times article highlights the open source nature and plugin architecture of Cytoscape as a model for modern day collaborative science [1]. Indeed, Cytoscape enables a broad range of development projects and applied research that scale with support and distribution. A primary goal of NRNB is to amplify and propagate the community development model of Cytoscape. Cytoscape is a core research tool either used by or representing the research effort of every project and collaboration engaged by the NRNB. As such, the development and maintenance of Cytoscape receives a large amount of attention. Cytoscape development is progressing along two fronts: we are continuing to maintain the existing 2.8 series of releases [2], and we are developing version 3.0 of Cytoscape, which represents an evolution of our architecture designed to modularize the core of Cytoscape, define a clear and consistent API, and simplify the experience of developing and maintaining plugins for Cytoscape.

The Cytoscape 3.0 development effort has resulted in the first developer milestone release of 3.0 at the end of January 2011. The purpose of this milestone was to present a functioning application to the core Cytoscape development team so that they could begin porting plugins and use and critique the 3.0 API. The Bader Group ported a number of core plugins from 2.8, including BioPAX and PathwayCommons, and implemented session reading and writing. In early March 2011 we held a small meeting of core developers at UC San Diego to discuss the design of Cytoscape 3.0 and to plan the remaining development. We are currently on track to release developer milestone 2 prior to the 2011 Cytoscape Symposium in May.

The primary goal of Cytoscape 3.0 is to achieve feature parity with Cytoscape 2.X, but there will be new features included as well.
We have begun initial development of a “QuickStart” plugin designed to help novice users get their network and associated attribute data into Cytoscape as quickly and easily as possible.

References
1. Markoff J. Digging Deeper, Seeing Farther: Supercomputers Alter Science. New York Times. April 26, 2011. p. D1 (Science).
2. Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011 Feb 1;27(3):431-2. Epub 2010 Dec 12. PMID: 21149340; PMCID: PMC3031041.

New NRNB Services, Training and Outreach: Year One

Less than a year since its inception, the National Resource for Network Biology (NRNB) is establishing itself as an influential and effective resource. NRNB is setting the standard among resources for software development and collaborative research, as well as training and support.

We provide a wide range of support services to the Cytoscape development and research communities, including organizational, technical development, training and outreach. In the first year, NRNB has made progress on nine new Technological Research and Development projects that drive Cytoscape core and extension development into new territory. These projects range from revealing network modules as biomarkers, to developing new visualization tools, to inferring networks from data, to using social networks in the study of
disease. Each project is driven by a number of biological and biomedical applications with human health implications. In addition to our own projects, we actively seek and support collaborations around Cytoscape development and application. In the first year, we have established a total of 33 collaborations involving NRNB investigators. With such rapid early growth, we anticipate this number will double over the next two years.

NRNB is responsible for measuring and increasing the impact of Cytoscape on the research community. We track news, publications, events and collaborations related to Cytoscape development and usage. Our policy is to report these measures at the NRNB website, at the annual meeting of our External Advisory Committee (EAC) and in our annual report to the National Center for Research Resources. These objective metrics make it possible to identify a variety of areas for improvement. For example, we recently performed an extensive analysis of help desk activity for the Cytoscape user community. For many users, the help desk is the primary access point to NRNB support. Based on our analysis, we found that only ~60% of messages were receiving a response. To raise this percentage to an effective 100% (not every post is a question), we have developed a three-pronged approach: (1) identify particularly weak months during which greater vigilance is needed, (2) transform the most common issues into technical solutions, e.g., effective user messages, and (3) implement a weekly alert for unanswered threads, which are then discussed at weekly conference calls among NRNB software developers. This approach should improve the user experience by increasing responsiveness and enhancing the usability of the software tool.

The most popular service NRNB provides is training. We launched a new tutorial management system called Open Tutorials, which is being used by researchers, developers and presenters.
The link to training receives more clicks than anything else on our website. There is a clear demand for tutorials, which we are only just beginning to meet. Through the material resources at Open Tutorials and our organizing efforts to seek out and support training events, we anticipate this will be a growing and effective NRNB function.

Finally, in terms of outreach, NRNB serves a number of roles. We organize and run the annual Google Summer of Code event for Cytoscape and related network biology tools. For the summer of 2011, we will have ten students writing code for NRNB. The students and mentors are paid by Google for this work, totaling $55,000 for 12 weeks. The new NRNB website is also a major form of outreach. Through the website, we gather programmers, collaborators and researchers looking for training materials. To help direct traffic to the NRNB site and related software sites (e.g., Cytoscape download page), we make use of a free, non-profit account for Google AdWords through the Cytoscape Consortium. We are directing >1,300 clicks a month to NRNB tools and resources. This is worth just over $1,000 a month, which we are getting free-of-charge. Note: approximately half of the traffic is for WikiPathways, which is not yet officially an NRNB resource. We have a spending limit of $329 per day through this program, so we will continue to identify new ad words and relevant resources to promote. NRNB is also helping to organize this year’s annual Cytoscape Retreat and Symposium, http://cytoscape.org/CytoscapeRetreat2011. In addition to developer meetings, the retreat will include user and new developer tutorials, a Plugin Expo, and a special symposium. The symposium will be presented in conjunction with the San Diego Center for Systems Biology as the sixth annual Systems to Synthesis Symposium on May 20th at the Salk Institute.

It has been an exciting first year ramping up NRNB research and services. While we
have met early success in facilitating collaborations, training and outreach, we have also identified many areas for improvement. We are confident that NRNB will become an essential resource for network biology research and its application to human health.

Annual Progress Report - Administrative Information 2011
National Resource for Network Biology P41 RR031228-01

Administrative Structure

Within the first few weeks of establishing the NRNB, we finalized the administrative structure of the resource, including defining and filling some unique new roles within the organization (Fig. 1). The roles of Principal Investigator (PI), Co-PI, External Advisory Committee (EAC), Resource Administrator and Chief Software Architect were defined as in the original grant. We defined a new role of Executive Director (ED) to oversee some of the new resource functions that NRNB provides, including Training & Outreach, Communications and Infrastructure. The ED (Alex Pico, Gladstone Institute / UCSF) is responsible for coordinating these efforts as well as conducting all of the necessary tracking and due diligence for the annual reporting to NIH. The Technology Research and Development (TRD) projects and leads have not changed from their original description other than to factor them into proper subprojects per BTRC reporting conventions. Each TRD, for example, is diversifying over time into one to four discretely defined subprojects. Finally, we were very pleased to have all seven invited members promptly agree to join our EAC, including Dr. Stephen Friend as chair of the committee.

Figure 1. Resource Administration Structure. Blue boxes down the center define the
core leadership positions for NRNB. Purple boxes at right define resource roles under the coordination of the Executive Director. The red boxes along the bottom describe the main TRD projects currently defining the resource’s research direction, as well as its driving biological projects and collaborations.

Allocation of Resource Access

Beyond the active distribution and support of Cytoscape, which is covered in later sections, NRNB resource allocation can be categorized in the following way:
1. On-site training events: NRNB has organized 8 training events (in 7 cities in 4 countries), with an additional 4 events already planned for 2011. These events include tutorials, workshops and courses.
2. Funding requests: This year we had a request to fund a Cytoscape training event for the Medical Library Association’s annual meeting in Minnesota. We denied this request as it is outside the scope of our resource to fund external meetings that do not involve NRNB staff. However, we have adapted this request into a new idea of establishing a travel scholarship for external, distributed Cytoscape tutors to attend the annual Cytoscape Retreat. We will be presenting this idea as a proposal to our EAC during their annual meeting next month. If approved, we may see more requests of this type in the future.
3. Requests for training material support: We receive requests for tutorial materials throughout the year from inside and outside the Cytoscape core development team. We have implemented a new Open Tutorials system which makes it easy to approve all such requests. For example, we recently directed colleagues from the University of Michigan to existing tutorials already formatted to be used as online sessions, slide shows and printed handouts.
4. Joining our Google Summer of Code effort: We have received requests from a number of groups to join our NRNB umbrella organization for the Google Summer of Code.
Such requests translate into significant administrative support services that we can provide for these groups. As long as a group is working on open source software relating to our core network biology projects, our policy is to be very open to these requests. We have the additional requirement that each group be able to demonstrate that their personnel can commit sufficient time as mentors. These policies have been developed over the past four years of successfully participating in the Google-sponsored program. This year we accepted (or vouched for) a total of nine groups in addition to our core Cytoscape team, representing the following software projects: Vanted, Reactome, Cytoscape Web, GenMAPP-CS, PathVisio, WikiPathways, Savant Genome Browser, and Systems Biology projects by the Theoretical Biophysics group at Humboldt University Berlin.
5. Providing software community support: Our software “menu of services” is rapidly growing. Our goal is to develop a generic template of services based on the support we provide the Cytoscape community of users and developers. We will be seeking EAC advice on the scope and depth of such services at the EAC meeting in May 2011. We
anticipate adding one or several new groups to the list of NRNB-supported tools and resources over the next year.

Awards and Honors

None

Dissemination

We are averaging just over 10,000 visits (~60,000 page views) to the Cytoscape website per month. An additional 3,000 visits per month were logged at the new NRNB website, which went live in December 2010. Our new tutorial management system, Open Tutorials, has received over 1,000 visitors in the past month and is being used by researchers, developers and presenters.

A key statistic in terms of dissemination is number of software downloads. Currently, the primary software offered and supported by NRNB is Cytoscape and its suite of plugins. We have seen a dramatic uptick in Cytoscape downloads in the first quarter of 2011, representing a doubling in download activity over the past year (Fig. 2).

Figure 2. Chart of Cytoscape software downloads per month over the past 10 months.

The NRNB website has a dedicated Tools page, which provides links to Cytoscape for download. We also offer a Training page, which displays upcoming training events and training materials in multiple formats. The Training page is the most popular page on the site, indicating the need and demand for the training services NRNB is providing.

We also make researchers aware of our tools and services through the many conferences our representatives attend. For example, the NRNB will have a major presence at the Nineteenth Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2011), which will be held jointly with the Tenth Annual European Conference on Computational Biology (ECCB) in Vienna, Austria, July 17 - July 19, 2011. ISMB has become the largest conference on computational biology worldwide. This year over 1500 attendees are expected.
As part of this meeting, we are organizing the first annual NetBio Special Interest Group (SIG) meeting dedicated to network biology tools, resources and research applications. NRNB tools are also represented in the research literature through our development and
research publications. Numerous Cytoscape plugin articles and research articles using Cytoscape are published annually: 235 in the past year alone (HighWire search). We are currently drafting a position paper that will describe NRNB to the research community to increase awareness of our new resource.

Finally, most visibility for our software arguably comes from our consistent dedication to an “open source” policy. Our open-source license allows us to easily disseminate our software code through public repositories (Sourceforge, code.google, self-hosted servers) and participate in social networks in support of code development (Ohloh). We take very seriously our active participation and cultivation of an open development community. This should not be taken for granted. Many academic software projects suffer from relatively short cycles of commitment from graduate students and postdocs progressing through their careers. The open source model offers a means to develop software inclusively and sustainably. We have worked hard to build, develop and maintain this community. The benefits are a sustained project that continues to grow and to stay relevant. It also instills confidence in potential contributors as well as users that their work will be acknowledged and that the product will persist and remain free and open. It is through the software development community that Cytoscape maintains its most ardent evangelists, presenting new functionality at their home institutions and through conferences and publications.

Our open source commitment also allows us to participate in programs such as the Google Summer of Code, where Google sponsors 9-10 students to write code for us each summer.

Patents, Licenses, Inventions, and Copyrights

None.
We are committed to an Open-Source dissemination policy.

Training and Outreach

Annual Cytoscape Retreat

We are actively planning this year’s annual Cytoscape Retreat and Symposium, hosted by the National Resource for Network Biology (NRNB) in collaboration with the San Diego Center for Systems Biology (SDCSB). In addition to developer meetings, the retreat will include user and new developer tutorials, a Plugin Expo, and a special symposium. The symposium will be presented in conjunction with the SDCSB as the sixth annual Systems to Synthesis Symposium on May 20th at the Salk Institute (http://cytoscape.org/CytoscapeRetreat2011/).

Workshops

For the reporting period, NRNB has organized a total of 8 training events (in 7 cities in 4 countries), with an additional 4 events planned for the remainder of 2011. These events include tutorials, workshops and courses. For the same period, NRNB investigators and staff have given 7 invited lectures. For 2011, several conferences are planned, including the annual Cytoscape Retreat and Symposium, which will take place on May 18-21 in San Diego, CA.

Helpdesk

A major means of support for NRNB tools is through dedicated helpdesk and discussion mailing lists. The NRNB has begun monitoring the activity of these lists for the Cytoscape community
as an ongoing metric for the effectiveness of our support. As a starting baseline, this first year saw 723 messages and a response rate of 61%. A fraction of the messages are informational posts that do not require a response, so we do not expect our response rate to hit 100%. Nevertheless, we have identified an opportunity for substantial improvement. From an analysis of our mailing list patterns, we have identified three approaches for improving response rates and disseminating information to users:
● Monthly response rates will be collected to identify months with lower than average response rates. A targeted strategy can then be employed to increase the response rate during these months.
● The most common discussion topics and questions will be identified, in order to improve the dissemination of critical information to users. In addition to FAQ topics, we will use this information to create innovative context-specific solutions tailored to each question. For example, users often ask about the syntax for increased memory allocation for Cytoscape. This information could be communicated in an error message any time Cytoscape experiences a memory-related failure, before the user even formulates the question.
● We are automating the analysis of helpdesk activity so that weekly alerts can be sent to NRNB staff whenever an email goes unanswered. This will allow us to maximize our response rate and to quickly address gaps in our collective attention.

Social Media

We have initiated a social media effort for Cytoscape through a number of different tools (http://www.cytoscape.org/community.html). For example, a Twitter account is used for quick announcements (http://twitter.com/cytoscape) and YouTube is utilized for video tutorials (http://www.youtube.com/results?search_query=cytoscape).

Google AdWords

We were awarded a non-profit account in the Google AdWords program. We are directing >1,300 clicks a month to NRNB tools and resources via AdWords.
We are running 7 campaign groups consisting of over 500 key words and phrases. These activities are worth just over $1,000 a month, which we are getting free-of-charge. Note: approximately half of the traffic is for WikiPathways, which is not yet officially an NRNB resource. We have a spending limit of $329 per day through this program, a potential value of $120,000 per year, so we will continue to identify new ads and relevant resources.

Google Summer of Code

We were accepted as a mentoring organization in the 2011 GSoC program. Google allocated 10 student “slots” to us, which we have filled with qualified and enthusiastic summer students. The students will write open source code for NRNB-related projects during the summer. This is equivalent to $55,000 paid out as student and mentor stipends.
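
The dollar figures quoted in this section follow from simple arithmetic, sketched below in Python. The even split of the Google Summer of Code total across ten slots is an assumption made here for illustration; the report gives only the $55,000 total for students and mentors combined. The monthly AdWords value is rounded down to $1,000, since the text says only "just over".

```python
# Figures taken from the report text.
DAILY_ADWORDS_LIMIT_USD = 329
CURRENT_ADWORDS_VALUE_PER_MONTH_USD = 1_000  # "just over $1,000 a month"
GSOC_TOTAL_USD = 55_000
GSOC_SLOTS = 10

# Potential yearly value if the daily limit were spent every day of the year.
potential_per_year = DAILY_ADWORDS_LIMIT_USD * 365

# Approximate current yearly value and the share of the limit in use.
current_per_year = CURRENT_ADWORDS_VALUE_PER_MONTH_USD * 12
utilization = current_per_year / potential_per_year

# Assumption: the GSoC total is split evenly per slot (student plus mentor).
gsoc_per_slot = GSOC_TOTAL_USD / GSOC_SLOTS

print(f"Potential AdWords value per year: ${potential_per_year:,}")
print(f"Share of the limit currently used: {utilization:.0%}")
print(f"GSoC funding per slot: ${gsoc_per_slot:,.0f}")
```

The first figure comes to $120,085, consistent with the "potential value of $120,000 per year" stated above, and the comparison suggests that only about a tenth of the available AdWords budget is currently in use.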

Annual Progress Report - Advisory Committee 2011
National Resource for Network Biology P41 RR031228-01

In our first year (8 months), we have assembled an External Advisory Committee (EAC) and scheduled the first EAC meeting for May 19th, 2011. We were very pleased to have all seven invited members promptly agree to join our EAC, including Dr. Stephen Friend as chair of the committee.

Committee Members:
● Stephen Friend, M.D., Ph.D. is President, Co-Founder and Director of Sage Bionetworks. He was previously Senior Vice President and Franchise Head for Oncology Research at Merck & Co., Inc.
● David Hill, Ph.D. is Associate Director of the Center for Cancer Systems Biology at the Dana-Farber Cancer Institute, where he is also co-leader of the Pathogen Host Interactomes group.
● Tamara Munzner, Ph.D. is Associate Professor in the Department of Computer Science at the University of British Columbia and is a member of the IMAGER Graphics, Visualization and HCI research group.
● Nicholas Schork, Ph.D. is Director of Biostatistics and Bioinformatics at the Scripps Translational Science Institute and Professor in the Department of Molecular and Experimental Medicine at the Scripps Research Institute.
● Gustavo Stolovitzky, Ph.D. is Manager of the Functional Genomics and Systems Biology group at the IBM Computational Biology Center. He is a Fellow of the American Physical Society, a Fellow of the New York Academy of Sciences, and an adjunct Associate Professor at Columbia University.
● Marian Walhout, Ph.D. is Associate Professor at the University of Massachusetts Medical School in the Program in Gene Function and Expression.
● Annette Adler is the Section Manager for Computational Biology and Informatics within Agilent Labs.

A full report of our first EAC meeting will be provided in next year’s progress report. The agenda includes discussion of major NRNB projects, including Cytoscape 3.0, and our collaboration, service and outreach efforts.
In addition to asking for feedback on our progress so far, we will prepare a set of specific proposals to engage the EAC in our most complex decisions. Finally, we will also set milestones for our second year as a resource.

Annual Progress Report - Research Progress 2011
National Resource for Network Biology P41 RR031228-01

Recent progress in high-throughput experimental technologies has released enormous amounts of interaction data into the public domain. Analysis of these interactions, and of the networks they form, relies in large part on robust bioinformatic technology. The mission of the NRNB is to develop and support a suite of bioinformatic tools that broadly enable Network Biology for the NIH-funded public. In this first year of our resource we have significantly advanced our goals through basic research, collaboration, dissemination of software tools, and community support.

Here, we describe our progress in research, both basic and collaborative. This progress includes algorithms for identification of network substructures (modules); use of network modules for patient diagnostics; tools to enable fundamentally new network visualizations; and a major new version of our Cytoscape network analysis platform.

Contents:
● NRNB Technology Research and Development Projects
  1. Network-Guided Forests Identify Network Modules as Biomarkers (Ideker)
  2. Identifying Altered Networks in Cancer (Sander)
  3. Visualizing Cancer Genomic Data in the Context of Biological Networks (Sander)
  4. Recognizing Trend Motifs and Dynamics in Networks (Fowler)
  5. General Layout Algorithms and Views for Hierarchical, Modular Networks (Bader)
  6. Semantic Zooming and Information Layering (Bader)
  7. Network Layout by Known Ontology Attributes (Conklin, Pico)
  8. Mapping and Visualizing Complex Attributes (Conklin, Pico)
  9. The CYNI Modular Network Induction Framework (Schwikowski)
● NRNB Research Driving Biological Projects and Collaborations
  1. Continuing DBP: Synthetic Genetic Analysis of Budding Yeast
  2. New CSP: Genetic Networks Underlying DNA Damage
● NRNB Software and Resources
  1. Cytoscape Core
  2. SDSC Triton Resource
  3. Open Tutorials
  4. New NRNB Website

NRNB Technology Research and Development Projects

In the original grant proposal, we detailed four Technology Research and Development (TRD) projects. These projects have specialized and diversified into the nine TRD projects listed below. We anticipate further diversification and thus are shifting away from the limiting, original notation. To help translate, we include the labels TRD A - TRD D below for each project.

1. Network-Guided Forests Identify Network Modules as Biomarkers (Ideker: TRD A)

Over the past year, the NRNB has been pursuing a number of bioinformatic advances to better
identify modular structures within biological networks and to apply network modules to predict disease outcomes. These developments are enabling what we call “network-based biomarkers”, based on the concept that network modules are better markers of cell state than are individual genes or proteins [1-5]. Indeed, many biological and clinical outcomes are based on modules of several interacting proteins working in combination. In development, for instance, it is largely combinatorial modules of transcription factors that give rise to the diversity of tissues. Protein combinations are equally instrumental in the pathogenesis of human disease, for instance the inappropriate fusion of Bcr and Abl that leads to chronic myelogenous leukemia or the abnormal interactions acquired by the huntingtin protein in Huntington’s Disease.

A fundamental unanswered question is how the proteins within each module contribute to the overall module activity. Over the past year, we have performed a case study of the modules underlying three representative biological programs related to tissue development, breast cancer metastasis, and progression of brain cancer, respectively. To facilitate this study we have developed a new bioinformatic method, called Network-Guided Forests (NGF), to identify predictive modules together with logic functions which tie the activity of each module to the activity of its component genes [6]. NGF integrates key ideas from Random Forests (RF) [7] with biological constraints induced by a protein-protein interaction network, which is the first use of protein networks in ensemble learning. The NGF framework learns a set of decision trees (the “forest”) in which each tree maps to a connected component of the protein-protein interaction network (Fig. 1). The decision tree specifies a function that determines the output of the network component based on the activity of its genes. 
In turn, the collection of all tree outputs is used to predict the cell type or disease state of the biological sample (the “class”).

By construction, decision trees detect genes that influence a phenotypic outcome both individually and through multiway interactions with other genes. As in the standard Random Forests algorithm, NGF uses a permutation-based procedure to assess the importance of each gene on the classification accuracy of the forest. We also assess the importance of pairs of genes in a tree; in our study these pairs are constrained by the network neighborhood. Genes and gene pairs with significantly high importance scores are placed into clusters that capture similar patterns of presence/absence across the forest of decision trees. Each cluster aggregates genes that fall into the same network region and, in combination, have predictive power over the sample class. Hence these clusters are termed “consensus decision modules” (Fig. 1).

Use of NGF to analyze the three representative biological programs (early development, breast cancer metastasis, and mesenchymal transformation of brain tumor) identifies network modules which capture known causal mechanisms of development or disease. The modules implement diverse logic functions using both coherent and opposing gene activities, in which the module output depends on expression increases for some genes and concomitant decreases for others. Notably, we found that in cancer progression the most predictive decision functions can be linked to interactions between known oncogenes and tumor suppressors, such that the combined activity of both types of genes determines the disease outcome.
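The permutation-based importance procedure mentioned above can be illustrated with a minimal sketch. This is not the NGF implementation: the classifier is a fixed, hypothetical module logic function (class 1 when a toy "oncogene" ONC is up and a toy "suppressor" SUP is down), the data are synthetic, and importance is measured on the full sample set rather than on out-of-bag samples as Random Forests would do. The gene names and all function names are assumptions made for illustration.

```python
import random


def accuracy(predict, samples, labels):
    """Fraction of samples whose predicted class matches the true label."""
    hits = sum(1 for s, y in zip(samples, labels) if predict(s) == y)
    return hits / len(samples)


def permutation_importance(predict, samples, labels, gene, n_repeats=50, seed=0):
    """Mean drop in accuracy when one gene's values are shuffled across samples.

    Shuffling breaks the link between that gene and the class while keeping
    the gene's overall distribution, so a large drop marks an important gene.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, samples, labels)
    drops = []
    for _ in range(n_repeats):
        column = [s[gene] for s in samples]
        rng.shuffle(column)
        permuted = [dict(s, **{gene: v}) for s, v in zip(samples, column)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


def module_logic(sample):
    """Hypothetical decision function using opposing gene activities."""
    return int(sample["ONC"] > 0 and sample["SUP"] < 0)


def make_toy_data(n=200, seed=1):
    """Synthetic expression values; labels follow module_logic exactly."""
    rng = random.Random(seed)
    samples = [
        {"ONC": rng.uniform(-1, 1), "SUP": rng.uniform(-1, 1), "NOISE": rng.uniform(-1, 1)}
        for _ in range(n)
    ]
    labels = [module_logic(s) for s in samples]
    return samples, labels


if __name__ == "__main__":
    samples, labels = make_toy_data()
    for gene in ("ONC", "SUP", "NOISE"):
        print(gene, round(permutation_importance(module_logic, samples, labels, gene), 3))
```

In this toy setting the gene that the decision function never reads (NOISE) receives an importance of exactly zero, while the two genes that jointly determine the class both receive positive scores. The same shuffling idea extends to gene pairs by permuting two columns together.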
Figure 1. Network decision modules underlying embryonic origin, breast cancer metastasis and mesenchymal transformation of brain tumors. Expression profiles for each of these three case studies are combined with a network of protein-protein interactions among human transcription factors. Network-guided forests are used to identify key network modules that are most important for correct sample classification (representative modules are shown for each study). Grey edges indicate physical protein-protein interactions, blue edges indicate interactions that occur in the same decision trees and are most important for classification. Node color indicates gene importance as indicated by a permutation test. Each module is assigned a decision tree that specifies the output of the module based on the activity of its genes.

References
1. Segal, E., et al., Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet, 2003. 34(2): p. 166-76.
2. Chuang, H.Y., et al., Network-based classification of breast cancer metastasis. Mol Syst Biol, 2007. 3: p. 140.
3. Muller, F.J., et al., Regulatory networks define phenotypic classes of human stem cell lines. Nature, 2008. 455(7211): p. 401-5.
4. Ravasi, T., et al., An atlas of combinatorial transcriptional regulation in mouse and man. Cell, 2010. 140(5): p. 744-52.
5. Ulitsky, I., et al., DEGAS: de novo discovery of dysregulated pathways in human diseases. PLoS One, 2010. 5(10): p. e13367.
6. Dutkowski, J. and Ideker, T. Protein networks as logic functions in development and cancer. PLoS Comp. Bio., In second round of review.
7. Breiman, L., Random forests. Machine Learning, 2001. 45(1): p. 5-32.

2. Identifying Altered Networks in Cancer (Sander: TRD A)

As another project involving network-based biomarkers, we have been developing tools to vertically integrate multidimensional genomic profiling data (including sequence mutations,
DNA copy-number alterations, and mRNA expression profiles) in order to identify altered sub-networks in cancer. We refer to these modules as “driver networks”, as they are likely to contribute to tumorigenesis in multiple patients. Recently, NRNB investigators and others have shown proof of principle that the use of network and pathway information can help us understand the pronounced genetic heterogeneity seen in individual tumors of the same cancer type and that it can lead to more accurate and robust signatures for classifying disease states [2-5]. To date, such methods have been explored in glioblastoma multiforme [6,7], as well as pancreatic [8], lung [9], breast and colorectal cancer [1].

With NRNB funding, we have begun to explore the use of an optimization algorithm borrowed from statistical physics to connect altered genes in cancer with minimal spanning networks. Such networks can identify the set of interactions able to explain the pattern of correlated alterations in cancer, i.e. the driver networks, from a human reference interaction network. The algorithm we are using, which addresses the minimum Steiner tree problem, attempts to find the shortest connection between altered genes in a specific cancer type. This network may be constructed with direct connections between altered genes and/or with connections between altered and unaltered genes within a human reference protein interaction network.

In general, this problem is classified as NP-complete, meaning that no efficient (polynomial-time) algorithm is known for finding an exact solution. In practice, once the number of altered genes reaches the order of ~10, the minimum Steiner tree can only be approximated. In order to handle the much larger number of genes in the human protein interaction network, we are using an algorithmic framework based on a distributed method called message passing. 
This method has been shown to be successful in various applications, such as detecting protein associations in cell signaling [10] and data clustering [11]. We are currently evaluating the use of the Steiner tree algorithm by using a training dataset from glioblastoma and exploring the range of improvement obtained by varying the interaction weights between genes in the human reference network based on the mRNA expression profiles (Fig. 2).

This algorithmic research is being applied to analyze multiple cancer types derived from The Cancer Genome Atlas (TCGA), prostate cancer genome data derived from the MSKCC Prostate Cancer Genome Project (PCGP), and expression data from chronic lymphocytic leukemia (CLL) patients at UCSD (Thomas Kipps, MD/PhD), all of which are being provided by active Driving Biological Projects (DBPs).
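To make the Steiner tree formulation concrete, the sketch below connects a set of "altered" nodes in a small weighted graph using a classical greedy shortest-path heuristic: start from one altered node and repeatedly attach the nearest unconnected altered node via its shortest path. This is deliberately not the message-passing method described above; it is a simpler approximation swapped in for illustration. The graph, node names (A, B, C, L1, L2) and edge weights are hypothetical.

```python
import heapq


def build_graph(weighted_edges):
    """Undirected adjacency map {node: {neighbour: weight}} from (u, v, w) triples."""
    graph = {}
    for u, v, w in weighted_edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def nearest_terminal_path(graph, tree_nodes, remaining):
    """Multi-source Dijkstra from the current tree to the closest remaining terminal."""
    dist = {n: 0.0 for n in tree_nodes}
    prev = {}
    heap = [(0.0, n) for n in sorted(tree_nodes)]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u in remaining:
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return d, path[::-1]  # ordered from the tree to the new terminal
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    raise ValueError("terminals are not connected in the reference network")


def steiner_tree_heuristic(graph, terminals):
    """Greedy approximation: returns (tree edges, linker nodes, total weight)."""
    terminals = sorted(set(terminals))
    tree_nodes = {terminals[0]}
    tree_edges = set()
    remaining = set(terminals[1:])
    total = 0.0
    while remaining:
        cost, path = nearest_terminal_path(graph, tree_nodes, remaining)
        tree_edges.update(frozenset(pair) for pair in zip(path, path[1:]))
        tree_nodes.update(path)
        remaining.difference_update(path)
        total += cost
    linkers = tree_nodes - set(terminals)
    return tree_edges, linkers, total


if __name__ == "__main__":
    # Hypothetical reference network: A, B, C are altered; L1, L2 are unaltered.
    toy = build_graph([
        ("A", "L1", 1.0), ("L1", "B", 1.0), ("A", "B", 3.0),
        ("B", "L2", 1.0), ("L2", "C", 1.0), ("B", "C", 5.0),
    ])
    edges, linkers, total = steiner_tree_heuristic(toy, ["A", "B", "C"])
    print(sorted(tuple(sorted(e)) for e in edges), sorted(linkers), total)
```

The unaltered nodes pulled into the tree play the role of the "linker" nodes in Figure 2. Lowering the weight of an edge (for example, based on expression evidence) makes the heuristic more likely to route through it, which is the kind of weight variation the evaluation above explores.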
Figure 2. Application of the Steiner tree algorithm to glioblastoma multiforme (GBM). Blue nodes represent genes altered by somatic mutation or copy number alteration. Pink nodes represent Steiner tree “linker” nodes that minimally connect altered nodes. Canonical pathways, including PI3K, P53 and RB signaling are outlined.

References
1. Lin, J. et al. A multidimensional analysis of genes mutated in breast and colorectal cancers. Genome Research 17, 1304-18 (2007).
2. Chuang, H.Y., Lee, E., Liu, Y.T., Lee, D. & Ideker, T. Network-based classification of breast cancer metastasis. Mol Syst Biol 3, 140 (2007).
3. Efroni, S., Schaefer, C.F. & Buetow, K.H. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS ONE 2, e425 (2007).
4. Tuck, D.P., Kluger, H.M. & Kluger, Y. Characterizing disease states from topological properties of transcriptional regulatory networks. BMC Bioinformatics 7, 236 (2006).
5. Ideker, T. & Sharan, R. Protein networks in disease. Genome Research 18, 644-52 (2008).
6. TCGA. Comprehensive genomic characterization defines novel cancer genes and core pathways in human glioblastomas 43 (2008).
7. Parsons, W.D. et al. An Integrated Genomic Analysis of Human Glioblastoma Multiforme. Science, 13 (2008).
8. Jones, S. et al. Core Signaling Pathways in Human Pancreatic Cancers Revealed by Global Genomic Analyses. Science (2008).
9. Ding, L. et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 455, 1069-75 (2008).
10. Bailly-Bechet M, Borgs C, Braunstein A, et al. Finding undetected protein associations in cell signaling by belief propagation. Proc Natl Acad Sci U S A. 2011;108(2):882-887.
11. M. Bailly-Bechet, S. Bradde, A. Braunstein, A. Flaxman, L. Foini, R. Zecchina. Clustering with shallow trees. J Stat Mech. 2009;P12010.
3. Visualizing Cancer Genomic Data in the Context of Biological Networks (Sander: TRD A)

This project focuses on visualizing cancer genomic data in the context of specific pathways and networks. We have developed an initial prototype using Cytoscape Web [1], which is capable of displaying networks derived from Pathway Commons [2], and overlaying these networks with genomic data derived from the TCGA project. The prototype displays a fully interactive network of the genes analyzed, plus details regarding individual genomic alterations (Figure 3). We are planning to transfer knowledge we have gained from this prototype and apply it to our cBio Cancer Genomics Portal (http://cbioportal.org). The portal currently enables users to visualize, analyze and download large-scale cancer genomic data sets, but lacks network visualization. With Cytoscape Web, users will soon be able to enter a set of genes, visualize those genes in a network context, and dynamically overlay genomic data onto the networks of interest. This will provide a critical exploratory data analysis module to the portal, enabling the wider research community to more easily visualize genomic data in the context of biological pathways, and to develop and confirm hypotheses regarding cancer development and progression.

Figure 3. Prototype of cancer network visualization, built with Cytoscape Web [1]. Left panel shows a global network view of genes altered by somatic mutation or copy number alteration in serous ovarian cancer (TCGA). Node size is proportional to frequency of alteration. Right panel shows a local view of the BRCA/RB subnetwork, with genomic alterations displayed as a compact OncoPrint. Experience gained from this prototype will be used to add a new network visualization component to our cBio Cancer Genomics Portal (http://cbioportal.org).

References
1. Lopes CT, Franz M, Kazi F, Donaldson SL, Morris Q, Bader GD. Cytoscape Web: an interactive web-based network browser. Bioinformatics. 2010 Sep 15;26(18):2347-8.
2. Cerami EG, Gross BE, Demir E, et al. Pathway Commons, a web resource for biological
pathway data. Nucleic Acids Res. 2011;39(Database issue):D685-D690. PMID: 21071392

4. Recognizing Trend Motifs and Dynamics in Networks (Fowler: TRD B)

It is well known that humans tend to associate with other humans who have similar characteristics, but it is unclear whether this tendency has consequences for the distribution of genotypes in a population. Although geneticists have shown that populations tend to stratify genetically, this process results from geographic sorting or assortative mating, and it is unknown whether genotypes may be correlated as a consequence of non-reproductive associations or other processes.

In this TRD project published in PNAS, we study six available genotypes from the National Longitudinal Study of Adolescent Health to test for genetic similarity between friends [1,2]. Maps of the friendship networks show clustering of genotypes, and, after we apply strict controls for population stratification, the results show that one genotype is positively correlated (homophily) and one genotype is negatively correlated (heterophily). A replication study on an independent sample from the Framingham Heart Study verifies that DRD2 exhibits significant homophily and that CYP2A6 exhibits significant heterophily. These novel results show that homophily and heterophily obtain on a genetic (indeed, an allelic) level, which has implications for the study of population genetics and social behavior. In particular, the results suggest that association tests should include friends' genes and that theories of evolution should take into account the fact that humans might, in some sense, be "metagenomic" with respect to the humans around them. This work continues to build on our original DBP for the “Role of Social Networks in the Spread of Disease,” led by Nicholas Christakis.

References
1. Fowler JH, Dawes CT, Christakis NA. Model of genetic variation in human social networks. Proc Natl Acad Sci U S A. 2009 Feb 10;106(6):1720-4. Epub 2009 Jan 26. 
PMID: 19171900; PMCID: PMC2644104.
2. Fowler JH, Settle JE, Christakis NA. Correlated genotypes in friendship networks. Proc Natl Acad Sci U S A. 2011 Feb 1;108(5):1993-7. Epub 2011 Jan 18. PMID: 21245293; PMCID: PMC3033315

5. General Layout Algorithms and Views for Hierarchical, Modular Networks (Bader: TRD C)

Biologists frequently use networks to represent the structure and function of the cell, using intuitive metaphors to reduce multiple levels of spatial and temporal relationships to a two-dimensional image. At the same time, computational representations of the cell are more abstract and tend to be less intuitive for biologists than human-made diagrams. We are working to improve the biological relevance of computational visualizations of biological networks in Cytoscape, in collaboration with investigators leading driving biological projects and collaborative service projects. More intuitive biological network visualizations will speed interpretation of large-scale data about cellular processes being generated by biologists.

We developed the Thematic Map plugin for Cytoscape, based on an earlier prototype presented in our original NRNB grant application. This plugin ‘rolls-up’ node or edge attributes
into individual nodes, i.e. it transforms an input network of interactions among proteins into an attribute network, in which node attributes are nodes and edges summarize all connections between nodes with the corresponding attributes in the original network. This view can be used in a number of biologically useful ways, such as summarizing the functional content of a large protein-protein interaction network. We are currently testing this plugin for release in the second half of 2011.

Figure 4. Thematic map based on node attributes.

We have also developed a second plugin, the Enrichment Map, in a similar spirit to the Thematic Map plugin. Gene-set enrichment analysis is a useful technique to help functionally characterize large gene lists, such as the results of gene expression experiments. This technique finds functionally coherent gene-sets, such as pathways, that are statistically over-represented in a given gene list. Ideally, the number of resulting sets is smaller than the number of genes in the list, thus simplifying interpretation. However, the increasing number and redundancy of gene-sets used by current enrichment analysis software work against this ideal. To overcome gene-set redundancy and help in the interpretation of large gene lists, we developed "Enrichment Map", a network-based visualization method for gene-set enrichment results. Gene-sets are organized in a network, where each set is a node and edges represent gene overlap between sets. Automated network layout groups related gene-sets into network clusters, enabling the user to quickly identify the major enriched functional themes and more easily interpret the enrichment results. Enrichment Map is a significant advance in the interpretation of enrichment analysis. Any research project that generates a list of genes can take advantage of this visualization framework. 
Enrichment Map is implemented as a freely available and user-friendly plug-in for the Cytoscape network visualization software (http://baderlab.org/Software/EnrichmentMap/) [1].
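The core idea of an enrichment map (gene-sets as nodes, edges for gene overlap, clusters as functional themes) can be sketched in a few lines. This is an illustrative toy, not the plugin's code: the Jaccard coefficient, the 0.25 threshold, the gene-set names and the placeholder gene identifiers (g1 to g9) are all assumptions chosen for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared genes divided by all genes in either set (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_enrichment_map(gene_sets, min_similarity=0.25):
    """Edges {(set_name_1, set_name_2): similarity} for sufficiently overlapping sets."""
    edges = {}
    for n1, n2 in combinations(sorted(gene_sets), 2):
        sim = jaccard(set(gene_sets[n1]), set(gene_sets[n2]))
        if sim >= min_similarity:
            edges[(n1, n2)] = sim
    return edges


def themes(gene_sets, edges):
    """Connected components of the overlap network, i.e. candidate functional themes."""
    neighbours = {name: set() for name in gene_sets}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, clusters = set(), []
    for start in sorted(gene_sets):
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(neighbours[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


if __name__ == "__main__":
    # Hypothetical enriched gene-sets with placeholder gene identifiers.
    enriched = {
        "cell cycle": {"g1", "g2", "g3", "g4"},
        "mitosis": {"g2", "g3", "g4", "g5"},
        "DNA repair": {"g6", "g7", "g8"},
        "DSB repair": {"g7", "g8", "g9"},
    }
    edges = build_enrichment_map(enriched)
    for cluster in themes(enriched, edges):
        print(sorted(cluster))
```

Here four redundant gene-sets collapse into two themes, which is the simplification the visualization aims for. A real layout would additionally position the clusters and scale node size or color by enrichment statistics.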
Figure 5. Enrichment Map for estrogen-treated cells versus untreated cells

References
1. Merico D, Isserlin R, Stueker O, Emili A, Bader GD. Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PLoS One. 2010 Nov 15;5(11):e13984. PMID: 21085593; PMCID: PMC2981572.

6. Semantic Zooming and Information Layering (Bader: TRD C)

Our goal in this project is to develop methods to help researchers explore and interpret large networks and their associated genome-scale data sets. As the volume, resolution and complexity of biological data continue to increase, so do the challenges associated with visualizing, analyzing and interpreting the data. Methods that we develop will help network visualization scale, while still remaining interactive to support live exploration and hypothesis testing.

We have developed an initial implementation of a new filtering API for Cytoscape 3, which will enable us to develop the next-generation interactive filtering system for Cytoscape. We have verified that using a BitSet implementation to handle filter set operations can support large networks of up to 10 million nodes and edges. We are currently receiving feedback about the new API and will implement it fully in Cytoscape 3.1 [1].

We are also making progress on this project by developing support for visualizing detailed biological pathways in Cytoscape. We have recently implemented BioPAX Level 3 support in Cytoscape (BioPAX Level 3 reader, writer and visualizer) [2]. This enables import of biological pathway information from various pathway databases, including Reactome [3], WikiPathways [4] and Pathway Commons [5]. Future pathway visualization features that we develop in Cytoscape will depend on this functionality.

We continue to closely collaborate with the Charlie Boone and Brenda Andrews labs
who lead our DBP: Synthetic genetic analysis of budding yeast (see DBP progress reports below).

References
1. http://cytoscape.wodaklab.org/wiki/Outdated_Cytoscape_3.0/FilterAPI
2. Demir E et al. The BioPAX community standard for pathway data sharing. Nat Biotechnol. 2010 Sep;28(9):935-42. Epub 2010 Sep 9. PMID: 20829833
3. Matthews L, Gopinath G, Gillespie M, Caudy M, et al. Reactome knowledgebase of biological pathways and processes. Nucleic Acids Res. 2008 Nov 3. PMID: 18981052
4. Pico AR, Kelder T, van Iersel MP, Hanspers K, et al. (2008) WikiPathways: Pathway Editing for the People. PLoS Biol 6(7): doi:10.1371/journal.pbio.0060184
5. Cerami et al. Pathway Commons, a web resource for biological pathway data. Nucl. Acids Res. (2010) doi: 10.1093/nar/gkq1039

7. Network Layout by Known Ontology Attributes (Conklin, Pico: TRD C)

Organizing network or pathway data into a diagram that effectively communicates information about biological systems requires biological expertise and even a bit of artistry. A good diagram might need to illustrate myriad interactions between genes, proteins and small molecules and might also convey their spatial and temporal arrangement. One of the most biologically intuitive ways to organize information about cellular systems is to place it in the context of a familiar physical map of the cell, with the nucleus surrounded by cytosol, organelles and a plasma membrane. Similarly, proteins known to be part of the same pathway should be placed close together in the diagram. A good source of information about a protein’s cellular location and biological process involvement is the Gene Ontology (GO) project [1], a collaborative effort to standardize nomenclature for biological concepts and link these to genes and proteins from many genomes. 
The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner. Gene Ontology provides much broader coverage of genomes for this type of information than is available from any other source, such as traditional pathway models stored in pathway databases [2].

We developed a network layout plugin for Cytoscape, which utilizes Gene Ontology (GO) annotations to help organize nodes in a biologically relevant way. The first version of the GOLayout plugin is currently being tested and will be released in the second half of 2011. GOLayout first partitions a given network into subnetworks based on biological process annotations, such as cell differentiation or cell cycle, provided by a pruned set of Gene Ontology terms called “GO slim” (Fig. 6). Each subnetwork is laid out based on cellular component annotations over a scalable template of a typical cell diagram. Finally, each node is colored based on a discrete mapping to molecular function annotations, such that all kinases, for example, might be colored green. The result is a biologically informative layout. This project is complementary to the Thematic Map described in TRD 5 above. While that plugin generates a descriptive network of attribute-based metanodes, the GOLayout plugin generates a series of subnetworks using attributes to partition, lay out and color given nodes. As both projects are under the umbrella of the original TRD C proposal, we continue to coordinate on the development of these related
efforts. Key aspects of the design and implementation of the plugin were done as part of an NRNB collaboration and service project with Allan Kuchinsky from Agilent Technologies, led by Annette Adler (Visualizing Biological Networks with a Biologist’s Eye).

Figure 6. The result of using GOLayout to partition a massive “hairball” network into a series of biological processes, each laid out into cellular compartments and colored by molecular function according to Gene Ontology annotations.

The next version of GOLayout will include user-driven heuristics for highlighting biologically interesting paths within the layout, as well as better ontology handling, i.e., for navigating nested terms. Other key features planned for the next release include support for importing/exporting/printing the layouts in multiple formats. This will allow for custom layout templates, as well as unique visualization, analysis and sharing workflows.

References
1. The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nat Genet 25, 25-29 (2000).
2. Cary, M.P., Bader, G.D. & Sander, C. Pathway information for systems biology. FEBS Lett 579, 1815-20 (2005).

8. Mapping and Visualizing Complex Attributes (Conklin, Pico: TRD C)

An increasing number of experimental methods, such as scans for Single Nucleotide Polymorphisms (SNPs) or exon microarrays, are generating data at sub-gene levels. It is extremely useful to interpret this information in the context of biological networks and pathways [1-4]. For this purpose, we are extending Cytoscape to enable network visualizations of data on sub-gene structures, similar to how Cytoscape already allows visualization of gene expression data on nodes that represent genes or proteins. The input to the system is a data set of sub-gene or protein features, such as SNPs, exons or protein domains, and their associated data
(e.g. population frequency, expression level or domain type). The parent node (gene) color may then be based on the expression values of the exon, or could be based on a gene expression experiment, to allow comparison between exon expression and gene expression.

We have made progress toward supporting the mapping of attributes across these various levels of abstraction with our ongoing work on entity grouping concepts and representations in Cytoscape. The initial benefits of this work are expressed in new metanode features, supporting the mapping of member node attributes up to the parent node using basic functions (average, sum, minimum, maximum, median). Next, we plan to add weighted average, threshold and modal functions. This mapping infrastructure for metanodes is critical to all downstream visualization work with sub-gene and supra-gene level features and entities.

Proteins and genes in biological networks are associated with an increasing amount of data from multiple experiments, such as gene expression measured across a time series or across normal and disease states. Ideally, this multi-dimensional information could be visualized in the context of networks, but this is not possible with the current version of Cytoscape. We are extending the Cytoscape visual mapping system to support multiple node attributes at the same time using new types of visual attributes. Our primary DBP (Alternative splicing in embryonic stem cells, Mercola/Burnham Institute), for instance, requires this ability to view time series gene expression experiments.

Through a new collaboration and service project (Visualizing Multiple Attributes, Morris/UCSF), we coordinated on the design and implementation of the new nodeCharts plugin. This plugin provides an interface for drawing pie charts, line charts, bar charts, and histograms onto nodes using either attribute data or arbitrary data values (Fig. 7).

Figure 7. 
The sample network galFiltered with nodes painted with a pie graph representing the significance of the expression difference for each experimental condition as expressed in the attributes "gal1RGsig (red), gal4RGsig (yellow), and gal80Rsig (green)".

This work is just the beginning of our larger aim of supporting information layering and complex attribute visualization. There are other visual styles to add to nodeCharts, including radar, concentric, grid, and so on. Furthermore, the current nodeCharts plugin provides only programmatic support through the CyCommands interface to the Cytoscape core application. We plan to implement control panels that utilize nodeCharts to present a user interface to support complex mapping decisions. Finally, it will be important to connect the mapping work to the visualization work and to consider the unique cases of visualizing aggregate information from sub-gene features to network nodes or from network nodes to metanodes.

References
1. Mourich, D.V. & Iversen, P.L. Splicing in the immune system: potential targets for therapeutic
intervention by antisense-mediated alternative splicing. Curr Opin Mol Ther 11, 124-32 (2009).
2. Venables, J.P. et al. Cancer-associated regulation of alternative splicing. Nat Struct Mol Biol (2009).
3. Chang, J.S. et al. Pathway analysis of single-nucleotide polymorphisms potentially associated with glioblastoma multiforme susceptibility using random forests. Cancer Epidemiol Biomarkers Prev 17, 1368-73 (2008).
4. Hoffman, A.E. et al. Clock-cancer connection in non-Hodgkins lymphoma: a genetic association study and pathway analysis of the circadian gene cryptochrome 2. Cancer Res 69, 3605-13 (2009).

9. The CYNI Modular Network Induction Framework (Schwikowski: TRD D)

In spite of steady progress in the development of methods that automatically learn network structure from data, these methods have not yet found broader use in the biological literature. The CYNI project aims to provide an easy-to-use interface for network inference algorithms, making data-driven analysis of biological problems (including clustering and classification tasks, hypothesis generation from data, and support for experiment design) accessible to users of the Cytoscape software platform. It will also provide method developers with supporting functionality and technical infrastructure that makes it straightforward to distribute software to a wide community. For tool users, the unified interface will permit easy access to a large number of state-of-the-art methods, allowing for the rapid adaptation of existing data-processing workflows to new biological problems or the integration of novel tools in direct comparison with extant methods. We will use reference implementations of tools that demonstrate the new interface to method developers, and provide examples of their use in biological application projects.

Classification, clustering and network induction provide conceptually homogeneous approaches with a wide range of practical applications. 
A large number of variations exist, for instance with respect to the choice of particular algorithms, the distance/similarity measures and the standardization of input data. For optimal results, these choices must be made in compliance with desired properties of the results and are thus application-dependent. To retain the flexibility and extensibility required for a widely applicable framework, we are developing CYNI with a modular approach that allows functionality to be shared between tasks and allows tailoring of application-specific workflows from predefined building blocks. Specifically, network induction consists of three stages:
  1. An edge assessment using an information-theoretic measure
  2. A pathway aggregation step
  3. A component for experiment selection
These stages can be supplemented by an optional pre-processing step. Moreover, the pathway aggregation step can be configured to harmonize with various edge scoring measures (the default setting being adapted to a general-purpose method that does not put restrictions on the interpretation of the edge weights).

We have applied this design to a network induction and experiment design methodology for de-novo identification of pathways from large-scale data within the BaSysBio project, which
aims to elucidate regulatory networks in the gram-positive bacterium Bacillus subtilis. The approach links an observed phenotype to an external perturbation and is currently implemented as a series of stand-alone programs. Output is generated in several formats, including the .sif format read by Cytoscape. Following the implementation of the Cyni-plugin interface, we are planning to integrate the developed algorithms as reference plugin implementations for demonstrating the network induction interface.

We have applied the approach to a transcriptome time series measurement of cells following a nutrient change, in which a surprising consequence (bacterial competence) was induced. Our computational approach induced a network between regulatory pathway candidates involving a total of only 26 genes from expression data, from an initial selection of more than 400 genes. Many of the inferred edges coincide with known regulatory interactions. Newly indicated putative pathways are now being tested experimentally.

This network induction problem is in many ways similar to the one posed in our DBP (Agents that Boost Innate Antimicrobial Defenses, Sansonetti/Institut Pasteur). We expect to be able to apply the pathway aggregation and experiment selection modules with the data generated in that project. Problem-specific interaction measures will be developed in close collaboration with the Sansonetti group.

NRNB Research Driving Biological Projects and Collaborations

During the first year, our research projects have remained coupled with the DBPs and CSPs originally presented in the grant proposal. You will find explicit references in many of the descriptions above and each is registered as a subproject, which will be tracked and updated annually. In addition, we have picked up many new collaborations this year (37 in total). These collaborations involve both the application and technical development of NRNB tools and resources. 
We recognize that collaborations best showcase the actual utility of our Resource and drive the direction and purpose of many of our research projects. In this progress report, we highlight two examples: one new and one continuing from the original grant (these are also Research Highlights).

1. Continuing DBP: Synthetic Genetic Analysis of Budding Yeast (Bader, Boone, Andrews)

Since 2001, the Bader lab has been collaborating with the Boone and Andrews laboratories on the analysis and visualization of the budding yeast genetic interaction network. Cytoscape is in heavy use in the Boone and Andrews labs for this purpose. Accordingly, the Boone and Andrews labs provide a strong scientific driver for Bader lab network visualization and software projects (TRD 5, above).

Drs. Andrews and Boone are working to finish the first complete genetic interaction network for a cell and to decipher the general principles that govern these networks. This reference map provides a model for expanding genetic network analysis to higher organisms, and it will stimulate valuable insights into gene function, drug target and mode-of-action analysis. The resulting complete map of genetic interactions for budding yeast, with ~6000 genes, will contain 36 million quantitative interaction pairs (18 million unique pairs).

The fundamental principle underlying this DBP is that we need to discover the rules
governing how genes interact with one another in order to be able to predict which rare combinations of gene mutations cause human disease or other significant phenotypes. Andrews and Boone aim to discover the general principles of genetic interaction by mapping the first complete genetic interaction network for a eukaryotic cell and directly testing the conservation of these principles. They are taking a unique experimental approach to define and dissect the rules of complex genetic networks. The strategy entails the use of combinatorial genetic perturbations to systematically screen for genetic interactions. In particular, they have established key infrastructure that enables the construction of all possible double gene deletion mutant combinations in genetically tractable yeast model systems in an automated, high-throughput manner. Genetic interactions are subsequently scored by assessing extreme phenotypes that result from the collapse of an essential cellular function. This information is assembled into a network that reflects the genetic landscape of a cell.

During the reporting period, NRNB investigator Gary Bader has collaborated with Drs. Boone and Andrews on three new publications, extracting knowledge about protein complexes [1], regions of protein disorder [2] and physiological fitness [3] from comparisons of genetic interactions on a genome scale. Each of these published projects utilizes Cytoscape for network analysis and visualization.

References

1. Michaut M, Baryshnikova A, Costanzo M, Myers CL, Andrews BJ, Boone C, Bader GD. Protein complexes are central in the yeast genetic landscape. PLoS Comput Biol. 2011 Feb;7(2):e1001092. Epub 2011 Feb 24. PMID: 21390331; PMCID: PMC3044758.
2. Bellay J, Han S, Michaut M, Kim T, Costanzo M, Andrews BJ, Boone C, Bader GD, Myers CL, Kim PM. Bringing order to protein disorder through comparative genomics and genetic interactions. Genome Biol. 2011 Feb 16;12(2):R14. PMID: 21324131.
3. Baryshnikova A, Costanzo M, Kim Y, Ding H, Koh J, Toufighi K, Youn JY, Ou J, San Luis BJ, Bandyopadhyay S, Hibbs M, Hess D, Gingras AC, Bader GD, Troyanskaya OG, Brown GW, Andrews B, Boone C, Myers CL. Quantitative analysis of fitness and genetic interactions in yeast on a genome scale. Nat Methods. 2010 Dec;7(12):1017-24. Epub 2010 Nov 14. PMID: 21076421.

2. New CSP: Dynamic Genetic Networks Underlying DNA Damage (Ideker, Krogan)

A very successful new CSP begun in the past year involves the laboratories of Trey Ideker (representing the NRNB) and Nevan Krogan at UCSF. The goal of this project is to understand the extent to which genetic and protein networks are remodeled by changes in conditions. Indeed, although cellular behaviors are dynamic, the networks that govern these behaviors have been mapped primarily as static snapshots.

To explore network dynamics, Ideker and Krogan are collaborating to generate interaction networks as cells are exposed to different cellular stresses and stimuli. To analyze the resulting network dynamics, the team has developed a new method, which we call differential epistasis mapping (dE-MAP), that identifies “differential” interactions based on the changes in interaction strength observed between two static conditions. Analyzing network data to identify differential interactions is very similar to analyzing gene expression microarrays to identify
differential expression, or using ICAT or iTRAQ mass spectrometry to identify differentially expressed proteins or protein post-translational modifications. Two-color microarrays revolutionized gene expression analysis because they permitted direct comparison of two conditions and thus identification of differentially expressed genes. In the same way, we feel that differential analysis will be key to extracting the major response pathways encoded by a large biological network.

As proof-of-principle, we have recently used the dE-MAP approach to map widespread changes in genetic interaction among yeast kinases, phosphatases, and transcription factors as the cell responds to DNA damage [1]. In the published study, analysis of differential interactions proved very effective at identifying DNA repair pathways, highlighting new damage-dependent roles for the Slt2 kinase, Pph3 phosphatase, and histone variant Htz1. This analysis also revealed that protein complexes are generally stable in response to perturbation, but the functional relations between these complexes are substantially reorganized.

This proof-of-principle work suggests that differential networks chart a new type of genetic landscape that will be invaluable for mapping many different cellular responses to stimuli. We are now applying the dE-MAP procedure to examine the interaction dynamics among yeast genes involved in cellular processes such as autophagy, aging, and the response to chemotherapeutic compounds. This research is highly complementary to the work of the Bader, Boone and Andrews laboratories described above (see Synthetic Genetic Analysis of Budding Yeast), which seeks to map the entire genetic network in yeast for a single condition. This work is in continued collaboration with Nevan Krogan as well as with a cadre of other investigators.

References

1. Bandyopadhyay S, Mehta M, Kuo D, Sung MK, Chuang R, Jaehnig EJ, Bodenmiller B, Licon K, Copeland W, Shales M, Fiedler D, Dutkowski J, Guénolé A, van Attikum H, Shokat KM, Kolodner RD, Huh WK, Aebersold R, Keogh MC, Krogan NJ, Ideker T. Rewiring of genetic networks in response to DNA damage. Science. 2010 Dec 3;330(6009):1385-9. PMID: 21127252; PMCID: PMC3006187.

NRNB Software and Resources

1. Cytoscape Core

Cytoscape (http://cytoscape.org) is a core research tool used by the majority of projects and collaborations engaged by the NRNB. As such, the development and maintenance of Cytoscape receive a large amount of attention. Cytoscape development is progressing along two fronts. First, we are continuing to maintain the existing 2.8 series of releases. Second, we are developing version 3.0 of Cytoscape, which represents a significant evolution of our architecture in order to modularize the core of Cytoscape, define a clear and consistent API, and simplify the experience of developing and maintaining plugins for Cytoscape.

Cytoscape 2.8.0 was released in October of 2010 and a subsequent maintenance version 2.8.1 was released in February of 2011. Version 2.8 introduces two powerful new features that, when used together, can create rich visualizations [1]. These features are
custom node graphics and attribute equations. Custom node graphics allow Cytoscape end users to map arbitrary graphical images onto nodes in a Cytoscape network using the existing VizMapper interface. Attribute equations provide Excel-like functionality to the Cytoscape attribute browser. We provide a variety of functions that allow normal Cytoscape attributes (numbers, strings, lists) to be manipulated in common ways within Cytoscape. The purpose of attribute equations is not to supplant the use of R or Excel for data analysis, but rather to provide a convenient means for users to manipulate data within Cytoscape. Combining custom node graphics with attribute equations permits the generation of rich graphics. For example, given a Cytoscape node attribute linking each node to a corresponding identifier in the Protein Data Bank (PDB), one is able to write an equation that concatenates the identifier string together with other text to form a complete URL pointing to an image of the 3D structure provided on the PDB website. It is then possible to map this URL to a node, where the URL is interpreted as an image, so that the 3D structure of the specified protein is displayed on the node in the network view.

In conjunction with Cytoscape 2.8, we have also begun developing the next generation of Cytoscape, version 3.0. The Cytoscape 3.0 development effort has resulted in the first developer milestone release of 3.0 at the end of January 2011. The purpose of this milestone was to present a functioning application to the core Cytoscape development team so that they could begin porting plugins and providing feedback on the 3.0 Application Programmer Interface (API). The Bader Group has ported a number of core plugins from 2.8, including BioPAX and PathwayCommons, and they have implemented session reading and writing.
In early March 2011 we held a small meeting of core developers at UC San Diego to discuss the design of Cytoscape 3.0 and to plan the remaining development efforts that are required. We are currently on track to release developer milestone 2 prior to the 2011 Cytoscape retreat in May.

Although the primary goal of Cytoscape 3.0 is to have feature parity with Cytoscape 2.X, there will be new features included as well. We have begun initial development on a “Quick Start” plugin designed to help novice users get their attribute and network data into Cytoscape as quickly and easily as possible.

References

1. Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011 Feb 1;27(3):431-2. Epub 2010 Dec 12. PMID: 21149340; PMCID: PMC3031041.

2. SDSC Triton Resource

In the short time that we have been using the Triton Resource, our users have used over 100,000 hours of CPU time for NRNB projects. We have lined up approximately 500,000 additional hours to be used in the next year.

3. Open Tutorials

We have developed a unique tutorial management system that caters to developers (with wiki tools for creating and updating content), presenters (with prepared slideshows and handouts), and students (with up-to-date online content). Open Tutorials (http://opentutorials.cgl.ucsf.edu) is now the primary source of tutorial material for the Cytoscape project. We recently created
a new Cytoscape tutorial for "Basic Expression Analysis" that uses publicly available human experimental data. This tutorial, like the original tutorial for yeast, represents one of the most common use cases of Cytoscape for biologists. The site has received over 1,000 visitors in the past month, including visits by biologists, clinicians, developers and presenters. Moving forward, this scalable tutorial management solution will allow NRNB to provide tutorial support services to a broad community.

4. New NRNB Website

The new NRNB website (http://www.nrnb.org) went live in late 2010 within a month of our award announcement. The website is the main representation of the NRNB resource for collaborators and researchers. The site includes information about available tools, resources, workshops and training opportunities. There are easy-to-use web forms for requesting services, starting a collaboration, and organizing a training event. We also use these forms for tracking internal activity throughout the year. Overall, the website is relatively dynamic with continuously updated events, news and community interactions. Over the past 5 months, we have registered 34 events, 15 news items, 19 internal project updates, and 37 collaborations. During the last month, traffic analytics show that we averaged close to 100 visitors a day. Interestingly, half of this traffic is coming from our participation in the Google Summer of Code program (see Outreach section in Research Highlights).