NRNB Annual Report 2012

The 2012 annual report for the National Resource for Network Biology

  1. 1. Annual Progress Report - Research Progress 2012
     National Resource for Network Biology, P41 GM103504 (RR031228), 05/01/2011 - 04/30/2012
     The 2012 NRNB Network. On the left is a network representation of all NRNB personnel and collaborators (blue circles), all TRD, DBP, Collaboration, and Service projects (orange diamonds), and associated publications (green triangles). Node size is proportional to the number of connections. Thick red borders indicate personnel and projects directly funded by the NRNB P41 grant. On the right is a zoomed inset, inclusive of all NRNB-funded personnel making up the vital core of the NRNB network. There are 315 nodes and 404 connections in the network. NRNB funds 41 (13%) of these nodes, which make 217 (54%) of the connections. Because this representation is a Cytoscape network [1], we can explore it interactively with our External Advisory Committee, offering dynamic views of our projects, collaborations and budgets. Also see Appendix A for a full-page view of the entire network.
     1. Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T (2011) Cytoscape 2.8: New features for data integration and network visualization. Bioinformatics 27:431–432.
  2. 2. Summary
     Continued advances in high-throughput experimental technologies release enormous amounts of interaction data into the public domain. Analysis of these interactions, and of the networks they form, relies in large part on robust bioinformatics technology. The mission of the NRNB (nrnb.org) is to develop and support a suite of bioinformatics tools that broadly enable the study of network biology. In our second year as a resource, we have significantly advanced our goals through basic research, collaboration, dissemination of software tools, and community support.
     Here, we describe our progress in research, both basic and collaborative. This progress includes algorithms for identification of network substructures (modules); use of network modules for patient diagnostics; tools to enable new network analyses and visualizations; and major new versions of our Cytoscape platform and plugin website.
     Each progress report below specifies the associated personnel and FTEs funded by the NRNB grant. In terms of our own research, NRNB enables a stable effort from each of the resource member sites, ranging from 0.48 to 1.08 FTEs. Many of these TRD projects leverage effort from other grants and funding mechanisms as well in order to maximize the return on investment. Nevertheless, without NRNB support, these projects would be significantly diminished, if not discontinued, and would lack the cohesion and synergy provided by a network biology resource (see reports #1-7 below).
     In terms of the services, training and dissemination, the impact of the NRNB resource is clear.
     Specifically, this resource accounts for the extra effort needed to drive our mailing list response rate from 64% to 93% (see Administrative Information report); the Open Tutorials system for collecting, maintaining and serving tutorial materials; the administration of NRNB’s participation in Google Summer of Code and our new NRNB Academy (see report #9 below); the organization of the annual Network Biology SIG and Cytoscape Retreat meetings; and the new Cytoscape App Store, which will catalyze the Cytoscape user and developer communities (see report #10 below). These efforts are maintained by the 0.5 FTE executive director and 0.3 FTE communications coordinator roles defined and funded by NRNB.
     Finally, NRNB has wide-ranging impact on biomedical research, both nationally and internationally, through its collaboration projects. NRNB member sites were collectively maintaining an estimated two dozen collaborations prior to the formation of this Resource. During the first year, we established close to 40. Now, at the conclusion of our second year, NRNB maintains almost 100 collaboration projects. These projects range from the application of Cytoscape as a research tool for network analysis and visualization, to the development of Cytoscape plugins for custom data types and analyses, to the development and application of other network and pathway tools and resources for network biology (see report #8 below). This activity is a direct result of the NRNB roles of executive director, communications coordinator and, new this year, collaboration coordinator (0.5 FTE).
     We have come a long way in just two years, and NRNB is still getting up to speed. With continued support, we are committed to maintaining and growing these efforts as a Resource for the network biology community.
  3. 3. Contents
     I. Technology Research and Development: Progress and Applications
     Within each TRD report, we have separated the description of development efforts from the applications of each technology for our own groups and our DBPs. References and figures are provided for each project and numbered independently.
       1. Identification of Network Modules as Biomarkers (Ideker)
       2. Network Analysis Tools for Cancer Genomics (Sander)
       3. Network Analysis Methods for Inferring Causality in Networks (Sander)
       4. Using Cytoscape for Social Network Research (Fowler, Pico)
       5. Cytoscape 3.0 for the Visualization and Representation of Biological Networks (Bader)
       6. Visualizing Complex Networks as Ontology-Partitioned Mosaics (Pico)
       7. The CYNI Modular Network Inference Framework (Schwikowski)
     II. Collaboration and Service Projects: Progress
     In addition to the direct impact of our TRD projects on our research, NRNB also impacts new science through our many CSPs. A description for each CSP is provided in the bulk of the report. Here, we summarize the efforts.
       8. New Collaborations
       9. Google Summer of Code and NRNB Academy
     III. Progress on Supplemental Award, 2011-2013
     We were awarded a two-year supplemental grant to work on the Cytoscape App Store. This is a progress report on the first half of the first year.
       10. The Cytoscape App Store (Pico)
     Appendix A. The 2012 NRNB Network
     A full-page view of this year’s network representation of NRNB.
  4. 4. I. Technology Research and Development: Progress and Applications
     Within each TRD report, we have separated the description of development efforts from the applications of each technology for our own groups and our DBPs. References and figures are provided for each project and numbered independently.
     1. Identification of Network Modules as Biomarkers (Ideker, 0.5 FTE: Mike Smoot, Rintaro Saito, Kei Ono)
     Biomarkers are typically thought of as individual genes or proteins. However, we and others have demonstrated that biological pathways and protein interaction networks, which integrate many individual proteins under a common function, can serve as powerful biomarkers and in some cases are also more predictive [1-4]. Our ActiveModules method [1] is an unsupervised approach that first projects molecular profiles (e.g. mRNA or methylation profiles) onto the corresponding nodes in an existing protein interaction map. Subsequently, a network search is performed to identify connected subnetworks (i.e. network modules) whose average node value is higher or lower than expected by chance. The PinnacleZ method [2] is similar to ActiveModules but supervised: each molecular profile is associated with a class label (e.g. cancer subtype) and a network search is performed to identify network modules whose average value is predictive of this sample class. Both PinnacleZ and ActiveModules are implemented as plugins to Cytoscape. Several tools by others, such as the successful HotNet algorithm [5], have been based on ideas introduced by the ActiveModules approach. The advantage of such approaches over regular clustering and classification methods is that they associate the molecular features with physical or functional structures, providing a wealth of hypotheses about the pathway mechanisms underlying an observed set of molecular profiles. In some cases they also provide more robust classification performance.
     Our projects have been pursuing technological advances to better reveal network modular structure, define network logic functions associated with disease outcomes, and extend existing network-biomarker approaches to multiple types of molecular and phenotypic data.
     While ActiveModules and PinnacleZ use simple summary functions such as ‘average’ or ‘median’ to summarize the activity of the genes within a module, these functions do not capture the rich logical relationships known to occur within biological pathways. During the previous reporting period we developed an approach called Network Guided Forests (NGF), which detects more complex logical relationships within modules such as AND, OR, A AND NOT B, XOR and so on [6]. NGF integrates key ideas from decision trees and Random Forests [7] with biological constraints induced by a protein-protein interaction network, the first use of protein networks in ensemble learning. The result is that, rather than relying on a general measure of module activity, NGF fits decision trees to each module directly from data, thus capturing potentially complex network activities. In this reporting period we have further developed the method. While many existing methods still use only one type of molecular feature (e.g. gene expression levels or SNPs) and a single type of molecular interaction data (e.g. protein-protein interactions), we anticipate that key improvements will come from integrating multiple layers of molecular measurements, as well as different types of interaction networks. Extending previous work by other groups (see e.g. [5]), we have developed a preliminary version of a new diffusion-based method that is able to map disease-perturbed networks using combined evidence from multiple heterogeneous data sources (Figure 1). Preliminary results suggest that network modules supported by multiple data layers improve robustness and interpretability and provide more complete models of the disease.
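     The module-search idea described above for ActiveModules (project node scores onto a network, then look for a connected subnetwork whose combined score is higher than expected by chance) can be illustrated with a small sketch. This is not the published implementation: the greedy search, the size-adjusted aggregate sum(z)/sqrt(k), the function names, and the toy network are all assumptions made for illustration. The null model here draws random node sets of the same size without requiring them to be connected.

```python
import math
import random

def aggregate_score(module, z):
    """Size-adjusted aggregate of node scores: sum(z) / sqrt(k) (an assumed scoring choice)."""
    return sum(z[n] for n in module) / math.sqrt(len(module))

def greedy_module(seed, adjacency, z):
    """Grow a connected module from `seed`, adding the neighbor that most improves the score."""
    module = {seed}
    best = aggregate_score(module, z)
    while True:
        # Nodes adjacent to the current module but not yet in it.
        frontier = {nb for n in module for nb in adjacency[n]} - module
        candidates = [(aggregate_score(module | {nb}, z), nb) for nb in frontier]
        if not candidates:
            break
        score, node = max(candidates)
        if score <= best:
            break  # no neighbor improves the module score
        module.add(node)
        best = score
    return module, best

def empirical_p(module_score, size, z, trials=1000, seed=0):
    """Fraction of random same-size node sets scoring at least as high as the module."""
    rng = random.Random(seed)
    nodes = list(z)
    hits = sum(
        aggregate_score(rng.sample(nodes, size), z) >= module_score
        for _ in range(trials)
    )
    return (hits + 1) / (trials + 1)

# Hypothetical toy network: A, B, C form a high-scoring triangle; D and E do not.
adjacency = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E"], "E": ["D"],
}
z = {"A": 2.0, "B": 1.5, "C": 1.8, "D": -0.5, "E": 0.1}

module, score = greedy_module("A", adjacency, z)
p_value = empirical_p(score, len(module), z)
```

     On the toy network, the search starting from A stops once adding D would lower the aggregate score. A real analysis would use a full interaction map, many seeds, and a null model that respects network topology.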
  5. 5. Figure 1. Map of network modules and associations integrating multiple data layers. Large orange nodes are modules enriched for somatic mutations while large blue nodes are modules of genes highly over-expressed in cancer (TCGA level 3 data, z > 100 compared to control). Gene size is scaled according to the percentage of the cohort in which they are altered relative to other genes in the module. Edges within a module represent protein interactions while weighted edges between modules represent statistical associations between modules. Insets in the top-left and top-right corners highlight representative modules for over-expression and mutations, respectively.
     Applications
     Using NGF, we analyzed gene expression data gathered for diverse biological programs, including breast cancer metastasis [8,9] and mesenchymal transformation of brain tumors [10]. These case studies showed that, unlike the gene sets identified by regular Random Forests, the network modules identified by NGF are highly enriched for known causal mechanisms of disease (e.g. dominated by known oncogenes and tumor suppressors), and they have very consistent performance across different sample cohorts.
     In this reporting period we have performed multiple analyses of additional large datasets, including those collected by one of our DBPs, The Cancer Genome Atlas (TCGA) [11]. Through this analysis we have identified and bioinformatically validated predictive modules found by NGF to associate with the specific subtypes of glioblastoma. The most predictive module associated with the mesenchymal subtype was strongly supported by independent transcriptional datasets. On the basis of these findings, this module is now being validated experimentally. We also published an abstract with another one of our DBPs on a subnetwork-based analysis of chronic lymphocytic leukemia, associating particular pathways with the progression of the disease [12].
     Given a library of genes and network modules selected using various types of molecular data, we can now investigate the relationships among these units, such as the association between a germline SNP and the output of a differentially-expressed network (i.e., an eQTL) or the association between a pathway enriched for somatic cancer mutations and a clinical
  6. 6. phenotype such as survival. Together with our DBP, we have used this method to analyze The Cancer Genome Atlas (TCGA) Ovarian Cancer data (somatic mutations and expression profiles) using the HPRD protein interaction network. We identified modules enriched for genetic mutations, as well as modules highly over-expressed in cancer compared to normal tissue. Next we investigated all pairwise correlations between modules to reveal modular associations both within and between the two data layers (Figure 1). Based on this preliminary analysis we conclude that the existing data and our toolset will enable us to construct multi-level modular maps of cancer that will significantly extend single-level network models provided by current methods [13].
     References
     1. T. Ideker, O. Ozier, B. Schwikowski, A. F. Siegel, Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 18 Suppl 1, S233 (2002).
     2. H. Y. Chuang, E. Lee, Y. T. Liu, D. Lee, T. Ideker, Network-based classification of breast cancer metastasis. Mol Syst Biol 3, 140 (2007).
     3. E. Lee, H. Y. Chuang, J. W. Kim, T. Ideker, D. Lee, Inferring pathway activity toward precise disease classification. PLoS Comput Biol 4, e1000217 (Nov, 2008).
     4. I. W. Taylor et al., Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nat Biotechnol 27, 199 (Feb, 2009).
     5. F. Vandin, E. Upfal, B. J. Raphael, Algorithms for detecting significantly mutated pathways in cancer. J Comput Biol 18, 507 (Mar, 2011).
     6. J. Dutkowski, T. Ideker, Protein networks as logic functions in development and cancer. PLoS Comput Biol, (2011).
     7. L. Breiman, Random forests. Machine Learning 45, 5 (2001).
     8. Y. Wang et al., Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365, 671 (Feb 19-25, 2005).
     9. L. J. van 't Veer et al., Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530 (Jan 31, 2002).
     10. H. S.
     Phillips et al., Molecular subclasses of high-grade glioma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis. Cancer Cell 9, 157 (Mar, 2006).
     11. R. G. Verhaak et al., Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 17, 98 (Jan 19, 2010).
     12. Chuang, Han-Yu, et al., Subnetwork-Based Analysis of Chronic Lymphocytic Leukemia Identifies Pathways That Associate with Disease Progression, ASH Annual Meeting Abstracts 2011 118: 3564.
     13. P. T. Spellman et al., Integrated genomic analyses of ovarian carcinoma. Nature 474, 609 (Jun 30, 2011).
     2. Network Analysis Tools for Cancer Genomics (Sander, 0.65 FTE: Ben Gross, Ethan Cerami)
     As described in our previous progress report, the first TRD project at MSKCC is focused on building network analysis tools for interpreting high-throughput cancer genomic data sets. Our primary focus is building user-friendly, open source tools for visualizing and analyzing multidimensional cancer genomic data sets (including copy number, mutation, and mRNA expression) in the context of known biological pathways and interaction networks, and making these tools broadly available within the cancer research community. Providing such tools to the cancer research community is critical, as numerous large-scale projects, including the Cancer Genome Atlas (TCGA) project and the International Cancer Genome Consortium (ICGC), are
  7. 7. profiling dozens of cancer types and subtypes. Identifying altered pathways and networks within each of these cancer types remains a critical and open challenge.
     During our first year of NRNB funding, we completed a prototype project for displaying multi-dimensional cancer genomic data in the context of molecular interaction networks. We chose to implement the prototype in Cytoscape Web [1], as Cytoscape Web does not require any additional software installation or Java Web Start. It therefore significantly lowers the barriers for usage, particularly for biologists and clinical researchers, two of our main target user groups. In this progress report, we describe the transition of our tools from prototype to production mode, and describe how we have now made our software available to the entire cancer research community. Specifically, our NRNB-funded network tools are now available within the cBio Cancer Genomics Portal, where they enable cancer researchers to perform network analysis on up to 20 different cancer types, including TCGA-funded projects related to our DBP, such as Glioblastoma Multiforme (GBM) [2] and serous ovarian cancer [3].
     As general background, the cBio Cancer Genomics Portal (http://cbioportal.org) is an open-access resource for interactively exploring multidimensional cancer genomics data sets. It currently provides integrated access to cancer genomic data (including copy number, mutation, mRNA and microRNA expression, methylation, and protein and phosphoprotein data) on more than 5,000 tumor samples from 20 cancer studies. With a focus on usability, the cBio Portal specifically provides integrated access to multiple genomic data types, graphical summaries of genomic alterations, survival analysis and predicted functional consequences of somatic mutations.
     All features of the portal are available via a streamlined four-step web interface, enabling researchers to interactively explore gene sets and pathways, and dynamically broaden or limit the scope of their query. By integrating data on thousands of tumor samples, and providing a simple, yet powerful and flexible interface, the cBio Portal enables cancer researchers to translate genomic data into biological insights and clinical applications.
     During the past year, we have added our NRNB-funded network analysis tools to the cBio Portal (launched on November 14, 2011), and have made the functionality freely available to the scientific community. The network functionality (Figure 1) is directly available via the main cancer query interface, and the portal now automatically generates a cancer-specific network of interest, based on seed genes specified by the user. This network consists of pathways and interactions from the Human Protein Reference Database (HPRD) [4], Reactome [5], NCI-Nature [6], and the MSKCC Cancer Cell Map (http://cancer.cellmap.org), as derived from the open source Pathway Commons Project [7].
  8. 8. Figure 1. Network visualization and analysis now available within the cBio Cancer Genomics Portal (http://cbioportal.org). A. Network view of TP53 in TCGA Glioblastoma Multiforme (GBM). Network of interest generated from the seed gene of TP53; MDM2 and MDM4 are highlighted. B. The portal overlays multi-dimensional genomic data (copy number, mutation, and mRNA expression) onto all nodes in the network. C. All edges are color-coded by interaction types. Interaction types are derived from the BioPAX to Simple Interaction (SIF) inference rules [7]. For example, In Same Component indicates that Genes A and B are involved in the same biological component, such as a complex; State Change indicates that Gene A causes a state change, such as a phosphorylation change within Gene B; Other is used to indicate all other types of interactions, including protein-protein interactions derived from HPRD. D. Options for filtering, cropping and searching the network of interest.
     By default, the network of interest contains all neighbors of all seed genes specified by the user. If more than 50 neighbor nodes exist in the network, all genes are ranked by the frequency of genomic alteration within the specified cancer study, and less frequently altered genes are automatically pruned from the network. By default, the portal also automatically overlays multi-dimensional genomic data onto each node, highlighting the frequency of alteration by mutation and copy number alteration (and optionally mRNA up/down regulation). This provides an effective means of managing network complexity, while automatically highlighting those genes most directly relevant to the cancer type in question. One can also download the full, non-pruned network for more complete visualization and analysis.
     In addition, users can filter the network by alteration frequency, highlight all neighbors of a selected gene, hide specific nodes, crop to a selected set of nodes, or search the network by gene symbol.
     These features enable cancer researchers to identify new cancer-specific genes that go beyond the original set of seed genes, and provide an effective means for discovering novel cancer genes and novel genomic alterations.
     As originally outlined in our grant application, our goal is to eventually integrate cancer genomic data, pathway data and drug target data. In the next year, we therefore intend to focus on extending the network feature to include drug data and drug target information. We initially plan to integrate drug data from DrugBank [8], but are also evaluating other sources, including ChEBI [9], NCBI PubChem [10], and PharmGKB [11].
     Applications
     See next section for a summary of applications for this and the next TRD project.
     References
     1. Lopes CT, Franz M, Kazi F, Donaldson SL, Morris Q, Bader GD: Cytoscape Web: an interactive web-based network browser. Bioinformatics 2010, 26(18):2347-2348.
     2. TCGA: Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 2008, 455(7216):1061-1068.
     3. Integrated genomic analyses of ovarian carcinoma. Nature 2011, 474(7353):609-615.
     4. Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A et al: Human Protein Reference Database--2009 update. Nucleic acids research 2009, 37(Database issue):D767-772.
     5. Matthews L, Gopinath G, Gillespie M, Caudy M, Croft D, de Bono B, Garapati P, Hemish J, Hermjakob H, Jassal B et al: Reactome knowledgebase of human biological pathways and processes. Nucleic acids research 2009, 37(Database issue):D619-622.
  9. 9. 6. Schaefer CF, Anthony K, Krupa S, Buchoff J, Day M, Hannay T, Buetow KH: PID: the Pathway Interaction Database. Nucleic acids research 2009, 37(Database issue):D674-679.
     7. Cerami EG, Gross BE, Demir E, Rodchenkov I, Babur O, Anwar N, Schultz N, Bader GD, Sander C: Pathway Commons, a web resource for biological pathway data. Nucleic acids research, 39(Database issue):D685-690.
     8. Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V et al: DrugBank 3.0: a comprehensive resource for omics research on drugs. Nucleic acids research 2011, 39(Database issue):D1035-1041.
     9. de Matos P, Alcantara R, Dekker A, Ennis M, Hastings J, Haug K, Spiteri I, Turner S, Steinbeck C: Chemical Entities of Biological Interest: an update. Nucleic acids research 2010, 38(Database issue):D249-254.
     10. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Zhou Z, Han L, Karapetyan K, Dracheva S, Shoemaker BA et al: PubChem's BioAssay Database. Nucleic acids research 2012, 40(Database issue):D400-412.
     11. McDonagh EM, Whirl-Carrillo M, Garten Y, Altman RB, Klein TE: From pharmacogenomic knowledge acquisition to clinical applications: the PharmGKB as a clinical pharmacogenomic biomarker resource. Biomarkers in medicine 2011, 5(6):795-806.
     3. Network Analysis Methods for Inferring Causality in Networks (Sander, 0.65 FTE: Ben Gross, Ethan Cerami)
     The goal of our second TRD project is to algorithmically infer causality within signaling networks from specific perturbation-induced experiments. High-throughput screens conducted with libraries of small molecules or inhibitory RNAs have the ability to identify compounds that induce tumor suppressive responses in cancer cells [1].
     While the effects of such perturbations can be easily linked to transcriptional changes, identifying the causal mechanism remains a major challenge. In a collaboration with Somwar and colleagues [2], we used a computational approach to predict the target of a small molecule that reduces growth in lung adenocarcinoma cell lines. Experimental follow-up confirmed the prediction.
     Building on this concept, we have started working on computational approaches to reconstruct the causal signaling cascade inducing observed transcriptional changes within perturbed cell lines. With NRNB funding, we have previously explored the use of an optimization algorithm borrowed from statistical physics to connect altered genes in cancer into minimal spanning networks. Now, we have begun to use the same approach to identify the minimal set of interactions able to connect genes that are differentially expressed after a perturbation with candidate targets of the same perturbation (Figure 1).
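     Finding a minimal set of interactions that connects a chosen set of genes is an instance of the Steiner tree problem: connect a set of terminal nodes at minimum total edge weight, allowing extra intermediate nodes. The sketch below is not the statistical-physics optimization used in this project. It swaps in a classic shortest-path heuristic (repeatedly attach the nearest unconnected terminal to the growing tree), and the graph, node names, and edge weights are hypothetical.

```python
import heapq

def nearest_terminal_path(graph, sources, targets):
    """Multi-source Dijkstra: cheapest path from any node in `sources` to any node in `targets`."""
    dist = {s: 0.0 for s in sources}
    prev = {}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u in targets:
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    raise ValueError("terminals are not connected in the graph")

def steiner_tree_heuristic(graph, terminals):
    """Approximate Steiner tree: grow a tree by attaching the nearest remaining terminal."""
    terminals = list(terminals)
    tree_nodes = {terminals[0]}
    tree_edges = set()
    remaining = set(terminals[1:]) - tree_nodes
    while remaining:
        _, path = nearest_terminal_path(graph, tree_nodes, remaining)
        tree_edges.update(frozenset(pair) for pair in zip(path, path[1:]))
        tree_nodes.update(path)
        remaining -= tree_nodes
    return tree_nodes, tree_edges

def build_graph(weighted_edges):
    """Undirected weighted graph as a dict of dicts."""
    graph = {}
    for u, v, w in weighted_edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

# Hypothetical example: three terminals (e.g. two differentially expressed genes
# and one candidate target) are cheapest to connect through the hub node S.
graph = build_graph([
    ("T1", "S", 1.0), ("T2", "S", 1.0), ("T3", "S", 1.0),
    ("T1", "T2", 3.0), ("T2", "T3", 3.0),
])
nodes, edges = steiner_tree_heuristic(graph, ["T1", "T2", "T3"])
total_weight = sum(graph[u][v] for u, v in (tuple(e) for e in edges))
```

     In the toy graph the heuristic routes all three terminals through S, which is not itself a terminal. This mirrors the idea that intermediate nodes in the tree are candidate members of the signaling cascade. The heuristic gives an approximation only; it is not guaranteed to find the minimum-weight tree on larger graphs.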
  10. 10. Figure 1. Given a perturbation and an observed response, the network analysis algorithms that we are developing aim to identify the perturbation target and the signaling cascade inducing the observed transcriptional response.
     Our approach relies on an algorithm that solves the Steiner tree problem. Given a set of “terminal” nodes, the Steiner tree is defined as the tree of minimum weight connecting these terminals, allowing the inclusion of additional nodes. Differentially expressed genes after a perturbation and/or candidate targets of the same perturbation can be used as terminals. The resulting Steiner tree can therefore contain both gene interactions able to explain the observed transcriptional changes, and the putative target of the perturbation. This research remains a work in progress, and we are continuing to explore new algorithmic frameworks.
     Applications
     Large-scale cancer genomics projects, such as the Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), are providing an unprecedented and high-resolution view of the molecular defects in dozens of cancer types [3]. A key open challenge is to identify biological pathways that are frequently perturbed within tumor cells and lead to the acquisition of tumorigenic properties, such as cell proliferation, angiogenesis or metastasis [4, 5]. A number of algorithmic methods have been developed for discovering altered networks and pathways in cancer, including Mutually Exclusive Modules in Cancer (MEMo) [6], PARADIGM [7], and HotNet [8].
     The network analysis tools we have built for our TRD enable researchers to interactively explore perturbed pathways and networks in cancer. Unlike the algorithmic methods described above, the tools we have developed are specifically designed to support exploratory data analysis and hypothesis generation, and are designed for widespread use within the wider cancer research community.
     By specifically adding network features to the cBio Cancer Genomics Portal, we have also enabled network analysis on the full TCGA data set. In addition, the portal has become a crucial tool within TCGA and is actively used by a large number of TCGA disease working groups, including serous ovarian cancer, colorectal cancer, breast cancer, and lung cancer (see collaborations).
     To cite one concrete translational application, we used the network analysis features of the portal to identify genomic alterations in the homologous recombination (HR) DNA repair pathway in serous ovarian cancer. BRCA1 and BRCA2 are known to be involved in the HR pathway, but additional defects may also abrogate HR functionality, leading to potential sensitivity to PARP inhibitors [9]. To identify potential HR defects in ovarian cancer, we used BRCA1 and BRCA2 as seed nodes for the network view and explored the resulting altered network of interest (Figure 2A). By this means, we quickly identified alterations in C11orf30/EMSY (6% by amplification, 1.6% by mutation), a known interactor of BRCA2, and a possible alternate means for abrogating HR functionality [9]. We also readily identified all altered Fanconi Anemia genes (another family of genes involved in the HR pathway [9]), and identified low frequency alterations in FANCA (altered in 3.5% of patients) and FANCE (2.8% of patients). Combining these results with other genes known to be involved in the HR pathway, our DBP (TCGA) was able to identify potential defects in the HR pathway in up to half of all patients, providing a rationale for including such cases in clinical trials involving PARP inhibitors (Figure 2B) [10].
  11. 11. Figure 2: Extent of homologous recombination (HR) repair defects in serous ovarian cancer. A. Network view of BRCA1/BRCA2 in TCGA serous ovarian cancer. BRCA1 and BRCA2 are seed genes (indicated with thick border), and all other genes are automatically identified as altered in ovarian cancer. Multidimensional genomic details are shown for FANCA, FANCE and C11orf30/EMSY. Darker red indicates increased frequency of alteration (defined by mutation, copy number amplification or homozygous deletion) in ovarian cancer. B. Extent of HR defects in TCGA Ovarian Samples. Reprinted from [10].
     References
     1. Somwar R, Shum D, Djaballah H, Varmus H: Identification and preliminary characterization of novel small molecules that inhibit growth of human lung adenocarcinoma cells. Journal of biomolecular screening 2009, 14(10):1176-1184.
     2. Somwar R, Erdjument-Bromage H, Larsson E, Shum D, Lockwood WW, Yang G, Sander C, Ouerfelli O, Tempst PJ, Djaballah H et al: Superoxide dismutase 1 (SOD1) is a target for a small molecule identified in a screen for inhibitors of the growth of lung adenocarcinoma cell lines. Proceedings of the National Academy of Sciences of the United States of America 2011, 108(39):16375-16380.
     3. Stratton MR, Campbell PJ, Futreal PA: The cancer genome. Nature 2009, 458(7239):719-724.
     4. Hanahan D, Weinberg RA: The hallmarks of cancer. Cell 2000, 100(1):57-70.
     5. Hanahan D, Weinberg RA: Hallmarks of cancer: the next generation. Cell 2011, 144(5):646-674.
     6. Ciriello G, Cerami E, Sander C, Schultz N: Mutual exclusivity analysis identifies oncogenic network modules. Genome research 2012, 22(2):398-406.
     7. Vaske CJ, Benz SC, Sanborn JZ, Earl D, Szeto C, Zhu J, Haussler D, Stuart JM: Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics 2010, 26(12):i237-245.
  12. 12. 8. Vandin F, Upfal E, Raphael BJ: Algorithms for detecting significantly mutated pathways in cancer. Journal of computational biology : a journal of computational molecular cell biology 2011, 18(3):507-522.
     9. Turner N, Tutt A, Ashworth A: Hallmarks of BRCAness in sporadic cancers. Nat Rev Cancer 2004, 4(10):814-819.
     10. Integrated genomic analyses of ovarian carcinoma. Nature 2011, 474(7353):609-615.
     4. Using Cytoscape for Social Network Research (Fowler, 0.72 FTE: Janusz Dutkowski; Pico, 0.48 FTE: Alex Pico, Alex Williams)
     It is well known that humans tend to associate with other humans who have similar characteristics, but it is unclear whether this tendency has consequences for the distribution of genotypes in a population. Although geneticists have shown that populations tend to stratify genetically, this process results from geographic sorting or assortative mating, and it is unknown whether genotypes may be correlated as a consequence of non-reproductive associations or other processes.
     In this TRD project, we began with a study of social networks and genotypes from the National Longitudinal Study of Adolescent Health [1,2] and a replication study on an independent sample from the Framingham Heart Study. These studies showed that homophily and heterophily occur on a genetic (indeed, an allelic) level, which has implications for the study of population genetics and social behavior. In particular, the results suggest that association tests should include friends' genes and that theories of evolution should take into account the fact that humans might, in some sense, be "metagenomic" with respect to the humans around them. The analytical methods developed for these studies were implemented in the R scripting language, while the visualization methods were provided by a collection of disparate tools, none of which were tailored for network visualization or for integration with R.
During this reporting period, we collaborated with the Pico group on developing new technologies for network analysis and visualization that complement and in many cases replace prior methods. In particular, we developed the CyNetworkSignificance plugin, which can perform the same analysis pipeline formerly executed in R and other chart and network visualization tools, but all in a single tool, integrated with wide-ranging functionality through other plugins. After loading a social network into Cytoscape together with genotypic or other data attributes, you can launch CyNetworkSignificance and customize the following parameters: select the data attribute to use for correlation; select the correlation method (e.g., Pearson); and choose the number of randomized trials to compare against and the randomization method (e.g., shuffle nodes). Then hit “Run” and the plugin will calculate correlation values for the original network and each of the randomly generated networks for each Nth degree represented in the network (e.g., from pairs of nodes directly connected, to pairs of nodes connected by N degrees of separation). These correlation values match the results of the existing R analysis. We will also add a histogram visualization feature to the plugin before its official release (Fig. 1).
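The analysis described above can be illustrated with a minimal Python sketch. This is not the plugin's code: the function names, the dictionary-based network representation, and the one-sided empirical p-value are assumptions made for illustration. It groups node pairs by shortest-path distance (the "Nth degree"), computes a Pearson correlation of a node attribute across each group, and compares it against networks in which attribute values have been shuffled among nodes.

```python
import random
from collections import deque
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def pairs_by_degree(adj):
    """Group unordered node pairs by shortest-path distance, using BFS from each node.

    `adj` is a symmetric adjacency mapping: node -> set of neighbouring nodes.
    """
    groups = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for dst, d in dist.items():
            if d > 0 and src < dst:  # count each unordered pair once
                groups.setdefault(d, []).append((src, dst))
    return groups


def degree_correlations(groups, attr):
    """Correlation of the attribute across node pairs, one value per degree of separation."""
    result = {}
    for d, pairs in groups.items():
        # Use both orientations of every pair so the value does not depend on pair order.
        xs = [attr[a] for a, b in pairs] + [attr[b] for a, b in pairs]
        ys = [attr[b] for a, b in pairs] + [attr[a] for a, b in pairs]
        result[d] = pearson(xs, ys)
    return result


def shuffle_test(adj, attr, trials=1000, seed=0):
    """Compare observed correlations with those from node-shuffled attribute values."""
    rng = random.Random(seed)
    groups = pairs_by_degree(adj)
    observed = degree_correlations(groups, attr)
    nodes = list(attr)
    values = [attr[n] for n in nodes]
    exceed = {d: 0 for d in observed}
    for _ in range(trials):
        rng.shuffle(values)  # randomization method: shuffle values among nodes
        null = degree_correlations(groups, dict(zip(nodes, values)))
        for d in observed:
            if null[d] >= observed[d]:
                exceed[d] += 1
    # One-sided empirical p-value with the usual +1 correction.
    p_values = {d: (exceed[d] + 1) / (trials + 1) for d in observed}
    return observed, p_values
```

Calling `shuffle_test(adj, attr, trials=1000)` returns the observed correlation per degree of separation together with an empirical p-value; the per-trial null correlations are the quantities a histogram view would summarize.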
Figure 1. Social network of the Hadza hunter-gatherers of Tanzania. This analysis in Cytoscape reproduces the results published earlier this year in Nature by Fowler et al., which show a strong social network-dependence on the donation of public goods across and within groups [3]. The histogram plot is a mock-up at this stage, but based on the correlation values calculated by CyNetworkSignificance on the original and randomized networks.

For extended R analyses, we are leveraging a new community-contributed plugin called RCytoscape, which allows us to send network data to Cytoscape from within R after completing an analysis. The network and associated node and edge attributes are then available for visualization and analysis within Cytoscape. The workflows enabled by these technologies will support the types of analyses we are most interested in pursuing through our DBPs and collaborations.
The NRNB grant has not only provided direct funding for my group, but has also created a unique fluidity of ideas and effort across NRNB sites. This project, for example, would not likely have been initiated (let alone completed) outside of this resource organization, where we could immediately launch and execute the work in collaboration with the Pico group without establishing a new subcontract. The success of this intra-NRNB collaboration serves as a practical example of how our resource can work in new ways and will likely inspire future cross-group activities.

Applications
We just recently completed the technical implementation of the new Cytoscape plugin and R workflows. We have performed post-hoc analyses on prior datasets to confirm the reproduction of results from the prior methods. Indeed, the tools work well and should streamline future analyses. During the next reporting period we will apply the new technologies from this TRD to our ongoing research, DBPs and Collaborations.
Specifically, we will be following up on the findings above with a genome-wide study of correlated genotypes with the goal of using
associations to learn more about the role of networks in recent human evolution. By correlating these associations with measures of nucleotide diversity, we hope to show that the genotypes under strongest friendship selection are also those under the strongest natural selection.
In the meantime, we continue to publish with and track the work of our DBPs, applying social network analysis methods to the study of obesity and aspirin use and cardiovascular events [4,5].

References
1. Fowler JH, Dawes CT, Christakis NA. Model of genetic variation in human social networks. Proc Natl Acad Sci U S A. 2009 Feb 10;106(6):1720-4. Epub 2009 Jan 26. PMID: 19171900; PMCID: PMC2644104.
2. Fowler JH, Settle JE, Christakis NA. Correlated genotypes in friendship networks. Proc Natl Acad Sci U S A. 2011 Feb 1;108(5):1993-7. Epub 2011 Jan 18. PMID: 21245293, PMC3033315.
3. Coren L. Apicella, Frank W. Marlowe, James H. Fowler and Nicholas A. Christakis. Social networks and cooperation in hunter-gatherers. Nature, Vol. 481, Pg. 497-501.
4. Block JP, Christakis NA, O’Malley AJ, Subramanian SV. Proximity to food establishments and body mass index in the Framingham Heart Study offspring cohort over 30 years. Am J Epidemiol. 2011 Nov 15;174(10):1108-14. Epub 2011 Sep 30.
5. Strully KW, Fowler JH, Murabito JM, Benjamin EJ, Levy D, Christakis NA. Aspirin use and cardiovascular events in social networks. Soc Sci Med. 2012 Apr;74(7):1125-9. Epub 2012 Feb.

5. Cytoscape 3.0 for the Visualization and Representation of Biological Networks (Bader, 1.0FTE: Christian Lopes, Jason Montojo)
Our major activity over the past year has been to ensure that Cytoscape 3.0 supports the advanced visualization and representation features that we proposed in the NRNB grant, both in system design and performance.
This has required major effort porting visualization features from Cytoscape 2.8 and developing new visualization features in Cytoscape 3.0 to test the design of the new Cytoscape 3 application programming interfaces (APIs). For instance, we worked with the Ideker software development team to port Cytoscape 2 graph layout algorithms to Cytoscape 3. We also developed a full-featured 3D graph visualization and layout system to test that Cytoscape can handle multiple types of visualization systems at the same time (http://wiki.cytoscape.org/Cytoscape_3/3D_Renderer). This resulted in a substantially improved design for support of multiple simultaneous visualization engines in Cytoscape 3. Finally, we worked in collaboration with the i-Vis Information Visualization Research Group of Bilkent University to develop a compound node model for Cytoscape Web, which is a necessary feature for pathway visualization on the web and full compatibility with the Cytoscape 3 network model.
We are also laying the groundwork for representation and visualization of detailed biological pathway information in Cytoscape 3. We have completed the following activities in this area.
● Tested and updated the design of the core Cytoscape 3 model to ensure hierarchical network models can be stored, queried, saved and loaded. This is the foundation for many advanced visualization features that we proposed in the grant, such as hierarchical views necessary for biological pathway visualization.
● Developed a prototype of a new app that uses the latest Cytoscape 3 API and Pathway Commons web services and client API, which provides search, access, and analysis of biological pathway information from the BioPAX Level 3 data warehouse (warehouse development funded by the Pathway Commons project). Also, we ensured that biological pathway information in the standard BioPAX format can be seamlessly mapped to the Cytoscape 3 network model.
Ensuring Cytoscape 3 will enable our stated aims has required tremendous effort, in that we have needed to implement a number of prototype features to test that the API design is robust. This work will pay off in 2012-2013 as we finally release Cytoscape 3 and start working on novel visualization features in earnest.

Applications
While Cytoscape 3 work is still in the active development phase and we anticipate many applications next year and beyond, we continue to maintain our highly successful Enrichment Map visualization plugin for Cytoscape 2.8, responding to frequent requests by users for new features. This visualization tool is heavily used in all of our collaborations with local biology groups (see Collaboration and Service Projects) and by others (the papers describing the method garnered almost 40 citations since 2010 [1]). In the following year, we plan to port this system to Cytoscape 3.0 and to integrate it with popular pathway enrichment analysis software, such as the Gene Set Enrichment Analysis (GSEA) software from Jill Mesirov’s group at the Broad Institute, MIT. We also continue to publish with and follow the work of our DBPs, who have had a very productive year applying Cytoscape and network analysis approaches to the study of the yeast interactome, genetic interactions and metabolism [2-5].

References
1. Merico D, Isserlin R, Stueker O, Emili A, Bader GD. Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PLoS One. 2010 Nov 15;5(11):e13984. PMID: 21085593; PMCID: PMC2981572.
2. Baryshnikova A, Costanzo M, Kim Y, Ding H, Koh J, Toufighi K, Youn JY, Ou J, San Luis BJ, Bandyopadhyay S, Hibbs M, Hess D, Gingras AC, Bader GD, Troyanskaya OG, Brown GW, Andrews B, Boone C, Myers CL. Quantitative analysis of fitness and genetic interactions in yeast on a genome scale. Nat Methods. 2010 Dec;7(12):1017-24. Epub 2010 Nov 14.
3. Bellay J, Atluri G, Sing TL, Toufighi K, Costanzo M, Ribeiro PS, Pandey G, Baller J, VanderSluis B, Michaut M, Han S, Kim P, Brown GW, Andrews BJ, Boone C, Kumar V, Myers CL. Putting genetic interactions in context through a global modular decomposition. Genome Res. 2011 Aug;21(8):1375-87. Epub 2011 Jun 29.
4. Magtanong L, Ho CH, Barker SL, Jiao W, Baryshnikova A, Bahr S, Smith AM, Heisler LE, Choy JS, Kuzmin E, Andrusiak K, Kobylianski A, Li Z, Costanzo M, Basrai MA, Giaever G, Nislow C, Andrews B, Boone C. Dosage suppression genetic interaction networks enhance functional wiring diagrams of the cell. Nat Biotechnol. 2011 May 15;29(6):505-11. doi: 10.1038/nbt.1855.
5. Szappanos B, Kovács K, Szamecz B, Honti F, Costanzo M, Baryshnikova A, Gelius-Dietrich G, Lercher MJ, Jelasity M, Myers CL, Andrews BJ, Boone C, Oliver SG, Pál C, Papp B. An integrated approach to characterize genetic interaction networks in yeast metabolism. Nat Genet. 2011 May 29;43(7):656-62. doi:10.1038/ng.846.

6. Visualizing Complex Networks as Ontology-Partitioned Mosaics (Pico, 0.48FTE: Alex Pico, Kristina Hanspers)
Increasing throughput and quality of molecular measurements in the domains of genomics, proteomics and metabolomics continues to fuel the understanding of biological processes. Collected per molecule, the scope of these data extends to physical, genetic and biochemical interactions that in turn comprise extensive networks. One challenge faced by these tools is how to make sense of such networks, which are often represented as massive “hairballs.” Many network analysis algorithms filter or partition networks based on topological features, optionally weighted by orthogonal node or edge data [1,2]. Another approach is to mathematically model networks and rely on their statistical properties to make associations with other networks,
phenotypes and drug effects, sidestepping the issue of making sense of the network itself altogether [3]. Acknowledging that there is still great value in engaging the minds of researchers in exploratory data analysis at the level of networks, we have produced a Cytoscape plugin called Mosaic [4] to support interactive network annotation and visualization that includes partitioning, layout and coloring based on biologically-relevant ontologies (Fig 1). The ultimate effect of Mosaic is to present slices of a given network in the visual language of biological pathways, which are familiar to any biologist and ideal frameworks for integrating knowledge.

Figure 1. Mosaic control panel, context menu and tiled result windows. The control panel shows both the color mapping legend and subnetwork display. Context menus for listed subnetworks allow the user to partition deeper within a given ontology branch.

While Mosaic can run using practically any annotation, the primary usage relies on ontology-based annotations, especially Gene Ontology. GO provides a controlled vocabulary of terms describing key characteristics of gene products (i.e., process, location, and function). Mosaic manages all identifier mapping and ontology annotation functions via integrated databases and CyCommand access to CyThesaurus. The program then proceeds to partition, lay out and color the provided network. All subnetworks are listed hierarchically, including subnetworks that fall outside defined thresholds for display. Selecting a subnetwork in the control panel will bring it into focus in the tiled window view. Additional functions can be accessed by right-clicking on the name of a particular subnetwork in the control panel.
In particular, "partition this network to one further level" allows users to interactively partition a huge network to deep levels of GO efficiently without generating hundreds of other subnetworks from parallel branches.

Applications
This visualization approach is ideal for many types of ontology-based overrepresentation analyses. As such, we are now working on an ensemble of plugins to handle the complete pipeline from annotation to analysis to visualization. This is in collaboration with two new CSPs established during this reporting period. Through these collaborations and others we will publish
a series of reports on the applications of Mosaic and our integrated ontology analysis tools in Cytoscape during the next reporting period.

References
1. Bader, G.D. and Hogue, C.W. (2003) An automated method for finding molecular complexes in large protein interaction networks, BMC Bioinformatics, 4, 2.
2. Royer, L., et al. (2008) Unraveling protein networks with power graph analysis, PLoS Comput Biol, 4, e1000108.
3. Machado, D., et al. (2011) Modeling formalisms in Systems Biology, AMB Express, 1, 45.
4. Zhang C, Hanspers K, Kuchinsky A, Salomonis N, Xu D, Pico AR. Mosaic: Making Biological Sense of Complex Networks. Bioinformatics, 2012. (accepted with minor revisions)

7. The CYNI Modular Network Inference Framework (Schwikowski, 1.08FTE: Frank Rügheimer, Oriol Guitart)
Our goal during this period was the definition, implementation, and testing of workflows for network induction for use in biological application projects and Cytoscape DBPs and CSPs. Like the other TRD projects, this project requires a combination of domain expertise (research-grade expertise in the area of network induction), which has been available to us for one year at the time of this writing (Frank Rügheimer, who had been involved in the DBP), and software engineering capability, which we found difficult to muster until recently. We therefore proceeded to first develop and implement a CYNI prototype in C, and apply it in the context of our DBP, to transcriptome data from the soil bacterium Bacillus subtilis. In a second step (starting March 1, 2012), a professional computer engineer with more than five years of experience in industry and academia (Oriol Guitart-Pla) has begun to integrate these software components into the Cytoscape 3 framework. Proceeding in this order had the added advantage that CYNI can now be implemented against a stable Cytoscape 3 core.
As the prototype was implemented using an object-oriented design, its translation into Java is straightforward.

Definition of the CYNI software components
The figure below outlines the CYNI software architecture and current implementation state. The core of the ‘astre Extended prototype’ is a network inference toolbox that provides a data model and functionality for computing association measures, which are an essential component of network inference algorithms, from data. This prototype was combined with an external text parser library (distributed under LGPL) and expanded into a functional command-line tool in C. In combination with the prototype implementation of a higher-level path-based network induction approach (scoreKO) and supporting command line scripts for preprocessing, a complete processing pipeline is provided. The pipeline was developed within the DBP, which allowed the design and its implementation to evolve in their application context, and helped guide the integration of software features towards relevant requirements of that application.
Figure 1. Current view of CYNI architecture and implementation.

astre Network inference toolbox
In our prototype toolbox, Cytoscape node attribute tables are represented via feature vectors. Each feature vector represents a case that is described as a joint instantiation over an attribute set (e.g. time series for RNA expression levels for a given gene). Simple node association measures, such as correlation, are computed directly for pairs of feature vectors. Beyond that, additional support functionality for contingency tables, discretization and ranking enables the implementation of more advanced measures that draw on robust statistics and information theory.

Supported discretization/ranking mechanisms to date:
● Standard ranking
● Fractional ranking
● Quantile-based binning

Supported association measures to date (measures marked with * use contingency tables):
● Pearson correlation coefficient (numerical vectors only)
● Spearman rho rank correlation (ordinal scale or better)
● d2* (sum of element-wise squared deviation of contingency table from expected distribution under independence) (any type)
● Mutual information* (also Shannon information gain) (any type)
● Shannon information gain ratio* (any type)
● Kendall tau rank correlation* (ordinal scale or better)

The astre Network inference toolbox can be used either interactively or in batch mode. At startup the program reads an attribute value table that contains data to be used for computing interaction measures. In interactive mode the program will then continuously process queries for edge association measures and write output as it becomes available. This on-demand computation allows highly efficient heuristic search strategies. Alternatively, a predefined list of queries can be processed in batch mode. By restricting the selection of queries, it is possible to enforce structure constraints on the induced network.
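To make the listed mechanisms concrete, the following Python sketch shows textbook versions of fractional ranking, quantile-based binning, Spearman rank correlation, and contingency-table mutual information, plus a small batch-query helper. It is an illustration only: astre itself is a C prototype, and all function names and the table layout here are assumptions, not its interface.

```python
import math
from collections import Counter


def fractional_rank(values):
    """Fractional ranking: tied values share the mean of the 1-based ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def quantile_bins(values, n_bins):
    """Quantile-based binning: map each value to a bin index 0..n_bins-1 by its rank."""
    n = len(values)
    return [min(n_bins - 1, int((r - 1) * n_bins / n)) for r in fractional_rank(values)]


def pearson(xs, ys):
    """Pearson correlation coefficient (returns 0.0 if either vector is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def spearman(xs, ys):
    """Spearman rho: Pearson correlation of the fractional ranks."""
    return pearson(fractional_rank(xs), fractional_rank(ys))


def mutual_information(xs, ys):
    """Mutual information in bits, computed from the contingency table of two discrete vectors."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # contingency table as (x, y) -> count
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def batch_associations(table, queries, measure):
    """Batch mode: evaluate a predefined list of (node_a, node_b) queries.

    `table` maps a node name to its feature vector; restricting `queries`
    restricts which edges can appear in the induced network.
    """
    return {(a, b): measure(table[a], table[b]) for a, b in queries}
```

For continuous data such as expression levels, a contingency-table measure would typically be applied after discretization, for example `mutual_information(quantile_bins(x, 3), quantile_bins(y, 3))`.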
astre also implements unit tests for critical data structures and the majority of implemented measures and discretization methods. As the unit tests can mostly be translated into Java in a straightforward way, they provide a defense against regression errors during the code refinement and optimization phase of CYNI development. For the same purpose, we conducted profiling runs and optimized a number of the core algorithms (initially planned for year 3). Converter scripts are provided to re-import the externally calculated results into Cytoscape for visualization and optional further processing.

Sample workflow (compute association measures):
1. Load table data (e.g. expression matrix) into CLI tool and select suitable association measure
2. Generate queries and pass them to CLI tool to obtain association values or edges
3. Integrate association values into higher level network induction strategies

Implementation of the scoreKO approach
In addition to simple co-expression networks, we implemented a prototype higher-level network induction component, which we developed in the context of a large integrated EU-funded research project. This prototype generates networks based on plausible chains of gene regulatory interactions that connect a selection of source nodes to target nodes in the network (manuscript in preparation).

Figure 2. Illustration of prototype network induction component. From left to right: Network based on initial node association measures; Selected source nodes {A,B,C}; Selected target node {I}; Reduced network consisting of all interactions occurring on (near-) optimal interaction chains.

Feature export from CYNI to other modules
Some CYNI elements share functionality with other Cytoscape plugins. In particular the symmetric association measures implemented (all but mutual information and mutual information gain) provide natural notions of similarity and can be used in tasks such as hierarchical clustering.
The same holds true for symmetric versions of the information gain ratio, which can be produced, e.g., by averaging the values obtained for both possible link directions [1].
An interesting option that we are considering is an interface to register, group and access implementations of similarity and distance measures, as a way to foster reuse and to prevent redundancy between Cytoscape plugins. We are currently in contact with other Cytoscape developers (e.g., of the ClusterMaker plug-in) to present a draft proposal for such an interface to the Cytoscape community. The export of discretization and ranking features could be organized in a similar way.

Current Activities, translation of astre into the Cytoscape 3 framework
The arrival of a software engineer (Oriol Guitart) on March 1, 2012, marked the start of the CYNI implementation and integration of astre into Cytoscape. astre data structures and algorithms can largely be translated without modifications into Java/the Cytoscape framework.
In parallel, we continue to increase test coverage of the implemented algorithms and evaluate the addition/modification of features based on experiences in ongoing application projects.

Applications
In our collaboration with the lab of Jan Maarten van Dijl (Groningen, Netherlands), this workflow was applied to a network (418 nodes; 174,306 edges) to explore the unknown chains of regulatory interactions between the central carbon metabolism and the competence subsystem of Bacillus subtilis. The approach identifies hypothetical regulatory chains from expression data, perturbation sites in the known regulatory network segment and a marker gene associated with the so-called competence phenotype. Suggested knockout targets were selected from candidate pathways identified by our network induction prototype. Currently, a subset of the proposed genes is being evaluated in knock-out experiments to validate or reject their involvement in the putative regulatory cascade, and to collect additional pertinent transcriptome data that may be fed back into our analysis.
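The idea of reducing a network to the interactions that lie on (near-) optimal chains from source nodes to a target node can be sketched as follows. The scoreKO method itself is unpublished (manuscript in preparation), so this is not its algorithm: the sketch assumes that each directed edge carries a non-negative cost derived in some way from an association measure (lower cost meaning a more plausible interaction), and the `slack` tolerance and all function names are hypothetical. It uses two standard Dijkstra shortest-path runs.

```python
import heapq


def dijkstra(graph, starts):
    """Shortest distance from the nearest start node to every reachable node.

    `graph` maps node -> {neighbour: non-negative edge cost}.
    """
    dist = {s: 0.0 for s in starts}
    heap = [(0.0, s) for s in starts]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def reverse(graph):
    """Return the graph with every directed edge flipped."""
    rev = {}
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            rev.setdefault(v, {})[u] = w
    return rev


def near_optimal_subnetwork(graph, sources, target, slack=0.1):
    """Keep edges lying on some source-to-target chain within (1 + slack) of the best cost.

    Assumption (hypothetical): edge costs come from association values,
    e.g. a decreasing function of association strength.
    """
    from_src = dijkstra(graph, sources)
    to_tgt = dijkstra(reverse(graph), [target])
    if target not in from_src:
        return set()  # no chain connects any source to the target
    limit = from_src[target] * (1.0 + slack)
    kept = set()
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            if u in from_src and v in to_tgt:
                # Cost of the cheapest source-to-target chain that uses edge (u, v).
                if from_src[u] + w + to_tgt[v] <= limit + 1e-12:
                    kept.add((u, v))
    return kept
```

With `slack=0` only edges on strictly optimal chains are retained; increasing it admits progressively longer alternative chains, which is one way to obtain a set of candidate pathways rather than a single path.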
II. Collaboration and Service Projects: Progress (1.3FTE: Alex Pico, Rintaro Saito, Kristina Hanspers)
In addition to the direct impact of our TRD projects on our research, NRNB also has an effect on new science through our many CSPs. A description for each CSP is provided in the bulk of the report. Here, we summarize the efforts.

8. New Collaborations
During our second year, we established a formal collaboration processing system for NRNB. Each of the 5 NRNB sites has a designated Collaboration Contact who is responsible for managing collaboration and service requests. They can start by directing potential collaborators to the main NRNB website at nrnb.org, where they will find numerous hooks into our collaboration system. Clicking on ‘Collaborate’, for example, leads to a simple web-based form; submissions are automatically logged in our Collaboration Tracker spreadsheet, and email notifications are sent to the contact. Entries are assessed per the availability and interest of each group. If accepted, they are marked for entry into our annual reporting system. If not accepted, they are marked as rejected but still recorded for reporting purposes. Numerous potential collaborators also independently find the collaboration hooks on our website, such as the mentoring programs, which bring in the largest numbers and some of the most diverse and productive collaborations (see below).
At the end of year one, we had established close to 40 collaborations. During the course of our second year, we took on another 60, totaling 97 collaborations in all!
These range from the application of Cytoscape as a research tool for network analysis and visualization, to the development of Cytoscape plugins for custom data types and analyses, to the development and application of other network and pathways tools and resources for network biology.

Applications of Cytoscape
In this category, we are enabling a wide range of medical research applications [1-3] including the study of Frontal Temporal Dementia, Alzheimer’s disease, Diabetes, Anorexia nervosa, Glaucoma, Heart disease, Leukemia, Brain tumors, Autism, Prostate cancer, Breast cancer, Endometrial cancer, Colorectal cancer, Lung cancer, and Malaria. Through NRNB collaborations, Cytoscape is also being applied to the study of the mechanisms [3,4] underlying inflammation, stem cell differentiation, B-cell differentiation, ciliogenesis, cell-cell communication, oxidative stress response, DNA repair, cancer stem cells, and wound healing, as well as general interactome, proteomics and metabolomics research [5,6].

Development of Cytoscape Plugins/Apps
It is a testament to the extensible model of Cytoscape and our outreach efforts to provide training and documentation to developers that we get an equal number of collaboration requests for developing new Cytoscape features, which in turn can be applied not only to our immediate collaborators’ research, but more broadly to the Cytoscape user community. This is a very gratifying virtuous cycle that NRNB is specifically enabling and amplifying. In this category, we have established collaborations to develop plugins and apps [7,8] to connect with public databases to access and load interactions and annotations, to provide new types of data visualizations, to perform ontology analysis, graph analysis, partitioning, quantitative modeling, and to handle new data types such as next-gen sequencing data and variant data.
We also have collaborations to develop interoperability between Cytoscape and 3D molecular visualization tools, and integrated workbenches, such as the Cancer Gene Encyclopedia and the cBio Cancer Genomics Portal.
Development and Application of Other NRNB Tools and Resources
In this final category of collaborations, we are beginning to extend beyond the immediate reach and scope of Cytoscape to identify complementary tools and resources that contribute significantly to network biology. NRNB allocates time and resources to promote and engage these other efforts, such as by making NRNB-funded network tools available within cBio, by coordinating the curation of biofuel pathways at WikiPathways, by adding network analysis functionality to Broad’s IGV (Integrative Genomics Viewer), and by promoting the use of BaSysBio (Bacillus Systems Biology) [9-11].

References
1. Liu JC, Voisin V, Bader GD, Deng T, Pusztai L, Symmans WF, Esteva FJ, Egan SE, Zacksenhaus E. Seventeen-gene signature from enriched Her2/Neu mammary tumor-initiating cells predicts clinical outcome for human HER2+:ERα- breast cancer. Proc Natl Acad Sci U S A. 2012 Apr 10;109(15):5832-7. Epub 2012 Mar 28.
2. Zhang L, Lim SL, Du H, Zhang M, Kozak I, Hannum G, Wang X, Ouyang H, Hughes G, Zhao L, Zhu X, Lee C, Su Z, Zhou X, Shaw R, Geum D, Wei X, Zhu J, Ideker T, Oka C, Wang N, Yang Z, Shaw PX, Zhang K. High temperature requirement factor A1 (HTRA1) gene regulates angiogenesis through transforming growth factor-β family member growth differentiation factor 6. J Biol Chem. 2012 Jan 6;287(2):1520-6. Epub 2011 Nov 2.
3. Dutkowski J, Ideker T. Protein networks as logic functions in development and cancer. PLoS Comput Biol. 2011 Sep;7(9):e1002180. Epub 2011 Sep 29.
4. Atwood A, DeConde R, Wang SS, Mockler TC, Sabir JS, Ideker T, Kay SA. Cell-autonomous circadian clock of hepatocytes drives rhythms in transcription and polyamine synthesis. Proc Natl Acad Sci U S A. 2011 Nov 8;108(45):18560-5. Epub 2011 Oct 31.
5. Chuang HY, Hofree M, Ideker T. A decade of systems biology. Annu Rev Cell Dev Biol. 2010 Nov 10;26:721-44. Review.
6. Diezmann S, Michaut M, Shapiro RS, Bader GD, Cowen LE. Mapping the Hsp90 Genetic Interaction Network in Candida albicans Reveals Environmental Contingency and Rewired Circuitry. PLoS Genet. 2012 Mar;8(3):e1002562. Epub 2012 Mar 15.
7. Aranda B, Blankenburg H, Kerrien S, Brinkman FS, Ceol A, Chautard E, Dana JM, De Las Rivas J, Dumousseau M, Galeota E, Gaulton A, Goll J, Hancock RE, Isserlin R, Jimenez RC, Kerssemakers J, Khadake J, Lynn DJ, Michaut M, O’Kelly G, Ono K, Orchard S, Prieto C, Razick S, Rigina O, Salwinski L, Simonovic M, Velankar S, Winter A, Wu G, Bader GD, Cesareni G, Donaldson IM, Eisenberg D, Kleywegt GJ, Overington J, Ricard-Blum S, Tyers M, Albrecht M, Hermjakob H. PSICQUIC and PSISCORE: accessing and scoring molecular interactions. Nat Methods. 2011 Jun 29;8(7):528-9. doi: 10.1038/nmeth.1637.
8. Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE. clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics. 2011 Nov 9;12:436.
9. Buescher JM, Liebermeister W, Jules M, Uhr M, Muntel J, Botella E, Hessling B, Kleijn RJ, Le Chat L, Lecointe F, Mäder U, Nicolas P, Piersma S, Rügheimer F, Becher D, Bessieres P, Bidnenko E, Denham EL, Dervyn E, Devine KM, Doherty G, Drulhe S, Felicori L, Fogg MJ, Goelzer A, Hansen A, Harwood CR, Hecker M, Hubner S, Hultschig C, Jarmer H, Klipp E, Leduc A, Lewis P, Molina F, Noirot P, Peres S, Pigeonneau N, Pohl S, Rasmussen S, Rinn B, Schaffer M, Schnidder J, Schwikowski B, Van Dijl JM, Veiga P, Walsh S, Wilkinson AJ, Stelling J, Aymerich S, Sauer U. Global network reorganization during dynamic adaptations of Bacillus subtilis metabolism. Science. 2012 Mar 2;335(6072):1099-103.
10. Nicolas P, Mäder U, Dervyn E, Rochat T, Leduc A, Pigeonneau N, Bidnenko E, Marchadier E, Hoebeke M, Aymerich S, Becher D, Bisicchia P, Botella E, Delumeau O, Doherty G, Denham EL, Fogg MJ, Fromion V, Goelzer A, Hansen A, Härtig E, Harwood CR, Homuth G, Jarmer H, Jules M, Klipp E, Le Chat L, Lecointe F, Lewis P, Liebermeister W, March A, Mars RA, Nannapaneni P, Noone D, Pohl S, Rinn B, Rügheimer F, Sappa PK, Samson F, Schaffer M, Schwikowski B, Steil L, Stülke J, Wiegert T, Devine KM, Wilkinson AJ, van Dijl JM, Hecker M, Völker U, Bessières P, Noirot P. Condition-dependent transcriptome reveals high-level regulatory architecture in Bacillus subtilis. Science. 2012 Mar 2;335(6072):1103-6.
11. Kelder T, van Iersel MP, Hanspers K, Kutmon M, Conklin BR, Evelo CT, Pico AR. WikiPathways: building research communities on biological pathways. Nucleic Acids Res. 2012 Jan;40 (Database issue):D1301-7. Epub 2011 Nov 16.

9. Google Summer of Code and NRNB Academy
In addition to the outreach effort described above, we also leverage a Google-sponsored program called Google Summer of Code (GSoC) to attract new developers for Cytoscape core, plugins/apps, WikiPathways, PathVisio and other tools we deem relevant to the NRNB mission. This year is the sixth year that Dr. Pico has coordinated the collective GSoC effort involving Cytoscape; this is the second year we’ve participated under the new banner of “NRNB”. Through the GSoC program we not only recruit new developers, but we are also significantly promoting NRNB as an open source-friendly organization, putting us in an exclusive list of ~175 organizations selected from around the world by Google to participate. Dr. Pico attends the annual GSoC Mentors Summit with other NRNB mentors to further engage the open source development community. In terms of collaborations, GSoC brings in new potential collaborators who want to participate as mentors in addition to the 40-60 student applicants. This year we coordinated 36 mentors (10 with NRNB funding), thus leveraging the effort of 26 additional developers from the open source communities surrounding NRNB-related tools. And through the GSoC program we received over 60 student applications this year. From these we’ve selected 16 students to mentor on Cytoscape and NRNB-related projects. The projects range from core Cytoscape 3.0, to Cytoscape 3.0 apps, to GeneMANIA and MedSavant, to PathVisio and WikiPathways, to the cBio Cancer Genomics Portal, but the majority of the projects are Cytoscape 3.0 related. Google is paying $5,000 per student, making their investment $80,000 in NRNB for 3 months of work. That’s what I call leveraging the community!
Inspired by this very successful model for recruiting new code contributors, we designed and launched NRNB Academy in January of this year. The idea behind NRNB Academy is very similar to GSoC, except it’s not restricted to students, it’s not affiliated with Google, and it’s 100% volunteer. Our experience has been that the major draw to our projects in the past has been the opportunity to get direct mentorship in developing Cytoscape and our other tools. The students and external mentors are eager to contribute time and effort when they know it will be guided and effectively amplified by the interaction with NRNB, thus dramatically increasing the odds for a productive output. In the first three months, we have already received 9 applications, started 4 new projects, and recruited 3 new mentors. We anticipate continued growth of this program as word spreads. One of the principal goals of NRNB is to promote and enhance the development community around Cytoscape. The new NRNB Academy program gives us one more way to reach out to the community and realize this potential. Based on our experience so far, this program is effective in launching new developers and in establishing new collaborations with long-term potential.
  24. 24. III. Progress on Supplemental Award, 11/2011-07/2013
We were awarded a two-year supplemental grant to work on the Cytoscape App Store. This is a progress report on the first half of the first year.
10. The Cytoscape App Store (Pico, 1.0 FTE: Samad Lotia)
The Cytoscape App Store will offer a whole new way for researchers to search, install and develop custom apps for Cytoscape. Much of the Cytoscape App Store content will be created by its users: ratings, comments, tags and the submission of new apps. Dynamic web sites like the Cytoscape App Store often make use of a web framework to manage frequent changes. First, the web site puts all of its content in a database, because databases make it easy and fast to get the content back later. The web site code retrieves the content from the database. It then processes the content and sends the user HTML, image, CSS, and JavaScript files, which are shown in the user’s web browser. The web framework is involved in the web site’s code at each of these steps. The Cytoscape App Store uses the Django web framework, which is written in Python, making it concise, versatile, and familiar. As a popular framework in the web development community, Django also has many online forums with experienced developers willing to answer technical questions. Django developers also have made a variety of software extensions that provide additional functionality relevant to our App Store plans. Beyond the web framework, we are using the MySQL database due to its ubiquity in web development. We make extensive use of the jQuery library in JavaScript, a programming language that adds interactivity to web pages. We also pervasively use the Twitter Bootstrap CSS library to provide a consistent and professional-quality look to the web site. Together, these technologies enable a rich set of features (Figure 1). 
These range from keyword search with auto-completion and dynamic navigation through tag lists and tag clouds to the display of interactive app buttons with icons, brief descriptions and ratings. Clicking on an app button takes you to the corresponding app page where you’ll find a full description of the app along with screenshots, version and author information, links to source websites and tutorials, and a comment section for reviews, questions and bug reports. We are currently implementing a “one-click install” feature on each app page that will allow users to install apps from the website to any instance of Cytoscape 3.0+ that they have running. The submission of new Cytoscape apps is also handled directly by the App Store. Simply sign in (you can use an existing Google account), click “submit a new app”, upload your .jar file, then interactively edit the app page as it will appear to other users.
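The database-to-browser flow described above can be sketched in a few lines. This is an illustration only, not the App Store's code: it swaps Django and MySQL for Python's standard-library sqlite3 and string.Template so that it runs without any installation, and the table names, column names, sample description and HTML class names are all assumptions made for the example.

```python
import sqlite3
from html import escape
from string import Template

# Hypothetical schema: one table of apps and one of user-submitted ratings.
SCHEMA = """
CREATE TABLE app (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
CREATE TABLE rating (app_id INTEGER REFERENCES app(id), stars INTEGER);
"""

# Hypothetical markup for one "app button" on the main page.
APP_BUTTON = Template(
    '<div class="app-button"><h3>$name</h3>'
    '<p>$description</p><span class="rating">$stars</span></div>'
)


def build_db():
    """Step 1: the site stores its content in a database."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    # The description text is a placeholder, not the real app description.
    db.execute("INSERT INTO app VALUES (1, 'MetaNetter', 'Placeholder description')")
    db.executemany("INSERT INTO rating VALUES (?, ?)", [(1, 5), (1, 4)])
    return db


def render_app_list(db):
    """Steps 2 and 3: retrieve the content, process it, and emit HTML."""
    rows = db.execute(
        "SELECT a.name, a.description, AVG(r.stars) "
        "FROM app a LEFT JOIN rating r ON r.app_id = a.id "
        "GROUP BY a.id ORDER BY a.name"
    ).fetchall()
    parts = []
    for name, description, avg in rows:
        stars = "unrated" if avg is None else f"{avg:.1f} / 5"
        parts.append(
            APP_BUTTON.substitute(
                name=escape(name), description=escape(description), stars=stars
            )
        )
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_app_list(build_db()))
```

In a web framework, the query and the template would typically live in separate model, view and template layers; the sketch collapses them into two functions to keep the three steps visible.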
  25. 25. Figure 1. Screenshots of Cytoscape App Store. The top screenshot is of the main page, showing navigation tools on the left and two columns of app buttons (with icons, names and brief descriptions). The first app, MetaNetter, is moused-over and expands to show ratings, number of downloads and tags. The bottom screenshot shows the app page for MetaNetter with screenshots, full description, version details and the “one-click install” option.
This project will completely replace the existing Cytoscape plugins web page in the next month or two when we roll out the 2.x version of the site. Then, in conjunction with the public
  26. 26. release of Cytoscape 3.0, we will update the site with the 3.x-specific features like “one-click install”. One of the main goals of NRNB is to actively engage developers and researchers. Ultimately, we can provide better tools and resources by facilitating participation by the greater community and not discounting the sum of thousands of small contributions. This model is extensible beyond the Cytoscape project and could support software-as-a-service distribution. As NRNB broadens its scope in future years, this app-centric, community-based model can be cloned for other tool and resource projects.
Applications
Presently, the community is limited in how it can contribute to improve and build upon Cytoscape. Recent developments in crowdsourcing technology and social structures and processes have enabled public software projects to engage vastly more users. These advances promise to take Cytoscape community support to the next level. Just as Cytoscape’s open source extensible software architecture has enabled a rich community of app developers to flourish, crowdsourcing technology will enable users to contribute to software testing, documentation updates, app creation, data set curation, workflow sharing and more.
The crowdsourcing infrastructure we are proposing will not only reach out to users and developers of apps, but also to external data sources (e.g., Sage Commons, Pathway Commons) and other data-centric research tools (Taverna, Genome Space) through web service and format standards tailored for the web. Advances in web technologies and broadband connections are allowing more data and computation to migrate to the “cloud” while user-friendly data mining and analysis tools are enabling more researchers to access these resources. Online representations of Cytoscape apps will become hubs for groups of researchers to connect to data resources, analytical methods and relevant results.
  27. 27. Appendix A. The 2012 NRNB Network
A network representation of all NRNB personnel and collaborators (blue circles), all TRD, DPB, Collaboration, and Service projects (orange diamonds), and associated publications (green triangles). Node size is proportional to the number of connections. Thick red borders indicate personnel and projects directly funded by the NRNB P41 grant. There are 315 nodes and 404 connections in the network. NRNB funds 41 (13%) of these nodes, which make 217 (54%) of the connections.
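The network statistics in this caption can be illustrated with a small sketch. The edge list and the set of funded nodes below are toy data invented for the example, and counting a connection as "made by" a funded node whenever at least one of its endpoints is funded is our assumption about how the report tallies them.

```python
from collections import Counter


def degrees(edges):
    """Number of connections per node; this is what drives node size."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def funded_share(edges, funded):
    """Return (fraction of nodes funded, fraction of edges touching a funded node)."""
    nodes = {n for edge in edges for n in edge}
    funded_nodes = nodes & set(funded)
    touching = sum(1 for a, b in edges if a in funded or b in funded)
    return len(funded_nodes) / len(nodes), touching / len(edges)


# Toy network (hypothetical names): people, projects and one publication.
EDGES = [
    ("person_A", "project_1"),
    ("person_B", "project_1"),
    ("person_A", "paper_X"),
    ("person_C", "project_2"),
]
FUNDED = {"person_A", "project_1"}

if __name__ == "__main__":
    print(degrees(EDGES).most_common(2))
    print(funded_share(EDGES, FUNDED))
    # The percentages quoted in the caption follow from the raw counts:
    print(round(100 * 41 / 315), round(100 * 217 / 404))
```

In the toy network a third of the nodes are funded yet they touch three quarters of the connections, the same pattern the caption reports for the full network.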
  28. 28. Annual Progress Report - Research Highlights 2012
National Resource for Network Biology
P41 GM103504 (RR031228)
05/01/2011 - 04/30/2012
Contents
● NRNB Supports Development of cBio Cancer Genomics Portal
● Cytoscape 3.0 and the Cytoscape App Store in 2012
● NRNB Academy Is Now Accepting Applications
NRNB Supports Development of cBio Cancer Genomics Portal
The National Resource for Network Biology is proud to support the cBio Cancer Genomics Portal (www.cbioportal.org), which has become a major resource for cancer genomics research both within the TCGA and within the broader cancer research community. Since the launch of the network analysis features in November 2011, the Portal has had 6,306 unique visitors, and has served up over 275,000 page views. The cBio Portal was also recently highlighted in The Scientist, as “a user-friendly site for working with data from TCGA and other data sets” [1]. The article points out the easy-to-use and valuable network and pathway visualization capabilities:
    Just enter your gene—say, Trim2—in the gray field and click Submit. After you select the tumor type and click View Cancer Study Details, you can review the network of known gene interactions and pathways involving the gene under the Network tab. You can mouse over a gene, represented as a node, to see a color-coded wheel summarizing its mutation, expression, and copy number status.
Bringing network perspectives to critical data sets is a shared goal of the cBio project and NRNB.
1. Storrs C: Combing the Cancer Genome. The Scientist 2012, Mar.
Cytoscape 3.0 and the Cytoscape App Store in 2012
A primary goal of NRNB is to amplify and propagate the community development model of Cytoscape. Cytoscape is a core research tool that is used and/or developed by almost every project and collaboration engaged by the NRNB. 
We are developing version 3.0 of Cytoscape, which represents a marked evolution of our architecture designed to modularize the core of Cytoscape, define a clear and consistent API, and simplify the experience of customizing Cytoscape. The 4th milestone release and the first beta release of the API will be available at the end of May 2012. The beta API release is the point at which we expect external developers to be able to comfortably port their plugins without having to make significant changes before the final 3.0 release. New features in 3.0 include a quick-start welcome screen that provides simple mechanisms for loading networks and attributes, a simplified user interface, and many small improvements such as edge bundling layout.
The Cytoscape App Store will open with the release of Cytoscape 3.0 and offer a whole new way for researchers to search, install and develop custom extensions to Cytoscape. As
  29. 29. extensions are ported from older versions or developed anew for 3.0, they will be rebranded as apps to acknowledge the shift in the underlying technology and in our focus on these customizations as the primary drivers for Cytoscape’s success and its future relevance and impact. The Cytoscape App Store will manage the submission of new apps, generating a suite of unique content and functions around each app to support community reviews, ratings, comments, as well as “one-click install” and a variety of navigational tools.
In conjunction with the Cytoscape App Store, the release of Cytoscape 3.0 will further accelerate the recognition, adoption and customization of the Cytoscape platform by the network biology research community.
NRNB Academy Is Now Accepting Applications
Taking on a new approach to outreach and training, we launched NRNB Academy in January 2012. NRNB Academy offers software developers from around the world the opportunity to work with our open source development team on network biology related tools and resources. The program provides a framework for training with a list of starter projects and a host of mentors to be paired with new developers. It is completely volunteer-based and offers participants flexible project terms. The main goals of the NRNB Academy are:
○ To promote development of scientific tools for network biology
○ To offer participants practical open source dev experience
○ To produce useful tools and resources for the research community
More information about potential projects and the application process is available at nrnb.org/academy. In the first three months, we received 9 applications, started 4 new projects, and recruited 3 new mentors for our Google Summer of Code effort. We anticipate continued growth of this program as word spreads. One of the principal goals of NRNB is to promote and enhance the development community around Cytoscape. 
The new NRNB Academy program gives us one more way to reach out to the community and realize this potential. Based on our experience so far, this program is not only effective in launching new developers, but also in establishing new collaborations with long-term potential.
  30. 30. Annual Progress Report - Administrative Information 2012
National Resource for Network Biology
P41 GM103504 (RR031228)
05/01/2011 - 04/30/2012
Administrative Structure
During the first year, we defined the administrative structure of the resource, including some unique new roles within the organization. The roles of Principal Investigator (PI), Co-PI, External Advisory Committee (EAC), Resource Administrator and Chief Software Architect were defined as in the original grant. We defined a new role of Executive Director (ED) to oversee some of the new resource functions that NRNB provides, including Training & Outreach, Communications and Infrastructure. The ED (Alex Pico, Gladstone Institutes) is responsible for coordinating these efforts as well as conducting all of the necessary tracking and due diligence for the annual reporting to NIH. During the second year, we defined the new role of Collaboration Coordinator to screen and process collaboration requests to our resource. This has been a vital role in supporting the 60+ new collaborations in year two. Finally, we were very pleased to have all seven invited members promptly agree to join and attend our first EAC meeting last summer, including Dr. Stephen Friend as chair of the committee.
Budget changes between years 1 and 2 were minimal, with a few exceptions. In Figure 1A, you will notice an increase overall due mainly to annual cost-of-living raises for personnel in each of the 3 budget categories: PIs, TRDs and Staff. The one main exception is the new staff position for Collaboration Coordinator created in year 2 (Fig 1A, red, circled).
Figure 1. Budget graphs. Area charts showing the distribution of funds for years 1 and 2 (x-axis) per category (A) and per group (B). Y-axis is in units of $1,000s of US dollars. Each stripe corresponds to an individual with a specific role in NRNB, totaling just over 7 FTEs. 
Note that groups are sorted by degree of change, which is critical in this style of visualization to minimize misperception of change when slopes are actually parallel.
  31. 31. In panel B of Figure 1, you will notice slight increases from raises, except where countered by a decrease in FTE (e.g., Fowler). The more significant increases in the Conklin and Ideker budgets are due to increased TRD support for the Conklin group (which needed correction after new ED and Communications Coordinator staff roles were defined and not originally budgeted for) and to the new role of Collaboration Coordinator in the Ideker group (same as in panel A).
As the basis for the graphs above, here are itemized tables of FTEs and funding for both years 1 and 2 (Table 1).

                             FTEs              $1,000s
Roles and Groups             Year 1   Year 2   Year 1   Year 2
Collaboration Coord.         0.00     0.50     0        50
Resource Admin.              1.00     0.56     52       38
Chief Architect              0.40     0.40     47       51
TRD-Ideker                   0.50     0.50     40       45
PI-Ideker                    0.30     0.30     74       78
Communications Coord.        0.30     0.30     29       29
Executive Director           0.50     0.50     56       56
TRD-Conklin                  0.20     0.48     21       39
PI-Conklin                   0.02     0.02     5        5
TRD-Sander                   0.65     0.65     90       97
PI-Sander                    0.02     0.02     5        5
TRD-Bader                    1.00     1.00     90       93
PI-Bader                     0.10     0.10     0        0
TRD-Schwikowski              1.00     1.08     81       83
PI-Schwikowski               0.08     0.08     0        0
TRD-Fowler                   1.00     0.72     58       54
PI-Fowler                    0.10     0.10     21       26
SUBTOTAL                     7.17     7.32     669      750
Supplement-Ideker            0.00     0.40     0        45
Supplement-Conklin           0.00     1.00     0        85
Supplement-Bader             0.00     0.40     0        45
SUBTOTAL                     0.00     1.80     0        175
GRAND TOTAL                  7.17     9.12     669      925

Table 1. NRNB effort and budget. Annual budgeting of FTEs and $1,000s, itemized by roles and groups. Subtotals are provided for the main grant and supplemental funding (bold).
Allocation of Resource Access
Beyond the active distribution and support of Cytoscape, which is covered in later sections, NRNB resource allocation can be categorized in the following way:
1. On-site training events: NRNB staff have participated in 20 training events during the reporting period, up from just 7 last year. These events include tutorials, workshops and courses.
2. 
Requests for collaboration and mentorship: This year we ramped up our responsiveness to requests for collaboration by designating Collaboration Czars at each NRNB site and funding a Collaboration Coordinator position to oversee the processing of
  32. 32. collaboration requests. With a 177% increase in established collaborations (from 35 to 97), we are confident our new strategies are working. Many of these collaborations are coming through our participation in Google Summer of Code (GSoC) and our own NRNB Academy efforts (see #3). All told, we rejected 43 requests during this same time period; 39 of these were students through GSoC.
3. Google Summer of Code and NRNB Academy: In addition to receiving requests from potential students through these programs, we also receive requests from a number of groups to join our organization as mentors. This brings new technology and ideas to our effort. GSoC has been our most successful outreach program by far. It’s responsible for 25% of all our NRNB collaborations (24 out of 97). And by the website traffic report below (Fig. 2), you can also see that it is the most active time period for use of NRNB.org online resources, getting NRNB broad exposure in the open source community. Building on the success of this model, we launched NRNB Academy in January of this year. Our Academy follows the same approach as GSoC, organizing around available mentors, ideas and interested students. However, we are not restricted to supporting university students in our program as it is independent of GSoC and 100% volunteer based. The Research Progress and Highlights provide more details.
4. Requests for training material support: We receive requests for tutorial materials throughout the year from inside and outside the Cytoscape core development team. Our homegrown Open Tutorials system makes it easy to accommodate all such requests. Open Tutorials is an easy-to-use wiki system that provides content formatted to be used as online sessions, slide shows and printed handouts. This year we are seeing more content from more contributors, in addition to a steady rise in visitors (see details in the Training section below).
5. 
Providing software community support: Our goal is to develop a generic template of services based on the support we provide the Cytoscape community of users and developers. So far we have extended support to two additional software projects, internal to NRNB PI sites: WikiPathways and cBio Cancer Genomics Portal. These proven resources complement Cytoscape and help demonstrate the broader scope of the NRNB mission. We are providing distribution links, showcases, tutorial support, news and event tracking, and GSoC and NRNB Academy participation to these projects.
Awards and Honors
None
Dissemination
We averaged just over 23,000 visits per month (304,000 total visits) to the Cytoscape website during this reporting period (8% increase over last period). An additional 28,000 visits were made to Open Tutorials and another 17,000 visits were logged at the NRNB website during the reporting period (350% and 120% increases over last period, respectively). The front page of the NRNB website now includes a video presentation introducing NRNB. A new Showcase page displays graphical highlights of common workflows involving NRNB tools. The Training page is regularly updated with information on current training events and also includes a full listing of courses relevant to NRNB tools. But based on the analytics report, it is clear that the dominant activity on the site relates to our outreach and collaboration through Google Summer of Code (Fig 2).
  33. 33. Figure 2. A plot of daily visits since the launch of the NRNB website, December 2012 - April 2012. Notice the dramatic spikes in activity during the GSoC application weeks at the end of March and beginning of April.
A key statistic in terms of dissemination is number of software downloads. Currently, the primary software offered and supported by NRNB is Cytoscape and its suite of plugins. We have seen consistent activity over the past 12 months averaging close to 5,000 downloads per month for the Cytoscape distribution (Fig. 3).
Figure 3. Chart of Cytoscape software downloads per month over the past 12 months.
We are sustaining the increase in downloads that we experienced last year, and see this period as the “calm before the storm.” With the anticipation for the Cytoscape 3.0 release and the exciting plans around the new Cytoscape App Store, these numbers are sure to take on a new growth curve before the next report.
We also make researchers aware of our tools and services through the many conferences our representatives attend. For example, the NRNB will have a major presence at the Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2012), which will be held in Long Beach, California. ISMB has become the largest conference on computational biology worldwide. This year over 1500 attendees are expected. As part of this meeting, we are organizing the second annual Network Biology Special Interest Group (NetBio SIG) meeting dedicated to network biology tools, resources and research applications. NRNB tools are also represented in the research literature through our development and research publications. Numerous Cytoscape plugin articles and research articles using Cytoscape are published annually: 309 during this report period alone (HighWire search). We have a review article currently under revision that covers all submitted Cytoscape plugins. We will follow that
  34. 34. up with a paper introducing Cytoscape 3.0 and another introducing the Cytoscape App Store, both scheduled for release in 2012.
Finally, most visibility for our software arguably comes from our consistent dedication to an “open source” policy. Our open-source license allows us to easily disseminate our software code through public repositories (Sourceforge, code.google, self-hosted servers) and participate in social networks in support of code development (Ohloh). We take very seriously our active participation and cultivation of an open development community. This should not be taken for granted. Many academic software projects suffer from relatively short cycles of commitment from graduate students and postdocs progressing through their careers. The open source model offers a means to develop software inclusively and sustainably. We have worked hard to build, develop and maintain this community. The benefits are a sustained project that continues to grow and to stay relevant. It also instills confidence in potential contributors as well as users that their work will be acknowledged and that the product will persist and remain free and open. It is through the software development community that Cytoscape maintains its most ardent evangelists, presenting new functionality at their home institutions and through conferences and publications.
Patents, Licenses, Inventions, and Copyrights
None. We are committed to an Open-Source dissemination policy.
Training and Outreach
Annual Cytoscape Retreat
We are just beginning to plan this year’s annual Cytoscape Retreat and Symposium, hosted by the National Resource for Network Biology (NRNB) at the Gladstone Institutes on the UCSF Mission Bay campus in San Francisco. In addition to developer meetings, the retreat will include user and new developer tutorials, a Plugin Expo, and a special symposium. 
This year we will be able to shift the bulk of development discussion to Cytoscape 3.0 core and apps, including assessment of our new App Store web site and services.
Workshops
For the reporting period, NRNB has participated in a total of 20 training events in 7 countries. These events include tutorials, workshops and courses. Cytoscape is taught in many classroom and workshop settings. We try to track all of these on our website and Event Tracker. We’ve identified 32 courses offered in the 2011-2012 calendar year! And these are just the ones affiliated with NRNB staff.
Open Tutorials
Our tutorial management system, Open Tutorials, is still the main source for tutorial materials for the Cytoscape project, and is being used both internally by presenters, and by researchers and developers. We have seen a steady increase in visits to Open Tutorials over the last year, with an average of 2,700 visits per month for the last three months. The increase in traffic can partly be explained by the addition of 12 new editors in the last year, contributing to several new tutorials. Most of the development was focused on a set of 4 developer tutorials for Cytoscape 3.0, which will be critical for continued momentum on Cytoscape 3.0 development. Overall, Open Tutorials has allowed NRNB to reach our goal of providing tutorial support to a broad and diverse community.
Helpdesk
A major means of support for NRNB tools is through dedicated helpdesk and discussion mailing lists. We began monitoring the activity of these lists last year for the Cytoscape community as
  35. 35. an ongoing metric for the effectiveness of our support. Since the previous report, we have implemented several strategies for improving user communication and support. We are now using an automated method for analyzing mailing list activity, which has resulted in an increase in overall thread response rate from 64% (420/656) to 93% (583/628). Though the number of topic threads remained about the same (-4%, from 656 to 628), the overall number of actual messages on the mailing lists has increased 14%, from 1653 to 1877, during this reporting period, reflecting primarily the increase in response rate as well as an overall increase in interactive discussion. It is also worth pointing out that 25% (469/1877) of messages are authored by NRNB staff. Periodic decreases in response rate are now easily identified and remedied. Specifically, unanswered messages are now identified on a weekly basis and assigned to specific staff members. Based on the analysis of mailing list topics, we have tailored FAQ topics for maximized support impact.
Social Media
We have initiated a social media effort for Cytoscape through a number of different tools (http://www.cytoscape.org/community.html). For example, a Twitter account is used for quick announcements (http://twitter.com/cytoscape) and YouTube is utilized for video tutorials (http://www.youtube.com/results?search_query=cytoscape). During this reporting period we started a Tumblr site to capture published figures using Cytoscape. Pairs of figures are posted on a weekly basis on the front page of cytoscape.org based on this Tumblr feed.
Google AdWords
We were awarded a non-profit account in the Google AdWords program. We are directing more than 2,000 clicks a month to NRNB tools and resources via AdWords. We are running 7 campaign groups consisting of over 700 key words and phrases. These activities are worth over $1,600 a month, which we are getting free-of-charge. We have a spending limit of $329 per day through
We have a spending limit of $329 per day throughthis program, a potential value of $120,000 per year, so we will continue to identify new ads andrelevant resources.Google Summer of Code and NRNB AcademyIn addition to the outreach effort described above, we also leverage a Google-sponsoredprogram called Google Summer of Code to attract new developers. This year we coordinated 36mentors, leveraging the effort of developers from open source communities surrounding NRNB-related tools. And through the GSoC program we received over 60 student applications thisyear. From these we’ve selected 16 students to mentor on Cytoscape and NRNB-relatedprojects. Google is paying $5,000 per student, making their investment $80,000 in NRNB for 3months of work. Inspired by this very successful model for recruiting new code contributors, we designedand launched NRNB Academy in January of this year. The idea behind NRNB Academy is verysimilar to GSoC, except it’s not restricted to students, it’s not affiliated with Google, and it’s100% volunteer. We have already received 9 applications, started 4 new projects, and recruited3 new mentors. We anticipate continued growth of this program as word spreads.
  36. 36. Annual Progress Report - Advisory Committee 2012
National Resource for Network Biology
P41 GM103504 (RR031228)
05/01/2011 - 04/30/2012
At the conclusion of our first year, we scheduled the first External Advisory Committee (EAC) meeting, which took place May 19th, 2011. We were very pleased to have all seven invited members promptly agree to join our EAC and attend the first meeting. Dr. Stephen Friend serves as chair of the committee. Following the list of committee members below are the summary statements provided by the EAC.
Committee Members:
● Stephen Friend, M.D., Ph.D. is President, Co-Founder and Director of Sage Bionetworks. He was previously Senior Vice President and Franchise Head for Oncology Research at Merck & Co., Inc.
● David Hill, Ph.D. is Associate Director of the Center for Cancer Systems Biology at the Dana-Farber Cancer Institute where he is also co-leader of the Pathogen Host Interactomes group.
● Tamara Munzner, Ph.D. is Associate Professor in the Department of Computer Science at the University of British Columbia and is a member of the IMAGER Graphics, Visualization and HCI research group.
● Nicholas Schork, Ph.D. is Director of Biostatistics and Bioinformatics at the Scripps Translational Science Institute and Professor in the Department of Molecular and Experimental Medicine at the Scripps Research Institute.
● Gustavo Stolovitzky, Ph.D. is Manager of the Functional Genomics and Systems Biology group at the IBM Computational Biology Center. He is a Fellow of the American Physical Society, a Fellow of the New York Academy of Sciences, and an adjunct Associate Professor at Columbia University.
● Marian Walhout, Ph.D. is Associate Professor at the University of Massachusetts Medical School in the Program in Gene Function and Expression.
● Steve Laderman, Ph.D. is the Director of the Molecular Tools Lab at Agilent Technologies, Inc.
