Overall

B.2: What was accomplished

Highlights from the past year include:

• 118 publications citing NRNB grant
• Over 8000 visits per week to Cytoscape.org
• 17,000 downloads per month for Cytoscape
• 3700 Cytoscape application launches per day
• 38,261 page views in January 2017 for the Cytoscape App Store, and an average of 875 downloads per
day among 307 apps.
• A total of 18 tools supported by NRNB
• 93 new and ongoing collaborations with external investigators on diverse topics
• 3 students trained at NRNB Academy
• 15 students trained through Google Summer of Code
• 16 NRNB coordinated training events in 10 locations in 6 countries
• Over 100 users and dozens of developers trained on Cytoscape by NRNB staff
• 66,000 unique sessions at Open Tutorials in the past year, 65% from new visitors
Technology Research and Development

Progress on the first theme of Differential Networks includes work on an improved perturbation biology method
applied to the modeling of time-resolved drug response measurements in melanoma cells. The temporal
response to CDK4 inhibition in liposarcoma is a driving biology project (DBP 7 with Forest White) that we are
investigating in terms of signaling networks. We also continued to develop protein-protein interaction network
alignment algorithms in support of DBP 2 with Drs. Marc Vidal and David Hill. In addition, we implemented new tools in
Cytoscape for working with mass spectrometry data to facilitate future differential network analysis. This work was
shared with our DBP 1, the Krogan lab, from which we continue to collect valuable end-user input to design
and prioritize our tool development.
The second theme of Descriptive to Predictive Networks saw progress on two specific sub-aims. We
developed a supervised patient classification framework, called netDx, that uses patient similarity networks in a
generalizable and integrative manner. In support of DBP 5 with Sage Bionetworks, we reanalyzed all available
DREAM challenge datasets. In terms of biomarker identification for cancer progression and treatment
response, we are developing the platform for data preparation and prototyping regression analyses that will
inform future network-constrained regression analysis, such as GELnet. The application of this technology to
drug response prediction for the NCI compound library is part of our DBP 8 with Dr. Pommier. Finally, we
developed a supervised machine learning model, called DeepCell, that uses the hierarchical structure of the
cell and heat diffusion modeled signaling to simulate cell behavior. Using both the Gene Ontology and a data-
driven ontology, DeepCell can accurately predict phenotypes including both growth rate and genetic interaction
score across a range of genetic interaction scores.
Progress on the third theme of Multi-scale Networks includes the development of a general progressive
procedure, Active Interaction Mapping, which was used to assemble a comprehensive ontology of functions for
autophagy. This work continues to be motivated by the data and prediction challenges in DBP 3 and 4, Mike
Cherry (GO) and TCGA projects. We have also begun experimenting with using single cell RNA-seq data to
improve the resolution of inferred cell-cell interaction networks. These are being applied to cancer stem cell
biology and regenerative medicine. This work is being driven by Dr. Zandstra’s sustained interest in both inter-
cellular networks and cell fate regulation, DBP 9.
NRNB Workgroups

In addition to our TRD projects, we launched two new initiatives to foster greater interaction and collaboration
across NRNB sites and to track opportunities in developing research areas relevant to network biology. We are
calling these NRNB Workgroups.
The first workgroup is focused on Single Cell Genomics and Stem Cell Research. NRNB staff from the
Pico, Morris and Bader groups are meeting quarterly to coordinate on tools, datasets and pilot projects in this
area. Each group has collaborators in this area that may develop into CSPs or even DBPs for future TRD
projects. In addition to collecting resources and comparing notes, we have identified two specific pilot projects
to work on as an NRNB team: cell-cell interaction networks and improvements over t-SNE plots for data
analysis and visualization. The former will involve curating pathways that are transcriptionally active
downstream from receptors involved in cell-cell communication, followed by pathway analysis on single cell
datasets. The latter will involve exploring alternative clustering algorithms, such as SIMLR, and performing a
comparative assessment using Cytoscape protocols.
The second workgroup is focused on Patient Similarity Networks (PSNs). Within the scope of technology
development, both NDEx (Ideker) and cBioPortal (Sander) will be leveraged in this project. The Bader group is
currently working on depositing PSNs into NDEx with sufficient metadata and filter options. The Pico group is
also working on modeling WikiPathways content for deposition into both NDEx and Pathway Commons. These
efforts represent improved interoperability among NRNB resources. Data visualization at cBioPortal and import
into Cytoscape are also being explored in order to facilitate analysis, for example using Network Based
Stratification methods described in prior NRNB reports, and integrated data views.
We will continue to lead the coordinated efforts of these workgroups during the next reporting period and
spawn new workgroups as new data types and opportunities are presented.
Collaboration and Service Projects

NRNB staff have initiated 18 new collaboration and service projects over the reporting period, for a total of 93
collaborations maintained or completed over the last year. A summary table is provided in the CSP component
report, along with summaries of major projects from each of the four sites led by the co-PIs. In broad strokes,
the projects span patient similarity network methods development applied to various cancers, cell-cell
interaction network analysis applied to Head and Neck cancer, EnrichmentMap network analysis on single cell
RNA-Seq data applied to colorectal cancer, as well as standard network and pathways analysis of stimulated T
cells, neuroinflammation, glutathione metabolism, microgravity effects, drug transporters, Multiple Sclerosis
and Huntington’s Disease. In addition to service-based collaborations, we managed the code development
collaborations between ~45 mentors and ~18 students during this reporting period through the Google
Summer of Code and NRNB Academy programs, combined.
Infrastructure	
  
In our 2016 report, we described the creation of initial technologies needed to create an ecosystem of
biologically valuable Internet-based services that exchange network data in a stable, performant, scalable,
reusable, recombinable and reliable manner. In 2016, we created and released Diffusion, which is the first
Cytoscape app to demonstrate the integration of basic CI service technologies (e.g., Cytoscape apps, CX,
request routing and service deployment). The Diffusion app calls the CI’s new Diffusion service, which uses a
heat propagation approach to identify subnetworks worthy of focused study, given a list of nodes in a large
network. The Diffusion service demonstrates how the CI can allow typical biological programmers to
dramatically increase the audience for their code and gain access to Internet-scale computational resources.
The CI framework on which Diffusion is built leverages modern Kubernetes cluster technology (to augment
Elsa, see section C.3), common CX-based message formats, server-based interface stubs, call metering and
central logging to enable biological programmers to package algorithmic code as a highly scalable, highly
available microservice with access to server- and cluster-class computing resources. We also launched
cyREST2 as a significant expansion of the highly successful cyREST Cytoscape feature
(http://apps.cytoscape.org/apps/cyrest). Critically, cyREST2 will enable access to functionality available
through Cytoscape apps, including enrichment, clustering, network acquisition, enhanced graphics and graph
analysis. We will work with both the Python/Jupyter and R communities to upgrade their cyREST interface
support (e.g., http://bioconductor.org/packages/release/bioc/html/RCy3.html). And we created the deep-cell
web app and service in support of the DeepCell phenotype prediction research described in the complete
Infrastructure report. Finally, both the Cytoscape App Store and NRNB Compute Cluster continue to thrive and
serve as major NRNB infrastructure components.
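To illustrate the heat propagation idea behind the Diffusion service, here is a minimal sketch. It is not the service's actual implementation. It assumes an undirected, unweighted network, places one unit of heat on each query node, and approximates h(t) = exp(-Lt) h(0), where L is the graph Laplacian, with small explicit Euler steps. The function names, the time and steps parameters, and the toy network are all hypothetical.

```python
def diffuse_heat(edges, query_nodes, time=0.1, steps=100):
    """Spread heat from query nodes over an undirected network.

    Approximates h(t) = exp(-L * t) h(0), where L is the graph Laplacian,
    using `steps` explicit Euler updates. Illustrative only.
    """
    nodes = sorted({node for edge in edges for node in edge})
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # One unit of heat on every query node, zero elsewhere.
    heat = {node: (1.0 if node in query_nodes else 0.0) for node in nodes}
    dt = time / steps
    for _ in range(steps):
        # (L h)_i = degree(i) * h_i - sum of the heat on i's neighbors
        outflow = {
            n: len(neighbors[n]) * heat[n] - sum(heat[m] for m in neighbors[n])
            for n in nodes
        }
        heat = {n: heat[n] - dt * outflow[n] for n in nodes}
    return heat


def rank_by_heat(heat):
    """Return node names ordered from hottest to coldest."""
    return sorted(heat, key=heat.get, reverse=True)


# Toy network (hypothetical): a chain A-B-C-D with a branch B-E.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E")]
toy_heat = diffuse_heat(toy_edges, query_nodes={"A"})
toy_ranking = rank_by_heat(toy_heat)
```

Nodes near the top of the ranking are close to the query nodes in the network, so they form a candidate subnetwork for focused study. The total heat is conserved, which makes the scores comparable across nodes.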
Dissemination	
  
NRNB.org is the main web site for the National Resource for Network Biology and serves as the primary
source of disseminating NRNB resources and associated information. It is constantly updated with information
for NRNB collaborators and researchers as well as the larger network biology community. The site includes our
project description and annual reports, available tools and resources, links to training materials, programs and
events, and instructions on how to collaborate. The attentive maintenance and updating of the site helps make
it the #2 Google search result for "network biology tools", second only to Cytoscape.org, an NRNB-supported
tool. NRNB.org is the #3 result even when searching for just "network biology". These are global, non-personalized
results. Over the past year, traffic to the site averaged about 840 visits per month. Since the site
went live in late 2010, we have had over 84,000 visits.
Since our last report (March 2016), we significantly improved our discourse on Cytoscape history and
future directions: http://www.cytoscape.org/roadmap.html. It now more plainly lays out our vision for Themes
and Features for future releases, and is laid out to improve communication with users, developers and curious
parties. Since January 2016, monthly downloads have grown to an average of 17,000 per month for the year,
with Cytoscape v3 accounting for the vast majority of downloads. Extensive graphs and descriptions of
Cytoscape usage are provided in the Dissemination report.
During this period, the Cytoscape App Store, which was created as an NRNB supplement project,
continued to serve as the major source of dissemination for Cytoscape apps and related documentation. The
App Store hosts over 307 apps developed by 674 different developers around the world. Cytoscape users
downloaded an average of 850 apps per day over the past 12 months, which adds up to just over
760,000 total app downloads since the launch of the App Store. The top 3 downloaded apps, ClueGO, BiNGO
and GeneMANIA, have accumulated over 136,000 downloads combined. During the month of January 2017,
the site received over 38,000 page views. Graphs of app submissions, site visits and referral sources are all
provided in the Dissemination report.
NRNB staff members are responsible for maintaining these additional sources of dissemination:
• Three Cytoscape mailing lists: helpdesk, app-dev and cytostaff
• Open Tutorials: http://opentutorials.cgl.ucsf.edu/index.php/Main_Page
• Cytoscape Publications Tumblr: http://cytoscape-publications.tumblr.com/
• Network Biology Publications Tumblr: http://netbiopub.tumblr.com/
• LinkedIn Network Biology Group: https://www.linkedin.com/groups/5123610
• F1000Research Cytoscape App Channel:
http://f1000research.com/channels/cytoscapeapps
• GSoC and NRNB Academy: http://www.nrnb.org/gsoc.html
Training	
  
In addition to the global training support provided by our Training Coordinator, Dr. Morris, we also leverage the
fact that we are a multi-site resource and are thus able to host local training events on multiple campuses. We
also provide materials, training and advertising for events presented by non-NRNB staff. The Training report
includes a table of 16 events coordinated by the NRNB, including courses, workshops, clubs and lectures in 10
locations in 6 countries.
After taking a year off from Google Summer of Code (GSoC) in 2015, and instead running our own summer
training program (NRNB Academy Summer Session), we gathered over 50 project ideas and close to 40
mentors for GSoC 2016. We were accepted as a mentoring organization and had one of our most successful years yet, with
all 15 enrolled students completing their projects. New for this year was also the development of a Mentor
Resource Packet, a collection of resources designed to help mentors with recruiting students. In addition to the
technical accomplishments and productivity of our students, we are also proud of the many important aspects
of diversity our students represent in the GSoC program, including geographical, gender and academic. A few
statistics on our diversity are listed below, with overall GSoC numbers in parentheses:
• 9 different countries represented, including 1 (of 2) from Croatia, 1 (of 3) from Armenia and 2 (of 12)
from Turkey
• 20% female (compared to 12% overall)
• Only 67% Computer Science (compared to 78% overall); we included PhD students in Biological
Oceanography and Medical Biochemistry & Biotechnology, an MS student in Bioinformatics, and a
pre-med undergraduate.
Our complete 2016 end-of-year report can be found here: http://nrnb.org/gsoc-reports.html. We have
received an abundance of testimonials from students and mentors, a subset of which are available on our
website: http://nrnb.org/testimonials.html#collab-tab.
B.4: What training opportunities

The collaborations during this period included many requests to prepare custom training events and one-on-one
sessions. During this reporting period, the Pico, Bader and Ideker groups offered support to local
researchers via consulting meetings and one-on-one training sessions so that biologists could learn how
to use NRNB tools in their research. Examples include how to install Cytoscape on personal computers and
navigate through a network, as well as how to work through the Bader lab standard enrichment analysis
pipeline, which can be summarized in three steps: 1) run GSEA, g:Profiler or a similar gene-set
enrichment tool; 2) create a network of enriched pathways using Cytoscape/EnrichmentMap; 3) perform
post-analysis using EnrichmentMap features or GeneMANIA. Additionally, 15 of the collaboration projects listed in
B.2 served as intensive training opportunities for students accepted into our NRNB Google Summer of Code
program. The students learned not only about NRNB tool development, but also about open source software
development with a distributed team.
Our Training effort leveraged the fact that we are a multi-site resource and are thus able to host local
training events on 4 different campuses. We also provided materials, training and advertising for events
presented by non-NRNB staff. Each year we provide hundreds of researchers an introduction to network biology
concepts and Cytoscape usage. We also train dozens of programmers how to write apps for Cytoscape to
provide domain-specific functionality to the platform. These programs have been very successful so far. This
is evident from the testimonials we collect via survey following each event:
http://nrnb.org/testimonials.html#collab-tab. Here are snippets from this year’s students and mentors in our
Google Summer of Code and NRNB Academy programs:
“The NRNB program is a fantastic opportunity to gain skills and work experience in network
biology and app development, at any stage in your academic career. I came in as a
graduate student with only a few months of coding experience and now I've released my first
application. Exhilarating!”
“It has helped improve the software developed by my group. It has also given me
experience in mentoring someone long distance.”
“Working in an NRNB training program helped to strengthen my resume and introduced me
to the idea of combining a career in medicine with computer-based research.”
“Great opportunity for developing mentoring and supervising skills as well as get my
software tools developed.”
“This was my first ever contribution to an open source project and NRNB also. This
milestone will shine on my CV forever.”
“Great experience interacting with the community and my mentor. I was excited to receive
help and encouragement for my project.”
“Learned how to work in a collaboration, formulate better questions. Gained especially
invaluable knowledge and experience. Improved coding skills. Learned new programs and
libraries.”
“It broadened my mind to issues still unsolved in the network biology community, and I
gained resources and colleagues in the community that I otherwise wouldn't have.”
“Personally, I see great value in interacting with smart, young people from all around the
world. I am optimistic that participating in NRNB training programs will benefit my own
research group by giving it wider exposure and by building a community around the
software.”
“I am continuing to work for Cytoscape.js and am happy to being staying involved.”
“The program has been great experience for my students. They not only learned about open
source community driven projects, but the work they did has contributed to their future
research.”
“The program gave me a chance to work with students in projects of mutual interest and to
develop my tools faster and more efficient.”
B.5: How have results been disseminated

Technology Research and Development

Technology research and development results are routinely published (see C.1) and discrete software tools
and resources are highlighted and distributed through the NRNB web site at http://www.nrnb.org/tools-
wall.html. We also created and maintain almost 100 open source code repositories for NRNB related projects
at GitHub, https://github.com/nrnb/.
Infrastructure	
  
We routinely promote Cytoscape and other NRNB infrastructure advancements through publications and via
the tools page on the nrnb.org web site. Publications citing Cytoscape continue to increase year over year,
numbering 2218 in 2016, a 24% increase over 2015. NRNB staff were involved in at least 13 publications using
Cytoscape and results obtained on the NRNB cluster. These are listed in the Infrastructure report.
B.6: What you plan to do next

Technology Research and Development

For the first theme of Differential Networks, we aim to modify the perturbation biology modeling method to
incorporate time-resolved data. With respect to DBP 7, with the sample collection and profiling complete, we
will next focus on data analysis. We will also continue the work on developing an evolutionary model that
considers domain and binding site changes and their effects on network alignment. Finally, we will extend the
integrated ID mapping tool in Cytoscape to handle general annotation tasks and begin work on a gene set
manager. Both of these tools will be in support of core protocols, including the mass spec analysis workflows of
DBP 1.
Work on the second theme of Descriptive to Predictive Networks will extend netDx to other disease areas
and other feature types, including epigenomics and non-coding genome regions. We will continue the
development of network-constrained regression models and apply them to NCI compound screen datasets.
And we will study the predictive process of DeepCell to glean insights into the functional logic underlying a
particular genotype-to-phenotype response.
The third theme of Multi-scale Networks will see the addition of pharmacological and clinical datasets to the
data-driven assembly of ontologies of drugs and phenotypes. In the next reporting period, we will integrate
cancer ‘omics data into HNeXO to make a cancer-specific gene ontology. And, finally, we will continue to
improve our network inference using cell-cell receptor-ligand pathways and its ability to leverage single cell
RNA-seq data.
Collaboration and Service Projects

New CSP requests are coming in all the time. We will continue to evaluate these per site as we have. This
includes the approach being tested by Gladstone and UCSD sites to have their respective Bioinformatics core
facilities explicitly offer NRNB services as part of their regularly advertised campus services. Both groups are
seeing many projects funnel in through this mechanism. We will continue to evaluate this approach and scale
it where appropriate. See the CSP report for a more detailed description of specific projects on the horizon at
each site.
Infrastructure	
  
The overall goals for the Cytoscape Desktop are published on the Cytoscape Roadmap web page
(http://cytoscape.org/roadmap.html). The Infrastructure report summarizes these and goes into detail on future
Cytoscape Cyberinfrastructure, App Store and NRNB Cluster work plans.
Training	
  
We recently submitted our application for GSoC 2017. If accepted, this should be one of our largest years yet.
We have more mentors and more project ideas than prior years and are continuing a more coordinated
outreach effort with a Mentor Resource Packet that we will distribute to all NRNB mentors. This resource was
developed in 2016, and is meant to help mentors contact and communicate with various student bodies that
are likely to have the skill and interest to participate in GSoC 2017.
C.2: Website(s) or other Internet site(s)

NRNB.org	
  
NRNB.org is the main web site for the National Resource for Network Biology and serves as the primary
source of disseminating NRNB resources and associated information. It has information for NRNB
collaborators and researchers as well as the larger network biology community. The site includes our project
description and annual reports, available tools and resources, links to training materials, programs and events,
and instruction in how to collaborate. Over the past year, traffic to the site averages about 840 visits per month.
Since the site went live in late 2010, we have had over 84,000 visits.
Cytoscape.org	
  
As detailed in the Dissemination report, we significantly improved our discourse on Cytoscape history and
future directions: http://www.cytoscape.org/roadmap.html. It now more plainly lays out our vision for Themes
and Features for future releases, and is laid out to improve communication with users, developers and curious
parties.
Visits to cytoscape.org now number almost 1.9M (up 26% from last year) since the site was created in
2012. While most visits to cytoscape.org are from the United States, these visits aren’t in the majority. In fact,
the second greatest source of visitors is “all the rest”, indicating that Cytoscape is popular worldwide.
Cytoscape App Store

A highlight of NRNB Dissemination efforts is the Cytoscape App Store (http://apps.cytoscape.org/), which was
developed under supplemental funding to the main NRNB award. The goals of the App Store are to highlight
the important features that apps add to Cytoscape, to enable researchers to find and install apps they need,
and for developers to promote their apps. It has stimulated a sizable community of Cytoscape App developers,
hosting over 307 apps developed by 674 different developers around the world. Cytoscape users download an
average of 850 apps per day over the past 12 months. That has accumulated to just over 760,000 total app
downloads since the launch of the App Store. The top 3 downloaded apps, ClueGO, BiNGO and GeneMANIA,
have accumulated over 136,000 downloads combined. During the month of January 2017, the site received
over 38,000 page views.
OpenTutorials	
  
Open Tutorials (http://opentutorials.cgl.ucsf.edu/index.php/Main_Page) is the main source for tutorial materials
for Cytoscape and other NRNB tools, and is being used both internally by presenters, and by researchers and
developers. Traffic to Open Tutorials is consistent, with 66,000 unique sessions at Open Tutorials in the past
year, 65% from new visitors.
Others	
  
As detailed in Administrative, Dissemination and Training reports, we also maintain a handful of other sites
related to NRNB activities, including
• Network Biology LinkedIn group
• Tumblr feeds for Network Biology- and Cytoscape-related publications
• Special pages for GSoC and NRNB Academy
• Special pages for annual NetBio SIG conference
• Guest editor roles for F1000Research Channel for Cytoscape Apps
• New Cytoscape App Developer Ladder
• New site for hosting a dynamically generated manual for Cytoscape
• Three mailing lists for Cytoscape users, app developers and core staff
C.3: New technologies and techniques

TRD1.1	
  
The perturbation biology methodology we developed has been shared publicly via publication. An accompanying
web application (http://www.sanderlab.org/pertbio/) is also available, where users can explore
and download models produced by the analysis.
TRD1.3	
  
The new identifier mapping tool described in B.2. will be shared through the free, open source distribution of
Cytoscape 3.5.0+ as of March 2017. The stringApp is freely available as an open source app for Cytoscape at
http://apps.cytoscape.org/apps/stringapp. The app now includes STITCH support as described in B.2. The
stringApp has been downloaded over 7400 times since its release 13 months ago.
TRD2.1	
  
We have developed the netDx technology and will disseminate it at the netdx.org website and GitHub, under
an open access software license. The technology is implemented in R and Java, as an easy-to-use and
well-documented R package.
TRD2.2	
  
We anticipate that several useful resources will be generated by the proposed research. We will provide a new
deep learning training algorithm to train the hierarchy-guided deep neural network. We will provide a new
analysis pipeline to help people interpret the behavior of their supervised machine learning model. We will also provide
a web server where users can not only predict growth related phenotypes using our trained deep learning
model but also interpret the logic of prediction.
TRD3.2	
  
• An online, interactive viewer of the Active Interaction Mapping procedure and its application to yeast
autophagy at http://atgo.ucsd.edu.
• A data-driven gene ontology in human: We will also make this ontology available through an online,
interactive viewer.
• Parallelized ontology construction: We will make code available on GitHub once completed.
Infrastructure technologies

Detailed in section C.3 of the Infrastructure report are the following technologies:
• CX network interchange format
• cyWidget system
• Kubernetes cluster
C.5.a: Other products

TRD 3.2

A significantly faster version of the popular random forests regression algorithm in the Python scikit-learn
package was created for this work and is publicly available on GitHub at
https://github.com/michaelkyu/scikit-learn-fasterRF.
	
  
TRD 1: Differential Networks

B.2 What was accomplished under these goals?

TRD 1.1: Tools for Inference of Differential Networks from Protein States and Abundances Over Time; DBP 7:
Forest White; DBP 8: Pommier
Background: The aim of this task was to improve the perturbation biology method (developed by Nelander,
Molinelli, and Korkut) for a more thorough understanding of protein networks and their responses to drug
perturbations. The perturbation biology method involves inference of quantitative signaling models from high
throughput drug response data. In recent years, we solved the network inference problem through
implementation of a probabilistic statistical physics algorithm called belief propagation (BP). In network
inference, we also benefit from pathway database extracted prior information to improve model accuracy. The
network models are based on coupled nonlinear ordinary differential equations that represent the temporal
changes to perturbations.
Equation 1: [equation image not available in this text]

In Equation 1, x_i^µ are the perturbed and/or measured variables, µ represents the perturbations, w_ij quantifies
the edge strength, the constant α_i is the tendency of the system to return to the initial state, and the constant ε_i defines
the dynamic range of each variable i. The transfer function Φ ensures that each variable has a sigmoidal
temporal behavior.
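For reference, the perturbation biology publications by the authors named above typically write the model in the form below. This is a hedged reconstruction, an assumption about Equation 1 rather than a copy of it. In particular, the symbol u_i^µ (the external perturbation acting on node i in condition µ) and the restriction j ≠ i are not defined in the surrounding text.

```latex
\frac{d x_i^{\mu}}{dt}
  = \varepsilon_i \, \Phi\!\left( \sum_{j \neq i} w_{ij} \, x_j^{\mu} + u_i^{\mu} \right)
  - \alpha_i \, x_i^{\mu}
```

Read this way, the first term drives variable i through a sigmoidal response to its weighted network inputs and is scaled by the dynamic range ε_i. The second term relaxes the variable back toward its initial state at rate α_i.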
Current progress: The drug response was previously measured and analyzed at a single time point. This is a
limitation since the drug response changes over time and early changes at the protein level might be of
importance for the understanding of the drug response. We have therefore produced and analyzed a new data
set with time-resolved drug response measurements in melanoma cells. The data contain protein
measurements at several time points over 3 days (10 and 27 minutes; 3, 9, 24, 48 and 67 hours) as well as
phenotypic measurements (cell death and growth) for 60 different drug combinations.
Recent analysis shows that early protein measurements may be important to explain cell death measurements.
1. We used partial least squares regression (PLSR) modeling to find relations between protein
measurements and cell death at different time points. In PLSR, the input and output variables are
projected onto new dimensions (components) to find a linear regression model. We used the protein data
at 8 time points with 60 drug combinations as input, and chose the number of components in the PLSR
so that 95% of the variance in the output (cell death) was explained. The resulting model with 5
components is in agreement with the data, as shown in Figure 1.
2. To evaluate the contribution of the proteins to the regression model, we used VIP (variable importance
of the prediction) scores. These scores are calculated from the variance that is explained by each
variable and the total variance that is explained by all components. VIP scores are always positive, but
since it is important to know the direction of the response, we used the sign from correlation between
the protein measurements and cell death. As seen in Figure 2, early time points for some of the
proteins have a high VIP score, which means that these measurements are important to be able to
explain the outcome.
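The two steps above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the analysis code used for the report: it uses the textbook NIPALS algorithm for a single response (PLS1), adds components until a target fraction of the response variance is explained, and computes VIP scores from the component weights and the variance each component explains. The function names and the toy data are hypothetical.

```python
import math


def pls1(X, y, target=0.95, max_components=10):
    """NIPALS PLS1: add components until `target` of the variance in y is explained."""
    n, p = len(X), len(X[0])
    # Center each predictor column and the response.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]
    total_ss = sum(v * v for v in f)
    if total_ss == 0:
        raise ValueError("response is constant; nothing to explain")

    weights, explained = [], []
    for _ in range(min(max_components, p)):
        # Weight vector: predictor direction with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # residual response is unrelated to the residual predictors
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate predictors and response before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        explained.append(q * q * tt / total_ss)  # fraction of y variance
        if sum(explained) >= target:
            break
    return weights, explained


def vip_scores(weights, explained):
    """VIP_j = sqrt(p * sum_a(explained_a * w_ja^2) / sum_a(explained_a))."""
    p = len(weights[0])
    total = sum(explained)
    return [
        math.sqrt(p * sum(ss * w[j] ** 2 for w, ss in zip(weights, explained)) / total)
        for j in range(p)
    ]


def signed_vip(X, y, vip):
    """Attach the sign of each predictor's correlation with y to its VIP score."""
    n = len(X)
    y_mean = sum(y) / n
    signed = []
    for j, score in enumerate(vip):
        x_mean = sum(row[j] for row in X) / n
        cov = sum((row[j] - x_mean) * (v - y_mean) for row, v in zip(X, y))
        signed.append(math.copysign(score, cov))
    return signed


# Toy data (hypothetical): 6 samples, 3 predictors, response driven by the first two.
demo_X = [[1, 0, 0.3], [0, 1, -0.2], [2, 1, 0.1], [3, 2, 0.0], [1, 3, -0.1], [4, 0, 0.2]]
demo_y = [2 * a - b for a, b, _ in demo_X]
demo_weights, demo_explained = pls1(demo_X, demo_y)
demo_vip = vip_scores(demo_weights, demo_explained)
demo_signed = signed_vip(demo_X, demo_y, demo_vip)
```

In the setting described above, each column of the input would be one protein at one time point and each row one drug combination. A large signed VIP score for an early time point would then flag that measurement as informative for cell death.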
	
  
	
  
	
  
 
Figure 1. The PLSR model is in agreement with data. The PLSR (partial least squares regression) model was in good agreement
with the cell death measurements (left). The number of components in the PLSR model was chosen to be 5 since a PLSR model
with 5 components explains 95% of the variance in the measurements (right).
	
  
	
  
	
  
	
  
Figure 2. VIP scores for key proteins show the importance of early protein measurements to explain cell death. The VIP
(variable importance of the prediction) scores are calculated from the PLSR model using the variance that is explained by
each model variable and the total variance that is explained by all components of the model. The measured proteins
AKT-pS473 and PRAS40-pT246 (left side) are important to explain cell death already at 74 minutes after drug addition.
	
  
In service of our DBP 7, Temporal response to CDK4 inhibition in de-differentiated liposarcoma, we have
been investigating how signaling networks in two patient-derived xenograft models (DDLS8817-PDX;
MPNST3-PDX) respond to clinically relevant inhibition of CDK4 and to combinations designed to block potential
network resistance mechanisms. The PDXs are initially sensitive to CDK4 inhibition, but ultimately the tumors
begin to grow even in the presence of the drug. Proteomic profiling of peptides enriched for phospho-tyrosine
from treated and untreated animals revealed increases in key signaling proteins in response to CDK4
inhibition. In this phase of the grant, we have followed up with combination therapies targeting PDGFR and
Src-kinase activation, key pathways that we hypothesize play a role in the switch between cells that are sensitive and
resistant to CDK4 inhibition. A significant number of studies was required to optimize effective dosing so that
we could obtain samples for further molecular profiling. In order to relate further molecular features to the
phenotypic results we observe in vivo, we have also begun to perform deeper molecular profiling of the xenograft
tumors.
This year, we followed up with experiments to optimize dosing and endpoint data acquisition. We
established reasonable doses by performing serial dilutions of palbociclib, saracatinib, and sunitinib. In the
DDLS8817-PDX, we found that the combination of palbociclib and sunitinib was no more effective than
palbociclib alone. This was surprising, as we had observed an increase in phospho-PDGFR-beta, and may be
due to the “dirtiness” of the sunitinib inhibitor, which is known to inhibit PDGFR and other receptor tyrosine
kinases. Based on our network analysis, the next experiment we attempted combined palbociclib with the
second-generation Src inhibitor saracatinib. Unfortunately, we again had issues with dosing and the results were
inconclusive, as saracatinib alone appeared ineffective and palbociclib flattened tumor growth.
The sunitinib and palbociclib combination was tested in MPNST3-PDX. This PDX showed a strong
increase in phospho-PDGFR-alpha during treatment with palbociclib. For this study we used 150 mg/kg
palbociclib (PD-0332991) with 40 mg/kg and 60 mg/kg sunitinib (as single agents, in combination, and with a
vehicle control). All groups had 5 animals. Sunitinib was very effective with or without palbociclib. At certain
time points, it appeared that there might be synergy between palbociclib and sunitinib. However, a modest
increase in the sunitinib dose (from 40 to 60 mg/kg) seemed to decrease tumor burden more effectively than
150 mg/kg palbociclib (see Figure 1).
As the tumor burden is very high with control and singly-treated MPNST3-PDXs, we were unable to harvest
tumors at the same time and simultaneously evaluate how the tumors respond after extended periods with the
drug. In order to gain time-matched material for further molecular analysis, we performed an additional
combination study with lower doses of sunitinib. Tumor material was harvested and analysis of this material is
ongoing.
In addition to performing several additional xenograft studies, we have also begun to profile the genomic
and transcriptomic baseline of this tumor material. We have now performed deep DNA sequencing using the
targeted sequencing IMPACT assay. We have also performed RNA sequencing on the tumor material.
Analysis is ongoing.
	
  
TRD 1.2: Protein network alignment algorithm and viewer; DBP 2: Vidal and Hill
  	
  
TRD1, Differential networks Aim 2. We continue to develop protein-protein interaction network alignment
algorithms since publishing “GreedyPlus: An Algorithm for the Alignment of Interface Interaction Networks” in
2015 [1], the first such algorithm for protein interaction networks that includes binding site information. We
have studied protein domain and binding site evolution from a range of organisms with fully sequenced
genomes and have identified many different patterns of sequence evolution that change network architecture
at the local protein level – more so than expected. We hypothesized that we could identify a few major
sequence evolution patterns, but most examples we studied were unique. This work has led us to design a
new technology for ortholog function assessment that simultaneously considers protein and network evolution,
described in B.6. TRD1.2.
To support DBP 2 (Vidal and Hill) “Mapping the human interactome and its rewiring by disease mutations”,
we have engaged in weekly discussions with the Vidal team to consult on the analysis of their ongoing human
interactome project, in particular where their work includes differential network analysis and consideration of
binding sites.
References
1. Law B, Bader GD. GreedyPlus: An Algorithm for the Alignment of Interface Interaction Networks. Scientific
Reports. 2015;5:12074.
TRD 1.3: Facilitating the interpretation of AP-MS data as interaction networks; DBP 1: Krogan
Mass spectrometry practitioners and analysts routinely work with network models constructed from
fundamental interaction measurements. The data inform the biomedical understanding of host-pathogen
interactions, signaling networks and network rewiring in cancer, to name a few examples. This is a critical field
of research to support with powerful and accessible network visualization and analysis technology. This
project component is aimed at making specific improvements and implementing new features in Cytoscape to
enhance its applicability and adoption by the mass spec community. The main objectives are to augment
Cytoscape to streamline the typical mass spec analysis pipeline and provide better access to public mass spec
data and annotation repositories relevant to researchers.
Following the guidelines of our (lengthy) Nature Protocols article on mass spec analysis using Cytoscape [1], we
made significant progress on streamlining and enhancing the protocol. First, in terms of identifier mapping, we
took a multistep process involving the installation and configuration of a separate app and replaced it with a
built-in context menu option added to the existing Node Table in Cytoscape. Identifier mapping, in brief,
addresses the matter of mapping between identifier systems (e.g., UniProt, Entrez Gene, Ensembl) when
merging interaction data or integrating data types. This is a common problem faced by all bioinformaticians. In
the specific domain of network data in Cytoscape, we see the opportunity to provide semi-automated
assistance for users wanting to merge and integrate heterogeneous data. This is particularly relevant to mass
spec practitioners, e.g., those in the Krogan lab (DBP 1), who want to view their interaction data in the context
of other public interaction data and other annotations. The integration of identifier mapping into Cytoscape as a
built-in feature greatly enhances the user experience for mass spec practitioners as well as many other users.
See the before/after comparison of the steps required in the published mass spec Nature Protocol.
The simplification goes beyond app integration and user interface work. For example, rather than requiring
the user to explicitly connect to a database source, the new tool automatically connects to an existing web
service provided by BridgeDb. And rather than requiring the user to explicitly choose a source identifier type,
the new tool guesses the identifier type based on the values extracted from the column indicated by the user in
the right-click action that initiated the dialog. We also included better, more common default options for the
target identifier type and the “force single” feature, based on prior experience using and training others on the
original BridgeDb app.
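The identifier guessing step can be sketched as follows. This is a minimal illustration, not the code shipped in Cytoscape: the function name, the 80% agreement threshold and the set of patterns are assumptions, and the patterns cover only the three identifier systems named above.

```python
import re

# Illustrative patterns only. The heuristics used by the actual Cytoscape tool
# are not described in this report.
ID_PATTERNS = {
    "Ensembl": re.compile(r"^ENS[A-Z]*[GTP]\d{11}$"),
    "UniProt": re.compile(
        r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
    ),
    "Entrez Gene": re.compile(r"^\d+$"),
}


def guess_identifier_type(values, min_fraction=0.8):
    """Return the identifier system matched by most column values, or None."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None
    best_type, best_fraction = None, 0.0
    for id_type, pattern in ID_PATTERNS.items():
        fraction = sum(1 for v in cleaned if pattern.match(v)) / len(cleaned)
        if fraction > best_fraction:
            best_type, best_fraction = id_type, fraction
    # Only report a guess when most of the column agrees on one system.
    return best_type if best_fraction >= min_fraction else None


if __name__ == "__main__":
    print(guess_identifier_type(["P04637", "Q9Y6K9", "P38398"]))
    print(guess_identifier_type(["ENSG00000141510", "ENSG00000012048"]))
```

Requiring a majority match rather than a single hit keeps a stray value in the column from steering the guess, and returning None lets a dialog fall back to asking the user.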
This is a great example of a coordinated NRNB project. Despite being spread across 4 campuses, this
project involved work by members of the Ideker lab, together with the features and resources leveraged by the
BridgeDb app, which was a Google Summer of Code project mentored by the Pico lab and implemented by a
student later hired by the Sander lab [2]. In the end, members of three of the 4 NRNB sites directly contributed
to this project, while also leveraging financial support and talent recruitment from Google.
The second major activity was the continued development of the stringApp for Cytoscape by Dr. Morris.
STRING (http://www.string-db.org/) is an important public interaction database, highly regarded by mass spec
practitioners. With input from both mass spec practitioners (DBP 1) and the developers/maintainers of the
STRING database, Dr. Morris implemented the app to take full advantage of all the unique aspects of STRING,
as described in the NAR special database issue for 2017 [3]. The stringApp has been downloaded over 7400
times since its original release in December of 2015 and is freely available at the Cytoscape App Store:
http://apps.cytoscape.org/apps/stringapp.
Figure 3. Screenshot of STITCH compound-protein network. This is the result of a query for Coumadin
(Warfarin®), a common blood thinner used to prevent thrombosis. Queries of proteins or compounds are
supported. The nodes in Cytoscape preserve the signature STRING style with structures and glass bobble
effects.
During this reporting period, Dr. Morris implemented critical support for STITCH as a fourth query option in
the stringApp (Figure 3). The STITCH database includes both physical interactions and functional associations
between chemical compounds and proteins (http://stitch.embl.de). Now, in addition to protein, PubMed and
disease queries, Cytoscape users can select STITCH: protein/compound query and interrogate the STITCH
database for protein-compound interactions. This new dimension of interactions allows researchers to extend
protein networks into compound space or build protein networks from a set of one or more compounds. This
feature thus nicely complements any network or protein interaction resource tools already available in
Cytoscape. It is particularly relevant to the growing demand and data deluge for drug compound screens and
metabolomics, which of course includes mass spectrometry practitioners.
Another feature added to the stringApp during this period is enrichment analysis. This was a major step in
the AP-MS protocol that once again required the installation and operation of a separate app. Now, upon
import of any network via the stringApp the user can choose to perform enrichment analysis and obtain Gene
Ontology terms and KEGG pathway results. This is a valuable addition to workflows that involve STRING or
STITCH networks.
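To make the enrichment step concrete, the sketch below computes an over-representation p-value for one term. The statistical test used by the stringApp is not described in this report; a one-sided hypergeometric test is assumed here, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_query, n_overlap):
    """P(X >= n_overlap) for a query of n_query genes drawn from n_background,
    of which n_annotated carry the term (one-sided hypergeometric test)."""
    upper = min(n_annotated, n_query)
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up numbers: 3 of 3 query genes carry a term that annotates
    # 3 of 10 background genes.
    print(hypergeom_enrichment_p(10, 3, 3, 3))
```

In practice one such p-value is computed per Gene Ontology term or KEGG pathway and the set of p-values is then corrected for multiple testing.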
References
1. Morris JH, Knudsen GM, Verschueren E, Johnson JR, Cimermancic P, Greninger AL, Pico AR. Affinity
Purification-Mass Spectrometry and Network Analysis to Understand Protein-Protein Interactions. Nature
Protocols. 2014;9:2539-54.
2. Gao J, Zhang C, van Iersel M, et al. BridgeDb app: unifying identifier mapping services for Cytoscape.
F1000Research. 2014;3:148.
3. Szklarczyk D, Morris JH, Cook H, et al. The STRING database in 2017: quality-controlled protein–protein
association networks, made broadly accessible. Nucleic Acids Research. 2017;45(Database issue):D362-
D368.
	
  
 
B.6 What do you plan to do for the next reporting period to accomplish the goals?
  
	
  
TRD 1.1
  
In the next reporting period, we aim to modify the developed perturbation biology modeling method to
incorporate time-resolved data (dynamic network analysis). The current method is developed for a single time
point and assumes the model variables to be in steady state at this time point. This is a rough assumption,
since the data come from living cells that continue to grow. We will therefore use the new time-resolved data to
expand upon the existing method and change the implementation of the model equations. In the current version
of our method, model variables are assumed to follow a sigmoidal temporal behavior; however, it should also be
possible to capture other temporal behaviors. We will also incorporate changes to evaluate not only the
steady-state solution of the equations, but the full trajectories in the calculation of the model error. These
changes in the perturbation biology method will ensure that models created by the method can be used to predict
temporal drug responses and therefore make the method more generally applicable to different data sets. The
network represented by the inferred interaction parameters is ideally predictive of the effects of previously
unseen perturbations, such as designed combinatorial drug interventions.
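As an illustration of the planned change in the error calculation, the two error terms can be contrasted as follows. The ODE form shown is the one used in published perturbation biology models and is an assumption here, since the model equations are not stated in this report; the symbols and the error names are likewise illustrative.

```latex
\frac{dx_i}{dt} = \epsilon_i \tanh\!\Big( \sum_j w_{ij} x_j + u_i \Big) - \alpha_i x_i

E_{\mathrm{steady}} = \sum_i \big( x_i^{\mathrm{ss}} - y_i \big)^2
\qquad \longrightarrow \qquad
E_{\mathrm{traj}} = \sum_{k=1}^{K} \sum_i \big( x_i(t_k) - y_i(t_k) \big)^2
```

Here x_i is the level of model variable i, w_ij is the inferred interaction from variable j to variable i, u_i is the drug perturbation acting on i, and y_i is the corresponding measurement. The steady-state error compares only the fixed point x_i^ss with a single measurement, whereas the trajectory error sums the mismatch over all K measured time points t_k.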
With respect to DBP 7, with the exception of possibly performing another dosing study on DDLS8817-PDX,
the mouse xenograft studies are completed. Samples were collected across many of these experiments and
have undergone molecular profiling. In the next phase of the project we will first focus on data analysis. Our
first questions center around response of the xenograft tumors to combination therapy with palbociclib. This
includes determining whether the effects of addition of sunitinib or saracatinib were synergistic or additive. We
will further integrate the molecular profiling data (phospho-tyrosine mass spectrometry, RNAseq, and DNA
copy number analysis) with the xenograft growth measurements to confirm basic expectations (e.g., sunitinib
inhibiting PDGFR activity) and also correlate molecular and phenotypic changes. We also plan to perform
network analyses to better characterize the alterations we observe relative to other malignant peripheral nerve
sheath tumors and de-differentiated liposarcomas profiled in The Cancer Genome Atlas (TCGA). The results of
these analyses will serve as a baseline for further characterization in additional MPNST and DDLS cell lines
and possibly future xenograft studies.
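One way to frame the synergistic-versus-additive question is the Bliss independence model, sketched below. The choice of Bliss independence is an assumption (the analysis method has not been fixed in this report), the function names are hypothetical, and the tumor volumes in the example are made-up numbers, not study data.

```python
def fractional_inhibition(treated_volume, control_volume):
    """Tumor growth inhibition relative to the vehicle control (0 = no effect)."""
    return 1.0 - treated_volume / control_volume


def bliss_expected(effect_a, effect_b):
    """Expected combined fractional inhibition if the two drugs act independently."""
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(effect_a, effect_b, effect_ab):
    """Observed minus expected effect: above 0 suggests synergy, near 0
    additivity under this model, below 0 antagonism."""
    return effect_ab - bliss_expected(effect_a, effect_b)


if __name__ == "__main__":
    control = 1000.0  # hypothetical endpoint tumor volumes (mm^3)
    e_a = fractional_inhibition(600.0, control)   # drug A alone
    e_b = fractional_inhibition(500.0, control)   # drug B alone
    e_ab = fractional_inhibition(200.0, control)  # combination
    print(bliss_excess(e_a, e_b, e_ab))
```

With only 5 animals per group, an excess score would need a confidence interval (for example by bootstrapping over animals) before being read as evidence of synergy.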
	
  
TRD 1.2
  
Differential networks Aim 2. We continue to work on developing an evolutionary model that considers domain
and binding site changes and how these affect network alignment. Further, we are designing a new technology
for evaluating the function of proteins based on coding DNA and protein sequence evolution along with
molecular interactions involving the protein and its binding sites. We hypothesize that viewing a hierarchy of
evolutionary changes from the sequence to network levels will usefully inform protein function prediction
methods that transfer functional annotation between organisms (differential network analysis).
	
  
TRD 1.3
  
Our interactions with DBP 1, the Krogan lab, continue to reveal a growing list of roadblocks and challenges
with using Cytoscape with mass spec data. With our early start on this aim, as described in this and previous
reports, we are in a good position to address the bulk of this list during the overall grant period. We have
prioritized these items per their significance, the breadth of their applicability, and the feasibility of a solution.
Over the next year, we will extend the integrated identifier mapping tool described above to handle general
annotation tasks as well. These include annotating nodes with Gene Ontology terms, for example. This will be
a prerequisite for future work providing enrichment analysis as a general tool to Cytoscape.
We are also just beginning work on a gene set manager to allow Cytoscape users to import, paste or drag
gene lists into Cytoscape to use for queries (e.g., for STRING networks), for selection and for basic set
functions (e.g., union, intersection, difference). Many core protocols in Cytoscape, including our mass
spectrometry protocol, involve user-defined gene lists.
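The basic set functions planned for the gene set manager can be sketched in a few lines. The parsing rule (split on commas and whitespace) and the function name are assumptions for illustration; the gene lists are arbitrary examples.

```python
def parse_gene_list(text):
    """Split pasted text on commas and whitespace into a set of gene symbols."""
    return set(text.replace(",", " ").split())


if __name__ == "__main__":
    apms_hits = parse_gene_list("TP53, MDM2\nCDK4 RB1")
    query_set = parse_gene_list("CDK4 CDK6 RB1 CCND1")

    print(sorted(apms_hits | query_set))  # union
    print(sorted(apms_hits & query_set))  # intersection
    print(sorted(apms_hits - query_set))  # difference
```

Symbols are kept case-sensitive on purpose, since capitalization can distinguish orthologs between species (for example human TP53 and mouse Trp53).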
Finally, we are planning to tackle the logging of tasks and crash events in Cytoscape. While greatly
mitigated in Cytoscape 3, there are still scenarios that require force quit actions by the user. We plan to log
these events to help diagnose and remedy them. Similarly, a log of tasks in general would provide valuable
feedback on the major operations carried out by the bulk aggregate of users. It would also identify hard to find
or otherwise neglected features.	
  
C.3. Identify technologies or techniques that have resulted from the research activities. Describe
the technologies or techniques and how they are being shared.
  
The developed perturbation biology methodology has been publicly shared via publication. An accompanying
web application (http://www.sanderlab.org/pertbio/) is also available, where users can explore
and download models produced by the analysis.
The new identifier mapping tool described in B.2. will be shared through the free, open source distribution
of Cytoscape 3.5.0+ as of March 2017. The stringApp is freely available as an open source app for Cytoscape
at http://apps.cytoscape.org/apps/stringapp. The app now includes STITCH support as described in B.2. The
stringApp has been downloaded over 7400 times since its release 13 months ago.
	
  
TRD 2: Descriptive to Predictive Networks
B.2. What was accomplished under these goals?
TRD2.1: Predicting clinical outcome using patient similarity networks; DBP 5: Friend
  
Patient classification has widespread biomedical and clinical applications, including diagnosis, prognosis,
disease subtyping and treatment response prediction. A general purpose and clinically relevant prediction
algorithm should be accurate, generalizable, be able to integrate diverse data types (e.g. clinical, genomic,
metabolomic, imaging), handle sparse data, be compatible with patient privacy protection systems and be
intuitive to interpret. We have recently developed netDx (http://netdx.org/), a supervised patient classification
framework based on patient similarity networks that meets the above criteria. netDx models input data as
patient networks and uses the GeneMANIA machine learning algorithm that we previously developed for
network integration and feature selection. We demonstrated the utility of netDx by integrating gene expression
and copy number variants to classify breast cancer tumour class, achieving an accuracy (~85%) similar to or
better than (depending on the class) previously published methods. Further, we have been able to successfully
predict Autism Spectrum Disorder (ASD) phenotype from germline DNA for a subset of ASD patients (Figure
1). netDx uses pathway features to aid biological interpretability, and results can be visualized in Cytoscape as
an integrated patient similarity network to aid clinical interpretation.
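The idea of modeling input data as a patient network can be illustrated with a small sketch. This is not the netDx implementation: it builds a single similarity network from one made-up data type, assumes Pearson correlation as the similarity measure and 0.5 as the edge threshold, and omits the network integration and feature selection steps.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def patient_similarity_network(profiles, threshold=0.5):
    """Return weighted edges (patient_a, patient_b, similarity) at or above threshold."""
    ids = sorted(profiles)
    edges = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            similarity = pearson(profiles[a], profiles[b])
            if similarity >= threshold:
                edges.append((a, b, similarity))
    return edges


if __name__ == "__main__":
    # Hypothetical expression profiles for three patients.
    profiles = {"P1": [1, 2, 3, 4], "P2": [2, 4, 6, 8], "P3": [4, 3, 2, 1]}
    print(patient_similarity_network(profiles))
```

In a pathway-based setup, one such network would be built per pathway from the genes in that pathway, so that the networks selected by the classifier point directly to interpretable biology.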
Figure 1. Predictive power of netDx is better than contemporary methods: ASD case. Mean test performance
over three resamplings. Predicted status is informative beyond genetic ancestry (ANOVA chisq-test,
p = 2.76 x 10^-10). GBT = pathway-level FET (cnvGSA). RFCF = Random forests (Engchuan, et al. 2015 BMC
Genomics). Reproduced from NetBio SIG presentation.
To support DBP 5: Sage Bionetworks: Molecular stratification of colorectal cancer and DREAM challenges,
we have revisited all major DREAM challenges where data are available and where the challenge experimental
design is compatible with netDx’s classification engine (two-class classification). We have reported on netDx
in a preprint [1] and presented it at the NetBio SIG Meeting in 2016 [2].
References:
1. Pai S, Hui S, Isserlin R, Kaka H, Bader GD. netDx: Patient classification using
integrated patient similarity networks. bioRxiv 084418; https://doi.org/10.1101/084418
2. https://f1000research.com/slides/5-1710
TRD2.2: Predicting cellular response to perturbation with network-guided regression; DBP 8: Pommier
Part I
  
Background: The overall goal of this project is the identification of biomarkers involved in the progression of
cancer and the response to pharmaceutical treatment. This goal is accomplished through the use of
regression-based methods that are subject to biological network constraints, so that biomarkers can be
understood in the context of regulatory processes.
Current progress: Applied to our DBP 8 use case, “Drug response prediction for the NCI compound library,” a
major first step in this project has been data preparation of the NCI-60 and Pathway Commons data. This was
done through the development of the rcellminer/paxtoolsr R packages, respectively, to simplify the usage of
this data from an R programmatic environment. Previous to this development the NCI-60 data was provided in
spreadsheets laid out in varying formats. In addition to the direct conversion of the data from the NCI-60
CellMiner website (http://discover.nci.nih.gov/cellminer) additional elements of metadata were included in the R
package, including: structures, repeat drug screen data, mechanisms of action for compounds, and drug
approval information not found on the website.
A second ongoing focus during this reporting period has been the continued expansion of data relating to
the NCI-60 to allow as wide an exploration of biological processes as possible. During the reporting
period, analysis of the methylation data for the NCI-60 was performed. The analysis of the RNA-Seq dataset for
the NCI-60 is ongoing and should be completed within the next reporting period and made publicly available for
further usage within this project.
During the last reporting period, we collected the experimental data for about 39 compounds
first screened on the NCI-60 and then re-screened on the Sanger Genomics of Drug Sensitivity in Cancer (GDSC)
cell lines. This provides us with data for the ~750 cell lines that were screened. A small number of the
compounds sent (7 compounds) precipitated during screening, preventing their analysis.
Additionally, we have submitted a manuscript on the analysis of the NCI-60 SWATH mass
spectrometry (MS) data produced by the Aebersold group at ETH Zurich. Our analytic contribution was the use of
Elastic Net regression analysis with subsets of the available feature sets (e.g. gene expression,
mutations, and protein abundances). With respect to this project, while these are not network-constrained
regression analyses, they are allowing us to develop an important baseline against which we will compare future
results involving network constraints.
We have made some preliminary progress on the network-constrained regression methodology during this
reporting period. Our starting point is the recently published GELnet method, and we have a working
demonstration of the method for one drug found in the NCI-60 using expression data. The GELnet method
should highlight novel predictors of drug response.
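For orientation, the baseline and network-constrained objectives can be written side by side. The notation (X, y, beta, lambda_1, lambda_2, d_j, P) is illustrative and not taken from this report, and the GELnet form is stated as an assumption based on the published method rather than on the implementation used in this project.

```latex
\text{Elastic Net:}\quad
\min_{\beta}\; \frac{1}{2n}\lVert y - X\beta \rVert_2^2
  + \lambda_1 \lVert \beta \rVert_1
  + \frac{\lambda_2}{2} \lVert \beta \rVert_2^2

\text{GELnet:}\quad
\min_{\beta}\; \frac{1}{2n}\lVert y - X\beta \rVert_2^2
  + \lambda_1 \sum_j d_j \lvert \beta_j \rvert
  + \frac{\lambda_2}{2}\, \beta^{\top} P \beta
```

Here y is the drug response across the n cell lines, X holds the molecular features and beta the regression coefficients. With d_j = 1 and P = I the GELnet penalty reduces to the Elastic Net penalty. Choosing P from a biological network (for example a graph Laplacian built from pathway interactions, which is an illustrative choice) penalizes differences between the coefficients of connected genes, which is how the network constraint enters the regression.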
Part II
  
Overview: Deep learning has achieved tremendous success in various biology applications such as drug
discovery, DNA/RNA-protein binding and prediction of the effects of noncoding variants. In biology, however,
accurate prediction is not enough: the cell is not understood until we can explain why it behaves the way it
does. A major challenge in tackling this problem is to develop an in silico model that can simulate the
biological processes that occur in vivo while respecting the actual cellular structure. A number of successful
approaches have modeled the cell’s transition from genotype to phenotype by using prior knowledge in the
form of molecular networks. In these approaches, genetic variation is first mapped onto molecular networks;
affected subnetworks are then associated with phenotype. This information is then learned by a
supervised machine learning model using the diffused signal as features. Here, we have constructed a “white
box” model, called DeepCell, which uses the hierarchical structure of the cell to simulate cell behavior with both
high accuracy and interpretability. Prior biological knowledge is organized into a hierarchical form, and the
structure of the predictive model is constructed based on that hierarchy. DeepCell can learn complex
patterns from large datasets while keeping computational complexity low, which helps limit overfitting. To
interpret the model, one can observe how the input signal propagates bottom-up through the hierarchical
structure and activates different subsystems at multiple scales to make final predictions.
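The bottom-up propagation described above can be sketched on a toy hierarchy. This is not the DeepCell code: the three-term hierarchy, the gene names, the single tanh neuron per subsystem and the hand-set weights are all assumptions made for illustration, and no training is shown.

```python
import math

# Hypothetical toy hierarchy: subsystem -> child subsystems and annotated genes.
HIERARCHY = {
    "dna_repair": {"children": [], "genes": ["g1", "g2"]},
    "replication": {"children": [], "genes": ["g2", "g3"]},
    "cell_cycle": {"children": ["dna_repair", "replication"], "genes": []},
}


def propagate(genotype, weights, root="cell_cycle"):
    """Return the state of every subsystem, computed bottom-up.

    genotype: set of disrupted genes.
    weights: dict mapping (subsystem, input name) to a float weight.
    """
    states = {}

    def visit(name):
        if name in states:
            return states[name]
        node = HIERARCHY[name]
        total = 0.0
        # Inputs from child subsystems (already reduced to one state each).
        for child in node["children"]:
            total += weights.get((name, child), 0.0) * visit(child)
        # Inputs from genes annotated directly to this subsystem.
        for gene in node["genes"]:
            total += weights.get((name, gene), 0.0) * (1.0 if gene in genotype else 0.0)
        states[name] = math.tanh(total)
        return states[name]

    visit(root)
    return states


if __name__ == "__main__":
    w = {
        ("dna_repair", "g1"): 1.0, ("dna_repair", "g2"): 1.0,
        ("replication", "g2"): 1.0, ("replication", "g3"): 1.0,
        ("cell_cycle", "dna_repair"): 1.0, ("cell_cycle", "replication"): 1.0,
    }
    print(propagate({"g1", "g2"}, w))
```

Because every intermediate value is attached to a named subsystem, the returned dictionary can be read directly to see which subsystems a given double disruption activates, which is the interpretability property the text refers to.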
Results: In this work, we focus on the task of simulating pairwise genetic interactions among ~3000 non-essential
genes in the budding yeast, Saccharomyces cerevisiae, in which the combined loss of both genes
might lead to an unexpectedly slow or fast relative growth rate compared with the loss of either gene alone. We
used two hierarchical structures to guide the deep neural network model: the Gene Ontology (GO),
curated from Saccharomyces literature, and a data-driven ontology assembled from Saccharomyces datasets
using network-based methods. DeepCell can accurately predict phenotypes including both growth rate and
genetic interaction score across a range of genetic interaction scores (Figure 2). In comparison with previous
methods for predicting genetic interactions, DeepCell using either GO or the data-driven ontology achieved
substantially greater correlation between predicted and measured genetic interaction scores.
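For readers unfamiliar with the prediction target, a genetic interaction score can be computed as sketched below. The multiplicative null model is assumed here as the commonly used definition; the function names, the 0.08 tolerance and the fitness values are illustrative and are not taken from this work.

```python
def genetic_interaction_score(fitness_a, fitness_b, fitness_ab):
    """Observed double-mutant fitness minus the multiplicative expectation.

    All fitness values are growth rates relative to wild type (WT = 1).
    """
    return fitness_ab - fitness_a * fitness_b


def classify_interaction(score, tolerance=0.08):
    """Label a score as negative, positive or none (illustrative cutoff)."""
    if score < -tolerance:
        return "negative"
    if score > tolerance:
        return "positive"
    return "none"


if __name__ == "__main__":
    # Made-up fitness values for two single mutants and their double mutant.
    score = genetic_interaction_score(0.8, 0.9, 0.5)
    print(score, classify_interaction(score))
```

A double mutant that grows more slowly than the product of the single-mutant fitness values yields a negative score (synthetic sickness or lethality), and one that grows faster yields a positive score.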
Figure 2. a, Measured versus predicted cell viability relative to wild type (WT = 1). b, Measured versus
predicted genetic interaction scores for each double gene disruption genotype; genetic interactions between
the disrupted genes can be positive (epistasis), zero (non-interaction), or negative (synthetic sickness or
lethality). c, Predictive performance of neural networks, measured as the correlation between measured and
predicted genetic interaction scores on the Costanzo dataset (first four bars). Network structures are based on
prior knowledge of the hierarchy of cellular subsystems, as inferred from ‘omics datasets (CliXO) or from
literature curation (GO). Also shown is the average performance of neural network structures for which gene-
to-subsystem mappings have been randomly permuted. Performance is also compared to previous methods
for predicting genetic interactions (second four bars). d, Predictive performance as a function of the number of
neurons per subsystem (CliXO or GO term). The performance measure and four neural networks are identical
to (c).
B.6. What do you plan to do for the next reporting period to accomplish the goals?
TRD2.1
  
Predictive networks Aim 1. We submitted a publication about this work in 2016 and are now working on revisions
to be submitted in 2017. We are now focused on the following extensions: 1) extending our results to other
disease areas; 2) extending our feature engineering work to consider epigenomics data and non-coding genome
regions.
TRD2.2
Part I: We will continue to apply the developed dataset components to the development of the
network-constrained regression models in the upcoming reporting period, and make datasets more widely available
as they are published. In coordination with DBP 8, we will also look at additional upcoming NCI-60 datasets
(e.g. RNA-seq) to assess their utility for this regression methodology. We will continue to look at pre-existing
methodologies to examine their properties and what improvements may be warranted.
Part II: Unlike standard ANNs, DeepCell is tied directly to cell structure, raising the possibility that its
predictions can be interpreted biologically. In the next step, we will study whether the model can be
opened to dissect the internal cellular subsystems and functional logic responsible for governing a particular
genotype-to-phenotype response. To address this question, we will study how to interpret the prediction
process in DeepCell. We will begin by scoring the importance of each subsystem to DeepCell’s
overall genotype-phenotype function. Besides global ranking, we will also explore the most important
subsystems from the GO hierarchy, examining their internal states and their functional logic.
C.3. Identify technologies or techniques that have resulted from the research activities. Describe
the technologies or techniques and how they are being shared.
TRD2.1
  
We have developed the netDx technology and will disseminate it at the netdx.org website and GitHub, under
an open access software license. The technology is implemented in R and Java, as an easy-to-use and
well-documented R package.
TRD2.2	
  
We anticipate that several useful resources will be generated by the proposed research. We will provide a new
deep learning training algorithm to train the hierarchy-guided deep neural network. We will provide a new
analysis pipeline to help people interpret the behavior of their supervised machine learning model. We will also
provide a web server where users can not only predict growth-related phenotypes using our trained deep learning
model but also interpret the logic of prediction.
TRD 3: Multi-scale Networks
B.2. What was accomplished under these goals?
TRD 3.2: Functionalized gene ontologies as a hierarchy of functional prediction; DBP 3: Cherry
  
Development and validation of an iterative procedure for incorporating new data into a data-driven ontology. In
the last reporting period, we began developing a general progressive procedure, Active Interaction Mapping, to
guide assembly of the hierarchy of functions (ontology) encoding any biological system. Since then, we have
published this work [1] and have made the procedure available at http://atgo.ucsd.edu. In this work, we
assembled an ontology of functions comprising autophagy, a central recycling process implicated in numerous
diseases. We performed subsequent experimental validation of the ontology, including newly identified roles
for Gyp1 at the phagophore-assembly site, Atg24 in cargo engulfment, Atg26 in cytoplasm-to-vacuole
targeting, and Ssd1, Did4, and others in selective and non-selective autophagy. This work was co-authored by
our DBP 3 with Michael Cherry [1].
Construction of a data-driven gene ontology in human. Whereas our previous work was focused in yeast, we have
recently constructed the first data-driven gene ontology in human. As input to building this ontology, we took
908 experimental studies covering 98% of human coding genes. These data were drawn from several databases,
including Gene Expression Omnibus (668 microarrays), GeneMANIA (201 genetic/protein interaction networks),
GTEx (35 co-expression networks). Using our Active Interaction Mapping pipeline, we integrated these datasets
into a unified gene-gene similarity network and then hierarchically clustered this network to assemble a human
gene ontology (called HNeXO).
Parallelized, GPU-based algorithm for ontology construction. We have been optimizing our algorithm for
constructing an ontology. Currently, it takes about a day to assemble a data-driven gene ontology in yeast
(~6000 genes) and several days for one in human (~20,000 genes). We aim to reduce this runtime down to the span
of hours by parallelizing the computation. To do this, we have reformulated the construction of an ontology as a
series of matrix computations, for which there are known algorithms for massive parallelization and efficient
memory caching. Our approach fully exploits the capacity of parallelism on a multi-CPU platform and is easily
generalized to Graphics Processing Units (GPU).
References
1. Kramer MH, Farré JC, Mitra K, et al. Active Interaction Mapping Reveals the Hierarchical Organization of
Autophagy. Mol Cell. 2017.
TRD3.3: Bridging ligand-receptor networks to cell-cell communication networks; DBP 9: Zandstra
  
We have undertaken new research and development work to infer cell-cell interaction networks. In particular,
we have extensively used single cell RNA-seq data to infer higher resolution cell-cell networks and have
developed applications to cancer stem cell biology and regenerative medicine (e.g. DBP 9), both areas where
cell communication is important for tumour or normal tissue development. For single cell RNA-seq, we start by
clustering the single cells to define cell types. Clusters representing cell types are identified by the expression
of known cell type markers and previously unrecognized clusters are, by default, linked to the nearest known
cell type (e.g. neuron subtype A, B, C…). Each cell type is analyzed for the expression of surface receptors and
ligands to infer connections between them. These represent hypotheses about cellular communication for
experimental follow up.
Data-driven gene ontology in human (HNeXO). To understand how HNeXO compares with curated knowledge in the
Gene Ontology (GO), we searched for matching HNeXO and GO terms. HNeXO recapitulates thousands of GO terms and
also discovers new terms that are supported by data but have no
To support DBP 9: Engineering blood for regenerative medicine, we are automating our cell-cell interaction
network inference pipeline. This is important as the regenerative medicine community in Toronto received a
transformation grant ($114M CAD) to expand this scientific research area. As a result, many more
developmental biology research groups are requesting cell-cell interaction network analysis. As a major
milestone, a neural developmental biology group used this method to identify three new neural development
factors (Figure 1) [1].
Figure 1. Integration of the E13/14 cortex ligand data with the transcriptome-based cortical communication
model and the combined transcriptome-cell-surface proteome communication model. Red nodes denote
ligands predicted in the transcriptome-based model that were expressed in the E13/14 cortex and also have
receptors identified by cell-surface proteomics. Nodes surrounding the yellow CP (Cortical Precursor) and CN
(Cortical Neuron) nodes represent predicted autocrine ligands for CPs and CNs, respectively. Nodes located
between the yellow CP and CN nodes are predicted paracrine ligands. Edges indicate direction of
communication. Reproduced from ref 1.
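The inference step described in this section (pairing ligands expressed in one cell type with receptors expressed in another) can be illustrated with a minimal sketch. This is not the pipeline used in the project: the function name, the expression threshold, and the gene names LIG1, REC1, LIG2 and REC2 are hypothetical, and the toy expression values are made up for illustration.

```python
from itertools import product


def infer_cell_cell_edges(expression, ligand_receptor_pairs, threshold=1.0):
    """Return directed edges (sender, ligand, receptor, receiver, kind).

    expression: {cell_type: {gene: mean expression within that cluster}}
    ligand_receptor_pairs: iterable of (ligand_gene, receptor_gene)
    threshold: hypothetical cutoff for calling a gene "expressed"
    """
    edges = []
    for sender, receiver in product(expression, repeat=2):
        for ligand, receptor in ligand_receptor_pairs:
            ligand_on = expression[sender].get(ligand, 0.0) >= threshold
            receptor_on = expression[receiver].get(receptor, 0.0) >= threshold
            if ligand_on and receptor_on:
                # Same sender and receiver means autocrine signaling.
                kind = "autocrine" if sender == receiver else "paracrine"
                edges.append((sender, ligand, receptor, receiver, kind))
    return edges


# Toy data: CP = cortical precursor, CN = cortical neuron (values are invented).
expression = {
    "CP": {"LIG1": 5.0, "REC1": 0.2, "LIG2": 4.0, "REC2": 2.5},
    "CN": {"LIG1": 0.1, "REC1": 3.0},
}
pairs = [("LIG1", "REC1"), ("LIG2", "REC2")]

edges = infer_cell_cell_edges(expression, pairs)
for edge in edges:
    print(edge)
```

Each returned edge is a hypothesis for experimental follow up, not a confirmed interaction; a real analysis would replace the fixed threshold with a statistically justified expression call.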
References
1. Yuzwa SA, Yang G, Borrett MJ, et al. Proneurogenic Ligands Defined by Modeling Developing Cortex
Growth Factor Communication Networks. Neuron. 2016;91(5):988-1004.
B.6. What do you plan to do for the next reporting period to accomplish the goals?
TRD3.2	
  
Data-driven ontologies of other biomedical data types, including drugs and phenotypes. Our current work
has focused on assembling and applying gene ontologies, in which we group genes based on similar functions
within the cell. Likewise, drugs can be organized into drug classes based on similar chemical structure, gene
targets, or functional effect. Moreover, clinical signs and symptoms can be organized into diseases and
disease classes. In the next reporting period, we will use available pharmacological and clinical datasets to
assemble data-driven ontologies of drugs and phenotypes.
Construction of a cancer-specific human gene ontology. One of the limitations of the Gene Ontology is that
it encompasses all species, tissues, and cell types. Context-specific knowledge is difficult to extract from GO.
In our Active Interaction Mapping work, we showed that it is possible to assemble a context-specific ontology
(yeast autophagy). In the next reporting period, we will integrate cancer ‘omics data into HNeXO to make a
cancer-specific gene ontology.
TRD3.3	
  
We are currently working to improve our network inference using additional information on cell-cell
receptor-ligand pathways. For instance, if a pathway downstream of a receptor is active, it strengthens the
inference that the receptor is active in the network. We also continue to work to adapt this method to the
analysis of single-cell RNA-Seq data. This mainly involves refinement of cell type inference from single cell
RNA-seq data. As the technology used to generate this exciting new data type is rapidly evolving, we are
spending a large amount of time keeping up with new data sets and computational methods.
C.3. Identify technologies or techniques that have resulted from the research activities. Describe
the technologies or techniques and how they are being shared.
An online, interactive viewer of the Active Interaction Mapping procedure and its application to yeast
autophagy at http://atgo.ucsd.edu.
A data-driven gene ontology in human. We will also make this ontology available through an online,
interactive viewer.
Parallelized ontology construction. We will make code available on GitHub once completed.
CSP-Compilation
B.2 What was accomplished under these goals?
Sander Group
In this last period the Sander lab has focused on a number of collaborations as part of NRNB (see CSP table
below). The collaborations fall into several categories: the expansion of existing pathway database resources, work
on Cytoscape.js to improve SBGN support, and pathway and network analysis of breast, prostate and pan-cancer
datasets.
In collaboration with Dr. Joan Brugge at HMS, we are investigating targeted drug combination therapies for
triple negative breast cancer (TNBC). TNBC continues to see over 150,000 new diagnoses
annually, and despite initial responses to the current established chemotherapy protocols, treatment resistance
ultimately emerges in the vast majority of cases. The significant patient-to-patient heterogeneity in the genetic
alterations of TNBC presents a major challenge for the development of effective
targeted therapies. The goal of this proposal is to identify novel drug combinations that will improve outcomes
in TNBC patients.
We are accumulating and organizing TNBC cell line and patient tumor data from online databases
(cBioPortal: TCGA and CCLE datasets), including mutation profiles and gene and protein expression levels. In
concert, we are collecting and organizing information from TNBC clinical trials, including treatment regimens,
objective response rates, and response biomarkers. These clinical data will be integrated with drug response
data on TNBC cell lines, including drug sensitivity (IC50) and target pathways (pathway data from Pathway
Commons). Using pathway and network analysis on the background of the clinical information, we will identify
rational combinations of drugs that will target multiple proteins in a pathway, multiple pathways, or
both. These combinations will then be tested first in TNBC cell lines and later in patient-derived
xenograft models that better represent the heterogeneity of TNBC.
In collaboration with Dr. Rileen Sinha at Mount Sinai, we are also developing methods for calculating
similarity between cancer samples. Cell lines derived from human tumors are often used in pre-clinical cancer
research, but some cell lines may be too different from tumors to be good models. Genomic and molecular
profiles can be used to guide the choice of cell line suitable for particular investigations, but not all features
may be equally relevant, so any resulting methodology should take this concern into account. Understanding
the similarity between cancer samples is not limited to the comparison between cell lines and patient samples.
Particular projects may require comparison within different sets of patient or cell line samples. For these
projects, a generic approach using genomic and molecular profiles would be useful.
TumorComparer is a computational method and web service developed for comparing cell lines and
tumors with the flexibility to place a higher weight on functional alterations of interest. In its first application,
TumorComparer compared 260 cell lines and 1914 tumors of six cancer types from TCGA, using
weights emphasizing recurrent genomic alterations. The cell lines were ranked by their similarity to tumors,
which identified apparently unsuitable outlier cell lines, including some that are widely used.
Method for developing patient similarity network: The input is discretized data representing genomic alterations
(i.e. mutations and copy number alterations). Each sample is represented by a feature vector, and a weight
vector assigns a weight to each feature. The weighted similarity of two samples is calculated using weighted asymmetric matching, which
measures the similarity between two samples after discarding the 0-0 matches (hence “asymmetric”). The
similarity is calculated as the ratio of the intersection to the union of the subsets of features for which the two
samples have non-zero values.
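The weighted asymmetric matching described above can be written as a short sketch. It is an illustration under stated assumptions, not the TumorComparer implementation: the function name is hypothetical, alterations are assumed to be coded as small integers (for example -1, 0, 1), and two non-zero values are assumed to count toward the intersection only when they are equal.

```python
def weighted_asymmetric_similarity(sample_a, sample_b, weights):
    """Weighted intersection-over-union that ignores 0-0 matches.

    sample_a, sample_b: equal-length sequences of discretized alterations
                        (0 means "no alteration").
    weights: per-feature non-negative weights.
    """
    if not (len(sample_a) == len(sample_b) == len(weights)):
        raise ValueError("samples and weights must have the same length")

    matched = 0.0  # weight of features altered identically in both samples
    union = 0.0    # weight of features altered in at least one sample
    for a, b, w in zip(sample_a, sample_b, weights):
        if a == 0 and b == 0:
            continue  # discard 0-0 matches (the "asymmetric" part)
        union += w
        if a == b:
            matched += w
    return matched / union if union > 0 else 0.0


# Toy example with invented values: five features, the first weighted highest.
cell_line = [1, 0, -1, 0, 1]
tumor = [1, 0, 0, 0, 1]
weights = [2.0, 1.0, 1.0, 1.0, 0.5]

similarity = weighted_asymmetric_similarity(cell_line, tumor, weights)
print(round(similarity, 3))
```

In this toy case the union has weight 3.5 (features 1, 3 and 5) and the matched features have weight 2.5, so the similarity is 2.5 / 3.5. Raising the weight of a feature of interest increases its influence on the ranking of cell lines against tumors.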
In the future, the weighted similarity method may be useful for assessing genomic-molecular patient profiles to
guide the personalized choice of clinical trials or therapy. To construct a patient similarity network, we plan
to use pairwise weighted similarities that instantiate a
particular focus of interest, e.g., therapy-relevant genetic alterations.
Bader Group
The Bader group engages in a large number of NRNB-related collaborations each year (see CSP table
below). The CSP work over the past year includes a cell-cell communication project and pathway analysis of
single cell data. NRNB tools are underlined in the highlights below:
Crosstalk between cancer associated fibroblasts (CAFs) and epithelial cancer cells in Head and Neck
cancer.
Head and neck squamous cell carcinoma (HNSCC) is the sixth leading cause of cancer-related death
worldwide. Recently, the tumor microenvironment has been shown to have a significant impact on disease
progression in several tumor types; however, little is known about it in the context of HNSCC. Carcinoma-associated
fibroblasts (CAFs) make up a significant proportion of the tumor stroma, where they facilitate tumor
cell proliferation and invasion. Elucidating the molecular programs of interaction between these two cell
populations will help us to understand how they communicate with each other. Using laser-capture
microdissection (LCM), pure populations of tumor cells and CAFs were obtained from patient tissue sections,
along with their normal counterparts from matched tumor-free tissue. Laurie Ailles et al. at the Ontario Cancer
Institute used gene expression microarrays to generate transcriptomic profiles of these cell populations. In our
collaboration, we used these data to identify potential molecular interacting partners utilized by tumor
and stromal cells to communicate with each other.
The cell-cell interaction networks were constructed using the gene lists generated in the transcriptomic
analysis. The maps were made in Cytoscape using the tumor specific and the CAF specific genes (FDR <
0.05). The genes in both lists were classified as “ligand” or “receptor” according to Gene Ontology terms –
“cytokine activity”, “hormone activity” and “growth factor activity” for ligand and “receptor activity” for receptor
(Qiao et al.). iRefIndex (Razick et al.), a database of receptor-ligand interactions annotated based on published
studies, was used as a reference to build the interaction maps. Gene-Set Enrichment Analysis (GSEA) was run
against the list of genes ranked by t-test t values, from most up-regulated in CAF versus tumor to most
down-regulated. The NRNB Cytoscape app EnrichmentMap was used to visualize the results of the enrichment analysis
(Figure 1).
Figure 1. a) Ligand-Receptor network created using Cytoscape. The outer circle represents CAF genes
and the inner circle represents tumor genes. b) Cytoscape EnrichmentMap representing enriched gene-sets
with FDR <= 0.05 in genes up-regulated in CAFs (red) and genes up-regulated in the tumor samples (blue).
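The ranking that feeds GSEA in the analysis above (genes ordered by t value from most up-regulated in CAF to most down-regulated) can be sketched as follows. The report does not state which t-test variant was used, so Welch's unequal-variance t statistic is an assumption here, and the gene names and expression values are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two groups of expression values.

    Assumes each group has at least two values and non-zero pooled variance.
    """
    var_a, var_b = variance(group_a), variance(group_b)
    standard_error = sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


def rank_genes(expr_caf, expr_tumor):
    """Return (gene, t) pairs sorted from most CAF-up to most tumor-up."""
    scores = {gene: welch_t(expr_caf[gene], expr_tumor[gene]) for gene in expr_caf}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy expression values per gene (three samples per group, invented).
expr_caf = {
    "GENE_A": [8.0, 9.0, 10.0],
    "GENE_B": [5.0, 5.5, 4.5],
    "GENE_C": [2.0, 3.0, 1.0],
}
expr_tumor = {
    "GENE_A": [2.0, 3.0, 1.0],
    "GENE_B": [5.2, 4.8, 5.0],
    "GENE_C": [8.0, 9.0, 10.0],
}

ranked = rank_genes(expr_caf, expr_tumor)
for gene, t_value in ranked:
    print(f"{gene}\t{t_value:.2f}")
```

The printed two-column list (gene, score) has the general shape of a pre-ranked input for enrichment analysis; the exact file format expected by a given GSEA release should be checked against its documentation.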
Colon cancer stem cell characterization and analysis
Colorectal cancer is the second leading cause of cancer death in the United States. Cancer stem cells are
suspected to play a major role in initiation and recurrence in this disease. Dr. Catherine O’Brien’s lab is
characterizing colon cancer stem cells to identify sensitive points that can be targeted to kill the cells. This
project analyzes POP92 cells, a patient-derived colon cancer stem cell line. By collapsing single cell RNAseq
transcriptome data from a colorectal cancer line into pathway activities, the Bader
lab has identified 4 - 6 distinct populations within the POP92 cells. Through correlation analysis
comparing each population with published colorectal CSC datasets, one of the populations consistently has the
highest correlation with known CSCs. Further characterization of this potential CSC population has identified
the activation of pathways in telomere and mitochondrial function, while pathways involving apoptosis,
autophagy, differentiation, and development are repressed.
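The two steps named above, collapsing gene expression into pathway activities and correlating each population with a published signature, can be sketched in a few lines. This is a simplified illustration rather than the method used in the project: pathway activity is assumed to be the mean expression of member genes, Pearson correlation is assumed as the comparison metric, and all gene names, pathway memberships and values are invented.

```python
from math import sqrt


def pathway_activity(gene_expr, pathways):
    """Collapse gene-level expression into one mean value per pathway."""
    activity = {}
    for name, genes in pathways.items():
        values = [gene_expr[g] for g in genes if g in gene_expr]
        if values:
            activity[name] = sum(values) / len(values)
    return activity


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical pathway definitions and cluster-average expression.
pathways = {
    "cell_cycle": ["G1", "G2"],
    "apoptosis": ["G3", "G4"],
    "differentiation": ["G5"],
}
population_expr = {"G1": 8.0, "G2": 6.0, "G3": 1.0, "G4": 3.0, "G5": 2.0}
reference_signature = {"cell_cycle": 6.0, "apoptosis": 1.0, "differentiation": 1.5}

activity = pathway_activity(population_expr, pathways)
shared = sorted(set(activity) & set(reference_signature))
r = pearson([activity[p] for p in shared], [reference_signature[p] for p in shared])
print(activity, round(r, 3))
```

Repeating the correlation for each population against each published CSC dataset gives a table from which the most CSC-like population can be read off.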
Single cell RNA-Seq was performed for 96 Wnt-low and 96 Wnt-high single cells from a colorectal cell line
(Fluidigm). Alignment of the fastq files was performed using STAR with human genome reference
GRCh37 to generate raw counts for gene expression values. Multiple clustering algorithms (W: Ward, K: K-means,
N: NMF) were employed to identify distinct populations of single cells. Pathway analysis of gene
expression data using the NRNB tool EnrichmentMap was used to compare different clusters of cells and to
reveal differences in cell cycle and differentiation pathways (Figure 2).
Figure 2. a) FACS sorting of single cell populations; b) Clustering results from 3 different methods; c)
EnrichmentMap comparing 2 clusters of cells.
In this figure, we can see the difference between two groups of potential CSCs. CSC K4 (red nodes) has
elevated cell cycle, mitosis, recombination repair, non-recombinational repair, base repair, DNA replication,
DNA integrity, and TCF transactivating pathways. CSC K3 (blue nodes) has elevated embryo, endoderm, eye,
secretion, wnt, inhibition of apoptosis, extrinsic apoptosis, mesenchymal stem, and lymphocyte differentiation
pathways. From these pathways, it appears that CSC K4 is actively dividing with a high level of precise DNA
replication. CSC K3 appears to be relatively more differentiated but with embryonic and mesenchymal stem
pathways activated.
Ideker Group
The table presents projects that are completed as collaborations with various faculty at UCSD and local
research institutions. All analyses were completed or are on-going as a part of a recharge fee-for-service.
Here is a brief description of three completed projects from the reporting period:
1) Pathway analysis of RNA Sequencing (Elisabeth Mertsching, aTyr Pharma).
Results: Human T cells were stimulated in the presence of our test article (TA) or vehicle. After 24 hours, cells
were collected, RNA isolated and sent to GeneWiz for RNA sequencing. Results were analyzed with Limma and a list
of differentially expressed genes was established.
Goals: Using this list, identify upstream targets of the TA. Determine which pathway or pathways are modulated
by our TA and identify proteins modulating expression of genes affected by the TA.
Status reports:
04/01/2016: Pathway analysis delivered via Google Drive.
04/09/2016: Responded to two emailed questions from customer re Cytoscape usage; delivered pdf of additional
Cytoscape guidance.
04/11/2016: Responded to two emailed questions from customer re Cytoscape usage, provided guidance on GeneMANIA.
04/19/2016: Delivered signed report pdf and additional guidance on GeneMANIA.
Deliverables: Detailed report pdf, colocalization methods pdf, cleaned differential expression csv
with Ensembl and Entrez gene ids, differential expression csv for all unique Entrez gene ids, ToppGene
enrichment report pdf, Cytoscape network session, csv of heat propagation results. Guidance on how to use
Cytoscape and GeneMANIA.
	
  
2) NGS data integration and network analysis (Sanjay Nigam).
Objective: Drug transporter networks and data integration.
Updates: 11/21: Sent final deliverables:
1. Updated report of analysis, including addition of 4 control TFs/genes (Hoxb7, Sall1, Pax6, Ret) to the
network proximity analysis.
2. Excel file containing information about the Hnf4a subnetwork at each stage of development, including log2
fold change from E20, community membership and associated GO term.
3. Integrated systems bio figure, with real miRNA/mRNA expression data replacing simulated data from last week.
	
  
3) HD Network analysis (Vivian Hook). Objective: Mutant Huntingtin Protein Interactions Networks in
Animal Models of Huntington’s Disease. Integrate Htt interactor proteins in non-human animals with existing
knowledge of protein-protein interactions in the literature. Perform network analysis of these Htt-interactor
proteins overlaid on literature PPI networks. Identify groups of highly connected genes related to Htt, using
network propagation techniques and clustering methods. Pathway analysis of these highly connected genes,
particularly in pathways of interest, including mechanisms of cell death, intracellular trafficking and transport,
synaptic cell-cell communication and interactions, energy metabolism, cell viability. Build wild-type and mutant
Htt protein interaction networks, compiled from literature, and evaluate differences in connectivity and network
structure, using clustering and network propagation methods.
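Network propagation, as named in the objectives above, spreads a score from seed proteins over the interaction network so that well-connected neighbors of the seeds rank highly. The sketch below uses a random walk with restart, which is one common formulation and is an assumption here since the report does not specify the variant used. The node names P1 to P4, the edges, and the restart probability are hypothetical.

```python
def propagate(adjacency, seeds, restart=0.5, iterations=50):
    """Random walk with restart on an undirected graph.

    adjacency: {node: [neighbor, ...]} with every edge listed in both directions.
    seeds: set of nodes where the walk (re)starts, e.g. known Htt interactors.
    Returns {node: score}; scores sum to 1 when no node is isolated.
    """
    nodes = list(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    heat = dict(start)
    for _ in range(iterations):
        # A fraction `restart` of the mass returns to the seeds each step.
        updated = {n: restart * start[n] for n in nodes}
        for node in nodes:
            neighbors = adjacency[node]
            if not neighbors:
                continue
            # The remaining mass is split evenly among the node's neighbors.
            share = (1.0 - restart) * heat[node] / len(neighbors)
            for neighbor in neighbors:
                updated[neighbor] += share
        heat = updated
    return heat


# Toy protein-protein interaction network (invented edges).
adjacency = {
    "HTT": ["P1", "P2"],
    "P1": ["HTT", "P2"],
    "P2": ["HTT", "P1", "P3"],
    "P3": ["P2", "P4"],
    "P4": ["P3"],
}

scores = propagate(adjacency, seeds={"HTT"})
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Running the same propagation on wild-type and mutant networks and comparing the resulting score vectors is one simple way to quantify the differences in connectivity mentioned in the objective.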
Pico Group
Since the renewal, the Pico group has initiated many new NRNB collaborations as a service model through
Gladstone’s Bioinformatics core (see CSP table below). Many of these collaborations fall into the standard
category of assisting with network and pathway analysis, but over a wide range of compelling topics. For
example, in collaboration with Drs. Gan and Akassoglou at Gladstone and UCSF, we identified interaction
networks involving drugs and proteins relating to the role of innate immune response and oxidative stress in
models of Alzheimer’s disease. Also with Dr. Akassoglou, we modeled novel pathways for neuroinflammation
implicated in Multiple Sclerosis. With members of Dr. Deepak Srivastava’s group, we applied NRNB tools to
help characterize alternative protocols for cardiac reprogramming, an essential technology for stem cell
therapies in cardiac related diseases and conditions. And in a collaboration with Dr. Sonja Schrepfer of UCSF,
we provided functional enrichment analysis and pathway visualization toward the study of the effects of
microgravity on cardiac tissue expression. The microgravity environment, currently relevant to astronaut health,
was provided by the International Space Station for multiple rounds of transported mice.
In addition to service-based collaborations, we managed the code development collaborations between
~45 mentors and ~18 students during this reporting period through the Google Summer of Code and NRNB
Academy programs, combined. This work entails recruiting mentors, marshaling project ideas into the NRNB
GitHub tracker, preparing the mentoring organization application to Google, selecting student-mentor pairs,
and guiding mentors and students through a successful project, including formal evaluations and incorporating
new open source code repositories into https://github.com/nrnb/.
CSP Table for Reporting Period
Total of 93 new and ongoing projects during this reporting period. In yellow are the 41 projects that concluded
during this period.
Collaborating
Investigator
Investigator
Institution
Project Title Resource
Personnel
Start /
Finish
date
External
funding
status
Publications
John Kelsoe University of
California, San
Diego
RNAseq and network
analysis
Aaron Chang 2017- NIH
UL1TR001442
of CTSA
Vivian Hook University of
California, San
Diego
Mutant Huntingtin Protein
Interactions Networks in
Animal Models of
Huntington’s Disease
Aaron Chang 2017- NIH
UL1TR001442
of CTSA
Catherine
O’Brien
University
Health
Network,
Toronto,
Canada
Pathway and Network
analysis of single cell
RNAseq data from Colorectal
Cancer line using Cytoscape/
EnrichmentMap
Gary Bader/
Veronique
Voisin
2016-
Cindy Guidos Hospital for
Sick Children,
Toronto,
Canada
Pathway and Network
analysis of RNAseq data
using Cytoscape/
EnrichmentMap: comparing
different clinical subgroups of
B-ALL
Gary Bader/
Veronique
Voisin
2016-
David
Jimenez-
Morales
(Nevan
Krogan)
UCSF Network visualization support
and technology development
Alex Pico,
Adam Treister
2016-
Derek van der
Kooy
University of
Toronto,
Toronto,
Canada
Pathway and Network
analysis of single cell
RNAseq data from Retina
Photoreceptors from Stem
Cells
using Cytoscape/
EnrichmentMap
Gary Bader/
Veronique
Voisin
2016-
Faten Sayed
(Li Gan)
Gladstone
Institutes
Activity of Trem2 knockouts Alex Pico,
Kristina
Hanspers
2016-
Gelareh
Zadeh; Ken
Aldape
University
Health
Network,
Toronto,
Canada
Pathway and Network
analysis of meningioma
RNAseq data using
Cytoscape/ EnrichmentMap.
Gary Bader/
Veronique
Voisin
2016-
Gordon Keller Princess
Margaret
Cancer
Centre,
Toronto,
Canada
Pathway and Network
analysis of RNAseq data
using Cytoscape/
EnrichmentMap: comparison
endoderm differentiation from
embryonic stem cells.
Gary Bader/
Veronique
Voisin
2016-
Joan Brugge HMS Ludwig
Center
Pathway Analysis in Triple
Negative Breast Cancer
Chris Sander 2016- NCI
John Dick Ontario
Cancer
Institute,
Toronto,
Canada
Pathway and Network
analysis of proteomics data
using Cytoscape/
EnrichmentMap: comparison
between mirs co-
overexpression (VOD) and
normal cord blood samples.
Gary Bader/
Veronique
Voisin
2016-
Kristin Hope McMaster
University,
Hamilton,
Canada
Pathway and Network
analysis of RNAseq data
using Cytoscape/
EnrichmentMap: comparison
between PLAG1
overepxression and normal
cord blood samples.
Gary Bader/
Veronique
Voisin
2016-
Lihong Zhan
(Li Gan)
Gladstone
Institutes
Astrocyte response to
microglia depletion by drug
treatment
Alex Pico,
Kristina
Hanspers
2016-
Meghana
Gadgil
UCSF Metabolic pathways in
models of type 2 diabetes
Alex Pico 2016-
Nikolaus
Schultz
Memorial
Sloan-
Kettering
Cancer Center
TCGA PanCanAtlas:
Pathways Group
Augustin Luna 2016- TCGA, NCI
Sanjay Nigam University of
California, San
Diego
Drug transporter networks
and data integration
Aaron Chang 2016- NIH
1U54HD09025
9
Sonja
Schrepfer
UCSF Effects of microgravity on
cardiac tissue expression
Alex Pico 2016-
Arman Aksoy Memorial
Sloan-
Kettering
Cancer Center
Develop Pathway Database
Converters for the Expansion
of the Pathway Commons
Database
Augustin Luna 2015-
Ben Good,
PhD
Scripps
Research
Instittute
Playmatics portal for science
games
Alex Pico 2015-
Charles Perou University of
North Carolina
Pathway and Network
Analysis of Breast Cancer
Giovanni Ciriello 2015-
George
Chacko, Jim
Onken
NIH Office of
Data Analysis
Tools and
Systems
Network based grantee
portfolio analysis
Alex Pico 2015-
John Dick Ontario
Cancer
Institute,
Toronto ,
Canada
Protein expression in CD34+
cells overexpressing mir125a
Gary Bader/
Veronique
Voisin
2015-
John DIck Ontario
Cancer
Institute,
Toronto ,
Canada
Processing RNAseq data
from cord blood samples
corresponding to the whole
hematopoietic hierarchy
Gary Bader/
Veronique
Voisin
2015-
John DIck Ontario
Cancer
Institute,
Toronto ,
Canada
Energy metabolism in normal
and malignant hematopoietic
stem cells
Gary Bader/
Veronique
Voisin
2015-
John DIck Ontario
Cancer
Institute,
Toronto,
Canada
Clustering and pathway
analysis of patients with
relapsed AML
Gary Bader/
Veronique
Voisin
2015-
John DIck/
Jean Wang
Ontario
Cancer
Institute,
Toronto ,
Canada
analysis of the AML
subpopulation using CD200
Gary Bader/
Veronique
Voisin
2015-
John DIck/
Jean Wang
Ontario
Cancer
Institute,
Toronto ,
Canada
analysis of the AML
subpopulation using CD200
Gary Bader/
Veronique
Voisin
2015-
Laurie Ailles Ontario
Cancer
Institute (OCI),
Toronto,
Canada
Standard training in one-on-
one session for gene-set
enrichment using
Cytoscape/EnrichmentMap ,
ovarian cancer CAF vs NAF
Gary Bader/
Veronique
Voisin
2015-
Massimo Loda Dana-Farber
Cancer
Institute
Comprehensive Genomic
Analysis/Metabolic of
Prostate Adenocarcinoma
Ed Reznik 2015-
Metin Can
Siper (Uğur
Doğrusöz,
Onur Sümer)
Bilkent
University
Computer
Science
Department,
Ankara,
Turkey
Improving Cytoscape.js
Based Viewer for SBGN
Process Description
Diagrams with Better Layout
and Advanced Complexity
Management Operations
Augustin Luna 2015-
Patricia
Defechereux
Gladstone
Institutes
Pathway and Network
Analysis of HIV Groups
Alex Pico 2015-
Robert
Rottapel
University of
Toronto,
Canada
Pathways and processes
active in Ovarian Serous
Cancer
Gary Bader/
Veronique
Voisin
2015-
Ruedi
Aebersold
ETH Zurich Pathway and Network
Analysis of Prostate Cancer
Alex Root 2015- MSKCC
Sheila Singh McMaster
Stem Cell and
Cancer
Research
Institute,
Canada
GBM CD133+/- vs NSC
CD133 +/-
Gary Bader/
Veronique
Voisin
2015-
Sheila Singh McMaster
Stem Cell and
Cancer
Research
Institute,
Canada
Bmi1 knockdown effects Gary Bader/
Veronique
Voisin
2015-
Jean Wang,
John E. Dick
Ontario
Cancer
Institute
Large scale analysis of stem
cell enriched fraction of adult
acute myeloid leukemias
Gary Bader,
Veronique
Voisin
2013- Ontario
Institute for
Cancer
Research
Benjamin A.
Alman
SickKids Network Analysis on Stem
cells from musculoskeletal
tumors
Veronique
Voisin, Gary
Bader
2012- Ontario
Institute for
Cancer
Research
Charles
Sawyers
MSKCC Pathway and Network
Analysis of Prostate Cancer
Chris Sander,
Debbie Bemis,
Alex Root
2012- NCI, NIH
Mark Ginsberg UCSD Composition of the Integrin
Activation Complex
Trey Ideker 2012- NIH
Mathew
Meyerson
Harvard
University
Pathway and Network
Analysis of Lung Cancer
Chris Sander,
Debbie Bemis
2012- TCGA
Genome Data
Analysis
Center, NCI
22960745
Stephen
Friend
Sage
Bionetworks
Integrating Cancer Datasets
for Predictive Model
Development and Training,
Rheumatoid arthritis
treatment prediction
Trey Ideker
Gary Bader
Chris Sander
Alex Pico
2012- NCI U54
CA149237
23671412,
23177740,
22836096,
21390021
Steven Kay UCSD Cell-autonomous circadian
clock of hepatocytes drives
rhythms in transcription and
polyamine synthesis
Trey Ideker 2012- NIH
William
Stephen
Hancock
Northeastern
University
ERBB2 driven cancer Trey Ideker 2012- Multiple 23647160
Andrew Emili University of
Toronto,
Canada
Mechanistic investigation of
microRNA-mediated
regulation of dilated
cardiomyopathy
Gary Bader 2011-
Jianfeng Li
(Robert W.
Sobol)
University of
Pittsburgh
Cancer
Institute,
Hillman
Cancer
Center,
University of
Pittsburgh
DNA Repair dependent
transcriptome reprogramming
& investing synthetic lethal
interactions with DNA repair
genes
Trey Ideker 2011- NIH
Marc Vidal Dana Farber Mapping the human
interactome and its rewiring
by disease mutations
Gary Bader 2011- NHGRI P50
HG004233,
NHGRI U01
HG001715,
NIGMS R01
GM109199
23549480,
19841731
Mike Cherry,
Judith Blake
Stanford,
Jackson Labs
Gene Ontology Consortium,
Saccharomyces Genome
Database
Trey Ideker,
Gary Bader
2011- NHGRI P50
HG004233,
NHGRI U01
HG001715,
NIGMS R01
GM109199
23242164
Peter
Zandstra
University of
Toronto,
Canada
Mapping and analyzing cell-
cell interactions in the
hematopoietic system
Gary Bader 2011-
Quaid Morris University of
Toronto,
Canada
Development of GeneMANIA
gene function prediction
software
Gary Bader 2011- Genome
Canada
23794635
Sheila Singh Stem Cell and
Cancer
Research
Institute (SCC-
RI) at
McMaster
University
Characterization of the
Heterogeneity of Human
BTICs
Gary Bader 2011- Ontario
Institute for
Cancer Stem
Cell Research
Ruedi
Aebersold
ETH Analysis of differential genetic
networks
Trey Ideker 2010- Swiss Federal 21127252
(Science, 176)
Katerina
Akassoglou
Gladstone
Institutes
Pathway modeling for
neuroinflammation model of
Multiple Sclerosis
Alex Pico,
Kristina
Hanspers
2016-
2016
David Gordon
(Nevan
Krogan)
UCSF HIV host-pathogen genetic
interaction networks
Alex Pico,
Scooter Morris
2016-
2016
Nicole Stone
(Deepak
Srivastava)
Gladstone
Institutes
Network and pathway
analysis of cardiac
reprogramming
Alex Pico,
Kristina
Hanspers
2016-
2016
Gan Li Gladstone
Institutes
Network analysis of perturbed
cell model for Alzheimer’s
disease
Alex Pico 2016-
2016
Katerina
Akassoglou
Gladstone
Institutes
Treatment effects on
Alzheimer’s disease networks
and TYROBP pathway
Alex Pico,
Kristina
Hanspers
2016-
2016
Elizabeth
Mertsching
aTyr Pharma Pathway analysis of RNA
sequencing
Aaron Chang 2016-
2016
Commercial
sponsor
Devesh
Khandelwal
University of
Delhi, India
SBGN-ML and SBML to
Escher converter
Zachary King,
Alex Pico
2016-
2016
Google
Supun
Arunoda
University of
Moratuwa, Sri
Lanka
PCA and t-DSNE in
clusterMaker2
Scooter Morris 2016-
2016
Google
Istemi Bahceci Bilkent
University,
Turkey
Visualizing genomic
alterations in TCGA cancer
pathways in cBioPortal
Ugur Dogrusoz,
Chris Sander
2016-
2016
Google
Metin Can
Siper
Bilkent
University
Computer
Science
Department,
Ankara,
Turkey
Improving Cytoscape.js
Based Viewer for SBGN
Process Description
Diagrams with Better Layout
and Advanced Complexity
Management Operations
Augustin Luna,
Uğur Doğrusöz,
Onur Sümer
2016-
2016
Google
Julia
Gustavsen
University of
British
Columbia
RCy3 for network
manipulation using
Cytoscape
Augustin Luna 2016-
2016
Google
Ivan Bestvina University of
Zagreb,
Croatia
Multithread Centiscape Giovanni
Scardoni, Alex
Pico
2016-
2016
Google
Hovakim
Grabski
Russian
Armenian
University,
Armenia
Deviser for SBML libraries Frank
Bergmann, Alex
Pico
2016-
2016
Google
Kaito Ii Keio University
Graduate
School of
Science and
Technology,
Japan
Interconvertable layout
program for CellDesigner
Akira
Funahashi, Alex
Pico
2016-
2016
Google
Mridul Seth Birla Institute
of Technology
and Sciences,
India
Cytoscape file import into
GraphSpace
TM Murali, Alex
Pico
2016-
2016
Google
Roman
Schulte
Eberhard Karls
University,
Germany
JSBML validation system Andreas
Drager, Alex
Pico
2016-
2016
Google
Ashish Tiwari Arizona State
University
Cytoscape command line
scripting enhancements
Scooter Morris 2016-
2016
Google
Tramy Nguyen University of
Utah
SBML and BioPAX
coversions
Mike Hucka,
Alex Pico
2016-
2016
Google
Joseph Stahl Vanderbilt
University
Cytoscape.js interactive
tutorials
Max Franz 2016-
2016
Google
William Miles McMaster
University,
Canada
TOR-IBIN web interface
development
Mohammed
Helmy
2016-
2016
Google
Zhaoyuan Zoe
Xi
UCLA Cytoscape.js clustering
algorithms
Mike Kucera 2016-
2016
Google
Michael
Rosenberg
Agilent
Technologies
Pathway analysis of human
toxome
Alex Pico 2016-
2016
Alberto Ocaña Albacete
University
Hospital,
Albacete,
Spain
Identification and optimization
of targeted drug combinations
in breast cancer
Gary Bader/
Veronique
Voisin
2015-
2016
26314846
Chi-Hua Chen University of
California, San
Diego
Barabasi disease-disease
interactome analysis for
schizophrenia vs bipolar
GWAS genes
Aaron Chang 2015-
2016
NIH 1
R01MH100351
Danielle
Swany
(Krogan)
UCSF Cytoscape and Mass Spec
Workshops
Alex Pico 2015-
2016
Douglas
Levine
Memorial
Sloan-
Kettering
Cancer Center
Pathway and Network
Analysis of Endometrial
Cancer
Jianjiong Gao 2015-
2016
TCGA, NCI 23636398
Jill Mesirov Broad Institute GenomeSpace, Broad
Integrative Genomics Viewer
Jianjiong Gao 2015-
2016
John DIck Ontario
Cancer
Institute,
Toronto,
Canada
ITGb7+ and ITGb7–
hematopoietic stem cells.
Gary Bader/
Veronique
Voisin
2015-
2016
Kristin Hope McMaster
University,
Hamilton,
Canada
Pathway and Network
analysis of RNAseq data
using Cytoscape/
EnrichmentMap: comparison
between MSI2
overepxression and normal
cord blood samples.
Gary Bader/
Veronique
Voisin
2015-
2016
27121842
Peter Dirks Hospital for
Sick Children,
Toronto,
Canada
Standard training in one-on-
one session for gene-set
enrichment using
Cytoscape/EnrichmentMap ,
ASCL1 knockout in
glioblastoma
Gary Bader/
Veronique
Voisin
2015-
2016
Sandy
Williams, PhD
Gladstone
Institutes
Co-author networks and
metrics
Alex Pico 2015-
2016
Cynthia
Guidos
Hospital for
Sick Children,
Toronto,
Canada
IL-7 coordinates proliferation,
differentiation and Tcra
recombination during
thymocyte β-selection.
Gary Bader/
Veronique
Voisin
2015-
2016
25729925
John DIck Ontario
Cancer
Institute,
Toronto,
Canada
assessing expression levels
of erUPR and translation
initiation genes in normal and
leukemic stem cells
Gary Bader/
Veronique
Voisin
2015-
2016
John Dick Ontario
Cancer
Institute,
Toronto,
Canada
Normal and cancer
hematopoietic stem cells:
miR-126
Gary Bader/
Veronique
Voisin
2015-
2016
27300437,
27070706
John DIck Ontario
Cancer
Institute,
Toronto,
Canada
Pathways and Processes
active in leukemic stem
cells(LSC) of Acute Myeloid
Leukemia(AML) using label
free protein mass
spectrometry
Gary Bader/
Veronique
Voisin
2015-
2016
John Dick Ontario
Cancer
Institute,
Toronto,
Canada
Protein expression in CD34+
cells overexpressing mir125a
Gary Bader/
Veronique
Voisin
2015-
2016
27424784
Laurie Ailles Ontario
Cancer
Institute (OCI),
Toronto,
Canada
Crosstalk between CAFs
(cancer associated
fibroblasts) and epithelial
cancer cells in the Head and
Neck cancer.
Gary Bader/
Veronique
Voisin
2015-
2016
Theodore J.
Brown
Lunenfeld-
Tanenbaum
Research
Institute,
Toronto,
Canada
Function of the fallopian tube
and ovulation in the
predisposition to ovarian
cancer
Gary Bader/
Veronique
Voisin
2015-
2016
26039994
Aaron D
Schimmer
The Princess
Margaret
Hospital, The
Ontario
Cancer
Institute,
University
Health
Network
Metabolic adaptation to
chronic inhibition of
mitochondrial protein
synthesis in acute myeloid
leukemia cells.
Gary Bader,
Veronique
Voisin
2013-
2016
NIH 23520503
Claudia C Dos
Santos
Keenan
Research
Centre of the
Li Ka Shing
Knowledge
Institute of St.
Michael's
Hospital
Acute lung injury Gary Bader,
Veronique
Voisin
2013-
2016
Canadian
Institutes of
Health
Research
Jaime O.
Claudio, Jean
Wang, John E.
Dick
Ontario
Cancer
Institute
Development of Highly Active
Anti-Leukemia Stem Cell
Therapy
Gary Bader,
Veronique
Voisin
2013-
2016
Ontario
Institute for
Cancer
Research
Jayne Danska SickKids microbiome and alterations in
gene expression
Gary Bader,
Veronique
Voisin
2013-
2016
Canadian
Institutes of
Health
Research,
Juvenile
Diabetes
Research
Foundation
Nadeem
Moghal
Ontario
Cancer
Institute
Stem/progenitor cell biology
in human lung.
Gary Bader,
Veronique
Voisin
2013-
2016
Canadian
Institutes of
Health
Research
Peter Dirks Hospital for
Sick Children
Isolation and characterization
of a cancer stem cell from
human brain tumours.
Gary Bader,
Veronique
Voisin
2013-
2016
Ontario
Institute for
Cancer
Research
25561528
Peter Dirks Hospital for
Sick Children
Pathway and Network
analysis of RNAseq data
using Cytoscape/
EnrichmentMap: Role of
dopamine D4 receptor in
glioblastoma stem cells
Gary Bader,
Veronique
Voisin
2013-
2016
Ontario
Institute for
Cancer
Research
27300435
Charles Perou University of
North Carolina
Pathway and Network
Analysis of Breast Cancer
Chris Sander 2012-
2016
TCGA
Genome Data
Analysis
Center, NCI
23000897,
24096568,
26451490
Matthew
Meyerson
Harvard
University
Pathway and Network
Analysis of Lung Cancer
Chris Sander,
Debbie Bemis
2012-
2016
TCGA
Genome Data
Analysis
Center, NCI
22960745
Eldad
Zacksenhaus
Toronto
General
Research
Institute,
Canada
Breast cancer Gary Bader 2011-
2016
Ontario
Institute for
Cancer Stem
Cell Research
22460789;
25330770;
27571409
Jayne
Danska,
Cynthia
Guidos
University of
Toronto,
Canada
Pathway and network
analysis of mouse models of
leukemia
Daniele Merico,
Gary Bader
2011-
2016
Ontario
Institute for
Cancer
Research
John Dick Ontario
Cancer
Institute
Normal and cancer
hematopoietic stem cells
Gary Bader 2011-
2016
Ontario
Institute for
Cancer Stem
Cell Research
23142521
Katherine
Siminovitch
University of
Toronto,
Canada
Clinical genomics of human
genetic diseases
Ruth Isserlin,
Gary Bader
2011-
2016
Margaret
Wrensch
University of
California San
Francisco
Genetic and Molecular
Epidemiology of Adult Glioma
Alexander Pico 2011-
2016
NIH 24908248,
22922872,
23733245,
23361564
Michael Taylor University of
Toronto,
Canada
Pathway and network
analysis of pediatric brain
tumours
Ruth Isserlin,
Gary Bader
2011-
2016
Genome
Canada
22832581,
21840481,
20393554,
24553142
Nevan Krogan UCSF Evolution of viral-human
protein complexes
Trey Ideker,
Scooter Morris,
Alex Pico
2011-
2016
NIAID P01
AI091575,
NIGMS R01
GM098101
23273983,
22190034,
23242164,
22681890,
22252388,
21127252
Anthony
Gramolini
University of
Toronto,
Canada
Pathway and network
analysis of mouse models of
heart disease
Ruth Isserlin,
Gary Bader
2011-
2016
Heart and
Stroke
Foundation of
Ontario
20127684
Igor Jurisica,
Lincoln Stein
University
Health
Network
Cancer Gene Encyclopaedia
(CGEP)
Gary Bader 2011-
2016
Ontario
Ministry of
Research and
Innovation
Jill Mesirov Broad Institute GenomeSpace, Broad
Integrative Genomics Viewer
Debbie Bemis,
Chris Sander,
Trey Ideker
2010-
2016
NHGRI, Starr
Consortium
25165537
B.4 What training opportunities
The collaborations during this period included many requests to prepare custom training events and one-on-one
sessions. During this reporting period, the Pico, Bader and Ideker groups offered support to local
researchers via consulting meetings and one-on-one training sessions aimed at helping biologists learn how
to use NRNB tools in their research. Examples include how to install Cytoscape on personal computers and
navigate through a network, as well as how to run the Bader lab standard enrichment analysis pipeline,
which can be summarized in three steps: 1) run GSEA, g:Profiler or a similar gene-set enrichment tool;
2) create a network of enriched pathways using Cytoscape/EnrichmentMap; 3) perform post-analysis using
EnrichmentMap features or GeneMANIA.
Additionally, 15 of the collaboration projects listed in B.2 served as intensive training opportunities for
students accepted into our NRNB Google Summer of Code program. The students learned not only about
NRNB tool development, but also about open source software development with a distributed team.
B.6 What do you plan to do for the next reporting period to accomplish the goals?
Each group plans to continue its own highly successful collaboration process as well as our coordinated
participation as a mentoring organization in Google Summer of Code.
Infrastructure
B.2 What was accomplished under these goals?
Cytoscape Cyberinfrastructure (CI)
In our 2016 report, we described the creation of initial technologies needed to create an ecosystem of
biologically valuable Internet-based services that exchange network data in a stable, performant, scalable,
reusable, recombinable and reliable manner. We described the initial development of:
• The CX lossless network transfer format (see section C.3) connecting new CI clients (e.g., Cytoscape
and Jupyter apps) and services (e.g., Diffusion)
• The cyREST system that exposes Cytoscape functionality to external workflows
• The Elsa REST service request router, which enables a service client to wait for results of long-running
calculations
• cyWidget reusable browser-based application libraries
• Future of Publishing initiative that enables journal publishers to make dynamic content available in their
articles.
In 2016, we created and released Diffusion, which is the first Cytoscape app to demonstrate the integration
of basic CI service technologies (e.g., Cytoscape apps, CX, request routing and service deployment). The
Diffusion app calls the CI’s new Diffusion service, which uses a heat propagation approach to identify
subnetworks worthy of focused study, given a list of nodes in a large network. The Diffusion service executes
on Google Cloud servers, thereby enabling authors of Python, R, Java or Javascript-based workflows (e.g.,
disease gene prioritization in GWAS, protein function prediction and discovery of significantly mutated
networks) to avoid reinventing such algorithms or provisioning the substantial computational resources needed
for their execution. In the larger picture, the Diffusion service demonstrates how the CI can allow typical
biological programmers to dramatically increase the audience for their code and gain access to Internet-scale
computational resources. The Cytoscape app’s immediate impact will be to enable a new family of filtering and
affinity algorithms that allow Cytoscape users to extract value from large networks.
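The heat propagation idea behind Diffusion can be illustrated with a small, self-contained Python sketch. This is not the Diffusion service's code, which this report does not describe; it assumes the standard graph heat equation dh/dt = -Lh (L being the graph Laplacian) integrated with simple Euler steps, and the names `diffuse_heat` and `rank_nodes`, the `time` and `steps` parameters and the toy network are all hypothetical.

```python
def diffuse_heat(edges, seeds, time=0.1, steps=100):
    """Spread heat from seed nodes over an undirected network.

    Approximates dh/dt = -L h (L = graph Laplacian) with explicit Euler steps.
    Illustrative only; parameter defaults are assumptions, not service settings.
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Query (seed) nodes start with one unit of heat, all others with none.
    heat = {node: (1.0 if node in seeds else 0.0) for node in neighbors}

    dt = time / steps
    for _ in range(steps):
        # Each node gains heat from hotter neighbors and loses it to cooler ones.
        heat = {
            node: heat[node] + dt * sum(heat[n] - heat[node] for n in nbrs)
            for node, nbrs in neighbors.items()
        }
    return heat


def rank_nodes(heat):
    """Return nodes ordered from hottest to coolest."""
    return sorted(heat, key=heat.get, reverse=True)


if __name__ == "__main__":
    # Toy path network A-B-C-D-E with a single query node, A.
    toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
    print(rank_nodes(diffuse_heat(toy_edges, seeds={"A"})))
```

Nodes that end up hottest are those closest to, or best connected with, the query nodes, so the top of the ranking suggests a candidate subnetwork for focused study.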
The CI framework on which Diffusion is built leverages modern Kubernetes cluster technology (to augment
Elsa, see section C.3), common CX-based message formats, server-based interface stubs, call metering and
central logging to enable biological programmers to package algorithmic code as a highly scalable, highly
available microservice with access to server- and cluster-class computing resources.
We launched cyREST2 as a significant expansion of the highly successful cyREST Cytoscape feature
(http://apps.cytoscape.org/apps/cyrest). Based on user feedback and demand, cyREST2 aims to enable both
REST calls and scripting calls that mirror all Cytoscape functionality already available to users through the
Cytoscape UI. Critically, cyREST2 will enable access to functionality available through Cytoscape apps,
including enrichment, clustering, network acquisition, enhanced graphics and graph analysis. We will work with
both the Python/Jupyter and R communities to upgrade their cyREST interface support (e.g.,
http://bioconductor.org/packages/release/bioc/html/RCy3.html).
We have built up a collection of cyWidgets (see section C.3) that enable both Cytoscape- and browser-
based apps to reuse components that perform high value user-facing tasks, including:
NDExValetFinder – a UI that allows the user to explore an NDEx network repository and select
networks of interest.
NDExStore – a UI that allows a user to annotate and store a network in an NDEx repository.
NDExLogin – a UI that enables a user to provide credentials for an NDEx repository.
SimpleNetworkViewer – a UI that displays and allows interaction with a network fetched from an
NDEx repository.
cyWidgets are built on the simple and popular Facebook React framework. We demonstrated the cross-
platform reusability of cyWidgets by deploying them in both desktop-based Java and browser-based Javascript
environments. For Java, we combined NDExValetFinder, NDExStore and NDExLogin with the Electron
execution framework (http://electron.atom.io/) to enable Cytoscape to use NDEx as its primary network store.
For Javascript, we embedded NDExValetFinder into the next generation of the NDEx home page to upgrade
the user experience. Integration of cyWidgets into multiple software platforms has proven to be an inexpensive
means to improve the user experience, and has been particularly cost effective because improvements to
cyWidgets benefit multiple platforms.
We created the deep-cell web app and service in support of the Deep Cell phenotype prediction research
described in section B.2. The service is deployed in the NRNB Kubernetes cluster as a peer to the Diffusion
service, and informs the use of GPU processors for future machine learning services. The web app
demonstrates the use of a hierarchy of network displays to explore semantic zooming techniques and to inform
the construction of a group of cyWidgets generalized to quickly and economically create novel high
dimensional ontological displays that represent service-based biological simulation. This continues work
performed for AtgO and NeXO reported in previous years.
Finally, we continued our Future of Publishing initiative by enabling quick and trouble free submission of
networks to the Elsevier publishing system (via the new ScienceDirect Cytoscape app). We are working with
Elsevier to improve outreach, to enable submission of interactive networks to most scientific journals, and to
tightly integrate with Elsevier’s Pathway Studio product. Elsevier has committed in principle to further
streamlining network publishing by enabling submission via the NDEx network repository and displaying these
networks using an evolution of the SimpleNetworkViewer cyWidget.
Cytoscape App Store
The maintenance of the site allows it to host over 308 apps (an 18% increase over last year) developed by 588
different developers around the world and support Cytoscape users downloading an average of 846 apps per
day (a 45% increase over the past 12 months). Since our last report, we incrementally improved the App
Developer Ladder (http://wiki.cytoscape.org/Cytoscape_3/AppDeveloper/Cytoscape_App_Ladder), which takes
a prospective developer step by step through the app development and submission process. We also moved
the App Store to a VMware virtual machine hosted in the NRNB Cluster (described below). In section B.6, we
propose major changes for the App Store over the next reporting period.
NRNB Cluster
In the last year, we significantly upgraded the NRNB Cluster both inside and outside of the firewall. Inside (for
HIPAA loads), we added six Supermicro RM224 2U compute servers each containing 1TB RAM, forty eight
2.1GHz cores (96 threads), 4TB local storage, 10Gb/s network adapters and the Ubuntu 14.04 operating
system. We also added a 240TB (raw) high performance GPFS storage server based on redundant
Supermicro RM110 1U head nodes, each with 128GB RAM, and redundant Supermicro RM216 2U metadata
servers with twelve 800GB SSDs each.
Outside of the firewall, we added 3 Supermicro RM216 2U virtual machine servers each containing 256GB RAM,
twenty 3.10GHz cores and 10Gb/s network adapters. We also added an additional 10Gb/s 32 port Juniper
EX4550 switch as a border router. Finally, we added an additional 360TB (raw) NFS storage server controlled
by a Supermicro RM110 1U head node containing 128GB RAM and a 10Gb/s network adapter.
All equipment is housed at the San Diego Supercomputer Center and is connected to their high speed
backbone.
After combining this equipment with the cluster described in last year’s report, the NRNB Cluster contains a total
of 17TB RAM (330% increase), 1720 compute threads (67% increase), and 835TB (raw) usable storage
(230% increase). Over the last year, 90% of the cluster nodes have been saturated with NRNB-sponsored jobs
80% of the time.
Rany Salem, a new UCSD investigator, contributed an additional 1TB RM224 Supermicro server (96
threads) and 48TB storage in exchange for use of the NRNB cluster resources.
We purchased and deployed 9 Intel NUC5i5RYH micro-workstations each containing an Intel i5-5250U
1.6GHz dual core processor, 8GB RAM and 240GB SSD along with a high resolution (2560x1440) monitor.
They all run the Ubuntu 16.04 operating system. They complement the eight high performance Dell T5600
workstations and the VMware ESXi v5.5 server farm previously reported.
B.5 How have results been disseminated to communities of interest?
Dissemination Section B.2 describes the accelerating download trends for Cytoscape Desktop. To measure
the dissemination of Cytoscape Desktop results, we count citations to Cytoscape Desktop in references
available through Google Scholar. As shown below, real Cytoscape usage is climbing, commensurate with
downloads.
Year   Count   Year/Year Growth
2016   2218     24%
2015   1789      9%
2014   1645     21%
2013   1363     -5%
2012   1435     65%
2011    870     28%
2010    680     15%
2009    590     37%
2008    430    105%
2007    210     91%
2006    110     38%
2005     80     60%
2004     50
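The Year/Year Growth column can be reproduced from the counts alone. The Python sketch below is illustrative only: the function name `year_over_year_growth` is ours and is not part of any NRNB tool. It uses the Google Scholar citation counts from the table above and rounds each result to a whole percentage.

```python
# Citation counts per year, copied from the table above.
CITATIONS = {
    2004: 50, 2005: 80, 2006: 110, 2007: 210, 2008: 430, 2009: 590, 2010: 680,
    2011: 870, 2012: 1435, 2013: 1363, 2014: 1645, 2015: 1789, 2016: 2218,
}


def year_over_year_growth(counts):
    """Return {year: percent change versus the previous year}, in whole percent."""
    growth = {}
    for year in sorted(counts):
        previous = counts.get(year - 1)
        if previous:  # skip the first year and any year without a prior count
            growth[year] = round((counts[year] - previous) / previous * 100)
    return growth


if __name__ == "__main__":
    results = year_over_year_growth(CITATIONS)
    for year in sorted(results, reverse=True):
        print(f"{year}: {results[year]}%")
```

The first year (2004) has no prior count, so no growth figure is reported for it, matching the empty cell in the table.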
Additionally, we have published or participated in a number of papers describing our Cytoscape Desktop
advances and how best to leverage them:
• Fitts D, Zhang Z, Maher M, Demchak B. dot-app: a Graphviz-Cytoscape conversion plug-in. F1000Res.
2016 Oct 20;5:2543. doi: 10.12688/f1000research.9751.1, eCollection 2016.
• Kucera M, Isserlin R, Arkhangorodsky A and Bader GD. AutoAnnotate: A Cytoscape app for
summarizing networks with semantic annotations. F1000Research 2016, 5:1717 (doi:
10.12688/f1000research.9090.1)
• Morris JH, Vijay D, Federowicz S et al. CyAnimator: Simple Animations of Cytoscape Networks.
F1000Research 2015, 4:482 (doi: 10.12688/f1000research.6852.2)
• Kofia V, Isserlin R, Buchan AMJ and Bader GD. Social Network: a Cytoscape app for visualizing co-
authorship networks. F1000Research 2015, 4:481 (doi: 10.12688/f1000research.6804.3)
• Rinnone F, Micale G, Bonnici V et al. NetMatchStar: an enhanced Cytoscape network querying app.
F1000Research 2015, 4:479 (doi: 10.12688/f1000research.6656.2)
NRNB Cluster results manifest in papers (below) made possible because of access to the cluster. Cluster
users in the reporting period include the Trey Ideker Lab, the Hannah Carter Lab, the Nick Schork Lab and the
Rany Salem lab.
• Yu M, Kramer M, Dutkowski J, Srivas R, Licon K, Kreisberg J, Ng C, Krogan N, Sharan R, Ideker T.
Translation of Genotype to Phenotype by a Hierarchy of Cell Subsystems*. Cell Systems. 2016 Feb
24;2(2):77-88. doi: 10.1016/j.cels.2016.02.003.
• Jaeger PA, Lucin KM, Britschgi M, Vardarajan B, Huang RP, Kirby ED, Abbey R, Boeve BF, Boxer AL,
Farrer LA, Finch N, Graff-Radford NR, Head E, Hoffree M, Huang R, Johns H, Karydas A, Knopman
DS, Loboda A, Masliah E, Narasimhan R, Petersen RC, Podtelezhnikov A, Pradhan S, Rademakers R,
Sun CH, Younkin SG, Miller BL, Ideker T, Wyss-Coray T. Network-driven plasma proteomics expose
molecular changes in the Alzheimer's brain. Mol Neurodegener. 2016 Apr 26;11:31. doi:
10.1186/s13024-016-0095-2.
• Srivas R, Shen JP, Yang CC, Sun SM, Li J, Gross AM, Jensen J, Licon K, Bojorquez-Gomez A,
Klepper K, Huang J, Pekin D, Xu JL, Yeerna H, Sivaganesh V, Kollenstart L, van Attikum H, Aza-Blanc
P, Sobol RW, Ideker T. A Network of Conserved Synthetic Lethal Interactions for Exploration of
Precision Cancer Therapy. Molecular Cell. 2016 Jul 19. pii: S1097-2765(16)30280-5. doi:
10.1016/j.molcel.2016.06.022.
• Hofree M, Carter H, Kreisberg JF, Bandyopadhyay S, Mischel PS, Friend S, Ideker T. Challenges in
identifying cancer genes by analysis of exome sequencing data. Nature Communications. 2016 Jul
15;7:12096. doi: 10.1038/ncomms12096.
• Gross AM, Jaeger PA, Kreisberg JF, Licon K, Jepsen KL, Khosroheidari M, Morsey BM, Swindells S,
Shen H, Ng CT, Flagg K, Chen D, Zhang K, Fox HS, Ideker T. Methylome-wide Analysis of Chronic HIV
Infection Reveals Five-Year Increase in Biological Age and Epigenetic Targeting of HLA. Mol Cell. 2016
Apr 21;62(2):157-68. doi: 10.1016/j.molcel.2016.03.019.
• Guo T, Gaykalova DA, Considine M, Wheelan S, Pallavajjala A, Bishop JA, Westra WH, Ideker T, Koch
WM, Khan Z, Fertig EJ, Califano JA. Characterization of functionally active gene fusions in human
papillomavirus related oropharyngeal squamous cell carcinoma. Int J Cancer. 2016 Jul 15;139(2):373-
82. doi: 10.1002/ijc.30081. Epub 2016 Mar 30.
• Liss MA, DeConde R, Caovan D, Hofler J, Gabe M, Palazzi KL, Patel ND, Lee HJ, Ideker T, Van
Poppel H, Karow D, Aertsen M, Casola G, Derweesh IH. Parenchymal Volumetric Assessment as a
Predictive Tool to Determine Renal Function Benefit of Nephron-Sparing Surgery Compared with
Radical Nephrectomy. Journal of Endourology. 2016 Jan;30(1):114-21. doi: 10.1089/end.2015.0411.
Epub 2015 Sep 25.
• Qu K, Garamszegi S, Wu F, Thorvaldsdottir H, Liefeld T, Ocana M, Borges-Rivera D, Pochet N,
Robinson JT, Demchak B, Hull T, Ben-Artzi G, Blankenberg D, Barber GP, Lee BT, Kuhn RM,
Nekrutenko A, Segal E, Ideker T, Reich M, Chang HY, Mesirov JP. Integrative genomic analysis by
interoperation of bioinformatics tools in GenomeSpace. Nature Methods. 2016 Jan 18. doi:
10.1038/nmeth.3732.
• Kramer MH, Farré JC, Mitra K, Yu MK, Ono K, Demchak B, Licon K, Flagg M, Balakrishnan R, Cherry
JM, Subramani S, Ideker T. Active Interaction Mapping Reveals the Hierarchical Organization of
Autophagy. Mol Cell. 2017 Feb 16;65(4):761-774.e5. doi: 10.1016/j.molcel.2016.12.024.
B.6 What do you plan to do for the next reporting period to accomplish the goals?
Cytoscape Cyberinfrastructure (CI)
Building on the successful NDEx-centric cyWidgets in Cytoscape apps, we will extend the cyWidget family to a group
that cooperates to perform most of the functions involved in core network analysis workflows, including network
visualization, layout, analysis, enrichment calculations, and attribute merging. This will initially power evolutions
of the emerging web.cytoscape.org web-based network viewer (and eventual web-based workstation). We will
create additional cyWidgets opportunistically to reduce the time needed to create high quality web and desktop
applications.
We will support computational biologists as they create and deploy new computational pipelines, including
those relating to specific TRDs and DBPs (e.g., Deep Cell, which calculated phenotypes derived from
genotype perturbations).
We will develop the NRNB Kubernetes cluster and related infrastructure (e.g., Elastic Search and Central
logging) to support robust services deployed on behalf of biologists who author novel and reusable algorithms.
Finally, we will deploy an expanded VMware cluster supporting hundreds of virtual machines, which in
turn support multiple bioinformatic services contributed in a manner similar to Cytoscape apps for the
Cytoscape desktop.
Cytoscape App Store
We plan to create a version of the App Store that supports the Cytoscape Cyberinfrastructure (CI) described in
the Infrastructure section. Through the CI Store, application programmers will be able to discover the
existence, purpose, documentation, and API interface for services available for either immediate use or
installation on private servers.
We will also develop a Docker-based system to assist CI service developers in packaging and
disseminating their services through the CI Store. The system will include documentation standards, testing
standards, packaging of service installation files, and submission procedures.
NRNB Cluster
We will expand the cluster to meet the increased demands of the TRDs and DBPs, including processing larger
networks, performing deeper inspection, performing differential analysis, and improving classification precision.
Specifically, we will:
• assess the need for GPU processors in new and existing nodes.
• activate three high capacity VMware virtual machine servers to enable the deployment of biological
services.
While the cluster has proven remarkably reliable, we will improve robustness by adding redundant
elements and connections to enable access to storage and networking across single point failures.
Dissemination
B.2 What was accomplished under these goals?
NRNB.org
NRNB.org is the main web site for the National Resource for Network Biology and serves as the primary
source of disseminating NRNB resources and associated information. It is constantly updated with information
for NRNB collaborators and researchers as well as the larger network biology community. The site includes our
project description and annual reports, available tools and resources, links to training materials, programs and
events, and instruction in how to collaborate. The front page features a 5-minute promotional video called
"What is NRNB?", in which NRNB Principal Investigator Trey Ideker, along with co-Investigators, DBPs and
CSPs, describes network biology challenges and the impact of the NRNB. This video has been viewed over
15,000 times in the past five years.
The most visited page on the website is the GSoC page (http://nrnb.org/gsoc.html), closely followed by the
Training page (http://nrnb.org/training.html). The GSoC page has information about NRNB's involvement in the
Google Summer of Code program, including links to project ideas, information for students and mentors,
testimonials from previous participants and full listings of all projects completed as part of GSoC. The Training
page has up-to-date information on upcoming training events and also includes a full listing of courses relevant
to network biology. The Training page also links to popular training materials for NRNB tools, including links to
OpenHelix and Open Tutorial sites. NRNB.org includes a Media section where we provide embedded video,
pdfs and slideshows for various NRNB-related tools, our annual reports, and presentations from the Network
Biology SIG and App Expo meetings. The NRNB website also includes a section related to Collaborations
(http://nrnb.org/outreach.html), with in-depth information on the types of collaboration opportunities available,
including Google Summer of Code, NRNB Academy and direct access to our Collaboration Request Form. A
complete list of all current NRNB collaborations is also available on the website. Our Tools page
(http://nrnb.org/tools.html) presents all tools currently supported by the NRNB, and is also a highly accessed
page on the site, ranking 4th after the GSoC, Training and Competitions pages. A dedicated page per tool contains
relevant information on usage, user documentation, developer resources and availability. An image gallery
serves as a visual introduction to each tool’s capabilities and use.
The attentive maintenance and updating of the site helps make NRNB.org the #2 Google search result for
"network biology tools", second only to Cytoscape.org, an NRNB supported tool. NRNB.org is the #3 result
even when searching for just "network biology". These are global, non-personalized results. Over the past
year, traffic to the site averages about 840 visits per month. Since the site went live in late 2010, we have had
over 84,000 visits.
Cytoscape.org Website
Since our last report (March 2016), we significantly improved our discourse on Cytoscape history and future
directions: http://www.cytoscape.org/roadmap.html. It now more plainly lays out our vision for Themes and
Features for future releases, and is laid out to improve communication with users, developers and curious
parties.
Since January 2016, downloads have grown to an average of 17,000 per month for the year, with
Cytoscape v3 accounting for the vast majority of downloads (see below). For v3.4 (the latest version), the
highest month saw over 20,000 downloads (up 18% from last year), and the lowest saw 13,500 (up 23% from
last year). Since inception in 2002, Cytoscape has been downloaded over 870,000 times (not shown).
Note that the download statistics we present leave out hourly or daily downloads by individual clients. We
assume such clients are bots (mainly in China) and do not reflect actual Cytoscape use. However, it’s also
possible that these bots store Cytoscape on local and departmental servers that in turn disseminate Cytoscape
to far more users.
Note that the increasing visitorship to cytoscape.org is mirrored in our real-time measurements, which show
Cytoscape being started approximately 4,000 times per weekday throughout the world (up 14% from last
year), and over 1,000 times per day during weekends and holidays, as shown below.
[Figure: Downloads by Version over Time (524,006), Feb 2014 through Jan 2017. Download Count by Month for
Cyto-2_8_2, Cyto-2_8_3, cytoscape-3.0.2, cytoscape-3.1.0, cytoscape-3.1.1, cytoscape-3.2.0, cytoscape-3.2.1,
cytoscape-3.3.0, cytoscape-3.4.0 and Total.]
[Figure: Daily Cytoscape Executions (2,389,824), 2/9/2014 through 2/7/2017. Daily Executions by Date.]
Note that the abnormal spike around November 23, 2016 is actual usage resulting from Cytoscape’s new
role as a service callable by external workflows (e.g., Jupyter) via the cyREST interface. An external workflow
apparently started a fresh copy of Cytoscape for each network, and processed approximately 8,800 networks.
We expect similar surges as Cytoscape becomes a common biological network server.
Note that in the future, we expect a divergence between actual Cytoscape usage and Cytoscape
downloads, as we have changed Cytoscape to download only portions of itself without requiring a full
download. We will attempt to account for these partial downloads in the future, though actual executions
should continue to represent true Cytoscape usage.
Breaking down the Cytoscape versions by operating system reveals more detail (not shown). About 70% of
Cytoscape downloads are for Windows, with 32 bit Windows fading in late 2014 (coinciding with Windows 8
shipments). Downloads for Macs rate a distant second (20%), with downloads for Linux being rare (10%).
As measured by Google Analytics, visits to the cytoscape.org web site are not accelerating as quickly year
over year, as shown below. This matches our expectations for fewer full Cytoscape downloads in favor of
automatic partial downloads. Note that troughs in the 2014 record at weeks 8 and 23 are most likely attributed
to data loss by Google Analytics.
As shown below, visits to cytoscape.org now number almost 1.9M (up 26% from last year) since the site
was created in 2012. While most visits to cytoscape.org are from the United States, these visits aren’t in the
majority. In fact, the second greatest source of visitors is “all the rest”, indicating that Cytoscape is popular
worldwide.
[Figure: Cytoscape.org visits (2012 - Feb 2017). Count by Week for 2012 Visits, 2013 Visits, 2014 Visits,
2015 Visits, 2016 Visits and 2017 Visits.]
While most users browse to cytoscape.org via a Google search (see below), a large number find it through
some other means (e.g., via class web sites) or by entering the URL directly.
Finally, the frequency with which Cytoscape is cited in papers indexed in Google Scholar is accelerating year
over year, as shown below. The citation rate increase between 2015 and 2016 is 24%, an increase from 14%
the prior year.
[Figure: Cytoscape.org Sessions (1,918,073), 1/1/2012 through 2/7/2017, by country: United States 27%,
China 8%, India 8%, United Kingdom 5%, Germany 5%, Japan 4%, France 4%, Canada 3%, Italy 3%, Spain 2%,
Other 31%.]
[Figure: Cytoscape.org Referral Sources (1,918,073), 1/1/2012 through 2/7/2017: google / organic 41%,
(direct) / (none) 17%, google / cpc 6%, google.com 5%, cytoscape.org 4%, baidu 2%, apps.cytoscape.org 1%,
google.co.in 1%, bing 1%, google.co.uk 1%, other 21%.]
And the distribution of funding agencies associated with these publications represents the breadth of
network biology approaches being applied through Cytoscape usage across major diseases and health
initiatives.
[Figure: Top 13 Funding Agencies for Cytoscape Citations (8,389 events since 2004): NIGMS 26%, NCI 19%,
NIAID 9%, NHLBI 8%, NIDDK 7%, NCRR 6%, NHGRI 5%, NIEHS 4%, NIMH 4%, NLM 3%, NINDS 3%, NIA 3%,
Wellcome Trust 3%.]
Cytoscape App Store
Since our last report (March 2016), we have not changed the Cytoscape App Store. We continue to promote
the App Developer Ladder (http://wiki.cytoscape.org/Cytoscape_3/AppDeveloper/Cytoscape_App_Ladder) as
a step-by-step guide through the app development and submission process. The App Store hosts over 307
apps developed by 674 different developers around the world. Cytoscape users have downloaded an average of 850
apps per day over the past 12 months, bringing the total to just over 760,000 app downloads since
the launch of the App Store. The top 3 downloaded apps, ClueGO, BiNGO and GeneMANIA, have
accumulated over 136,000 downloads combined. During the month of January 2017, the site received over
38,000 page views. As shown below, Cytoscape 3 app submissions continue to climb, on track to surpass ~7
years of 2.x plugin submissions in under 5 years. Separate inspection indicates that both new and experienced
app developers are submitting apps. The average submission rate remains between 2 and 3 new apps per
month.
Similarly, Cytoscape users continue to visit the App Store frequently. As shown below, there have been
433,950 visits since the store was staged in mid-2012, an average of 11 visitors per hour, with year-over-year
visits continuing to increase.
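The hourly average quoted above can be checked with a short calculation. This is a rough sketch only: the start date of 6/1/2012 is an assumption taken from the figure caption (the text says only "mid 2012"), and the end date is the 2/7/2017 reporting cutoff.

```python
from datetime import date

# Figures quoted in the report; the exact start day is an assumption
# ("mid 2012" is taken here as 6/1/2012, per the figure caption).
total_visits = 433_950
start = date(2012, 6, 1)
end = date(2017, 2, 7)

elapsed_days = (end - start).days
elapsed_hours = elapsed_days * 24
visits_per_hour = total_visits / elapsed_hours

print(f"{elapsed_days} days = {elapsed_hours} hours")
print(f"average visits per hour: {visits_per_hour:.1f}")
```

Under these assumptions the result rounds to the 11 visits per hour stated in the text.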
[Figure: Apps Checked into App Store. X axis: Month (7/1/2007 through 11/1/2016); Y axes: Monthly Count and Cumulative Count; series: v2 Total, v3 Total, v2 Count, v3 Count.]
As shown below, the proportion of visitors mirrors the visitorship of the cytoscape.org web site itself, with
the second largest group being “all the rest”, thereby demonstrating that Cytoscape is popular worldwide.
[Figure: App Store Visits per Week (433,950), 6/1/2012 through 2/7/2017. X axis: Week; Y axis: Count; one series per year, 2012 through 2017.]
The source of visitors also mirrors cytoscape.org, with most visitors arriving via Google search, and
numerous visitors arriving from unknown (possibly class) links.
[Figure: App Store Visits by country (476,043), 6/1/2012 through 2/7/2017. United States 25%; China 9%; India 6%; France 6%; United Kingdom 6%; Germany 6%; Japan 3%; Italy 3%; Canada 3%; South Korea 3%; Other 30%.]
Tumblr	
  
We use our Cytoscape Publications Tumblr site, http://cytoscape-publications.tumblr.com/, to capture
published figures using Cytoscape. An average of 12 publications are featured each month on the front page of
cytoscape.org directly from this Tumblr feed. Publications highlighted on Tumblr include Cytoscape App
development, cytoscape.js development and application of these two tools in research. Posts are tagged with
categories and the name of any apps used to facilitate search and filtering. Links to the relevant App page at
the Cytoscape App Store (described in Websites) increase traffic to and usage of the App Store. By
specifically highlighting publications that cite Cytoscape, the Tumblr site actively promotes the use and
citation of Cytoscape and Cytoscape Apps. In the last year, we have added an “open access publication” tag to
relevant posts, highlighting free and open publications that use Cytoscape. An overview of the wide range
of figures produced by Cytoscape and Cytoscape Apps is available via the archive feature at
http://cytoscape-publications.tumblr.com/archive.
To highlight a wider range of topics in network biology, our Network Biology Tumblr,
http://netbiopub.tumblr.com/, is used to highlight relevant publications. This Tumblr features a variety of
publication types, including reviews, new network biology algorithms and methods, tools and application of
network biology techniques. The posting frequency has increased in the past year to an average of 9
publications per month, with synchronized posts on the LinkedIn Group for Network Biology (described below)
as well.
F1000Research: Cytoscape App Channel
The F1000Research Cytoscape App Channel now has a total of 37 peer-reviewed articles, with 12 new
articles. The 12 new articles are:
• The PathLinker app: Connect the dots in protein interaction networks. Daniel P. Gil, Jeffrey N. Law, T.
M. Murali
[Figure: App Store Referral Sources (476,043), 6/1/2012 through 2/7/2017. google 31%; cytoscape.org 21%; (direct) 8%; pathwaycommons.org 1%; baidu 1%; genemania.org 1%; geneontology.org 1%; bing 1%; wiki.cytoscape.org 0%; opentutorials.cgl.ucsf.edu 0%; other 35%.]
• dot-app: a Graphviz-Cytoscape conversion plug-in. Braxton Fitts, Ziran Zhang, Massoud Maher, Barry
Demchak
• Creating, generating and comparing random network models with Network Randomizer. Gabriele
Tosadori, Ivan Bestvina, Fausto Spoto, Carlo Laudanna, Giovanni Scardoni
• CoNet app: inference of biological association networks using Cytoscape. Karoline Faust, Jeroen Raes
• Contextual Hub Analysis Tool (CHAT): A Cytoscape app for identifying contextually relevant hubs in
biological networks. Tanja Muetze, Ivan H. Goenawan, Heather L. Wiencko, Manuel Bernal-Llinares,
Kenneth Bryan, David J. Lynn
• cy3sabiork: A Cytoscape app for visualizing kinetic data from SABIO-RK. Matthias König
• AutoAnnotate: A Cytoscape app for summarizing networks with semantic annotations. Mike Kucera,
Ruth Isserlin, Arkady Arkhangorodsky, Gary D. Bader
• webANIMO: Improving the accessibility of ANIMO. Willem Siers, Michiel Bakker, Bob Rubbens, Ruben
Haasjes, Jacco Brandt, Stefano Schivo
• SCODE: A Cytoscape app for supervised complex detection in protein-protein interaction graphs.
Sarah Mohamed, Nick Janus, Yanjun Qi
• Robust de novo pathway enrichment with KeyPathwayMiner 5. Nicolas Alcaraz, Markus List, Martin
Dissing-Hansen, Marc Rehmsmeier, Qihua Tan, Jan Mollenhauer, Henrik J. Ditzel, Jan Baumbach
• CyLineUp: A Cytoscape app for visualizing data in network small multiples. Maria Cecília D. Costa,
Thijs Slijkhuis, Wilco Ligterink, Henk W.M. Hilhorst, Dick de Ridder, Harm Nijveen
• Finding the shortest path with PesCa: a tool for network reconstruction. Giovanni Scardoni, Gabriele
Tosadori, Sakshi Pratap, Fausto Spoto, Carlo Laudanna
B.5 How have results been disseminated to communities of interest? (8000 character, no pix)
Section B.2 described the accelerating visitorship and download trends for the main customer-facing portals:
nrnb.org, cytoscape.org and App Store web sites, in addition to the long list of secondary community sites.
B.6 What do you plan to do for the next reporting period to accomplish the goals?
Cytoscape App Store
We plan to create a version of the App Store that supports the Cytoscape Cyberinfrastructure (CI) described in
the Infrastructure section. Through the CI Store, application programmers will be able to discover the
existence, purpose, documentation, and API interface for services available for either immediate use or
installation on private servers.
Cytoscape CI
We will also develop a Docker-based system to assist CI service developers in packaging and disseminating
their services through the CI Store. The system will include documentation standards, testing standards,
packaging of service installation files, and submission procedures.
C.5.b Resource Sharing
All cytoscape.org and Cytoscape App Store code and artifacts are open source and free to the public, with title
vested in The Cytoscape Consortium 501(c)3 non-profit corporation. As open source projects, we welcome
audit or participation by all qualified or interested parties, subject to the terms of our published licenses. We
specify the LGPL2.1 license as modified on the Cytoscape web site (http://cytoscape.org/download.php). All
Cytoscape related project code is freely available at GitHub: https://github.com/cytoscape/.
All NRNB project code is also available at GitHub. We created and maintain almost 100 open source
repositories as the NRNB organization (https://github.com/nrnb/).
C.3 Technologies or Techniques (2000 characters)
The Cytoscape Cyberinfrastructure is based on a microservice (aka service) variant of Service Oriented
Architectures. As described in section B.2, major technological innovations include the CX network interchange
format, the cyWidget architecture, and the Kubernetes cluster architecture. Cytoscape itself will call services in
the Kubernetes cluster, will exchange networks encoded in CX, and will incorporate user interface elements
written as cyWidgets.
The CX Network Interchange Format
CX is an aspect-oriented network interchange format that enables networks to be transmitted between diverse
services. It is designed for flexibility, modularity, and extensibility, and for use as a JSON-based message payload in
common REST protocols. It enables applications to standardize on core aspects of networks, coordinate on
more specific standards within CX, and to ignore or omit irrelevant aspects. It is not intended as an optimized
format for storage or for specific functionality in applications.
CX is distinct from other network formats in that it seeks to avoid the gridlock of a standard requiring
constant centralized coordination. Aspect-orientation means that different types of information about network
elements are separated into independent, composable modules that follow simple guidelines for dependency.
This structure makes it easy for a given application or service to make use of relevant aspects (e.g., nodes and
edges) while ignoring others (e.g., styling and layout). CX provides straightforward strategies for lossless
encoding of semantically complex formats such as OWL, BioPAX, OpenBEL, SGML, or SBGN, while at the
same time enabling the expression of simple networks without undue overhead.
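The aspect-oriented idea can be illustrated with a minimal sketch. The document below is a simplified, CX-style JSON list of aspect fragments. The aspect names, field names, and the helper collect_aspects are illustrative assumptions for this sketch, not the normative CX specification or any NRNB library API. The consumer reads only the node and edge aspects and ignores layout.

```python
import json

# A simplified, CX-style document: a JSON list of aspect fragments.
# Aspect names and field names here are illustrative assumptions.
cx_document = [
    {"nodes": [{"@id": 1, "n": "TP53"}, {"@id": 2, "n": "MDM2"}]},
    {"edges": [{"@id": 3, "s": 2, "t": 1, "i": "inhibits"}]},
    {"cartesianLayout": [{"node": 1, "x": 0.0, "y": 0.0},
                         {"node": 2, "x": 100.0, "y": 50.0}]},
]


def collect_aspects(document, wanted):
    """Gather elements of the wanted aspects, skipping all other aspects."""
    collected = {name: [] for name in wanted}
    for fragment in document:
        for aspect_name, elements in fragment.items():
            if aspect_name in collected:
                collected[aspect_name].extend(elements)
    return collected


# Round-trip through JSON text, as a REST message payload would be.
payload = json.dumps(cx_document)
topology = collect_aspects(json.loads(payload), wanted=("nodes", "edges"))

# Use only the topology; the layout aspect was never touched.
names = {node["@id"]: node["n"] for node in topology["nodes"]}
for edge in topology["edges"]:
    print(f'{names[edge["s"]]} -{edge["i"]}-> {names[edge["t"]]}')
```

Because each aspect is an independent fragment, a service that cares only about topology never needs to parse or even recognize the styling and layout aspects.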
We have created CX encoding and transmission libraries in Java, Python, and GoLang. The encoding time
for a 100K node network was clocked at about 50ms on a modern Mac Pro, thereby reducing concerns that CX
may be expensive to encode and decode. Cytoscape Desktop currently exchanges CX-encoded networks with
NDEx, and we expect it to call other services written to transact CX.
The cyWidget System
A cyWidget is a software component that implements some user interface function in a browser-based web
application, is easily reusable across new and existing web applications, and may communicate with services
to perform backend or expensive computation. A cyWidget is built on Facebook’s React framework, which
is a simple event-driven, message-oriented, model-view-controller implementation in JavaScript. By construction,
a cyWidget exposes a function-specific API connected to code that implements the cyWidget. Several
cyWidgets can coexist in the same web application, and are intended to. A cyWidget can keep its own state
information or update state common to all cyWidgets, which is then broadcast to all cyWidgets.
A simple example of a cyWidget is a network displayer that calls NDEx to fetch a network and then uses
the cytoscape.js drawing library to render the network in a browser frame. The NDEx Valet cyWidget will
orchestrate multiple embedded cyWidgets to fetch a network list from NDEx, display network metadata, and
allow the end user to choose a network. A more complex cyWidget ecosystem contains a network as its basic
state, and then allows the app programmer to add in network, table, style, and other cyWidgets to create a
complete web app.
A web app programmer incorporating a cyWidget need enter only a few lines of header directives at the
beginning of Javascript-enabled web page code. As cyWidgets are self-contained, no particular web app
framework is required – apps written using raw Javascript, Angular, and other frameworks can use cyWidgets.
A cyWidget author must become familiar with React in order to create a new cyWidget. Given that React is
a simple framework with simple constructs, we believe it to be plausible that numerous cyWidget authors may
arise besides our own.
The qualities of encapsulation, simple widget use, simple widget creation, and non-proprietary ownership
drove our decision to use React. We evaluated Angular, Angular 2, Web Components, React, and Aurelia.
In an upcoming release, Cytoscape Desktop will include NDEx Valet as its primary user interface to NDEx,
and will benefit directly as NDEx Valet is improved to assist users more intelligently.
The Kubernetes Cluster
Kubernetes (http://kubernetes.io) is an open-source framework for deploying scalable service-oriented
infrastructures. As such, it is a middleware layer that intervenes between a REST client (e.g., a Cytoscape or
web app) and a service. Given a service (likely written by a computational biologist to expose a valuable
calculation), Kubernetes enables multiple clients to call it simultaneously by creating multiple service instances
on one or more physical servers, and then matching actual calls to available instances. The multiple instance
strategy ensures that
To participate in the NRNB Kubernetes cluster, the calculation author must execute a two-step process. In
step one, the author creates a service by pairing the calculation function with the CI service wrapper, which 1)
listens for an HTTP connection, 2) processes/unbundles the HTTP stream, 3) calls the author’s function, and
then 4) returns an HTTP stream containing the result. The author can deploy the service on a private
workstation or server as-is for debugging.
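The four wrapper responsibilities in step one can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the actual CI service wrapper: the names count_elements, make_handler, and serve are hypothetical, and the JSON request and response shapes are invented for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def count_elements(network):
    """Hypothetical stand-in for the author's calculation function."""
    return {"nodes": len(network.get("nodes", [])),
            "edges": len(network.get("edges", []))}


def make_handler(calculation):
    """Pair a calculation function with a minimal HTTP wrapper."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            # 2) Unbundle the HTTP stream into a Python object.
            length = int(self.headers.get("Content-Length", 0))
            request = json.loads(self.rfile.read(length) or b"{}")
            # 3) Call the author's function.
            result = calculation(request)
            # 4) Return an HTTP stream containing the result.
            body = json.dumps(result).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet; a real service would log here

    return Handler


def serve(calculation, host="127.0.0.1", port=8080):
    """1) Listen for HTTP connections until interrupted."""
    HTTPServer((host, port), make_handler(calculation)).serve_forever()
```

Calling serve(count_elements) on a workstation starts the service locally, which corresponds to the as-is debugging deployment described above. The host and port defaults are arbitrary choices for the sketch.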
In step two, the author packages the service as a Docker container, registers it with the Kubernetes
framework and defines the number of servers on which it should be deployed. Kubernetes deploys the multiple
service instances and monitors them so that should a service instance fail, it is replaced with a new one.
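The deploy-and-replace behavior described in step two can be pictured with a toy model. The sketch below is not the Kubernetes API. ToyCluster and its methods are hypothetical names used only to illustrate the idea of keeping a desired number of instances running and matching calls to healthy ones.

```python
import itertools


class ToyCluster:
    """Toy stand-in for a cluster manager; not the Kubernetes API."""

    def __init__(self, desired_replicas):
        self.desired_replicas = desired_replicas
        self.instances = {}              # instance id -> healthy flag
        self._ids = itertools.count(1)
        self._next_call = 0

    def reconcile(self):
        """Drop failed instances and start new ones up to the desired count."""
        self.instances = {i: ok for i, ok in self.instances.items() if ok}
        while len(self.instances) < self.desired_replicas:
            self.instances[next(self._ids)] = True

    def fail(self, instance_id):
        """Simulate a crash of one service instance."""
        self.instances[instance_id] = False

    def dispatch(self):
        """Match a call to a healthy instance, rotating between them."""
        healthy = sorted(i for i, ok in self.instances.items() if ok)
        if not healthy:
            raise RuntimeError("no healthy instance available")
        chosen = healthy[self._next_call % len(healthy)]
        self._next_call += 1
        return chosen


cluster = ToyCluster(desired_replicas=3)
cluster.reconcile()    # starts instances 1, 2 and 3
cluster.fail(2)        # simulate a crash of instance 2
cluster.reconcile()    # instance 2 is dropped and replaced by instance 4
print(sorted(cluster.instances))
```

In the real system the author only declares the desired count. The framework performs the monitoring and replacement that reconcile imitates here.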
While Kubernetes addresses instance deployment and load balancing, it does not provide the logging
necessary for service debugging and management. We use the Elastic Stack (formerly ELK,
http://www.elastic.co/) to provide log management and visualization.
C.5.b Resource Sharing (2000 characters)
All Cytoscape Desktop and Cytoscape Cyberinfrastructure (CI) code and artifacts are open source and free
to the public, with title vested in The Cytoscape Consortium 501(c)3 non-profit corporation. As open source
projects, we welcome audit or participation by all qualified or interested parties, subject to the terms of our
published licenses. For Cytoscape Desktop, we specify the LGPL2.1 license as modified on the Cytoscape
web site (http://cytoscape.org/download.php). We have not yet designated the license terms for the Cytoscape
CI infrastructure, but they will likely be LGPL2.1 or MIT.
All such code is available in GitHub repositories (i.e., http://github.com/cytoscape,
http://github.com/cytoscape-ci, http://github.com/cycomponent and http://github.com/idekerlab). Artifacts such
as user documents, tutorials, issue databases, developer guides, and so on are available for reading on
Cytoscape wikis (e.g., http://wiki.cytoscape.org) or web sites (e.g., http://cytoscape.org).
Networks derived or produced by NRNB activities will be placed in the NDEx network database and will be
accessible via the CX, cyWidget, and Kubernetes technologies embedded in Cytoscape Desktop and
elsewhere, as described in sections B.2 and C.3.
The NRNB Cluster is available to the general research community. In 2016 the Nick Schork Lab
used approximately 10,000 hours of compute time from the approximately 250,000 hours available in the
cluster, and the Hannah Carter Lab used approximately 100,000 hours.
Training	
  
B.2 What was accomplished under these goals?
Workshops and lectures and courses
In addition to the global training support provided by our Training Coordinator, Dr. Morris, we also leverage the
fact that we are a multi-site resource and are thus able to host local training events on multiple campuses. We
also provide materials, training and advertising for events presented by non-NRNB staff (not listed). The table
below lists the events since our last annual report. Additional one-on-one training requests are tracked as
services in our CSP report.
Event Title | NRNB Staff/Site | City | Year | Event Type
Cytoscape Workshop 2017 | Barry Demchak | San Diego, CA | 2017 | Workshop
Systems Pharmacology course | Scooter Morris | San Francisco, CA | 2017 | Course
RECOMB/ISCB | Barry Demchak | Phoenix, AZ | 2016 | Workshop
Cytoscape Clinic | Barry Demchak | San Diego, CA | 2016 | Workshop
NetBio SIG Meeting | Alex Pico | Orlando, FL | 2016 | Lecture
Medical Biophysics students tech talk: Network and Pathway Analysis with Cytoscape | Gary Bader/Veronique Voisin | Toronto, Canada | 2016 | Workshop
Introduction to Network Analysis | Scooter Morris | Lausanne, Switzerland | 2016 | Workshop
Two-day workshop: Visualizing Complex Networks Using Cytoscape (NCI BTEP training series) | Scooter Morris | Bethesda, Maryland | 2016 | Workshop
Two day workshop on protein-protein interactions | Scooter Morris | Boulder, Colorado | 2016 | Workshop
Two half-day Cytoscape workshops at the eScience Symposium: Big Data in Precision Medicine | Scooter Morris | Odense, Denmark | 2016 | Workshop
Two-day graduate course at Eötvös Loránd University on protein-protein interactions | Scooter Morris | Budapest, Hungary | 2016 | Course
EMBO Practical Course on Computational Analysis of protein-protein interactions | Scooter Morris | Budapest, Hungary | 2016 | Course
GLBio conference satellite workshop: Network Visualization and Analysis with Cytoscape. | Gary Bader/Veronique Voisin | Toronto, Canada | 2016 | Workshop
CBW Pathway and Network Analysis of -omics Data | Gary Bader/Veronique Voisin | Toronto, Canada | 2016 | Workshop
Google Summer of Code
After taking a year off from Google Summer of Code (GSoC) in 2015, and instead running our own summer
training program (NRNB Academy Summer Session), we gathered over 50 project ideas and close to 40
mentors for GSoC 2016. We were accepted as a mentoring organization and had one of our most successful years yet, with
all 15 enrolled students completing their projects. New for this year was also the development of a Mentor
Resource Packet, a collection of resources designed to help mentors with recruiting students. The packet
includes tips on how and where to recruit, as well as ready-to-use slides, flyers and other materials. In addition
to the technical accomplishments and productivity of our students, we are also proud of the many important
aspects of diversity our students represent in the GSoC program, including geographical, gender and
academic. A few statistics on our diversity are listed below, with overall GSoC numbers in
parentheses:
• 9 different countries represented, including 1 (of 2) from Croatia, 1 (of 3) from Armenia and 2 (of 12) from
Turkey
• 20% female (compared to 12% overall)
• Only 67% Computer Science (compared to 78% overall), including PhD students in Biological
Oceanography and Medical Biochemistry & Biotechnology, an MS student in Bioinformatics, and a pre-
med undergraduate.
Our complete 2016 end-of-year report can be found here: http://nrnb.org/gsoc-reports.html. We have
received an abundance of testimonials from students and mentors, a subset of which are available on our
website: http://nrnb.org/testimonials.html#collab-tab.
NRNB Academy
Our year-round NRNB Academy program continues to attract interested students and mentors and in 2016 we
had 3 students enrolled, with 2 of the projects completed (one still active).
We have received an abundance of testimonials from students and mentors, a subset of which are
available on our website: http://nrnb.org/testimonials.html#collab-tab.
Cytoscape manual
The Cytoscape User Manual is available on the ReadTheDocs.org platform,
http://manual.cytoscape.org/en/stable/, and represents a comprehensive source of instructions for users for
every aspect and feature of Cytoscape, including more technical aspects such as the API. The manual
includes several tutorials and many hands-on examples of use. The manual is updated for every major release
of Cytoscape. The most recent updates to the manual were in December of 2016. In an effort to streamline the
process of maintaining the manual as well as improving the usability of the manual, we migrated our existing
wiki manual to the ReadTheDocs.org platform in April 2016. This system is integrated with GitHub, supports
markdown and can be integrated with Google Analytics.
OpenTutorials	
  
Open Tutorials (http://opentutorials.cgl.ucsf.edu/index.php/Main_Page) is the main source for tutorial materials
for Cytoscape and other NRNB tools, and is being used both internally by presenters, and by researchers and
developers. The site now hosts 6 detailed user tutorials and 3 developer tutorials. Traffic to Open Tutorials is
consistent, with over 66,000 unique sessions in the last year. Visits are split roughly 60-40 between new and
returning visitors, with Cytoscape 3 user tutorials being the most popular pages.
Open Tutorials has allowed NRNB to reach our goal of providing tutorial support to a broad and diverse
community. In the upcoming year, we plan to move all Cytoscape tutorials to the ReadTheDocs.org platform,
where the Cytoscape manual is hosted. This move will facilitate updating and maintenance of tutorial content,
as well as making the content more accessible and searchable for users. It also facilitates creation of handout
materials for presenters.
http://tutorials.cytoscape.org/
B.4 What training opportunities
All of the activities reported in this component are providing training opportunities. These are opportunities that
in most cases would not exist without NRNB staff and support. Each year we provide hundreds of researchers an
introduction to network biology concepts and Cytoscape usage. We also train dozens of programmers to
write apps for Cytoscape to provide domain-specific functionality to the platform. These programs have been
very successful so far. This is evident from the testimonials we collect via survey following each event:
http://nrnb.org/testimonials.html#collab-tab. Here are snippets from this year’s students and mentors in our
Google Summer of Code and NRNB Academy programs:
“The NRNB program is a fantastic opportunity to gain skills and work experience in network biology
and app development, at any stage in your academic career. I came in as a graduate student with
only a few months of coding experience and now I've released my first application. Exhilarating!”
“It has helped improve the software developed by my group. It has also given me experience in
mentoring someone long distance.”
“Working in an NRNB training program helped to strengthen my resume and introduced me to the
idea of combining a career in medicine with computer-based research.”
“Great opportunity for developing mentoring and supervising skills as well as get my software tools
developed.”
“This was my first ever contribution to an open source project and NRNB also. This milestone will
shine on my CV forever.”
“Great experience interacting with the community and my mentor. I was excited to receive help and
encouragement for my project.”
“Learned how to work in a collaboration, formulate better questions. Gained especially invaluable
knowledge and experience. Improved coding skills. Learned new programs and libraries.”
“It broadened my mind to issues still unsolved in the network biology community, and I gained
resources and colleagues in the community that I otherwise wouldn't have.”
“Personally, I see great value in interacting with smart, young people from all around the world. I am
optimistic that participating in NRNB training programs will benefit my own research group by giving
it wider exposure and by building a community around the software.”
“I am continuing to work for Cytoscape.js and am happy to being staying involved.”
“The program has been great experience for my students. They not only learned about open source
community driven projects, but the work they did has contributed to their future research.”
“The program gave me a chance to work with students in projects of mutual interest and to develop
my tools faster and more efficient.”
B.6 What do you plan to do for the next reporting period to accomplish the goals?
We recently submitted our application for GSoC 2017. If accepted, this should be one of our largest years yet.
We have more mentors and more project ideas than prior years and are continuing a more coordinated
outreach effort with a Mentor Resource Packet that we will distribute to all NRNB mentors. This resource was
developed in 2016, and is meant to help mentors contact and communicate with various student bodies that
are likely to have the skill and interest to participate in GSoC 2017.
Admin	
  
B.2: What was accomplished
Measuring success:
• 118 publications citing NRNB grant
• Over 8000 visits per week to Cytoscape.org
• 17,000 downloads per month for Cytoscape
• 3700 Cytoscape application launches per day
• 38,261 page views in January 2017 for the Cytoscape App Store, and an average of 875 downloads per
day among 307 apps.
• A total of 18 tools supported by NRNB
• 93 new and ongoing collaborations with external investigators on diverse topics
• 3 students trained at NRNB Academy last year, 2 completed projects
• 15 students trained through Google Summer of Code
• 16 NRNB coordinated training events in 10 locations in 6 countries
• Over 100 users and dozens of developers trained on Cytoscape by NRNB staff
• 66,000 unique sessions at Open Tutorials in the past year, 65% from new visitors
• 12 open access Cytoscape app articles edited for F1000Research channel
• 500 members in our Network Biology LinkedIn group
• 2800 members and over 7000 messages on our Google groups for Cytoscape
LinkedIn	
  
We manage a LinkedIn Group for Network Biology to organize events, publications and discussions in the
broader scientific community. Nucleated with attendees of the annual NetBio community meetings
(ISCB/ECCB), the group now has 500 members. Posts from our Network Biology Tumblr are also promoted
here, as are updates on Cytoscape and NetBio SIG meeting news and presentations. In the past year, we
have had an average of 3-4 posts per month.
It is worth noting that this community interface is independent of Cytoscape context and thus represents an
opportunity to engage a more diverse set of researchers:
http://www.linkedin.com/groups/Network-Biology-Group-5123610.
OpenTutorials	
  
Our tutorial management system, Open Tutorials, was developed in the first year of funding. It is the main
source for tutorial materials for NRNB tools, including Cytoscape. Tutorials are actively updated and new
content added with each major release of Cytoscape. Open Tutorials has allowed NRNB to reach our goal of
providing tutorial support to a broad and diverse community. Currently, the site includes tutorials for users as
well as for developers. http://tutorials.cytoscape.org/
Google Summer of Code
One of our most successful training initiatives has been participating in Google’s Summer of Code (GSoC)
program (https://developers.google.com/open-source/soc). Each summer, Google sponsors students to work
at open source organizations to develop code for open source software projects. The NRNB executive director,
Dr. Pico, administers the NRNB effort in this program and has experience as a GSoC org admin going back 4
years prior to NRNB, focusing mainly on Cytoscape- and WikiPathways-related projects.
During summer of 2016, the NRNB trained 15 students as part of GSoC. It was one of our most successful
years, with all enrolled students completing their projects. After taking a year off from GSoC in 2015, we pulled
together over 50 project ideas and dozens of mentors. The projects covered a wide range of topics, including
algorithm, UI, importer and converter development on both web and desktop
for Cytoscape, cytoscape.js, SBML, SBGN, cBioPortal, Cell Designer, GraphSpace and more. New for this
year was also the development of a Mentor Resource Packet, a collection of resources designed to help
mentors with recruiting students. The packet includes tips on how and where to recruit, as well as ready-to-use
slides, flyers and other materials.
Our complete 2016 end-of-year report can be found here: http://nrnb.org/gsoc-reports.html. We have received
an abundance of testimonials from students and mentors, a subset of which are available on our website:
http://nrnb.org/testimonials.html#collab-tab.
NRNB Academy
Our year-round NRNB Academy program was started to offer students training opportunities year-round and to
build on the momentum from our Google Summer of Code participation. The program continues to attract
interested students and mentors with an application process that utilizes the same web form and tracking
infrastructure we built for NRNB Collaborations. The Outreach Coordinator, Ms. Hanspers, acts as the dean of
NRNB Academy and reviews each applicant. She identifies the appropriate NRNB staff to serve as a possible
mentor and manages the initialization of their mentored project, borrowing from her years of experience as co-
admin and co-mentor with GSoC. The NRNB Academy projects include milestones and deadlines, and are
expected to wrap up with a completed feature or application. In addition to advancing network biology tools,
this program serves as an important training opportunity and drives interest for our GSoC effort and NRNB tool
development in general.
We have received an abundance of testimonials from students and mentors, a subset of which are available
on our website: http://nrnb.org/testimonials.html#collab-tab
In the past year, we have mentored 3 students through this program, two of whom completed projects in 2016
(one ongoing). One of the completed projects was published on the F1000Research Cytoscape App Channel:
https://f1000research.com/articles/5-2524/v1
F1000Research: Cytoscape App Channel
The F1000Research Cytoscape App Channel was started in 2014 with the purpose of highlighting apps with
clear use cases for network researchers as well as relevant implementation tips for other app developers.
NRNB staff act as guest editors periodically to help shepherd and improve app article submissions from the
community. In 2016, 12 Cytoscape app articles were published through this channel, on topics including
random network generation, kinetic data visualization, semantic network annotation and network inference.
Cytoscape manual
The Cytoscape User Manual was migrated to the ReadtheDocs platform in March 2016, and is now available
at http://manual.cytoscape.org/en/stable/. This documentation system is integrated with GitHub, greatly
facilitating updates. It also supports markdown and can be integrated with Google Analytics. The interface
automatically generates a PDF upon every update, and also has several benefits in terms of usability. A table
of contents and a search feature are included automatically as a side panel, making navigation of the materials
easier.
The Cytoscape manual represents a comprehensive source of instructions for users for every aspect and
feature of Cytoscape, including more technical aspects such as the API. The manual includes several tutorials
and many hands-on examples of use. The manual is updated continually and for every major release of
Cytoscape, most recently in December 2016, with the next one pending March 2017.
Cytoscape mailing lists
Members of NRNB staff help monitor and answer posts on all associated Google Groups on a weekly basis. This is a
major source of interaction with the network biology community and supports a broad range of research and
development across many labs. During this reporting period we launched a new group dedicated to app
developers, which has become a critical channel for communication. We now have ~2800 members across our user
and app developer mailing lists.
B.5: How have results been disseminated
NRNB tools and resources are disseminated to the biomedical research community through a variety
of channels:
• NRNB resources and information are disseminated through NRNB.org.
• The Cytoscape website, http://www.cytoscape.org, is the main source for information on the Cytoscape
project and for downloading the tool.
• Cytoscape Apps are highlighted, organized and disseminated via the Cytoscape App Store:
http://apps.cytoscape.org/
• Two Cytoscape mailing lists, https://groups.google.com/forum/#!forum/cytoscape-app-dev and
https://groups.google.com/forum/#!forum/cytoscape-helpdesk, are the main points of contact with users and
the app developer community.
• Cytoscape user and developer tutorials are continually updated and expanded at Open Tutorials:
http://opentutorials.cgl.ucsf.edu/index.php/Main_Page
• The Cytoscape user manual is available via ReadTheDocs: http://manual.cytoscape.org/en/stable/
• Publications utilizing Cytoscape or describing new Cytoscape Apps are highlighted on our Cytoscape
Publications Tumblr: http://cytoscape-publications.tumblr.com/
• Publications describing methods, resources, tools and research related to network biology are posted on
our Network Biology Publications Tumblr: http://netbiopub.tumblr.com/
• Relevant news, articles and events are posted on our LinkedIn Network Biology Group:
https://www.linkedin.com/groups/5123610
• Articles describing network biology related Cytoscape apps are published through the F1000Research
Cytoscape App Channel: http://f1000research.com/channels/cytoscapeapps
• Student programmers are trained in open source development of network biology related tools through GSoC
and the NRNB Academy: http://www.nrnb.org/gsoc.html
• Community training events, such as workshops, tutorials, seminars and courses, are organized and tracked.
B.6: What you plan to do next
F1000Research: Cytoscape App Channel
Acting as guest editors, NRNB staff plan to collect a new batch of Cytoscape app articles for our
F1000Research channel to be published in July 2016.
Open Tutorials
With a renewed focus on core protocols among common Cytoscape use cases, we plan to produce more
complex, task-based tutorials, each describing a specific workflow that involves interplay among several tools
to answer specific questions in network biology. We also plan to migrate our tutorials to a new platform that will
better support usability, discoverability and maintainability. For example, we are testing the platform that hosts
our manual, ReadTheDocs, to see if the same benefits apply to our tutorials.
Cytoscape Manual
We will continue hosting the Cytoscape manual via ReadTheDocs. This system has proven easy to use not
only for users and developers, but also for the team members who create and maintain the content.
Tumblr and LinkedIn
We will continue to post publications to our two Tumblr archives. Relevant posts will also be promoted on the
Network Biology LinkedIn group. We will also increase our efforts to reach out to app developers and
network biology researchers through LinkedIn to highlight their articles.
Metrics
We will continue to track publications, collaborations, training events, NRNB tool usage, and community
engagement, in addition to our own progress on NRNB technology research and development aims.

NRNB Annual Report 2017

  • 1.
    Overall     B.2:  What  was  accomplished   Highlights  from  the  past  year  include:     • 118 publications citing NRNB grant • Over 8000 visits per week to Cytoscape.org • 17,000 downloads per month for Cytoscape • 3700 Cytoscape application launches per day • 38,261 page views in January 2017 for the Cytoscape App Store, and an average of 875 downloads per day among 307 apps. • A total of 18 tools supported by NRNB • 93 new and ongoing collaborations with external investigators on diverse topics • 3 students trained at NRNB Academy • 15 students trained through Google Summer of Code • 16 NRNB coordinated training events in 10 locations in 6 countries • Over 100 users and dozens of developers trained on Cytoscape by NRNB staff • 66,000 unique sessions at Open Tutorials in the past year, 65% from new visitors Technology  Research  and  Development   Progress on the first theme of Differential Networks includes work on an improved perturbation biology method applied to the modeling of time resolved drug response measurements in melanoma cells. The temporal response to CDK4 inhibition in liposarcoma is a driving biology project (DBP 7 with Forest White) we are investigating in terms of signaling networks. We also continued to develop protein-protein interaction network alignment algorithms in support of DBP2 with Drs. Marc Vidal and David Hill. And we implemented new tools in Cytoscape for working with mass spec data to facilitate future differential network analysis. This work was shared with our DBP 1, the Krogan lab, from which we continue to collect valuable end-user input to design and prioritize our tool development. The second theme of Descriptive to Predictive Networks saw progress on two specific sub-aims. We developed a supervised patient classification framework, called netDx, that uses patient similarity networks in a generalizable and integrative manner. 
In support of DBP 5 with Sage Bionetworks, we reanalyzed all available DREAM challenge datasets. In terms of biomarker identification for cancer progression and treatment response, we are developing the platform for data preparation and prototyping regression analyses that will inform future network-constrained regression analysis, such as GELnet. The application of this technology to drug response prediction for the NCI compound library is part of our DBP 8 with Dr. Pommier. Finally, we developed a supervised machine learning model, called DeepCell, that uses the hierarchical structure of the cell and heat diffusion modeled signaling to simulate cell behavior. Using both the Gene Ontology and a data- driven ontology, DeepCell can accurately predict phenotypes including both growth rate and genetic interaction score across a range of genetic interaction scores. Progress on the third theme of Multi-scale Networks includes the development of a general progressive procedure, Active Interaction Mapping, which was used to assemble a comprehensive ontology of functions for autophagy. This work continues to be motivated by the data and prediction challenges in DBP 3 and 4, Mike Cherry (GO) and TCGA projects. We have also begun experimenting with using single cell RNA-seq data to improve the resolution of inferred cell-cell interaction networks. These are being applied to cancer stem cell biology and regenerative medicine. This work is being driven by Dr. Zandstra’s sustained interest in both inter- cellular networks and cell fate regulation, DBP 9.
  • 2.
    NRNB  Workgroups   Inaddition to our TRD projects, we launched two new initiatives to foster greater interaction and collaboration across NRNB sites and to track opportunities in developing research areas relevant to network biology. We are calling these NRNB Workgroups. The first workgroup is focused on Single Cell Genomics and Stem Cell Research. NRNB staff from the Pico, Morris and Bader groups are meeting quarterly to coordinate on tools, datasets and pilot projects in this area. Each group has collaborators in this area that may develop into CSPs or even DBPs for future TRD projects. In addition to collecting resources and comparing notes, we have identified two specific pilot projects to work on as an NRNB team: cell-cell interactions networks and improvements over t-SNE plots for data analysis and visualization. The former will involve curating pathways that are transcriptionally active downstream from receptors involved in cell-cell communication, followed by pathway analysis on single cell datasets. The latter will involve exploring alternative clustering algorithms, such as SIMLR, and performing a comparative assessment using Cytoscape protocols. The second workgroup is focused on Patient Similarity Networks (PSNs). Within the scope of technology development, both NDEx (Ideker) and cBioPortal (Sander) will be leveraged in this project. The Bader group is currently working on depositing PSNs into NDEx with sufficient metadata and filter options. The Pico group is also working on modeling WikiPathways content for deposition into both NDEx and Pathway Commons. These efforts represent improved interoperability among NRNB resources. Data visualization at cBioPortal and import into Cytoscape are also being explored in order to facilitate analysis, for example using Network Based Stratification methods described in prior NRNB reports, and integrated data views. 
We will continue to lead the coordinated efforts of these workgroups during the next reporting period and spawn new workgroups as new data types and opportunities are presented. Collaboration  and  Service  Projects   NRNB staff have initiated 18 new collaboration and service projects over the reporting period, for a total of 93 collaborations maintained or completed over the last year. A summary table is provided in the CSP component report, along with summaries of major project from each of the four sites led by the co-PIs. In broad strokes, the projects span patient similarity network methods development applied to various cancers, cell-cell interaction network analysis applied to Head and Neck cancer, EnrichmentMap network analysis on single cell RNA-Seq data applied to colorectal cancer, as well as standard network and pathways analysis of stimulated T cells, neuroinflammation, glutathione metabolism, microgravity effects, drug transporters, Multiple Sclerosis and Huntington’s Disease. In addition to service-based collaborations, we managed the code development collaborations between ~45 mentors and ~18 students during this reporting period through the Google Summer of Code and NRNB Academy programs, combined. Infrastructure   In our 2016 report, we described the creation of initial technologies needed to create an ecosystem of biologically valuable Internet-based services that exchange network data in a stable, performant, scalable, reusable, recombinable and reliable manner. In 2016, we created and released Diffusion, which is the first Cytoscape app to demonstrate the integration of basic CI service technologies (e.g., Cytoscap apps, CX, request routing and service deployment). The Diffusion app calls the CI’s new Diffusion service, which uses a heat propagation approach to identify subnetworks worthy of focused study, given a list of nodes in a large network. 
The Diffusion service demonstrates how the CI can allow typical biological programmers to dramatically increase the audience for their code and gain access to Internet-scale computational resources. The CI framework on which Diffusion is built leverages modern Kubernetes cluster technology (to augment Elsa, see section C.3), common CX-based message formats, server-based interface stubs, call metering and central logging to enable biological programmers to package algorithmic code as a highly scalable, highly available microservice with access to server- and cluster-class computing resources. We also launched cyREST2 as a significant expansion of the highly successful cyREST Cytoscape feature (http://apps.cytoscape.org/apps/cyrest). Critically, cyREST2 will enable access to functionality available through Cytoscape apps, including enrichment, clustering, network acquisition, enhanced graphics and graph analysis. We will work with both the Python/Jupyter and R communities to upgrade their cyREST interface support (e.g., http://bioconductor.org/packages/release/bioc/html/RCy3.html). And we created the deep-cell
  • 3.
    web app andservice in support the Deep Cell phenotype prediction research described in the complete Infrastructure report. Finally, both the Cytoscape App Store and NRNB Compute Cluster continue to thrive and serve as major NRNB infrastructure components. Dissemination   NRNB.org is the main web site for the National Resource for Network Biology and serves as the primary source of disseminating NRNB resources and associated information. It is constantly updated with information for NRNB collaborators and researchers as well as the larger network biology community. The site includes our project description and annual reports, available tools and resources, links to training materials, programs and events, and instruction in how to collaborate. The attentive maintenance and updating of the site helps make the #2 Google search result for "network biology tools", second only to Cytoscape.org, an NRNB supported tool. NRNB.org is the #3 result even when searching for just "network biology". These are global, non- personalized results. Over the past year, traffic to the site averages about 840 visits per month. Since the site went live in late 2010, we have had over 84,000 visits. Since our last report (March 2016), we significantly improved our discourse on Cytoscape history and future directions: http://www.cytoscape.org/roadmap.html. It now more plainly lays out our vision for Themes and Features for future releases, and is laid out to improve communication with users, developers and curious parties. Since January 2016, monthly downloads have grown to an average of 17,000 per month for the year, with Cytoscape v3 accounting for the vast majority of downloads. Extensive graphs and descriptions of Cytoscape usage are provided in the Dissemination report. During this period, the Cytoscape App Store, which was created as an NRNB supplement project, continues to serve as the major source of dissemination for Cytoscape apps and related documentation. 
The App Store hosts over 307 apps developed by 674 different developers around the world. Cytoscape users download an average of 850 apps per day over the past 12 months. That has accumulated to just over 760,000 total app downloads since the launch of the App Store. The top 3 downloaded apps, ClueGO, BiNGO and GeneMANIA, have accumulated over 136,000 downloads combined. During the month of January 2017, the site received over 38,000 page views. Graphs of app submissions, site visits and referral sources are all provided in the Dissemination report. NRNB staff members are responsible for maintaining these additional sources of dissemination: • Three Cytoscape mailing lists: helpdesk, app-dev and cytostaff • Open Tutorials: http://opentutorials.cgl.ucsf.edu/index.php/Main_Page • Cytoscape Publications Tumblr: http://cytoscape-publications.tumblr.com/ • Network Biology Publications Tumblr: http://netbiopub.tumblr.com/ • LinkedIn Network Biology Group: https://www.linkedin.com/groups/5123610 • F1000Research Cytoscape App Channel: http://f1000research.com/channels/cytoscapeapps • GSoC and NRNB Academy: http://www.nrnb.org/gsoc.html Training   In addition to the global training support provided by our Training Coordinator, Dr. Morris, we also leverage the fact that we are a multi-site resource and are thus able to host local training events on multiple campuses. We also provide materials, training and advertising for events presented by non-NRNB staff. The Training report includes a table of 16 events coordinated by the NRNB, including courses, workshops, clubs and lectures in 10 locations in 6 countries. After taking a year off from Google Summer of Code (GSoC) in 2015, and instead running our own summer training program (NRNB Academy Summer Session), we gathered over 50 project ideas and close to 40 mentors for GSoC 2016. We were accepted as a mentoring and had one of our most successful years yet, with all 15 enrolled students completing their projects. 
New for this year was also the development of a Mentor Resource Packet, a collection of resources designed to help mentors with recruiting students. In addition to the
  • 4.
    technical accomplishments andproductivity of our students, we are also proud of the many important aspects of diversity our students represent in the GSoC program, including geographical, gender and academic. A few statistics of our diversity is listed in the below table, with overall GSoC numbers in parenthesis: • 9 different countries represented, including 1 (of 2) from Croatia, 1 (of 3) from Armenia and 2 (of 12) from Turkey • 20% female (compared to 12% overall) • Only 67% Computer Science (compared to 78% overall), we included PhD students in Biological Oceanography and Medical Biochemistry & Biotechnology, an MS student in Bioinformatics, and a pre- med undergraduate. Our complete 2016 end-of-year report can be found here: http://nrnb.org/gsoc-reports.html. We have received and abundance of testimonials from students and mentors, a subset of which are available on our website: http://nrnb.org/testimonials.html#collab-tab. B.4:  What  training  opportunities   The collaborations during this period included many requests to prepare a custom training events and one-on- one sessions. During this reporting period, the Pico, Bader and Ideker groups offered support to local researchers via consulting meetings and one-on-one training sessions with the aim for biologists to learn how to use NRNB tools in their research. For example, how to install Cytoscape on personal computers and navigate through a network as well as training on how to go through the Bader lab enrichment analysis standard pipeline which can be summarized under these steps: 1) run GSEA or g:Profiler or similar gene-set enrichment tools 2) create a network of enriched pathways using Cytoscape/EnrichmentMap 3) perform post- analysis using EnrichmentMap features or GeneMANIA. Additionally, 15 of the collaboration projects listed in B.2 served as intensive training opportunities for students accepted into our NRNB Google Summer of Code program. 
The students learned not only about NRNB tool development, but also about open source software development with a distributed team. Our Training effort leveraged the fact that we are a multi-site resource and are thus able to host local training events on 4 different campuses. We also provided materials, training and advertising for events presented by non-NRNB staff. Each year we provide 100’s of researchers an introduction to network biology concepts and Cytoscape usage. We also train dozens of programmers how to write apps for Cytoscape to provide domain-specific functionality to the platform. These programs have been very successful so far. This is evident from the testimonials we collect via survey following each event: http://nrnb.org/testimonials.html#collab-tab. Here are snippets from this year’s students and mentors in our Google Summer of Code and NRNB Academy programs: “The NRNB program is a fantastic opportunity to gain skills and work experience in network biology and app development, at any stage in your academic career. I came in as a graduate student with only a few months of coding experience and now I've released my first application. Exhilarating!” “It has helped improve the software developed by my group. It has also given me experience in mentoring someone long distance.” “Working in an NRNB training program helped to strengthen my resume and introduced me to the idea of combining a career in medicine with computer-based research.” “Great opportunity for developing mentoring and supervising skills as well as get my software tools developed.” “This was my first ever contribution to an open source project and NRNB also. This milestone will shine on my CV forever.”
  • 5.
    “Great experience interactingwith the community and my mentor. I was excited to receive help and encouragement for my project.” “Learned how to work in a collaboration, formulate better questions. Gained especially invaluable knowledge and experience. Improved coding skills. Learned new programs and libraries.” “It broadened my mind to issues still unsolved in the network biology community, and I gained resources and colleagues in the community that I otherwise wouldn't have.” “Personally, I see great value in interacting with smart, young people from all around the world. I am optimistic that participating in NRNB training programs will benefit my own research group by giving it wider exposure and by building a community around the software.” “I am continuing to work for Cytoscape.js and am happy to being staying involved.” “The program has been great experience for my students. They not only learned about open source community driven projects, but the work they did has contributed to their future research.” “The program gave me a chance to work with students in projects of mutual interest and to develop my tools faster and more efficient.” B.5:  How  have  results  been  disseminated     Technology  Research  and  Development   Technology research and development results are routinely published (see C.1) and discrete software tools and resources are highlighted and distributed through the NRNB web site at http://www.nrnb.org/tools- wall.html. We also created and maintain almost 100 open source code repositories for NRNB related projects at GitHub, https://github.com/nrnb/. Infrastructure   We routinely promote Cytoscape and other NRNB infrastructure advancements through publications and via the tools page on the nrnb.org web site. Publications citing Cytoscape continue to increase year over year, numbering 2218 in 2016, a 24% increase over 2015. NRNB staff were involved in at least 13 publications using Cytoscape and results obtained on the NRNB cluster. 
These are listed in the Infrastructure report. B.6:  What  you  plan  to  do  next   Technology  Research  and  Development   For the first theme of Differential Networks, we aim to modify the perturbation biology modeling method to incorporate time resolved data. With respect to DBP 7, with the sample collection and profiling complete, we will next focus on data analysis. We will also continue the work on developing an evolutionary model that considers domain and binding site changes and their affects on network alignment. Finally, we will extend the integrated ID mapping tool in Cytoscape to handle general annotation tasks and begin work on a gene set manager. Both of these tools will be in support of core protocols, including the mass spec analysis workflows of DBP 1. Work on the second theme of Descriptive to Predictive Networks will extend netDx to other disease areas and other features types, including epigenomics and non-coding genome regions. We will continue the development of network-constrained regression models and apply them to NCI compound screen datasets.
  • 6.
    And we willstudy the predictive process of DeepCell to glean insights into the functional logic underlying a particular genotype-to-phenotype response. The third theme of Multi-scale Networks will see the addition of pharmacological and clinical datasets to the data-driven assembly of ontologies of drugs and phenotypes. In the next reporting period, we will integrate cancer ‘omics data into HNeXO to make a cancer-specific gene ontology. And, finally, we will continue to improve our network inference using cell-cell receptor-ligand pathways and its ability to leverage single cell RNA-seq data. Collaboration  and  Service  Projects   New CSP requests are coming in all the time. We will continue to evaluate these per site as we have. This includes the approach being tested by Gladstone and UCSD sites to have their respective Bioinformatics core facilities explicitly offer NRNB services as part of their regularly advertised campus services. Both groups are seeing many projects funnel in through this mechanism. We will continue to evaluate this approach and scale it where appropriate. See the CSP report for a more detailed description of specific projects on the horizon at each site. Infrastructure   The overall goals for the Cytoscape Desktop are published on the Cytoscape Roadmap web page (http://cytoscape.org/roadmap.html). The Infrastructure report summarizes these and goes into detail on future Cytoscape Cyberinfrastructure, App Store and NRNB Cluster work plans. Training   We recently submitted our application for GSoC 2017. If accepted, this should be one of our largest years yet. We have more mentors and more project ideas than prior years and are continuing a more coordinated outreach effort with a Mentor Resource Packet that we will distribute to all NRNB mentors. This resource was developed in 2016, and is meant to help mentors contact and communicate with various student bodies that are likely to have the skill and interest to participate in GSoC 2017. 
C.2:  Website(s)  or  other  Internet  site(s)   NRNB.org   NRNB.org is the main web site for the National Resource for Network Biology and serves as the primary source of disseminating NRNB resources and associated information. It has information for NRNB collaborators and researchers as well as the larger network biology community. The site includes our project description and annual reports, available tools and resources, links to training materials, programs and events, and instruction in how to collaborate. Over the past year, traffic to the site averages about 840 visits per month. Since the site went live in late 2010, we have had over 84,000 visits. Cytoscape.org   As detailed in the Dissemination report, we significantly improved our discourse on Cytoscape history and future directions: http://www.cytoscape.org/roadmap.html. It now more plainly lays out our vision for Themes and Features for future releases, and is laid out to improve communication with users, developers and curious parties. Visits to cytoscape.org now number almost 1.9M (up 26% from last year) since the site was created in 2012. While most visits to cytoscape.org are from the United States, these visits aren’t in the majority. In fact, the second greatest source of visitors is “all the rest”, indicating that Cytoscape is popular worldwide. Cytoscape  App  Store     A highlight of NRNB Dissemination efforts is the Cytoscape App Store (http://apps.cytoscape.org/), which was developed under supplemental funding to the main NRNB award. The goals of the App Store are to highlight the important features that apps add to Cytoscape, to enable researchers to find and install apps they need, and for developers to promote their apps. It has stimulated a sizable community of Cytoscape App developers, hosting over 307 apps developed by 674 different developers around the world. Cytoscape users download an
  • 7.
    average of 850apps per day over the past 12 months. That has accumulated to just over 760,000 total app downloads since the launch of the App Store. The top 3 downloaded apps, ClueGO, BiNGO and GeneMANIA, have accumulated over 136,000 downloads combined. During the month of January 2017, the site received over 38,000 page views. OpenTutorials   Open Tutorials (http://opentutorials.cgl.ucsf.edu/index.php/Main_Page) is the main source for tutorial materials for Cytoscape and other NRNB tools, and is being used both internally by presenters, and by researchers and developers. Traffic to Open Tutorials is consistent, with 66,000 unique sessions at Open Tutorials in the past year, 65% from new visitors. Others   As detailed in Administrative, Dissemination and Training reports, we also maintain a handful of other sites related to NRNB activities, including • Network Biology LinkedIn group • Tumblr feeds for Network Biology- and Cytoscape-related publications • Special pages for GSoC and NRNB Academy • Special pages for annual NetBio SIG conference • Guest editor roles for F1000Research Channel for Cytoscape Apps • New Cytoscape App Developer Ladder • New site for hosting a dynamically generated manual for Cytoscape • Three mailing lists for Cytoscape users, app developers and core staff C.3:  New  technologies  and  techniques   TRD1.1   The developed perturbation biology methodology has been publicly shared via publication. Additionally, there is an accompanying web application (http://www.sanderlab.org/pertbio/) that is available. Users can explore and download models produced by the analysis. TRD1.3   The new identifier mapping tool described in B.2. will be shared through the free, open source distribution of Cytoscape 3.5.0+ as of March 2017. The stringApp is freely available as an open source app for Cytoscape at http://apps.cytoscape.org/apps/stringapp. The app now includes STITCH support as described in B.2. 
The stringApp has been downloaded over 7400 times since it’s release 13 months ago. TRD2.1   We have developed the netDX technology and will disseminate it at the netdx.org website and GitHub, under an open access software license. The technology is implemented in R and Java, as an easy to use and well- documented R package. TRD2.2   We anticipate that several useful resources will be generated by the proposed research. We will provide a new deep learning training algorithm to train the hierarchy-guided deep neural network. We will provide a new analysis pipeline to help people the behavior of their supervised machine learning model. We will also provide a web server where users can not only predict growth related phenotypes using our trained deep learning model but also interpret the logic of prediction. TRD3.2   • An online, interactive viewer of the Active Interaction Mapping procedure and its application to yeast autophagy at http://atgo.ucsd.edu.
  • 8.
    • A data-drivengene ontology in human: We will also make this ontology available through an online, interactive viewer. • Parallelized ontology construction: We will make code available on GitHub once completed. Infrastructure  technologies   Detailed in section C.3 of the Infrastructure report are the following technologies: • CX network interchange format • cyWidget system • Kubernetes cluster C.5.a:  Other  products   TRD  3.2   A significantly faster version of the popular random forests regression algorithm in the Python scikit-learn package was created for this work and is publicly available on GitHub at https://github.com/michaelkyu/scikit- learn-fasterRF.  
  • 9.
    TRD  1:  Differential  Networks   B.2  What  was  accomplished  under  these  goals?   TRD 1.1: Tools for Inference of Differential Networks from Protein States and Abundances Over Time; DBP 7: Forest White; DBP 8: Pommier Background: The aim of this task was to improve the perturbation biology method (developed by Nelander, Molinelli, and Korkut) for a more thorough understanding of protein networks and their responses to drug perturbations. The perturbation biology method involves inference of quantitative signaling models from high throughput drug response data. In recent years, we solved the network inference problem through implementation of a probabilistic statistical physics algorithm called belief propagation (BP). In network inference, we also benefit from pathway database extracted prior information to improve model accuracy. The network models are based on coupled nonlinear ordinary differential equations that represent the temporal changes to perturbations. Equation  1:     In Equation 1, xµ i are the perturbed and/or measured variables, µ, represent the perturbations, wij quantifies the edge strength, αi constant is the tendency of the system to return to the initial state, and εi constant defines the dynamic range of each variable i. The transfer function, Φ ensures that each variable has a sigmoidal temporal behavior. Current progress: The drug response was previously measured and analyzed at a single time point. This is a limitation since the drug response changes over time and early changes at the protein level might be of importance for the understanding of the drug response. We have therefore produced and analyzed a new data set with time resolved drug response measurements in melanoma cells. The data contains protein measurements at several time points during 3 days (10, 27 minutes, 3, 9, 24, 48, 67 hours) as well as phenotypic measurements (cell death and growth) for 60 different drug combinations in melanoma cells. 
Recent analysis shows that early protein measurements may be important for explaining cell death measurements.
1. We used partial least squares regression (PLSR) modeling to find relations between protein measurements and cell death at different time points. In PLSR, the input and output variables are projected onto new dimensions (components) to find a linear regression model. We used the protein data at 8 time points with 60 drug combinations as input, and chose the number of components in the PLSR so that 95% of the variance in the output (cell death) was explained. The resulting model with 5 components is in agreement with the data, as shown in Figure 1.
2. To evaluate the contribution of the proteins to the regression model, we used VIP (variable importance in the projection) scores. These scores are calculated from the variance that is explained by each variable and the total variance that is explained by all components. VIP scores are always positive, but since it is important to know the direction of the response, we used the sign of the correlation between the protein measurements and cell death. As seen in Figure 2, early time points for some of the proteins have a high VIP score, which means that these measurements are important for explaining the outcome.
Figure 1. The PLSR model is in agreement with data. The PLSR (partial least squares regression) model was in good agreement with the cell death measurements (left). The number of components in the PLSR model was chosen to be 5, since a PLSR model with 5 components explains 95% of the variance in the measurements (right).
Figure 2. VIP scores for key proteins show the importance of early protein measurements to explain cell death. The VIP (variable importance in the projection) scores are calculated from the PLSR model using the variance that is explained by each model variable and the total variance that is explained by all components of the model. The measured proteins AKT-pS473 and PRAS40-pT246 (left side) are important to explain cell death already at 74 minutes after drug addition.
In service of our DBP 7, Temporal response to CDK4 inhibition in de-differentiated liposarcoma, we have been investigating how signaling networks in two patient-derived xenograft models (DDLS8817-PDX; MPNST3-PDX) respond to clinically relevant inhibition of CDK4 and to combinations designed to block potential network resistance mechanisms. The PDXs are initially sensitive to CDK4 inhibition, but ultimately the tumors begin to grow even in the presence of the drug. Proteomic profiling of peptides enriched for phospho-tyrosine from treated and untreated animals revealed increases in key signaling proteins in response to CDK4 inhibition. In this phase of the grant, we have followed up with combination therapies targeting PDGFR and Src kinase activation, key pathways that we hypothesize play a role in the switch between cells that are sensitive and cells that are resistant to CDK4 inhibition. A significant number of studies were required to optimize effective dosing so that we could obtain samples for further molecular profiling.
In order to relate further molecular features to the
phenotypic results we observe in vivo, we have also begun to perform deeper molecular profiling of the xenograft tumors. This year, we followed up with experiments to optimize dosing and endpoint data acquisition. We established reasonable doses by performing serial dilutions of palbociclib, saracatinib, and sunitinib. In the DDLS8817-PDX, we found that the combination of palbociclib and sunitinib was no more effective than palbociclib alone. This was surprising, as we had observed an increase in phospho-PDGFR-beta, and may be due to the “dirtiness” of the sunitinib inhibitor, which is known to inhibit PDGFR and other receptor tyrosine kinases. Based on our network analysis, the next experiment we attempted combined palbociclib with the 2nd generation Src inhibitor saracatinib. Unfortunately, we again had issues with dosing and the results were inconclusive, as saracatinib alone appeared ineffective and palbociclib flatlined the tumors. The sunitinib and palbociclib combination was tested in MPNST3-PDX. This PDX showed a strong increase in phospho-PDGFR-alpha during treatment with palbociclib. For this study we used 150 mg/kg palbociclib (PD991) with 40 mg/kg and 60 mg/kg sunitinib (singles, in combination, and a vehicle control). All groups had 5 animals. Sunitinib was very effective with or without palbociclib. At certain time points, it appeared that there might be synergy between palbociclib and sunitinib. However, the addition of a slight amount of sunitinib (from 40 to 60 mg/kg) seemed to decrease tumor burden more effectively than 150 mg/kg palbociclib (see Figure 1). As the tumor burden is very high with control and singly treated MPNST3-PDXs, we were unable to harvest tumors at the same time and simultaneously evaluate how the tumors respond after extended periods with the drug. In order to gain time-matched material for further molecular analysis, we performed an additional combination study with lower doses of sunitinib.
Tumor material was harvested and analysis of this material is ongoing. In addition to performing several additional xenograft studies, we have also begun to profile the genomic and transcriptomic baseline of this tumor material. We have now performed deep DNA sequencing using the targeted sequencing IMPACT assay. We have also performed RNA sequencing on the tumor material. Analysis is ongoing.
TRD 1.2: Protein network alignment algorithm and viewer; DBP 2: Vidal and Hill
TRD1, Differential networks Aim 2. We continue to develop protein-protein interaction network alignment algorithms since publishing “GreedyPlus: An Algorithm for the Alignment of Interface Interaction Networks” in 2015 [1], the first such algorithm for protein interaction networks that includes binding site information. We have studied protein domain and binding site evolution from a range of organisms with fully sequenced genomes and have identified many different patterns of sequence evolution that change network architecture at the local protein level, more so than expected. We hypothesized that we could identify a few major sequence evolution patterns, but most examples we studied were unique. This work has led us to design a new technology for ortholog function assessment that simultaneously considers protein and network evolution, described in B.6.
TRD1.2. To support DBP 2 (Vidal and Hill), “Mapping the human interactome and its rewiring by disease mutations”, we have engaged in weekly discussions with the Vidal team to consult on the analysis of their ongoing human interactome project, in particular where their work includes differential network analysis and consideration of binding sites.
References
1. Law B, Bader GD. GreedyPlus: An Algorithm for the Alignment of Interface Interaction Networks. Scientific Reports. 2015;5:12074.
TRD 1.3: Facilitating the interpretation of AP-MS data as interaction networks; DBP 1: Krogan
Mass spectrometry practitioners and analysts routinely work with network models constructed from fundamental interaction measurements. The data inform the biomedical understanding of host-pathogen interactions, signaling networks and network rewiring in cancer, to name a few examples. This is a critical field of research that needs powerful and accessible network visualization and analysis technology. This project component is aimed at making specific improvements to Cytoscape and implementing new features to enhance its applicability to, and adoption by, the mass spec community. The main objectives are to augment
Cytoscape to streamline the typical mass spec analysis pipeline and provide better access to public mass spec data and annotation repositories relevant to researchers. Following the guideline of our (lengthy) Nature Protocol for mass spec analysis using Cytoscape [1], we made significant progress on streamlining and enhancing the protocol. First, in terms of identifier mapping, we took a multistep process involving the installation and configuration of a separate app and replaced it with a built-in context menu option added to the existing Node Table in Cytoscape. Identifier mapping, in brief, addresses the matter of mapping between identifier systems (e.g., UniProt, Entrez Gene, Ensembl, etc.) when merging interaction data or integrating data types. This is a common problem faced by all bioinformaticians. In the specific domain of network data in Cytoscape, we see the opportunity to provide semi-automated assistance for users wanting to merge and integrate heterogeneous data. This is particularly relevant to mass spec practitioners, e.g., those in the Krogan lab (DBP 1), who want to view their interaction data in the context of other public interaction data and other annotations. The integration of identifier mapping into Cytoscape as a built-in feature greatly enhances the user experience for mass spec practitioners as well as many other users. See the before/after comparison of the steps required in the published mass spec Nature Protocol. The simplification goes beyond app integration and user interface work. For example, rather than requiring the user to explicitly connect to a database source, the new tool automatically connects to an existing web service provided by BridgeDb. And rather than requiring the user to explicitly choose a source identifier type, the new tool guesses the identifier type based on the values extracted from the column indicated by the user in the right-click action that initiated the dialog.
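The idea of guessing the source identifier type from column values can be sketched as follows. This is a hypothetical illustration and not the logic implemented in Cytoscape or BridgeDb. The pattern table, the function name `guess_id_type`, and the choice of three identifier systems are assumptions. The UniProt pattern follows the accession format published by UniProt.

```python
import re

# Hypothetical patterns for three common identifier systems.
ID_PATTERNS = {
    # Ensembl gene IDs, e.g. ENSG00000141510 (species prefix is optional).
    "Ensembl": re.compile(r"ENS[A-Z]*G\d{11}"),
    # UniProt accessions, e.g. P04637.
    "UniProt": re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
    ),
    # Entrez Gene IDs are plain integers, e.g. 7157.
    "Entrez Gene": re.compile(r"\d+"),
}


def guess_id_type(values):
    """Return the identifier system matching the most values, or None."""
    counts = {
        name: sum(bool(pattern.fullmatch(v.strip())) for v in values)
        for name, pattern in ID_PATTERNS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None
```

Counting matches over the whole column, instead of testing only the first value, makes the guess tolerant of a few malformed or mixed entries.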
We also included better, more common default options for the target identifier type and the “force single” feature, based on prior experience using the original BridgeDb app and training others on it.
This is a great example of a coordinated NRNB project. Although NRNB is spread across 4 campuses, this project involved work by members of the Ideker lab, together with the features and resources leveraged by the BridgeDb app, which was a Google Summer of Code project mentored by the Pico lab and implemented by a student later hired by the Sander lab [2]. In the end, members of three of the 4 NRNB sites directly contributed to this project, while also leveraging financial support and talent recruitment from Google.
The second major activity was the continued development of the stringApp for Cytoscape by Dr. Morris. STRING (http://www.string-db.org/) is an important public interaction database, highly regarded by mass spec practitioners. With input from both mass spec practitioners (DBP 1) and the developers/maintainers of the STRING database, Dr. Morris implemented the app to take full advantage of all the unique aspects of STRING, as described in the NAR special database issue for 2017 [3]. The stringApp has been downloaded over 7400 times since its original release in December of 2015 and is freely available at the Cytoscape App Store: http://apps.cytoscape.org/apps/stringapp.
Figure 3. Screenshot of STITCH compound-protein network. This is the result of a query for Coumadin (Warfarin®), a common blood thinner used to prevent thrombosis. Queries of proteins or compounds are supported. The nodes in Cytoscape preserve the signature STRING style with structures and glass bobble effects.
During this reporting period, Dr. Morris implemented critical support for STITCH as a fourth query option in the stringApp (Figure 3). The STITCH database includes both physical interactions and functional associations between chemical compounds and proteins (http://stitch.embl.de). Now, in addition to protein, PubMed and disease queries, Cytoscape users can select “STITCH: protein/compound query” and interrogate the STITCH database for protein-compound interactions. This new dimension of interactions allows researchers to extend protein networks into compound space or build protein networks from a set of one or more compounds. This feature thus complements the network and protein interaction resource tools already available in Cytoscape. It is particularly relevant to the growing demand for, and data deluge from, drug compound screens and metabolomics, fields that include mass spectrometry practitioners.
Another feature added to the stringApp during this period is enrichment analysis. This was a major step in the AP-MS protocol that previously required the installation and operation of a separate app. Now, upon import of any network via the stringApp, the user can choose to perform enrichment analysis and obtain Gene Ontology terms and KEGG pathway results. This is a valuable addition to workflows that involve STRING or STITCH networks.
References
1. Morris JH, Knudsen GM, Verschueren E, Johnson JR, Cimermancic P, Greninger AL, Pico AR. Affinity Purification-Mass Spectrometry and Network Analysis to Understand Protein-Protein Interactions. Nature Protocols. 2014;9:2539-54.
2. Gao J, Zhang C, van Iersel M, et al.
BridgeDb app: unifying identifier mapping services for Cytoscape. F1000Research. 2014;3:148.
3. Szklarczyk D, Morris JH, Cook H, et al. The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucleic Acids Research. 2017;45(Database issue):D362-D368.
B.6 What do you plan to do for the next reporting period to accomplish the goals?
TRD 1.1
In the next reporting period, we aim to modify the perturbation biology modeling method to incorporate time-resolved data (dynamic network analysis). The current method was developed for a single time point and assumes that the model variables are in steady state at this time point. This is a rough assumption, since the data come from living cells that continue to grow. We will therefore use the new time-resolved data to expand upon the existing method and change the implementation of the model equations. In the current version of our method, model variables are constrained to a sigmoidal temporal behavior. However, it should also be possible to capture other temporal behaviors. We will also incorporate changes so that the calculation of the model error evaluates not only the steady-state solution of the equations but the full trajectories. These changes will ensure that models created by the perturbation biology method can be used to predict temporal drug responses, and will therefore make the method more generally applicable to different data sets. The network represented by the inferred interaction parameters is ideally predictive of the effects of previously unseen perturbations, such as designed combinatorial drug interventions.
With respect to DBP 7, with the exception of possibly performing another dosing study on DDLS8817-PDX, the mouse xenograft studies are completed. Samples were collected across many of these experiments and have undergone molecular profiling. In the next phase of the project we will first focus on data analysis. Our first questions center on the response of the xenograft tumors to combination therapy with palbociclib. This includes determining whether the effects of adding sunitinib or saracatinib were synergistic or additive.
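One way to frame the synergistic-versus-additive question is to compare the observed combination effect with a reference model of independent action. The sketch below uses Bliss independence, which is an assumption made here for illustration. The report does not state which reference model will be used, and the function names are hypothetical. Effects are expressed as fractional inhibition between 0 and 1.

```python
def bliss_expected(effect_a, effect_b):
    """Expected fractional inhibition if drugs A and B act independently."""
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(effect_a, effect_b, effect_ab):
    """Observed minus expected combination effect.

    Positive values suggest synergy, values near zero suggest independent
    (additive) action, and negative values suggest antagonism.
    """
    return effect_ab - bliss_expected(effect_a, effect_b)
```

In practice the excess would be evaluated per time point and dose, with replicate variability taken into account before a combination is called synergistic.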
We will further integrate the molecular profiling data (phospho-tyrosine mass spectrometry, RNAseq, and DNA copy number analysis) with the xenograft growth measurements to confirm basic expectations (e.g., sunitinib inhibiting PDGFR activity) and also correlate molecular and phenotypic changes. We also plan to perform network analyses to better characterize the alterations we observe with other malignant peripheral nerve sheath and de-differentiated liposarcoma tumors profiled in the Cancer Genome Atlas (TCGA). The results of these analyses will function as a baseline for further characterization in additional MPNST and DDLS cell lines and possibly future xenograft studies.
TRD 1.2
Differential networks Aim 2. We continue to work on developing an evolutionary model that considers domain and binding site changes and how these affect network alignment. Further, we are designing a new technology for evaluating the function of proteins based on coding DNA and protein sequence evolution along with molecular interactions involving the protein and its binding sites. We hypothesize that viewing a hierarchy of evolutionary changes from the sequence to network levels will usefully inform protein function prediction methods that transfer functional annotation between organisms (differential network analysis).
TRD 1.3
Our interactions with DBP 1, the Krogan lab, continue to reveal a growing list of roadblocks and challenges with using Cytoscape with mass spec data. With our early start on this aim, as described in this and previous reports, we are in a good position to address the bulk of this list during the overall grant period. We have prioritized these items by their significance, the breadth of their applicability, and the feasibility of a solution. Over the next year, we will extend the integrated identifier mapping tool described above to handle general annotation tasks as well. These include annotating nodes with Gene Ontology terms, for example.
This will be a prerequisite for future work providing enrichment analysis as a general tool to Cytoscape. We are also just beginning work on a gene set manager to allow Cytoscape users to import, paste or drag gene lists into Cytoscape to use for queries (e.g., for STRING networks), for selection and for basic set
functions (e.g., union, intersection, difference). Many core protocols in Cytoscape, including our mass spectrometry protocol, involve user-defined gene lists. Finally, we are planning to tackle the logging of tasks and crash events in Cytoscape. While crashes are greatly mitigated in Cytoscape 3, there are still scenarios that require force quit actions by the user. We plan to log these events to help diagnose and remedy them. Similarly, a log of tasks in general would provide valuable feedback on the major operations carried out by the bulk of users. It would also identify hard-to-find or otherwise neglected features.
C.3. Identify technologies or techniques that have resulted from the research activities. Describe the technologies or techniques and how they are being shared.
The developed perturbation biology methodology has been publicly shared via publication. Additionally, an accompanying web application is available (http://www.sanderlab.org/pertbio/). Users can explore and download models produced by the analysis.
The new identifier mapping tool described in B.2 will be shared through the free, open source distribution of Cytoscape 3.5.0+ as of March 2017.
The stringApp is freely available as an open source app for Cytoscape at http://apps.cytoscape.org/apps/stringapp. The app now includes STITCH support as described in B.2. The stringApp has been downloaded over 7400 times since its release 13 months ago.
TRD 2: Descriptive to Predictive Networks
B.2. What was accomplished under these goals?
TRD2.1: Predicting clinical outcome using patient similarity networks; DBP 5: Friend
Patient classification has widespread biomedical and clinical applications, including diagnosis, prognosis, disease subtyping and treatment response prediction. A general purpose and clinically relevant prediction algorithm should be accurate and generalizable, integrate diverse data types (e.g. clinical, genomic, metabolomic, imaging), handle sparse data, be compatible with patient privacy protection systems and be intuitive to interpret. We have recently developed netDx (http://netdx.org/), a supervised patient classification framework based on patient similarity networks that meets the above criteria. netDx models input data as patient networks and uses the GeneMANIA machine learning algorithm that we previously developed for network integration and feature selection. We demonstrated the utility of netDx by integrating gene expression and copy number variants to classify breast cancer tumour class, achieving an accuracy (~85%) similar to or better than (depending on the class) previously published methods. Further, we have been able to successfully predict Autism Spectrum Disorders (ASD) phenotype from germ line DNA for a subset of ASD patients (Figure 1). netDx uses pathway features to aid biological interpretability, and results can be visualized in Cytoscape as an integrated patient similarity network to aid clinical interpretation.
Figure 1. Predictive power of netDx is better than contemporary methods: ASD case. Mean test performance over three resamplings. Predicted status is informative beyond genetic ancestry (ANOVA chisq-test, p = 2.76 × 10^-10). GBT = pathway-level FET (cnvGSA). RFCF = Random forests (Engchuan, et al. 2015 BMC Genomics). Reproduced from NetBio SIG presentation.
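The notion of modeling input data as a patient similarity network, described above, can be sketched as follows. This is a simplified illustration and not the netDx implementation, which is distributed as an R package. The use of Pearson correlation, the threshold value, and the function name are assumptions.

```python
import numpy as np


def patient_similarity_network(features, threshold=0.5):
    """Build a weighted edge list connecting similar patients.

    features: array of shape (n_patients, n_features) for one data type,
              e.g. expression of the genes in one pathway.
    Returns a list of (patient_i, patient_j, similarity) with i < j.
    """
    sim = np.corrcoef(features)  # pairwise Pearson correlation between patients
    n_patients = sim.shape[0]
    return [
        (i, j, float(sim[i, j]))
        for i in range(n_patients)
        for j in range(i + 1, n_patients)
        if sim[i, j] >= threshold
    ]
```

One such network would be built per feature (for example, per pathway or per data type), so that a network integration step can weight the networks that best separate the patient classes.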
To support DBP 5: Sage Bionetworks: Molecular stratification of colorectal cancer and DREAM challenges, we have revisited all major DREAM challenges where data are available and where the challenge experimental design is compatible with netDx’s classification engine (two-class classification). We have reported on netDx [1] and presented on netDx at the NetBio SIG Meeting in 2016 [2].
References:
1. Pai S, Hui S, Isserlin R, Kaka H, Bader GD. netDx: Patient classification using integrated patient similarity networks. bioRxiv 084418; https://doi.org/10.1101/084418
2. https://f1000research.com/slides/5-1710
TRD2.2: Predicting cellular response to perturbation with network-guided regression; DBP 8: Pommier
Part I
Background: The overall goal of this project is the identification of biomarkers involved in the progression of cancer and the response to pharmaceutical treatment. This goal is accomplished through the use of regression-based methods that are subject to biological network constraints, so that biomarkers can be understood in the context of regulatory processes.
Current progress: Applied to our DBP 8 use case, “Drug response prediction for the NCI compound library,” a major first step in this project has been data preparation of the NCI-60 and Pathway Commons data. This was done through the development of the rcellminer and paxtoolsr R packages, respectively, to simplify the use of these data from an R programming environment. Prior to this development, the NCI-60 data were provided in spreadsheets laid out in varying formats. In addition to the direct conversion of the data from the NCI-60 CellMiner website (http://discover.nci.nih.gov/cellminer), additional elements of metadata were included in the R package, including structures, repeat drug screen data, mechanisms of action for compounds, and drug approval information not found on the website.
A second ongoing focus during this reporting period has been the continued expansion of data relating to the NCI-60, to allow as wide an exploration of biological processes as possible. During the reporting period, analysis of the methylation data for the NCI-60 was performed. The analysis of the RNA-Seq dataset for the NCI-60 is ongoing and should be completed within the next reporting period and made publicly available for further use within this project. During the last reporting period, we collected the experimental data for about 39 compounds first screened on the NCI-60 and then re-screened on the Sanger Genomics of Drug Sensitivity in Cancer (GDSC) cell lines.
This provides us with data for ~750 screened cell lines; a small number of the compounds sent (7 compounds) precipitated during screening, preventing their analysis.
Additionally, we have submitted a manuscript on the analysis of the NCI-60 SWATH mass spectrometry (MS) data produced by the Aebersold group at ETH Zurich. Our analytic contribution was the use of the Elastic Net regression analysis methodology with subsets of the available feature sets (e.g. gene expression, mutations, and protein abundances). With respect to this project, while these are not network-constrained regression analyses, they allow us to develop an important baseline against which we will compare future results involving network constraints.
We have made some preliminary progress on the network-constrained regression methodology during this reporting period. Our starting point is the recently published GELnet method, and we have a working demonstration of the method for one drug found in the NCI-60 using expression data. The GELnet method should highlight novel predictors of drug response.
Part II
Overview: Deep learning has achieved tremendous success in various biology applications such as drug discovery, DNA/RNA protein binding and prediction of the effects of noncoding variants. In biology, accurate prediction is not enough: the cell is never “conquered” until humans understand why it behaves the way it does. A major challenge in tackling this problem is to develop an in silico model that is able to simulate the biological processes occurring in the cell in vivo with respect to the actual cellular structure. A number of successful approaches have modeled the cell’s transition from genotype to phenotype by using prior knowledge in the form of molecular networks. In these approaches, genetic variation is first mapped onto molecular networks; affected subnetworks are then associated with phenotype.
This information is then learned by a supervised machine learning model using the diffused signal as features. Here, we have constructed a “white box” model, called DeepCell, which uses the hierarchical structure of the cell to simulate cell behavior with both high accuracy and interpretability. Prior biological knowledge is organized into a hierarchical form, and the structure of the predictive model is constructed based on that hierarchy. DeepCell can learn complex patterns from large datasets while keeping computational complexity low, thus limiting overfitting. To interpret the model, one can observe how the input signal propagates bottom-up through the hierarchical structure and activates different subsystems at multiple scales to make final predictions.
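The way a hierarchy can dictate the structure of a predictive model, as described above, can be sketched with a toy forward pass. This is a hypothetical illustration with random, untrained weights. It is not the DeepCell architecture or training procedure, and the hierarchy, gene names, and function names are invented for the example.

```python
import numpy as np

# Hypothetical toy hierarchy. Each subsystem lists its children, which are
# either genes (leaves) or other subsystems. Children are listed before parents.
HIERARCHY = {
    "subsystem_A": ["gene1", "gene2"],
    "subsystem_B": ["gene2", "gene3"],
    "root": ["subsystem_A", "subsystem_B"],
}


def init_params(hierarchy, n_neurons, rng):
    """Create one small weight matrix and bias vector per subsystem."""
    params = {}
    for term, children in hierarchy.items():
        # A gene contributes 1 input; a child subsystem contributes n_neurons.
        n_inputs = sum(n_neurons if c in hierarchy else 1 for c in children)
        weights = rng.normal(scale=0.5, size=(n_neurons, n_inputs))
        params[term] = (weights, np.zeros(n_neurons))
    return params


def forward(genotype, hierarchy, params):
    """Propagate a genotype bottom-up and return every subsystem's state."""
    states = {}
    for term, children in hierarchy.items():  # relies on children-first order
        inputs = np.concatenate([
            states[c] if c in hierarchy else np.atleast_1d(float(genotype[c]))
            for c in children
        ])
        weights, bias = params[term]
        states[term] = np.tanh(weights @ inputs + bias)
    return states
```

Because every group of neurons corresponds to a named subsystem, the intermediate entries of `states` can be inspected directly, which is the property that makes this kind of model interpretable. A genotype here is a mapping such as `{"gene1": 1, "gene2": 0, "gene3": 1}`, with 1 marking a disrupted gene.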
Results: In this work, we focus on the task of simulating pairwise genetic interactions among ~3000 non-essential genes in the budding yeast, Saccharomyces cerevisiae, in which the combined loss of both genes might lead to an unexpectedly slow or fast relative growth rate compared with the loss of either gene alone. We used two hierarchical structures to guide the deep neural network model: the Gene Ontology (GO), curated from the Saccharomyces literature, and a data-driven ontology assembled from Saccharomyces datasets using network-extracted methods. DeepCell can accurately predict phenotypes, including both growth rate and genetic interaction score, across a range of genetic interaction scores (Figure 2). In comparison with previous approaches, DeepCell using either GO or the data-driven ontology achieved substantially greater correlation between predicted and measured genetic interaction scores.
Figure 2. a, Measured versus predicted cell viability relative to wild type (WT = 1). b, Measured versus predicted genetic interaction scores for each double gene disruption genotype; genetic interactions between the disrupted genes can be positive (epistasis), zero (non-interaction), or negative (synthetic sickness or lethality). c, Predictive performance of neural networks, measured as the correlation between measured and predicted genetic interaction scores on the Costanzo dataset (first four bars). Network structures are based on prior knowledge of the hierarchy of cellular subsystems, as inferred from ‘omics datasets (CliXO) or from literature curation (GO). Also shown is the average performance of neural network structures for which gene-to-subsystem mappings have been randomly permuted. Performance is also compared to previous methods for predicting genetic interactions (second four bars). d, Predictive performance as a function of the number of neurons per subsystem (CliXO or GO term). The performance measure and four neural networks are identical to (c).
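The genetic interaction score referred to above can be illustrated with a short calculation. The sketch assumes the multiplicative model that is commonly used in yeast double-mutant screens, in which the expected double-mutant fitness is the product of the two single-mutant fitnesses. The report does not state the formula, so this is an assumption, and the function name is hypothetical. Fitness values are relative to wild type (WT = 1).

```python
def genetic_interaction_score(fitness_a, fitness_b, fitness_ab):
    """Observed double-mutant fitness minus the multiplicative expectation.

    Negative scores indicate synthetic sickness or lethality, scores near
    zero indicate no interaction, and positive scores indicate that the
    double mutant grows better than expected.
    """
    return fitness_ab - fitness_a * fitness_b
```

For example, single mutants with relative fitness 0.8 and 0.5 give an expected double-mutant fitness of 0.4, so an observed value of 0.1 corresponds to a negative interaction.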
B.6. What do you plan to do for the next reporting period to accomplish the goals?
TRD2.1
Predictive networks Aim 1. We submitted a publication about this work in 2016 and are now working on revisions to be submitted in 2017. We are now focused on the following extensions: 1) extending our results to other disease areas; 2) extending the feature engineering work to consider epigenomics data and non-coding genome regions.
TRD2.2
Part I: We will continue to apply the developed dataset components to the development of the network-constrained regression models in the upcoming reporting period, and will make datasets more widely available as they are published. In coordination with DBP 8, we will also look at additional upcoming NCI-60 datasets (e.g. RNA-seq) to assess their utility for this regression methodology. We will continue to examine pre-existing methodologies to understand their properties and what improvements may be warranted.
Part II: Unlike standard ANNs, DeepCell is tied directly to cell structure, raising the possibility that its predictions can be interpreted biologically. In the next step, we will study whether the model can be opened up to dissect the internal cellular subsystems and functional logic that govern a particular genotype-to-phenotype response. To address this question, we will study how to interpret the prediction process of DeepCell. We will begin by scoring the importance of each subsystem to DeepCell’s overall genotype-phenotype function. Besides this global ranking, we will also explore the most important subsystems from the GO hierarchy, examining their internal states and their functional logic.
C.3. Identify technologies or techniques that have resulted from the research activities. Describe the technologies or techniques and how they are being shared.
TRD2.1
We have developed the netDx technology and will disseminate it through the netdx.org website and GitHub, under an open access software license. The technology is implemented in R and Java, as an easy-to-use and well-documented R package.
TRD2.2
We anticipate that several useful resources will be generated by the proposed research. We will provide a new deep learning training algorithm to train the hierarchy-guided deep neural network. We will provide a new analysis pipeline to help people interpret the behavior of their supervised machine learning models. We will also provide a web server where users can not only predict growth-related phenotypes using our trained deep learning model but also interpret the logic of each prediction.
TRD 3: Multi-scale Networks
B.2. What was accomplished under these goals?
TRD 3.2: Functionalized gene ontologies as a hierarchy of functional prediction; DBP 3: Cherry
Development and validation of an iterative procedure for incorporating new data into a data-driven ontology. In the last reporting period, we began developing a general progressive procedure, Active Interaction Mapping, to guide assembly of the hierarchy of functions (ontology) encoding any biological system. Since then, we have published this work [1] and have made the procedure available at http://atgo.ucsd.edu. In this work, we assembled an ontology of functions comprising autophagy, a central recycling process implicated in numerous diseases. We performed subsequent experimental validation of the ontology, including newly identified roles for Gyp1 at the phagophore-assembly site, Atg24 in cargo engulfment, Atg26 in cytoplasm-to-vacuole targeting, and Ssd1, Did4, and others in selective and non-selective autophagy. This work was co-authored by our DBP 3 with Michael Cherry [1].
Construction of a data-driven gene ontology in human. Whereas our previous work was focused on yeast, we have recently constructed the first data-driven gene ontology in human. As input to building this ontology, we took 908 experimental studies covering 98% of human coding genes. These data were drawn from several databases, including Gene Expression Omnibus (668 microarrays), GeneMANIA (201 genetic/protein interaction networks), and GTEx (35 co-expression networks). Using our Active Interaction Mapping pipeline, we integrated these datasets into a unified gene-gene similarity network and then hierarchically clustered this network to assemble a human gene ontology (called HNeXO).
Parallelized, GPU-based algorithm for ontology construction. We have been optimizing our algorithm for constructing an ontology.
Currently, it takes about a day to assemble a data-driven gene ontology in yeast (~6000 genes) and several days for one in human (~20,000 genes). We aim to reduce this runtime to a matter of hours by parallelizing the computation. To do this, we have reformulated the construction of an ontology as a series of matrix computations, for which there are known algorithms for massive parallelization and efficient memory caching. Our approach fully exploits the capacity for parallelism on a multi-CPU platform and is easily generalized to Graphics Processing Units (GPUs).
Data-driven gene ontology in human (HNeXO). To understand how HNeXO compares with curated knowledge in the Gene Ontology (GO), we searched for matching HNeXO and GO terms. HNeXO recapitulates thousands of GO terms and also discovers new terms that are supported by data but have no ...
References
1. Kramer MH, Farré JC, Mitra K, et al. Active Interaction Mapping Reveals the Hierarchical Organization of Autophagy. Mol Cell. 2017.
TRD3.3: Bridging ligand-receptor networks to cell-cell communication networks; DBP 9: Zandstra
We have undertaken new research and development work to infer cell-cell interaction networks. In particular, we have extensively used single cell RNA-seq data to infer higher resolution cell-cell networks and have developed applications to cancer stem cell biology and regenerative medicine (e.g. DBP 9), both areas where cell communication is important for tumour or normal tissue development. For single cell RNA-seq, we start by clustering the single cells to define cell types. Clusters representing cell types are identified by the expression of known cell type markers, and previously unrecognized clusters are, by default, linked to the nearest known cell type (e.g. neuron subtype A, B, C…). Each cell type is analyzed for the expression of surface receptors and ligands to infer connections between them. These represent hypotheses about cellular communication for experimental follow-up.
To support DBP 9: Engineering blood for regenerative medicine, we are automating our cell-cell interaction network inference pipeline. This is important because the regenerative medicine community in Toronto received a transformation grant ($114M CAD) to expand this scientific research area. As a result, many more developmental biology research groups are requesting cell-cell interaction network analysis. As a major milestone, a neural developmental biology group used this method to identify three new neural development factors (Figure 1) [1].
Figure 1. Integration of the E13/14 cortex ligand data with the transcriptome-based cortical communication model and the combined transcriptome-cell-surface proteome communication model. Red nodes denote ligands predicted in the transcriptome-based model that were expressed in the E13/14 cortex and also have receptors identified by cell-surface proteomics. Nodes surrounding the yellow CP (Cortical Precursor) and CN (Cortical Neuron) nodes represent predicted autocrine ligands for CPs and CNs, respectively. Nodes located between the yellow CP and CN nodes are predicted paracrine ligands. Edges indicate direction of communication. Reproduced from ref 1.
References
1. Yuzwa SA, Yang G, Borrett MJ, et al. Proneurogenic Ligands Defined by Modeling Developing Cortex Growth Factor Communication Networks. Neuron. 2016;91(5):988-1004.
B.6. What do you plan to do for the next reporting period to accomplish the goals?
TRD3.2
Data-driven ontologies of other biomedical data types, including drugs and phenotypes. Our current work has focused on assembling and applying gene ontologies, in which we group genes based on similar functions within the cell. Likewise, drugs can be organized into drug classes based on similar chemical structure, gene targets, or functional effect.
Moreover, clinical signs and symptoms can be organized into diseases and disease classes. In the next reporting period, we will use available pharmacological and clinical datasets to assemble data-driven ontologies of drugs and phenotypes. Construction of a cancer-specific human gene ontology. One of the limitations of the Gene Ontology is that it encompasses all species, tissues, and cell types. Context-specific knowledge is difficult to extract from GO. In our Active Interaction Mapping work, we showed that it is possible to assemble a context-specific ontology (yeast autophagy). In the next reporting period, we will integrate cancer ‘omics data into HNeXO to make a cancer-specific gene ontology. TRD3.3   We are currently working to improve our network inference using additional information on cell-cell receptor-ligand pathways. For instance, if a pathway downstream of a receptor is active, it strengthens the
inference that the receptor is active in the network. We also continue to adapt this method to the analysis of single-cell RNA-seq data, which mainly involves refining cell type inference. As the technology used to generate this exciting new data type is rapidly evolving, we are spending a large amount of time keeping up with new data sets and computational methods.
C.3. Identify technologies or techniques that have resulted from the research activities. Describe the technologies or techniques and how they are being shared.
An online, interactive viewer of the Active Interaction Mapping procedure and its application to yeast autophagy at http://atgo.ucsd.edu.
A data-driven gene ontology in human. We will also make this ontology available through an online, interactive viewer.
Parallelized ontology construction. We will make code available on GitHub once completed.
CSP-Compilation
B.2 What was accomplished under these goals?
Sander Group
In this last period the Sander lab has focused on a number of collaborations as part of NRNB (see CSP table below). The collaborations fall into several categories: the expansion of existing pathway database resources, work on Cytoscape.js to improve SBGN support, and pathway and network analysis of breast, prostate, and pan-cancer datasets.
In collaboration with Dr. Joan Brugge at HMS, we are investigating targeted drug combination therapies for triple negative breast cancer (TNBC). Triple negative breast cancer continues to see over 150,000 new diagnoses annually, and despite initial responses to the current established chemotherapy protocols, treatment resistance ultimately emerges in the vast majority of cases. The significant patient-to-patient heterogeneity in the genetic alterations of TNBC presents a major challenge for the development of effective targeted therapies. The goal of this proposal is to identify novel drug combinations that will improve outcomes in TNBC patients. We are accumulating and organizing TNBC cell line and patient tumor data from online databases (cBioPortal: TCGA and CCLE datasets), including mutation profiles and gene and protein expression levels. In concert, we are collecting and organizing information from TNBC clinical trials, including treatment regimens, objective response rates, and response biomarkers. These clinical data will be integrated with drug response data on TNBC cell lines, including drug sensitivity (IC50) and target pathways (pathway data from Pathway Commons). Using pathway and network analysis against the background of the clinical information, we will identify rational combinations of drugs that target multiple proteins in a pathway, multiple pathways, or both.
These combinations will then be tested first in TNBC cell lines and later in patient-derived xenograft models that better represent the heterogeneity of TNBC.
In collaboration with Dr. Rileen Sinha at Mount Sinai, we are also developing methods for calculating similarity between cancer samples. Cell lines derived from human tumors are often used in pre-clinical cancer research, but some cell lines may be too different from tumors to be good models. Genomic and molecular profiles can be used to guide the choice of cell lines suitable for particular investigations, but not all features may be equally relevant, so any resulting methodology should take this concern into account. Understanding the similarity between cancer samples is not limited to the comparison between cell lines and patient samples. Particular projects may require comparison within different sets of patient or cell line samples. For these projects, a generic approach using genomic and molecular profiles would be useful. TumorComparer is a computational method and web service developed for comparing cell lines and tumors with the flexibility to place a higher weight on functional alterations of interest. The first application of TumorComparer compared 260 cell lines and 1914 tumors of six cancer types from TCGA, using weights emphasizing recurrent genomic alterations. The cell lines were ranked by their similarity to tumors, which identified apparently unsuitable outlier cell lines, including some that are widely used.
Method for developing a patient similarity network: Given discretized data representing genomic alterations (i.e., mutations and copy number alterations), samples are represented by feature vectors, and a weight vector holds the feature weights. Their weighted similarity is calculated using weighted asymmetric matching, which measures the similarity between two samples after discarding the 0-0 matches (hence “asymmetric”).
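As a minimal sketch of weighted asymmetric matching, the Python function below assumes the measure is a weighted Jaccard-style ratio: the summed weight of features that are non-zero in both samples, divided by the summed weight of features that are non-zero in at least one. The function name, the toy vectors, and the choice to treat any non-zero value as "altered" (without checking that the two values agree) are illustrative assumptions, not the TumorComparer implementation.

```python
from typing import Sequence


def weighted_asymmetric_similarity(a: Sequence[int],
                                   b: Sequence[int],
                                   weights: Sequence[float]) -> float:
    """Weighted intersection-over-union of non-zero features.

    Features that are zero in both samples (0-0 matches) contribute to
    neither the numerator nor the denominator.
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("a, b and weights must have the same length")
    shared = 0.0  # weight of features altered in both samples (intersection)
    either = 0.0  # weight of features altered in at least one sample (union)
    for x, y, w in zip(a, b, weights):
        if x != 0 and y != 0:
            shared += w
        if x != 0 or y != 0:
            either += w
    # Assumption: two samples with no alterations at all get similarity 0.
    return shared / either if either > 0 else 0.0


# Hypothetical toy data: four features, the last one heavily weighted.
sample_a = [1, 0, 1, 0]
sample_b = [1, 1, 0, 0]
feature_weights = [2.0, 1.0, 1.0, 5.0]
similarity = weighted_asymmetric_similarity(sample_a, sample_b, feature_weights)
```

In this toy case the heavily weighted last feature is a 0-0 match, so it has no effect on the result; only the first feature is shared, giving 2.0 / 4.0.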
The similarity is calculated as the ratio of the intersection to the union of the subsets of features for which the two samples have non-zero values. In the future, the weighted similarity method may be useful, within a patient similarity network, for assessing genomic-molecular patient profiles to guide the personalized choice of clinical trials or therapy. For the construction of a patient similarity network, we plan to use pairwise weighted similarities that instantiate a particular focus of interest, e.g., therapy-relevant genetic alterations.
Bader Group
The Bader group engages in an impressive number of NRNB-related collaborations each year (see CSP table below). The CSP work over the past year includes a cell-cell communication project and pathway analysis of single cell data. NRNB tools are underlined in the highlights below:
Crosstalk between cancer-associated fibroblasts (CAFs) and epithelial cancer cells in Head and Neck cancer.
Head and neck squamous cell carcinoma (HNSCC) is the sixth leading cause of cancer-related death worldwide. Recently, the tumor microenvironment has been shown to have a significant impact on disease progression in several tumor types; however, little is known about it in the context of HNSCC. Carcinoma-associated fibroblasts (CAFs) make up a significant proportion of the tumor stroma, where they facilitate tumor cell proliferation and invasion. Elucidating the molecular programs of interaction between these two cell populations will help us to understand how they communicate with each other. Using laser-capture microdissection (LCM), pure populations of tumor cells and CAFs were obtained from patient tissue sections, along with their normal counterparts from matched tumor-free tissue. Laurie Ailles et al. at the Ontario Cancer Institute used gene expression microarrays to generate transcriptomic profiles of these cell populations. In our collaboration, we used these data to identify potential molecular interacting partners used by tumor and stromal cells to communicate with each other. The cell-cell interaction networks were constructed using the gene lists generated in the transcriptomic analysis. The maps were made in Cytoscape using the tumor-specific and the CAF-specific genes (FDR < 0.05). The genes in both lists were classified as “ligand” or “receptor” according to Gene Ontology terms: “cytokine activity”, “hormone activity”, and “growth factor activity” for ligand and “receptor activity” for receptor (Qiao et al.). iRefIndex (Razick et al.), a database of receptor-ligand interactions annotated based on published studies, was used as a reference to build the interaction maps.
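The ligand-receptor map construction described above can be illustrated with a small Python sketch. It assumes that each cell population comes with a set of population-specific genes, that GO annotations are available as a gene-to-terms dictionary, and that the reference database is reduced to a set of (ligand, receptor) pairs. All function names, gene identifiers, and pairs below are hypothetical placeholders rather than data or code from the project.

```python
from typing import Dict, Iterable, List, Set, Tuple

# GO terms named in the text for labelling genes as ligands or receptors.
LIGAND_TERMS = {"cytokine activity", "hormone activity", "growth factor activity"}
RECEPTOR_TERMS = {"receptor activity"}


def genes_with_terms(genes: Iterable[str],
                     go_annotations: Dict[str, Set[str]],
                     terms: Set[str]) -> Set[str]:
    """Return the genes annotated with at least one of the given GO terms."""
    return {g for g in genes if go_annotations.get(g, set()) & terms}


def ligand_receptor_edges(cell_genes: Dict[str, Set[str]],
                          go_annotations: Dict[str, Set[str]],
                          reference_pairs: Set[Tuple[str, str]]
                          ) -> List[Tuple[str, str, str, str]]:
    """Build directed edges (sender, ligand, receiver, receptor).

    An edge is kept only if the (ligand, receptor) pair is present in the
    reference set, standing in for a curated interaction database.
    """
    edges = []
    for sender, sender_genes in cell_genes.items():
        ligands = genes_with_terms(sender_genes, go_annotations, LIGAND_TERMS)
        for receiver, receiver_genes in cell_genes.items():
            receptors = genes_with_terms(receiver_genes, go_annotations,
                                         RECEPTOR_TERMS)
            for lig in sorted(ligands):
                for rec in sorted(receptors):
                    if (lig, rec) in reference_pairs:
                        edges.append((sender, lig, receiver, rec))
    return edges


# Hypothetical toy input; gene names and pairs are placeholders, not results.
go = {
    "LIG1": {"growth factor activity"},
    "LIG2": {"cytokine activity"},
    "REC1": {"receptor activity"},
    "REC2": {"receptor activity"},
}
cells = {"CAF": {"LIG1", "REC2"}, "tumor": {"LIG2", "REC1"}}
pairs = {("LIG1", "REC1"), ("LIG2", "REC2")}
edges = ligand_receptor_edges(cells, go, pairs)
```

Because the sender and receiver loops both run over every population, the same function would also report autocrine edges (sender equal to receiver) when a population expresses both members of a reference pair.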
Gene-Set Enrichment Analysis (GSEA) was run against the list of genes ranked by t-test t value, from the most up-regulated in CAF versus tumor to the most down-regulated. The NRNB Cytoscape app EnrichmentMap was used to visualize the results of the enrichment analysis (Figure 1).
Figure 1. a) Ligand-receptor network created using Cytoscape. The outer circle represents CAF genes and the inner circle represents tumor genes. b) Cytoscape EnrichmentMap representing enriched gene sets with FDR <= 0.05 in genes up-regulated in CAFs (red) and genes up-regulated in the tumor samples (blue).
Colon cancer stem cell characterization and analysis
Colorectal cancer is the second leading cause of cancer death in the United States. Cancer stem cells are suspected to play a major role in initiation and recurrence in this disease. Dr. Catherine O’Brien’s lab is characterizing colon cancer stem cells to identify sensitive points that can be targeted to kill the cells. This
project analyzes POP92 cells, a patient-derived colon cancer stem cell line. By collapsing single-cell RNA-seq transcriptome data from a colorectal cancer line into pathway activities, the Bader lab has identified 4 to 6 distinct populations within the POP92 cells. In correlation analyses comparing each population with published colorectal CSC datasets, one population consistently shows the highest correlation with known CSCs. Further characterization of this potential CSC population has identified activation of pathways in telomere and mitochondrial function, while pathways involving apoptosis, autophagy, differentiation, and development are repressed.
Single-cell RNA-seq was performed for 96 Wnt-low and 96 Wnt-high single cells from a colorectal cell line (Fluidigm). The FASTQ files were aligned using STAR with the human genome reference GRCh37 to generate raw counts for gene expression values. Multiple clustering algorithms (W: Ward, K: K-means, N: NMF) were employed to identify distinct populations of single cells. Pathway analysis of gene expression data using the NRNB tool EnrichmentMap was used to compare different clusters of cells and to reveal differences in cell cycle and differentiation pathways (Figure 2).
Figure 2. a) FACS sorting of single cell populations; b) Clustering results from 3 different methods; c) EnrichmentMap comparing 2 clusters of cells. In this figure, we can see the difference between two groups of potential CSCs. CSC K4 (red nodes) has elevated cell cycle, mitosis, recombination repair, non-recombinational repair, base repair, DNA replication, DNA integrity, and TCF transactivating pathways. CSC K3 (blue nodes) has elevated embryo, endoderm, eye,
secretion, Wnt, inhibition of apoptosis, extrinsic apoptosis, mesenchymal stem, and lymphocyte differentiation pathways. From these pathways, it appears that CSC K4 is actively dividing with a high level of precise DNA replication. CSC K3 appears to be relatively more differentiated but with embryonic and mesenchymal stem pathways activated.
Ideker Group
The table presents projects completed as collaborations with various faculty at UCSD and local research institutions. All analyses were completed or are ongoing as part of a recharge fee-for-service. Here is a brief description of three completed projects from the reporting period:
1) Pathway analysis of RNA Sequencing (Elisabeth Mertsching, aTyr Pharma)
Results: Human T cells were stimulated in the presence of our test article (TA) or vehicle. After 24 hours, cells were collected, and RNA was isolated and sent to GeneWiz for RNA sequencing. Results were analyzed with Limma and a list of differentially expressed genes was established.
Goals: Using this list, identify upstream targets of the TA. Determine which pathway or pathways are modulated by our TA and identify proteins modulating expression of genes affected by the TA.
Status reports:
04/01/2016: Pathway analysis delivered via Google Drive.
04/09/2016: Responded to two emailed questions from customer re Cytoscape usage; delivered pdf of additional Cytoscape guidance.
04/11/2016: Responded to two emailed questions from customer re Cytoscape usage, provided guidance on GeneMANIA.
04/19/2016: Delivered signed report pdf and additional guidance on GeneMANIA.
Deliverables: Detailed report pdf, colocalization methods pdf, cleaned differential expression csv with ensembl and entrez gene ids, differential expression csv for all unique entrez gene ids, toppgene enrichment report pdf, cytoscape network session, csv of heat propagation results. Guidance on how to use Cytoscape and GeneMANIA.
2) NGS data integration and network analysis (Sanjay Nigam).
Objective: Drug transporter networks and data integration
Updates: 11/21: Sent final deliverables: 1. Updated report of analysis, including addition of 4 control TFs/genes (Hoxb7, Sall1, Pax6, Ret) to the network proximity analysis. 2. Excel file containing information about the Hnf4a subnetwork at each stage of development, including log2 fold change from E20, community membership and associated GO term. 3. Integrated systems bio figure, with real miRNA/mRNA expression data replacing simulated data from last week.
3) HD Network analysis (Vivian Hook).
Objective: Mutant Huntingtin Protein Interactions Networks in Animal Models of Huntington’s Disease. Integrate Htt interactor proteins in non-human animals with existing knowledge of protein-protein interactions in the literature. Perform network analysis of these Htt-interactor proteins overlaid on literature PPI networks. Identify groups of highly connected genes related to Htt, using network propagation techniques and clustering methods. Perform pathway analysis of these highly connected genes, particularly in pathways of interest, including mechanisms of cell death, intracellular trafficking and transport, synaptic cell-cell communication and interactions, energy metabolism, and cell viability. Build wild-type and mutant Htt protein interaction networks, compiled from literature, and evaluate differences in connectivity and network structure, using clustering and network propagation methods.
Pico Group
Since the renewal, the Pico group has initiated many new NRNB collaborations as a service model through Gladstone’s Bioinformatics core (see CSP table below). Many of these collaborations fall into the standard category of assisting with network and pathway analysis, but over a wide range of compelling topics. For example, in collaboration with Drs.
Gan and Akassoglou at Gladstone and UCSF, we identified interaction networks involving drugs and proteins related to the role of innate immune response and oxidative stress in models of Alzheimer’s disease. Also with Dr. Akassoglou, we modeled novel pathways for neuroinflammation implicated in Multiple Sclerosis. With members of Dr. Deepak Srivastava’s group, we applied NRNB tools to help characterize alternative protocols for cardiac reprogramming, an essential technology for stem cell therapies in cardiac-related diseases and conditions. And in a collaboration with Dr. Sonja Schrepfer of UCSF, we provided functional enrichment analysis and pathway visualization toward the study of the effects
of microgravity on cardiac tissue expression. The microgravity environment, currently relevant to astronaut health, was provided by the International Space Station for multiple rounds of transported mice.
In addition to service-based collaborations, we managed the code development collaborations between ~45 mentors and ~18 students during this reporting period through the Google Summer of Code and NRNB Academy programs combined. This work entails recruiting mentors, marshaling project ideas into the NRNB GitHub tracker, preparing the mentoring organization application to Google, selecting student-mentor pairs, and guiding mentors and students through a successful project, including formal evaluations and incorporating new open source code repositories into https://github.com/nrnb/.
CSP Table for Reporting Period
Total of 93 new and ongoing projects during this reporting period. In yellow are the 41 projects that concluded during this period.
Collaborating Investigator | Investigator Institution | Project Title | Resource Personnel | Start / Finish date | External funding status | Publications
John Kelsoe | University of California, San Diego | RNAseq and network analysis | Aaron Chang | 2017- | NIH UL1TR001442 of CTSA |
Vivian Hook | University of California, San Diego | Mutant Huntingtin Protein Interactions Networks in Animal Models of Huntington’s Disease | Aaron Chang | 2017- | NIH UL1TR001442 of CTSA |
Catherine O’Brien | University Health Network, Toronto, Canada | Pathway and Network analysis of single cell RNAseq data from Colorectal Cancer line using Cytoscape/EnrichmentMap | Gary Bader/Veronique Voisin | 2016- | |
Cindy Guidos | Hospital for Sick Children, Toronto, Canada | Pathway and Network analysis of RNAseq data using Cytoscape/EnrichmentMap: comparing different clinical subgroups of B-ALL | Gary Bader/Veronique Voisin | 2016- | |
David Jimenez-Morales (Nevan Krogan) | UCSF | Network visualization support and technology development | Alex Pico, Adam Treister | 2016- | |
Derek van der Kooy | University of Toronto, Toronto, Canada | Pathway and Network analysis of single cell RNAseq data from Retina Photoreceptors from Stem Cells using Cytoscape/EnrichmentMap | Gary Bader/Veronique Voisin | 2016- | |
Faten Sayed (Li Gan) | Gladstone Institutes | Activity of Trem2 knockouts | Alex Pico, Kristina Hanspers | 2016- | |
Gelareh Zadeh; Ken Aldape | University Health Network, Toronto, Canada | Pathway and Network analysis of meningioma RNAseq data using Cytoscape/EnrichmentMap. | Gary Bader/Veronique Voisin | 2016- | |
Gordon Keller | Princess Margaret Cancer Centre, Toronto, Canada | Pathway and Network analysis of RNAseq data using Cytoscape/EnrichmentMap: comparison endoderm differentiation from embryonic stem cells. | Gary Bader/Veronique Voisin | 2016- | |
Joan Brugge | HMS Ludwig Center | Pathway Analysis in Triple Negative Breast Cancer | Chris Sander | 2016- | NCI |
John Dick | Ontario Cancer Institute, Toronto, Canada | Pathway and Network analysis of proteomics data using Cytoscape/EnrichmentMap: comparison between mirs co-overexpression (VOD) and normal cord blood samples. | Gary Bader/Veronique Voisin | 2016- | |
Kristin Hope | McMaster University, Hamilton, Canada | Pathway and Network analysis of RNAseq data using Cytoscape/EnrichmentMap: comparison between PLAG1 overexpression and normal cord blood samples. | Gary Bader/Veronique Voisin | 2016- | |
Lihong Zhan (Li Gan) | Gladstone Institutes | Astrocyte response to microglia depletion by drug treatment | Alex Pico, Kristina Hanspers | 2016- | |
Meghana Gadgil | UCSF | Metabolic pathways in models of type 2 diabetes | Alex Pico | 2016- | |
Nikolaus Schultz | Memorial Sloan-Kettering Cancer Center | TCGA PanCanAtlas: Pathways Group | Augustin Luna | 2016- | TCGA, NCI |
Sanjay Nigam | University of California, San Diego | Drug transporter networks and data integration | Aaron Chang | 2016- | NIH 1U54HD090259 |
Sonja Schrepfer | UCSF | Effects of microgravity on cardiac tissue expression | Alex Pico | 2016- | |
Arman Aksoy | Memorial Sloan-Kettering Cancer Center | Develop Pathway Database Converters for the Expansion of the Pathway Commons Database | Augustin Luna | 2015- | |
Ben Good, PhD | Scripps Research Institute | Playmatics portal for science games | Alex Pico | 2015- | |
Charles Perou | University of North Carolina | Pathway and Network Analysis of Breast Cancer | Giovanni Ciriello | 2015- | |
George Chacko, Jim Onken | NIH Office of Data Analysis Tools and Systems | Network based grantee portfolio analysis | Alex Pico | 2015- | |
John Dick | Ontario Cancer Institute, Toronto, Canada | Protein expression in CD34+ cells overexpressing mir125a | Gary Bader/Veronique Voisin | 2015- | |
John Dick | Ontario Cancer Institute, Toronto, Canada | Processing RNAseq data from cord blood samples corresponding to the whole hematopoietic hierarchy | Gary Bader/Veronique Voisin | 2015- | |
John Dick | Ontario Cancer Institute, Toronto, Canada | Energy metabolism in normal and malignant hematopoietic stem cells | Gary Bader/Veronique Voisin | 2015- | |
John Dick | Ontario Cancer Institute, Toronto, Canada | Clustering and pathway analysis of patients with relapsed AML | Gary Bader/Veronique Voisin | 2015- | |
John Dick/Jean Wang | Ontario Cancer Institute, Toronto, Canada | analysis of the AML subpopulation using CD200 | Gary Bader/Veronique Voisin | 2015- | |
John Dick/Jean Wang | Ontario Cancer Institute, Toronto, Canada | analysis of the AML subpopulation using CD200 | Gary Bader/Veronique Voisin | 2015- | |
Laurie Ailles | Ontario Cancer Institute (OCI), Toronto, Canada | Standard training in one-on-one session for gene-set enrichment using Cytoscape/EnrichmentMap, ovarian cancer CAF vs NAF | Gary Bader/Veronique Voisin | 2015- | |
Massimo Loda | Dana-Farber Cancer Institute | Comprehensive Genomic Analysis/Metabolic of Prostate Adenocarcinoma | Ed Reznik | 2015- | |
Metin Can Siper (Uğur Doğrusöz, Onur Sümer) | Bilkent University Computer Science Department, Ankara, Turkey | Improving Cytoscape.js Based Viewer for SBGN Process Description Diagrams with Better Layout and Advanced Complexity Management Operations | Augustin Luna | 2015- | |
Patricia Defechereux | Gladstone Institutes | Pathway and Network Analysis of HIV Groups | Alex Pico | 2015- | |
Robert Rottapel | University of Toronto, Canada | Pathways and processes active in Ovarian Serous Cancer | Gary Bader/Veronique Voisin | 2015- | |
Ruedi Aebersold | ETH Zurich | Pathway and Network Analysis of Prostate Cancer | Alex Root | 2015- | MSKCC |
Sheila Singh | McMaster Stem Cell and Cancer Research Institute, Canada | GBM CD133+/- vs NSC CD133 +/- | Gary Bader/Veronique Voisin | 2015- | |
Sheila Singh | McMaster Stem Cell and Cancer Research Institute, Canada | Bmi1 knockdown effects | Gary Bader/Veronique Voisin | 2015- | |
Jean Wang, John E. Dick | Ontario Cancer Institute | Large scale analysis of stem cell enriched fraction of adult acute myeloid leukemias | Gary Bader, Veronique Voisin | 2013- | Ontario Institute for Cancer Research |
Benjamin A. Alman | SickKids | Network Analysis on Stem cells from musculoskeletal tumors | Veronique Voisin, Gary Bader | 2012- | Ontario Institute for Cancer Research |
Charles Sawyers | MSKCC | Pathway and Network Analysis of Prostate Cancer | Chris Sander, Debbie Bemis, Alex Root | 2012- | NCI, NIH |
Mark Ginsberg | UCSD | Composition of the Integrin Activation Complex | Trey Ideker | 2012- | NIH |
Mathew Meyerson | Harvard University | Pathway and Network Analysis of Lung Cancer | Chris Sander, Debbie Bemis | 2012- | TCGA Genome Data Analysis Center, NCI | 22960745
Stephen Friend | Sage Bionetworks | Integrating Cancer Datasets for Predictive Model Development and Training, Rheumatoid arthritis treatment prediction | Trey Ideker, Gary Bader, Chris Sander, Alex Pico | 2012- | NCI U54 CA149237 | 23671412, 23177740, 22836096, 21390021
Steven Kay | UCSD | Cell-autonomous circadian clock of hepatocytes drives rhythms in transcription and polyamine synthesis | Trey Ideker | 2012- | NIH |
William Stephen Hancock | Northeastern University | ERBB2 driven cancer | Trey Ideker | 2012- | Multiple | 23647160
Andrew Emili | University of Toronto, Canada | Mechanistic investigation of microRNA-mediated regulation of dilated cardiomyopathy | Gary Bader | 2011- | |
Jianfeng Li (Robert W. Sobol) | University of Pittsburgh Cancer Institute, Hillman Cancer Center, University of Pittsburgh | DNA Repair dependent transcriptome reprogramming & investigating synthetic lethal interactions with DNA repair genes | Trey Ideker | 2011- | NIH |
Marc Vidal | Dana Farber | Mapping the human interactome and its rewiring by disease mutations | Gary Bader | 2011- | NHGRI P50 HG004233, NHGRI U01 HG001715, NIGMS R01 GM109199 | 23549480, 19841731
Mike Cherry, Judith Blake | Stanford, Jackson Labs | Gene Ontology Consortium, Saccharomyces Genome Database | Trey Ideker, Gary Bader | 2011- | NHGRI P50 HG004233, NHGRI U01 HG001715, NIGMS R01 GM109199 | 23242164
Peter Zandstra | University of Toronto, Canada | Mapping and analyzing cell-cell interactions in the hematopoietic system | Gary Bader | 2011- | |
Quaid Morris | University of Toronto, Canada | Development of GeneMANIA gene function prediction software | Gary Bader | 2011- | Genome Canada | 23794635
Sheila Singh | Stem Cell and Cancer Research Institute (SCC-RI) at McMaster University | Characterization of the Heterogeneity of Human BTICs | Gary Bader | 2011- | Ontario Institute for Cancer Stem Cell Research |
Ruedi Aebersold | ETH | Analysis of differential genetic networks | Trey Ideker | 2010- | Swiss Federal | 21127252 (Science, 176)
Katerina Akassoglou | Gladstone Institutes | Pathway modeling for neuroinflammation model of Multiple Sclerosis | Alex Pico, Kristina Hanspers | 2016-2016 | |
David Gordon (Nevan Krogan) | UCSF | HIV host-pathogen genetic interaction networks | Alex Pico, Scooter Morris | 2016-2016 | |
Nicole Stone (Deepak Srivastava) | Gladstone Institutes | Network and pathway analysis of cardiac reprogramming | Alex Pico, Kristina Hanspers | 2016-2016 | |
Gan Li | Gladstone Institutes | Network analysis of perturbed cell model for Alzheimer’s disease | Alex Pico | 2016-2016 | |
Katerina Akassoglou | Gladstone Institutes | Treatment effects on Alzheimer’s disease networks and TYROBP pathway | Alex Pico, Kristina Hanspers | 2016-2016 | |
Elizabeth Mertsching | aTyr Pharma | Pathway analysis of RNA sequencing | Aaron Chang | 2016-2016 | Commercial sponsor |
Devesh Khandelwal | University of Delhi, India | SBGN-ML and SBML to Escher converter | Zachary King, Alex Pico | 2016-2016 | Google |
Supun Arunoda | University of Moratuwa, Sri Lanka | PCA and t-DSNE in clusterMaker2 | Scooter Morris | 2016-2016 | Google |
Istemi Bahceci | Bilkent University, Turkey | Visualizing genomic alterations in TCGA cancer pathways in cBioPortal | Ugur Dogrusoz, Chris Sander | 2016-2016 | Google |
Metin Can Siper | Bilkent University Computer Science Department, Ankara, Turkey | Improving Cytoscape.js Based Viewer for SBGN Process Description Diagrams with Better Layout and Advanced Complexity Management Operations | Augustin Luna, Uğur Doğrusöz, Onur Sümer | 2016-2016 | Google |
Julia Gustavsen | University of British Columbia | RCy3 for network manipulation using Cytoscape | Augustin Luna | 2016-2016 | Google |
Ivan Bestvina | University of Zagreb, Croatia | Multithread Centiscape | Giovanni Scardoni, Alex Pico | 2016-2016 | Google |
Hovakim Grabski | Russian Armenian University, Armenia | Deviser for SBML libraries | Frank Bergmann, Alex Pico | 2016-2016 | Google |
Kaito Ii | Keio University Graduate School of Science and Technology, Japan | Interconvertable layout program for CellDesigner | Akira Funahashi, Alex Pico | 2016-2016 | Google |
Mridul Seth | Birla Institute of Technology and Sciences, India | Cytoscape file import into GraphSpace | TM Murali, Alex Pico | 2016-2016 | Google |
Roman Schulte | Eberhard Karls University, Germany | JSBML validation system | Andreas Drager, Alex Pico | 2016-2016 | Google |
Ashish Tiwari | Arizona State University | Cytoscape command line scripting enhancements | Scooter Morris | 2016-2016 | Google |
Tramy Nguyen | University of Utah | SBML and BioPAX conversions | Mike Hucka, Alex Pico | 2016-2016 | Google |
Joseph Stahl | Vanderbilt University | Cytoscape.js interactive tutorials | Max Franz | 2016-2016 | Google |
William Miles | McMaster University, Canada | TOR-IBIN web interface development | Mohammed Helmy | 2016-2016 | Google |
Zhaoyuan Zoe Xi | UCLA | Cytoscape.js clustering algorithms | Mike Kucera | 2016-2016 | Google |
Michael Rosenberg | Agilent Technologies | Pathway analysis of human toxome | Alex Pico | 2016-2016 | |
Alberto Ocaña | Albacete University Hospital, Albacete, Spain | Identification and optimization of targeted drug combinations in breast cancer | Gary Bader/Veronique Voisin | 2015-2016 | | 26314846
Chi-Hua Chen | University of California, San Diego | Barabasi disease-disease interactome analysis for schizophrenia vs bipolar GWAS genes | Aaron Chang | 2015-2016 | NIH 1 R01MH100351 |
Danielle Swany (Krogan) | UCSF | Cytoscape and Mass Spec Workshops | Alex Pico | 2015-2016 | |
Douglas Levine | Memorial Sloan-Kettering Cancer Center | Pathway and Network Analysis of Endometrial Cancer | Jianjiong Gao | 2015-2016 | TCGA, NCI | 23636398
Jill Mesirov | Broad Institute | GenomeSpace, Broad Integrative Genomics Viewer | Jianjiong Gao | 2015-2016
John Dick | Ontario Cancer Institute, Toronto, Canada | ITGb7+ and ITGb7– hematopoietic stem cells | Gary Bader/Veronique Voisin | 2015-2016
Kristin Hope | McMaster University, Hamilton, Canada | Pathway and Network analysis of RNAseq data using Cytoscape/EnrichmentMap: comparison between MSI2 overexpression and normal cord blood samples | Gary Bader/Veronique Voisin | 2015-2016 | 27121842
Peter Dirks | Hospital for Sick Children, Toronto, Canada | Standard training in one-on-one session for gene-set enrichment using Cytoscape/EnrichmentMap, ASCL1 knockout in glioblastoma | Gary Bader/Veronique Voisin | 2015-2016
Sandy Williams, PhD | Gladstone Institutes | Co-author networks and metrics | Alex Pico | 2015-2016
Cynthia Guidos | Hospital for Sick Children, Toronto, Canada | IL-7 coordinates proliferation, differentiation and Tcra recombination during thymocyte β-selection | Gary Bader/Veronique Voisin | 2015-2016 | 25729925
John Dick | Ontario Cancer Institute, Toronto, Canada | Assessing expression levels of erUPR and translation initiation genes in normal and leukemic stem cells | Gary Bader/Veronique Voisin | 2015-2016
John Dick | Ontario Cancer Institute, Toronto, Canada | Normal and cancer hematopoietic stem cells: miR-126 | Gary Bader/Veronique Voisin | 2015-2016 | 27300437, 27070706
John Dick | Ontario Cancer Institute, Toronto, Canada | Pathways and processes active in leukemic stem cells (LSC) of Acute Myeloid Leukemia (AML) using label free protein mass spectrometry | Gary Bader/Veronique Voisin | 2015-2016
John Dick | Ontario Cancer Institute, Toronto, Canada | Protein expression in CD34+ cells overexpressing mir125a | Gary Bader/Veronique Voisin | 2015-2016 | 27424784
Laurie Ailles | Ontario Cancer Institute (OCI), Toronto, Canada | Crosstalk between CAFs (cancer associated fibroblasts) and epithelial cancer cells in head and neck cancer | Gary Bader/Veronique Voisin | 2015-2016
Theodore J. Brown | Lunenfeld-Tanenbaum Research Institute, Toronto, Canada | Function of the fallopian tube and ovulation in the predisposition to ovarian cancer | Gary Bader/Veronique Voisin | 2015-2016 | 26039994
Aaron D Schimmer | The Princess Margaret Hospital, The Ontario Cancer Institute, University Health Network | Metabolic adaptation to chronic inhibition of mitochondrial protein synthesis in acute myeloid leukemia cells | Gary Bader, Veronique Voisin | 2013-2016 | NIH | 23520503
Claudia C Dos Santos | Keenan Research Centre of the Li Ka Shing Knowledge Institute of St. Michael's Hospital | Acute lung injury | Gary Bader, Veronique Voisin | 2013-2016 | Canadian Institutes of Health Research
Jaime O. Claudio, Jean Wang, John E. Dick | Ontario Cancer Institute | Development of Highly Active Anti-Leukemia Stem Cell Therapy | Gary Bader, Veronique Voisin | 2013-2016 | Ontario Institute for Cancer Research
Jayne Danska | SickKids | Microbiome and alterations in gene expression | Gary Bader, Veronique Voisin | 2013-2016 | Canadian Institutes of Health Research, Juvenile Diabetes Research Foundation
Nadeem Moghal | Ontario Cancer Institute | Stem/progenitor cell biology in human lung | Gary Bader, Veronique Voisin | 2013-2016 | Canadian Institutes of Health Research
Peter Dirks | Hospital for Sick Children | Isolation and characterization of a cancer stem cell from human brain tumours | Gary Bader, Veronique Voisin | 2013-2016 | Ontario Institute for Cancer Research | 25561528
Peter Dirks | Hospital for Sick Children | Pathway and Network analysis of RNAseq data using Cytoscape/EnrichmentMap: Role of dopamine D4 receptor in glioblastoma stem cells | Gary Bader, Veronique Voisin | 2013-2016 | Ontario Institute for Cancer Research | 27300435
Charles Perou | University of North Carolina | Pathway and Network Analysis of Breast Cancer | Chris Sander | 2012-2016 | TCGA Genome Data Analysis Center, NCI | 23000897, 24096568, 26451490
Mathew Meyerson | Harvard University | Pathway and Network Analysis of Lung Cancer | Chris Sander, Debbie Bemis | 2012-2016 | TCGA Genome Data Analysis Center, NCI | 22960745
Eldad Zacksenhaus | Toronto General Research Institute, Canada | Breast cancer | Gary Bader | 2011-2016 | Ontario Institute for Cancer Stem Cell Research | 22460789, 25330770, 27571409
Jayne Danska, Cynthia Guidos | University of Toronto, Canada | Pathway and network analysis of mouse models of leukemia | Daniele Merico, Gary Bader | 2011-2016 | Ontario Institute for Cancer Research
John Dick | Ontario Cancer Institute | Normal and cancer hematopoietic stem cells | Gary Bader | 2011-2016 | Ontario Institute for Cancer Stem Cell Research | 23142521
Katherine Siminovitch | University of Toronto, Canada | Clinical genomics of human genetic diseases | Ruth Isserlin, Gary Bader | 2011-2016
Margaret Wrensch | University of California San Francisco | Genetic and Molecular Epidemiology of Adult Glioma | Alexander Pico | 2011-2016 | NIH | 24908248, 22922872, 23733245, 23361564
Michael Taylor | University of Toronto, Canada | Pathway and network analysis of pediatric brain tumours | Ruth Isserlin, Gary Bader | 2011-2016 | Genome Canada | 22832581, 21840481, 20393554, 24553142
Nevan Krogan | UCSF | Evolution of viral-human protein complexes | Trey Ideker, Scooter Morris, Alex Pico | 2011-2016 | NIAID P01 AI091575, NIGMS R01 GM098101 | 23273983, 22190034, 23242164, 22681890, 22252388, 21127252
Anthony Gramolini | University of Toronto, Canada | Pathway and network analysis of mouse models of heart disease | Ruth Isserlin, Gary Bader | 2011-2016 | Heart and Stroke Foundation of Ontario | 20127684
Igor Jurisica, Lincoln Stein | University Health Network | Cancer Gene Encyclopaedia (CGEP) | Gary Bader | 2011-2016 | Ontario Ministry of Research and Innovation
Jill Mesirov | Broad Institute | GenomeSpace, Broad Integrative Genomics Viewer | Debbie Bemis, Chris Sander, Trey Ideker | 2010-2016 | NHGRI, Starr Consortium | 25165537

B.4 What training opportunities

The collaborations during this period included many requests to prepare custom training events and one-on-one sessions. During this reporting period, the Pico, Bader and Ideker groups offered support to local researchers via consulting meetings and one-on-one training sessions so that biologists could learn how to use NRNB tools in their research. Topics ranged from installing Cytoscape on personal computers and navigating a network to working through the Bader lab standard enrichment analysis pipeline, which can be summarized in these steps:
1) run GSEA, g:Profiler or a similar gene-set enrichment tool
2) create a network of enriched pathways using Cytoscape/EnrichmentMap
3) perform post-analysis using EnrichmentMap features or GeneMANIA.

Additionally, 15 of the collaboration projects listed in B.2 served as intensive training opportunities for students accepted into our NRNB Google Summer of Code program. The students learned not only about NRNB tool development, but also about open source software development with a distributed team.

B.6 What do you plan to do for the next reporting period to accomplish the goals?

Each group plans to continue its own highly successful collaboration process as well as our coordinated participation as a mentoring organization in Google Summer of Code.
Infrastructure

B.2 What was accomplished under these goals?

Cytoscape Cyberinfrastructure (CI)

In our 2016 report, we described the creation of initial technologies needed to create an ecosystem of biologically valuable Internet-based services that exchange network data in a stable, performant, scalable, reusable, recombinable and reliable manner. We described the initial development of:
• The CX lossless network transfer format (see section C.3) connecting new CI clients (e.g., Cytoscape and Jupyter apps) and services (e.g., Diffusion)
• The cyREST system that exposes Cytoscape functionality to external workflows
• The Elsa REST service request router, which enables a service client to wait for results of long-running calculations
• cyWidget reusable browser-based application libraries
• The Future of Publishing initiative that enables journal publishers to make dynamic content available in their articles.

In 2016, we created and released Diffusion, which is the first Cytoscape app to demonstrate the integration of basic CI service technologies (e.g., Cytoscape apps, CX, request routing and service deployment). The Diffusion app calls the CI’s new Diffusion service, which uses a heat propagation approach to identify subnetworks worthy of focused study, given a list of nodes in a large network. The Diffusion service executes on Google Cloud servers, thereby enabling authors of Python, R, Java or Javascript-based workflows (e.g., disease gene prioritization in GWAS, protein function prediction and discovery of significantly mutated networks) to avoid reinventing such algorithms or provisioning the substantial computational resources needed for their execution. In the larger picture, the Diffusion service demonstrates how the CI can allow typical biological programmers to dramatically increase the audience for their code and gain access to Internet-scale computational resources.
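To make the heat propagation idea concrete, the following is a minimal sketch, not the Diffusion service's actual implementation. It assumes the standard graph-Laplacian heat equation dh/dt = -L h, integrated with small explicit Euler steps; the function name `heat_diffusion` and the parameters `t` and `steps` are hypothetical, and the report does not state which kernel or parameter values the service uses.

```python
def heat_diffusion(edges, seeds, t=1.0, steps=1000):
    """Spread unit heat from seed nodes over an undirected network.

    Approximates h(t) = exp(-L t) h(0), where L is the graph Laplacian,
    using explicit Euler steps of dh/dt = -L h. Illustrative only.
    """
    nodes = sorted({n for edge in edges for n in edge})
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Seed nodes start with heat 1.0, all other nodes with 0.0.
    heat = {n: (1.0 if n in seeds else 0.0) for n in nodes}
    dt = t / steps
    for _ in range(steps):
        # (L h)_i = degree(i) * h_i - sum of h_j over neighbours j
        flow = {
            n: len(neighbors[n]) * heat[n] - sum(heat[m] for m in neighbors[n])
            for n in nodes
        }
        heat = {n: heat[n] - dt * flow[n] for n in nodes}
    return heat


def rank_by_heat(heat):
    """Return node names ordered from hottest to coldest."""
    return sorted(heat, key=heat.get, reverse=True)
```

Given a query list of nodes as `seeds`, the hottest non-seed nodes are the candidates for a focused subnetwork. The step size `t / steps` must stay small relative to the largest node degree for the Euler scheme to remain stable.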
The Cytoscape app’s immediate impact will be to enable a new family of filtering and affinity algorithms that allow Cytoscape users to extract value from large networks. The CI framework on which Diffusion is built leverages modern Kubernetes cluster technology (to augment Elsa, see section C.3), common CX-based message formats, server-based interface stubs, call metering and central logging to enable biological programmers to package algorithmic code as a highly scalable, highly available microservice with access to server- and cluster-class computing resources.

We launched cyREST2 as a significant expansion of the highly successful cyREST Cytoscape feature (http://apps.cytoscape.org/apps/cyrest). Based on user feedback and demand, cyREST2 aims to enable both REST calls and scripting calls that mirror all Cytoscape functionality already available to users through the Cytoscape UI. Critically, cyREST2 will enable access to functionality available through Cytoscape apps, including enrichment, clustering, network acquisition, enhanced graphics and graph analysis. We will work with both the Python/Jupyter and R communities to upgrade their cyREST interface support (e.g., http://bioconductor.org/packages/release/bioc/html/RCy3.html).

We have built up a collection of cyWidgets (see section C.3) that enable both Cytoscape- and browser-based apps to reuse components that perform high value user-facing tasks, including:
• NDExValetFinder: a UI that allows the user to explore an NDEx network repository and select networks of interest.
• NDExStore: a UI that allows a user to annotate and store a network in an NDEx repository.
• NDExLogin: a UI that enables a user to provide credentials for an NDEx repository.
• SimpleNetworkViewer: a UI that displays and allows interaction with a network fetched from an NDEx repository.
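Returning to the cyREST interface described above, the sketch below illustrates how an external workflow might hand a network to a running Cytoscape instance. It reflects the existing cyREST style rather than cyREST2, whose endpoints this report does not describe. The base URL `http://localhost:1234/v1` and the `POST /networks` call with a Cytoscape.js-style JSON body are assumptions about a default local installation, and the helper names `build_network_payload` and `post_network` are hypothetical.

```python
import json
import urllib.request

# Assumed default address of a locally running Cytoscape with cyREST enabled.
CYREST_BASE = "http://localhost:1234/v1"


def build_network_payload(name, edges):
    """Build a Cytoscape.js-style JSON document from (source, target) pairs."""
    node_ids = sorted({n for edge in edges for n in edge})
    return {
        "data": {"name": name},
        "elements": {
            "nodes": [{"data": {"id": n, "name": n}} for n in node_ids],
            "edges": [{"data": {"source": s, "target": t}} for s, t in edges],
        },
    }


def post_network(payload, base_url=CYREST_BASE):
    """POST the payload to Cytoscape and return the parsed JSON reply.

    Requires a running Cytoscape instance; not called at import time.
    """
    request = urllib.request.Request(
        base_url + "/networks",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A workflow would call `post_network(build_network_payload("demo", [("a", "b")]))` once Cytoscape is running. Keeping payload construction separate from the HTTP call lets the payload be inspected or tested without a live Cytoscape session.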
cyWidgets are built on the simple and popular Facebook React framework. We demonstrated the cross-platform reusability of cyWidgets by deploying them in both desktop-based Java and browser-based Javascript environments. For Java, we combined NDExValetFinder, NDExStore and NDExLogin with the Electron execution framework (http://electron.atom.io/) to enable Cytoscape to use NDEx as its primary network store. For Javascript, we embedded NDExValetFinder into the next generation of the NDEx home page to upgrade the user experience. Integration of cyWidgets into multiple software platforms has proven to be an inexpensive means to improve the user experience, and has been particularly cost effective because improvements to cyWidgets benefit multiple platforms.

We created the deep-cell web app and service in support of the Deep Cell phenotype prediction research described in section B.2. The service is deployed in the NRNB Kubernetes cluster as a peer to the Diffusion service, and informs the use of GPU processors for future machine learning services. The web app demonstrates the use of a hierarchy of network displays to explore semantic zooming techniques and to inform the construction of a group of cyWidgets generalized to quickly and economically create novel high dimensional ontological displays that represent service-based biological simulation. This continues work performed for AtgO and NeXO reported in previous years.

Finally, we continued our Future of Publishing initiative by enabling quick and trouble-free submission of networks to the Elsevier publishing system (via the new ScienceDirect Cytoscape app). We are working with Elsevier to improve outreach, to enable submission of interactive networks to most scientific journals, and to integrate tightly with Elsevier’s Pathway Studio product.
Elsevier has committed in principle to further streamlining network publishing by enabling submission via the NDEx network repository and displaying these networks using an evolution of the SimpleNetworkViewer cyWidget.

Cytoscape App Store

Ongoing maintenance allows the site to host over 308 apps (an 18% increase over last year) developed by 588 different developers around the world, and to support Cytoscape users downloading an average of 846 apps per day (a 45% increase over the past 12 months). Since our last report, we incrementally improved the App Developer Ladder (http://wiki.cytoscape.org/Cytoscape_3/AppDeveloper/Cytoscape_App_Ladder), which takes a prospective developer step by step through the app development and submission process. We also moved the App Store to a VMware virtual machine hosted in the NRNB Cluster (described below). In section B.6, we propose major changes for the App Store over the next reporting period.

NRNB Cluster

In the last year, we significantly upgraded the NRNB Cluster both inside and outside of the firewall. Inside (for HIPAA loads), we added six Supermicro RM224 2U compute servers, each containing 1TB RAM, forty-eight 2.1GHz cores (96 threads), 4TB local storage, 10Gb/s network adapters and the Ubuntu 14.04 operating system. We also added a 240TB (raw) high performance GPFS storage server based on redundant Supermicro RM110 1U head nodes, each with 128GB RAM, and redundant Supermicro RM216 2U metadata servers with twelve 800GB SSDs each. Outside of the firewall, we added three Supermicro RM216 2U virtual machine servers, each containing 256GB RAM, twenty 3.10GHz cores and 10Gb/s network adapters. We also added an additional 10Gb/s 32 port Juniper EX4550 switch as a border router. Finally, we added an additional 360TB (raw) NFS storage server controlled by a Supermicro RM110 1U head node containing 128GB RAM and a 10Gb/s network adapter.
All equipment is housed at the San Diego Supercomputer Center and is connected to their high speed backbone. After combining this equipment with the cluster described in last year’s report, the NRNB Cluster contains a total of 17TB RAM (330% increase), 1720 compute threads (67% increase), and 835TB (raw) usable storage (230% increase). Over the last year, 90% of the cluster nodes have been saturated with NRNB-sponsored jobs 80% of the time. Rany Salem, a new UCSD investigator, contributed an additional 1TB RM224 Supermicro server (96 threads) and 48TB storage in exchange for use of the NRNB cluster resources.

We purchased and deployed 9 Intel NUC5i5RYH micro-workstations, each containing an Intel i5-5250U 1.6GHz dual core processor, 8GB RAM and 240GB SSD along with a high resolution (2560x1440) monitor.
They all run the Ubuntu 16.04 operating system. They complement the eight high performance Dell T5600 workstations and the VMware ESXi v5.5 server farm previously reported.

B.5 How have results been disseminated to communities of interest?

Dissemination

Section B.2 describes the accelerating download trends for Cytoscape Desktop. To measure the dissemination of Cytoscape Desktop results, we count citations to Cytoscape Desktop in references available through Google Scholar. As shown below, real Cytoscape usage is climbing, commensurate with downloads.

Year    Count   Year/Year Growth
2016    2218    24%
2015    1789    9%
2014    1645    21%
2013    1363    -5%
2012    1435    65%
2011    870     28%
2010    680     15%
2009    590     37%
2008    430     105%
2007    210     91%
2006    110     38%
2005    80      60%
2004    50

Additionally, we have published or participated in a number of papers describing our Cytoscape Desktop advances and how best to leverage them:
• Fitts D, Zhang Z, Maher M, Demchak B. dot-app: a Graphviz-Cytoscape conversion plug-in. F1000Res. 2016 Oct 20;5:2543. doi: 10.12688/f1000research.9751.1, eCollection 2016.
• Kucera M, Isserlin R, Arkhangorodsky A and Bader GD. AutoAnnotate: A Cytoscape app for summarizing networks with semantic annotations. F1000Research 2016, 5:1717 (doi: 10.12688/f1000research.9090.1)
• Morris JH, Vijay D, Federowicz S et al. CyAnimator: Simple Animations of Cytoscape Networks. F1000Research 2015, 4:482 (doi: 10.12688/f1000research.6852.2)
• Kofia V, Isserlin R, Buchan AMJ and Bader GD. Social Network: a Cytoscape app for visualizing co-authorship networks. F1000Research 2015, 4:481 (doi: 10.12688/f1000research.6804.3)
• Rinnone F, Micale G, Bonnici V et al. NetMatchStar: an enhanced Cytoscape network querying app. F1000Research 2015, 4:479 (doi: 10.12688/f1000research.6656.2)

NRNB Cluster results manifest in papers (below) made possible because of access to the cluster.
Cluster users in the reporting period include the Trey Ideker Lab, the Hannah Carter Lab, the Nick Schork Lab and the Rany Salem Lab.
• Yu M, Kramer M, Dutkowski J, Srivas R, Licon K, Kreisberg J, Ng C, Krogan N, Sharan R, Ideker T. Translation of Genotype to Phenotype by a Hierarchy of Cell Subsystems. Cell Systems. 2016 Feb 24;2(2):77-88. doi: 10.1016/j.cels.2016.02.003.
• Jaeger PA, Lucin KM, Britschgi M, Vardarajan B, Huang RP, Kirby ED, Abbey R, Boeve BF, Boxer AL, Farrer LA, Finch N, Graff-Radford NR, Head E, Hoffree M, Huang R, Johns H, Karydas A, Knopman DS, Loboda A, Masliah E, Narasimhan R, Petersen RC, Podtelezhnikov A, Pradhan S, Rademakers R, Sun CH, Younkin SG, Miller BL, Ideker T, Wyss-Coray T. Network-driven plasma proteomics expose molecular changes in the Alzheimer's brain. Mol Neurodegener. 2016 Apr 26;11:31. doi: 10.1186/s13024-016-0095-2.
• Srivas R, Shen JP, Yang CC, Sun SM, Li J, Gross AM, Jensen J, Licon K, Bojorquez-Gomez A, Klepper K, Huang J, Pekin D, Xu JL, Yeerna H, Sivaganesh V, Kollenstart L, van Attikum H, Aza-Blanc P, Sobol RW, Ideker T. A Network of Conserved Synthetic Lethal Interactions for Exploration of Precision Cancer Therapy. Molecular Cell. 2016 Jul 19. pii: S1097-2765(16)30280-5. doi: 10.1016/j.molcel.2016.06.022.
• Hofree M, Carter H, Kreisberg JF, Bandyopadhyay S, Mischel PS, Friend S, Ideker T. Challenges in identifying cancer genes by analysis of exome sequencing data. Nature Communications. 2016 Jul 15;7:12096. doi: 10.1038/ncomms12096.
• Gross AM, Jaeger PA, Kreisberg JF, Licon K, Jepsen KL, Khosroheidari M, Morsey BM, Swindells S, Shen H, Ng CT, Flagg K, Chen D, Zhang K, Fox HS, Ideker T. Methylome-wide Analysis of Chronic HIV Infection Reveals Five-Year Increase in Biological Age and Epigenetic Targeting of HLA. Mol Cell. 2016 Apr 21;62(2):157-68. doi: 10.1016/j.molcel.2016.03.019.
• Guo T, Gaykalova DA, Considine M, Wheelan S, Pallavajjala A, Bishop JA, Westra WH, Ideker T, Koch WM, Khan Z, Fertig EJ, Califano JA. Characterization of functionally active gene fusions in human papillomavirus related oropharyngeal squamous cell carcinoma. Int J Cancer. 2016 Jul 15;139(2):373-82. doi: 10.1002/ijc.30081. Epub 2016 Mar 30.
• Liss MA, DeConde R, Caovan D, Hofler J, Gabe M, Palazzi KL, Patel ND, Lee HJ, Ideker T, Van Poppel H, Karow D, Aertsen M, Casola G, Derweesh IH. Parenchymal Volumetric Assessment as a Predictive Tool to Determine Renal Function Benefit of Nephron-Sparing Surgery Compared with Radical Nephrectomy. Journal of Endourology. 2016 Jan;30(1):114-21. doi: 10.1089/end.2015.0411. Epub 2015 Sep 25.
• Qu K, Garamszegi S, Wu F, Thorvaldsdottir H, Liefeld T, Ocana M, Borges-Rivera D, Pochet N, Robinson JT, Demchak B, Hull T, Ben-Artzi G, Blankenberg D, Barber GP, Lee BT, Kuhn RM, Nekrutenko A, Segal E, Ideker T, Reich M, Chang HY, Mesirov JP. Integrative genomic analysis by interoperation of bioinformatics tools in GenomeSpace. Nature Methods. 2016 Jan 18. doi: 10.1038/nmeth.3732.
• Kramer MH, Farré JC, Mitra K, Yu MK, Ono K, Demchak B, Licon K, Flagg M, Balakrishnan R, Cherry JM, Subramani S, Ideker T. Active Interaction Mapping Reveals the Hierarchical Organization of Autophagy. Mol Cell. 2017 Feb 16;65(4):761-774.e5. doi: 10.1016/j.molcel.2016.12.024.

B.6 What do you plan to do for the next reporting period to accomplish the goals?

Cytoscape Cyberinfrastructure (CI)

Building on the successful NDEx-centric cyWidgets in Cytoscape apps, we will extend the cyWidget family to a group that cooperates to perform most of the functions involved in core network analysis workflows, including network visualization, layout, analysis, enrichment calculations, and attribute merging. This will initially power evolutions of the emerging web.cytoscape.org web-based network viewer (and eventual web-based workstation). We will create additional cyWidgets opportunistically to reduce the time needed to create high quality web and desktop applications.

We will support computational biologists as they create and deploy new computational pipelines, including those relating to specific TRDs and DBPs (e.g., Deep Cell, which calculates phenotypes derived from genotype perturbations).

We will develop the NRNB Kubernetes cluster and related infrastructure (e.g., Elasticsearch and central logging) to support robust services deployed on behalf of biologists who author novel and reusable algorithms.
Finally, we will deploy an expanded VMware cluster supporting hundreds of virtual machines, which in turn support multiple bioinformatic services contributed in a manner similar to Cytoscape apps for the Cytoscape desktop.

Cytoscape App Store

We plan to create a version of the App Store that supports the Cytoscape Cyberinfrastructure (CI) described in the Infrastructure section. Through the CI Store, application programmers will be able to discover the existence, purpose, documentation, and API for services available for either immediate use or installation on private servers. We will also develop a Docker-based system to assist CI service developers in packaging and disseminating their services through the CI Store. The system will include documentation standards, testing standards, packaging of service installation files, and submission procedures.

NRNB Cluster

We will expand the cluster to meet the increased demands of the TRDs and DBPs, including processing larger networks, performing deeper inspection, performing differential analysis, and improving classification precision. Specifically, we will:
• assess the need for GPU processors in new and existing nodes.
• activate three high capacity VMware virtual machine servers to enable the deployment of biological services.

While the cluster has proven remarkably reliable, we will improve robustness by adding redundant elements and connections so that storage and networking remain accessible despite single points of failure.
Dissemination

B.2 What was accomplished under these goals?

NRNB.org

NRNB.org is the main web site for the National Resource for Network Biology and serves as the primary channel for disseminating NRNB resources and associated information. It is constantly updated with information for NRNB collaborators and researchers as well as the larger network biology community. The site includes our project description and annual reports, available tools and resources, links to training materials, programs and events, and instructions on how to collaborate. The front page features a 5-minute promotional video called "What is NRNB?", in which NRNB Principal Investigator Trey Ideker, along with co-Investigators, DBPs and CSPs, describes network biology challenges and the impact of the NRNB. This video has been viewed over 15,000 times in the past five years. The site receives about 840 visits per month. Since the site went live in late 2010, we have had over 84,000 visits.

The most visited page on the website is the GSoC page (http://nrnb.org/gsoc.html), closely followed by the Training page (http://nrnb.org/training.html). The GSoC page has information about NRNB's involvement in the Google Summer of Code program, including links to project ideas, information for students and mentors, testimonials from previous participants and full listings of all projects completed as part of GSoC. The Training page has up-to-date information on upcoming training events and also includes a full listing of courses relevant to network biology. The Training page also links to popular training materials for NRNB tools, including links to OpenHelix and Open Tutorial sites. NRNB.org includes a Media section where we provide embedded video, PDFs and slideshows for various NRNB-related tools, our annual reports, and presentations from the Network Biology SIG and App Expo meetings.
The NRNB website also includes a section related to Collaborations (http://nrnb.org/outreach.html), with in-depth information on the types of collaboration opportunities available, including Google Summer of Code, NRNB Academy and direct access to our Collaboration Request Form. A complete list of all current NRNB collaborations is also available on the website. Our Tools page (http://nrnb.org/tools.html) presents all tools currently supported by the NRNB, and is also a highly accessed page on the site, 4th after the GSoC, Training and Competitions pages. A dedicated page per tool contains relevant information on usage, user documentation, developer resources and availability. An image gallery serves as a visual introduction to each tool’s capabilities and use. The attentive maintenance and updating of the site helps make NRNB.org the #2 Google search result for "network biology tools", second only to Cytoscape.org, an NRNB supported tool. NRNB.org is the #3 result even when searching for just "network biology". These are global, non-personalized results.

Cytoscape.org Website

Since our last report (March 2016), we significantly improved our discourse on Cytoscape history and future directions: http://www.cytoscape.org/roadmap.html. It now lays out our vision for Themes and Features for future releases more plainly, and is organized to improve communication with users, developers and curious parties. Since January 2016, downloads have grown to an average of 17,000 per month for the year, with Cytoscape v3 accounting for the vast majority of downloads (see below). For v3.4 (the latest version), the highest month saw over 20,000 downloads (up 18% from last year), and the lowest saw 13,500 (up 23% from last year). Since inception in 2002, Cytoscape has been downloaded over 870,000 times (not shown).
[Figure: Downloads by Version over Time (524,006), Feb 2014 through Jan 2017. Monthly download count for Cyto-2_8_2, Cyto-2_8_3, cytoscape-3.0.2, cytoscape-3.1.0, cytoscape-3.1.1, cytoscape-3.2.0, cytoscape-3.2.1, cytoscape-3.3.0, cytoscape-3.4.0 and Total.]

Note that the download statistics we present exclude hourly or daily downloads by individual clients. We assume such clients are bots (mainly in China) that do not reflect actual Cytoscape use. However, it is also possible that these bots store Cytoscape on local and departmental servers that in turn disseminate Cytoscape to far more users.

The increasing visitorship to cytoscape.org is mirrored in our real-time measurements, which show Cytoscape being started approximately 4,000 times per day throughout the world on weekdays (up 14% from last year), and over 1,000 times per day on weekends and holidays, as shown below.

[Figure: Daily Cytoscape Executions (2,389,824), 2/9/2014 through 2/7/2017. Daily executions by date.]

Note that the abnormal spike around November 23, 2016 is actual usage resulting from Cytoscape’s new role as a service callable by external workflows (e.g., Jupyter) via the cyREST interface. An external workflow apparently started a fresh copy of Cytoscape for each network, and processed approximately 8,800 networks. We expect similar surges as Cytoscape becomes a common biological network server.

In the future, we expect a divergence between actual Cytoscape usage and Cytoscape downloads, as we have changed Cytoscape to download only portions of itself without requiring a full download. We will attempt to account for these partial downloads in the future, though actual executions should continue to represent true Cytoscape usage.

Breaking down the Cytoscape versions by operating system reveals more detail (not shown). About 70% of Cytoscape downloads are for Windows, with 32 bit Windows fading in late 2014 (coinciding with Windows 8 shipments). Downloads for Macs rate a distant second (20%), with downloads for Linux being rare (10%).

As measured by Google Analytics, visits to the cytoscape.org web site are not accelerating as quickly year over year, as shown below. This matches our expectations for fewer full Cytoscape downloads in favor of automatic partial downloads. Note that troughs in the 2014 record at weeks 8 and 23 are most likely attributable to data loss by Google Analytics.

As shown below, visits to cytoscape.org now number almost 1.9M (up 26% from last year) since the site was created in 2012. While the United States is the largest single source of visits to cytoscape.org, it does not account for a majority. In fact, the second greatest source of visitors is “all the rest”, indicating that Cytoscape is popular worldwide.

[Figure: Cytoscape.org visits (2012 - Feb 2017). Weekly visit count by week of year, one series per year for 2012 through 2017.]
While most users browse to cytoscape.org via a Google search (see below), a large number find it through some other means (e.g., via class web sites) or by entering the URL directly. Finally, the frequency with which Cytoscape is cited in papers indexed in Google Scholar is accelerating year over year, as shown below. The citation rate increase between 2015 and 2016 is 24%, an increase from 14% the prior year.

Cytoscape.org Sessions (1,918,073), 1/1/2012 through 2/7/2017
Country           Share
United States     27%
China             8%
India             8%
United Kingdom    5%
Germany           5%
Japan             4%
France            4%
Canada            3%
Italy             3%
Spain             2%
Other             31%

Cytoscape.org Referral Sources (1,918,073), 1/1/2012 through 2/7/2017
Source                Share
google / organic      41%
(direct) / (none)     17%
google / cpc          6%
google.com            5%
cytoscape.org         4%
baidu                 2%
apps.cytoscape.org    1%
google.co.in          1%
bing                  1%
google.co.uk          1%
other                 21%
And the distribution of funding agencies associated with these publications represents the breadth of network biology approaches being applied through Cytoscape usage across major diseases and health initiatives.
[Figure: Top 13 Funding Agencies for Cytoscape Citations (8,389 events since 2004): NIGMS 26%, NCI 19%, NIAID 9%, NHLBI 8%, NIDDK 7%, NCRR 6%, NHGRI 5%, NIEHS 4%, NIMH 4%, NLM 3%, NINDS 3%, NIA 3%, Wellcome Trust 3%.]
Cytoscape App Store
Since our last report (March 2016), we have not changed the Cytoscape App Store. We continue to promote the App Developer Ladder (http://wiki.cytoscape.org/Cytoscape_3/AppDeveloper/Cytoscape_App_Ladder) as a step-by-step guide through the app development and submission process. The App Store hosts over 307 apps developed by 674 different developers around the world. Cytoscape users downloaded an average of 850 apps per day over the past 12 months, bringing the total to just over 760,000 app downloads since the launch of the App Store. The top 3 downloaded apps, ClueGO, BiNGO and GeneMANIA, have accumulated over 136,000 downloads combined. During the month of January 2017, the site received over 38,000 page views.
As shown below, Cytoscape 3 app submissions continue to climb, on track to surpass ~7 years of 2.x plugin submissions in under 5 years. Separate inspection indicates that both new and experienced app developers are submitting apps. The average submission rate remains between 2 and 3 new apps per month.
Similarly, Cytoscape users continue to visit the App Store in large numbers. As shown below, there have been 433,950 visits since the store was staged in mid 2012 (an average of 11 visitors per hour), with year-over-year visits continuing to increase.
[Figure: Apps Checked into App Store. X-axis: Month (7/1/2007 through 11/1/2016); Y-axes: Monthly Count and Cumulative Count; series: v2 Total, v3 Total, v2 Count, v3 Count.]
As shown below, the proportion of visitors mirrors the visitorship of the cytoscape.org web site itself, with the second largest group being “all the rest”, thereby demonstrating that Cytoscape is popular worldwide.
[Figure: App Store Visits per Week (433,950), 6/1/2012 through 2/7/2017. X-axis: Week; Y-axis: Count; one series per year, 2012 through 2017.]
The source of visitors also mirrors cytoscape.org, with most visitors arriving via Google search, and numerous visitors arriving from unknown (possibly class) links.
[Figure: App Store Visits (476,043), 6/1/2012 through 2/7/2017, by country: United States 25%, China 9%, India 6%, France 6%, United Kingdom 6%, Germany 6%, Japan 3%, Italy 3%, Canada 3%, South Korea 3%, Other 30%.]
[Figure: App Store Referral Sources (476,043), 6/1/2012 through 2/7/2017: google 31%, cytoscape.org 21%, (direct) 8%, pathwaycommons.org 1%, baidu 1%, genemania.org 1%, geneontology.org 1%, bing 1%, wiki.cytoscape.org 0%, opentutorials.cgl.ucsf.edu 0%, other 35%.]
Tumblr
We use our Cytoscape Publications Tumblr site, http://cytoscape-publications.tumblr.com/, to capture published figures using Cytoscape. An average of 12 publications are featured each month on the front page of cytoscape.org directly from this Tumblr feed. Publications highlighted on Tumblr include Cytoscape App development, cytoscape.js development and application of these two tools in research. Posts are tagged with categories and the name of any apps used, to facilitate search and filtering. Links to the relevant App page at the Cytoscape App Store (described in Websites) increase traffic to and usage of the App Store. By specifically highlighting publications that cite Cytoscape, the Tumblr site actively promotes the use and citation of Cytoscape and Cytoscape Apps. In the last year, we have added an “open access publication” tag to relevant posts, highlighting free and open publications that use Cytoscape. A nice overview of the wide range of figures produced by Cytoscape and Cytoscape Apps is available via the archive feature at http://cytoscape-publications.tumblr.com/archive.
To highlight a wider range of topics in network biology, we use our Network Biology Tumblr, http://netbiopub.tumblr.com/, to feature relevant publications. This Tumblr covers a variety of publication types, including reviews, new network biology algorithms and methods, tools and application of network biology techniques. The posting frequency has increased in the past year to an average of 9 publications per month, with synchronized posts on the LinkedIn Group for Network Biology (described below) as well.
F1000Research: Cytoscape App Channel
The F1000Research Cytoscape App Channel now has a total of 37 peer-reviewed articles, with 12 new articles. The 12 new articles are:
• The PathLinker app: Connect the dots in protein interaction networks. Daniel P. Gil, Jeffrey N. Law, T. M. Murali
• dot-app: a Graphviz-Cytoscape conversion plug-in. Braxton Fitts, Ziran Zhang, Massoud Maher, Barry Demchak
• Creating, generating and comparing random network models with Network Randomizer. Gabriele Tosadori, Ivan Bestvina, Fausto Spoto, Carlo Laudanna, Giovanni Scardoni
• CoNet app: inference of biological association networks using Cytoscape. Karoline Faust, Jeroen Raes
• Contextual Hub Analysis Tool (CHAT): A Cytoscape app for identifying contextually relevant hubs in biological networks. Tanja Muetze, Ivan H. Goenawan, Heather L. Wiencko, Manuel Bernal-Llinares, Kenneth Bryan, David J. Lynn
• cy3sabiork: A Cytoscape app for visualizing kinetic data from SABIO-RK. Matthias König
• AutoAnnotate: A Cytoscape app for summarizing networks with semantic annotations. Mike Kucera, Ruth Isserlin, Arkady Arkhangorodsky, Gary D. Bader
• webANIMO: Improving the accessibility of ANIMO. Willem Siers, Michiel Bakker, Bob Rubbens, Ruben Haasjes, Jacco Brandt, Stefano Schivo
• SCODE: A Cytoscape app for supervised complex detection in protein-protein interaction graphs. Sarah Mohamed, Nick Janus, Yanjun Qi
• Robust de novo pathway enrichment with KeyPathwayMiner 5. Nicolas Alcaraz, Markus List, Martin Dissing-Hansen, Marc Rehmsmeier, Qihua Tan, Jan Mollenhauer, Henrik J. Ditzel, Jan Baumbach
• CyLineUp: A Cytoscape app for visualizing data in network small multiples. Maria Cecília D. Costa, Thijs Slijkhuis, Wilco Ligterink, Henk W.M. Hilhorst, Dick de Ridder, Harm Nijveen
• Finding the shortest path with PesCa: a tool for network reconstruction. Giovanni Scardoni, Gabriele Tosadori, Sakshi Pratap, Fausto Spoto, Carlo Laudanna
B.5 How have results been disseminated to communities of interest? (8000 character, no pix)
Section B.2 described the accelerating visitorship and download trends for the main customer-facing portals: nrnb.org, cytoscape.org and App Store web sites, in addition to the long list of secondary community sites.
B.6 What do you plan to do for the next reporting period to accomplish the goals?
Cytoscape App Store
We plan to create a version of the App Store that supports the Cytoscape Cyberinfrastructure (CI) described in the Infrastructure section. Through the CI Store, application programmers will be able to discover the existence, purpose, documentation, and API for services available for either immediate use or installation on private servers.
Cytoscape CI
We will also develop a Docker-based system to assist CI service developers in packaging and disseminating their services through the CI Store. The system will include documentation standards, testing standards, packaging of service installation files, and submission procedures.
C.5.b Resource Sharing
All cytoscape.org and Cytoscape App Store code and artifacts are open source and free to the public, with title vested in The Cytoscape Consortium 501(c)3 non-profit corporation. As open source projects, we welcome audit or participation by all qualified or interested parties, subject to the terms of our published licenses. We specify the LGPL2.1 license as modified on the Cytoscape web site (http://cytoscape.org/download.php). All Cytoscape related project code is freely available at GitHub: https://github.com/cytoscape/. All NRNB project code is also available at GitHub. We created and maintain almost 100 open source repositories as the NRNB organization (https://github.com/nrnb/).
C.3 Technologies or Techniques (2000 characters)
The Cytoscape Cyberinfrastructure is based on a microservice (aka service) variant of Service Oriented Architectures.
As described in section B.2, major technological innovations include the CX network interchange format, the cyWidget architecture, and the Kubernetes cluster architecture. Cytoscape itself will call services in the Kubernetes cluster, will exchange networks encoded in CX, and will incorporate user interface elements written as cyWidgets.
The CX Network Interchange Format
CX is an aspect-oriented network interchange format that enables networks to be transmitted between diverse services. It is designed for flexibility, modularity, and extensibility, and for use as a JSON-based message payload in common REST protocols. It enables applications to standardize on core aspects of networks, coordinate on more specific standards within CX, and ignore or omit irrelevant aspects. It is not intended as an optimized format for storage or for specific functionality in applications. CX is distinct from other network formats in that it seeks to avoid the gridlock of a standard requiring constant centralized coordination.
Aspect-orientation means that different types of information about network elements are separated into independent, composable modules that follow simple guidelines for dependency. This structure makes it easy for a given application or service to make use of relevant aspects (e.g., nodes and edges) while ignoring others (e.g., styling and layout). CX provides straightforward strategies for lossless encoding of semantically complex formats such as OWL, BioPAX, OpenBEL, SGML, or SBGN, while at the same time enabling the expression of simple networks without undue overhead.
We have created CX encoding and transmission libraries in Java, Python, and GoLang. The encoding time for a 100K node network was measured at about 50 ms on a modern Mac Pro, thereby reducing concerns that CX
may be expensive to encode and decode. Cytoscape Desktop currently exchanges CX-encoded networks with NDEx, and we expect it to call other services written to transact CX.
The cyWidget System
A cyWidget is a software component that implements some user interface function in a browser-based web application, is easily reusable across new and existing web applications, and may communicate with services to perform backend or expensive computation. A cyWidget is built on Facebook’s React framework, which is a simple event-driven, message-oriented, model-view-controller implementation in JavaScript. By construction, a cyWidget exposes a function-specific API connected to code that implements the cyWidget. Several cyWidgets can coexist in the same web application, and are intended to. A cyWidget can keep its own state information or update state common to all cyWidgets, which is then broadcast to all cyWidgets.
A simple example of a cyWidget is a network displayer that calls NDEx to fetch a network and then uses the cytoscape.js drawing library to render the network in a browser frame. The NDEx Valet cyWidget will orchestrate multiple embedded cyWidgets to fetch a network list from NDEx, display network metadata, and allow the end user to choose a network. A more complex cyWidget ecosystem contains a network as its basic state, and then allows the app programmer to add in network, table, style, and other cyWidgets to create a complete web app.
A web app programmer incorporating a cyWidget need enter only a few lines of header directives at the beginning of JavaScript-enabled web page code. As cyWidgets are self-contained, no particular web app framework is required: apps written using raw JavaScript, Angular, and other frameworks can use cyWidgets. A cyWidget author must become familiar with React in order to create a new cyWidget.
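The state model described above (private per-widget state plus common state that is broadcast to every widget) can be sketched in a few lines. This is a language-neutral illustration written in Python rather than React/JavaScript; the class names `SharedState`, `Widget`, and `NetworkChooser` and the `selected_network` key are hypothetical and are not part of the cyWidget code base.

```python
from typing import Any, Callable, Dict, List


class SharedState:
    """Hypothetical store for state common to all widgets."""

    def __init__(self) -> None:
        self._state: Dict[str, Any] = {}
        self._listeners: List[Callable[[Dict[str, Any]], None]] = []

    def subscribe(self, listener: Callable[[Dict[str, Any]], None]) -> None:
        self._listeners.append(listener)

    def update(self, **changes: Any) -> None:
        # Apply the change, then broadcast a snapshot to every subscriber.
        self._state.update(changes)
        snapshot = dict(self._state)
        for listener in self._listeners:
            listener(snapshot)


class Widget:
    """Hypothetical widget: keeps private state and observes shared state."""

    def __init__(self, name: str, store: SharedState) -> None:
        self.name = name
        self.store = store
        self.local: Dict[str, Any] = {}  # state private to this widget
        self.seen: Dict[str, Any] = {}   # last broadcast received
        store.subscribe(self.on_broadcast)

    def on_broadcast(self, state: Dict[str, Any]) -> None:
        self.seen = state


class NetworkChooser(Widget):
    def choose(self, network_id: str) -> None:
        self.local["last_choice"] = network_id          # private state
        self.store.update(selected_network=network_id)  # common state


store = SharedState()
chooser = NetworkChooser("chooser", store)
viewer = Widget("viewer", store)
chooser.choose("net-42")
print(viewer.seen)
```

In this sketch the viewer never talks to the chooser directly; it only reacts to the broadcast, which is the property that lets independently written widgets coexist in one web application.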
Given that React is a simple framework with simple constructs, we believe it plausible that numerous cyWidget authors will emerge beyond our own team. The qualities of encapsulation, simple widget use, simple widget creation, and non-proprietary ownership drove our decision to use React. We evaluated Angular, Angular 2, Web Components, React, and Aurelia.
In an upcoming release, Cytoscape Desktop will include NDEx Valet as its primary user interface to NDEx, and will benefit directly as NDEx Valet is improved to assist users more intelligently.
The Kubernetes Cluster
Kubernetes (http://kubernetes.io) is an open-source framework for deploying scalable service-oriented infrastructures. As such, it is a middleware layer that intervenes between a REST client (e.g., a Cytoscape or web app) and a service. Given a service (likely written by a computational biologist to expose a valuable calculation), Kubernetes enables multiple clients to call it simultaneously by creating multiple service instances on one or more physical servers, and then matching actual calls to available instances. The multiple instance strategy ensures that
To participate in the NRNB Kubernetes cluster, the calculation author must execute a two-step process. In step one, the author creates a service by pairing the calculation function with the CI service wrapper, which 1) listens for an HTTP connection, 2) processes/unbundles the HTTP stream, 3) calls the author’s function, and then 4) returns an HTTP stream containing the result. The author can deploy the service on a private workstation or server as-is for debugging. In step two, the author packages the service as a Docker container, registers it with the Kubernetes framework and defines the number of servers on which it should be deployed. Kubernetes deploys the multiple service instances and monitors them so that should a service instance fail, it is replaced with a new one.
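The four wrapper steps in step one can be illustrated with a minimal sketch using only the Python standard library. This is not the actual CI service wrapper: the names `make_handler`, `serve`, and `count_elements` are hypothetical, and `SAMPLE_CX` is a simplified CX-like payload (a JSON list of aspect fragments) whose field names are illustrative rather than taken from the CX specification.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Simplified, illustrative CX-like document: a list of aspect fragments.
SAMPLE_CX = [
    {"nodes": [{"@id": 1, "n": "TP53"}, {"@id": 2, "n": "MDM2"}]},
    {"edges": [{"@id": 3, "s": 1, "t": 2}]},
    {"cartesianLayout": [{"node": 1, "x": 0.0, "y": 0.0}]},
]


def count_elements(cx):
    """Hypothetical author function: uses the nodes and edges aspects
    and ignores every other aspect (e.g., layout)."""
    counts = {"nodes": 0, "edges": 0}
    for fragment in cx:
        for aspect, elements in fragment.items():
            if aspect in counts:
                counts[aspect] += len(elements)
    return counts


def make_handler(func):
    """Pair an author function with a minimal HTTP wrapper."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):  # step 1: an HTTP connection has arrived
            length = int(self.headers.get("Content-Length", 0))
            try:
                payload = json.loads(self.rfile.read(length))  # step 2: unbundle
                status, result = 200, func(payload)            # step 3: call function
            except (ValueError, TypeError, AttributeError) as exc:
                status, result = 400, {"error": str(exc)}
            body = json.dumps(result).encode("utf-8")
            self.send_response(status)                          # step 4: return result
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet; a real service would log

    return Handler


def serve(func, port=8080):
    """Run the wrapped function as a standalone service (blocks forever)."""
    HTTPServer(("", port), make_handler(func)).serve_forever()
```

Calling `serve(count_elements)` on a workstation corresponds to the as-is debugging deployment; packaging the same script in a Docker container would correspond to step two. The port number and the error handling are assumptions of this sketch.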
While Kubernetes addresses instance deployment and load balancing, it does not provide the logging necessary for service debugging and management. We use the Elastic Stack (formerly ELK, http://www.elastic.co/) for log management and visualization.
C.5.b Resource Sharing (2000 characters)
All Cytoscape Desktop and Cytoscape Cyberinfrastructure (CI) code and artifacts are open source and free to the public, with title vested in The Cytoscape Consortium 501(c)3 non-profit corporation. As open source projects, we welcome audit or participation by all qualified or interested parties, subject to the terms of our published licenses. For Cytoscape Desktop, we specify the LGPL2.1 license as modified on the Cytoscape
web site (http://cytoscape.org/download.php). We have not yet designated license terms for the Cytoscape CI infrastructure, but they will likely be LGPL2.1 or MIT. All such code is available in GitHub repositories (i.e., http://github.com/cytoscape, http://github.com/cytoscape-ci, http://github.com/cycomponent and http://github.com/idekerlab). Artifacts such as user documents, tutorials, issue databases, developer guides, and so on are available for reading on Cytoscape wikis (e.g., http://wiki.cytoscape.org) or web sites (e.g., http://cytoscape.org).
Networks derived or produced by NRNB activities will be placed in the NDEx network database and will be accessible via the CX, cyWidget, and Kubernetes technologies embedded in Cytoscape Desktop and elsewhere, as described in sections B.2 and C.3.
The NRNB Cluster is available to the general research community. In 2016, the Nick Schork Lab used approximately 10,000 hours of compute time from the approximately 250,000 hours available in the cluster. The Hannah Carter Lab used approximately 100,000 hours.
Training
B.2 What was accomplished under these goals?
Workshops and lectures and courses
In addition to the global training support provided by our Training Coordinator, Dr. Morris, we also leverage the fact that we are a multi-site resource and are thus able to host local training events on multiple campuses. We also provide materials, training and advertising for events presented by non-NRNB staff (not listed). The table below lists the events since our last annual report. Additional one-on-one training requests are tracked as services in our CSP report.
Event Title | NRNB Staff/Site | City | Year | Event Type
Cytoscape Workshop 2017 | Barry Demchak | San Diego, CA | 2017 | Workshop
Systems Pharmacology course | Scooter Morris | San Francisco, CA | 2017 | Course
RECOMB/ISCB | Barry Demchak | Phoenix, AZ | 2016 | Workshop
Cytoscape Clinic | Barry Demchak | San Diego, CA | 2016 | Workshop
NetBio SIG Meeting | Alex Pico | Orlando, FL | 2016 | Lecture
Medical Biophysics students tech talk: Network and Pathway Analysis with Cytoscape | Gary Bader/Veronique Voisin | Toronto, Canada | 2016 | Workshop
Introduction to Network Analysis | Scooter Morris | Lausanne, Switzerland | 2016 | Workshop
Two-day workshop: Visualizing Complex Networks Using Cytoscape (NCI BTEP training series) | Scooter Morris | Bethesda, Maryland | 2016 | Workshop
Two-day workshop on protein-protein interactions | Scooter Morris | Boulder, Colorado | 2016 | Workshop
Two half-day Cytoscape workshops at the eScience Symposium: Big Data in Precision Medicine | Scooter Morris | Odense, Denmark | 2016 | Workshop
Two-day graduate course at Eötvös Loránd University on protein-protein interactions | Scooter Morris | Budapest, Hungary | 2016 | Course
EMBO Practical Course on Computational Analysis of protein-protein interactions | Scooter Morris | Budapest, Hungary | 2016 | Course
GLBio conference satellite workshop: Network Visualization and Analysis with Cytoscape | Gary Bader/Veronique Voisin | Toronto, Canada | 2016 | Workshop
CBW Pathway and Network Analysis of -omics Data | Gary Bader/Veronique Voisin | Toronto, Canada | 2016 | Workshop
Google Summer of Code
After taking a year off from Google Summer of Code (GSoC) in 2015, and instead running our own summer training program (NRNB Academy Summer Session), we gathered over 50 project ideas and close to 40 mentors for GSoC 2016. We were accepted as a mentoring organization and had one of our most successful years yet, with all 15 enrolled students completing their projects. New for this year was also the development of a Mentor Resource Packet, a collection of resources designed to help mentors with recruiting students. The packet includes tips on how and where to recruit, as well as ready-to-use slides, flyers and other materials. In addition
to the technical accomplishments and productivity of our students, we are also proud of the many important aspects of diversity our students represent in the GSoC program, including geographical, gender and academic. A few statistics on our diversity are listed below, with overall GSoC numbers in parentheses:
• 9 different countries represented, including 1 (of 2) from Croatia, 1 (of 3) from Armenia and 2 (of 12) from Turkey
• 20% female (compared to 12% overall)
• Only 67% Computer Science (compared to 78% overall), including PhD students in Biological Oceanography and Medical Biochemistry & Biotechnology, an MS student in Bioinformatics, and a pre-med undergraduate.
Our complete 2016 end-of-year report can be found here: http://nrnb.org/gsoc-reports.html. We have received an abundance of testimonials from students and mentors, a subset of which are available on our website: http://nrnb.org/testimonials.html#collab-tab.
NRNB Academy
Our year-round NRNB Academy program continues to attract interested students and mentors. In 2016 we had 3 students enrolled, with 2 of the projects completed (one still active). We have received an abundance of testimonials from students and mentors, a subset of which are available on our website: http://nrnb.org/testimonials.html#collab-tab.
Cytoscape manual
The Cytoscape User Manual is available on the ReadTheDocs.org platform, http://manual.cytoscape.org/en/stable/, and represents a comprehensive source of instructions for users for every aspect and feature of Cytoscape, including more technical aspects such as the API. The manual includes several tutorials and many hands-on examples of use. The manual is updated for every major release of Cytoscape. The most recent updates to the manual were in December of 2016.
To streamline maintenance of the manual and improve its usability, we migrated our existing wiki manual to the ReadTheDocs.org platform in April 2016. This system is integrated with GitHub, supports markdown and can be integrated with Google Analytics.
OpenTutorials
Open Tutorials (http://opentutorials.cgl.ucsf.edu/index.php/Main_Page) is the main source for tutorial materials for Cytoscape and other NRNB tools, and is used both internally by presenters and by researchers and developers. The site now hosts 6 detailed user tutorials and 3 developer tutorials. Traffic to Open Tutorials is consistent, with over 66,000 unique sessions in the last year. Visits are split roughly 60-40 between new and returning visitors, with Cytoscape 3 user tutorials being the most popular pages. Open Tutorials has allowed NRNB to reach our goal of providing tutorial support to a broad and diverse community.
In the upcoming year, we plan to move all Cytoscape tutorials to the ReadTheDocs.org platform, where the Cytoscape manual is hosted. This move will facilitate updating and maintenance of tutorial content, and will make the content more accessible and searchable for users. It will also facilitate creation of handout materials for presenters.
http://tutorials.cytoscape.org/
B.4 What training opportunities
All of the activities reported in this component provide training opportunities. These are opportunities that in most cases would not exist without NRNB staff and support. Each year we provide hundreds of researchers an
introduction to network biology concepts and Cytoscape usage. We also train dozens of programmers how to write apps for Cytoscape to provide domain-specific functionality to the platform. These programs have been very successful so far. This is evident from the testimonials we collect via survey following each event: http://nrnb.org/testimonials.html#collab-tab. Here are snippets from this year’s students and mentors in our Google Summer of Code and NRNB Academy programs:
“The NRNB program is a fantastic opportunity to gain skills and work experience in network biology and app development, at any stage in your academic career. I came in as a graduate student with only a few months of coding experience and now I've released my first application. Exhilarating!”
“It has helped improve the software developed by my group. It has also given me experience in mentoring someone long distance.”
“Working in an NRNB training program helped to strengthen my resume and introduced me to the idea of combining a career in medicine with computer-based research.”
“Great opportunity for developing mentoring and supervising skills as well as get my software tools developed.”
“This was my first ever contribution to an open source project and NRNB also. This milestone will shine on my CV forever.”
“Great experience interacting with the community and my mentor. I was excited to receive help and encouragement for my project.”
“Learned how to work in a collaboration, formulate better questions. Gained especially invaluable knowledge and experience. Improved coding skills. Learned new programs and libraries.”
“It broadened my mind to issues still unsolved in the network biology community, and I gained resources and colleagues in the community that I otherwise wouldn't have.”
“Personally, I see great value in interacting with smart, young people from all around the world. I am optimistic that participating in NRNB training programs will benefit my own research group by giving it wider exposure and by building a community around the software.”
“I am continuing to work for Cytoscape.js and am happy to being staying involved.”
“The program has been great experience for my students. They not only learned about open source community driven projects, but the work they did has contributed to their future research.”
“The program gave me a chance to work with students in projects of mutual interest and to develop my tools faster and more efficient.”
B.6 What do you plan to do for the next reporting period to accomplish the goals?
We recently submitted our application for GSoC 2017. If accepted, this should be one of our largest years yet. We have more mentors and more project ideas than in prior years and are continuing a more coordinated outreach effort with a Mentor Resource Packet that we will distribute to all NRNB mentors. This resource was developed in 2016, and is meant to help mentors contact and communicate with various student bodies that are likely to have the skill and interest to participate in GSoC 2017.
Admin
B.2: What was accomplished
Measuring success:
• 118 publications citing NRNB grant
• Over 8000 visits per week to Cytoscape.org
• 17,000 downloads per month for Cytoscape
• 3700 Cytoscape application launches per day
• 38,261 page views in January 2017 for the Cytoscape App Store, and an average of 875 downloads per day among 307 apps.
• A total of 18 tools supported by NRNB
• 93 new and ongoing collaborations with external investigators on diverse topics
• 3 students trained at NRNB Academy last year, 2 completed projects
• 15 students trained through Google Summer of Code
• 16 NRNB coordinated training events in 10 locations in 6 countries
• Over 100 users and dozens of developers trained on Cytoscape by NRNB staff
• 66,000 unique sessions at Open Tutorials in the past year, 65% from new visitors
• 12 open access Cytoscape app articles edited for F1000Research channel
• 500 members in our Network Biology LinkedIn group
• 2800 members and over 7000 messages on our Google groups for Cytoscape
LinkedIn
We manage a LinkedIn Group for Network Biology to organize events, publications and discussions in the broader scientific community. Nucleated with attendees of the annual NetBio community meetings (ISCB/ECCB), the group now has 500 members. Posts from our Network Biology Tumblr are also promoted here, as are updates on Cytoscape and NetBio SIG meeting news and presentations. In the past year, we have had an average of 3-4 posts per month. It is worth noting that this community interface is independent of Cytoscape context and thus represents an opportunity to engage a more diverse set of researchers: http://www.linkedin.com/groups/Network-Biology-Group-5123610.
OpenTutorials
Our tutorial management system, Open Tutorials, was developed in the first year of funding. It is the main source for tutorial materials for NRNB tools, including Cytoscape. Tutorials are actively updated and new content added with each major release of Cytoscape.
Open Tutorials has allowed NRNB to reach our goal of providing tutorial support to a broad and diverse community. Currently, the site includes tutorials for users as well as for developers.
http://tutorials.cytoscape.org/
Google Summer of Code
One of our most successful training initiatives has been participating in Google’s Summer of Code (GSoC) program (https://developers.google.com/open-source/soc). Each summer, Google sponsors students to work at open source organizations to develop code for open source software projects. The NRNB executive director, Dr. Pico, administers the NRNB effort in this program and has experience as a GSoC org admin going back 4 years prior to NRNB, focusing mainly on Cytoscape- and WikiPathways-related projects.
During the summer of 2016, NRNB trained 15 students as part of GSoC. It was one of our most successful years, with all enrolled students completing their projects. After taking a year off from GSoC in 2015, we pulled together over 50 project ideas and dozens of mentors. The projects covered a wide range of topics, including algorithm, UI, importer and converter development for both web and desktop
for Cytoscape, cytoscape.js, SBML, SBGN, cBioPortal, Cell Designer, GraphSpace and more. New for this year was also the development of a Mentor Resource Packet, a collection of resources designed to help mentors with recruiting students. The packet includes tips on how and where to recruit, as well as ready-to-use slides, flyers and other materials. Our complete 2016 end-of-year report can be found here: http://nrnb.org/gsoc-reports.html. We have received an abundance of testimonials from students and mentors, a subset of which are available on our website: http://nrnb.org/testimonials.html#collab-tab.
NRNB Academy
Our year-round NRNB Academy program was started to offer students training opportunities year-round and to build on the momentum from our Google Summer of Code participation. The program continues to attract interested students and mentors with an application process that utilizes the same web form and tracking infrastructure we built for NRNB Collaborations. The Outreach Coordinator, Ms. Hanspers, acts as the dean of NRNB Academy and reviews each applicant. She identifies the appropriate NRNB staff to serve as a possible mentor and manages the initialization of their mentored project, borrowing from her years of experience as co-admin and co-mentor with GSoC. The NRNB Academy projects include milestones and deadlines, and are expected to wrap up with a completed feature or application. In addition to advancing network biology tools, this program serves as an important training opportunity and drives interest for our GSoC effort and NRNB tool development in general. We have received an abundance of testimonials from students and mentors, a subset of which are available on our website: http://nrnb.org/testimonials.html#collab-tab
In the past year, we have mentored 3 students through this program, two of whom completed projects in 2016 (one ongoing).
One of the completed projects was published on the F1000Research Cytoscape App Channel: https://f1000research.com/articles/5-2524/v1
F1000Research: Cytoscape App Channel
The F1000Research Cytoscape App Channel was started in 2014 with the purpose of highlighting apps with clear use cases for network researchers as well as relevant implementation tips for other app developers. NRNB staff act as guest editors periodically to help shepherd and improve app article submissions from the community. In 2016, 12 Cytoscape app articles were published through this channel, on topics including random network generation, kinetic data visualization, semantic network annotation and network inference.
Cytoscape manual
The Cytoscape User Manual was migrated to the ReadTheDocs platform in March 2016, and is now available at http://manual.cytoscape.org/en/stable/. This documentation system is integrated with GitHub, greatly facilitating updates. It also supports markdown and can be integrated with Google Analytics. The interface automatically generates a PDF upon every update, and also has several benefits in terms of usability. A table of contents and a search feature are included automatically as a side panel, making navigation of the materials easier. The Cytoscape manual represents a comprehensive source of instructions for users for every aspect and feature of Cytoscape, including more technical aspects such as the API. The manual includes several tutorials and many hands-on examples of use. The manual is updated continually and for every major release of Cytoscape, most recently in December 2016, with the next update pending in March 2017.
Cytoscape mailing lists
Members of NRNB staff help monitor and answer all associated Google groups on a weekly basis. This is a major source of interaction with the network biology community and supports a broad range of research and development across many labs. During this reporting period we launched a new group dedicated to app
developers. This has become a critical channel for communication. We now have ~2800 members across our user and app developer mailing lists.
B.5: How have results been disseminated
NRNB tools and resources are disseminated to the biomedical research community through a variety of channels:
• NRNB resources and information are disseminated through NRNB.org.
• The Cytoscape website, http://www.cytoscape.org, is the main source for information on the Cytoscape project and for downloading the tool.
• Cytoscape Apps are highlighted, organized and disseminated via the Cytoscape App Store: http://apps.cytoscape.org/
• Two Cytoscape mailing lists, https://groups.google.com/forum/#!forum/cytoscape-app-dev and https://groups.google.com/forum/#!forum/cytoscape-helpdesk, are the main points of contact with users and the app developer community.
• Cytoscape user and developer tutorials are continually updated and expanded at Open Tutorials: http://opentutorials.cgl.ucsf.edu/index.php/Main_Page
• The Cytoscape user manual is available via ReadTheDocs: http://manual.cytoscape.org/en/stable/
• Publications utilizing Cytoscape or describing new Cytoscape Apps are highlighted on our Cytoscape Publications Tumblr: http://cytoscape-publications.tumblr.com/
• Publications describing methods, resources, tools and research related to network biology are posted on our Network Biology Publications Tumblr: http://netbiopub.tumblr.com/
• Relevant news, articles and events are posted on our LinkedIn Network Biology Group: https://www.linkedin.com/groups/5123610
• We facilitate publication of articles describing network biology related Cytoscape apps at the F1000Research Cytoscape App Channel: http://f1000research.com/channels/cytoscapeapps
• We train student programmers in open source development of network biology related tools through GSoC and the NRNB Academy: http://www.nrnb.org/gsoc.html
• We organize and track community training events, such as workshops, tutorials, seminars and courses.
B.6: What you plan to do next
F1000Research: Cytoscape App Channel
Acting as guest editors, NRNB staff plan to collect a new batch of Cytoscape app articles for our F1000Research channel, to be published in July 2016.
Open Tutorials
With a renewed focus on core protocols among common Cytoscape use cases, we plan to produce more complex, task-based tutorials, each describing a specific workflow that combines several tools to answer a specific question in network biology. We also plan to migrate our tutorials to a new platform that better supports usability, discoverability and maintainability. For example, we are testing ReadTheDocs, the platform that hosts our manual, to see whether the same benefits apply to our tutorials.
Cytoscape Manual
We will continue hosting the Cytoscape manual via ReadTheDocs. This system has proven easy to use, both for users and developers reading the manual and for team members creating and maintaining its content.
Tumblr and LinkedIn
We will continue to post publications to our two Tumblr archives. Relevant posts will also be promoted on the Network Biology LinkedIn group. We will also increase our efforts to reach out to app developers and network biology researchers through LinkedIn to highlight their articles.
Metrics
We will continue to track publications, collaborations, training events, NRNB tool usage, and community engagement, in addition to our own progress on NRNB technology research and development aims.