NetBioSIG2013-Talk Gang Su

Presentation for Network Biology SIG 2013 by Gang Su, University of Michigan, USA. “CoolMap Cytoscape App: Flexible Multi-scale Heatmap-Driven Molecular Network Exploration”


  • 1. A ‘Cool’ Heatmap and its Applications in Flexible Multi-scale Molecular Network Exploration. Gang Su, PhD, Molecular Behavioral Neuroscience Institute and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor 48109. Network Biology SIG 2013, Friday, July 19th, Berlin, Germany.
  • 31. Heatmap… What is it? ‘CoolMap… I am your father.’
      - One of the most popular ways of visualizing tabular data
      - X → column, Y → row, value → color
      - Trees from hierarchical clustering, or group annotations, are often drawn along the sides
      - A great format for visual exploration and pattern discovery
      - Used alongside node-edge network views, as in Cytoscape-clusterExplorer
      - The paradigm has remained largely unchanged
      Sources: Loua (1873); Czekanowski (1909); Brinton (1914); Eisen (1998), PNAS Dec. 8, 1998, Vol. 95, No. 25, 14863-14868 (12k citations); The American Statistician, 2009.
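The core heatmap mapping (column → X, row → Y, value → color) can be sketched in a few lines. This is a minimal illustration, not CoolMap's renderer: the blue-white-red diverging scale, the default range [-2, 2] (e.g. for log fold changes) and the function names `value_to_rgb` and `render_heatmap` are assumptions.

```python
from __future__ import annotations


def value_to_rgb(value: float, vmin: float = -2.0, vmax: float = 2.0) -> tuple[int, int, int]:
    """Map a number to an RGB triple on a blue-white-red diverging scale.

    Values at or below vmin are pure blue, the midpoint of the range is white,
    and values at or above vmax are pure red (hypothetical scale choice).
    """
    mid = (vmin + vmax) / 2.0
    v = min(max(value, vmin), vmax)  # clip into the displayable range
    if v <= mid:
        t = (v - vmin) / (mid - vmin)  # 0 at vmin, 1 at the midpoint
        return (round(255 * t), round(255 * t), 255)
    t = (v - mid) / (vmax - mid)  # 0 at the midpoint, 1 at vmax
    return (255, round(255 * (1 - t)), round(255 * (1 - t)))


def render_heatmap(matrix: list[list[float]]) -> list[list[tuple[int, int, int]]]:
    """Rows stay rows (Y), columns stay columns (X); each value becomes a color."""
    return [[value_to_rgb(x) for x in row] for row in matrix]


print(render_heatmap([[-2.0, 0.0], [2.0, 5.0]]))
# [[(0, 0, 255), (255, 255, 255)], [(255, 0, 0), (255, 0, 0)]]
```

Clipping to a fixed range keeps a single extreme value from washing out the rest of the map.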
  • 32. The Good, the Bad, and the Ugly… of conventional heatmaps
      - The Good
          - Mapping numbers to colors makes the display intuitive
          - Clustering patterns become conspicuous and interpretable
      - The Bad
          - Increasingly difficult to visualize and explore big datasets
          - Difficult to use for non-numeric data
      - The Ugly
          - Difficult to incorporate existing annotations such as pathways and ontologies
          - Difficult to visualize high-level relationships, such as overall pathway-to-pathway correlations
      The “Figure 1” phenomenon
  • 33. There are known knowns, and there are known unknowns.
      How do we relate the unknown to the known, moving from observed patterns to existing knowledge interactively and intuitively?
      Examples: PLoS Genet. 2008 Mar 14;4(3):e1000034; BMC Bioinformatics. 2011;12(Suppl 1).
  • 34. The $$$ Solution: there are only so many screens you can buy.
  • 35. The CoolMap Solution: Nuts and Bolts
      - Core concept: the ‘collapsible heatmap’
          - Tree nodes can be expanded or collapsed at any level; think of it as a two-way multi-tree
          - Collapsed data are represented using aggregation functions (mean, median, etc.)
      - Aggregation lets the user explore data at multiple levels:
          - Identify potential signals in high-level aggregated views
          - Expand nodes of interest while keeping the surrounding context
      Figures: using the mean to collapse four numeric cells; the two-way tree can be expanded and collapsed at multiple levels.
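The collapse operation can be sketched as follows: each (row node, column node) pair in the view becomes one cell whose value aggregates every base cell under both nodes, and expanding a node replaces it with its children. This is a hedged illustration, not CoolMap's code; the dictionary-based base matrix, the function name `collapse` and the toy genes, samples and pathways are hypothetical. Gene `g2` sits in two pathways to mirror the multi-tree idea.

```python
from __future__ import annotations

from statistics import mean
from typing import Callable, Sequence

# Hypothetical base matrix: (row id, column id) -> numeric value.
BaseMatrix = dict[tuple[str, str], float]


def collapse(
    base: BaseMatrix,
    row_nodes: dict[str, list[str]],
    col_nodes: dict[str, list[str]],
    agg: Callable[[Sequence[float]], float] = mean,
) -> dict[tuple[str, str], float]:
    """Build the view: one cell per (row node, column node) pair.

    Each node maps to the base ids under it; a leaf maps to itself.
    A base id may appear under several nodes (multiple inheritance).
    """
    view = {}
    for r_node, rows in row_nodes.items():
        for c_node, cols in col_nodes.items():
            values = [base[(r, c)] for r in rows for c in cols if (r, c) in base]
            if values:  # skip blocks with no data
                view[(r_node, c_node)] = agg(values)
    return view


base = {
    ("g1", "s1"): 1.0, ("g1", "s2"): 2.0,
    ("g2", "s1"): 3.0, ("g2", "s2"): 4.0,
    ("g3", "s1"): 10.0, ("g3", "s2"): 20.0,
}
samples = {"groupX": ["s1", "s2"]}

# Both pathways collapsed; g2 belongs to both. pathwayA collapses four cells.
print(collapse(base, {"pathwayA": ["g1", "g2"], "pathwayB": ["g2", "g3"]}, samples))
# {('pathwayA', 'groupX'): 2.5, ('pathwayB', 'groupX'): 9.25}

# Expand pathwayA into its genes while pathwayB stays collapsed as context.
print(collapse(base, {"g1": ["g1"], "g2": ["g2"], "pathwayB": ["g2", "g3"]}, samples))
# {('g1', 'groupX'): 1.5, ('g2', 'groupX'): 3.5, ('pathwayB', 'groupX'): 9.25}
```

Passing a different `agg` (for example `max` or `statistics.median`) changes how collapsed cells summarize their members without touching the tree logic.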
  • 36. CoolMap: Core Design Concepts
      - Extensible interfaces:
          - A loader that imports custom data objects into a ‘base’ matrix
          - An aggregator that transforms a group of ‘base’ data objects into a ‘view’ data object
          - A renderer that draws the ‘view’ data object in its designated region of the interactive view
          - The ‘base’ matrix can use a variety of data structures, such as arrays, lists, sparse matrices, or even remote services
      - Examples:
          - Gene expression values of all genes in pathway A, sample group B, aggregated using the median and rendered as a color: [0.5, 1, 2.1, 3.2, 4.3] → [2.1] → colored cell
          - Nucleotide sequences belonging to the same transcription factor binding site, aggregated using the IUPAC consensus code into a single letter and rendered as text: [A, A, A, A, T] → [A] → A
      - Flexible row/column ontological trees:
          - Multiple-inheritance trees: genes or metabolites may be shared by multiple pathways or ontology terms, and may therefore occur more than once
          - Trees from different sources: side-by-side comparison of different ontologies (GO, KEGG, hierarchical clustering)
          - Trees may be used at any level: tree nodes at any level can be inserted anywhere in the tree
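The three extension points can be sketched as abstract interfaces, with the slide's two examples plugged in. The class and method names (`Loader.load`, `Aggregator.aggregate`, `Renderer.render`, `DictLoader`, `view_cell`) are hypothetical and not CoolMap's actual API, and `MajorityConsensus` is a simplified stand-in that keeps the most frequent base rather than applying the full IUPAC ambiguity-code rules.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections import Counter
from statistics import median
from typing import Any, Sequence


class Loader(ABC):
    """Imports custom data objects into a 'base' matrix keyed by (row, column)."""

    @abstractmethod
    def load(self) -> dict[tuple[str, str], Any]: ...


class Aggregator(ABC):
    """Turns a group of 'base' objects into a single 'view' object."""

    @abstractmethod
    def aggregate(self, cells: Sequence[Any]) -> Any: ...


class Renderer(ABC):
    """Turns a 'view' object into something drawable (here, a string)."""

    @abstractmethod
    def render(self, value: Any) -> str: ...


class DictLoader(Loader):
    """Trivial loader backed by an in-memory dict; a real one might read files or services."""

    def __init__(self, data: dict[tuple[str, str], Any]):
        self.data = data

    def load(self) -> dict[tuple[str, str], Any]:
        return dict(self.data)


class MedianAggregator(Aggregator):
    def aggregate(self, cells: Sequence[float]) -> float:
        return median(cells)


class MajorityConsensus(Aggregator):
    """Keeps the most frequent base (simplified; not full IUPAC consensus)."""

    def aggregate(self, cells: Sequence[str]) -> str:
        return Counter(cells).most_common(1)[0][0]


class ColorRenderer(Renderer):
    """Renders a number as a white-to-red hex color within [vmin, vmax]."""

    def __init__(self, vmin: float, vmax: float):
        self.vmin, self.vmax = vmin, vmax

    def render(self, value: float) -> str:
        clipped = min(max(value, self.vmin), self.vmax)
        t = (clipped - self.vmin) / (self.vmax - self.vmin)
        gb = round(255 * (1 - t))  # less green/blue means a redder cell
        return f"#ff{gb:02x}{gb:02x}"


class TextRenderer(Renderer):
    def render(self, value: Any) -> str:
        return str(value)


def view_cell(cells: Sequence[Any], aggregator: Aggregator, renderer: Renderer) -> str:
    """Aggregate a group of base cells, then render the resulting view object."""
    return renderer.render(aggregator.aggregate(cells))


# Hypothetical genes of pathway A in sample group B.
base = DictLoader({(f"gene{i}", "groupB"): v for i, v in enumerate([0.5, 1, 2.1, 3.2, 4.3])}).load()
print(view_cell(list(base.values()), MedianAggregator(), ColorRenderer(0.0, 4.3)))
print(view_cell(list("AAAAT"), MajorityConsensus(), TextRenderer()))  # A
```

Because the aggregator and renderer are chosen per view, the same collapse machinery serves numbers, sequences, or any other cell type.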
  • 37. Near-ready Releases
      - CoolMap Core: core interfaces, data structures and utility functions for the base matrix, view matrix, ontology trees, renderers, interactive view panels, etc.
      - CoolMap Application: an application with auxiliary modules such as dynamic multiple-dataset synchronization, a searcher, filters, sorters, data persistence, etc.; it follows many best practices from Cytoscape
      - CoolMap Cytoscape Prototype Plugin: a Cytoscape plugin that enables two-way communication between Cytoscape and CoolMap
      Our classroom user study with a group of undergraduate students with introductory computer and bioinformatics background showed that 65% found the software easy or not difficult to learn, and 74% enjoyed or highly enjoyed it.
  • 38. Screenshot
  • 39. Case Study 1: Eisen Yeast Data. Gene expression fold changes of selected gene groups and experimental conditions, shown as in Eisen (1998) and in CoolMap. CoolMap makes the data easier to interpret at higher concept levels.
  • 40. Case Study 1: Eisen Yeast Data (cont’d). CoolMap reveals more than meets the eye in a conventional heatmap: the peculiar outlier sample spo5 2, whose fold change is reversed across many pathways, is easier to identify in the aggregated view.
  • 41. Case Study 1: Eisen Yeast Data (cont’d). Using CoolMap’s multi-view link functions to compare different ontology definitions. Left: GO:0006096 (glycolysis); right: Eisen’s annotated glycolysis cluster. Integrating existing knowledge with observed data supports hypothesis generation.
  • 42. Case Study 2: Diet-Induced Differential Gene Expression
      - Individuals fed saturated fatty acid (SFA) and monounsaturated fatty acid (MUFA) diets showed differential gene expression over an 8-week span
      - The authors picked a list of immune-related genes and showed that these genes were up-regulated
      Source: The American Journal of Clinical Nutrition 90, 1656-64 (2009), shown alongside the CoolMap view.
  • 43. Case Study 2: Diet-Induced Differential Gene Expression (cont’d). Probe-level expression profiles can be maintained.
  • 44. Case Study 2: Diet-Induced Differential Gene Expression (cont’d). Using ontology groups and gender groups leads to new discoveries: up-regulated gene groups, and gender-specific responses with weaker patterns. Total of 25k probes. Panels: up-regulated clusters; female-specific; male-specific.
  • 45. Case Study 3: Mother-Child Nutrition Data (Unpublished)
      - The aggregated group view makes the data much easier to interpret at the concept level
      - We can immediately see that BCAA AcylCarnitines (0.45), Long Chain AcylCarnitines (0.34), PPARa methylation (0.52) and ESR methylation (0.32) are highly correlated between mother and child
      Source: Burant C., unpublished data.
  • 46. Case Study 3: Mother-Child Nutrition Data (Unpublished). PPARa: One Level Down
      - Validation
          - The boxplot overlay (left) and expanded view (right) show that the high correlation (mean 0.52) is unlikely to result from errors, outliers or noise
          - PPARa methylation levels are strongly associated between mother and child
      - Hypothesis
          - Because PPARa regulates genes involved in cell proliferation, cell differentiation and inflammatory responses, the expression profiles of these genes may also be correlated between mother and child
      Source: Burant C., unpublished data.
  • 47. Case Study 3: Mother-Child Nutrition Data (Unpublished). BCAA AcylCarnitines
      - The mother-child correlation is lower (mean 0.45)
      - BCAA AcylCarnitines have a larger intra-group variance in children than in mothers
      - While C3 is highly correlated, C4 has a low correlation
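The group-level numbers in this case study read as mean mother-child correlations (e.g. “mean 0.52” for PPARa). Under that reading, the computation can be sketched as: correlate each feature across mother-child pairs, then average the per-feature correlations within each group. The Pearson coefficient, the data layout, and the hypothetical feature names and toy values below are assumptions for illustration, not the study's data or pipeline.

```python
from __future__ import annotations

from math import sqrt
from statistics import mean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length samples (undefined if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def group_correlations(mother, child, groups):
    """mother/child: feature -> values, one per mother-child pair, in the same pair order.
    groups: group name -> member features.
    Returns (per-feature correlation, per-group mean correlation)."""
    per_feature = {f: pearson(mother[f], child[f]) for f in mother}
    per_group = {g: mean(per_feature[f] for f in feats) for g, feats in groups.items()}
    return per_feature, per_group


# Toy values for four hypothetical mother-child pairs.
mother = {"featureA": [1.0, 2.0, 3.0, 4.0], "featureB": [1.0, 2.0, 3.0, 4.0]}
child = {"featureA": [2.0, 4.0, 6.0, 8.0], "featureB": [1.0, -1.0, -1.0, 1.0]}
per_feature, per_group = group_correlations(mother, child, {"groupX": ["featureA", "featureB"]})
print(per_feature)  # {'featureA': 1.0, 'featureB': 0.0}
print(per_group)  # {'groupX': 0.5}
```

As the toy group shows, a moderate group mean can hide one strongly and one weakly correlated member, which is why expanding the group one level down matters.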
  • 48. Case Study 4: DNA Methylation, Missing Values and Ragged Data (Unpublished)
      - Sparse or ragged matrix: in normalized methylation data, every gene has a different number of methylation sites
      - Collapsing by cell line (Caski.1 and Caski.2) shows the aggregated (mean, etc.) normalized methylation value; expanding by cell line reveals the details for each methylation site
      Source: Sartor M., unpublished data.
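Ragged cells can be handled by letting each base cell hold a variable-length list of per-site values and having the aggregator flatten them and skip missing entries. A minimal sketch, assuming `None` marks a missing value and the mean is the aggregator; the gene names, values and the `mean_ignoring_missing` and `collapse_cell_lines` helpers are hypothetical.

```python
from __future__ import annotations

from statistics import mean
from typing import Optional

# Hypothetical ragged base matrix: (gene, cell line) -> per-site values (None = missing).
base: dict[tuple[str, str], list[Optional[float]]] = {
    ("geneA", "Caski.1"): [0.2, 0.4, None],
    ("geneA", "Caski.2"): [0.6],
    ("geneB", "Caski.1"): [None],
    ("geneB", "Caski.2"): [],
}


def mean_ignoring_missing(cells: list[list[Optional[float]]]) -> Optional[float]:
    """Flatten the per-site lists of several base cells and average the observed values."""
    values = [v for sites in cells for v in sites if v is not None]
    return mean(values) if values else None  # None when nothing was observed


def collapse_cell_lines(base, gene, cell_lines):
    """Collapsed view: one value per gene across the given cell lines."""
    return mean_ignoring_missing([base.get((gene, line), []) for line in cell_lines])


print(collapse_cell_lines(base, "geneA", ["Caski.1", "Caski.2"]))  # about 0.4
print(collapse_cell_lines(base, "geneB", ["Caski.1", "Caski.2"]))  # None
# Expanded view: the individual sites stay available in the base cells.
print(base[("geneA", "Caski.1")])  # [0.2, 0.4, None]
```

Returning `None` for an all-missing block lets a renderer draw it as an empty cell instead of a misleading zero.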
  • 49. Case Study 5: Continuous Glucose Monitoring (CGM)
      - Display glucose levels at a variety of time resolutions (from 5 minutes to 1 month) and for sample groups (age groups, gender)
      - Link hypoglycemia events to blood sugar changes
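Viewing the same trace at different time resolutions amounts to binning the readings and aggregating each bin. A minimal sketch, assuming readings arrive as (timestamp, glucose in mg/dL) pairs at 5-minute intervals and the mean is the aggregator; the `resample` helper and the synthetic trace are hypothetical. Fixed-length bins cover minutes, hours and days; calendar months would need calendar-aware grouping instead.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

ORIGIN = datetime(1970, 1, 1)  # bins are aligned to this reference point


def resample(readings: list[tuple[datetime, float]], bin_size: timedelta) -> dict[datetime, float]:
    """Group readings into fixed-length bins and return {bin start: mean glucose}."""
    bins: dict[datetime, list[float]] = defaultdict(list)
    for ts, value in readings:
        k = (ts - ORIGIN) // bin_size  # index of the bin containing ts
        bins[ORIGIN + k * bin_size].append(value)
    return {start: mean(values) for start, values in sorted(bins.items())}


# Synthetic two-hour trace: one reading every 5 minutes, values 100, 101, ..., 123.
start = datetime(2013, 7, 19, 8, 0)
trace = [(start + timedelta(minutes=5 * i), 100.0 + i) for i in range(24)]

hourly = resample(trace, timedelta(hours=1))
daily = resample(trace, timedelta(days=1))
print(list(hourly.values()))  # [105.5, 117.5]
print(list(daily.values()))  # [111.5]
```

Swapping `mean` for `min` would keep short hypoglycemic dips visible at coarse resolutions, where averaging tends to smooth them away.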
  • 50. Case Study 6: Sequence Analysis Example
      - Interactive consensus sequence exploration: CRP (Catabolite Activator Protein) binding sites, 49 sequences in dozens of promoters (ChIP-seq)
      - Extending CoolMap: loader, aggregator, renderer [annotator]
      Panels: full sequence view; sequence logo; consensus view; consensus view with base percentage overlay; consensus view with GC content overlay. Source: Genome Res. 2004 June; 14(6): 1188-1190.
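The consensus views with overlays can be computed column by column over aligned sequences. A minimal sketch, assuming equal-length, already-aligned sequences: it reports the majority base (a simplification of IUPAC consensus), that base's percentage and the column's GC fraction. The function names and toy sequences are hypothetical, not the CRP sites from the slide.

```python
from __future__ import annotations

from collections import Counter


def column_profiles(seqs: list[str]) -> list[tuple[str, float, float]]:
    """Per aligned position: (majority base, its percentage, GC fraction)."""
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    profiles = []
    for column in zip(*seqs):
        counts = Counter(column)
        base, n = counts.most_common(1)[0]  # ties resolve to the first base seen
        gc = (counts["G"] + counts["C"]) / len(column)
        profiles.append((base, 100.0 * n / len(column), gc))
    return profiles


def consensus(seqs: list[str]) -> str:
    """Consensus view: the majority base at every position."""
    return "".join(base for base, _, _ in column_profiles(seqs))


toy = ["ACGTA", "ACGTA", "ACCTA", "TCGTG"]
print(consensus(toy))  # ACGTA
for base, pct, gc in column_profiles(toy):
    print(f"{base} {pct:.0f}% GC={gc:.2f}")
```

The percentage and GC values correspond to the two overlays; each would be drawn on top of the consensus letter rather than replacing it.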
  • 51. Case Study 7: Network Analysis
      - Link Cytoscape with CoolMap:
          - Link network nodes with CoolMap views by ID, attribute names, etc.
          - Map patterns identified in an experiment onto curated networks (an alternative to JTreeView); create correlation matrices from Cytoscape numeric attributes
          - Use pathways and ontologies to view sub-network-to-sub-network connectivity
          - Cluster the network based on attributes, and compare unsupervised clustering with annotated pathways and ontologies
      Need two monitors!
  • 52. Case Study 7: Network Analysis (cont’d)
      - Top left: the MAPK pathway in the ‘galFiltered.cys’ network from Cytoscape
      - Bottom left: part of the same network arranged by pathways in an adjacency matrix, with sum as the aggregator. Each cell shows the number of edges within a pathway or between two pathways; a good ‘community’ clustering will have most of the green dots along the diagonal
      - Right: the same view with the MAPK pathway expanded, showing dense intra-cluster connectivity
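Collapsing an adjacency matrix with sum as the aggregator reduces to counting edges between pathway pairs. A minimal sketch, assuming a directed edge list in which each edge appears once, so a diagonal cell counts the edges inside a pathway exactly once; `pathway_edge_counts`, `diagonal_fraction` and the toy network are hypothetical. `diagonal_fraction` is one simple way to quantify the “mostly on the diagonal” criterion; it is not a standard modularity score.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def pathway_edge_counts(
    edges: Iterable[tuple[str, str]], pathways: dict[str, set[str]]
) -> Counter:
    """Adjacency matrix collapsed by pathway with sum as the aggregator.

    edges: (source, target) pairs, each edge listed once.
    pathways: pathway name -> member nodes (a node may belong to several).
    Returns {(P, Q): number of edges from a node in P to a node in Q}.
    """
    counts: Counter = Counter()
    for u, v in edges:
        for p, members_p in pathways.items():
            if u not in members_p:
                continue
            for q, members_q in pathways.items():
                if v in members_q:
                    counts[(p, q)] += 1
    return counts


def diagonal_fraction(counts: Counter) -> float:
    """Share of counted edges that fall inside a single pathway (diagonal cells)."""
    total = sum(counts.values())
    diagonal = sum(n for (p, q), n in counts.items() if p == q)
    return diagonal / total if total else 0.0


pathways = {"pathwayA": {"a", "b", "c"}, "pathwayB": {"d", "e"}}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("c", "d")]
counts = pathway_edge_counts(edges, pathways)
print(dict(counts))
print(diagonal_fraction(counts))  # 0.8
```

With overlapping pathways an edge can land in several cells, so the fraction is only a rough guide in that case.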
  • 53. Case Study 7: Network Analysis (cont’d)
      - Left: a correlation matrix can be created from the gal expression profiles, and pathways can then be used to arrange it into a condensed concept-level correlation view; hierarchical clustering can be run at the concept level
      - Right: the selected region contains nodes that are annotated with the KEGG pathway Cell cycle and that lie close to each other in the network
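The condensed concept correlation view can be sketched in two steps: a gene-by-gene Pearson correlation matrix from expression profiles, then one cell per pathway pair holding the mean correlation of its gene pairs. Self-pairs are skipped so diagonal cells are not inflated by the trivial 1.0. The helper names, the choice of the mean as aggregator and the toy profiles are assumptions, not CoolMap's implementation; constant profiles make the correlation undefined.

```python
from __future__ import annotations

from math import sqrt
from statistics import mean
from typing import Optional


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation (raises ZeroDivisionError for constant profiles)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(profiles: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Gene-by-gene correlation of expression profiles."""
    genes = list(profiles)
    return {(g, h): pearson(profiles[g], profiles[h]) for g in genes for h in genes}


def concept_view(corr, pathways: dict[str, list[str]]) -> dict[tuple[str, str], Optional[float]]:
    """One cell per pathway pair: mean correlation over its gene pairs, skipping self-pairs."""
    view = {}
    for p, genes_p in pathways.items():
        for q, genes_q in pathways.items():
            values = [corr[(g, h)] for g in genes_p for h in genes_q if g != h]
            view[(p, q)] = mean(values) if values else None
    return view


profiles = {"g1": [1.0, 2.0, 3.0], "g2": [2.0, 4.0, 6.0], "g3": [3.0, 2.0, 1.0]}
view = concept_view(correlation_matrix(profiles), {"pathwayA": ["g1", "g2"], "pathwayB": ["g3"]})
print(view)
# {('pathwayA', 'pathwayA'): 1.0, ('pathwayA', 'pathwayB'): -1.0, ('pathwayB', 'pathwayA'): -1.0, ('pathwayB', 'pathwayB'): None}
```

The resulting pathway-by-pathway matrix is small enough to cluster directly, which is the concept-level clustering the slide mentions.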
  • 54. Acknowledgement. Thank you!
      - Primary advisor: Dr Fan Meng
      - Committee mentors: Dr Brian D. Athey (co-chair), Dr Charles F. Burant and his lab, Dr Barbara Mirel, Dr Maureen Sartor
      - Testers: usability testers and software testers, fellow bioinformatics brethren
      - Development: please contact me if you are interested in development or testing: