Visualization hang zhong

Visualization of Ciona intestinalis Co-expression Network

by

Hang Zhong

A dissertation submitted in partial fulfillment of the requirements for the degree of Master of Science
Department of Biology
New York University
May, 2012
ACKNOWLEDGEMENTS

I would like to thank my advisor, Richard Bonneau, for providing me the opportunity to participate in this project and for his ongoing guidance and support. I am also indebted to Professor Lionel Christiaen for inspiring the project. This thesis could not have come to fruition without the help of Florian Razy, who offered insightful and thought-provoking input. I am also everlastingly grateful to Duncan Penfold-Brown for teaching me programming. I would also like to thank Kieran Mace, Aviv Madar, Kevin Drew, Maximilian Haeussler and Claudia Racioppi, who so patiently offered their time and support. Many thanks to Todd Heiniger and Joel Rodriguez for revising the thesis.

Finally, I would like to thank my family for the invaluable support they have given me in the course of my life and studies.
ABSTRACT

Abnormalities of heart development cause the most frequent congenital diseases in humans. The conservation of the Gene Regulatory Network (GRN) involved in heart development, cellular simplicity, low genetic redundancy and a relevant evolutionary position lead researchers to study the ascidian Ciona intestinalis. To extract useful information from the microarray data so that researchers can infer the heart network in Ciona, this thesis not only applies standard approaches to find differentially expressed genes, but also explores network-based approaches to find functional groups. By visualizing the co-expression network in Gaggle, the lists of atrial siphon muscle (ASM) and heart candidate genes are fine-tuned. In addition, the modules containing candidate and known marker genes may deserve further study.
TABLE OF CONTENTS

ABSTRACT
1. INTRODUCTION
1.1 GENE REGULATORY NETWORK OF CARDIOGENIC PRECURSORS IN CIONA
1.2 MICROARRAY DATA ANALYSIS
1.3 NETWORK VISUALIZATION THROUGH GAGGLE
2. METHODS
2.1 MICROARRAY EXPERIMENTAL DESIGN
2.2 GENE EXPRESSION DATA
2.2.1 QUALITY CONTROL
2.2.2 PREPROCESSING
2.3 STATISTICAL TEST
2.4 CLUSTER ANALYSIS
2.5 FUNCTIONAL ENRICHMENT ANALYSIS
2.6 GENERATION OF NETWORKS
2.6.1 STRING PROTEIN NETWORK
2.6.2 UNWEIGHTED CO-EXPRESSION NETWORK
2.6.3 WEIGHTED CO-EXPRESSION NETWORK
2.7 NETWORK VISUALIZATION
2.7.1 FILE FORMAT
2.7.2 ANALYZING NETWORK BY PLUGIN IN CYTOSCAPE
3. RESULTS
3.1 DIFFERENTIAL EXPRESSION
3.1.1 EXPECTATION OF THE MICROARRAY DATA
3.1.2 ASM AND HEART CANDIDATE GENES
3.2 NETWORK VISUALIZATION IN GAGGLE
3.2.1 NETWORKS
3.2.2 FINDINGS FROM THE NETWORK VISUALIZATION IN GAGGLE
3.2.2.1 GAGGLE AS INFORMATION INTEGRATION CENTER
3.2.2.2 MODULE FROM ALLEGROMCODE
3.2.2.3 MODULE FROM WEIGHTED NETWORK
3.2.2.4 FINE-TUNED LIST
4. DISCUSSION
4.1 ASM CANDIDATE GENES
4.2 ANNOTATION IN CIONA INTESTINALIS
4.3 FUNCTIONAL RIBOSOME GROUP AND COE
4.4 TIME-SERIES
4.5 LIMITATIONS OF THE CO-EXPRESSION NETWORK
FIGURES AND TABLES
FIGURE 1 PIPELINE
FIGURE 2 NORMALIZED UNSCALED STANDARD ERROR (NUSE)
FIGURE 3 HEAT-MAP OF ASM AND HEART CANDIDATE GENES
FIGURE 4 OUTPUT OF THE SHORT TIME-SERIES EXPRESSION MINER
FIGURE 5 SELECTING SOFT POWER
FIGURE 6 CIONA INTESTINALIS WEIGHTED CO-EXPRESSION NETWORK
FIGURE 7 MODULE SIGNIFICANCE
FIGURE 8 INTRAMODULAR CONNECTIVITY AND MODULE SIGNIFICANCE
FIGURE 9 STRING PROTEIN NETWORK
FIGURE 10 LABELING IN WEIGHTED NETWORK
FIGURE 11 THE 1ST MODULE INFERRED BY ALLEGROMCODE FOR UNWEIGHTED CO-EXPRESSION NETWORK
FIGURE 12 THE 1ST MODULE OF UNWEIGHTED CO-EXPRESSION NETWORK ENRICHMENT
FIGURE 13 THE 1ST MODULE INFERRED BY ALLEGROMCODE FOR WEIGHTED CO-EXPRESSION NETWORK
FIGURE 14 THE 1ST MODULE OF WEIGHTED NETWORK ENRICHMENT
FIGURE 15 RIBOSOME GROUP IN THE STRING
FIGURE 16 RIBOSOME GROUP IN STRING NETWORK ENRICHMENT
FIGURE 17 RIBOSOME GROUP AND COE
FIGURE 18 GREY COLOR GENES
FIGURE 19 TAN MODULE
FIGURE 20 BROWN MODULE
FIGURE 21 TURQUOISE MODULE ENRICHMENT
FIGURE 22 GENES IN TURQUOISE PLUS STEM CONDITION
FIGURE 23 GENES OF TURQUOISE PLUS STEM CONDITION ENRICHMENT
FIGURE 24 SUB-GROUP OF CANDIDATE GENES IN UNWEIGHTED NETWORK
FIGURE 25 SUB-GROUP OF CANDIDATE GENES IN UNWEIGHTED NETWORK ENRICHMENT
FIGURE 26 ASM CANDIDATE GENES IN WEIGHTED NETWORK ENRICHMENT
FIGURE 27 ASM AND HEART CANDIDATE GENES
REFERENCES
1. INTRODUCTION

1.1 Gene regulatory network of cardiogenic precursors in Ciona

Abnormalities of heart development cause the most frequent congenital diseases in humans. The conservation of the Gene Regulatory Network (GRN) involved in heart development, cellular simplicity, low genetic redundancy and a relevant evolutionary position lead researchers to study the ascidian Ciona intestinalis (Davidson 2007). In Ciona, a single pair of blastomeres called B7.5 gives rise to the anterior tail muscles (ATM) and to the trunk ventral cells (TVC) (Figure 27). Following migration from the tail, the TVCs undergo asymmetric cell divisions at the ventral midline of the trunk. The medial TVCs give rise to the heart, while the lateral TVCs migrate toward the atrial placode, where they form the atrial siphon muscles (ASM). Thus, the TVCs are similar to the multipotent cardiopharyngeal progenitors found in vertebrates, while the ASM are likely equivalent to the jaw muscles of vertebrates.

A few years ago, the first cardiogenic GRN in Ciona was proposed (Christiaen, Davidson et al. 2008), decoupling genes necessary for heart specification from genes necessary for cell migration. A later study showed that ASM precursors express the transcription factor COE (Stolfi, Gainous et al. 2010), which is necessary and sufficient to specify ASM fate. Misexpression of COE in the whole TVC lineage blocks heart development and imposes an ASM fate on all cells. Conversely, misexpression of a constitutive repressor form of COE provokes the opposite phenotype, blocking ASM formation and causing all cells to form heart tissue. Using genome-wide microarray analysis to study this crucial COE gene and to find its downstream effectors is expected to give insights into the gene regulatory network of the heart.

1.2 Microarray data analysis

Most existing studies have focused on differential expression to identify genes that distinguish different sets of samples. It is common to rank thousands of genes with testing methods such as the t-test, the F-test, or nonparametric tests such as the Wilcoxon test, and then to select the most significant genes (Gentleman 2005). Other specialized statistical methods are also commonly used in microarray data analysis, such as Significance Analysis of Microarrays (SAM) (Tusher, Tibshirani et al. 2001) and LIMMA (Wettenhall, Smyth 2004), which uses a Bayesian mixture model.

Another way of using microarray data is to understand an individual gene's or protein's network properties by studying co-expression, where genes that have similar expression patterns across a set of samples are hypothesized to have a functional relationship.
This co-expression network-based approach is consistent with an important concept that has emerged over the past decade: genes and their protein products carry out cellular processes in the context of functional modules, within which they are related (Barabasi, Bonabeau 2003, Barabasi, Oltvai 2004).

1.3 Network visualization through Gaggle

It is well recognized that visualization plays a key role in understanding biological systems, particularly in the era of high-throughput studies with a wealth of 'omics'-scale data (Gehlenborg, O'Donoghue et al. 2010). This thesis applies Gaggle (Shannon, Reiss et al. 2006), a simple, open-source Java software system, for co-expression network visualization. Gaggle is a cross-platform system integrated with diverse databases (KEGG, BioCyc, and STRING) and software (Cytoscape, DataMatrixViewer, the R statistical environment, and the TIGR Microarray Expression Viewer). With four simple data types (names, matrices, networks, and associative arrays), researchers can explore many different data sources and a variety of software tools by sending this information to the Gaggle Boss, which transfers it to other tools.
2. METHODS

The pipeline of this thesis is shown in Figure 1.

2.1 Microarray experimental design

The microarray data used in this study were kindly provided by Dr. Lionel Christiaen. They consist of 30,969 probe sets from Affymetrix GeneChips. The perturbation group includes the LacZ control and the overexpression and loss of function of the transcription factor Collier/EBF/Olf (COE) in sorted TVC cells at 21 hours post fertilization (hpf), after the asymmetric divisions of the TVCs but before completion of ASM migration. The time-series group comprises 11 time points in TVC cells, every 2 hours from 8 to 28 hours.

2.2 Gene expression data

2.2.1 Quality control

This thesis applies arrayQualityMetrics (Kauffmann, Gentleman et al. 2009), a Bioconductor package for quality control. It provides an HTML report with several diagnostic plots. In general, an array is discarded if the report identifies it as an outlier both before and after normalization.

The microarray data are first imported into the statistical programming language R, and quality control is then carried out with arrayQualityMetrics. The sample LacZ.3 is removed because it was reported as an outlier both before and after normalization (Figure 2).

2.2.2 Preprocessing

The CEL files of the microarray are normalized by the RMA method (Gentleman 2005). The expression matrix contains 30,969 probes and 48 arrays. After non-specific filtering by variance (IQR = 0.5), the matrix contains 15,484 probes and 48 arrays.

Using the collapseRows function in WGCNA, the probe with maximum variance is selected to represent each gene. After merging the probes, the merged matrix contains 10,079 probes and 48 arrays.

2.3 Statistical test

The merged matrix is ranked by a moderated F test, and genes with significant p-values (< 0.05 after Benjamini-Hochberg adjustment) are selected using the limma package (Smyth 2004). After ranking, the top-rank matrix contains 4,307 probes and 48 arrays.

The top-rank matrix is imported into the MultiExperiment Viewer (MeV), one of the Gaggle geese, and analyzed with the Significance Analysis of Microarrays (SAM) test (COE versus COEW group, p-value < 0.05, 1000 permutations, FDR = 0.9).

2.4 Cluster analysis
Hierarchical clustering is performed for the ASM and Heart candidate genes in MeV, using the Pearson correlation metric and average linkage clustering.

The time-series group data, totaling 36 arrays, are averaged for each time point and imported into the Short Time-series Expression Miner (STEM), using the STEM clustering method.

2.5 Functional enrichment analysis

Blast2GO (B2G) (Conesa, Götz et al. 2005) is a comprehensive bioinformatics tool for annotation, visualization and analysis in functional genomics research. It offers a suitable platform for functional research in non-model species such as Ciona intestinalis.

DNA sequences in FASTA format were loaded into Blast2GO. 15,629 genes remained in Blast2GO; BLAST searching and GO mapping then yielded GO terms for 3,964 genes. Each test group from the different lists is tested against the reference group (3,964 genes) using Fisher's exact test (p-value < 0.05, FDR correction).

2.6 Generation of networks

2.6.1 STRING protein network

Using the Ensembl gene names in the filt.gene matrix as input, the genes of interest in the Search Tool for the Retrieval of Interacting Genes (STRING) database (Szklarczyk, Franceschini et al. 2011) are extracted from the STRING website in Text Summary format and parsed into the Cytoscape simple interaction format (SIF) (Shannon, Markiel et al. 2003) with Python.

2.6.2 Unweighted co-expression network

The Pearson correlation coefficient for all pairwise comparisons of genes is calculated from the filt.gene matrix in R. Highly correlated gene pairs are selected with a cutoff of 0.9 and parsed into the simple interaction format (SIF) (Shannon, Markiel et al. 2003) with Python.

2.6.3 Weighted co-expression network

2.6.3.1 Network construction

The procedure can be found on the WGCNA website (Horvath 2011).

2.6.3.2 Module detection

Pearson correlation coefficients are calculated for all pairwise comparisons of genes across all samples. The resulting Pearson correlation matrix is transformed into a weighted adjacency matrix with the soft-thresholding power β = 6 chosen during network construction. Average linkage hierarchical clustering is used to group genes on the basis of the topological overlap dissimilarity measure of their network connection strengths (Zhang, Horvath 2005). Using a dynamic tree-cutting algorithm (Langfelder, Zhang et al. 2008), 13 modules are found with a minimum cluster size of 70 (Figure 6). Genes that are not assigned to any module are colored grey.
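The adjacency and topological-overlap steps of module detection can be illustrated with a small, dependency-free Python sketch. It assumes the unsigned WGCNA formulation, a_ij = |r_ij|^β with β = 6, and the standard topological overlap TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij = Σ_u a_iu·a_uj and k_i is the connectivity. The function names are hypothetical and this is not the WGCNA package itself, which also performs the average-linkage clustering on 1 - TOM and the dynamic tree cut.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_adjacency(expr, beta=6):
    """Unsigned weighted adjacency a_ij = |r_ij| ** beta; diagonal set to 0."""
    genes = list(expr)
    adj = {g: {g: 0.0} for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            adj[gi][gj] = adj[gj][gi] = a
    return adj


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with TOM_ii = 1."""
    genes = list(adj)
    # Connectivity: sum of a gene's adjacencies to all other genes.
    k = {g: sum(adj[g][u] for u in genes if u != g) for g in genes}
    tom = {g: {} for g in genes}
    for gi in genes:
        for gj in genes:
            if gi == gj:
                tom[gi][gj] = 1.0
                continue
            # Shared neighbourhood: l_ij = sum over third genes u of a_iu * a_uj.
            shared = sum(adj[gi][u] * adj[u][gj] for u in genes if u not in (gi, gj))
            denom = min(k[gi], k[gj]) + 1 - adj[gi][gj]
            tom[gi][gj] = (shared + adj[gi][gj]) / denom
    return tom


def tom_dissimilarity(tom):
    """1 - TOM, the distance that is then clustered with average linkage."""
    return {gi: {gj: 1 - v for gj, v in row.items()} for gi, row in tom.items()}


# Invented toy profiles for four genes.
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [1, 3, 2, 4],
}
tom = topological_overlap(soft_adjacency(expr, beta=6))
diss = tom_dissimilarity(tom)
```

The usual motivation for clustering on 1 - TOM rather than 1 - |r| is that two genes score high only if they are strongly connected to each other and also share strongly connected neighbours, which tends to make the resulting modules less sensitive to single noisy correlations.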
2.6.3.3 Module significance

The p-values of the moderated t test are obtained from the topTable output of the AffylmGUI package in R (Smyth 2004).

2.7 Network visualization

2.7.1 File format

The output files from WGCNA are parsed into the simple interaction format (SIF) (Shannon, Markiel et al. 2003) with Python.

2.7.2 Analyzing network by plugin in Cytoscape

The AllegroMCODE and Network Analysis plugins in Cytoscape are used to analyze the network. Clusters are found automatically by AllegroMCODE.
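The Python step that turns pairwise correlations into a SIF file for the unweighted network might look like the following sketch. It assumes the expression profiles are held in a dict mapping gene names to lists of values, keeps only pairs with r > 0.9 (positive correlations, matching the "higher than 0.9" criterion), and writes tab-separated SIF lines of the form `source<TAB>interaction<TAB>target`. The interaction label "co" and the function names are hypothetical choices, not taken from the thesis.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_sif(expr, cutoff=0.9, interaction="co"):
    """Return one SIF line for every gene pair whose Pearson r exceeds cutoff."""
    lines = []
    for g1, g2 in combinations(sorted(expr), 2):
        if pearson(expr[g1], expr[g2]) > cutoff:
            lines.append(f"{g1}\t{interaction}\t{g2}")
    return lines


def write_sif(lines, path):
    """Write one interaction per line, a layout Cytoscape can import as a network."""
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")


# Invented profiles: a and b rise together, c is anti-correlated, d is weaker.
expr = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.5], "c": [4, 3, 2, 1], "d": [1, 3, 2, 4]}
sif_lines = coexpression_sif(expr)
```

Here only the a-b pair passes the cutoff (r is about 0.998, while a-d is 0.8 and c is negatively correlated with the others). Genes with no edge above the cutoff simply do not appear in the file, which is one reason the unweighted network is smaller than the input gene list.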
3. RESULTS

3.1 Differential expression

3.1.1 Expectation of the microarray data

Genes that are up-regulated by overexpression of COE or down-regulated by loss of function of COE are considered ASM candidate genes downstream of COE, while genes that are down-regulated by overexpression of COE or up-regulated by loss of function of COE are considered Heart candidate genes repressed by COE (Stolfi, Gainous et al. 2010). Using the COE and COEW groups as the two classes in Significance Analysis of Microarrays (SAM), the contrast should yield the ASM and Heart candidate genes.

3.1.2 ASM and Heart candidate genes

3.1.2.1 Lists from SAM

SAM yields 336 significant genes, which are separated into 206 ASM candidate genes (negative in SAM, expression of the COE group lower than that of the COEW group) and 130 Heart candidate genes (positive in SAM, expression of the COE group higher than that of the COEW group). These two groups can be distinguished by the first three columns of the heat-map (Figure 3, Figure 27).
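For reference, the SAM relative difference of Tusher et al. (2001) is d = (mean_1 - mean_2) / (s + s_0), where s is a pooled, gene-specific standard error and s_0 a small positive "fudge factor". The sketch below computes d for each gene and splits genes by its sign, assuming the convention d = mean(COE) - mean(COEW), so that a negative d means lower expression in the COE group, as in the ASM list above. MeV's own sign convention, its choice of s_0 and the permutation-based FDR are not reproduced here; s0 = 0.1 and the toy values are arbitrary illustrations.

```python
import math


def sam_d(group1, group2, s0=0.1):
    """SAM relative difference d = (mean1 - mean2) / (s + s0)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    # Pooled within-group sum of squares.
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    # Gene-specific scatter s as defined by Tusher et al. (2001).
    s = math.sqrt((1 / n1 + 1 / n2) / (n1 + n2 - 2) * ss)
    return (m1 - m2) / (s + s0)


def split_by_sign(coe, coew, s0=0.1):
    """Split genes into (positive d, negative d) lists; d == 0 is dropped."""
    positive, negative = [], []
    for gene in coe:
        d = sam_d(coe[gene], coew[gene], s0)
        if d > 0:
            positive.append(gene)
        elif d < 0:
            negative.append(gene)
    return positive, negative


# Invented replicate values for two genes in the two classes.
coe = {"g1": [5.0, 6.0, 7.0], "g2": [1.0, 1.5, 2.0]}
coew = {"g1": [1.0, 2.0, 3.0], "g2": [3.0, 4.0, 5.0]}
pos, neg = split_by_sign(coe, coew)
```

In the actual analysis only genes that pass the SAM significance threshold would be split this way; the sign alone says nothing about significance.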
Based on hierarchical clustering and visual inspection, the ASM candidate genes can be roughly divided into three large groups:

A1. The first group (up-down-up-ASM, 61 genes) shows a "U"-shaped curve through the time-series experiments, with the earliest up-regulation right at the first experimental time point of 8 hours. This group contains Snail ('SNAIL' in the thesis), SET and MYND domain 1 (SMYD1) and myoblast determination protein (Myod, 'MYOD' in the thesis).

A2. The second group (early-ASM, 45 genes), which includes COE and the myosin regulatory light chain gene (MRLC5, 'MYL5' in the thesis), shows early up-regulation around 14 hours.

A3. The third group (late-ASM, 100 genes) shows relatively late up-regulation after 18 hours and contains myosin heavy chain genes (MHC3), tropomyosin 1 (TPM1, 'CTM1' in the thesis) and muscle-like actin 2 (MA2).

The Heart candidate genes can be divided into two large groups:

H1. The first group (early-Heart, 99 genes) shows early up-regulation (before 20 hours) and contains the heart markers BMP2/4, NK4, NOTRLC/HAND-LIKE, and ETS/POINTED2.
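The grouping above rests on hierarchical clustering with the Pearson correlation metric and average linkage. It can be sketched as naive agglomerative clustering on the distance 1 - r, repeatedly merging the two clusters with the smallest mean pairwise distance. This is an O(n^3) illustration for small inputs, not MeV's implementation; the function names are hypothetical, and the profiles in the usage example are invented to mimic an early and a late up-regulated group.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering on distance 1 - Pearson r with average linkage."""
    names = list(profiles)
    dist = {
        (a, b): 1 - pearson(profiles[a], profiles[b])
        for a in names
        for b in names
        if a != b
    }
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                total = sum(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into i (i < j)
        del clusters[j]
    return clusters


# Invented time courses: e* rise early, l* rise late.
profiles = {
    "e1": [1, 5, 5, 5, 5],
    "e2": [0, 4, 4, 5, 5],
    "l1": [1, 1, 1, 1, 6],
    "l2": [0, 0, 1, 1, 5],
}
groups = average_linkage(profiles, n_clusters=2)
```

Because 1 - r ignores scale and offset, genes are grouped by the shape of their time course rather than their absolute expression level, which is what separates early from late up-regulation.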
H2. The second group (late-Heart, 31 genes) displays relatively late up-regulation (after 20 hours) and contains mesenchyme specific gene 3 (MECH3).

As expected, the two lists contain some important markers and show noticeable temporal expression patterns. However, these ASM and Heart candidate genes did not show GO-term enrichment in Blast2GO, which might indicate the need to fine-tune the lists, although the small number of GO terms available in Blast2GO is another concern. Further refinement of the ASM and Heart candidate gene lists would require knowing the effects of the non-specific filtering, of selecting the probe for each gene by maximum variance, and of the SAM ranking.

3.1.2.2 Clusters from STEM

In total, 7 significant model profiles appear in the STEM output. 23 of the 206 ASM candidate genes fall in the significant profiles. Most of them are in profile 20, which is similar to the late-ASM group and includes the MHC3, MA2 and MYL5 genes. For the Heart candidate genes, 13 of 130 fall in the significant profiles.

3.2 Network visualization in Gaggle

3.2.1 Networks

3.2.1.1 STRING protein network

The STRING (Szklarczyk, Franceschini et al. 2011) protein network is created to make good use of existing data resources. It provides both experimentally determined interactions and interactions predicted by computational techniques, shown as different edge colors (Figure 9).

3.2.1.2 Co-expression network

Network-based approaches, also termed graph-based approaches, aim to extract recurrent expression patterns or conserved modules from the rapidly accumulating microarray datasets. The microarray dataset is modeled as a relation graph in which each node represents one gene and two genes are connected by an edge based on an expression correlation parameter (Zhang, Horvath 2005) that measures the similarity between their expression profiles (the Pearson correlation coefficient is used in this thesis). The graph, namely the network, can be represented by an adjacency matrix that encodes whether a pair of nodes is connected. For unweighted networks, the entries are 1 or 0. For weighted networks, the adjacency matrix reports the connection strength of each gene pair, between 0 and 1 (Zhang, Horvath 2005). The graph-theoretic concept of connectivity, also termed degree, is the row sum of the adjacency matrix: it counts the direct neighbors of a node in an unweighted network and sums its connection strengths in a weighted network.

Two co-expression networks are generated in this thesis.
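These definitions can be checked on a tiny example. The sketch assumes a correlation matrix given as nested lists, builds the unweighted adjacency by hard thresholding (a_ij = 1 if r_ij > 0.9) and a weighted adjacency by soft thresholding (a_ij = |r_ij|^6, as in WGCNA's unsigned network), and computes connectivity as the row sum excluding the diagonal. The function names and correlation values are hypothetical.

```python
def hard_threshold(corr, cutoff=0.9):
    """Unweighted adjacency: 1 if r_ij > cutoff (i != j), else 0."""
    n = len(corr)
    return [[1 if i != j and corr[i][j] > cutoff else 0 for j in range(n)] for i in range(n)]


def soft_threshold(corr, beta=6):
    """Weighted adjacency: |r_ij| ** beta off the diagonal, 0 on it."""
    n = len(corr)
    return [[abs(corr[i][j]) ** beta if i != j else 0.0 for j in range(n)] for i in range(n)]


def connectivity(adj):
    """Connectivity k_i: row sum of the adjacency matrix without the diagonal."""
    return [sum(v for j, v in enumerate(row) if j != i) for i, row in enumerate(adj)]


# Hypothetical correlations among three genes.
corr = [
    [1.0, 0.95, 0.50],
    [0.95, 1.0, 0.92],
    [0.50, 0.92, 1.0],
]
k_unweighted = connectivity(hard_threshold(corr))  # number of direct neighbours
k_weighted = connectivity(soft_threshold(corr))    # sum of connection strengths
```

Here k_unweighted is [1, 2, 1]: the 0.5 correlation simply disappears under the hard cutoff, whereas in the weighted version it still contributes a small strength (0.5^6 = 0.015625) to the connectivity of genes 0 and 2.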
The unweighted co-expression network is formed by the gene pairs with a Pearson correlation coefficient higher than 0.9. It contains a total of 766 nodes, with a clustering coefficient of 0.311 (reported by the Network Analysis plugin in Cytoscape; the clustering coefficient measures the cohesiveness of the neighborhood of a node).

The 5,000 strongest connections (weights) are exported to build the weighted co-expression network (the weight cutoff is 0.23), which has a total of 814 nodes and a clustering coefficient of 0.728.

The unweighted network has more isolated clusters consisting of only 2 nodes linked by 1 edge. The weighted network is denser, with some hubs (nodes of high connectivity), and its nodes are also colored by the modules detected in WGCNA.

Although the two networks differ in their adjacency matrices, both are based on the Pearson correlation coefficient and place genes of high similarity close together in the graph. In other words, genes with the same expression profiles across all experiments lie close to each other in the network. These network-based approaches allow exploration of the position of a biological entity in the context of its local neighborhood and of the network as a whole, and they are less troubled by the inherent noise that confounds conventional pairwise approaches (Freeman, Goldovsky et al. 2007).
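As a rough illustration of the clustering coefficient reported above, the local coefficient of a node with k neighbors is C = 2e / (k(k - 1)), where e is the number of edges among those neighbors, and the network value is the average over all nodes. The sketch assumes, as we understand Cytoscape's Network Analysis plugin does, that nodes with fewer than two neighbors count as 0. Under that convention, isolated two-node clusters contribute zeros to the average. The function names and toy edge list are hypothetical.

```python
from itertools import combinations


def clustering_coefficients(edges):
    """Local clustering coefficient of each node in an undirected, unweighted graph."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    coeffs = {}
    for node, nbrs in neighbours.items():
        k = len(nbrs)
        if k < 2:
            coeffs[node] = 0.0  # assumed convention for nodes with < 2 neighbours
            continue
        # Count the edges that exist among this node's neighbours.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
        coeffs[node] = 2 * links / (k * (k - 1))
    return coeffs


def average_clustering(edges):
    """Network clustering coefficient: mean of the local coefficients."""
    coeffs = clustering_coefficients(edges)
    return sum(coeffs.values()) / len(coeffs)


# A triangle plus an isolated two-node cluster.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")]
```

For this toy graph the three triangle nodes score 1.0 and the two-node cluster scores 0.0, so the average is 0.6.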
3.2.2 Findings from the network visualization in Gaggle

3.2.2.1 Gaggle as information integration center

In this post-genomic era, biologists often face the challenge of freely exploring experimental and computational data from many different sources with diverse software tools: for example, storing different data for genes, retrieving data for a list of genes, and mapping one list of genes onto another. Once the network has been loaded into Cytoscape, Gaggle, acting as an information integration center, can help solve these problems for microarray data.

Storing different data for genes can be achieved by labeling. As shown in Figures 9 and 10, the two networks present data from 6 different sources: node color for module, node label for ASM or Heart candidate genes, node shape for significance in the moderated F test, node size for connectivity, edge color for interaction type, and distance between nodes for closeness. The network in Cytoscape therefore functions as a visual database.

Retrieving data for a list of genes, such as its expression matrix, is also feasible through the basic "broadcast" function in Gaggle. For example, a list of genes of interest in Cytoscape can be sent to the Gaggle Boss and then broadcast to the Data Matrix Viewer (DMV), which can output the expression matrix.
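The broadcast pattern can be pictured with a toy hub-and-spoke model. Everything below is hypothetical: `Boss`, `register`, `broadcast` and the matrix-viewer handler are illustrative names, not Gaggle's real Java API, and the expression values are made up. The sketch only shows a hub relaying a list of gene names from one tool to the others, one of which answers with the matching rows of an expression matrix.

```python
from typing import Callable, Dict, List, Optional

# A tool is modeled as a function that receives a list of gene names.
Handler = Callable[[List[str]], object]


class Boss:
    """Hypothetical hub that relays name lists between registered tools."""

    def __init__(self) -> None:
        self._tools: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        self._tools[name] = handler

    def broadcast(self, source: str, names: List[str], target: Optional[str] = None) -> Dict[str, object]:
        """Send names from source to target, or to every other tool if target is None."""
        if target is not None:
            receivers = [target]
        else:
            receivers = [t for t in self._tools if t != source]
        return {t: self._tools[t](list(names)) for t in receivers}


def matrix_viewer(matrix: Dict[str, List[float]]) -> Handler:
    """Hypothetical matrix-viewer stand-in: returns the rows for genes it knows."""
    def handle(names: List[str]) -> Dict[str, List[float]]:
        return {g: matrix[g] for g in names if g in matrix}
    return handle


boss = Boss()
boss.register("Cytoscape", lambda names: None)
boss.register("DMV", matrix_viewer({"COE": [1.0, 2.0], "MYOD": [0.5, 0.7]}))
reply = boss.broadcast("Cytoscape", ["COE", "NK4"])
```

The design point the sketch tries to capture is that tools never call each other directly; they only exchange simple data (here, a name list) through the hub, so a new tool can join without the others being changed.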
Mapping one list of genes onto another can be done conveniently in Gaggle through the many functions it offers. In the MultiExperiment Viewer (MeV), a sub-list of genes can be launched in a new viewer. In Cytoscape, the function "Create new network from selected nodes" serves the same purpose. Between different tools, the "broadcast" function serves as a bridge to transfer the list and map it in the existing tools.

3.2.2.2 Module from AllegroMCODE

The main goal of the co-expression network visualization is to find highly correlated genes (modules) related to the ASM or Heart network, specifically to infer targets of the transcription factor COE.

In the unweighted network, which has no predefined modules, modules can be detected automatically by AllegroMCODE, a Cytoscape plugin that finds highly interconnected groups of nodes in a large, complex network. The 1st module detected by AllegroMCODE for the unweighted network is shown in Figure 11. This module is significantly enriched in biological processes (Figure 12) such as biosynthetic process and cellular biosynthetic process.

For the weighted network, the 1st module (Figure 13) detected by AllegroMCODE consists mostly of turquoise module genes (only one
grey gene). This module is significantly enriched in intracellular processes (Figure 14).

Comparing the 1st modules of the unweighted and weighted networks, both contain ribosome-related genes (gene names starting with "RP"). Because the two networks are generated from the same microarray data, an external reference is necessary to determine whether this ribosome group was found by chance. Comparing the 1st module of the weighted network with all turquoise module genes in the STRING network yields a common list of 23 genes, 16 of which are ribosome-related.

3.2.2.3 Module from weighted network

Weighted gene co-expression network analysis (WGCNA) has advantages in identifying candidate targets because of its distinctive mathematical features (Langfelder, Horvath 2008); for example, soft thresholding preserves the continuous strength of co-expression instead of reducing it to a binary edge. While highly correlated genes are grouped into different modules, genes that lie far from all modules are depicted in grey. Figure 18 shows that these grey genes in the weighted network tend to have fewer edges and are associated with miRNA, which makes them reasonably distinct from the other functional modules.

In Figures 7 and 8, the tan and brown modules show strong module significance (gene significance is defined as −log10 of the p-value in the moderated t test). By visualizing these two modules from
their respective top 50 genes by intramodular connectivity, these modules are seen to be enriched in ASM and Heart candidate genes. Interestingly, the NK4 gene is in the tan module with other genes (Figure 19). The Islet (ISL) gene, which is not in the candidate list but has been reported to be an ASM gene, is in the brown module with some known markers, such as MA2, MHC3, NOTRLC/HAND-LIKE, and ETS/POINTED2 (Figure 20). These results could serve as a starting point for forming hypotheses about the Heart network in Ciona.

The turquoise module is the largest module in the weighted network and is enriched in cellular process and other terms (Figure 21), so it is natural to narrow the list of turquoise module genes with additional conditions. The list of genes that satisfy both the turquoise module and the STEM condition shows clear temporal expression and enrichment in muscle- and heart-related GO terms (Figures 22 and 23), while containing only four genes found in the candidate list.

3.2.2.4 Fine-tuned list

The network in Gaggle can serve as a visualization center as well as a fine-tuning filter for a list of genes, because the network is built on highly correlated gene pairs with reduced noise. This by no means implies that genes absent from the network should be discarded, but the expected GO-term enrichment provides useful confirmation of the list. Because GO-term enrichment depends on the
proportion of genes sharing the same GO terms, the number of noisy genes in the whole list has a large impact on the enrichment. Importing the candidate list into the co-expression network reduces the noise and yields a better enrichment result.

Through the "broadcast" function in MeV, Cytoscape can receive the 336 significant genes, label them in yellow in the unweighted network, and then create a sub-network for these candidate genes. A subgroup of the candidate genes (Figure 24) is significantly enriched in muscle- and heart-related GO terms (Figure 25), an enrichment that Blast2GO could not previously report. The ASM candidate genes in the weighted network are also enriched in muscle and heart GO terms (Figure 26), whereas Blast2GO still reports no enrichment for the Heart candidate genes in the network.
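The dependence of enrichment on the proportion of noisy genes can be made concrete with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), a common way to score GO-term over-representation. This is an assumption about the statistic for illustration only, not a reproduction of Blast2GO's exact procedure or its multiple-testing correction, and all counts are hypothetical: keeping the same 8 annotated genes while padding the list with unannotated genes dilutes the proportion and weakens the p-value.

```python
from math import comb


def hypergeom_sf(k, population, annotated, drawn):
    """P(X >= k) for X ~ Hypergeometric: `drawn` genes sampled without replacement
    from `population` genes, `annotated` of which carry the GO term."""
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / total  # exact integer arithmetic, one final division


# Hypothetical background: 10,000 genes, 200 annotated with a muscle-related term.
N, K = 10_000, 200
p_focused = hypergeom_sf(8, N, K, 40)   # 8 of 40 genes annotated (20%)
p_diluted = hypergeom_sf(8, N, K, 200)  # same 8 genes plus 160 unannotated ones (4%)
print(f"focused list: p = {p_focused:.2e}")
print(f"diluted list: p = {p_diluted:.2e}")
```

The focused list is far more significant than the diluted one even though both contain exactly the same annotated genes, which is the effect the network filter exploits.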
4. DISCUSSION

4.1 ASM candidate genes

COE is necessary and sufficient to specify ASM fate (Stolfi, Gainous et al. 2010). It is therefore understandable that COE is expressed earlier than the late-ASM genes (A3 group), such as MHC3, TPM1, and MA2. The up-down-up-ASM genes (A1 group), which include MYOD, show the earliest up-regulation. In Xenopus, the cross-regulatory interactions of COE orthologs with genes of the Myogenic Regulatory Factor (MRF) family, such as MYOD and MYF5, are crucial for muscle commitment and differentiation (Green, Vetter 2011). However, how COE may repress the cardiac fate and promote cell migration in Xenopus has never been studied. One possible hypothesis is that in Ciona, the early functions controlled by COE in ASM precursors are independent of MRF activation, since the MRF in the A1 group is up-regulated earlier than COE in the A2 group. Moreover, the A1 group genes are more likely to be TVC genes, which could also explain why heart-related GO terms appear in the enrichment of the ASM genes in the weighted network (Figure 26).

4.2 Annotation in Ciona intestinalis

The draft genome sequence of the ascidian Ciona intestinalis (Dehal, Satou et al. 2002) has been a valuable research resource.
However, there are numerous inconsistencies in the gene models because of intrinsic limitations of gene prediction programs and the fragmented nature of the assembly (Satou, Mineta et al. 2008). Therefore, the annotation of the probes in this study focuses on combining available resources from various databases, such as Aniseed (Tassy, Dauga et al.), Ensembl Genome Browser (Kersey, Lawson et al. 2010), CIPRO (Endo, Ueno et al.), STRING (Szklarczyk, Franceschini et al. 2011), UCSC Genome Browser (Karolchik, Hinrichs et al. 2011), and internal files from Dr. Lionel Christiaen's lab. The 30,969 probes correspond to 16,250 non-redundant genes, and this correspondence is the criterion used to map a probe to a gene. Differences between the gene annotation in this thesis and other sources are unavoidable.

4.3 Functional ribosome group and COE

The highly linked ribosome genes in the STRING network (Figure 15), enriched in ribosome-related terms (Figure 16), naturally raise a question: what is the relationship between this functional ribosome group and COE? When this list of ribosome genes and the COE gene is broadcast to MeV, the heat map and expression plot (Figure 17) show that the ribosome group and COE behave similarly across the time-series experiments. This group of ribosome genes also has a quite stable expression profile. More housekeeping genes are likely to be found in the same module as the ribosome group, but this is not the focus of this thesis.
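Because several probes can map to the same gene (30,969 probes versus 16,250 genes), per-gene expression has to be chosen or summarized once the mapping is fixed. The thesis does not state which rule was applied; the sketch below assumes one common choice, keeping for each gene the probe with the highest mean expression (comparable to the "MaxMean" option of WGCNA's `collapseRows`). All probe and gene identifiers and values are hypothetical.

```python
def collapse_probes(expr_by_probe, probe_to_gene):
    """Return gene -> (chosen probe, values), keeping for each gene the probe
    with the highest mean expression. Probes without a gene mapping are skipped."""
    chosen = {}
    for probe, values in expr_by_probe.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # unannotated probe
        mean = sum(values) / len(values)
        best = chosen.get(gene)
        if best is None or mean > best[0]:
            chosen[gene] = (mean, probe, values)
    return {gene: (probe, values) for gene, (_, probe, values) in chosen.items()}


# Hypothetical probes: two map to geneX, one to geneY, one is unmapped.
expr_by_probe = {
    "probe_1": [5.0, 6.0, 7.0],
    "probe_2": [8.0, 8.5, 9.0],
    "probe_3": [3.0, 3.0, 3.0],
    "probe_4": [1.0, 1.0, 1.0],
}
probe_to_gene = {"probe_1": "geneX", "probe_2": "geneX", "probe_3": "geneY"}
genes = collapse_probes(expr_by_probe, probe_to_gene)
print({gene: probe for gene, (probe, _) in genes.items()})
```

Other rules (averaging probes, or keeping the most variable one) are equally plausible; the choice changes which expression profile enters the correlation step.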
4.4 Time-series

Although clustering algorithms such as hierarchical clustering (Eisen, Spellman et al. 1998), K-means, and self-organizing maps (SOM) (Tamayo, Slonim et al. 1999) can be used to analyze microarray data and yield many biological insights, they are not designed for time-series data: they assume that the data at each time point are collected independently of each other, and they ignore the sequential nature of time-series data (Ernst, Nau et al. 2005). This thesis applies the Short Time-series Expression Miner (STEM), which is designed for the analysis of short time-series microarray gene expression data (Ernst, Bar Joseph 2006), to the time-series experiments in the hope of finding clues about the true biological patterns. The STEM algorithm (Ernst, Nau et al. 2005) starts by selecting a set of potential expression profiles that covers the space of all possible expression profiles the genes in the experiment could generate, each representing a unique temporal expression pattern. Next, each gene is assigned to one of the profiles; a permutation test identifies the profiles with significantly more assigned genes than expected, and a greedy algorithm groups similar significant profiles into larger clusters (Ernst, Nau et al. 2005), which are colored in the top rows of the user interface. It is worth mentioning that STEM is designed for short time series (defined as 3 to 8 time points on its website), whereas this microarray dataset has 11 time points.
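The profile-selection and gene-assignment steps of STEM can be sketched as below. This outline follows the published description (Ernst, Nau et al. 2005) and is not the STEM code: using 1 − r as the distance, seeding with the always-decreasing profile, and excluding the flat profile are assumptions of this sketch, and the permutation test and cluster grouping are omitted. Candidate profiles start at 0 and move by −c to +c units per step, so there are (2c + 1)^(n−1) of them; with c = 1 and the 11 time points of this dataset that is 3^10 = 59,049 candidates, which illustrates why the method targets short series.

```python
import math
from itertools import product


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def candidate_profiles(n_points, c=1):
    """Profiles starting at 0 that change by -c..+c units per time step.
    The flat profile is excluded because its correlation is undefined."""
    profiles = []
    for steps in product(range(-c, c + 1), repeat=n_points - 1):
        if not any(steps):
            continue
        profile = [0]
        for s in steps:
            profile.append(profile[-1] + s)
        profiles.append(profile)
    return profiles


def select_profiles(candidates, m):
    """Greedy choice of m profiles: seed with the first candidate (the
    always-decreasing one here), then repeatedly add the candidate whose
    minimum distance 1 - r to the chosen profiles is largest."""
    chosen = [candidates[0]]
    while len(chosen) < m:
        best = max(
            (p for p in candidates if p not in chosen),
            key=lambda p: min(1 - pearson(p, q) for q in chosen),
        )
        chosen.append(best)
    return chosen


def assign_genes(expr, profiles):
    """Map each (non-constant) gene to the index of its best-correlated profile."""
    return {
        gene: max(range(len(profiles)), key=lambda i: pearson(values, profiles[i]))
        for gene, values in expr.items()
    }


profiles = select_profiles(candidate_profiles(4), 3)
print(profiles[:2])  # [[0, -1, -2, -3], [0, 1, 2, 3]]
expr = {"g_up": [0.1, 0.9, 2.2, 2.8], "g_down": [3.0, 2.1, 0.8, 0.1]}  # hypothetical
print(assign_genes(expr, profiles))
```

The quadratic greedy loop is fine for a handful of time points but becomes slow over tens of thousands of candidates, another practical reason the method is aimed at short series.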
4.5 Limitations of the co-expression network

The co-expression network approaches have several limitations. First, network similarity is based on the Pearson correlation coefficient, which is sensitive to outliers. The quality of the input matrix is therefore important to the final result. It may help to transform the data or to use Spearman's rank correlation coefficient instead.

A second limitation is that a co-expression network based on the Pearson correlation coefficient is better suited to finding globally co-expressed genes (Qian, Dolled-Filhart et al. 2001); it cannot accurately detect time-delayed or transient responses of downstream effectors in time-series experiments. It would be better to use local clustering (Qian, Dolled-Filhart et al. 2001) to find time-delayed or locally co-expressed genes, or other tools specialized for long time-series experiments, such as the Graphical Query Language (GQL) (Costa, Schönhuth et al. 2005).

A third limitation is that it is difficult to choose thresholds for a biological network. The hard threshold for the unweighted network may arbitrarily cut off some biologically meaningful edges. Modules with weak weights are also cut off in the weighted network, even though such weak links may be biologically meaningful.
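The first two limitations can be illustrated with hypothetical toy numbers. A single shared outlier can push the Pearson coefficient of two otherwise unrelated profiles above the 0.9 threshold, while Spearman's coefficient (Pearson on ranks) stays low. For time delays, the sketch uses a simple stand-in, taking the best Pearson correlation over a few time shifts; this is not the local clustering algorithm of Qian, Dolled-Filhart et al. (2001), which uses a dynamic-programming alignment.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def best_lag(x, y, max_lag):
    """(lag, r) maximizing Pearson between x[t] and y[t + lag], lag in 0..max_lag."""
    best = None
    for lag in range(max_lag + 1):
        r = pearson(x[: len(x) - lag], y[lag:])
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Unrelated profiles sharing one extreme value at the last time point.
x = [2, 5, 1, 8, 3, 7, 4, 9, 6, 100]
y = [6, 2, 8, 3, 9, 1, 7, 4, 5, 100]
print(round(pearson(x, y), 3), round(spearman(x, y), 3))

# 11 time points: y repeats the pulse in x two steps later.
x_pulse = [0, 1, 4, 9, 4, 1, 0, 0, 0, 0, 0]
y_pulse = [0, 0, 0, 1, 4, 9, 4, 1, 0, 0, 0]
print(best_lag(x_pulse, y_pulse, 3))
```

Here Pearson is about 0.99 while Spearman is about −0.25, and the delayed pulse is recovered at a lag of two time points.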
Figures and tables

Figure 1 Pipeline.
Figure 2 Normalized unscaled standard error (NUSE). One of the tests in arrayQualityMetrics, NUSE, detected sample LacZ3 as an outlier.

Figure 3 Heat map of ASM and Heart candidate genes. ASM candidate genes are red in the first and third columns. A1: up-down-up-ASM. A2: early-ASM. A3: late-ASM. Heart candidate genes are red in the second column. H1: early-Heart. H2: late-Heart.
Figure 4 Output of the Short Time-series Expression Miner. Significant clusters are colored in the top row.

[Plot: two panels against soft-threshold power (1 to 20): "Scale independence" (y-axis: scale-free topology model fit, signed R^2, about 0.3 to 0.9) and "Mean connectivity" (y-axis: mean connectivity, 0 to about 1500).]

Figure 5 Selecting the soft power. The soft-threshold power β of 6 is chosen for calculating the adjacency matrix because it reaches a high scale-free topology model fit (R^2) and high mean connectivity.
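The quantities plotted in Figure 5 can be sketched as follows. This is an assumption about the general recipe behind WGCNA's scale-free topology criterion, not a reproduction of its code: build the unsigned adjacency |r|^β, take each gene's connectivity k as its row sum, bin k into equal-width bins (10 here, an assumed default), regress log10 p(k) on log10 of the mean k per bin, and report R^2 signed by the slope so that the decreasing trend expected of a scale-free network gives a positive value. Empty bins are simply skipped, which may differ from WGCNA, and the connectivity values in the example are hypothetical.

```python
import math


def soft_adjacency(corr, beta):
    """Unsigned soft-threshold adjacency a_ij = |r_ij| ** beta with a zero diagonal."""
    n = len(corr)
    return [[0.0 if i == j else abs(corr[i][j]) ** beta for j in range(n)] for i in range(n)]


def connectivity(adj):
    """Connectivity k_i: sum of row i of the adjacency matrix (diagonal is zero)."""
    return [sum(row) for row in adj]


def signed_scale_free_r2(k, n_bins=10):
    """Signed R^2 of the linear fit log10 p(k) ~ log10 k over equal-width bins.
    Assumes all k > 0. Returns -sign(slope) * R^2."""
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for value in k:
        idx = min(int((value - lo) / width), n_bins - 1) if width > 0 else 0
        bins[idx].append(value)
    xs = [math.log10(sum(b) / len(b)) for b in bins if b]  # log10 mean k per bin
    ys = [math.log10(len(b) / len(k)) for b in bins if b]  # log10 frequency per bin
    if len(xs) < 2:
        raise ValueError("need at least two non-empty bins")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if syy == 0:
        return 0.0  # flat frequency profile: no trend to fit
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)  # R^2 of a simple linear regression
    return -math.copysign(r2, slope)


# Hypothetical connectivities: many weakly and few highly connected genes.
k = [1] * 512 + [2] * 128 + [4] * 32 + [8] * 8 + [16] * 2
print(round(signed_scale_free_r2(k), 3))
```

Scanning β over 1 to 20 and plotting this value against the mean of `connectivity(...)` gives the two panels of Figure 5: raising β improves the scale-free fit but lowers mean connectivity.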
Figure 6 Ciona intestinalis weighted co-expression network. The dendrogram results from average-linkage hierarchical clustering. The color band below the dendrogram denotes the modules, which are defined as branches in the dendrogram. Of the 10,079 genes, 6162 were clustered into 13 modules, and the remaining genes are colored in grey.
[Bar charts for the modules black, blue, brown, green, greenyellow, grey, magenta, pink, purple, red, tan, turquoise, and yellow: module significance (COE−COEW moderated t test; panel title reports p = 3.1e−86) and gene counts per module.]

Figure 7 Module significance. Module significance is determined as the average absolute gene significance (defined as minus the log of a p-value) over all genes in a given module.
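The definitions in the Figure 7 caption and in Section 3.2.2.3 can be written out directly: gene significance GS = −log10(p) from the moderated test, and module significance = mean |GS| over the genes of a module. The p-values and module labels below are hypothetical.

```python
import math
from collections import defaultdict


def gene_significance(p_values):
    """GS = -log10(p) for each gene's p-value."""
    return {gene: -math.log10(p) for gene, p in p_values.items()}


def module_significance(gs, module_of):
    """Average absolute gene significance over the genes of each module."""
    members = defaultdict(list)
    for gene, module in module_of.items():
        members[module].append(abs(gs[gene]))  # abs() mirrors the caption; GS >= 0 when p <= 1
    return {module: sum(v) / len(v) for module, v in members.items()}


# Hypothetical p-values and module labels.
p_values = {"g1": 1e-6, "g2": 1e-4, "g3": 0.5, "g4": 0.1}
module_of = {"g1": "brown", "g2": "brown", "g3": "grey", "g4": "grey"}
ms = module_significance(gene_significance(p_values), module_of)
print(sorted(ms, key=ms.get, reverse=True))  # ['brown', 'grey']
```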
[Scatter plots, one panel per module, of gene significance (COE−COEW) against intramodular connectivity. Panel titles: grey cor = −0.023, p = 0.14; pink cor = −0.066, p = 0.36; turquoise cor = −0.0093, p = 0.75; magenta cor = 0.11, p = 0.19; red cor = −0.09, p = 0.036; blue cor = 0.28, p = 2.3e−22; tan cor = 0.5, p = 1e−07; brown cor = 0.61, p = 5.9e−79; black cor = 0.24, p = 2.6e−06; greenyellow cor = −0.13, p = 0.14; yellow cor = −0.044, p = 0.27; green cor = −0.079, p = 0.054; purple cor = −0.094, p = 0.27.]

Figure 8 Intramodular connectivity and module significance. Intramodular connectivity measures how connected, or co-expressed, a given node is with respect to the nodes of a particular module. It is the connectivity in the subnetwork defined by the module.
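Following the definition in the Figure 8 caption, the intramodular connectivity of a gene is the sum of its adjacencies to the other genes of its own module. The sketch below computes it from a hypothetical weighted edge list and ranks a module's genes, the kind of ranking used to pick the top 50 tan and brown genes in Section 3.2.2.3. The gene names and weights are made up, and the helper names are illustrative only.

```python
def from_edges(edges):
    """Symmetric adjacency dict {gene: {neighbour: weight}} from (a, b, w) triples."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj


def intramodular_connectivity(adj, module_of, module):
    """k_in(g): sum of g's weights to the other genes assigned to the same module."""
    members = [g for g, m in module_of.items() if m == module]
    return {
        g: sum(adj.get(g, {}).get(h, 0.0) for h in members if h != g)
        for g in members
    }


def top_hubs(adj, module_of, module, n=50):
    """Genes of `module` ranked by intramodular connectivity, strongest first."""
    k_in = intramodular_connectivity(adj, module_of, module)
    return sorted(k_in, key=k_in.get, reverse=True)[:n]


# Hypothetical weights; the cross-module edge geneB-geneX does not count for "tan".
adj = from_edges([
    ("geneA", "geneB", 0.6),
    ("geneA", "geneC", 0.5),
    ("geneB", "geneC", 0.1),
    ("geneB", "geneX", 0.9),
])
module_of = {"geneA": "tan", "geneB": "tan", "geneC": "tan", "geneX": "brown"}
print(top_hubs(adj, module_of, "tan", n=2))  # ['geneA', 'geneB']
```

Restricting the sum to module members is what distinguishes intramodular connectivity from whole-network connectivity: geneB's strong edge to the brown gene does not raise its rank within the tan module.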
Figure 9 STRING protein network. The edge colors represent different types of evidence. Neighborhood: green; Gene Fusion: red; Co-occurrence: blue; Co-expression: black; Experimental: magenta; Databases: cyan; Text mining: greenyellow; Homology: light blue.

Figure 10 Labeling in the weighted network. Different labels in the network represent different data. Node color: module color; node border color: significant clusters in STEM; node shape: genes significant in the moderated F test are diamond-shaped,
while non-significant genes are round; node label color: ASM candidate genes are blue, Heart candidate genes are red.

Figure 11 The 1st module inferred by AllegroMCODE for the unweighted co-expression network.
Figure 12 Enrichment of the 1st module of the unweighted co-expression network.

Figure 13 The 1st module inferred by AllegroMCODE for the weighted co-expression network.

Figure 14 Enrichment of the 1st module of the weighted network.
Figure 15 Ribosome group in STRING.

[Bar chart: differential GO-term distribution, test set versus reference set; x-axis: % sequences; the leading terms include ribosome, structural constituent of ribosome, translation, and ribonucleoprotein complex.]

Figure 16 Ribosome group in STRING network enrichment.
Figure 17 Ribosome group and COE.

Figure 18 Grey genes.
Figure 19 Tan module.

Figure 20 Brown module.
Figure 21 Turquoise module enrichment.

Figure 22 Genes in the turquoise module plus the STEM condition.
Figure 23 Enrichment of genes in the turquoise module plus the STEM condition.

Figure 24 Subgroup of candidate genes in the unweighted network.
[Bar chart: differential GO-term distribution, test set versus reference set; x-axis: % sequences; the leading terms include cardiac muscle tissue development, heart process, heart contraction, and positive regulation of heart contraction.]

Figure 25 Subgroup of candidate genes in unweighted network enrichment.

Figure 26 ASM candidate genes in weighted network enrichment.
  • 44. 44      Figure 27. ASM and heart candidate genes. Part A illustrates the generation of ASM and heart cells from TVC. Part B summarizes the different temporal expression groups of ASM and heart candidate genes, with gene counts and known markers. Arrows represent the trend of their temporal expression.
  • 45. 45    References
1. BARABASI, A. and BONABEAU, E., 2003. Scale-free networks. Scientific American, 288(5), pp. 60-69.
2. BARABASI, A. and OLTVAI, Z., 2004. Network biology: Understanding the cell's functional organization. Nature Reviews Genetics, 5(2), pp. 101-113.
3. CHRISTIAEN, L., DAVIDSON, B., KAWASHIMA, T., POWELL, W., NOLLA, H., VRANIZAN, K. and LEVINE, M., 2008. The transcription/migration interface in heart precursors of Ciona intestinalis. Science, 320(5881), pp. 1349-1352.
4. CONESA, A., GÖTZ, S., GARCÍA-GÓMEZ, J., TEROL, J., TALÓN, M. and ROBLES, M., 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Oxford: Oxford University Press.
5. COSTA, I., SCHÖNHUTH, A. and SCHLIEP, A., 2005. The Graphical Query Language: a tool for analysis of gene expression time-courses. Oxford: Oxford University Press.
6. DAVIDSON, B., 2007. Ciona intestinalis as a model for cardiac development. London, UK: Academic Press.
7. EISEN, M.B., SPELLMAN, P.T., BROWN, P.O. and BOTSTEIN, D., 1998. Cluster analysis and display of genome-wide expression patterns. Washington, D.C.: National Academy of Sciences.
8. ERNST, J. and BAR-JOSEPH, Z., 2006. STEM: a tool for the analysis of short time series gene expression data. London: BioMed Central.
9. ERNST, J., NAU, G. and BAR-JOSEPH, Z., 2005. Clustering short time series gene expression data. Oxford: Oxford University Press.
10. FREEMAN, T., GOLDOVSKY, L., BROSCH, M., VAN DONGEN, S., MAZIÈRE, P., GROCOCK, R., FREILICH, S., THORNTON, J. and ENRIGHT, A., 2007. Construction, visualisation, and clustering of transcription networks from microarray expression data. San Francisco, CA: Public Library of Science.
11. GENTLEMAN, R., 2005. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. New York: Springer-Verlag.
12. GREEN, Y. and VETTER, M., 2011. EBF proteins participate in transcriptional regulation of Xenopus muscle development. San Diego: Academic Press.
13. HORVATH, S., 2011. Weighted Network Analysis: Applications in Genomics and Systems Biology. New York: Springer.
14. KAUFFMANN, A., GENTLEMAN, R. and HUBER, W., 2009. arrayQualityMetrics: a Bioconductor package for quality assessment of microarray data. Oxford: Oxford University Press.
15. LANGFELDER, P. and HORVATH, S., 2008. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics, 9, pp. 559.
16. LANGFELDER, P., ZHANG, B. and HORVATH, S., 2008. Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Oxford: Oxford University Press.
17. QIAN, J., DOLLED-FILHART, M., LIN, J., YU, H. and GERSTEIN, M., 2001. Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions. London: Academic Press.
18. SHANNON, P., MARKIEL, A., OZIER, O., BALIGA, N., WANG, J., RAMAGE, D., AMIN, N., SCHWIKOWSKI, B. and IDEKER, T., 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press.
19. SHANNON, P., REISS, D., BONNEAU, R. and BALIGA, N., 2006. The Gaggle: an open-source software system for integrating bioinformatics software and data sources. BMC Bioinformatics, 7, pp. 176.
20. SMYTH, G., 2004. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. [Berkeley, CA]: Berkeley Electronic Press.
21. STOLFI, A., GAINOUS, T.B., YOUNG, J.J., MORI, A., LEVINE, M. and CHRISTIAEN, L., 2010. Early Chordate Origins of the Vertebrate Second Heart Field. Science, 329(5991), pp. 565-568.
22. SZKLARCZYK, D., FRANCESCHINI, A., KUHN, M., SIMONOVIC, M., ROTH, A., MINGUEZ, P., DOERKS, T., STARK, M., MULLER, J., BORK, P., JENSEN, L. and VON MERING, C., 2011. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. [London]: Information Retrieval Ltd.
23. TAMAYO, P., SLONIM, D., MESIROV, J., ZHU, Q., KITAREEWAN, S., DMITROVSKY, E., LANDER, E.S. and GOLUB, T.R., 1999. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Washington, D.C.: National Academy of Sciences.
24. TUSHER, V., TIBSHIRANI, R. and CHU, G., 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America, 98(9), pp. 5116-5121.
25. WETTENHALL, J. and SMYTH, G., 2004. limmaGUI: a graphical user interface for linear modeling of microarray data. Bioinformatics, 20(18), pp. 3705-3706.
26. ZHANG, B. and HORVATH, S., 2005. A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology, 4, Article 17.
