
Bioinformatics applied to the study of gene expression control in the human brain

UKBEC (United Kingdom Brain Expression Consortium) aims to study the mechanisms of gene expression regulation in the human brain. To that end, it builds regulation models based on (1) expression quantitative trait loci, (2) allele-specific expression and (3) co-expression networks. With the consortium's first data release, braineac.org was created to facilitate sharing results with the research community. Braineac is a database of gene expression and its regulation for 10 brain regions, based on samples collected by the Medical Research Council (MRC) Sudden Death Brain and Tissue Bank, Edinburgh, UK, from 134 neuropathologically normal individuals. Gene expression profiling was based on Affymetrix Human Exon 1.0 ST Arrays. Genotyping was performed with the Illumina Infinium Omni1-Quad BeadChip and the Immunochip. We will introduce this resource, currently hosted at Universidad de Murcia, and explain how it can help researchers working on brain diseases. In the second part of the talk, we will focus on the creation of the second release of the Braineac resource, based on RNA-seq technology, which allows a genome-wide study of regulation mechanisms.

Published in: Engineering

  1. Juan A. Botía
     Institute of Neurology, University College London, UK
     Facultad de Informática, Universidad de Murcia, Spain
     Algorithmic approaches for the construction of gene co-expression networks from control brain tissue samples mRNA
     RNA-seq Substantia nigra and Putamen brain co-expression networks on the UKBEC project to study Parkinson's Disease
     03/04/17, Conferencias de Investigación para Posgrado, Fac. Informática, UCM
  2. The central dogma of biology (source: Wikipedia). We use pre-mRNA.
  3. Chapter I. The dataset
  4. Braineac v2, RNA-seq based, focused on Parkinson's Disease
     • Affects 1% to 2% of the population older than 65 years
     • Symptoms: resting tremor, bradykinesia, rigidity and impairment in the ability to initiate and sustain movements
     • The hallmark of this disease is the progressive loss of dopaminergic neurons, mainly in the substantia nigra
  5. Chapter II. The computational model
  6. Network analysis: aprioristic versus free approaches
  7. Are networks something more than a fancy graph and nice plots? Yes they are!!
     • Can be used to identify the active pathways in specific samples (cases vs. controls)
     • Describe subsystems (i.e. cell types)
     • Identify candidate genes (GBA)
  8. To create networks we need to estimate links between genes
  9. From gene expression to gene co-expression networks
     TREM2 forms a receptor signaling complex with TYROBP, which triggers the activation of immune responses in macrophages and dendritic cells, and the functional polymorphism of TREM2 is reported to be associated with neurodegenerative disorders such as Alzheimer's disease (AD).
     TYROBP-TREM2 link: 0.76
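
One common way to estimate a link between two genes is the Pearson correlation of their expression profiles across samples. The sketch below is illustrative only: the expression values are invented, and the slides do not state how the 0.76 shown for TYROBP and TREM2 was computed.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant profile: correlation is undefined")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression values for two genes across six samples.
gene_a = [2.1, 3.4, 1.8, 4.0, 2.9, 3.7]
gene_b = [1.9, 3.0, 2.2, 3.8, 2.5, 3.9]
link_weight = pearson(gene_a, gene_b)
```

Computing this value for every pair of genes gives the correlation matrix from which a co-expression network can be derived.
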
  13. But before reaching that
      • Scale-free topology assumption
        – The degree distribution $p(k)$ of a network follows a power law, so $p(k) \sim k^{-\gamma}$
        – Evidence supports this for many organisms ($\gamma$ is approx. 2.2)
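
As a rough illustration of the scale-free assumption, the exponent of the power law can be estimated from a list of node degrees by a straight-line fit on the log-log scale. This is a minimal sketch: the function name and the toy degree list are assumptions, and least squares on log-log data is a crude estimator compared with maximum-likelihood fitting.

```python
import math


def fit_power_law_exponent(degrees):
    """Estimate gamma in p(k) ~ k^(-gamma) by least squares on log-log axes."""
    counts = {}
    for k in degrees:
        if k > 0:  # log(0) is undefined, so isolated nodes are skipped
            counts[k] = counts.get(k, 0) + 1
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    total = sum(counts.values())
    xs = [math.log(k) for k in counts]
    ys = [math.log(c / total) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope  # the fitted slope is -gamma


# Toy degree list whose counts are exactly proportional to k^(-2).
toy_degrees = [k for k in range(1, 7) for _ in range(3600 // (k * k))]
gamma = fit_power_law_exponent(toy_degrees)
```
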
  14. But before reaching that (& 2)
      • Modularity assumption
        – The clustering coefficient of organisms' networks, $C_i = 2n / (k_i (k_i - 1))$, with $n$ the number of direct links connecting the $k_i$ nearest neighbours of the $i$-th node, suggests strong modular organization
        – Evidence suggests the clustering coefficient is higher than expected in scale-free topology (SFT) networks
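
The coefficient $C_i$ on this slide has the form of the standard local clustering coefficient and can be computed directly from a neighbour list. The sketch below assumes an undirected graph without self-loops, stored as a dictionary of neighbour sets; the toy graph is invented for illustration.

```python
def clustering_coefficient(adjacency, node):
    """C_i = 2n / (k_i (k_i - 1)) for an undirected graph {node: set(neighbours)}."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair can be linked
    # Each link between two neighbours is seen from both ends, hence the // 2.
    links = sum(1 for u in neighbours for v in adjacency[u] if v in neighbours) // 2
    return 2.0 * links / (k * (k - 1))


# Toy graph: a triangle a-b-c plus a pendant node d attached to a.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
c_a = clustering_coefficient(toy_graph, "a")
c_b = clustering_coefficient(toy_graph, "b")
```
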
  15. But before reaching that (& 3)
      • Hierarchies solve this apparent dilemma
  16. Chapter III. The problem
  17. Our main focus: Parkinson's Disease
      • Affects 1% to 2% of the population older than 65 years
      • Symptoms: resting tremor, bradykinesia, rigidity and impairment in the ability to initiate and sustain movements
      • The hallmark of this disease is the progressive loss of dopaminergic neurons, mainly in the substantia nigra
      [Figure: brain regions most typically affected by adult-onset disease, with excitatory and inhibitory connections; the Substantia Nigra Pars Compacta is labelled]
  18. Steps on the pipeline and their outcomes
      • Step 1: RPKM exonic gene quantification and CQN normalization. Outcome: 33670 Ensembl genes
      • Step 2: RPKM-CQN > 0.2 & missingness < 70%. Outcome: approx. 19K genes, two datasets
      • Step 3: Data correcting for Sex, Age and 7/8 PEER axes. Outcome: two corrected datasets
      • Step 4: WGCNA "signed" network construction. Outcome: SNIG and PUTM networks and gene modules assignment
      • Step 5: k-means optimization of module partitions. Outcome: modified gene modules assignment for SNIG and PUTM
      • Step 6: Network assessment. Outcome: quality metrics for networks and gene partitions
      • Step 7: Within tissue and between tissues subsystem characterization. Outcome: functional characterization, correlation with traits, gene function prediction
  19. Co-expression analysis methodology
  20. • A measure of similarity between genes, values in [0,1]
      • From similarity to adjacency, hard thresholding
      • From similarity to adjacency, soft thresholding
      • From adjacency to TOM (topological overlap measure) values
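
The formulas behind these four steps did not survive in the transcript. The sketch below uses the usual WGCNA-style definitions as an assumption: signed similarity $(1 + cor)/2$, hard thresholding at a cutoff `tau`, soft thresholding by raising to a power `beta`, and the topological overlap computed from shared neighbours. All function names, parameter values and the toy matrix are illustrative, not taken from the talk.

```python
def signed_similarity(cor):
    """Map a correlation in [-1, 1] to a signed similarity in [0, 1]."""
    return (1.0 + cor) / 2.0


def hard_threshold(sim, tau):
    """Binary adjacency: 1 when similarity >= tau, 0 otherwise (diagonal set to 0)."""
    return [[1.0 if i != j and s >= tau else 0.0 for j, s in enumerate(row)]
            for i, row in enumerate(sim)]


def soft_threshold(sim, beta):
    """Weighted adjacency: similarity raised to the power beta (diagonal set to 0)."""
    return [[s ** beta if i != j else 0.0 for j, s in enumerate(row)]
            for i, row in enumerate(sim)]


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j]
                             for u in range(n) if u != i and u != j)
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


# Toy similarity matrix for three genes (values already in [0, 1]).
toy_sim = [[1.0, 0.9, 0.6],
           [0.9, 1.0, 0.5],
           [0.6, 0.5, 1.0]]
hard_adj = hard_threshold(toy_sim, tau=0.8)
soft_adj = soft_threshold(toy_sim, beta=2)
toy_tom = topological_overlap(soft_adj)
```

Soft thresholding keeps weak links with small weights instead of discarding them, which is why the topological overlap can still use information from all neighbours.
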
  21. • From TOM values to clusters by 1-TOM as a distance
      • Complete linkage hierarchical approach for clustering
      • Summarisation based on eigenvalue
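
To make the clustering step concrete, the sketch below runs a plain agglomerative clustering with complete linkage on a 1-TOM distance matrix. It is a simplification under stated assumptions: the tree is cut at a fixed number of clusters (`n_clusters`, a hypothetical parameter), whereas WGCNA pipelines typically cut the dendrogram adaptively, and the toy TOM matrix is invented.

```python
def complete_linkage(dist, n_clusters):
    """Merge clusters until n_clusters remain; returns lists of item indices."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Toy TOM matrix for four genes: genes 0-1 and genes 2-3 overlap strongly.
toy_tom = [[1.0, 0.8, 0.1, 0.2],
           [0.8, 1.0, 0.15, 0.1],
           [0.1, 0.15, 1.0, 0.7],
           [0.2, 0.1, 0.7, 1.0]]
toy_dist = [[1.0 - t for t in row] for row in toy_tom]  # 1-TOM as a distance
modules = complete_linkage(toy_dist, n_clusters=2)
```
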
  22. Why do we need an optimization process for WGCNA
      • Hierarchical clustering results are highly variable depending on linkage (max/complete, min/single, average linkages)
      • Module membership (MM) of a gene g is the correlation of g and the 1st PC of gene expression (module eigengene)
      • Hierarchical clustering doesn't necessarily place every gene in its best module according to MM
      • Previous approaches are based on reassigning some/all genes
      • The k-means algorithm helps find a better partition in which genes are (hopefully) assigned to a module in a more natural way
  23. A k-means heuristic. How does it work?
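
The slides that walked through the heuristic are image-only, so the following is a hedged sketch of the general idea rather than the method presented in the talk: start from an existing module assignment, summarise each module by a centroid, reassign every gene to the module whose centroid it correlates with best, and repeat until nothing changes. The mean profile is used here as a simple stand-in for the module eigengene, and all names and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def centroid(profiles):
    """Mean profile of a module (assumed stand-in for the module eigengene)."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def refine_modules(expr, assignment, max_iter=20):
    """expr: {gene: profile}; assignment: {gene: module}. Returns a refined copy."""
    assignment = dict(assignment)
    for _ in range(max_iter):
        modules = sorted(set(assignment.values()))
        centroids = {m: centroid([expr[g] for g in expr if assignment[g] == m])
                     for m in modules}
        updated = {g: max(modules, key=lambda m: pearson(expr[g], centroids[m]))
                   for g in expr}
        if updated == assignment:  # converged: no gene changed module
            break
        assignment = updated
    return assignment


# Hypothetical data: g3 follows the rising pattern of module A but starts in B.
toy_expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [1.1, 2.2, 2.9, 4.2],
    "g4": [4.0, 3.0, 2.0, 1.0],
    "g5": [8.0, 6.5, 4.0, 2.0],
}
initial = {"g1": "A", "g2": "A", "g3": "B", "g4": "B", "g5": "B"}
refined = refine_modules(toy_expr, initial)
```

Starting from the hierarchical clustering result rather than from random centroids keeps the number of modules fixed and only moves genes that fit another module better.
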
  30. Outline of the optimization. Accepted in BMC Systems Biology
  31. Chapter IV. The results
  32. What we get from the optimization
      • More accurate partition construction
      • Better function annotation for modules
      • Better cell markers enrichment
      • More preserved modules across similar tissues
  33. How to assess the accuracy of a co-expression network
      • Cluster driven validation: are the gene groups good according to a given index?
      • Data driven validation by replication: same tissue or similar tissue; same network model or diff. network model
      • Biology: does my module make sense? Functional characterization
  35. Replication in GTEx GNAT networks for Substantia Nigra
      [Figure: two bar charts of Mantel fold per module (modules named by colour), annotated with significance stars and per-module counts. Panel titles: "SNIG GTEx coexpression within" and "SNIG microarray binary between".]
  36. Replication in GTEx GNAT networks for Putamen
      [Figure: two bar charts of Mantel fold per module (modules named by colour), annotated with significance stars and per-module counts. Panel titles: "PUTM GTEx coexpression within" and "PUTM GTEx binary between".]
  38. Assignment of biological function to modules with gProfiler
      • Based on GO (BP, MF, CC) and gProfileR
      • Fisher's exact test and Bonferroni corrected p-values
      • What should we expect?
        – Normal cell processes like respiration, cell development, immune function
        – But also brain related terms (hopefully movement disorders, signalling) in some of the modules
      • What should we consider when looking for enrichment?
        – GO is not a closed world ontology
        – Something not found doesn't imply it doesn't exist
        – Genes can play new roles
        – Groups of genes can have new functions
        – A module with no GO annotation can still be valid
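
The enrichment test named on this slide can be sketched with the standard library alone. The code below computes a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution) for the overlap between a module and a GO term, then applies a Bonferroni correction. Function names and the counts are illustrative assumptions; this does not reproduce gProfileR's own implementation.

```python
from math import comb


def enrichment_p_value(universe, annotated, module, overlap):
    """P(X >= overlap) when `module` genes are drawn from `universe` genes,
    of which `annotated` carry the term (one-sided Fisher's exact test)."""
    upper = min(annotated, module)
    tail = sum(comb(annotated, x) * comb(universe - annotated, module - x)
               for x in range(overlap, upper + 1))
    return tail / comb(universe, module)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts: 20 genes in total, 5 annotated with the term,
# a module of 5 genes, 4 of which carry the annotation.
p = enrichment_p_value(universe=20, annotated=5, module=5, overlap=4)
adjusted = bonferroni([p, 0.04, 0.5])
```

Because Bonferroni multiplies by the number of tested terms, modules with only modest enrichment can lose significance, which is one reason a module without significant GO terms is not necessarily invalid.
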
  39. Significant similarities in practically all modules
      This is a tabular view of significant agreements (Fisher's exact test) on genes between modules from the two tissues
  40. Subsystems
  41. Subsystems: cell type & function
      • Neuron cells, Synapse/NADH
      • Microglia cells, Immune system
      • Nucleus, transcription
      • Neuron, astrocytes & microglia cell types
      • Response to stimulus
      • Endothelial cell type, Cell division
      • Oligodendrocytes cell type, synapse & ion transport
      • Mitochondrion
      • Cytosolic ribosome
      • Ubiquitin
  42. Lessons learned
      • The default WGCNA can be improved to get more coherent gene groups
      • Network analysis reveals
        – cell specific subsystems in putamen and substantia nigra
        – interesting differences between the two tissues at the subsystem level
      Ongoing work
      • Models to explain the differences between subsystems
      • Function prediction for non coding species and intergenic regions
  43. Acknowledgements
      • University College London: Jana Vandrovcova, Sebastian Guelfi, Karishma D'sha, John Hardy, Mar Matarin, Daniah Trabzuni
      • King's College London: Mike Weale, Mina Ryten, Paola Forabosco, Adai Ramasamy
