Festival Of Genomics 2016 - Brain talk

Applying single cell transcriptomics: unraveling the complexity of the brain

http://www.festivalofgenomicsboston.com/speaker/jean-fan/

Festival Of Genomics 2016 - Brain talk

  1. 1. Jean Fan / Festival of Genomics / June 2016 1 Jean Fan NSF GRFP | Bioinformatics and Integrative Genomics PhD Candidate Kharchenko Lab | Department of Biomedical Informatics | Harvard University Applying single cell transcriptomics: unraveling the complexity of the brain
  2. 2. Jean Fan / Festival of Genomics / June 2016 2
  3. 3. Motivation: Characterize heterogeneity and identify cell subpopulations with scRNA-seq Jean Fan / Festival of Genomics / June 2016 3 Valent P, Bonnet D, De maria R, et al. Cancer stem cell definitions and terminology: the devil is in the details. Nat Rev Cancer. 2012;12(11):767-75. Cancer Kaech SM, Cui W. Transcriptional control of effector and memory CD8+ T cell differentiation. Nat Rev Immunol. 2012;12(11):749-61. T Cells
  4. 4. Motivation: Characterize heterogeneity and identify cell subpopulations with scRNA-seq Jean Fan / Festival of Genomics / June 2016 4 Greig LC, Woodworth MB, Galazo MJ, Padmanabhan H, Macklis JD. Molecular logic of neocortical projection neuron specification, development and diversity. Nat Rev Neurosci. 2013;14(11):755-69. NPCs
  5. 5. Motivation: Characterize heterogeneity and identify cell subpopulations with scRNA-seq Jean Fan / Festival of Genomics / June 2016 5 Greig LC, Woodworth MB, Galazo MJ, Padmanabhan H, Macklis JD. Molecular logic of neocortical projection neuron specification, development and diversity. Nat Rev Neurosci. 2013;14(11):755-69. NPCs Single cell RNA-seq
  6. 6. Food For Thought ◦ How can we identify transcriptional subpopulations in a way that is robust and takes into consideration technical artefacts from single cell RNA-seq? ◦ What are the different ways to group and classify cells in the brain? ◦ In addition to expression heterogeneity, how can we make the most out of single-cell RNA-seq data? Jean Fan / Festival of Genomics / June 2016 6
  7. 7. Food For Thought ◦ How can we identify transcriptional subpopulations in a way that is robust and takes into consideration technical artefacts from single cell RNA-seq? ◦ What are the different ways to group and classify cells in the brain? ◦ In addition to expression heterogeneity, how can we make the most out of single-cell RNA-seq data? Jean Fan / Festival of Genomics / June 2016 7
  8. 8. Challenges: scRNA-seq data is highly variable and noisy ◦ Expect high correlation between replicates Jean Fan / Festival of Genomics / June 2016 8 [Figure: Bulk; axes: expression in bulk replicate 1, expression in bulk replicate 2]
  9. 9. Challenges: scRNA-seq data is highly variable and noisy ◦ Expect high correlation between replicates ◦ Many differences between individual cells (even of the same type) ◦ Biological vs. technical differences ◦ Focus on the biological variability ◦ Control for the technical variability ◦ e.g. measurement failures (drop-outs) Jean Fan / Festival of Genomics / June 2016 9 [Figure: Single Cell]
  10. 10. Previous work: SCDE - use error models to get a better handle on technical noise Jean Fan / Festival of Genomics / June 2016 10
  11. 11. Previous work: SCDE - use error models to get a better handle on technical noise ◦ Estimate true biological variability of a gene ◦ Account for possible drop-out events Jean Fan / Festival of Genomics / June 2016 11 [Figure: Cross-fits; axes: Cell 1, Cell 2]
  12. 12. Previous work: SCDE - use error models to get a better handle on technical noise ◦ Estimate true biological variability of a gene ◦ Account for possible drop-out events Jean Fan / Festival of Genomics / June 2016 12 [Figure: Cross-fits, Error Models; axes: Cell 1, Cell 2]
  13. 13. Previous work: SCDE - use error models to get a better handle on technical noise ◦ Estimate true biological variability of a gene ◦ Account for possible drop-out events ◦ Assess variability of expression, taking into consideration expression magnitude dependencies Jean Fan / Festival of Genomics / June 2016 13 Variance Normalization
  14. 14. Jean Fan / Festival of Genomics / June 2016 14 Error models and normalization help us understand the data on a probabilistic level: What is the chance this 0 expression in this cell is due to drop-out or true non-expression? What is the chance that this gene is really this variable given the expected variability for genes at this average expression magnitude?
  15. 15. PAGODA (Pathway And Geneset OverDispersion Analysis) applies error models and variance normalization to characterize heterogeneity and identify subpopulations pklab.med.harvard.edu/scde
  16. 16. PAGODA intuition: Improve statistical sensitivity by taking advantage of pathways and gene sets ◦ Rather than relying on a few genes, look for broader patterns of variability ◦ Coordinated patterns of variability of genes linked to function/phenotype == stronger signal -> increases statistical power
  17. 17. PAGODA intuition: Improve statistical sensitivity by taking advantage of pathways and gene sets ◦ Rather than relying on a few genes, look for broader patterns of variability ◦ Coordinated patterns of variability of genes linked to function/phenotype == stronger signal -> increases statistical power
  18. 18. PAGODA intuition: Improve statistical sensitivity by taking advantage of pathways and gene sets ◦ Rather than relying on a few genes, look for broader patterns of variability ◦ Coordinated patterns of variability of genes linked to function/phenotype == stronger signal -> increases statistical power
  19. 19. PAGODA overview: assess expression within annotated pathways and de novo gene sets
  20. 20. PAGODA overview: assess expression within annotated pathways and de novo gene sets
  21. 21. PAGODA overview: Identify pathways and gene sets exhibiting coordinated overdispersion
  22. 22. PAGODA overview: Remove redundant pathways and gene sets, and visualize
  23. 23. Jean Fan / Festival of Genomics / June 2016 23 Pathway based approach integrates prior knowledge to increase statistical power and provide interpretability of identified subpopulations (example next)
  24. 24. Food For Thought ◦ How can we identify transcriptional subpopulations in a way that is robust and takes into consideration technical artefacts from single cell RNA-seq? ◦ What are the different ways to group and classify cells in the brain? ◦ In addition to expression heterogeneity, how can we make the most out of single-cell RNA-seq data? Jean Fan / Festival of Genomics / June 2016 24
  25. 25. PAGODA applied to mouse neural progenitors identifies and characterizes subpopulations cells pathway clusters Kun Zhang Jerold Chun
  26. 26. PAGODA applied to mouse neural progenitors identifies and characterizes subpopulations
  27. 27. PAGODA applied to mouse neural progenitors identifies and characterizes subpopulations
  28. 28. PAGODA applied to mouse neural progenitors identifies and characterizes subpopulations
  29. 29. PAGODA applied to mouse neural progenitors identifies and characterizes subpopulations
  30. 30. PAGODA applied to mouse neural progenitors identifies and characterizes subpopulations
  31. 31. PAGODA applied to mouse neural progenitors identifies and characterizes subpopulations
  32. 32. PAGODA integrated with FISH data spatially placed subpopulations 32 github.com/hms-dbmi/brainmapr
  33. 33. PAGODA integrated with FISH data spatially placed subpopulations Allen Brain Atlas; https://github.com/hms-dbmi/brainmapr
  34. 34. PAGODA identifies multiple, potentially overlapping aspects of transcriptional heterogeneity
  35. 35. PAGODA identifies multiple, potentially overlapping aspects of transcriptional heterogeneity
  36. 36. PAGODA identifies multiple, potentially overlapping aspects of transcriptional heterogeneity Allen Brain Atlas; https://github.com/hms-dbmi/brainmapr
  37. 37. Food For Thought ◦ How can we identify transcriptional subpopulations in a way that is robust and takes into consideration technical artefacts from single cell RNA-seq? ◦ What are the different ways to group and classify cells in the brain? ◦ In addition to expression heterogeneity, how can we make the most out of single-cell RNA-seq data? Jean Fan / Festival of Genomics / June 2016 37
  38. 38. Food For Thought ◦ How can we identify transcriptional subpopulations in a way that is robust and takes into consideration technical artefacts from single cell RNA-seq? ◦ What are the different ways to group and classify cells in the brain? ◦ In addition to expression heterogeneity, how can we make the most out of single-cell RNA-seq data? ◦ Alternative splicing Jean Fan / Festival of Genomics / June 2016 38
  39. 39. PAGODA applied to human cortical cells identifies and characterizes subpopulations Jean Fan / Festival of Genomics / June 2016 39 Xiaochang Zhang Chris Walsh
  40. 40. Jean Fan / Festival of Genomics / June 2016 40 Marker genes confirm subpopulation identified by PAGODA
  41. 41. PAGODA integrated with MISO identifies alternative splicing in pure pooled single cells Jean Fan / Festival of Genomics / June 2016 41
  42. 42. PAGODA integrated with MISO identifies alternative splicing in pure pooled single cells Jean Fan / Festival of Genomics / June 2016 42 Needs bulk
  43. 43. PAGODA integrated with MISO identifies alternative splicing in pure pooled single cells Jean Fan / Festival of Genomics / June 2016 43 Needs bulk -> pool single cells
  44. 44. Pure pooled RGs vs neurons lend credence to potential purity concerns with bulk CP vs. VZ Jean Fan / Festival of Genomics / June 2016 44
  45. 45. Food For Thought ◦ How can we identify transcriptional subpopulations in a way that is robust and takes into consideration technical artefacts from single cell RNA-seq? ◦ What are the different ways to group and classify cells in the brain? ◦ In addition to expression heterogeneity, how can we make the most out of single-cell RNA-seq data? ◦ Alternative splicing ◦ Copy number alteration detection / integrative analysis Jean Fan / Festival of Genomics / June 2016 45
  46. 46. BADGER quantitatively assesses posterior probabilities of copy number alterations Jean Fan / Festival of Genomics / June 2016 46 Bayesian Approach to CNV Detection from single cell RNA-seq (BADGER)
  47. 47. BADGER quantitatively assesses posterior probabilities of copy number alterations Jean Fan / Festival of Genomics / June 2016 47 Bayesian Approach to CNV Detection from single cell RNA-seq (BADGER)
  48. 48. BADGER quantitatively assesses posterior probabilities of copy number alterations Jean Fan / Festival of Genomics / June 2016 48 Bayesian Approach to CNV Detection from single cell RNA-seq (BADGER)
  49. 49. BADGER applied to scRNA-seq identified subclonal expansion in progressive MM Jean Fan / Festival of Genomics / June 2016 49 Soo Lee Peter Park Woong-Yang Park Hae-Ock Lee Initial Bone Marrow Ascites MM34 MM34A
  50. 50. BADGER applied to scRNA-seq identified subclonal expansion in progressive MM Jean Fan / Festival of Genomics / June 2016 50
  51. 51. BADGER applied to scRNA-seq identified subclonal expansion in progressive MM Jean Fan / Festival of Genomics / June 2016 51
  52. 52. BADGER applied to scRNA-seq identified subclonal expansion in progressive MM Jean Fan / Festival of Genomics / June 2016 52
  53. 53. BADGER applied to scRNA-seq identified subclonal expansion in progressive MM Jean Fan / Festival of Genomics / June 2016 53
  54. 54. PAGODA integrated with BADGER connects genetic with transcriptional heterogeneity Jean Fan / Festival of Genomics / June 2016 54
  55. 55. PAGODA integrated with BADGER connects genetic with transcriptional heterogeneity Jean Fan / Festival of Genomics / June 2016 55
  56. 56. Jean Fan / Festival of Genomics / June 2016 56 ScRNA-seq contains (noisy) expression as well as (noisy) splicing and some (noisy) genetic information. Novel statistical and computational methods and techniques are still needed to harness the potential of scRNA-seq data!
  57. 57. Thanks! Kharchenko Lab Peter Kharchenko Joseph Herman Jean Fan / Festival of Genomics / June 2016 57 Park Lab Soo Lee Semin Lee SGI Hae-Ock Lee Walsh Lab Xiaochang Zhang Funding
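
The probabilistic question raised on slide 14 (is a zero in this cell a drop-out or true non-expression?) can be illustrated with Bayes' rule. The sketch below is not SCDE's code. It assumes a two-component mixture in which a drop-out yields near-zero background counts (Poisson) and a successfully amplified transcript yields negative binomial counts. The function names, `background_rate`, and all example parameter values are hypothetical.

```python
import math


def nb_zero_prob(mu: float, size: float) -> float:
    """P(count == 0) under a negative binomial with mean mu and dispersion size."""
    return (size / (size + mu)) ** size


def dropout_posterior(mu: float, size: float, dropout_rate: float,
                      background_rate: float = 0.1) -> float:
    """Posterior probability that an observed zero count is a drop-out.

    mu, size        : assumed expected expression (NB mean) and NB dispersion
    dropout_rate    : assumed prior probability of a drop-out for this gene/cell
    background_rate : assumed Poisson rate of background reads in a drop-out
    """
    p_zero_given_dropout = math.exp(-background_rate)  # Poisson pmf at 0
    p_zero_given_amplified = nb_zero_prob(mu, size)
    numerator = dropout_rate * p_zero_given_dropout
    denominator = numerator + (1.0 - dropout_rate) * p_zero_given_amplified
    return numerator / denominator


if __name__ == "__main__":
    # A zero for a gene expected to be highly expressed is almost surely a drop-out;
    # a zero for a gene expected to be barely expressed is much less informative.
    print(dropout_posterior(mu=100.0, size=2.0, dropout_rate=0.2))
    print(dropout_posterior(mu=0.1, size=2.0, dropout_rate=0.2))
```

With the same prior, the posterior is driven by how surprising a zero would be if the transcript had been amplified, which is why the expected expression magnitude matters.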

Editor's Notes

  • Actually identify subpopulations
  • DCX = neuronal maturation marker
    Previous FACS approaches rely on just one marker

    PAGODA builds on these error models

    Rather than variability of genes,
    coordinated variability of genes within a pathway or gene set

    The general intuition…

    you can imagine if I have
    many cells one gene
    red is high blue is low
  • After error modeling…

    Explain green and orange
    Red and green
    split de novo and top section

    Given annotations from MSigDB, GO, or other ontologies
    we integrate the error models previously mentioned and use weighted PCA to capture the variability of a gene set in principal components

    where weights are derived from our error modeling

    because annotations are limited, we also derive ‘de novo’ gene sets based on correlated expression patterns we observe directly from the data

    Capturing the patterns of variability
  • We focus on the pathways and gene sets that exhibit significantly coordinated variability

    Statistical significance of the λ1 eigenvalues obtained for each gene set was evaluated based on the Tracy-Widom F1 distribution F1(m, ne), where m is the number of genes in a given set s, and ne is the effective number of cells, determined to fit the distribution of the randomly sampled gene sets (containing the same number of genes as the actual gene sets).
  • But many pathways and gene sets share genes or show similar patterns of variability across cells

    we further collapse these redundancies into pathway clusters

    Ultimately providing a cell clustering
    along with an interactive browser to explore these results

    Label middle heatmap
  • Demonstrate the utility of PAGODA
    NPCs from embryonic 13.5 mouse

    Row is pathway cluster
    Column is a cell

    PAGODA pulls out aspects of transcriptional heterogeneity

    Focus on the most prominent aspect
  • Explore genes and pathways driving this pathway cluster
  • Interpret this aspect of transcriptional heterogeneity as relating to
  • Alternatively, we can focus on a different aspect of transcriptional heterogeneity
  • Similarly for rest of the pathway clusters
  • Based on the genes and pathways driving this primary aspect of transcriptional heterogeneity, we can interpret the corresponding cells as early and intermediate or mature NPCs

    Indeed, when we look at how the genes associated with these subpopulations localize in the brain, we see
    signatures associated with early NPCs localize to the subventricular zone
    while sigs assoc…further outwards
    which is consistent with what’s known about NPC migration through the ventricular zone during development

    Credit images to Allen Brain Atlas
    Move mouse block down, label out
  • There are potentially many ways to group cells
    such as mature vs. early NPCs

    Often we find the primary component of variation driving cell differences is cell cycle
    Sometimes, we’re not interested in cell cycle
    Important to be able to look at additional aspects of transcriptional heterogeneity that may not agree with the primary split

    It is known that these are tangentially migrating neurons
    some are early, some are mature
    Add prediction figure
  • Now that we have pure single cell subpopulations, let’s create pure in silico bulks
  • Create in silico mini-bulks
  • Reviewers were initially concerned about purity of bulk
  • Method I’m currently working on for copy number detection from single cell RNA-seq
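
The weighted PCA step mentioned in the notes (capturing the coordinated variability of a gene set in a principal component, with weights derived from error modeling) can be sketched as follows. This is an illustrative approximation, not PAGODA's implementation: the weights are assumed to be per-observation confidences in [0, 1], every gene is assumed to have at least one non-zero weight, and the significance step uses an empirical comparison against randomly sampled gene sets of the same size as a stand-in for the Tracy-Widom test described in the notes. All function and parameter names are hypothetical.

```python
import numpy as np


def weighted_pc1(expr, weights):
    """First weighted principal component of a gene set.

    expr, weights: arrays of shape (genes, cells). weights are assumed
    per-observation confidences (e.g. 1 - drop-out probability).
    Returns (lambda1, per-cell scores on the first component).
    """
    expr = np.asarray(expr, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Weighted mean per gene, so low-confidence observations count less.
    w_sum = weights.sum(axis=1, keepdims=True)
    mean = (weights * expr).sum(axis=1, keepdims=True) / w_sum
    centered = np.sqrt(weights) * (expr - mean)
    # Gene-by-gene weighted covariance; eigh returns eigenvalues in ascending order.
    cov = centered @ centered.T / expr.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    loadings = eigvecs[:, -1]
    return eigvals[-1], loadings @ centered


def empirical_pvalue(lam1, all_expr, all_weights, set_size, n_random=200, seed=0):
    """Fraction of random gene sets of the same size with lambda1 >= lam1."""
    rng = np.random.default_rng(seed)
    n_genes = all_expr.shape[0]
    null = []
    for _ in range(n_random):
        idx = rng.choice(n_genes, size=set_size, replace=False)
        null.append(weighted_pc1(all_expr[idx], all_weights[idx])[0])
    return (1 + sum(v >= lam1 for v in null)) / (1 + n_random)
```

A gene set whose genes vary together across cells produces a large first eigenvalue relative to random sets, and the per-cell scores give one row of the pathway-by-cell heatmap described in the notes.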
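
The idea of creating pure in silico bulks from single cell subpopulations can be sketched as a simple pooling step. The example below sums a gene-by-cell count matrix within each subpopulation label. It is a schematic under stated assumptions: the function name and the toy labels are hypothetical, and a splicing tool would typically need the pooled reads rather than a gene-level count matrix.

```python
import numpy as np


def pool_cells(counts, labels):
    """Sum single-cell counts within each subpopulation.

    counts: array of shape (genes, cells)
    labels: one subpopulation label per cell (e.g. from a prior clustering)
    Returns a dict mapping each label to its pooled per-gene count vector.
    """
    counts = np.asarray(counts)
    labels = np.asarray(labels)
    if counts.shape[1] != labels.shape[0]:
        raise ValueError("need exactly one label per cell (column)")
    pooled = {}
    for lab in np.unique(labels):
        # Select the columns (cells) carrying this label and sum across them.
        pooled[lab.item()] = counts[:, labels == lab].sum(axis=1)
    return pooled


if __name__ == "__main__":
    toy_counts = np.array([[1, 2, 3, 4],
                           [0, 1, 0, 1]])
    toy_labels = ["RG", "neuron", "RG", "neuron"]  # hypothetical assignments
    for name, profile in pool_cells(toy_counts, toy_labels).items():
        print(name, profile)
```

Because each pool contains only cells assigned to one subpopulation, the resulting profile avoids the mixing of cell types that a dissected bulk sample may contain.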