2. Outline:
• A comparison of common scRNA-seq approaches
• Challenges specific to working with clinical samples of tumor tissue
• Uses of scRNA-seq – what types of questions can scRNA-seq answer?
7. • scRNA-seq works best with fresh tissue
• Processing fresh tissue can be logistically challenging, and accrual rates may be low
• Nuclei can be harvested from frozen tissue and subjected to scRNA-seq
• Only nuclear transcripts are profiled, so fewer genes are tagged per cell
• Sensitive to RNA degradation
• However, single-cell suspensions can also be fixed with methanol and later rehydrated for processing
9. scRNA-seq pipeline:
• scRNA-seq and exome-seq are performed in parallel
• Clonal single-nucleotide variants (SNVs) and copy-number variants (CNVs) can be identified in single cells
Bartlett et al. Sci Rep 2017
Müller et al. Bioinfo 2018
10. [Figure: Genotyping via SNVs (WES/Annovar, GATK, nonsynonymous mutations, Vartrix, SCG: Single Cell Genotyper; probability of neoplastic vs. non-neoplastic cells); Genotyping via CNVs; Clustering via expression]
11. scRNA-seq pipeline (continued):
• Neoplastic cells can be separated from immune cells by analyzing expressed mutations and transcriptional profiles
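To illustrate how expressed mutations can separate neoplastic from non-neoplastic cells, here is a minimal sketch. It is not the SCG (Single Cell Genotyper) model. As a deliberately simpler, hypothetical stand-in, it scores one cell by a binomial likelihood ratio over its reference/alternate read counts at clonal SNV sites (the kind of per-cell counts a tool such as Vartrix can report). It assumes independent sites, a fixed sequencing error rate, a heterozygous mutant allele fraction of 0.5, and no allelic dropout. All parameter values are illustrative assumptions.

```python
import math

def log_binom_lik(alt: int, total: int, p: float) -> float:
    """Log-likelihood of `alt` alternate reads out of `total` (binomial coefficient omitted)."""
    return alt * math.log(p) + (total - alt) * math.log(1.0 - p)

def posterior_neoplastic(counts, prior=0.5, error=0.01, mutant_vaf=0.5):
    """Posterior probability that one cell carries the clonal mutations.

    counts: list of (ref_reads, alt_reads), one pair per clonal SNV site.
    error and mutant_vaf are assumed values; the binomial coefficient
    cancels because it is identical under both hypotheses.
    """
    log_mut = math.log(prior)
    log_wt = math.log(1.0 - prior)
    for ref, alt in counts:
        total = ref + alt
        log_mut += log_binom_lik(alt, total, mutant_vaf)  # cell carries the variant
        log_wt += log_binom_lik(alt, total, error)        # alt reads are errors only
    # Normalize in log space to avoid underflow with many sites.
    m = max(log_mut, log_wt)
    return math.exp(log_mut - m) / (math.exp(log_mut - m) + math.exp(log_wt - m))
```

A cell with no coverage at any site simply returns the prior. This is why expression-based clustering (and CNV calls) remain necessary for cells in which the mutated genes are not detected.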
12. A gene signature that distinguishes macrophages by ontogeny in glioma
• Bowman et al. (2016) studied macrophage induction in murine gliomas using two macrophage-lineage tracing models
• We reasoned that a core signature of ontogeny may be conserved in human macrophages
14. A gene signature that distinguishes macrophages by ontogeny in glioma
• Principal component analysis (PCA) in the space of genes differentially expressed in Bowman et al. separates tumor-associated macrophages (TAMs) into two discrete populations.
Müller et al. Genome Biol 2017
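PCA restricted to a chosen gene set can be sketched without external libraries. The sketch computes PC1 by power iteration on the gene-centered matrix. Rows are cells and columns are the genes used for the projection. The gene set (for example, genes differentially expressed in Bowman et al.) is a hypothetical input here. The sign of PC1 is arbitrary. In practice one would use `numpy.linalg.svd` or `sklearn.decomposition.PCA`. This is an illustrative sketch, not the pipeline used for the slide.

```python
import random

def pc1_scores(matrix, n_iter=200, seed=0):
    """Return (scores, loadings) for the first principal component.

    matrix: list of rows (cells) x columns (genes), e.g. log-expression
    of the chosen gene set only. The sign of PC1 is arbitrary.
    """
    n_cells, n_genes = len(matrix), len(matrix[0])
    # Center each gene (column) at zero mean.
    means = [sum(row[j] for row in matrix) / n_cells for j in range(n_genes)]
    centered = [[row[j] - means[j] for j in range(n_genes)] for row in matrix]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n_genes)]
    for _ in range(n_iter):
        # Power-iteration step: v <- X^T (X v), then normalize to unit length.
        xv = [sum(r[j] * v[j] for j in range(n_genes)) for r in centered]
        w = [sum(centered[i][j] * xv[i] for i in range(n_cells)) for j in range(n_genes)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Sample scores are projections of the centered cells onto the loading vector.
    scores = [sum(r[j] * v[j] for j in range(n_genes)) for r in centered]
    return scores, v
```

Two discrete populations then appear as two groups of PC1 scores with opposite signs. The genes with the largest absolute loadings drive the separation.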
15. A gene signature that distinguishes macrophages by ontogeny in glioma (continued)
[Figure: TCGA gliomas (all grades)]
• Ontogeny-specific gene sets are co-expressed in TCGA glioma data
Müller et al. Genome Biol 2017
16. A gene signature that distinguishes macrophages by ontogeny in glioma (continued)
[Figure: IVY GAP glioblastomas]
• Blood-derived TAMs are enriched in perivascular and necrotic regions; microglial TAMs are enriched in the leading edge and infiltrating tumor
Müller et al. Genome Biol 2017
17. A gene signature that distinguishes macrophages by ontogeny in glioma
18. “RNA velocity” – estimating the rate of transcription from the splicing rate
19.
20. The transcriptional phenotypes of GBM neoplastic cells can be explained by a single axis that varies from proneural to mesenchymal
A draft single-cell atlas of human glioblastoma reveals a single axis of phenotype in tumor-propagating cells. S. Muller et al., M. Aghi, A. Diaz
bioRxiv 377606; doi: https://doi.org/10.1101/377606
Editor's Notes
A comparison of common scRNA-seq approaches: I will describe the differences between the two most common scRNA-seq workflows: droplet-based vs. plate-based.
Challenges specific to working with clinical samples of tumor tissue: Here I will discuss the technical challenges of working with limited tissue, as well as the computational challenge of separating neoplastic cells from tumor-infiltrating stromal and immune cells in silico
Uses of scRNA-seq: I will discuss the uses of scRNA-seq, including unbiased cell-type discovery, mapping cell-type signatures to bulk-sequencing datasets, mapping cell types to anatomical tumor atlases, and RNA velocity.
The two most common methods for generating single-cell transcriptomics data are droplet-based and plate-based. In the droplet-based method, individual cells are captured in emulsion droplets together with gel beads on which oligo-dT primers have been immobilized (along with unique molecular identifier and cell-barcode sequences). In the plate-based method, single cells are flow-sorted into multiwell plates. The plate-based approach can also be implemented using microfluidic chips (e.g., the Fluidigm C1). The small reaction volumes enable very high-quality libraries.
The fundamental difference between the two approaches is how sequencing coverage is distributed. The plate-based approach has a lower cellular throughput, typically 200–400 cells per sample vs. 4,000–10,000 cells per sample in the droplet approach. However, the plate-based method yields roughly an order of magnitude more reads per cell (and therefore more genes detected per cell), as well as full-length transcript coverage.
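For a fixed sequencing budget, average depth per cell scales inversely with the number of cells captured. This is the coverage trade-off described above. The arithmetic below is a minimal sketch that assumes a hypothetical budget of 400 million reads split evenly across cells. Real runs are neither split perfectly evenly nor necessarily sequenced with equal budgets.

```python
def reads_per_cell(total_reads: int, n_cells: int) -> float:
    """Average depth per cell when a fixed read budget is shared evenly across cells."""
    return total_reads / n_cells

BUDGET = 400_000_000  # hypothetical total read budget (assumption, not from the source)

plate = reads_per_cell(BUDGET, 400)       # upper end of the plate-based range above
droplet = reads_per_cell(BUDGET, 10_000)  # upper end of the droplet-based range above
print(f"plate: {plate:,.0f} reads/cell, droplet: {droplet:,.0f} reads/cell")
```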
Figure 3. A sketch of the DropSeq protocol [26], a 3ʹ tag-counting technology. Left panel: Workflow. After tissue dissociation, the cell suspension is injected into a microfluidic device, where it is joined by another flow containing DNA-barcoded beads suspended in lysis buffer. Each bead is linked to primers containing a cell barcode (a 12-bp DNA sequence that is the same for all primers linked to the same bead), a subsequent UMI (a random 8-bp sequence) and an oligo-dT segment. Joining a third flow of oil creates an emulsion in which thousands of droplets, many of which contain a single bead and a single cell, are dispersed within the oil. Each such droplet is thus a distinct compartment in which a single cell is lysed and its mRNA is hybridized to the bead. The droplets are broken by adding a demulsifier to disrupt the water–oil interface, and the beads (along with the hybridized mRNA) are separated from the oil by centrifugation and resuspended in reverse transcription buffer. The mRNA is then reverse transcribed and pre-amplified. To prepare sequencing-ready cDNA fragments that also contain the cell barcode and UMI, Tn5 transposase-mediated fragmentation and adapter insertion (‘tagmentation’) is performed, followed by PCR, which selects for the 3ʹ-end fragments and appends the sequencing adapters. Paired-end sequencing is used to read both the sequence of the gene from which the mRNA was transcribed (Read 2) and the cell barcode and UMI (Read 1). Right panel: Molecular biology of the DropSeq protocol. Note that instead of a standard library preparation step (which creates adapter-flanked fragments covering the full transcript length), a 3ʹ-end enrichment step is performed here to select only those fragments that contain the cell-specific barcode and UMI inserted in the first reverse transcription step.
This requires using a custom read 1 sequencing primer that contains segments of the reverse transcription primer.
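Given the bead design in the Figure 3 caption (a 12-bp cell barcode followed by an 8-bp UMI), splitting Read 1 can be sketched as below. The sketch assumes that Read 1 starts at the first base of the cell barcode and that the UMI immediately follows it. Trailing bases (e.g., the start of the poly-T stretch) are ignored. Production pipelines also correct barcode and UMI sequencing errors, which this sketch does not attempt.

```python
from typing import NamedTuple

CELL_BARCODE_LEN = 12  # bp, per the bead design described in the caption
UMI_LEN = 8            # bp

class Read1Tag(NamedTuple):
    cell_barcode: str
    umi: str

def parse_read1(seq: str) -> Read1Tag:
    """Split a Drop-seq Read 1 sequence into its cell barcode and UMI."""
    if len(seq) < CELL_BARCODE_LEN + UMI_LEN:
        raise ValueError(f"Read 1 too short: {len(seq)} bp")
    barcode = seq[:CELL_BARCODE_LEN]
    umi = seq[CELL_BARCODE_LEN:CELL_BARCODE_LEN + UMI_LEN]
    return Read1Tag(barcode, umi)
```

For example, `parse_read1("AAAACCCCGGGG" + "TTTTAAAA")` yields the barcode `AAAACCCCGGGG` and the UMI `TTTTAAAA`.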
Figure 2. An example of a typical SMART-Seq protocol for full-length mRNA sequencing from 96 individual cells using the Fluidigm C1 microfluidic system. Left panel: Workflow sketch. Tissues are collected and enzymatically and/or mechanically dissociated. The desired cell type is enriched using cell-type-specific antibodies labeled with magnetic beads. The cell suspension is then loaded into a microfluidic chip, where it is pushed through a series of 96 butterfly-shaped microfluidic traps. Each trap is designed so that once an individual cell is captured, the rest of the suspension flows on to the next trap through a pair of wing-shaped bypass channels. The individual cells are then isolated from each other, and each cell is lysed. SMART-Seq technology is used to reverse transcribe and pre-amplify the mRNA, after which the amplified cDNA is harvested from the chip into 96 tubes. Library preparation and cell barcoding are done off-chip using the Nextera protocol with 96 different combinations of i5 and i7 barcodes. All 96 libraries are combined and sequenced on a single Illumina HiSeq lane. Right panel: The SMART-Seq protocol at the molecular level. First-strand synthesis is carried out using MMLV reverse transcriptase in the presence of two primers: a cDNA synthesis (CDS) primer that contains an oligo-dT segment, and a template-switching oligo (TSO). When the reverse transcriptase reaches the 5ʹ end of the mRNA, it adds a few (2–5) C nucleotides to the 3ʹ end of the newly synthesized cDNA. The TSO, which contains three rG nucleotides at its 3ʹ end, base-pairs with the C-rich tail, and the reverse transcriptase ‘switches templates’ and continues to replicate the TSO. The resulting cDNA strand contains well-defined flanking regions for PCR priming and pre-amplification.
After PCR amplification, the cDNA is ‘tagmented’ using Tn5 transposase-mediated fragmentation and adapter insertion (Nextera), followed by PCR, which appends the full sequencing adapters and incorporates cell-specific barcodes. Typically, 96 single cells are sequenced on a single Illumina HiSeq lane at 1–2 million reads per cell (note: the location of the i5 index primer varies between Illumina instruments).
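In the plate-based workflow, each cell's library is identified by its pair of i7 and i5 index reads. The sketch below shows how 96 cells can be demultiplexed from one lane. It assumes a 12 × 8 layout (12 i7 indices by 8 i5 indices, a common arrangement for 96 combinations). The index labels are hypothetical placeholders rather than real index sequences from any kit. Reads with an unrecognized index pair are set aside.

```python
from itertools import product

# Hypothetical index labels; real i7/i5 sequences come from the index kit in use.
I7 = [f"i7_{k:02d}" for k in range(1, 13)]  # 12 i7 indices (plate columns 01-12)
I5 = [f"i5_{r}" for r in "ABCDEFGH"]        # 8 i5 indices (plate rows A-H)

# Each (i7, i5) combination marks one cell's library: 12 x 8 = 96 wells.
WELL_OF = {(i7, i5): f"{i5[-1]}{i7[-2:]}" for i7, i5 in product(I7, I5)}

def demultiplex(reads):
    """Group reads by cell (well) using the pair of index reads.

    reads: iterable of (i7, i5, sequence) tuples.
    Returns ({well: [sequences]}, [sequences with an unknown index pair]).
    """
    by_cell, unassigned = {}, []
    for i7, i5, seq in reads:
        well = WELL_OF.get((i7, i5))
        if well is None:
            unassigned.append(seq)
        else:
            by_cell.setdefault(well, []).append(seq)
    return by_cell, unassigned
```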
Figure 4. Single-cell RNA sequencing data analysis. Right panel: Full-length transcript sequencing. Libraries covering the full transcript length are sequenced either single-end or paired-end. Reads are demultiplexed according to the cell-specific barcodes on both ends. For each cell, reads are aligned to the reference genome. The number of reads that align to a particular gene locus indicates its expression level. Similarly, splice-isoform expression and other sequence information can be inferred. Left panel: An example of 3ʹ-end digital expression. Library fragments covering the 3ʹ end of the transcript are sequenced paired-end. Read 1 contains a cell-specific barcode (or ‘tag’) and a unique molecular identifier (UMI). Read 2 is aligned to the reference genome to identify the gene from which the transcript originated. In a simplified algorithm, each library fragment is represented by a text string containing a unique cell ID and UMI (from Read 1) and a gene of origin (from Read 2). The fragments are sorted lexicographically at three levels: first by cell ID, then by gene of origin and then by UMI. Then, for each individual cell, the number of unique UMIs for each gene is counted. Because PCR duplicates of the same original molecule share a UMI, this count better represents the original number of transcripts than the raw read count does.
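The simplified UMI-counting algorithm in the Figure 4 caption can be sketched as follows. Fragments are sorted by cell, then gene, then UMI, so that duplicates of the same molecule become adjacent. Each distinct UMI is then counted once per cell and gene. Input tuples are assumed to be already parsed and aligned. Real pipelines additionally merge UMIs that differ only by sequencing errors, which this sketch does not do.

```python
from collections import Counter

def count_umis(fragments):
    """Digital expression: number of unique UMIs per (cell, gene).

    fragments: iterable of (cell_id, umi, gene) tuples, one per aligned read pair.
    Returns {cell_id: Counter({gene: n_unique_umis})}.
    """
    # Three-level lexicographic sort: cell ID, then gene of origin, then UMI.
    ordered = sorted(fragments, key=lambda f: (f[0], f[2], f[1]))
    counts = {}
    previous = None
    for cell, umi, gene in ordered:
        key = (cell, gene, umi)
        if key != previous:  # first fragment of a new molecule; duplicates are adjacent
            counts.setdefault(cell, Counter())[gene] += 1
        previous = key
    return counts
```

Two reads carrying the same cell ID, gene and UMI count as one transcript. A second UMI for the same gene in that cell raises the count to two.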
Transcriptome integrity and gene expression levels are preserved in fixed cells. a Drop-seq of mixed human and mouse cells (50 cells/μl). Plots show the number of human and mouse transcripts (UMIs) associated with each cell (dot); cells identified as human- or mouse-specific are blue or red, respectively. Cells expressing fewer than 3,500 UMIs are grey; doublets are violet. b Distribution and median of the number of genes and transcripts (UMIs) detected per cell (>3,500 UMIs). Libraries were sequenced to a median depth of ~20,500 (live) and ~15,500 (fixed) aligned reads per cell. c Gene expression levels from live and fixed cells correlate well. Pairwise correlations between bulk mRNA-seq libraries and Drop-seq single-cell experiments. Bulk mRNA-seq data were expressed as reads per kilobase per million (RPKM). Drop-seq expression counts were converted to average transcripts per million (ATPM) and plotted as log2(ATPM + 1). Upper-right panels show Pearson correlations. The overlap (common set of genes) among all five samples is high (17,326 genes). Experiments with live and fixed cells were independently repeated with similar results (unpublished).
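The species-mixing ("barnyard") panel labels each cell barcode as human, mouse, doublet or low-count. The minimal sketch below reproduces that labeling. The 3,500-UMI cutoff comes from the caption. The 0.9 species-purity threshold is an assumed, illustrative value, not the one used for the figure.

```python
def classify_barnyard(human_umis: int, mouse_umis: int,
                      min_umis: int = 3500, purity: float = 0.9) -> str:
    """Label one cell barcode from a human-mouse species-mixing experiment.

    min_umis follows the 3,500-UMI cutoff in the caption; the purity
    threshold (fraction of UMIs from one species) is an assumption.
    """
    total = human_umis + mouse_umis
    if total < min_umis:
        return "low"      # grey in the figure
    if human_umis / total >= purity:
        return "human"    # blue
    if mouse_umis / total >= purity:
        return "mouse"    # red
    return "doublet"      # substantial transcripts from both species (violet)
```

The fraction of barcodes labeled "doublet" gives a rough view of how often two cells share a droplet at the chosen loading concentration.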
A t-SNE plot of 32,151 10X cells from 6 patients. Cells are colored by the presence (red) or absence (black) of clonal CNVs.
(B) (Top-left) Expression of GBM-enriched genes in IDH1-wildtype human GBMs from TCGA (n=144) and non-malignant human brain from GTEx (n=200). (Top-right) Average expression (± SEM) of GBM marker genes in cells classified as neoplastic (black) and non-neoplastic (grey). (Bottom) Heatmap of the 50 most specific genes (Wilcoxon rank-sum test) in clusters of non-neoplastic cells.
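Ranking cluster-specific genes with a Wilcoxon rank-sum test can be sketched in plain Python. The sketch uses tie-averaged ranks and the normal approximation, without the tie correction to the variance. Genes are ranked by the z-statistic of one cluster against all other cells. The caption names only the test, so the ranking criterion here (highest z first) is an assumption. `scipy.stats.ranksums` is a library alternative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of positions i..j, 1-based
        i = j + 1
    return ranks

def rank_sum_z(group, rest):
    """Wilcoxon rank-sum z-statistic of `group` vs `rest` (normal approximation)."""
    n1, n2 = len(group), len(rest)
    ranks = average_ranks(list(group) + list(rest))
    w = sum(ranks[:n1])                  # rank sum of the cluster's cells
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd

def top_markers(expr, in_cluster, k=50):
    """Genes most specifically higher in one cluster.

    expr: {gene: [expression value per cell]}; in_cluster: booleans per cell.
    """
    scores = {}
    for gene, values in expr.items():
        group = [v for v, m in zip(values, in_cluster) if m]
        rest = [v for v, m in zip(values, in_cluster) if not m]
        scores[gene] = rank_sum_z(group, rest)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```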
GBMs are IDH wild-type primary glioblastomas
OLIG are IDH mutant, 1p/19q co-deleted oligodendrogliomas
ASTRO are IDH mutant, 1p/19q wild-type grade II/III astrocytomas
a, Spliced and unspliced counts are estimated by separately counting reads that incorporate intronic sequence. Multiple reads associated with a given molecule are grouped (boxes with asterisks) for unique molecular identifier (UMI)-based protocols. Pie charts show typical fractions of unspliced molecules. b, Model of transcriptional dynamics, capturing the transcription (α), splicing (β) and degradation (γ) rates involved in production of unspliced (u) and spliced (s) mRNA products. c, Solution of the model in b as a function of time, showing unspliced and spliced mRNA dynamics in response to step changes in α. d, Phase portrait showing the same solution as in c (solid curves). Steady states for different values of the transcription rate α fall on the diagonal given by slope γ (dashed line). Levels of unspliced mRNA above or below this proportion indicate increasing (red shading) or decreasing (blue shading) expression of a gene, respectively. e, Abundance of spliced (s) and unspliced (u) mRNAs for circadian-associated genes in the mouse liver over a 24-h time course [12]. The unspliced mRNAs are predictive of spliced mRNA at the next time point. f, g, Phase portraits observed for a pair of circadian-driven genes: Fgf1 (f) and Cbs (g). The circadian time of each point is shown using a clock symbol (corresponding to those in e). The dashed diagonal line shows the steady-state relationship, as predicted by the γ fit. h, Change in expression state at a future time t, as predicted by the model, shown in the space of the first two principal components (PCs), recapitulating the progression along the circadian cycle. Each circle shows the observed expression state, with the arrow pointing to the position of the future state extrapolated from velocity estimates.
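The kinetic model in panel b can be written as du/dt = α − βu and ds/dt = βu − γs. The sketch integrates these equations with the Euler method, fits the steady-state slope γ and computes a velocity estimate v = u − γs. It assumes β = 1 (a common simplification in the steady-state approach). It also fits γ by least squares through the origin on all supplied cells. The published method instead fits γ on extreme quantiles, so this is a simplified illustration rather than the velocyto implementation.

```python
def simulate(alpha, beta, gamma, u0=0.0, s0=0.0, dt=0.01, steps=1000):
    """Euler integration of du/dt = alpha - beta*u and ds/dt = beta*u - gamma*s."""
    u, s = u0, s0
    trajectory = []
    for _ in range(steps):
        du = alpha - beta * u
        ds = beta * u - gamma * s
        u, s = u + du * dt, s + ds * dt
        trajectory.append((u, s))
    return trajectory

def fit_gamma(u, s):
    """Least-squares slope of u on s through the origin (steady-state fit, beta = 1)."""
    return sum(ui * si for ui, si in zip(u, s)) / sum(si * si for si in s)

def velocity(u, s, gamma):
    """v = u - gamma*s: positive means the gene is being induced, negative repressed."""
    return [ui - gamma * si for ui, si in zip(u, s)]

# Induction (alpha = 2) from zero: cells approach the steady state u = 2, s = 4,
# which lies on the diagonal u = gamma * s with gamma = 0.5.
induction = simulate(alpha=2.0, beta=1.0, gamma=0.5, steps=5000)
```

Early in induction, unspliced mRNA sits above the steady-state diagonal, so v is positive. After α is switched to zero, unspliced mRNA falls below the diagonal and v turns negative. This matches the red and blue shading in panel d.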
a, PCA projection of human glutamatergic neuron differentiation (n = 1,720 cells) at post-conception week 10, shown with velocity field. Colours indicate cell types and intermediate states. A corresponding principal curve is shown in bold. b, Gene expression of known markers of radial glia (SOX2), neuroblasts (EOMES) and neurons (SLC17A7), and of novel markers is visualized on the PCA projection for the indicated genes in pseudocolour. c, Fluorescent in situ hybridization (RNAscope) for the same genes as in b on a cross-section of human developing cortex, oriented with the ventricular zone towards the bottom and the cortical surface towards the top (n = 1). Scale bars, 25 μm. d, Pseudotime expression profiles during glutamatergic neuron maturation for six example genes. Spliced abundance was multiplied by γ to match the scale of unspliced abundance.
(C) (Top) PCA of 25,899 10X neoplastic cells. Density curves of a Gaussian mixture model fit to PC1 sample scores are in gray. (Bottom) Distributions of cells from each patient along PC1.
(D) Differentially expressed genes between PCA clusters (|log2 fold change| > 1 and adjusted p < 0.001 in red).
(E) Fractions of cycling cells; ***: Fisher's exact test p < 0.001.
(F–G) Expression of top-loading genes from PC1 (F) and PC2 (G) in single cells. Cells are sorted by PC1 and PC2 sample scores, respectively.
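The statistics behind panels C to E can be sketched in plain Python:
- A two-component Gaussian mixture fit to PC1 scores by expectation-maximization (panel C).
- A fold-change plus adjusted-p filter for differential expression (panel D).
- A two-sided Fisher's exact test on a 2×2 table of cycling vs. non-cycling cells (panel E).

Several choices are assumptions not stated in the captions: two mixture components initialized at the data extremes, Benjamini-Hochberg as the p-value adjustment, and the "no more likely than observed" definition of the two-sided Fisher p-value. Library alternatives include `sklearn.mixture.GaussianMixture` and `scipy.stats.fisher_exact`.

```python
import math
import statistics
from math import comb

def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(x, n_iter=200):
    """Panel C: two-component 1D Gaussian mixture fit to PC1 scores by EM."""
    mu = [min(x), max(x)]                    # initialize at the extremes (assumption)
    var = [statistics.pvariance(x)] * 2
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each score.
        resp = []
        for xi in x:
            p = [w[k] * normal_pdf(xi, mu[k], var[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: update weights, means and variances (with a variance floor).
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            w[k] = nk / len(x)
            mu[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            var[k] = max(sum(r[k] * (xi - mu[k]) ** 2 for r, xi in zip(resp, x)) / nk, 1e-6)
    return w, mu, var

def benjamini_hochberg(pvalues):
    """Panel D (assumed method): BH-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):             # largest p first, enforcing monotonicity
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant_genes(genes, log2fc, pvalues, fc_cutoff=1.0, alpha=0.001):
    """Genes with |log2 fold change| > fc_cutoff and adjusted p < alpha."""
    padj = benjamini_hochberg(pvalues)
    return [g for g, fc, q in zip(genes, log2fc, padj) if abs(fc) > fc_cutoff and q < alpha]

def fisher_exact_two_sided(a, b, c, d):
    """Panel E: two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    def prob(x):  # hypergeometric P(top-left cell = x) given fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)
    p_obs = prob(a)
    probs = [prob(x) for x in range(max(0, col1 - (n - row1)), min(row1, col1) + 1)]
    # Sum all tables no more likely than the observed one (small tolerance for rounding).
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))
```

For panel E, a table would hold, for example, cycling and non-cycling cell counts (columns) in each of two PC1 clusters (rows).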