This document describes the process of analyzing single-cell transcriptome data from pluripotent stem cells. Raw sequencing data are first processed through demultiplexing, alignment, and quantification. Cells are then filtered based on quality control metrics. Gene expression values are normalized, and cells are clustered to identify cell populations. Differentially expressed genes that characterize each population are then identified. The key goal is to understand gene expression patterns in different cell types and subpopulations.
Single-Cell Transcriptome Analysis of Pluripotent Stem Cells
1. Single-Cell Transcriptome Analysis of Pluripotent Stem Cells
From raw data to insights
Nacho Caballero
Center for Regenerative Medicine, Boston University
Jun 12, 2017
11. Demultiplex
Barcoded sequencing files (millions of reads) are split into one pair of sequencing files per cell.
@NB500996:64:HNM72BGX2:3:12510:12240:9366 2:N:0:T
CTACTGTCTAGAGCTTGTCTCAATGGATCTAGAACTTCATCGCCCTCTG
+
AAAAAEEEE<E/EEEEEEEEE6EE/6AEEE//E/EEE/AEA/EAEEEE<
…
Provide a metadata file that describes each cell, using short, simple names:
Cell_id   Condition1   Condition2
Cell_01   BU3          red
Cell_02   BU3          green
Cell_03   C17          red
Cell_04   C17          green
Cell_05   BU3          red
Cell_06   BU3          green
…
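Demultiplexing can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the pipeline used in this analysis: it assumes the cell barcode is the last colon-separated field of each FASTQ header, handles a single read file rather than read pairs, and uses a hypothetical barcode-to-cell mapping (`BARCODE_TO_CELL`) with made-up reads. In practice, demultiplexing is usually done by the sequencing facility's software (for example, Illumina's bcl2fastq).

```python
from collections import defaultdict

# Hypothetical mapping from index barcode to the short, simple cell IDs
# used in the metadata file.
BARCODE_TO_CELL = {"ACGT": "Cell_01", "TGCA": "Cell_02"}


def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it)
        next(it)  # skip the '+' separator line
        qual = next(it)
        yield header.strip(), seq.strip(), qual.strip()


def demultiplex(lines, barcode_to_cell):
    """Group reads by cell, assuming the barcode is the last ':' field of the header."""
    per_cell = defaultdict(list)
    for header, seq, qual in parse_fastq(lines):
        barcode = header.rsplit(":", 1)[-1]
        cell = barcode_to_cell.get(barcode, "undetermined")
        per_cell[cell].append((header, seq, qual))
    return dict(per_cell)


# Made-up reads: two match known barcodes, one does not.
example = [
    "@read1 2:N:0:ACGT", "CTACTGTC", "+", "AAAAAEEE",
    "@read2 2:N:0:TGCA", "GGATCTAG", "+", "EEEEEEEE",
    "@read3 2:N:0:NNNN", "TTTTTTTT", "+", "////////",
]
reads_by_cell = demultiplex(example, BARCODE_TO_CELL)
print({cell: len(reads) for cell, reads in reads_by_cell.items()})
# prints {'Cell_01': 1, 'Cell_02': 1, 'undetermined': 1}
```

Reads whose barcode is not in the mapping are collected under "undetermined" rather than dropped, so they can still be counted during QC.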
14. Analysis pipeline: Raw data → Initial QC → Alignment and Quantification → Outlier analysis → Gene selection and clustering → Insights
17. Good cDNA quality
Read length is often inversely correlated with per-base sequencing quality: the average quality score tends to drop toward the end of each read.
[Plots: average sequence quality vs. position in read, for good, average, and bad quality reads]
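The per-position quality curves in these plots can be reproduced with a short Python sketch. It assumes the Phred+33 quality encoding used by current Illumina instruments (so `A` = 32, `E` = 36, `/` = 14); the function names are hypothetical, and dedicated tools such as FastQC report the same kind of plot.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def mean_quality_by_position(quality_strings):
    """Average Phred score at each read position across all reads.

    Reads shorter than a given position simply do not contribute to it.
    """
    totals, counts = [], []
    for q in quality_strings:
        for pos, score in enumerate(phred_scores(q)):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += score
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


# Quality string from the example read on the demultiplexing slide.
example_quality = "AAAAAEEEE<E/EEEEEEEEE6EE/6AEEE//E/EEE/AEA/EAEEEE<"
print(mean_quality_by_position([example_quality])[:5])
# prints [32.0, 32.0, 32.0, 32.0, 32.0]
```

With many reads, plotting this list against position gives the "AvgSequenceQuality vs. Position in Read" curve; a drop toward the end is what makes longer reads less useful.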
28. We quantify the gene expression in a cell by counting how many reads align to each gene.
[Figure: reads (e.g. AGGCAGAGGGGCGAGATGCA…) aligned to the SFTPC gene; 1,358 reads aligned to SFTPC in this cell]
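The counting step can be sketched as follows. This is a simplified illustration: the input format (for each read, a list of alignment hits, each hit given as the set of genes it overlaps) is hypothetical, as are all gene names other than SFTPC. Real pipelines read SAM/BAM alignments with tools such as featureCounts or htseq-count. Only reads that align uniquely to a single gene are counted, mirroring the read-type categories used in this analysis.

```python
from collections import Counter


def classify_and_count(read_hits):
    """Classify reads by alignment outcome and count reads per gene.

    read_hits: hypothetical simplified alignment summary; one entry per read,
    each a list of hits, where a hit is the set of genes it overlaps.
    """
    read_types = Counter()
    gene_counts = Counter()
    for hits in read_hits:
        if not hits:
            read_types["unaligned"] += 1
        elif len(hits) > 1:
            read_types["aligned, but non-uniquely"] += 1
        elif not hits[0]:
            read_types["aligned uniquely, but not to a gene"] += 1
        elif len(hits[0]) > 1:
            read_types["aligned uniquely, but spans multiple genes"] += 1
        else:
            read_types["aligned uniquely to a single gene"] += 1
            (gene,) = hits[0]  # exactly one gene in the set
            gene_counts[gene] += 1
    return read_types, gene_counts


reads = [
    [{"SFTPC"}],              # unique hit inside SFTPC: counted
    [{"SFTPC"}],              # counted
    [],                       # unaligned
    [{"SFTPC"}, {"GENE_X"}],  # multi-mapping: discarded
    [set()],                  # unique but intergenic: discarded
    [{"GENE_A", "GENE_B"}],   # unique but overlaps two genes: discarded
]
read_types, gene_counts = classify_and_count(reads)
print(gene_counts)  # prints Counter({'SFTPC': 2})
```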
29. Number of reads per cell, by read type
Raw: 333,229
Unaligned: 81,673
Aligned, but non-uniquely: 28,813
Aligned uniquely, but not to a gene: 32,774
Aligned uniquely, but spanning multiple genes: 20,838
Aligned uniquely to a single gene: 167,241
40-60% of the raw reads cannot be used to quantify gene expression
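The fractions in this table can be checked with a few lines of Python. The counts are copied from the table for this one cell. Note that the listed categories add up to 331,339 reads, 1,890 fewer than the raw total, so a small category is presumably not shown; the sketch uses the raw count as the denominator.

```python
# Read counts for one cell, copied from the read-type table.
read_counts = {
    "Unaligned": 81_673,
    "Aligned, but non-uniquely": 28_813,
    "Aligned uniquely, but not to a gene": 32_774,
    "Aligned uniquely, but spanning multiple genes": 20_838,
    "Aligned uniquely to a single gene": 167_241,
}
raw_reads = 333_229

# Percentage of raw reads in each category.
for read_type, n in read_counts.items():
    print(f"{read_type:47s} {n:>8,d} {100 * n / raw_reads:5.1f}%")

# Only reads aligned uniquely to a single gene are used for quantification.
usable = read_counts["Aligned uniquely to a single gene"]
unusable_fraction = 1 - usable / raw_reads
print(f"Unusable for quantification: {100 * unusable_fraction:.1f}%")
# last line prints: Unusable for quantification: 49.8%
```

For this cell, 49.8% of the raw reads cannot be used, within the 40-60% range quoted above.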
37. Filter out cells with fewer than 5K aligned reads
[Plot: number of aligned reads per cell, axis from 0 to 1M; 120 cells]
38. Filter out cells with a high percentage of mitochondrial gene counts (indicative of a broken cell membrane)
[Plot: % of mitochondrial gene counts per cell, axis from 0 to 100%; 48 cells]
39. Filter out cells with fewer than 2K expressed genes
[Plot: number of expressed genes per cell, axis from 0 to 6K; 30 cells]
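The three filters can be combined in a Python sketch. The 5K aligned-read and 2K expressed-gene cutoffs come from the slides; the 25% mitochondrial cutoff is an assumption (the slides show the distribution but do not state the threshold), as is the `MT-` prefix, which matches human mitochondrial gene symbols (mouse symbols use `mt-`). The `CellQC` record and the toy cells are hypothetical.

```python
from dataclasses import dataclass

MIN_ALIGNED_READS = 5_000    # from the slides
MIN_EXPRESSED_GENES = 2_000  # from the slides
MAX_MITO_FRACTION = 0.25     # assumption: no cutoff is stated on the slides


@dataclass
class CellQC:
    cell_id: str
    aligned_reads: int
    gene_counts: dict  # gene symbol -> read count in this cell


def mito_fraction(gene_counts, prefix="MT-"):
    """Fraction of gene counts coming from mitochondrial genes."""
    total = sum(gene_counts.values())
    mito = sum(c for g, c in gene_counts.items() if g.startswith(prefix))
    return mito / total if total else 0.0


def passes_qc(cell):
    """Apply the aligned-read, mitochondrial and expressed-gene filters."""
    expressed = sum(1 for c in cell.gene_counts.values() if c > 0)
    return (
        cell.aligned_reads >= MIN_ALIGNED_READS
        and mito_fraction(cell.gene_counts) <= MAX_MITO_FRACTION
        and expressed >= MIN_EXPRESSED_GENES
    )


cells = [
    # Healthy: enough reads and genes, little mitochondrial signal.
    CellQC("Cell_01", 7_600, {**{f"GENE{i}": 3 for i in range(2_500)}, "MT-CO1": 100}),
    # Likely broken membrane: most counts are mitochondrial.
    CellQC("Cell_02", 7_500, {**{f"GENE{i}": 1 for i in range(2_500)}, "MT-CO1": 5_000}),
    # Too few aligned reads.
    CellQC("Cell_03", 3_000, {f"GENE{i}": 1 for i in range(2_500)}),
]
kept = [c.cell_id for c in cells if passes_qc(c)]
print(kept)  # prints ['Cell_01']
```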
42. Raw count data → normalized expression data
Assume that most genes are not differentially expressed.
Calculate scaling factors for each cell.
Apply the scaling factors and log-transform the values.
Normalization corrects for differences in capture efficiency, sequencing depth and other technical biases.
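The slides do not name the normalization method. One common way to compute per-cell scaling factors under the assumption that most genes are not differentially expressed is the DESeq-style median-of-ratios method, sketched below in Python as an illustration. In single-cell data many genes have zero counts, which this sketch simply skips; single-cell tools such as scran's pooling approach were designed for that problem. The function names and toy counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """DESeq-style median-of-ratios scaling factors.

    counts: dict cell_id -> list of raw counts (same gene order in every cell).
    Genes with a zero in any cell are skipped because log(0) is undefined;
    at least one gene must be nonzero in every cell.
    """
    cells = list(counts)
    n_genes = len(counts[cells[0]])
    usable = [g for g in range(n_genes) if all(counts[c][g] > 0 for c in cells)]
    # Log of the geometric mean of each usable gene: a pseudo-reference cell.
    log_ref = {g: sum(math.log(counts[c][g]) for c in cells) / len(cells) for g in usable}
    # Median ratio to the reference; most genes are assumed unchanged.
    return {
        c: math.exp(median([math.log(counts[c][g]) - log_ref[g] for g in usable]))
        for c in cells
    }


def log_normalize(counts, factors, pseudocount=1):
    """Divide by each cell's scaling factor, then apply log2(x + pseudocount)."""
    return {c: [math.log2(x / factors[c] + pseudocount) for x in counts[c]] for c in counts}


# Toy data: Cell_02 was sequenced twice as deeply as Cell_01.
counts = {"Cell_01": [10, 20, 40, 0], "Cell_02": [20, 40, 80, 0]}
sf = size_factors(counts)
normalized = log_normalize(counts, sf)
print({c: round(f, 3) for c, f in sf.items()})  # prints {'Cell_01': 0.707, 'Cell_02': 1.414}
```

After normalization both toy cells have identical expression values, which is the intended effect when the only difference between them is sequencing depth.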
53. Typical questions
What are the expression differences between my experimental groups?
What are the subpopulations in my data?
What are the gene expression patterns in each subpopulation?
64. Treat conditions as groups? No → select genes → assign cells to groups
k = 2: silhouette coefficient 0.48
k = 3: silhouette coefficient 0.56
k = 4: silhouette coefficient 0.47
The silhouette coefficient is a useful metric to determine the optimal number of groups; here, k = 3 gives the highest value.
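The silhouette coefficient behind these numbers can be computed from scratch, as in the Python sketch below (the standard definition with Euclidean distance; in practice, scikit-learn's `sklearn.metrics.silhouette_score` computes the same quantity). Values near 1 indicate compact, well-separated groups; values near 0 or below indicate overlapping groups. The toy points are hypothetical, the k values are the ones reported on the slides, and the function assumes at least two clusters.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient of a clustering (Euclidean distance).

    For each point: a = mean distance to the other points in its own cluster,
    b = smallest mean distance to the points of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters get s = 0.
    """
    n = len(points)
    scores = []
    for i in range(n):
        own = [math.dist(points[i], points[j])
               for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / n


# Two tight, well-separated toy groups of cells in a 2-D embedding.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(pts, [0, 0, 1, 1])  # about 0.90
bad = silhouette(pts, [0, 1, 0, 1])   # negative: groups overlap

# Picking k from the values reported on the slides.
reported = {2: 0.48, 3: 0.56, 4: 0.47}
best_k = max(reported, key=reported.get)
print(best_k)  # prints 3
```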
68. Treat conditions as groups? Yes → test genes for differential expression. No → select genes → assign cells to groups.
[Plots of variance vs. average expression: differentially expressed genes (conditions treated as groups) and highly variable genes (no predefined groups)]
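Selecting highly variable genes can be sketched by ranking genes on their variance relative to their mean expression. This Python version uses a simple dispersion (variance / mean) as an assumed criterion; published methods additionally model the mean-variance trend. A high-variance gene does not by itself imply a subpopulation. Gene names and values are hypothetical.

```python
from statistics import mean, pvariance


def top_variable_genes(expr, n_top):
    """Rank genes by dispersion (variance / mean) across cells.

    expr: dict gene -> list of expression values, one per cell.
    Genes with zero mean expression are skipped.
    """
    dispersion = {}
    for gene, values in expr.items():
        m = mean(values)
        if m > 0:
            dispersion[gene] = pvariance(values) / m
    return sorted(dispersion, key=dispersion.get, reverse=True)[:n_top]


expr = {
    "GENE_A": [5, 5, 5, 5],    # constant: dispersion 0
    "GENE_B": [0, 10, 0, 10],  # mean 5, variance 25: dispersion 5
    "GENE_C": [4, 6, 4, 6],    # mean 5, variance 1: dispersion 0.2
    "GENE_D": [0, 0, 0, 0],    # not expressed: skipped
}
print(top_variable_genes(expr, 2))  # prints ['GENE_B', 'GENE_C']
```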
83. Geneset enrichment analysis depends on the quality of the geneset
MSigDB hallmark genesets only cover about 4,000 genes
MAKE YOUR OWN GENESETS FROM THE LITERATURE
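With a custom gene set built from the literature, enrichment among differentially expressed genes is often tested with a one-sided hypergeometric (Fisher-style) over-representation test. The Python sketch below is one such test, swapped in for illustration since the slides do not say which enrichment method was used; the gene universe, gene set and DE genes are hypothetical.

```python
from math import comb


def enrichment_p_value(universe, gene_set, de_genes):
    """One-sided hypergeometric p-value for over-representation.

    Probability of seeing at least the observed overlap between gene_set and
    de_genes if de_genes were drawn at random from universe.
    """
    universe = set(universe)
    gene_set = set(gene_set) & universe
    de_genes = set(de_genes) & universe
    N, K, n = len(universe), len(gene_set), len(de_genes)
    k = len(gene_set & de_genes)
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / comb(N, n)


universe = [f"G{i}" for i in range(20)]
my_geneset = [f"G{i}" for i in range(5)]     # hypothetical literature-derived set
de_genes = ["G0", "G1", "G2", "G3"]          # all four DE genes fall in the set
p = enrichment_p_value(universe, my_geneset, de_genes)
print(f"{p:.5f}")  # prints 0.00103
```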
94. Takeaways
Remember to provide a metadata file.
More reads are usually better than longer reads.
Expect only about 50% of your reads to align uniquely to a single gene.
Assume that 50% of your cells could fail.
High variance doesn't imply subpopulations.
Make your own gene lists!
Slides available at: bit.ly/crem_bioinformatics