The molecular microscopes that we use to examine human biology have advanced significantly with the advent of next generation sequencing. RNA-seq is one application of this technology that leads to a very high-resolution view of the transcriptome. With these new technologies come increased data analysis and data handling burdens as well as the promise of new discovery. These slides present a high-level overview of the RNA-seq technology with a focus on the analysis approaches, quality control challenges, and experimental design.
RNA-seq: A High-resolution View of the Transcriptome
1. Sean Davis, M.D., Ph.D.
Genetics Branch, Center for Cancer Research
National Cancer Institute
National Institutes of Health
12. Single-end vs. paired-end sequencing
Illumina paired-end sequencing
Paired-end: useful for RRBS, essential for RNA-seq, not useful for ChIP-seq
13. What comes out of the machine:
short reads in fastq format
@D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/1
CTCCTGGAAAACGCTTTGGTAGATTTGGCCAGGAGCTTTCTTTTATGTAAATTG
+D3B4KKQ1_0166:8:1101:1960:2190#CGATGT/1
[^^cedeefee`cghhhfcRX`_gfghf^bZbecg^eeb[caef`ef^a_`eXa
@D3B4KKQ1_0166:8:1101:2154:2137#CGATGT/1
TCCANCCATGGCAAATTCCATGGCACCGTCAAGGCTGAGAACGGGAAGCTTGTC
+D3B4KKQ1_0166:8:1101:2154:2137#CGATGT/1
ab_eBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@D3B4KKQ1_0166:8:1101:2249:2171#CGATGT/1
TACAAGTGCAGCATCAAGGAGCGAATGCTCTACTCCAGCTGCAAGAGCCGCCTC
+D3B4KKQ1_0166:8:1101:2249:2171#CGATGT/1
_[_ceeec[^eeghdffffhh^efh_egfhfgeec_fbafhhhhd`caegfheh
@D3B4KKQ1_0166:8:1101:2043:2187#CGATGT/1
GAAGGAGAGAAGGGGAGGAGGGCGGGGGGCACCTACTACATCGCCCTCCACATC
+D3B4KKQ1_0166:8:1101:2043:2187#CGATGT/1
^_accceg`gga`f[fgcb`Ucgfaa_LVV^[bbbbbRWW`W^Y[_[^bbbbb
@D3B4KKQ1_0166:8:1101:2188:2232#CGATGT/1
GTGGCCGATTCCTGAGCTGTGTTTGAGGAGAGGGCGGAGTGCCATCTGGGTAGC
+D3B4KKQ1_0166:8:1101:2188:2232#CGATGT/1
QS to int in R:
as.integer(charToRaw('e')) - 33
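The same conversion can be sketched in Python for a whole quality string. This is a minimal sketch and not part of the original slides: the names `phred_scores` and `error_probability` and the `offset` parameter are illustrative. The offset depends on the file's encoding: 33 for Sanger and Illumina 1.8+ files, 64 for Illumina 1.3 to 1.7 files. Quality characters as high as `h` and long runs of `B`, as in the reads above, are typical of the older offset-64 encoding, so it is worth confirming which encoding a file uses before interpreting the scores.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string to a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Python equivalent of the R one-liner: as.integer(charToRaw('e')) - 33
print(phred_scores("e"))             # [68]
print(phred_scores("e", offset=64))  # [37]

# First five quality characters of the second read shown above,
# decoded under the assumption of an offset of 64
print(phred_scores("ab_eB", offset=64))  # [33, 34, 31, 37, 2]
```

Under an offset of 64, the trailing `B` characters decode to a score of 2, which older Illumina pipelines used to flag the low-quality end of a read.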
56. Distance Metrics
• Correlation
– maximum value of 1 if X and Y are perfectly correlated
– minimum value of -1 if X and Y are exactly opposite
– d(X,Y) = 1 – r_XY, where r_XY is the correlation between X and Y
• Many, many others
• Choice of distance metric can be driven by the underlying data (e.g., binary data, categorical data, outliers)
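The correlation-based distance above can be illustrated with a short Python sketch. The slide does not say which correlation coefficient is meant, so Pearson correlation is assumed here. The function names and the toy expression vectors are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def correlation_distance(x, y):
    """d(X, Y) = 1 - r_XY; ranges from 0 (r = 1) to 2 (r = -1)."""
    return 1.0 - pearson_r(x, y)


# Toy expression profiles (made-up values, for illustration only)
sample_a = [1.0, 2.0, 3.0, 4.0]
sample_b = [2.0, 4.0, 6.0, 8.0]  # perfectly correlated with sample_a
sample_c = [4.0, 3.0, 2.0, 1.0]  # exactly opposite to sample_a

print(correlation_distance(sample_a, sample_b))  # approximately 0
print(correlation_distance(sample_a, sample_c))  # approximately 2
```

Because this distance depends only on the shape of the profile, `sample_a` and `sample_b` are at distance 0 even though their magnitudes differ by a factor of two. Whether that is desirable depends on the question being asked.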
84. Experimental Design
• What are my goals?
– Differential expression?
– Transcriptome assembly?
– Identify rare, novel transcripts?
• System characteristics?
– Large, expanded genome?
– Intron/exon structures complex?
– No reference genome or transcriptome?
85. Experimental Design
• Technical replicates
– Probably not needed due to low technical variation
• Biological replicates
– Not explicitly needed for transcript assembly
– Essential for differential expression analysis
– Number of replicates often driven by sample availability for human studies
– More is almost always better
I am going to spend a few minutes illustrating how existing and emerging high-throughput genomic technologies are being used to understand cancer, a mind-numbingly complex and dysregulated biological process.
The first karyotypes were produced in 1956. Shown here is a comparison of a normal female karyotype and one from a tumor. By 1960, a karyotype of a cancer genome had revealed the Philadelphia chromosome. Although it is now known to produce the BCR-ABL fusion protein, it was not until 41 years later, in 2001, that a drug targeting the fusion product, Gleevec (imatinib), became available. By applying high-throughput microarray technologies, the Cancer Genetics Branch is striving to make observations of the cancer genome that will provide a deeper understanding of cancer biology, to develop prognostic and diagnostic markers that improve patient-specific treatment, and to find promising targets for directed drug therapy.
Since Knudson’s famous hypothesis proposing the two-hit model, our understanding of cancer as a genetic disease has progressed to the realization that cancer is rarely a function of a single gene gone awry. Instead, it probably represents a complex interaction of multiple processes in the genome, including altered copy number, gene expression, transcriptional regulation, chromatin modification, sequence variation, and DNA methylation. To produce better patient outcomes, it is vital to understand not only which genes are involved in a certain type of cancer, but also how these other processes affect gene regulation. In short, an integrated view of the cancer genome is necessary and is now becoming possible.