2018/6/20 (Wed.) 16:00-17:00
(NGS)
[MA/NGS]
Enrichment (GO, pathway), visualization (heatmap, etc.)
Differentially expressed genes (DEG) [MA/NGS]
TACCGGCTTGCAACTCAATTACCGCGGCTGCT
GGCACGTAGTTAGCCGTGACTTTCTGGTTGAT
TACCGTCAAATAAAGGCCAGTTACTACCTCTA
TCCTTCTTC
http://tylervigen.com/spurious-correlations
https://www.nature.com/articles/ncomms9727
SVM
PCA, t-SNE
-- [ ] R and Bioconductor / command line / Python
-- [ ] 29 NGS: https://biosciencedbc.jp/human/human-resources/workshop/h29
-- [ ] RNA-Seq
-- [ ] Dr. Bono
script: shinzy.nakaoka@gmail.com
Online courses
Stanford
Qiita
Stack Overflow
Udemy
90%
R, Python
Python/R
RNA-seq introduction
RNA-seq flow: RNA (cDNA), PCR, sequencing
Z. Wang et al., RNA-Seq: a revolutionary tool for transcriptomics, Nat Rev Genet (2009)
RNA-seq workflow (figure):
BAM2ReadCount: read counts per gene (Gene α, Gene β, ..., Gene ω)
Selected genes by edgeR, DESeq2 (RNA-seq) and limma, RankProduct (microarray)
Time-course visualization
Enrichment (GO/KEGG), e.g. Keratinization
Clustering
Network inference (PPI, pathway)
Lamontagne J, Mell JC, Bouchard MJ (2016) Transcriptome-Wide Analysis of Hepatitis B Virus-Mediated Changes
to Normal Hepatocyte Gene Expression. PLoS Pathog 12(2): e1005438. doi:10.1371/journal.ppat.1005438
Accession number: GSE68113
Assembly, mapping
Quality Check
Trimming
Gene      Count (1)   Count (2)
Gene001   958         492
Gene002   7           4
Gene003   491         246
Gene004   1649        861
Gene005   1187        620
Gene006   12          7
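
The two sets of read counts above differ by roughly a factor of two for every gene, which is the pattern a difference in sequencing depth would produce. A minimal sketch of counts-per-million (CPM) scaling, assuming each set is one sample (the names `sample_1` and `sample_2` are placeholders, not from the slides):

```python
# Read counts copied from the two lists above; sample names are hypothetical.
counts = {
    "sample_1": {"Gene001": 958, "Gene002": 7, "Gene003": 491,
                 "Gene004": 1649, "Gene005": 1187, "Gene006": 12},
    "sample_2": {"Gene001": 492, "Gene002": 4, "Gene003": 246,
                 "Gene004": 861, "Gene005": 620, "Gene006": 7},
}


def counts_per_million(sample):
    """Scale each count by the sample's total count (library size)."""
    total = sum(sample.values())
    return {gene: count * 1_000_000 / total for gene, count in sample.items()}


cpm = {name: counts_per_million(sample) for name, sample in counts.items()}

# After scaling, the two samples are directly comparable gene by gene.
for gene in counts["sample_1"]:
    print(gene, round(cpm["sample_1"][gene]), round(cpm["sample_2"][gene]))
```

With only six genes the "library size" here is a toy total; in a real data set it would be the number of mapped reads per sample.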
(Spliced) mapping
TopHat / Bowtie2 / Subread ((spliced) mapping)
Alignment
STAR: ultra-fast alignment
DESeq2: MA-plot (M: log ratio, A: mean average)
https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf
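
In the classical MA-plot, each gene is placed at A (the average of the two log2 expression values) and M (their difference, i.e. the log2 ratio). A minimal sketch of that computation for one gene; the function name and the pseudocount of 1 (used to avoid log of zero) are assumptions made for illustration, not part of DESeq2:

```python
import math


def ma_values(control, treated, pseudocount=1.0):
    """Return (M, A) for one gene from two expression values.

    M = log2(treated) - log2(control)          (log ratio)
    A = (log2(treated) + log2(control)) / 2    (mean log expression)
    """
    log_c = math.log2(control + pseudocount)
    log_t = math.log2(treated + pseudocount)
    return log_t - log_c, 0.5 * (log_t + log_c)


m, a = ma_values(control=3, treated=15)
print(m, a)
```

Genes with M far from zero at a given A are the candidates for differential expression.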
Affymetrix GeneChip (one-color), Gene Expression Omnibus
Agilent (two-color; cyanine dyes Cy3 and Cy5)
Background noise
Normalization methods: MAS5 (Affy), RMA (Affy), GCRMA (Affy), LOWESS (Agilent), Quantile (all)
NGS
Short read mapping
Transcript
GC content (PCR bias)
Normalization methods: TMM (edgeR), DESeq2, RPKM, FPKM (Cufflinks), Median, Quantile, etc.
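
Several of the count-based methods listed above estimate one scaling factor per sample. A minimal sketch of the median-of-ratios idea behind DESeq-style size factors, written from scratch for illustration (this is not the DESeq2 implementation, and the sample names and toy counts are assumptions):

```python
import math
from statistics import median


def size_factors(counts):
    """counts: dict of sample name -> list of per-gene counts (same gene order).

    For each gene, take the geometric mean across samples as a reference.
    A sample's size factor is the median ratio of its counts to that reference.
    Genes with a zero count in any sample are skipped.
    """
    names = list(counts)
    n_genes = len(counts[names[0]])
    log_refs = []
    for g in range(n_genes):
        values = [counts[s][g] for s in names]
        if all(v > 0 for v in values):
            log_refs.append((g, sum(math.log(v) for v in values) / len(values)))
    return {
        s: math.exp(median(math.log(counts[s][g]) - ref for g, ref in log_refs))
        for s in names
    }


toy = {
    "sample_1": [958, 7, 491, 1649, 1187, 12],
    "sample_2": [492, 4, 246, 861, 620, 7],
}
factors = size_factors(toy)
normalized = {s: [c / factors[s] for c in toy[s]] for s in toy}
print(factors)
```

Using the median makes the factor robust to a few strongly changed genes, which is the main difference from dividing by the total count.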
(replicates)
23000
FDR (False Discovery Rate) (Benjamini and Hochberg)
Methods: limma (linear models for microarray data), rank product
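
When many genes are tested at once (the slide mentions 23000), raw p-values need to be adjusted. A minimal sketch of the Benjamini and Hochberg step-up adjustment; the function name and the example p-values are made up for illustration:

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(qvalues)
```

Genes whose adjusted value falls below the chosen FDR level (for example 0.05) are reported as significant.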
NGS
Read count (short reads mapped to the reference genome), RPKM
Read count (Poisson distribution)
Methods: edgeR, DESeq2, cuffdiff, voom (RPM, limma), etc.
Uninfected vs Ad-Infected
Method: DESeq2
Fabp1 9.141085127 14.16261662 7.16E-32 5.25E-36
G6pc 9.308228996 13.12791835 3.23E-29 4.73E-33
Gnmt 7.972094667 10.85576105 1.19E-28 2.62E-32
Stac3 8.512799293 11.07394801 2.72E-28 7.99E-32
Sult1c3 8.598851154 10.83051428 4.04E-28 1.48E-31
Cyp3a2 8.643053186 13.34346161 1.19E-27 5.22E-31
Cyp3a18 7.73011204 12.40583051 3.34E-26 1.71E-29
Cyp2c7 8.080204091 14.13579037 4.41E-26 2.59E-29
Slc10a1 7.251633979 11.9006561 1.23E-25 8.14E-29
Ftcd 7.242311435 11.19173058 1.25E-24 9.15E-28
Hsd3b5 7.769501627 9.64088714 1.80E-24 1.45E-27
Cyp8b1 8.022740425 10.21022277 3.20E-24 2.82E-27
Cyp2e1 7.990226129 13.24331587 1.10E-23 1.05E-26
Ttc36 7.857192495 9.257031182 1.17E-23 1.21E-26
RGD1307603 7.384176218 11.04351261 1.50E-23 1.65E-26
Serpind1 6.718175421 12.53716856 9.42E-22 1.10E-24
Bhmt 6.386565955 14.36571852 1.08E-21 1.35E-24
Ust5r 6.387315677 10.25543675 2.74E-21 3.62E-24
Thrsp 6.080443377 11.76462419 3.68E-21 5.13E-24
Cyp3a2: Cytochrome P450 3A2
Enrichment
STAT3 GO terms
Gene Ontology: MF (Molecular Function), CC (Cellular Component), BP (Biological Process)

GO ID        Qualified GO term                                                                Evidence   PubMed IDs
GO:0000981   RNA polymerase II transcription factor activity, sequence-specific DNA binding   IEA
GO:0004871   signal transducer activity                                                       IEA,TAS    7512451
GO:0005515   protein binding                                                                  IPI        8662591
GO:0008134   transcription factor binding                                                     IPI        15664994
GO:0046983   protein dimerization activity                                                    ISS

GO ID        Qualified GO term                                                                Evidence   PubMed IDs
GO:0005739   mitochondrion                                                                    IEA
GO:0090575   RNA polymerase II transcription factor complex                                   IMP        16946298
Enrichment analysis
(DEG) STAT3, JAK2, NFkB, IRF4, CRCX12, CCR5, IL1b, etc.
GO term: GO:0019221 (cytokine-mediated signaling pathway)
N, n, K, k (gene counts for the GO term and for the DEG set), p (p-value)
Gene Ontology, DEG: Gene Set Enrichment Analysis (parametric GSEA)
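
A common way to turn the four counts into a p-value is the one-sided hypergeometric test. The sketch below assumes this reading of the symbols, which the extracted slide text does not spell out: N = all genes, K = genes annotated with the GO term, n = number of DEGs, k = DEGs annotated with the GO term. The function name is hypothetical.

```python
from math import comb


def enrichment_pvalue(N, K, n, k):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the GO term (hypergeometric upper tail)."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy numbers: 10 genes, 4 with the term, 3 DEGs, 2 of them with the term.
p = enrichment_pvalue(N=10, K=4, n=3, k=2)
print(p)
```

A small p means the GO term appears among the DEGs more often than expected by chance; with many GO terms tested, these p-values also need multiple-testing correction.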
Enrichment (GO enrichment)
DESeq2, Gene Ontology (MF/BP/CC)
KEGG pathway, Reactome pathway (ontology, pathway)
Gene Ontology (Biological Process)
GO:0006082 organic acid metabolic process 1.76E-22
Ehhadh/Adh7/Adh1/Cyp2d3/Cyp4a2/Ddc/Fabp1/Fbp1/Agxt/Cyp2a2/Sds/Cyp2e1/Rgn/Gnmt/Acox2/Otc/G6pc/Ugt2b1/Cyp2c7/Glyat/Cyp2c6v1/Baat/Cyp4a3/Hgd/Pck1/Bhmt2/Cyp4a1/Kmo/Slc27a2/Marc1/Slc27a5/Bhmt/Abat/Cdo1/Hao2/Ftcd

GO:0043436 oxoacid metabolic process 6.61E-22
Ehhadh/Adh7/Adh1/Cyp2d3/Cyp4a2/Ddc/Fabp1/Fbp1/Agxt/Cyp2a2/Sds/Cyp2e1/Rgn/Gnmt/Acox2/Otc/Ugt2b1/Cyp2c7/Glyat/Cyp2c6v1/Baat/Cyp4a3/Hgd/Pck1/Bhmt2/Cyp4a1/Kmo/Slc27a2/Marc1/Slc27a5/Bhmt/Abat/Cdo1/Hao2/Ftcd

GO:0019752 carboxylic acid metabolic process 8.77E-22
Ehhadh/Adh7/Adh1/Cyp2d3/Cyp4a2/Ddc/Fabp1/Fbp1/Agxt/Cyp2a2/Sds/Cyp2e1/Rgn/Gnmt/Acox2/Otc/Ugt2b1/Cyp2c7/Glyat/Cyp2c6v1/Baat/Cyp4a3/Hgd/Pck1/Bhmt2/Cyp4a1/Kmo/Slc27a2/Slc27a5/Bhmt/Abat/Cdo1/Hao2/Ftcd

GO:0044281 small molecule metabolic process 3.24E-20
Ehhadh/Adh7/Adh1/Cat/Cyp2d3/Cyp4a2/Ddc/Fabp1/Fbp1/Hsd3b5/Lcat/Agxt/Cyp2a2/Sds/Cyp2e1/Rgn/Gnmt/Acox2/Otc/G6pc/Rbp4/Cyp3a2/Ugt2b1/Cyp2c7/Glyat/Cyp2c6v1/Baat/Cyp4a3/Hgd/Pck1/Bhmt2/Apof/Cyp4a1/Car5a/Kmo/Slc27a2/Marc1/Slc27a5/Bhmt/Abat/Cdo1/Hao2/Ftcd

GO:0032787 monocarboxylic acid metabolic process 4.05E-19
Ehhadh/Adh7/Adh1/Cyp2d3/Cyp4a2/Fabp1/Fbp1/Agxt/Cyp2a2/Sds/Cyp2e1/Rgn/Acox2/Ugt2b1/Cyp2c7/Glyat/Cyp2c6v1/Baat/Cyp4a3/Pck1/Cyp4a1/Kmo/Slc27a2/Slc27a5/Abat/Hao2/Ftcd

GO:0044710 single-organism metabolic process 8.60E-18
Ehhadh/Adh7/Adh1/C4bpb/C6/Cat/Cyp2d3/Cyp4a2/Ddc/Fabp1/Fbp1/Hsd3b5/Lcat/Mbl1/Agxt/Cyp2a2/Sult2a1/Sds/Cyp2e1/Rgn/Gnmt/Acox2/Cyp3a18/Thrsp/Crp/Otc/G6pc/Rbp4/Cyp3a2/Ugt2b1/Ces1e/Cyp2c7/Glyat/Cyp2c6v1/Baat/Cyp4a3/Dhrs7/Serpina6/Hgd/Pck1/Adhfe1/Bhmt2/Aox3/RGD1564865/Apof/Cyp4a1/Slco1a1/Car5a/Kmo/Slc27a2/Marc1/Slc27a5/Bhmt/Abat/Cdo1/Cyp8b1/Hao2/Ftcd

GO:0055114 oxidation-reduction process 1.19E-16
Ehhadh/Adh7/Adh1/Cat/Cyp2d3/Cyp4a2/Fabp1/Hsd3b5/Cyp2a2/Cyp2e1/Gnmt/Acox2/Cyp3a18/G6pc/Cyp3a2/Cyp2c7/Cyp2c6v1/Cyp4a3/Dhrs7/Hgd/Adhfe1/Aox3/RGD1564865/Cyp4a1/Kmo/Slc27a2/Marc1/Cdo1/Cyp8b1/Hao2

GO:0006629 lipid metabolic process 2.78E-14
Ehhadh/Adh7/Adh1/Cat/Cyp2d3/Cyp4a2/Fabp1/Hsd3b5/Lcat/Cyp2a2/Sult2a1/Cyp2e1/Rgn/Acox2/Thrsp/G6pc/Rbp4/Cyp3a2/Ces1e/Cyp2c7/Cyp2c6v1/Baat/Cyp4a3/Serpina6/Pck1/Apof/Cyp4a1/Slc27a2/Slc27a5/Hao2

GO:0044255 cellular lipid metabolic process 1.54E-13
Ehhadh/Adh7/Adh1/Cat/Cyp2d3/Cyp4a2/Fabp1/Lcat/Cyp2a2/Cyp2e1/Rgn/Acox2/Thrsp/G6pc/Rbp4/Cyp3a2/Cyp2c7/Cyp2c6v1/Baat/Cyp4a3/Pck1/Apof/Cyp4a1/Slc27a2/Slc27a5/Hao2
Enrichment (Reactome enrichment)
DESeq2, Gene Ontology (MF/BP/CC)
KEGG pathway, Reactome pathway (ontology, pathway)
Figure: MA-plot (axes A and M; genes marked as nonDEG or DEG; Dlx3 labeled)
Figure: GO bar plot of terms enriched among DEGs (x-axis: Q value, −log10):
hair cell differentiation
inner ear receptor cell development
detection of mechanical stimulus involved in sensory perception
multicellular organismal process
single-multicellular organism process
detection of mechanical stimulus involved in sensory perception of sound
fibril organization
neuromuscular process
extracellular fibril organization
neuromuscular process controlling balance
Heatmap
qplot, z
heatmap.2 (>500)
(eps) heatmap.2 option
ggplot2, heatmap.2
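
The "z" above presumably refers to z-score scaling of expression values before drawing a heatmap; that reading is an assumption, since the surrounding text was lost. In R, `heatmap.2` offers this through its `scale` argument. A minimal Python sketch of per-gene (row) z-scores, with a hypothetical function name and toy values:

```python
from statistics import mean, stdev


def zscore_row(values):
    """Center a gene's expression values on 0 and scale to unit sample SD.

    Rows with no variation are returned as all zeros to avoid dividing by 0.
    """
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


expression = {
    "GeneA": [1.0, 2.0, 3.0],
    "GeneB": [100.0, 200.0, 300.0],
    "GeneC": [5.0, 5.0, 5.0],
}
scaled = {gene: zscore_row(row) for gene, row in expression.items()}
print(scaled)
```

After row scaling, GeneA and GeneB show the same pattern despite a 100-fold difference in magnitude, which is why heatmaps of scaled data emphasize expression patterns rather than absolute levels.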
Tips
Q. ( )
Q. (UCSC/Ensembl)
Q.
Q.
Q.
Q. Read replicate
Q. mRNA ( ) polyA
RNA-seq normalization
Brief Bioinform (2013) 14 (6): 671-683. doi:10.1093/bib/bbs046
edgeR: TMM (Trimmed Mean of M-values); DESeq
NCBI SRA (FASTQ, sra)
(RPKM)
Count data: edgeR or DESeq2 or voom (limma); cuffdiff: RPKM
RNA-seq annotation
Zhao S, Zhang B. A comprehensive evaluation of Ensembl, RefSeq, and UCSC annotations in the context of RNA-seq read mapping and gene quantification. BMC Genomics. 2015, 16:97
NCBI / Ensembl / UCSC
Human Body Map 2.0 Project RNA-seq
Ensembl, 50%, 6, unique
RefSeq vs Ensembl, 6
RNA. 2016 Jun; 22(6): 839–851. doi: 10.1261/rna.053959.115
n=3 (biological replicates)
(edgeR / DESeq2 / Cuffdiff, etc.)
fold-change threshold
n=6
Independent filtering
www.pnas.org/cgi/doi/10.1073/pnas.0914005107
http://master.bioconductor.org/help/course-materials/2009/BioC2009/talks/Independent_filters_and_multiple_testing.pdf
Type I error, filtering
ComBat
W.E. Johnson, C. Li, and A. Rabinovic. Adjusting batch effects in microarray data using empirical Bayes methods. Biostatistics, 8(1):118–127, 2007.
Figure: Caleb K Stein et al, Blood 2014 124:3355; Fig.1
replicate, ComBat (sva package)
Replicates
RNA-seq differential expression studies: more sequence or more replication? Bioinformatics, Vol. 30 no. 3 2014, pages 301–304. doi:10.1093/bioinformatics/btt688
10M, replicate (depth, replicate)
mRNA
mRNA, polyA
Scientific Reports (2018) 8:4781. DOI:10.1038/s41598-018-23226-4
Ribosomal RNA (rRNA): Ribo-Zero, polyA, mRNA
(human blood, colon) RNA, Ribo-Zero
protein-coding RNA
PolyA: protein-coding RNA
Memo
40
(DNA chip)
DNA (cDNA)
short read
mRNA, tRNA, 99%
PCR
Affymetrix, Agilent, Illumina
PCR, Roche FLX, Ion PGM; bridge PCR, Illumina HiSeq (single/paired-end)
PacBio (long-read), Oxford Nanopore
(depth)
Quality check (RIN), GC
CEL
DEG (Differentially Expressed Genes), GO (Gene Ontology)
mapping, bowtie2 & tophat2
FASTA & FASTQ, BAM, count data, RPKM
de novo
Gene set enrichment
heatmap, (False Discovery Rate)
log-normal
PCR bias, single cell
R & Bioconductor, NCBI/EMBL/DDBJ, BLAST, GEO, SRA
https://www.genome.gov/13014330/transcriptome-fact-sheet/
Gene + ome = genome
Transcript + ome = transcriptome
(microarray, NGS)
https://www.genome.gov/10000533/dna-microarray-technology/
DNA chip
DNA, RNA, mRNA, cDNA
PCR
https://www.genome.gov/10000207/pcr-fact-sheet/
PCR (Polymerase Chain Reaction)
DNA
http://bitesizebio.com/13546/sequencing-by-synthesis-explaining-the-illumina-sequencing-technology/
cDNA (50-100bp), PCR
DNA (ligate)
(bridge amplification)
(Roche, Illumina)
Illumina (bridge PCR)
Maxam-Gilbert, Sanger, Nanopore
DNA (4)
(sequencing by synthesis)
http://bitesizebio.com/13546/sequencing-by-synthesis-explaining-the-illumina-sequencing-technology/
DNA
de novo
SNP (variant call)
Tools: Trinity, Velvet, BWA
Exome: exon
(Picard), SNV (Single Nucleotide Variation) / Indel (snpEff)
https://en.wikipedia.org/wiki/Methylated_DNA_immunoprecipitation
ChIP (Chromatin ImmunoPrecipitation)-seq: DNA
CLIP (Cross-Linking ImmunoPrecipitation)-seq: RNA
Methylated DNA immunoprecipitation (MeDIP)-seq: 5-methylcytosine (5mC), DNA (immunoprecipitation)
(distal) Chromosome Conformation Capture (3C), Hi-C
ENCODE project
http://www.nias.affrc.go.jp/gmogmo/FAQ/app/J3.html
16S rRNA sequencing
Whole Genome Shotgun (WGS) sequence
alignment
WGS, de novo
WGS (fragment), (assembly), contig (contiguous sequence), scaffold
( )
Mouse Phenome Database (MPD)
http://phenome.jax.org/

RNA-seq tutorial
