Deep Seq Data Analysis
Theoretical training
Christophe.antoniewski@upmc.fr
http://artbio.fr
Mouse Genetics
January 21, 2016, 13:30–15:00
Sequencing Technologies
Latest commercialized Sequencing Technology
Sequencing by pH variation in Ion Torrent
Sequencing Technologies : Quantitative Facts
Sequencing Technologies : Focus on Illumina technology
Deep sequencing applications
High throughput sequencing of DNA or RNA provides Qualitative (sequence) and Quantitative (number of reads) information
Stranded RNAseq library
20-30nt RNA gel purification
Small RNA library (Biases)
Library “Bar coding”
ChIPseq library preparation (Non Directional)
What can I do with my sequence reads ?
Flowchart of a sequencing project
Platform Selection
• What am I going to sequence? For what analysis?
• Technical biases and limitations
• Specific benefits (read length, single or paired ends, number of reads)
Library Preparation
• Whole genome, whole exome, target enrichment
• Size selection
• Stranded/unstranded?
• Amplification
• Single cell protocol
Sequencing
• Length of the read
• Single or paired ends
• Number of lanes (depth of sequencing)
Quality Control
• Adapter clipping
• Quality trimming
• Contaminants and sequencing errors
• Biases in GC content
Alignment, Assembly
• Alignment: Bowtie, BWA… (P Flicek & E Birney, Nature Methods 2009)
• Assembly: Velvet, Oases, Trinity, SOAP, SSAKE… (Zhang W, Chen J, et al. (2011), PLoS ONE 6(3))
Visualization & Statistics
• Normalization (library comparison)
• Peak finding (binding sites, breakpoints, etc.)
• Differential calling (expression, variants, etc.)
• R, MATLAB & open source software tools
Think about the number of replicates
Basic Material for mining sequencing data
Connect to our server
$ ssh lbcd41.snv.jussieu.fr
$ mkdir <mydir>
$ cd <mydir>
What does this big fastq file contain?
mouse@GED-Server:~/raw_data$ more GKG-13.fastq
@HWIEAS210R_0028:2:1:3019:1114#AGAAGA/1 Header
TNGGAACTTCATACCGTGCTCTCTGTAGGCACCATCAA Sequence
+HWIEAS210R_0028:2:1:3019:1114#AGAAGA/1 Header
bBb`bfffffhhhhhhhhhhhhhhhhhhhfhhhhhhgh Sequence Quality (ASCII encoded)
@HWIEAS210R_0028:2:1:3925:1114#AGAAGA/1
TNCTTGGACTACATATGGTTGAGGGTTGTACTGTAGGC
+HWIEAS210R_0028:2:1:3925:1114#AGAAGA/1
]B]VWaaaaaagggfggggggcggggegdgfgeggbab
@HWIEAS210R_0028:2:1:6220:1114#AGAAGA/1
TNGGAACTTCATACCGTGCTCTCTGTAGGCACCATCAA
+HWIEAS210R_0028:2:1:6220:1114#AGAAGA/1
aB^^afffffhhhhhhhhhhhhhhhhhhhhhhhchhhh
@HWIEAS210R_0028:2:1:6252:1115#AGAAGA/1
TNCTTGGACTACATATGGTTGAGGGTTGTACTGTAGGC
+HWIEAS210R_0028:2:1:6252:1115#AGAAGA/1
aBa^ddeeehhhhhhhhhhhhhhhhghhhhhhhefff
@HWIEAS210R_0028:2:1:6534:1114#AGAAGA/1
TNAATGCACTATCTGGTACGACTGTAGGCACCATCAAT
+HWIEAS210R_0028:2:1:6534:1114#AGAAGA/1
aB^^eeeeegcggfffffffcfffgcgcfffffR^^]
@HWIEAS210R_0028:2:1:8869:1114#AGAAGA/1
GNGGACTGAAGTGGAGCTGTAGGCACCATCAATAGATC
+HWIEAS210R_0028:2:1:8869:1114#AGAAGA/1
aBaaaeeeeehhhhhhhhhhhhfgfhhgfhhhhgga^^
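The ASCII-encoded quality line can be turned back into numbers. A minimal sketch in Python, assuming the Phred+64 offset of older Illumina pipelines (an assumption suggested by characters such as "h" in the records above, not something stated on the slide). The function name decode_quality is illustrative.

```python
def decode_quality(quality_string, offset=64):
    """Turn an ASCII-encoded quality string into a list of integer scores.

    offset=64 is an assumption (older Illumina encoding); more recent
    files usually need offset=33.
    """
    return [ord(character) - offset for character in quality_string]


# Sequence and quality lines of the first record shown above
sequence = "TNGGAACTTCATACCGTGCTCTCTGTAGGCACCATCAA"
quality = "bBb`bfffffhhhhhhhhhhhhhhhhhhhfhhhhhhgh"

scores = decode_quality(quality)

# Pair the first five bases with their decoded scores
for base, score in list(zip(sequence, scores))[:5]:
    print(base, score)
```

Under this assumption the uncalled base N at position 2 carries a score of 2, while the bases around it score above 30.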
How many sequence reads are in my file?
→ wc -l <path/to/my/file>
mouse@GED-Server:~/raw_data$ wc -l GKG-13.fastq
25703828 GKG-13.fastq
mouse@GED-Server:~/raw_data$ grep -c -e "^@" GKG-13.fastq
6425957
In the Python interpreter:
>>> 25703828 // 4
6425957
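The same count can be obtained in Python. This is a sketch that assumes strict 4-line FASTQ records; the function name and the two demo records are made up for illustration. It also shows why counting lines that start with "@" is less safe than dividing the line count by 4: a quality line may itself start with "@".

```python
import io


def count_fastq_reads(handle):
    """Count reads in a FASTQ stream, assuming strict 4-line records."""
    line_count = sum(1 for _ in handle)
    if line_count % 4 != 0:
        raise ValueError("line count is not a multiple of 4")
    return line_count // 4


# Two made-up records; the quality line of the first one starts with "@"
demo_text = (
    "@read1\nACGT\n+\n@@II\n"
    "@read2\nTTGA\n+\nIIII\n"
)

reads = count_fastq_reads(io.StringIO(demo_text))
lines_starting_with_at = sum(
    1 for line in demo_text.splitlines() if line.startswith("@")
)

print(reads)                   # 2
print(lines_starting_with_at)  # 3: counting "^@" overestimates here
```

For a real file, open("GKG-13.fastq") can be passed in place of the io.StringIO object.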
Do my sequence reads contain the adapter?
→ cat <path/file> | grep CTGTAGG | wc -l
→ grep -c "CTGTAGG" <path/file>
mouse@GED-Server:~/raw_data$ cat GKG-13.fastq | grep CTGTAGG | wc -l
6355061
mouse@GED-Server:~/raw_data$ grep -c "CTGTAGG" GKG-13.fastq
6355061
6 355 061 out of
6 425 957 sequences
… not bad (98.9%)
My 3’ adapter: CTGTAGGCACCATCAAT
By contrast, a 7-nt sequence unrelated to the adapter is rare:
mouse@GED-Server:~/raw_data$ cat GKG-13.fastq | grep ATCTCGT | wc -l
308
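The two fractions can be checked with a few lines of Python. A sketch using the counts reported on the slides; the helper names percent and count_lines_with are illustrative, and count_lines_with only mimics what grep -c does on a list of lines.

```python
def percent(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total


def count_lines_with(lines, motif):
    """Count the lines containing motif, like grep -c."""
    return sum(1 for line in lines if motif in line)


total_reads = 6425957   # number of reads in GKG-13.fastq
with_adapter = 6355061  # lines containing CTGTAGG
with_control = 308      # lines containing ATCTCGT

print(f"{percent(with_adapter, total_reads):.1f}%")  # 98.9%
print(f"{percent(with_control, total_reads):.4f}%")  # 0.0048%

# Three of the reads displayed earlier, used as a tiny test set
reads = [
    "TNGGAACTTCATACCGTGCTCTCTGTAGGCACCATCAA",
    "TNCTTGGACTACATATGGTTGAGGGTTGTACTGTAGGC",
    "TNAATGCACTATCTGGTACGACTGTAGGCACCATCAAT",
]
print(count_lines_with(reads, "CTGTAGG"))  # 3
```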
A more advanced example of combining Unix commands
mouse@GED-Server:~/raw_data$ cat GKG-13.fastq | perl -ne 'print if /^[ATGCN]{22}CTGTAGG/' | wc -l
• cat GKG-13.fastq: outputs the content of a file, line by line
• |: the output is passed to the input of the next command
• perl -ne: the perl interpreter is called with the -ne options (loop & execute)
• 'print if /…/': inline perl code
• /^[ATGCN]{22}CTGTAGG/: regular expression
• |: the output is passed to the input of the next command
• wc -l: wc with the -l option counts the lines
Result: 1 675 469 reads, 22 nt long, with a 3’ flanking CTGTAGG adapter sequence
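For comparison, the same filter can be written in Python. A sketch only, not the command used in the course; it applies the regular expression of the perl one-liner to a header line and four of the reads displayed earlier.

```python
import re

# Same regular expression as in the perl one-liner:
# exactly 22 nucleotides from the start of the line, then CTGTAGG
pattern = re.compile(r"^[ATGCN]{22}CTGTAGG")


def count_matching_lines(lines):
    """Count the lines matching the pattern, like perl -ne 'print if ...' | wc -l."""
    return sum(1 for line in lines if pattern.match(line))


lines = [
    "@HWIEAS210R_0028:2:1:3019:1114#AGAAGA/1",  # header line, never matches
    "TNGGAACTTCATACCGTGCTCTCTGTAGGCACCATCAA",   # 22 nt before the adapter
    "TNCTTGGACTACATATGGTTGAGGGTTGTACTGTAGGC",   # 30 nt before the adapter
    "TNAATGCACTATCTGGTACGACTGTAGGCACCATCAAT",   # 21 nt before the adapter
    "GNGGACTGAAGTGGAGCTGTAGGCACCATCAATAGATC",   # 16 nt before the adapter
]

print(count_matching_lines(lines))  # 1
```

Only the read with exactly 22 nt upstream of CTGTAGG is counted; changing {22} to another length selects another size class.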
Clipping adapter sequences
Unix Operating Systems already contain powerful native tools for sequence analyses
cat GKG-13.fastq | perl -ne 'if (/^(.+CTGTAGG)/) {print "$1\n"}' | more
mouse@GED-Server:~/raw_data$ cat GKG-13.fastq | perl -ne 'if (/^([GATC]{18,})CTGTAGG/) {$count++; print ">$count\n"; print "$1\n"}' > clipped_GKG13.fasta
Final command line clipper
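The clipper can also be sketched in Python, which makes its behaviour easier to inspect. The function clip_reads and the two made-up reads are illustrative; the regular expression is the one of the perl command. Note that the character class [GATC] excludes N, so reads containing an uncalled base before the adapter are discarded.

```python
import re


def clip_reads(lines, adapter="CTGTAGG", min_length=18):
    """Return (header, insert) pairs for lines starting with at least
    min_length nucleotides (G, A, T or C only) followed by the adapter."""
    pattern = re.compile(
        r"^([GATC]{%d,})%s" % (min_length, re.escape(adapter))
    )
    records = []
    for line in lines:
        match = pattern.match(line)
        if match:
            # Reads are simply numbered, as with $count in the perl version
            records.append((">%d" % (len(records) + 1), match.group(1)))
    return records


lines = [
    "TNGGAACTTCATACCGTGCTCTCTGTAGGCACCATCAA",   # from the slides: contains N
    "ACGTACGTACGTACGTACGTACCTGTAGGCACCATCAAT",  # made-up read, 22-nt insert
    "ACGTACGTACCTGTAGGCACCATCAAT",              # made-up read, 10-nt insert
]

for header, insert in clip_reads(lines):
    print(header)
    print(insert)
```

Only the second read is written out, as record >1 with the 22-nt insert ACGTACGTACGTACGTACGTAC.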
Sequence Quality Control
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
FastQC, GUI version
http://bowtie-bio.sourceforge.net/
Bowtie aligns reads on indexed genomes
mouse@GED-Server:~/instructor$ bowtie ../genomes/Dmel_r5.49 -f clipped_GKG13.fasta -v 1 -k 1 -p 6 --al droso_matched_GKG-13.fa --un unmatched_GKG13.fa -S > GKG13_bowtie_output.sam
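A bowtie alignment (command lines)
../genomes/Dmel_r5.49: base name of the bowtie index
-f clipped_GKG13.fasta: the input reads are in FASTA format
-v 1: allow at most 1 mismatch per alignment
-k 1: report at most 1 alignment per read
-p 6: use 6 threads
--al droso_matched_GKG-13.fa: write the reads that aligned to this file
--un unmatched_GKG13.fa: write the reads that failed to align to this file
-S: write the alignments in SAM format
> GKG13_bowtie_output.sam: redirect the alignments to this file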
# reads processed: 5930851
# reads with at least one reported alignment: 4992296 (84.18%)
# reads that failed to align: 938555 (15.82%)
Reported 4992296 alignments to 1 output stream(s)
mouse@GED-Server:~/genomes$ bowtie-build Dmel_r5.49.fa Dmel_r5.49
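The summary printed by bowtie can be checked for consistency. A small Python sketch that parses the three summary lines shown above and recomputes the percentages; the function name parse_bowtie_summary is illustrative, and the parsing relies only on the line layout visible on the slide.

```python
import re

# Summary lines copied from the bowtie run shown above
summary = """\
# reads processed: 5930851
# reads with at least one reported alignment: 4992296 (84.18%)
# reads that failed to align: 938555 (15.82%)
"""


def parse_bowtie_summary(text):
    """Map each '# reads ...: N' line to its integer count."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"# (reads [^:]+): (\d+)", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_bowtie_summary(summary)
processed = counts["reads processed"]
aligned = counts["reads with at least one reported alignment"]
failed = counts["reads that failed to align"]

print(aligned + failed == processed)         # True
print(f"{100.0 * aligned / processed:.2f}")  # 84.18
print(f"{100.0 * failed / processed:.2f}")   # 15.82
```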
Bowtie outputs
deepseq$ ls -laht
-rw-r--r-- 1 deepseq staff 351M Mar 24 17:46 GKG13_bowtie_output.tabulated
-rw-r--r-- 1 deepseq staff 156M Mar 24 17:46 droso_matched_GKG-13.fa
-rw-r--r-- 1 deepseq staff 28M Mar 24 17:46 unmatched_GKG13.fa
SAM alignment : $ more GKG13_bowtie_output.sam
Aligned reads: $ more droso_matched_GKG-13.fa
Unaligned reads: $ more unmatched_GKG13.fa
Formats
Raw sequence: Fastq (quality), Fasta (w/o quality)
Aligned sequence: Sam, Bam
• Sorted
• Indexed
• Compressed
Genome annotation: GFF, GTF
SAM - BAM
GFF - GTF
Pileup Format
seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
seq1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
seq1 274 T 23 ,.$....,,.,.,...,,,.,... 7<7;<;<<<<<<<<<=<;<;<<6
seq1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<
seq1 276 G 22 ...T,,.,.,...,,,.,.... 33;+<<7=7<<7<&<<1;<<6<
seq1 277 T 22 ....,,.,.,.C.,,,.,..G. +7<;<<<<<<<&<=<<:;<<&<
seq1 278 G 23 ....,,.,.,...,,,.,....^k. %38*<<;<7<<7<=<<<;<<<<<
seq1 279 C 23 A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<
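Each pileup line can be summarized programmatically. A hedged Python sketch, assuming the six-column layout shown above (sequence name, position, reference base, depth, read bases, base qualities), where "." and "," are matches on the forward and reverse strand, letters are mismatches, "^" plus one character marks a read start and "$" a read end. The function names are illustrative.

```python
import re
from collections import Counter


def strip_indels(bases):
    """Drop indel descriptions such as +2AG or -1C from a read-bases string."""
    kept = []
    i = 0
    while i < len(bases):
        if bases[i] in "+-":
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            # Skip the sign, the digits and the inserted or deleted bases
            i = j + int(bases[i + 1:j])
        else:
            kept.append(bases[i])
            i += 1
    return "".join(kept)


def summarize_pileup_line(line):
    """Count matches and mismatches in one pileup line."""
    name, position, ref, depth, bases, _qualities = line.split()[:6]
    # Remove read-start markers first: the character after "^" is a
    # mapping quality and may itself be "+" or "-"
    bases = re.sub(r"\^.", "", bases)
    bases = bases.replace("$", "")
    bases = strip_indels(bases)
    matches = bases.count(".") + bases.count(",")
    mismatches = Counter(c.upper() for c in bases if c.upper() in "ACGTN")
    return {
        "name": name,
        "position": int(position),
        "ref": ref,
        "depth": int(depth),
        "matches": matches,
        "mismatches": dict(mismatches),
    }


line = "seq1 277 T 22 ....,,.,.,.C.,,,.,..G. +7<;<<<<<<<&<=<<:;<<&<"
print(summarize_pileup_line(line))
```

For position 277 this reports 20 matches and two mismatching reads (one C, one G) out of a depth of 22.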
Next week, we will perform an NGS analysis using the Galaxy framework.
We will speak about Accessibility, Reproducibility and Transparency.
Please have a look at http://galaxyproject.org/
You can register and try it
Also, connect to http://lbcd41.snv.jussieu.fr with
login: (to be communicated)
password: (to be communicated)
AND
Register (Menu “user” → “register”) with your email address

Structures and textures of metamorphic rocks
 
justice-and-fairness-ethics with example
justice-and-fairness-ethics with examplejustice-and-fairness-ethics with example
justice-and-fairness-ethics with example
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of LipidsGBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
 

Pasteur deep seq_analysis_theory_2016

  • 1. Deep Seq Data Analysis Theoretical training Christophe.antoniewski@upmc.fr http://artbio.fr Mouse Genetics January 21, 2016, 13:30–15:00
  • 3. Latest commercialized sequencing technology: sequencing by pH variations in Ion Torrent
  • 4. Sequencing Technologies : Quantitative Facts
  • 5. Sequencing Technologies : Focus on Illumina technology
  • 6. Deep sequencing applications: high-throughput sequencing of DNA or RNA provides qualitative (sequence) and quantitative (number of reads) information
  • 8. Small RNA library (biases): 20-30 nt RNA gel purification; library “bar coding”
  • 10. What can I do with my sequence reads?
  • 11. Flowchart of a sequencing project
      Platform selection: What am I going to sequence? For what analysis? Technical biases and limitations; specific benefits (read length, single or paired ends, number of reads).
      Library preparation: whole genome, whole exome, target enrichment; size selection; stranded or unstranded?; amplification; single-cell protocol.
      Sequencing: length of the read; single or paired ends; number of lanes (depth of sequencing).
      Quality control: adapter clipping; quality trimming; contaminants and sequencing errors; biases in GC content.
      Alignment: Bowtie, BWA, … (P. Flicek & E. Birney, Nature Methods 2009).
      Assembly: Velvet, Oases, Trinity, SOAP, SSAKE, … (Zhang W, Chen J, et al. (2011), PLoS ONE 6(3)).
      Visualization & statistics: R, MATLAB and open-source software tools.
        • Normalization (library comparison)
        • Peak finding (binding sites, breakpoints, etc.)
        • Differential calling (expression, variants, etc.)
      Think about the number of replicates.
  • 12. Basic material for mining sequencing data
  • 13. Connect to our server
      $ ssh lbcd41.snv.jussieu.fr
      $ mkdir <mydir>
      $ cd <mydir>
  • 14. What does this big* fastq file contain?
      Each read takes four lines: header, sequence, header (repeated after "+"), sequence quality (ASCII encoded).
      mouse@GED-Server:~/raw_data$ more GKG-13.fastq
      @HWIEAS210R_0028:2:1:3019:1114#AGAAGA/1
      TNGGAACTTCATACCGTGCTCTCTGTAGGCACCATCAA
      +HWIEAS210R_0028:2:1:3019:1114#AGAAGA/1
      bBb`bfffffhhhhhhhhhhhhhhhhhhhfhhhhhhgh
      @HWIEAS210R_0028:2:1:3925:1114#AGAAGA/1
      TNCTTGGACTACATATGGTTGAGGGTTGTACTGTAGGC
      +HWIEAS210R_0028:2:1:3925:1114#AGAAGA/1
      ]B]VWaaaaaagggfggggggcggggegdgfgeggbab
      @HWIEAS210R_0028:2:1:6220:1114#AGAAGA/1
      TNGGAACTTCATACCGTGCTCTCTGTAGGCACCATCAA
      +HWIEAS210R_0028:2:1:6220:1114#AGAAGA/1
      aB^^afffffhhhhhhhhhhhhhhhhhhhhhhhchhhh
      @HWIEAS210R_0028:2:1:6252:1115#AGAAGA/1
      TNCTTGGACTACATATGGTTGAGGGTTGTACTGTAGGC
      +HWIEAS210R_0028:2:1:6252:1115#AGAAGA/1
      aBa^ddeeehhhhhhhhhhhhhhhhghhhhhhhefff
      @HWIEAS210R_0028:2:1:6534:1114#AGAAGA/1
      TNAATGCACTATCTGGTACGACTGTAGGCACCATCAAT
      +HWIEAS210R_0028:2:1:6534:1114#AGAAGA/1
      aB^^eeeeegcggfffffffcfffgcgcfffffR^^]
      @HWIEAS210R_0028:2:1:8869:1114#AGAAGA/1
      GNGGACTGAAGTGGAGCTGTAGGCACCATCAATAGATC
      +HWIEAS210R_0028:2:1:8869:1114#AGAAGA/1
      aBaaaeeeeehhhhhhhhhhhhfgfhhgfhhhhgga^^
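The four-line record layout shown on this slide can be read programmatically. The following is a minimal Python sketch, not part of the original training: the names `FastqRecord`, `read_fastq` and `phred_scores` are hypothetical, and the quality offset of 64 is an assumption based on the characters seen in this file (older Illumina encoding); recent files usually use an offset of 33.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record per group of four lines (header, sequence, '+', quality)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("not a four-line FASTQ record: " + header.strip())
        yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)


def phred_scores(quality: str, offset: int = 64) -> List[int]:
    """Decode ASCII-encoded qualities; offset=64 is an assumption for this file."""
    return [ord(char) - offset for char in quality]


# First record of the slide, used as a small in-memory example.
EXAMPLE = (
    "@HWIEAS210R_0028:2:1:3019:1114#AGAAGA/1\n"
    "TNGGAACTTCATACCGTGCTCTCTGTAGGCACCATCAA\n"
    "+HWIEAS210R_0028:2:1:3019:1114#AGAAGA/1\n"
    "bBb`bfffffhhhhhhhhhhhhhhhhhhhfhhhhhhgh\n"
)

records = list(read_fastq(io.StringIO(EXAMPLE)))
first = records[0]
print(first.name, first.sequence, phred_scores(first.quality)[:5])
```

With a real file, `read_fastq(open("GKG-13.fastq"))` would stream the records without loading the whole file into memory.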
  • 15. How many sequence reads are in my file?
      → wc -l <path/to/my/file>
      mouse@GED-Server:~/raw_data$ wc -l GKG-13.fastq
      25703828 GKG-13.fastq
      mouse@GED-Server:~/raw_data$ grep -c -e "^@" GKG-13.fastq
      6425957
      In the Python interpreter (each read takes four lines):
      >>> 25703828 / 4
      6425957
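Both counts agree for this file, but they are not equivalent in general: "@" is also a valid quality character, so a quality line may start with it and be counted by `grep "^@"`. A minimal Python sketch of the two approaches, using hypothetical function names and a made-up two-read example:

```python
import io
from typing import TextIO


def count_reads(handle: TextIO) -> int:
    """Count reads as (number of lines) / 4, as done with wc -l on the slide."""
    n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError("line count is not a multiple of 4")
    return n_lines // 4


def count_at_lines(handle: TextIO) -> int:
    """Count lines starting with '@', as done with grep -c '^@' on the slide."""
    return sum(1 for line in handle if line.startswith("@"))


# Made-up example: the quality line of the first read starts with '@'.
TRICKY = "@read1\nACGT\n+\n@hhh\n@read2\nTTGA\n+\nhhhh\n"

print(count_reads(io.StringIO(TRICKY)))     # line-based count
print(count_at_lines(io.StringIO(TRICKY)))  # '@'-based count, one too many here
```

The line-based count is the safer default for four-line FASTQ files.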
  • 16. Do my sequence reads contain the adapter?
      My 3’ adapter: CTGTAGGCACCATCAAT
      → cat <path/file> | grep CTGTAGG | wc -l
      → grep -c "CTGTAGG" <path/file>
      mouse@GED-Server:~/raw_data$ cat GKG-13.fastq | grep CTGTAGG | wc -l
      6355061
      mouse@GED-Server:~/raw_data$ grep -c "CTGTAGG" GKG-13.fastq
      6355061
      6 355 061 out of 6 425 957 sequences … not bad (98.9%)
      By contrast, the sequence ATCTCGT is found in only 308 lines:
      mouse@GED-Server:~/raw_data$ cat GKG-13.fastq | grep ATCTCGT | wc -l
      308
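The grep commands above scan every line of the file, including headers and quality strings. A minimal Python sketch that looks only at the sequence lines (the second line of each record) and reports the fraction of reads carrying the adapter seed; the function name `adapter_fraction` and the small example are hypothetical:

```python
import io
from itertools import islice
from typing import TextIO, Tuple


def adapter_fraction(handle: TextIO, adapter: str = "CTGTAGG") -> Tuple[int, int]:
    """Return (reads containing the adapter, total reads) for a four-line FASTQ."""
    total = with_adapter = 0
    # Lines 1, 5, 9, ... (0-based) are the sequence lines.
    for sequence in islice(handle, 1, None, 4):
        total += 1
        if adapter in sequence:
            with_adapter += 1
    return with_adapter, total


# Made-up example: one read with the adapter seed, one without.
SMALL = (
    "@r1\nTGGAACTTCATACCGTGCTCTCTGTAGGCACC\n+\nhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh\n"
    "@r2\nACGTACGTACGTACGTACGTACGT\n+\nhhhhhhhhhhhhhhhhhhhhhhhh\n"
)

hits, total = adapter_fraction(io.StringIO(SMALL))
print(hits, total)

# Percentage for the counts reported on the slide.
print(round(100 * 6355061 / 6425957, 1))
```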
  • 17. A more advanced example of combining Unix commands
      mouse@GED-Server:~/raw_data$ cat GKG-13.fastq | perl -ne 'print if /^[ATGCN]{22}CTGTAGG/' | wc -l
      cat GKG-13.fastq: outputs the content of the file, line by line.
      |: the output is passed to the input of the next command.
      perl -ne: the perl interpreter is called with the -ne options (loop over the lines and execute).
      'print if /^[ATGCN]{22}CTGTAGG/': inline Perl code; the part between slashes is a regular expression.
      |: the output is passed to the input of the next command.
      wc -l: wc with the -l option counts the lines.
      Result: 1 675 469 reads that are 22 nt long with a 3’ flanking CTGTAGG adapter sequence.
  • 18. Clipping adapter sequences
      Unix operating systems already contain powerful native tools for sequence analyses.
      cat GKG-13.fastq | perl -ne 'if (/^(.+CTGTAGG)/) {print "$1\n"}' | more
      Final command-line clipper:
      mouse@GED-Server:~/raw_data$ cat GKG-13.fastq | perl -ne 'if (/^([GATC]{18,})CTGTAGG/) {$count++; print ">$count\n"; print "$1\n"}' > clipped_GKG13.fasta
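For readers more comfortable with Python than Perl, the same clipping logic can be sketched as follows. This is an illustrative equivalent under the same regular expression, not the command used in the training; `clip_reads` is a hypothetical name, and the example reads are the first read of the slide (with and without its N) plus a made-up short insert.

```python
import re
from typing import Iterable, Iterator

# Same pattern as the Perl one-liner: at least 18 unambiguous bases,
# immediately followed by the first 7 bases of the 3' adapter.
CLIP = re.compile(r"^([GATC]{18,})CTGTAGG")


def clip_reads(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA entries (>counter, clipped insert) for every matching line."""
    count = 0
    for line in lines:
        match = CLIP.match(line)
        if match:
            count += 1
            yield ">{}\n{}\n".format(count, match.group(1))


example_lines = [
    "@r1\n",
    "TGGAACTTCATACCGTGCTCTCTGTAGGCACCATCAA\n",    # 21-nt insert, kept
    "+\n",
    "@r2\n",
    "TNGGAACTTCATACCGTGCTCTCTGTAGGCACCATCAA\n",   # contains N, rejected
    "+\n",
    "@r3\n",
    "ACGTACGTCTGTAGGCACCATCAAT\n",                # insert shorter than 18 nt, rejected
    "+\n",
]

for entry in clip_reads(example_lines):
    print(entry, end="")
```

Note that reads containing an N before the adapter are discarded by this pattern, which is one reason fewer reads come out of the clipper than contain the adapter.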
  • 21. A bowtie alignment (command lines)
      Building the genome index:
      mouse@GED-Server:~/genomes$ bowtie-build Dmel_r5.49.fa Dmel_r5.49
      Aligning the clipped reads:
      mouse@GED-Server:~/instructor$ bowtie ../genomes/Dmel_r5.49 -f clipped_GKG13.fasta -v 1 -k 1 -p 6 --al droso_matched_GKG-13.fa --un unmatched_GKG13.fa -S > GKG13_bowtie_output.sam
      # reads processed: 5930851
      # reads with at least one reported alignment: 4992296 (84.18%)
      # reads that failed to align: 938555 (15.82%)
      Reported 4992296 alignments to 1 output stream(s)
  • 22. Bowtie outputs
      deepseq$ ls -laht
      -rw-r--r-- 1 deepseq staff 351M Mar 24 17:46 GKG13_bowtie_output.tabulated
      -rw-r--r-- 1 deepseq staff 156M Mar 24 17:46 droso_matched_GKG-13.fa
      -rw-r--r-- 1 deepseq staff 28M Mar 24 17:46 unmatched_GKG13.fa
      SAM alignment: $ more GKG13_bowtie_output.sam
      Aligned reads: $ more droso_matched_GKG-13.fa
      Unaligned reads: $ more unmatched_GKG13.fa
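The SAM output can be summarized without any dedicated tool, since it is a tab-separated text format. A minimal Python sketch, assuming that unaligned reads are written to the SAM file with the 0x4 bit set in the FLAG column (the second field); the function name `alignment_summary` and the three example lines are made up for illustration:

```python
from typing import Iterable, Tuple


def alignment_summary(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (aligned, unaligned) read counts from SAM text lines."""
    aligned = unaligned = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines (@HD, @SQ, @PG, ...)
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # bit 0x4: read is unmapped
            unaligned += 1
        else:
            aligned += 1
    return aligned, unaligned


# Made-up SAM content: one header line, two aligned reads, one unaligned read.
example_sam = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "1\t0\t2L\t100\t255\t21M\t*\t0\t0\tTGGAACTTCATACCGTGCTCT\tIIIIIIIIIIIIIIIIIIIII\n",
    "2\t16\t3R\t2500\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "3\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII\n",
]

print(alignment_summary(example_sam))
```

On a real file the lines can be passed directly from `open("GKG13_bowtie_output.sam")`.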
  • 24. Formats
      Raw sequence: FASTQ (with quality), FASTA (without quality)
      Aligned sequence: SAM, BAM (sorted, indexed, compressed)
      Genome annotation: GFF, GTF
  • 26. Pileup format
      Columns: sequence name, position, reference base, read depth, read bases, base qualities
      seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
      seq1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
      seq1 274 T 23 ,.$....,,.,.,...,,,.,... 7<7;<;<<<<<<<<<=<;<;<<6
      seq1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<
      seq1 276 G 22 ...T,,.,.,...,,,.,.... 33;+<<7=7<<7<&<<1;<<6<
      seq1 277 T 22 ....,,.,.,.C.,,,.,..G. +7<;<<<<<<<&<=<<:;<<&<
      seq1 278 G 23 ....,,.,.,...,,,.,....^k. %38*<<;<7<<7<=<<<;<<<<<
      seq1 279 C 23 A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<
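The read-bases column is the least obvious part of a pileup line: "." and "," are matches to the reference on the forward and reverse strand, letters are mismatches, "^" marks the start of a read (and is followed by one mapping-quality character), "$" marks the end of a read, and "+n"/"-n" introduce indels of n bases. A minimal Python sketch that decodes this column; `summarize_bases` is a hypothetical name, and the two strings are taken from positions 272 and 276 above:

```python
import re
from typing import Dict


def summarize_bases(bases: str) -> Dict[str, int]:
    """Count matches, mismatches, read starts and read ends in a pileup base string."""
    counts = {"match": 0, "mismatch": 0, "starts": 0, "ends": 0}
    i = 0
    while i < len(bases):
        char = bases[i]
        if char == "^":
            counts["starts"] += 1
            i += 2  # skip '^' and the mapping-quality character that follows
        elif char == "$":
            counts["ends"] += 1
            i += 1
        elif char in "+-":
            # Indel: sign, length in digits, then that many inserted/deleted bases.
            digits = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
        elif char in ".,":
            counts["match"] += 1
            i += 1
        elif char == "*":
            i += 1  # placeholder for a base deleted in this read
        else:
            counts["mismatch"] += 1
            i += 1
    return counts


print(summarize_bases(",.$.....,,.,.,...,,,.,..^+."))  # position 272
print(summarize_bases("...T,,.,.,...,,,.,...."))       # position 276
```

The sum of matches and mismatches should equal the read depth in the fourth column, which is a convenient sanity check when parsing pileup files.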
  • 27. Next week, we will perform an NGS analysis using the Galaxy framework. We will speak about accessibility, reproducibility and transparency. Please have a look at http://galaxyproject.org/; you can register and try it. Also, go to http://lbcd41.snv.jussieu.fr with login: (to be communicated) and password: (to be communicated), AND register (menu “user” → “register”) with your email address.