D03-NextGen-Bio-NGS

Bio-NGS: BioRuby plugin to conduct programmable workflows for Next Generation Sequencing data

Raoul J.P. Bonnal
bonnal@ingm.org
co-authors: Francesco Strozzi, Valeria Ranzani, Toshiaki Katayama

Integrative Biology Program
Istituto Nazionale di Genetica Molecolare, Italy

July 15, 2011 BOSC, Vienna, Austria

Bio-Gem

authors: Raoul J.P. Bonnal, Pjotr Prins, Toshiaki Katayama

•  a software generator for creating BioRuby plugins
•  last year (@BOSC 2010) was an idea and a prototype
•  Features:
–  Extend BioRuby
–  Modularity
–  Easy
   •  sharing:packaging:publishing
–  Just Code !

Plugins:
• bio-assembly (0.1.0)
• bio-blastxmlparser (0.6.1)
• bio-bwa (0.2.2)
• bio-cnls_screenscraper (0.1.0)
• bio-emboss_six_frame_nucleotide_sequences (0.1.0)
• bio-gem (0.2.2)
• bio-genomic-interval (0.1.2)
• bio-gff3 (0.8.6)
• bio-graphics (1.4)
• bio-hello (0.0.0)
• bio-isoelectric_point (0.1.1)
• bio-kb-illumina (0.1.0)
• bio-lazyblastxml (0.4.0)
• bio-logger (0.9.0)
• bio-nexml (0.0.1)
• bio-ngs (0.2.1)
• bio-octopus (0.1.1)
• bio-samtools (0.2.4)
• bio-sge (0.0.0)
• bio-tm_hmm (0.2.0)
• bio-ucsc-api (0.1.0)

Dev: https://github.com/helios/bioruby-gem
Install: gem install bio-gem


Bio-NGS

•  An Application
•  A Software Development Framework
•  A Project Environment


Application

•  Stand alone
–  Auto install everything it needs (sandbox/isolation)
–  System-wide or User (RVM, Ruby Version Manager)
•  Multi platforms
–  Linux, OS X
–  MRI, JRuby
•  Command line
–  Thor: a simple and efficient tool for building self-documenting command line utilities
•  Common syntax to different applications
•  Collection of Tasks
–  Basic, Advanced

RVM: https://rvm.beginrescueend.com/
Thor: https://github.com/wycats/thor


Software Development Framework

•  Expand BioRuby’s functionalities to NGS
•  API + Consistent Namespace
•  Integrate third-party tools
–  Wrapping: quick, easy support, increase productivity
–  Binding: low-level functionalities
•  Modular, reuse other plug-ins
–  BioBwa (binding)
–  BioSamtools (binding)


Project Environment

•  Directory scaffold
•  Customize
–  Tasks: Thor or Rake (Ruby DSL)
–  Configurations: YAML
•  History
•  Embedded DB
–  SQLite3
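
As an illustration of YAML-based configuration, here is a minimal sketch using only Ruby's standard `yaml` library. The keys (`threads`, `output_dir`, `reference`) and the helper `load_project_config` are assumptions made for this example; they are not taken from Bio-NGS.

```ruby
require "yaml"

# Hypothetical defaults; the keys are illustrative, not Bio-NGS settings.
DEFAULTS = { "threads" => 6, "output_dir" => "results" }.freeze

# Merge user-supplied YAML over the defaults.
# An empty document parses to nil/false, so fall back to an empty hash.
def load_project_config(yaml_text)
  overrides = YAML.safe_load(yaml_text) || {}
  DEFAULTS.merge(overrides)
end

config = load_project_config(<<~YAML)
  threads: 2
  reference: genome.fa
YAML

puts config["threads"]     # value taken from the YAML text
puts config["output_dir"]  # falls back to the default
```

Keeping defaults in code and overrides in a per-project YAML file means a project directory only has to record what differs from the defaults.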


Tools

Bio-NGS (diagram)

Primary: Pre-Processing (Conversion, Filter)
Secondary: Alignment
Tertiary: Knowledge

Tools and formats shown: Illumina bcl, FASTQ, FASTX-Toolkit, Bowtie/TopHat, BWA, Samtools, BAM, Cufflinks (Quant, Differential Expression), Ontology, More…

Execution: Local, Distributed/Parallel


Wrapper

module Bio
  module Ngs
    module Cufflinks
      class Compare
        include Bio::Command::Wrapper

        set_program Bio::Ngs::Utils.binary("cufflinks/cuffcompare")
        use_aliases

        add_option "outprefix", :type => :string, :aliases => '-o', :default => "Comparison"
        add_option "gtf_combine_file", :type => :string, :aliases => '-i'
        add_option "gtf_reference", :type => :string, :aliases => '-r'
        add_option "only_overlap", :type => :boolean, :aliases => '-R'
        add_option "discard_transfrags", :type => :boolean, :aliases => '-M'

      end
    end
  end
end
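
To make the wrapping idea concrete, the sketch below shows one way a declarative option list could be turned into a command line. `TinyWrapper` and its `command_line` method are hypothetical and written only for this illustration; this is not the `Bio::Command::Wrapper` implementation, whose internals are not shown in the slides.

```ruby
# Hypothetical stand-in for a declarative wrapper; NOT Bio::Command::Wrapper.
class TinyWrapper
  def self.set_program(path)
    @program = path
  end

  def self.program
    @program
  end

  def self.options
    @options ||= {}
  end

  # Register an option by name with its type, short alias and default.
  def self.add_option(name, type:, aliases: nil, default: nil)
    options[name] = { type: type, aliases: aliases, default: default }
  end

  # Build the argument vector from a params hash plus positional arguments.
  def command_line(params = {}, arguments = [])
    argv = [self.class.program]
    self.class.options.each do |name, spec|
      value = params.fetch(name, spec[:default])
      next if value.nil? || value == false   # unset options are skipped
      argv << (spec[:aliases] || "--#{name}")
      argv << value.to_s unless spec[:type] == :boolean
    end
    argv + arguments
  end
end

class Compare < TinyWrapper
  set_program "cuffcompare"
  add_option "outprefix", type: :string, aliases: "-o", default: "Comparison"
  add_option "gtf_reference", type: :string, aliases: "-r"
  add_option "only_overlap", type: :boolean, aliases: "-R"
end

argv = Compare.new.command_line({ "gtf_reference" => "ref.gtf", "only_overlap" => true }, ["sample.gtf"])
puts argv.join(" ")
```

Declaring options as data keeps each wrapper short and lets the same declarations drive both the Ruby API and the command line tasks.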


Wrapper

irb(main):001:0> require 'bio-ngs'
irb(main):001:1> cuffcompare = Bio::Ngs::Cufflinks::Compare.new
irb(main):001:2> cuffcompare.params = {….}
irb(main):001:3> cuffcompare.run(:arguments=>[…])
=> #<Bio::Ngs::Cufflinks::Compare:0x0000000c1630f8 @program="/usr/local/lib/ruby/gems/1.9.1/gems/bio-ngs-0.2.1/lib/bio/ngs/ext/bin/linux/cufflinks/cuffcompare", @options={}, @params={}>


Tasks

No binary found with this name: setupBclToQseq.py
No binary found with this name: fastq_quality_boxplot_graph.sh
No binary found with this name: blastn
No binary found with this name: blastx
WARNING: no program is associated with BCLQSEQ task, does not make sense to create a thor task.
WARNING: no program is associated with BLASTN task, does not make sense to create a thor task.
WARNING: no program is associated with BLASTX task, does not make sense to create a thor task.

bwa
---
biongs bwa:aln:long [FASTQ] --file-out=FILE_OUT --prefix=PREFIX
biongs bwa:aln:short [FASTQ] --prefix=PREFIX
biongs bwa:index:long [FASTA]
biongs bwa:index:short [FASTA]
biongs bwa:sam:paired --fastq=one two three --file-out=FILE_OUT --sai=one two three
biongs bwa:sam:single [SAI] --fastq=FASTQ --file-out=FILE_OUT

convert
-------
biongs convert:bam:extract_genes BAM GENES --ensembl-release=N -o, --output=OUTPUT
biongs convert:bam:merge -i, --input-bams=one two three
biongs convert:bam:sort BAM [PREFIX]
biongs convert:bcl:qseq:convert RUN OUTPUT [JOBS]
biongs convert:illumina:de:gene DIFF GTF
biongs convert:illumina:de:isoform DIFF GTF
biongs convert:illumina:de:rename_qs DIFF_FILE NAMES
biongs convert:illumina:fastq:trim_b FASTQ
biongs convert:illumina:humanize:build_compare_kb GTF
biongs convert:illumina:humanize:isoform_exp GTF ISOFORM
biongs convert:qseq:fastq:by_file FIRST OUTPUT
biongs convert:qseq:fastq:by_lane LANE OUTPUT
biongs convert:qseq:fastq:by_lane_index LANE INDEX OUTPUT
biongs convert:qseq:fastq:samples_by_lane SAMPLES LANE OUTPUT

history
-------
biongs history:8  # Task convert:illumina:de:isoform PARAMETERS: /Users/bonnalraoul/Desktop/RRep16giugno/DE_lane1-2-3-4-6-8/DE_lane1-2-3-4-6-8/isoform_exp.diff /Users/bonnalraoul/Desktop/RRep16giugno/COMPARE_lane1-2-3-4-6-8/COMPA...

homology
--------
biongs homology:convert:blast2text [XML FILE] --file-out=FILE_OUT
biongs homology:convert:go2json
biongs homology:db:export [TABLE] --fileout=FILEOUT
biongs homology:db:init
biongs homology:download:all
biongs homology:download:goannotation
biongs homology:download:uniprot
biongs homology:load:blast [FILE]
biongs homology:load:goa
biongs homology:report:blast

ontology
--------
biongs ontology:db:export [TABLE]
biongs ontology:db:init
biongs ontology:download:all
biongs ontology:download:go
biongs ontology:download:goslim
biongs ontology:load:genego [FILE]
biongs ontology:load:go [FILE]
biongs ontology:report:go

project
-------
biongs project:new [NAME]
biongs project:update [TYPE]

quality
-------
biongs quality:boxplot FASTQ_QUALITY_STATS
biongs quality:fastq_stats FASTQ
biongs quality:illumina_b_profile_raw FASTQ --read-length=N
biongs quality:illumina_b_profile_svg FASTQ
biongs quality:reads FASTQ
biongs quality:reads_coverage FASTQ_QUALITY_STATS
biongs quality:scatterplot EXPR1 EXPR2 OUTPUT
biongs quality:trim FASTQ

rna
---
biongs rna:compare GTF_REF OUTPUTDIR GTFS_QUANTIFICATION
biongs rna:idx2fasta INDEX FASTA
biongs rna:mapquant DIST INDEX OUTPUTDIR FASTQS
biongs rna:quant GTF OUTPUTDIR BAM
biongs rna:tophat DIST INDEX OUTPUTDIR FASTQS

sff
---
biongs sff:extract [FILE]
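
Every task in the listing follows the same shape: a colon-separated namespace and task name, positional arguments, then `--option=VALUE` flags. The sketch below splits such an invocation into its parts. `parse_invocation` is a hypothetical helper written for this illustration; it is not how Thor or Bio-NGS parse arguments, and it does not handle multi-value options such as `--fastq=one two three`.

```ruby
# Hypothetical helper, not part of Bio-NGS or Thor.
# Splits e.g. %w[bwa:sam:single reads.sai --fastq=reads.fastq] into its parts.
def parse_invocation(argv)
  full_name, *rest = argv
  *namespace, task = full_name.split(":")

  # Tokens starting with "--" are options; everything else is positional.
  option_tokens, arguments = rest.partition { |token| token.start_with?("--") }

  options = option_tokens.map { |token|
    key, value = token.sub(/\A--/, "").split("=", 2)
    [key, value.nil? ? true : value]   # a bare flag becomes true
  }.to_h

  { namespace: namespace, task: task, arguments: arguments, options: options }
end

invocation = parse_invocation(%w[bwa:sam:single reads.sai --fastq=reads.fastq --file-out=reads.sam])
puts invocation.inspect
```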


Tasks

Annotations on the task list:
•  No binary: task disabled
•  Keep everything organized
•  Recall an old analysis
•  Reporting
•  Basic, Advanced


Tasks

class Rna < Thor

  desc "mapquant DIST INDEX OUTPUTDIR FASTQS", "map and quantify"
  method_option :paired, :type => :boolean, :default => false, :desc => 'Are reads paired? If you choose this option pass just the basename of the file without forward/reverse and .fastq'
  def mapquant(dist, index, outputdir, fastqs)
    # tophat
    invoke :tophat, [dist, index, outputdir, fastqs], :paired => options.paired
    # cufflinks quantification on gtf
    invoke :quant, ["#{index}.gtf", File.join(outputdir, "quantification"), File.join(outputdir, "accepted_hits_sort.bam")]
  end
  …
end


Tasks

class Rna < Thor

  # you'll end up with 3 accepted-hits files: regular, sorted, sorted-indexed
  desc "tophat DIST INDEX OUTPUTDIR FASTQS", "run tophat as from command line, default 6 processors and then create a sorted bam indexed."
  method_option :paired, :type => :boolean, :default => false, :desc => 'Are reads paired? If you choose this option pass just the…'
  Bio::Ngs::Tophat.new.thor_task(self, :tophat) do |wrapper, task, dist, index, outputdir, fastqs|
    wrapper.params = task.options # merge passed options to the wrapper.
    wrapper.params = {"mate-inner-dist"=>dist, "output-dir"=>outputdir, "num-threads"=>6, "solexa1.3-quals"=>true}
    fastq_files = task.options[:paired] ? ["#{fastqs}_forward.fastq","#{fastqs}_reverse.fastq"] : ["#{fastqs}"]
    wrapper.run :arguments=>[index, fastq_files ].flatten, :separator=>"="
    accepted_hits_bam_fn = File.join(outputdir, "accepted_hits.bam")
    task.invoke "convert:bam:sort", [accepted_hits_bam_fn] # call the sorting procedure.
  end
  …
end


Tasks

class Rna < Thor
  desc "quant GTF OUTPUTDIR BAM ", "Genes and transcripts quantification"
  Bio::Ngs::Cufflinks::Quantification.new.thor_task(self, :quant) do |wrapper, task, gtf, outputdir, bam|
    wrapper.params = task.options
    wrapper.params = {"num-threads" => 6, "output-dir" => outputdir, "GTF" => gtf }
    wrapper.run :arguments=>[bam], :separator => "="
  end
end
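
The tasks above are chained through file-name conventions: paired reads are given as a basename that expands to forward and reverse files, and the quantification step reads the GTF and sorted BAM derived from the index and output directory. The sketch below isolates those conventions in plain Ruby. `fastq_inputs` and `quant_arguments` are hypothetical names introduced here; the suffixes and paths are the ones that appear in the slide code.

```ruby
# Hypothetical helpers mirroring the naming conventions in the slide code.

# Expand the FASTQS argument: a basename for paired reads, a file name otherwise.
def fastq_inputs(fastqs, paired)
  paired ? ["#{fastqs}_forward.fastq", "#{fastqs}_reverse.fastq"] : [fastqs]
end

# Arguments handed to the quantification step once mapping has finished.
def quant_arguments(index, outputdir)
  [
    "#{index}.gtf",
    File.join(outputdir, "quantification"),
    File.join(outputdir, "accepted_hits_sort.bam")
  ]
end

puts fastq_inputs("sample1", true).inspect
puts quant_arguments("hg19", "out").inspect
```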


Next?

•  Support more software, not only NGS
•  Wrap EMBOSS on the fly reading acd files
•  Tune according to hardware
•  Share tasks
–  Thor & Rake
•  Improve JRuby compatibility
•  Contributions
•  Scalability
–  Cloud ? BioLinux
–  BioHub: distribute tasks using messaging
   •  ActiveMQ
   •  Stomp
   •  ActiveMessaging
   •  Adapters for Queuing Systems


Acknowledgments

Serena Curti
Debora Mascheroni
Valeria Parente
Valeria Ranzani (1)
Anna Ripamonti
Grazisa Rossetti
Riccardo L. Rossi
Roberto Sciarretta
Massimiliano Pagani

Francesco Strozzi (1,3)
Alessandra Stella

Groningen Bioinformatics Centre: Pjotr Prins (2)
Laboratory of Genome Database: Toshiaki Katayama (1,2)
The Genome Analysis Centre: Ricardo Ramirez-Gonzalez (4)
Dan MacLean (4)

1 bio-ngs, 2 bio-gem, 3 bio-bwa, 4 bio-samtools


Questions ?

INFO

E-mail: bonnal@ingm.org / r@bioruby.org
Dev: http://github.com/helios/bioruby-ngs
Docs: https://github.com/helios/bioruby-ngs/blob/master/README.rdoc
Wiki: http://bioruby.open-bio.org/wiki/Next_Generation_Sequencing
BioRuby-ML: http://lists.open-bio.org/mailman/listinfo/bioruby
Irc: #bioruby (irc.freenode.org)
