Building
collaborative
workflows for
scientific data
bmpvieira.com/orcambridge14
Bruno Vieira | @bmpvieira
PhD Student @ Bioinformatics and Population Genomics
Supervisor: Yannick Wurm | @yannick__
© 2014 Bruno Vieira CC-BY 4.0
Sequencing cost drops
Sequencing data rises
Goodbye Excel/Windows
Hello command line
Hello super computers
Programming
Reproducibility crisis
Losing data
Reproducibility layers
Code
Data
Workflow
Environment
Code
The GitHub for Science...
is GitHub!
Code as a research output
Reproducibility layers
Code
Data
Workflow
Environment
Data
Dat
open source tool for sharing and
collaborating on data
started August '13, we are grant funded and 100% open source
dat-data.com
public #dat channel on freenode
gitter.im/datproject/discussions
Dat Community Call #1
Dat - "git for data"
npm install -g dat
dat init
collect-data | dat import
dat listen
Dat
dat clone
dat pull --live
dat blobs put mygenome data.fasta
dat cat | transform
dat cat | docker run -i transform
http://eukaryota.dathub.org
Dat
Planned
dat checkout revision
dat diff
dat branch
multi master replication
sync to databases
registry
Data stored locally in leveldb, but can use
other backends such as
Postgres
Redis
etc
Files stored in blob-stores
s3
local-fs
bittorrent
ftp
etc
Dat features
auto schema generation
free REST API
all APIs are streaming
Dat workshop
maxogden.github.io/get-dat
Dat quick deploy
github.com/bmpvieira/heroku-dat-template
Reproducibility layers
Code
Data
Workflow
Environment
Workflow
Bionode
open source project for modular and
universal bioinformatics
started January '14
bionode.io
Some problems I faced
during my research:
Difficulty getting relevant descriptions and
datasets from NCBI API using bio* libs
For web projects, needed to implement
the same functionality on browser and
server
Difficulty writing scalable, reproducible
and complex bioinformatic pipelines
Bionode also collaborates with BioJS
Bionode
npm install -g bionode
bionode ncbi download gff bacteria
bionode ncbi download sra arthropoda |
bionode sra fastq-dump
npm install -g bionode-ncbi
bionode-ncbi search assembly formicidae |
dat import --json
Bionode - list of modules
Name          Type         Status      People
ncbi          Data access  production
fasta         Parser       production
seq           Wrangling    production  IM
ensembl       Data access  production
blast-parser  Parser       production
Bionode - list of modules
Name                  Type           Status      People
template              Documentation  production
JS pipeline           Documentation  production
Gasket pipeline       Documentation  production
Dat/Bionode workshop  Documentation  production
Bionode - list of modules
Name  Type      Status       People
sra   Wrappers  development
bwa   Wrappers  development
sam   Wrappers  development
bbi   Parser    development
Bionode - list of modules
Name      Type         Status   People
ebi       Data access  request
semantic  Data access  request
vcf       Parser       request
gff       Parser       request
bowtie    Wrappers     request
sge       Wrappers     request  badryan
blast     Wrappers     request
Bionode - list of modules
Name     Type      People
vsearch  Wrappers
khmer    Wrappers
rsem     Wrappers
gmap     Wrappers
star     Wrappers
go       Wrappers  badryan
Bionode - Why wrappers?
Same interface between modules
(Streams and NDJSON)
Easy installation with NPM
Semantic versioning
Add tests
Abstract complexity / More user friendly
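The "Streams and NDJSON" interface mentioned above can be sketched with Node.js core modules alone. This is a hand-rolled illustration, not the `ndjson` package used elsewhere in these slides; the function names `stringifyNdjson`, `parseNdjson`, `toNdjson` and `fromNdjson` are made up for the example.

```javascript
// Sketch only: a minimal NDJSON codec built on Node.js core streams.
// All function names here are hypothetical, chosen for illustration.
const { Transform } = require('stream')
const { StringDecoder } = require('string_decoder')

// Object stream in, text out: one JSON document per line
function stringifyNdjson () {
  return new Transform({
    writableObjectMode: true,
    transform (obj, _enc, done) {
      done(null, JSON.stringify(obj) + '\n')
    }
  })
}

// Text in, object stream out: split on newlines and parse each line
function parseNdjson () {
  const decoder = new StringDecoder('utf8')
  let buffered = ''
  return new Transform({
    readableObjectMode: true,
    transform (chunk, _enc, done) {
      buffered += decoder.write(chunk)
      const lines = buffered.split('\n')
      buffered = lines.pop() // keep a trailing partial line for the next chunk
      for (const line of lines) {
        if (line.trim()) this.push(JSON.parse(line))
      }
      done()
    },
    flush (done) {
      buffered += decoder.end()
      if (buffered.trim()) this.push(JSON.parse(buffered))
      done()
    }
  })
}

// Synchronous equivalents of the same line format, handy for quick checks
const toNdjson = (objs) => objs.map((o) => JSON.stringify(o) + '\n').join('')
const fromNdjson = (text) =>
  text.split('\n').filter((l) => l.trim()).map((l) => JSON.parse(l))

console.log(fromNdjson(toNdjson([{ uid: '1' }, { uid: '2' }])))
```

Because every record is a complete JSON document on its own line, any two tools that agree on this format can be joined with a plain UNIX pipe.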
Bionode - Why Node.js?
Same code client/server side
Need to reimplement the same code on
browser and server.
Solution: JavaScript everywhere
Afra -> bionode-seq
GeneValidator -> seq, fasta
SequenceServer
BioJS -> collaborating for code reuse
Biodalliance -> converting to bionode
Bionode - Why Node.js?
Reusable, small and tested
modules
Benefit from other JS
projects
Dat BioJS NoFlo
Difficulty getting relevant description and
datasets from NCBI API using bio* libs
Python example: URL for the Acromyrmex
assembly?
Solution:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000188075.1_Si_gnG
import xml.etree.ElementTree as ET
from Bio import Entrez

Entrez.email = "mail@bmpvieira.com"
# Search the assembly database, then fetch a summary for each hit
esearch_handle = Entrez.esearch(db="assembly", term="Acromyrmex")
esearch_record = Entrez.read(esearch_handle)
for uid in esearch_record['IdList']:
    esummary_handle = Entrez.esummary(db="assembly", id=uid)
    esummary_record = Entrez.read(esummary_handle)
    documentSummarySet = esummary_record['DocumentSummarySet']
    document = documentSummarySet['DocumentSummary'][0]
    # 'Meta' is an XML fragment, so wrap it in a root element before parsing
    # (the wrapper tag was lost from the slide; the name 'root' is an assumption)
    metadata_XML = document['Meta']
    metadata = ET.fromstring('<root>' + metadata_XML + '</root>')
    for entry in metadata[1]:
        print(entry.text)
bionode-ncbi
Difficulty getting relevant description and
datasets from NCBI API using bio* libs
Example: URL for the Acromyrmex
assembly?
JavaScript
var bio = require('bionode')

// Callback style
bio.ncbi.urls('assembly', 'Acromyrmex', function(urls) {
  console.log(urls[0].genomic.fna)
})

// Event style: the same call also returns a stream
bio.ncbi.urls('assembly', 'Acromyrmex').on('data', printGenomeURL)
function printGenomeURL(urls) {
  console.log(urls[0].genomic.fna)
}
http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000204515.1_Aech_3.9/GCA_000204515.1_Aech_3.9_genomic.fna.gz
Difficulty getting relevant description and
datasets from NCBI API using bio* libs
Example: URL for the Acromyrmex
assembly?
JavaScript
var ncbi = require('bionode-ncbi')
var ndjson = require('ndjson')
ncbi.urls('assembly', 'Acromyrmex')
  .pipe(ndjson.stringify())
  .pipe(process.stdout)
BASH
bionode-ncbi urls assembly Acromyrmex |
  tool-stream extractProperty genomic.fna
http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000204515.1_Aech_3.9/GCA_000204515.1_Aech_3.9_genomic.fna.gz
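What a step like `tool-stream extractProperty genomic.fna` does to each record can be approximated in a few lines. The helper below is a hypothetical sketch, not tool-stream's actual implementation, and the sample record and its URL are invented for the example.

```javascript
// Hypothetical helper: follow a dotted path such as 'genomic.fna' into an object.
// Returns undefined if any step along the path is missing.
function extractProperty (obj, path) {
  return path.split('.').reduce(
    (value, key) => (value == null ? undefined : value[key]),
    obj
  )
}

// Invented sample record shaped like one line of NDJSON output
const record = { uid: '1', genomic: { fna: 'http://example.org/genome.fna.gz' } }

console.log(extractProperty(record, 'genomic.fna'))
```

Applied to every line of an NDJSON stream, this reduces each record to the one value the next tool in the pipeline needs.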
Difficulty writing scalable, reproducible and
complex bioinformatic pipelines.
Solution: Node.js Streams everywhere
var ncbi = require('bionode-ncbi')
var tool = require('tool-stream')
var through = require('through2')
// Object-mode pass-through streams used as branch points
var fork1 = through.obj()
var fork2 = through.obj()
// dat.reads, dat.samples and dat.papers are assumed to be writable Dat streams set up elsewhere

// Branch 1: store the SRA search results
ncbi
  .search('sra', 'Solenopsis invicta')
  .pipe(fork1)
  .pipe(dat.reads)

// Branch 2: look up the BioSample of each result
fork1
  .pipe(tool.extractProperty('expxml.Biosample.id'))
  .pipe(ncbi.search('biosample'))
  .pipe(dat.samples)

// Branch 3: follow links from SRA to PubMed and store the papers
fork1
  .pipe(tool.extractProperty('uid'))
  .pipe(ncbi.link('sra', 'pubmed'))
  .pipe(ncbi.search('pubmed'))
  .pipe(fork2)
  .pipe(dat.papers)
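The forking pattern above can be tried without network access or Dat. In this sketch the core `PassThrough` stream stands in for `through2.obj()`, and the in-memory source and sinks are hypothetical stand-ins for the NCBI search and the Dat stores.

```javascript
// Sketch only: core PassThrough replaces through2.obj(); the source records
// and the collecting sinks are invented for the example.
const { PassThrough, Readable, Transform, Writable } = require('stream')
const { once } = require('events')

// Writable sink that collects every object it receives into `target`
function collect (target) {
  return new Writable({
    objectMode: true,
    write (obj, _enc, done) {
      target.push(obj)
      done()
    }
  })
}

// Transform that keeps only the `uid` field of each record
function pluckUid () {
  return new Transform({
    objectMode: true,
    transform (obj, _enc, done) {
      done(null, obj.uid)
    }
  })
}

const records = [{ uid: 'a1', title: 'run 1' }, { uid: 'b2', title: 'run 2' }]
const fork = new PassThrough({ objectMode: true })
const reads = []
const uids = []

// Both branches are attached before data flows, so each one sees every record
const readsSink = collect(reads)
const uidsSink = collect(uids)
Readable.from(records).pipe(fork)
fork.pipe(readsSink)
fork.pipe(pluckUid()).pipe(uidsSink)

// Resolves once both branches have finished writing
const bothDone = Promise.all([once(readsSink, 'finish'), once(uidsSink, 'finish')])
bothDone.then(() => console.log(reads.length, uids))
```

Piping one readable into several destinations delivers each record to all of them, which is what lets a single search feed several downstream stores.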
Difficulty writing scalable, reproducible and
complex bioinformatic pipelines.
bionode-ncbi search genome Guillardia theta |
tool-stream extractProperty assemblyid |
bionode-ncbi download assembly |
tool-stream collectMatch status completed |
tool-stream extractProperty uid |
bionode-ncbi link assembly bioproject |
tool-stream extractProperty destUID |
bionode-ncbi link bioproject sra |
tool-stream extractProperty destUID |
bionode-ncbi download sra |
bionode-sra fastq-dump |
tool-stream extractProperty destFile |
bionode-bwa mem 503988/GCA_000315625.1_Guith1_genomic.fna.gz |
tool-stream collectMatch status finished |
tool-stream extractProperty sam |
bionode-sam
Difficulty writing scalable, reproducible and
complex bioinformatic pipelines.
bionode-example-dat-gasket
get-dat workshop
get-dat bionode gasket example
Difficulty writing scalable, reproducible and
complex bioinformatic pipelines.
{
  "import-data": [
    "bionode-ncbi search genome eukaryota",
    "dat import --json --primary=uid"
  ],
  "search-ncbi": [
    "dat cat",
    "grep Guillardia",
    "tool-stream extractProperty assemblyid",
    "bionode-ncbi download assembly -",
    "tool-stream collectMatch status completed",
    "tool-stream extractProperty uid",
    "bionode-ncbi link assembly bioproject -",
    "tool-stream extractProperty destUID",
    "bionode-ncbi link bioproject sra -",
    "tool-stream extractProperty destUID",
    "grep 35526",
    "bionode-ncbi download sra -",
    "tool-stream collectMatch status completed",
    "tee > metadata.json"
  ],
  "index-and-align": [
    "cat metadata.json",
    "bionode-sra fastq-dump -",
    "tool-stream extractProperty destFile",
    "bionode-bwa mem **/*fna.gz"
  ],
  "convert-to-bam": [
    "bionode-sam 35526/SRR070675.sam"
  ]
}
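One way to read the configuration above is that each named array is a chain of commands joined by pipes. The sketch below takes that reading as an assumption; `toShellPipelines` is a hypothetical helper, not part of gasket, and it only renders the strings without running anything.

```javascript
// Hypothetical helper: render each named pipeline as a single shell pipeline string.
// Assumes every array is a linear chain of commands connected by pipes.
function toShellPipelines (config) {
  const rendered = {}
  for (const [name, commands] of Object.entries(config)) {
    rendered[name] = commands.join(' | ')
  }
  return rendered
}

// First pipeline from the slide, copied as-is
const config = {
  'import-data': [
    'bionode-ncbi search genome eukaryota',
    'dat import --json --primary=uid'
  ]
}

console.log(toShellPipelines(config)['import-data'])
// prints: bionode-ncbi search genome eukaryota | dat import --json --primary=uid
```

Keeping the pipeline as data rather than as one long shell line makes each stage easy to name, reorder and rerun on its own.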
Difficulty writing scalable, reproducible and
complex bioinformatic pipelines.
datscript
pipeline main
  run pipeline import

pipeline import
  run foobar | run dat import --json
bmpvieira example
ekg example
Reproducibility layers
Code
Data
Workflow
Environment
Environment
Docker for reproducible
science
docker run bmpvieira/thesis
Bionode - Modular and universal bioinformatics
Pipeable UNIX command line tools and
JavaScript / Node.js APIs for bioinformatic
analysis workflows on the server and browser.
Dat - Build data pipelines
Provides a streaming interface between every file
format and data storage backend. "git for data"
Bionode.io
#bionode
gitter.im/bionode/bionode
Dat-data.com
#dat
gitter.im/datproject/discussions
Acknowledgements
@yannick__
@maxogden
@mafintosh
@erikgarrison
@QM_SBCS
@opendata
Bionode contributors
Thanks!
"Science should work as an
Open Source project"
dat-data.com
bionode.io

More Related Content

What's hot

FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational WorkflowsCarole Goble
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsCarole Goble
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)Carole Goble
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects Carole Goble
 
FAIR History and the Future
FAIR History and the FutureFAIR History and the Future
FAIR History and the FutureCarole Goble
 
Open Science: how to serve the needs of the researcher?
Open Science: how to serve the needs of the researcher? Open Science: how to serve the needs of the researcher?
Open Science: how to serve the needs of the researcher? Carole Goble
 
A Big Picture in Research Data Management
A Big Picture in Research Data ManagementA Big Picture in Research Data Management
A Big Picture in Research Data ManagementCarole Goble
 
Data management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK StoryData management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK StoryCarole Goble
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational WorkflowsCarole Goble
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceCarole Goble
 
Building the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsBuilding the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsCarole Goble
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
 
EOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryEOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryCarole Goble
 
FAIR Data Bridging from researcher data management to ELIXIR archives in the...
FAIR Data Bridging from researcher data management to ELIXIR archives in the...FAIR Data Bridging from researcher data management to ELIXIR archives in the...
FAIR Data Bridging from researcher data management to ELIXIR archives in the...Carole Goble
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceCarole Goble
 
FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...Carole Goble
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Carole Goble
 
Open Access: Open Access Looking for ways to increase the reach and impact of...
Open Access: Open Access Looking for ways to increase the reach and impact of...Open Access: Open Access Looking for ways to increase the reach and impact of...
Open Access: Open Access Looking for ways to increase the reach and impact of...librarianrafia
 

What's hot (20)

FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research Objects
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects
 
FAIR History and the Future
FAIR History and the FutureFAIR History and the Future
FAIR History and the Future
 
Open Science: how to serve the needs of the researcher?
Open Science: how to serve the needs of the researcher? Open Science: how to serve the needs of the researcher?
Open Science: how to serve the needs of the researcher?
 
A Big Picture in Research Data Management
A Big Picture in Research Data ManagementA Big Picture in Research Data Management
A Big Picture in Research Data Management
 
Data management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK StoryData management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK Story
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practice
 
Building the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsBuilding the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of Scientists
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
EOSC-Life Workflow Collaboratory
EOSC-Life Workflow CollaboratoryEOSC-Life Workflow Collaboratory
EOSC-Life Workflow Collaboratory
 
DCC Keynote 2007
DCC Keynote 2007DCC Keynote 2007
DCC Keynote 2007
 
FAIR Data Bridging from researcher data management to ELIXIR archives in the...
FAIR Data Bridging from researcher data management to ELIXIR archives in the...FAIR Data Bridging from researcher data management to ELIXIR archives in the...
FAIR Data Bridging from researcher data management to ELIXIR archives in the...
 
FAIRer Research
FAIRer ResearchFAIRer Research
FAIRer Research
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data Science
 
FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017
 
Open Access: Open Access Looking for ways to increase the reach and impact of...
Open Access: Open Access Looking for ways to increase the reach and impact of...Open Access: Open Access Looking for ways to increase the reach and impact of...
Open Access: Open Access Looking for ways to increase the reach and impact of...
 

Viewers also liked

What is the reproducibility crisis in science and what can we do about it?
What is the reproducibility crisis in science and what can we do about it?What is the reproducibility crisis in science and what can we do about it?
What is the reproducibility crisis in science and what can we do about it?Dorothy Bishop
 
Repeat after me: Is our research reproducible (enough)?
Repeat after me: Is our research reproducible (enough)? Repeat after me: Is our research reproducible (enough)?
Repeat after me: Is our research reproducible (enough)? Lex Nederbragt
 
Neoliberalismo y perdida de la soberanía alimentaria: cuando el maná se convi...
Neoliberalismo y perdida de la soberanía alimentaria: cuando el maná se convi...Neoliberalismo y perdida de la soberanía alimentaria: cuando el maná se convi...
Neoliberalismo y perdida de la soberanía alimentaria: cuando el maná se convi...FLOR KAREN ARIANA GONZALES ZUÑIGA
 
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...GigaScience, BGI Hong Kong
 
10 Recommendations from the Reproducibility Crisis in Psychological Science
10 Recommendations from the Reproducibility Crisis in Psychological Science10 Recommendations from the Reproducibility Crisis in Psychological Science
10 Recommendations from the Reproducibility Crisis in Psychological ScienceJimGrange
 
The vJUG talk about jOOQ: Get Back in Control of Your SQL
The vJUG talk about jOOQ: Get Back in Control of Your SQLThe vJUG talk about jOOQ: Get Back in Control of Your SQL
The vJUG talk about jOOQ: Get Back in Control of Your SQLLukas Eder
 
Rubrica de Clase Semestre 2015 - I, UNCP
Rubrica de Clase Semestre 2015 - I, UNCPRubrica de Clase Semestre 2015 - I, UNCP
Rubrica de Clase Semestre 2015 - I, UNCPGusstock Concha Flores
 
DESIGN OF SUBSURFACE DRAINAGE SYSTEM
DESIGN OF SUBSURFACE DRAINAGE SYSTEMDESIGN OF SUBSURFACE DRAINAGE SYSTEM
DESIGN OF SUBSURFACE DRAINAGE SYSTEMNamitha M R
 

Viewers also liked (10)

What is the reproducibility crisis in science and what can we do about it?
What is the reproducibility crisis in science and what can we do about it?What is the reproducibility crisis in science and what can we do about it?
What is the reproducibility crisis in science and what can we do about it?
 
Open and Flexible Studies
Open and Flexible StudiesOpen and Flexible Studies
Open and Flexible Studies
 
Repeat after me: Is our research reproducible (enough)?
Repeat after me: Is our research reproducible (enough)? Repeat after me: Is our research reproducible (enough)?
Repeat after me: Is our research reproducible (enough)?
 
Neoliberalismo y perdida de la soberanía alimentaria: cuando el maná se convi...
Neoliberalismo y perdida de la soberanía alimentaria: cuando el maná se convi...Neoliberalismo y perdida de la soberanía alimentaria: cuando el maná se convi...
Neoliberalismo y perdida de la soberanía alimentaria: cuando el maná se convi...
 
How can we rise in the world again
How can we rise in the world againHow can we rise in the world again
How can we rise in the world again
 
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ...
 
10 Recommendations from the Reproducibility Crisis in Psychological Science
10 Recommendations from the Reproducibility Crisis in Psychological Science10 Recommendations from the Reproducibility Crisis in Psychological Science
10 Recommendations from the Reproducibility Crisis in Psychological Science
 
The vJUG talk about jOOQ: Get Back in Control of Your SQL
The vJUG talk about jOOQ: Get Back in Control of Your SQLThe vJUG talk about jOOQ: Get Back in Control of Your SQL
The vJUG talk about jOOQ: Get Back in Control of Your SQL
 
Rubrica de Clase Semestre 2015 - I, UNCP
Rubrica de Clase Semestre 2015 - I, UNCPRubrica de Clase Semestre 2015 - I, UNCP
Rubrica de Clase Semestre 2015 - I, UNCP
 
DESIGN OF SUBSURFACE DRAINAGE SYSTEM
DESIGN OF SUBSURFACE DRAINAGE SYSTEMDESIGN OF SUBSURFACE DRAINAGE SYSTEM
DESIGN OF SUBSURFACE DRAINAGE SYSTEM
 

Similar to Building collaborative workflows for scientific data

AllBio and EU CodeFest 2014
AllBio and EU CodeFest 2014AllBio and EU CodeFest 2014
AllBio and EU CodeFest 2014Bruno Vieira
 
Reproducible, Automated and Portable Computational and Data Science Experimen...
Reproducible, Automated and Portable Computational and Data Science Experimen...Reproducible, Automated and Portable Computational and Data Science Experimen...
Reproducible, Automated and Portable Computational and Data Science Experimen...Ivo Jimenez
 
Getting started with developing Nodejs
Getting started with developing NodejsGetting started with developing Nodejs
Getting started with developing NodejsPhil Hawksworth
 
SD, a P2P bug tracking system
SD, a P2P bug tracking systemSD, a P2P bug tracking system
SD, a P2P bug tracking systemJesse Vincent
 
4Developers 2015: Continuous Security in DevOps - Maciej Lasyk
4Developers 2015: Continuous Security in DevOps - Maciej Lasyk4Developers 2015: Continuous Security in DevOps - Maciej Lasyk
4Developers 2015: Continuous Security in DevOps - Maciej LasykPROIDEA
 
Continuous Security in DevOps
Continuous Security in DevOpsContinuous Security in DevOps
Continuous Security in DevOpsMaciej Lasyk
 
Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker SwarmGenomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker SwarmDmitri Zimine
 
Microservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCONMicroservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCONAdrian Cockcroft
 
Scaling up development of a modular code base
Scaling up development of a modular code baseScaling up development of a modular code base
Scaling up development of a modular code baseRobert Munteanu
 
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"Daniel Bryant
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperSaurav Haloi
 
2017-07-22 Common Workflow Language Viewer
2017-07-22 Common Workflow Language Viewer2017-07-22 Common Workflow Language Viewer
2017-07-22 Common Workflow Language ViewerStian Soiland-Reyes
 
Through the firewall with miniCRAN
Through the firewall with miniCRANThrough the firewall with miniCRAN
Through the firewall with miniCRANRevolution Analytics
 
Continuous delivery w projekcie open source - Marcin Stachniuk
Continuous delivery w projekcie open source - Marcin StachniukContinuous delivery w projekcie open source - Marcin Stachniuk
Continuous delivery w projekcie open source - Marcin StachniukMarcinStachniuk
 
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...Keiichiro Ono
 
Recent Developments in Free Medical Imaging Software
Recent Developments in Free Medical Imaging SoftwareRecent Developments in Free Medical Imaging Software
Recent Developments in Free Medical Imaging SoftwareAndrew Crabb
 
Practical Chaos Engineering
Practical Chaos EngineeringPractical Chaos Engineering
Practical Chaos EngineeringSIGHUP
 
What's Next Replay - SpringSource
What's Next Replay - SpringSourceWhat's Next Replay - SpringSource
What's Next Replay - SpringSourceZenikaOuest
 
Linux Desktop Automation
Linux Desktop AutomationLinux Desktop Automation
Linux Desktop AutomationRui Lapa
 

Similar to Building collaborative workflows for scientific data (20)

AllBio and EU CodeFest 2014
AllBio and EU CodeFest 2014AllBio and EU CodeFest 2014
AllBio and EU CodeFest 2014
 
Reproducible, Automated and Portable Computational and Data Science Experimen...
Reproducible, Automated and Portable Computational and Data Science Experimen...Reproducible, Automated and Portable Computational and Data Science Experimen...
Reproducible, Automated and Portable Computational and Data Science Experimen...
 
Getting started with developing Nodejs
Getting started with developing NodejsGetting started with developing Nodejs
Getting started with developing Nodejs
 
SD, a P2P bug tracking system
SD, a P2P bug tracking systemSD, a P2P bug tracking system
SD, a P2P bug tracking system
 
4Developers 2015: Continuous Security in DevOps - Maciej Lasyk
4Developers 2015: Continuous Security in DevOps - Maciej Lasyk4Developers 2015: Continuous Security in DevOps - Maciej Lasyk
4Developers 2015: Continuous Security in DevOps - Maciej Lasyk
 
Continuous Security in DevOps
Continuous Security in DevOpsContinuous Security in DevOps
Continuous Security in DevOps
 
Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker SwarmGenomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
 
Microservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCONMicroservices Application Tracing Standards and Simulators - Adrians at OSCON
Microservices Application Tracing Standards and Simulators - Adrians at OSCON
 
Scaling up development of a modular code base
Scaling up development of a modular code baseScaling up development of a modular code base
Scaling up development of a modular code base
 
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
2017-07-22 Common Workflow Language Viewer
2017-07-22 Common Workflow Language Viewer2017-07-22 Common Workflow Language Viewer
2017-07-22 Common Workflow Language Viewer
 
Through the firewall with miniCRAN
Through the firewall with miniCRANThrough the firewall with miniCRAN
Through the firewall with miniCRAN
 
Continuous delivery w projekcie open source - Marcin Stachniuk
Continuous delivery w projekcie open source - Marcin StachniukContinuous delivery w projekcie open source - Marcin Stachniuk
Continuous delivery w projekcie open source - Marcin Stachniuk
 
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
 
Recent Developments in Free Medical Imaging Software
Recent Developments in Free Medical Imaging SoftwareRecent Developments in Free Medical Imaging Software
Recent Developments in Free Medical Imaging Software
 
HDF-EOS Java Application Programming Interfaces
HDF-EOS Java Application Programming InterfacesHDF-EOS Java Application Programming Interfaces
HDF-EOS Java Application Programming Interfaces
 
Practical Chaos Engineering
Practical Chaos EngineeringPractical Chaos Engineering
Practical Chaos Engineering
 
What's Next Replay - SpringSource
What's Next Replay - SpringSourceWhat's Next Replay - SpringSource
What's Next Replay - SpringSource
 
Linux Desktop Automation
Linux Desktop AutomationLinux Desktop Automation
Linux Desktop Automation
 


Building collaborative workflows for scientific data

  • 2. Bruno Vieira (@bmpvieira) | PhD Student @ Bioinformatics and Population Genomics | Supervisor: Yannick Wurm (@yannick__) | © 2014 Bruno Vieira CC-BY 4.0
  • 16. Code
  • 17. The GitHub for Science... is GitHub!
  • 18. Code as a research output
  • 20. Data
  • 21. Dat: open source tool for sharing and collaborating on data. Started August '13; we are grant funded and 100% open source.
      dat-data.com
      #dat on freenode
      gitter.im/datproject/discussions
      public Dat Community Call #1
  • 23. Dat - "git for data"
      npm install -g dat
      dat init
      collect-data | dat import
      dat listen
  • 25. Dat
      dat clone
      dat pull --live
      dat blobs put mygenome data.fasta
      dat cat | transform
      dat cat | docker run -i transform
      http://eukaryota.dathub.org
  • 26. Dat - Planned
      dat checkout revision
      dat diff
      dat branch
      multi master replication
      sync to databases
      registry
  • 27. Data stored locally in LevelDB, but can use other backends such as Postgres, Redis, etc. Files stored in blob-stores: s3, local-fs, bittorrent, ftp, etc.
  • 28. Dat features: auto schema generation; free REST API; all APIs are streaming
  • 33. Bionode: open source project for modular and universal bioinformatics. Started January '14. bionode.io
  • 34. Some problems I faced during my research:
      Difficulty getting relevant descriptions and datasets from NCBI API using bio* libs
      For web projects, needed to implement the same functionality on browser and server
      Difficulty writing scalable, reproducible and complex bioinformatic pipelines
  • 36. Bionode
      npm install -g bionode
      bionode ncbi download gff bacteria
      bionode ncbi download sra arthropoda | bionode sra fastq-dump
      npm install -g bionode-ncbi
      bionode-ncbi search assembly formicidae | dat import --json
  • 37. Bionode - list of modules
      Name          Type         Status      People
      ncbi          Data access  production
      fasta         Parser       production
      seq           Wrangling    production  IM
      ensembl       Data access  production
      blast-parser  Parser       production
  • 38. Bionode - list of modules
      Name                  Type           Status      People
      template              Documentation  production
      JS pipeline           Documentation  production
      Gasket pipeline       Documentation  production
      Dat/Bionode workshop  Documentation  production
  • 39. Bionode - list of modules
      Name  Type      Status       People
      sra   Wrappers  development
      bwa   Wrappers  development
      sam   Wrappers  development
      bbi   Parser    development
  • 40. Bionode - list of modules (status: request)
      Name      Type         People
      ebi       Data access
      semantic  Data access
      vcf       Parser
      gff       Parser
      bowtie    Wrappers
      sge       Wrappers     badryan
      blast     Wrappers
  • 41. Bionode - list of modules
      Name     Type      People
      vsearch  Wrappers
      khmer    Wrappers
      rsem     Wrappers
      gmap     Wrappers
      star     Wrappers
      go       Wrappers  badryan
  • 42. Bionode - Why wrappers?
      Same interface between modules (Streams and NDJSON)
      Easy installation with NPM
      Semantic versioning
      Add tests
      Abstract complexity / More user friendly
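The shared interface named on this slide, NDJSON, is one JSON document per line. A minimal sketch using only built-in JSON functions follows. The helper names `toNdjson` and `fromNdjson` and the sample records are made up for illustration; they are not part of Bionode or the `ndjson` module.

```javascript
// Hypothetical helpers, for illustration only: not Bionode's or ndjson's API.

// Serialize an array of objects as NDJSON: one JSON document per line.
function toNdjson(records) {
  return records.map((record) => JSON.stringify(record) + '\n').join('')
}

// Parse NDJSON text back into objects, skipping blank lines.
function fromNdjson(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line))
}

// Made-up sample records, not real NCBI results.
const sampleRecords = [
  { uid: '1', organism: 'Acromyrmex echinatior' },
  { uid: '2', organism: 'Solenopsis invicta' }
]
const ndjsonText = toNdjson(sampleRecords)
process.stdout.write(ndjsonText)
console.log(fromNdjson(ndjsonText).length)
```

Because every line is a complete document, a downstream tool can start working on the first record before the last one has been produced, which is what lets the modules be chained with pipes.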
  • 43. Bionode - Why Node.js? Same code client/server side
  • 44. Need to reimplement the same code on browser and server. Solution: JavaScript everywhere
      Afra, bionode-seq, GeneValidator, seq, fasta, SequenceServer
      BioJS collaborating for code reuse
      Biodalliance converting to bionode
  • 45. Bionode - Why Node.js?
  • 46. Reusable, small and tested modules
  • 47. Benefit from other JS projects: Dat, BioJS, NoFlo
  • 51. Difficulty getting relevant description and datasets from NCBI API using bio* libs
      Python example: URL for the Acromyrmex assembly?
      ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000188075.1_Si_gnG
      import xml.etree.ElementTree as ET
      from Bio import Entrez

      Entrez.email = "mail@bmpvieira.com"
      # Search the assembly database, then fetch a summary for each hit
      esearch_handle = Entrez.esearch(db="assembly", term="Acromyrmex")
      esearch_record = Entrez.read(esearch_handle)
      for uid in esearch_record['IdList']:
          esummary_handle = Entrez.esummary(db="assembly", id=uid)
          esummary_record = Entrez.read(esummary_handle)
          documentSummarySet = esummary_record['DocumentSummarySet']
          document = documentSummarySet['DocumentSummary'][0]
          # 'Meta' is an XML fragment; wrap it in a root element so it parses
          # (the element name 'root' is assumed, the original tag was lost)
          metadata_XML = document['Meta']
          metadata = ET.fromstring('<root>' + metadata_XML + '</root>')
          for entry in metadata[1]:
              print(entry.text)
      Solution: bionode-ncbi
  • 52. Difficulty getting relevant description and datasets from NCBI API using bio* libs
      Example: URL for the Acromyrmex assembly?
      http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000204515.1_Aech_3.9/GCA_000204515.1_Aech_3.9_genomic.fna.gz
      JavaScript
      var bio = require('bionode')
      // Callback style
      bio.ncbi.urls('assembly', 'Acromyrmex', function(urls) {
        console.log(urls[0].genomic.fna)
      })
      // Event style
      bio.ncbi.urls('assembly', 'Acromyrmex').on('data', printGenomeURL)
      function printGenomeURL(urls) {
        console.log(urls[0].genomic.fna)
      }
  • 53. Difficulty getting relevant description and datasets from NCBI API using bio* libs
      Example: URL for the Acromyrmex assembly?
      http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000204515.1_Aech_3.9/GCA_000204515.1_Aech_3.9_genomic.fna.gz
      JavaScript
      var ncbi = require('bionode-ncbi')
      var ndjson = require('ndjson')
      ncbi.urls('assembly', 'Acromyrmex')
        .pipe(ndjson.stringify())
        .pipe(process.stdout)
      BASH
      bionode-ncbi urls assembly Acromyrmex | tool-stream extractProperty genomic.fna
  • 54. Difficulty writing scalable, reproducible and complex bioinformatic pipelines. Solution: Node.js Streams everywhere
      var ncbi = require('bionode-ncbi')
      var tool = require('tool-stream')
      var through = require('through2')
      var fork1 = through.obj()
      var fork2 = through.obj()
  • 55. Difficulty writing scalable, reproducible and complex bioinformatic pipelines. Solution: Node.js Streams everywhere
      ncbi
        .search('sra', 'Solenopsis invicta')
        .pipe(fork1)
        .pipe(dat.reads)
      fork1
        .pipe(tool.extractProperty('expxml.Biosample.id'))
        .pipe(ncbi.search('biosample'))
        .pipe(dat.samples)
      fork1
        .pipe(tool.extractProperty('uid'))
        .pipe(ncbi.link('sra', 'pubmed'))
        .pipe(ncbi.search('pubmed'))
        .pipe(fork2)
        .pipe(dat.papers)
  • 58. Difficulty writing scalable, reproducible and complex bioinformatic pipelines.
      bionode-ncbi search genome Guillardia theta |
        tool-stream extractProperty assemblyid |
        bionode-ncbi download assembly |
        tool-stream collectMatch status completed |
        tool-stream extractProperty uid |
        bionode-ncbi link assembly bioproject |
        tool-stream extractProperty destUID |
        bionode-ncbi link bioproject sra |
        tool-stream extractProperty destUID |
        bionode-ncbi download sra |
        bionode-sra fastq-dump |
        tool-stream extractProperty destFile |
        bionode-bwa mem 503988/GCA_000315625.1_Guith1_genomic.fna.gz |
        tool-stream collectMatch status finished |
        tool-stream extractProperty sam |
        bionode-sam
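Most stages in the pipeline above pull one field out of each JSON record and pass it on. The sketch below shows that idea for a dotted path such as `expxml.Biosample.id`. The function is a hypothetical stand-in written for this note, not tool-stream's actual code, and the sample record is invented.

```javascript
// Hypothetical stand-in for a dotted-path lookup; not tool-stream's implementation.
function extractProperty(record, path) {
  // Walk one key at a time and stop early if an intermediate value is missing.
  return path.split('.').reduce(
    (value, key) => (value == null ? undefined : value[key]),
    record
  )
}

// Invented record shaped like the nested metadata the slides refer to.
const sraRecord = { uid: '123', expxml: { Biosample: { id: 'EXAMPLE-ID' } } }

console.log(extractProperty(sraRecord, 'uid'))
console.log(extractProperty(sraRecord, 'expxml.Biosample.id'))
console.log(extractProperty(sraRecord, 'expxml.Missing.id'))
```

Returning `undefined` for a missing path, rather than throwing, lets a streaming stage skip incomplete records without stopping the whole pipeline; whether the real tool behaves this way is not stated in the slides.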
  • 59. Difficulty writing scalable, reproducible and complex bioinformatic pipelines. bionode-example-dat-gasket get-dat workshop get-dat bionode gasket example
  • 60. Difficulty writing scalable, reproducible and complex bioinformatic pipelines.
      {
        "import-data": [
          "bionode-ncbi search genome eukaryota",
          "dat import --json --primary=uid"
        ],
        "search-ncbi": [
          "dat cat",
          "grep Guillardia",
          "tool-stream extractProperty assemblyid",
          "bionode-ncbi download assembly -",
          "tool-stream collectMatch status completed",
          "tool-stream extractProperty uid",
          "bionode-ncbi link assembly bioproject -",
          "tool-stream extractProperty destUID",
          "bionode-ncbi link bioproject sra -",
          "tool-stream extractProperty destUID",
          "grep 35526",
          "bionode-ncbi download sra -",
          "tool-stream collectMatch status completed",
          "tee > metadata.json"
        ],
  • 61. Difficulty writing scalable, reproducible and complex bioinformatic pipelines.
        "index-and-align": [
          "cat metadata.json",
          "bionode-sra fastq-dump -",
          "tool-stream extractProperty destFile",
          "bionode-bwa mem **/*fna.gz"
        ],
        "convert-to-bam": [
          "bionode-sam 35526/SRR070675.sam"
        ]
      }
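The configuration on slides 60 and 61 lists named pipelines as arrays of commands. Assuming each array is a chain in which every command's output is piped into the next (an assumption drawn from the slide layout, not from the gasket documentation), the equivalent shell line can be sketched as below. `toShellPipeline` is a hypothetical helper, not part of gasket.

```javascript
// Hypothetical helper: renders one named pipeline as a single shell command line.
// Assumption: entries in the array are chained with pipes, in order.
function toShellPipeline(config, name) {
  const commands = config[name]
  if (!Array.isArray(commands)) {
    throw new Error('Unknown pipeline: ' + name)
  }
  return commands.join(' | ')
}

// Two of the pipelines from the slides, copied as plain data.
const pipelineConfig = {
  'index-and-align': [
    'cat metadata.json',
    'bionode-sra fastq-dump -',
    'tool-stream extractProperty destFile',
    'bionode-bwa mem **/*fna.gz'
  ],
  'convert-to-bam': ['bionode-sam 35526/SRR070675.sam']
}

console.log(toShellPipeline(pipelineConfig, 'index-and-align'))
```

Keeping the steps as data rather than as one long shell line makes each stage easy to reorder, comment on, or version alongside the analysis.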
  • 62. Difficulty writing scalable, reproducible and complex bioinformatic pipelines.
      datscript
      pipeline main
        run pipeline import
      pipeline import
        run foobar | run dat import --json
      bmpvieira example
      ekg example
  • 66. Bionode - Modular and universal bioinformatics. Pipeable UNIX command line tools and JavaScript / Node.js APIs for bioinformatic analysis workflows on the server and browser.
      Bionode.io
      #bionode
      gitter.im/bionode/bionode
      Dat - Build data pipelines. Provides a streaming interface between every file format and data storage backend. "git for data"
      Dat-data.com
      #dat
      gitter.im/datproject/discussions
  • 68. Thanks! "Science should work as an Open Source project" dat-data.com bionode.io