
bionode.io

Bioinformatics London, Jan 2016


  1. BIONODE.IO: Modular and universal bioinformatics (bmpvieira.com/BioinformLon16)
  2. bionode.io: "Modular and universal bioinformatics"
     • Streamable UNIX command line tools
     • JavaScript / Node.js API
     • Bioinformatics on the server and browser
     Core team | Community | Collaborations: dat-data.com, biojs.net
     http://try.bionode.io
  3. Bruno Vieira @bmpvieira, PhD student @ wurmlab.github.io
     Bioinformatics and Population Genomics: eusociality and effective population size in insects and rodents
     Supervisor: Yannick Wurm @yannick__
     Panel: Richard Nichols @qmwugbt112, Conrad Bessant @conradbessant
  4-7. Some problems I faced during my research:
     • Rewriting code in JavaScript for Web Apps
     • Difficulty getting data from the NCBI API
     • Difficulty writing bioinformatics pipelines
  8. Reusable, small & tested
  9. Bionode - Why Node.js? Same code client/server side
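     A sketch of how "same code client/server side" plays out, assuming the browserify bundler (file names here are made up for illustration):

     // search.js - runs unchanged under Node.js on the server
     var ncbi = require('bionode-ncbi')
     ncbi.search('genome', 'spiders')
       .on('data', function (doc) { console.log(doc.organism_name) })

     // To run the same file in a browser, bundle it first, e.g.:
     //   browserify search.js -o bundle.js
     // and load bundle.js from an HTML page.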
  10. Difficulty getting relevant description and datasets from NCBI API using bio* libs
  11-26. Python example: URL for the Acromyrmex assembly?
     ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000188075.1_Si_gnG

     import xml.etree.ElementTree as ET
     from Bio import Entrez

     Entrez.email = "mail@bmpvieira.com"
     esearch_handle = Entrez.esearch(db="assembly", term="Acromyrmex")
     esearch_record = Entrez.read(esearch_handle)
     for id in esearch_record['IdList']:
         esummary_handle = Entrez.esummary(db="assembly", id=id)
         esummary_record = Entrez.read(esummary_handle)
         documentSummarySet = esummary_record['DocumentSummarySet']
         document = documentSummarySet['DocumentSummary'][0]
         metadata_XML = document['Meta'].encode('utf-8')
         metadata = ET.fromstring('<root>' + metadata_XML + '</root>')
         for entry in metadata[1]:
             print entry.text

     Solution: bionode-ncbi
  27-32. Better way with Bionode - 4 approaches

     BASH:
     bionode ncbi urls assembly Acromyrmex | json genomic.fna

     JavaScript:
     var bio = require('bionode')

     // Callback pattern
     bio.ncbi.urls('assembly', 'Acromyrmex', function (urls) {
       console.log(urls[0].genomic.fna)
     })

     // Event pattern
     bio.ncbi.urls('assembly', 'Acromyrmex')
       .on('data', printGenome)
     function printGenome (url) {
       console.log(url.genomic.fna)
     }

     // Pipe pattern
     var tool = require('tool-stream')
     bio.ncbi.urls('assembly', 'Acromyrmex')
       .pipe(tool.extractProperty('genomic.fna'))
       .pipe(process.stdout)
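     The pipe pattern also works with the standalone module instead of the bionode meta-package; a minimal sketch, assuming bionode-ncbi and tool-stream are installed from npm:

     var ncbi = require('bionode-ncbi')
     var tool = require('tool-stream')

     ncbi.urls('assembly', 'Acromyrmex')          // object stream of URL records
       .pipe(tool.extractProperty('genomic.fna')) // keep only the FASTA URL
       .pipe(process.stdout)                      // print it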
  33. Complex pipelines with forks

     ncbi.search('sra', 'Solenopsis invicta')
       .pipe(fork1)
       .pipe(dat.reads)

     fork1
       .pipe(tool.extractProperty('expxml.Biosample.id'))
       .pipe(ncbi.search('biosample'))
       .pipe(dat.samples)

     fork1
       .pipe(tool.extractProperty('uid'))
       .pipe(ncbi.link('sra', 'pubmed'))
       .pipe(ncbi.search('pubmed'))
       .pipe(fork2)
       .pipe(dat.papers)
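     The slide leaves the forks undefined; a minimal sketch of how they could be created, assuming Node's built-in stream module (ncbi, tool, dat and fork2 come from the slide and are not defined here):

     // A fork is just a stream that several pipelines read from. A PassThrough
     // in object mode can be piped into once and then piped out to many
     // destinations, which is all the slide's fork1/fork2 need to be.
     var PassThrough = require('stream').PassThrough
     var fork1 = new PassThrough({ objectMode: true })
     var fork2 = new PassThrough({ objectMode: true })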
  34. noflo
  35. Streams vs OOP (image: http://substack.net/doc/nodeconf_2012/images/streams.png)
     Bionode: "Modular and universal bioinformatics"
     BioJS: "Represent biological data on the web"
  36. Install Node.js and Bionode
     # Mac
     brew install n
     n stable
     # Ubuntu
     sudo apt-get install npm
     npm install -g n
     n stable
     # Windows
     Go to http://nodejs.org
     Install bionode and json parser:
     npm install -g bionode-ncbi bionode-fasta json
     Online: try.bionode.io | bit.ly/try-dat
  37. bionode-ncbi search genome spiders
     bionode-ncbi search genome spiders | wc
     bionode-ncbi search genome spiders | head -n 1 | json
     bionode-ncbi search genome spiders | json -ga organism_name
     bionode-ncbi search genome spiders | json -ga uid | bionode-ncbi link genome pubmed - | json -ga destUID | bionode-ncbi search pubmed - | json -ga title
     bionode-ncbi download assembly Guillardia theta | json -ga -c 'this.status === "completed"' | json -ga path | bionode-fasta -f | json -ga -c 'this.seq.length > 10000' | bionode-fasta --write > gtheta-big-scaffolds.fasta
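     As a rough illustration, the fifth shell pipeline above could also be written with the JavaScript API; a sketch, assuming bionode-ncbi and the tool-stream module used elsewhere in this deck:

     var ncbi = require('bionode-ncbi')
     var tool = require('tool-stream')

     // genome hits for "spiders" -> linked PubMed records -> paper titles
     ncbi.search('genome', 'spiders')
       .pipe(tool.extractProperty('uid'))
       .pipe(ncbi.link('genome', 'pubmed'))
       .pipe(tool.extractProperty('destUID'))
       .pipe(ncbi.search('pubmed'))
       .pipe(tool.extractProperty('title'))
       .pipe(process.stdout)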
  38. How to write a Stream?

     var through = require('through2')
     var stream = through.obj(transform)

     function transform (obj, enc, next) {
       // do things, example:
       obj.name = obj.name.toUpperCase()
       // Push downstream
       this.push(obj)
       // Callback to fetch next object
       next()
     }
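     A quick usage sketch for the transform above (the object names are made up for illustration):

     // Write a couple of objects in, read the uppercased names back out
     stream.on('data', function (obj) { console.log(obj.name) })
     stream.write({ name: 'acromyrmex' })   // prints ACROMYRMEX
     stream.write({ name: 'solenopsis' })   // prints SOLENOPSIS
     stream.end()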
  39. Bionode - list of modules
     Name          Type         Status
     ncbi          Data access  production
     fasta         Parser       production
     seq           Wrangling    production
     ensembl       Data access  production
     blast-parser  Parser       production
  40. Bionode - list of modules
     Name                  Type           Status
     template              Documentation  production
     JS pipeline           Documentation  production
     Gasket pipeline       Documentation  production
     Dat/Bionode workshop  Documentation  production
  41. Bionode - list of modules
     Name  Type      Status
     sra   Wrappers  development
     bwa   Wrappers  development
     sam   Wrappers  development
     bbi   Parser    development
  42. Bionode - list of modules
     Name      Type         Status
     ebi       Data access  request
     semantic  Data access  request
     vcf       Parser       request
     gff       Parser       request
     bowtie    Wrappers     request
     sge       Wrappers     request
     blast     Wrappers     request
     People: badryan
  43. Bionode - list of modules
     Name     Type
     vsearch  Wrappers
     khmer    Wrappers
     rsem     Wrappers
     gmap     Wrappers
     star     Wrappers
     go       Wrappers
     People: badryan
  44. Bionode - Why wrappers?
     • Same interface between modules (Streams and NDJSON)
     • Easy installation with NPM
     • Semantic versioning
     • Add tests
     • Abstract complexity / More user friendly
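     A rough sketch of the wrapper idea: run an existing command-line tool as a child process and expose its output as a Node.js stream so it composes with the other modules. The command name and arguments below are illustrative, not the real bionode wrapper APIs:

     var spawn = require('child_process').spawn

     // wrap('bwa', ['mem', 'ref.fa', 'reads.fq']) would stream the tool's
     // stdout so it can be piped like any other bionode module
     function wrap (cmd, args) {
       var child = spawn(cmd, args)
       child.on('error', function (err) { console.error(err) })
       return child.stdout
     }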
  45. Biohackathon 2015 (Nagasaki, Japan)
  46-54. Bionode should do (IMHO): Modular, WORE, scalable Streams
     • Implement new tools
     • Access online data
     • Parse file formats (mostly browser)
     • New algorithms / analysis
     • Useful for the web
     • Build reactive pipelines
     • Take advantage of Node.js Streams / Events (sketched below)
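     A small sketch of what "reactive" means here: downstream code reacts to stream events as records arrive instead of waiting for a whole file (assumes bionode-ncbi; the handlers are illustrative):

     var ncbi = require('bionode-ncbi')

     ncbi.search('assembly', 'Acromyrmex')
       .on('data', function (doc) { console.log(doc.uid) })  // react per record
       .on('error', function (err) { console.error(err) })   // propagate failures
       .on('end', function () { console.log('done') })       // pipeline finished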
  55-58. Bionode shouldn't do (IMHO):
     • Package management (Guix, Linuxbrew)
     • Environment (Docker, HyperOS / Dat, CWL)
     • Bioinformatics tools standardization and wrapping for (nd)JSON / Protobufs I/O (CWL, Bioboxes)
  59. bmpvieira.com/BioinformLon16
     Website: bionode.io
     Development: waffle.io/bionode/bionode
     Chat: gitter.im/bionode/bionode
     Wurmlab: wurmlab.github.io
     Thank you for listening!
  60. Acknowledgements: BioinformLon meetup, Bionode contributors, wurmlab.github.io
