Biothings APIs: high-performance bioentity-centric web services

High-performance web service APIs for gene and genetic variant annotations (MyGene.info and MyVariant.info), and an SDK for building the same kind of high-performance API for other biomedical data types ("biothings").

  1. Biothings APIs: high-performance bioentity-centric web services. Chunlei Wu, Ph.D. (cwu@scripps.edu, @chunleiwu), Associate Professor of Molecular Medicine, Dept. of Molecular Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA. 07/2016
  2. BioThings APIs. Objective: building unified APIs for “Bio-things” (biological entities).
  3. Biological knowledge is a complex network: no one-size-fits-all database can capture the entire knowledge space.
  4. Typical database representations of a record such as { _id: 1017, name: CDK2, taxid: 9606 }:
     • Relational database: tables
     • Document database: JSON objects
     • RDF triplestore: triples
     • Key-value store: key-value pairs
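To make the comparison concrete, the sketch below expresses the slide's example record in each of the four shapes using plain Python data structures. The `gene:` subject prefix and the `id:field` key format are made-up conventions for illustration, not the storage format of any particular database.

```python
import json

# The example record from the slide.
record = {"_id": 1017, "name": "CDK2", "taxid": 9606}

# Relational database: one row in a table with fixed columns.
columns = ("_id", "name", "taxid")
row = tuple(record[c] for c in columns)

# Document database: the whole object stored as one JSON document.
document = json.dumps(record)

# RDF triplestore: one (subject, predicate, object) triple per field.
# The "gene:" prefix is a hypothetical naming convention.
subject = f"gene:{record['_id']}"
triples = [(subject, key, value) for key, value in record.items() if key != "_id"]

# Key-value store: one flat "id:field" -> value pair per field (hypothetical key format).
kv_pairs = {f"{record['_id']}:{key}": value for key, value in record.items() if key != "_id"}

print(row)        # (1017, 'CDK2', 9606)
print(document)   # {"_id": 1017, "name": "CDK2", "taxid": 9606}
print(triples)    # [('gene:1017', 'name', 'CDK2'), ('gene:1017', 'taxid', 9606)]
print(kv_pairs)   # {'1017:name': 'CDK2', '1017:taxid': 9606}
```

Only the document form keeps the record as one self-describing object, which is the property the next slide gives as the reason for choosing document databases.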
  5. BioThings APIs are built on document databases. Why we picked document databases:
     • Object representation
     • Rich data structures that handle heterogeneous data very well
     • Atomic operations, built for big-data scale
  6. Gene and variant annotations represented as JSON documents:
     • Variant: { "_id": "chr1:g.196659237C>T", "cosmic": { "chrom": "1", "hg19": { "start": 196659237, "end": 196659237 }, "ref": "C", "alt": "T", "tumor_site": "breast", "mut_freq": 0.49, "mut_nt": "C>T", "cosmic_id": "COSM424915" } }
     • Gene: { "_id": "1017", "Symbol": "CDK2", "Ensembl": "ENSG00000123374", "RefSeq": [ "NM_001798", "NM_052827" ], "Reporter": { "U95A": [ "1792_g_at", "1833_at" ], "U133A": [ "211804_s_at", "2045252_at", "211803_at" ] } }
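Because each source's fields sit under their own key, a client can reach any annotation by a dotted path such as `cosmic.hg19.start`. The helper below, `get_path`, is a hypothetical sketch (not part of either service's client libraries) that walks such a path through the parsed documents from the slide.

```python
import json

# The variant document from the slide, with straight quotes and all braces closed.
variant = json.loads("""
{"_id": "chr1:g.196659237C>T",
 "cosmic": {"chrom": "1",
            "hg19": {"start": 196659237, "end": 196659237},
            "ref": "C", "alt": "T", "tumor_site": "breast",
            "mut_freq": 0.49, "mut_nt": "C>T", "cosmic_id": "COSM424915"}}
""")

# An excerpt of the gene document from the slide.
gene = {"_id": "1017", "Symbol": "CDK2", "RefSeq": ["NM_001798", "NM_052827"]}


def get_path(doc, dotted_path, default=None):
    """Follow a dotted path such as "cosmic.hg19.start" through nested dicts.

    Hypothetical helper for illustration; returns `default` when any key is missing.
    """
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


print(get_path(variant, "cosmic.hg19.start"))  # 196659237
print(get_path(variant, "cosmic.cosmic_id"))   # COSM424915
print(get_path(gene, "RefSeq"))                # ['NM_001798', 'NM_052827']
print(get_path(variant, "dbsnp.rsid"))         # None: no dbsnp section here
```

Returning a default instead of raising suits heterogeneous documents, where many records simply lack a given source.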
  7. Keep data always up-to-date: each data source is updated individually, on its own schedule. [Figure: schematic view of MyVariant.info architecture; colors indicate the different update schedules of the data sources.]
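Updating each source individually can be sketched as follows: every source carries its own update interval, and only sources whose interval has elapsed are rebuilt. The source names and intervals below are hypothetical placeholders, not the real MyVariant.info schedule.

```python
from datetime import date, timedelta

# Hypothetical per-source update intervals (illustrative values only).
UPDATE_INTERVALS = {
    "dbsnp": timedelta(days=30),
    "clinvar": timedelta(days=7),
    "cosmic": timedelta(days=90),
}


def sources_due(last_updated, today):
    """Return the names (sorted) of sources whose own interval has elapsed."""
    return sorted(
        name
        for name, interval in UPDATE_INTERVALS.items()
        if today - last_updated[name] >= interval
    )


last_updated = {
    "dbsnp": date(2016, 6, 1),
    "clinvar": date(2016, 6, 25),
    "cosmic": date(2016, 5, 1),
}
print(sources_due(last_updated, today=date(2016, 7, 1)))  # ['dbsnp']
```

Keeping the schedule per source means a slow, large source never holds back a small one that changes weekly.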
  8. High-performance web service APIs. [Figure: schematic view of MyVariant.info architecture]
  9. MyGene.info (genes) + MyVariant.info (variants):
     • MyGene.info: /v2/gene/<geneid> and /v2/query?q=<query>; /v3/gene/<geneid> and /v3/query?q=<query>
     • MyVariant.info: /v1/variant/<hgvsid> and /v1/query?q=<query>
     • Single queries use GET; batch queries use POST.
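The GET/POST split can be sketched with the standard library. The requests are only built here, not sent; the v3 MyGene.info paths from the slide are used (the v2 paths have the same shape). The `ids` form parameter for batch POSTs is an assumption based on the public MyGene.info documentation and should be checked against the current docs.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

MYGENE = "https://mygene.info/v3"
MYVARIANT = "https://myvariant.info/v1"


def get_gene(gene_id):
    """Single query: one GET per gene ID on the annotation endpoint."""
    return Request(f"{MYGENE}/gene/{quote(str(gene_id))}", method="GET")


def get_genes_batch(gene_ids):
    """Batch query: many gene IDs in one form-encoded POST body ("ids" is assumed)."""
    body = urlencode({"ids": ",".join(str(g) for g in gene_ids)}).encode("ascii")
    return Request(f"{MYGENE}/gene", data=body, method="POST")


def get_variant(hgvs_id):
    """Single variant query; '>' in the HGVS ID is percent-encoded, ':' is kept."""
    return Request(f"{MYVARIANT}/variant/{quote(hgvs_id, safe=':')}", method="GET")


def query(base_url, q):
    """Query endpoint: the query string goes in the q parameter."""
    return Request(f"{base_url}/query?{urlencode({'q': q})}", method="GET")


# Nothing is sent here; pass a Request to urllib.request.urlopen() to execute it.
print(get_gene(1017).full_url)                      # https://mygene.info/v3/gene/1017
print(get_variant("chr1:g.196659237C>T").full_url)  # https://myvariant.info/v1/variant/chr1:g.196659237C%3ET
print(get_genes_batch([1017, 1018]).data)           # b'ids=1017%2C1018'
```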
  10. We focus on building APIs. Try to …
  11. Make it really easy to use:
     • Just two endpoints
     • No registration/sign-in
     • No API key
  12. Developer-friendly:
     • Python/R clients (plus a JavaScript client for MyVariant.info); search “mygene” and “myvariant” on PyPI and Bioconductor
     • Supported: JSONP, CORS, HTTPS, msgpack, HTTP compression, HTTP caching, JSON-LD
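HTTP compression is standard content negotiation, so any client can take advantage of it. This standard-library sketch sends `Accept-Encoding: gzip` and decodes the body according to the response's `Content-Encoding`; the offline demo fakes a compressed response rather than calling the live service. `urllib` does not decompress responses automatically, which is why `decode_body` does it explicitly.

```python
import gzip
import json
from urllib.request import Request


def build_request(url):
    # Advertise gzip support; a server with HTTP compression enabled may then
    # reply with a compressed body and a "Content-Encoding: gzip" header.
    return Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(raw, content_encoding):
    """Decompress the body if the server gzipped it, then parse the JSON."""
    if content_encoding == "gzip":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


# Offline demo: fake a gzip-compressed JSON response body.
payload = json.dumps({"_id": "1017", "symbol": "CDK2"}).encode("utf-8")
print(decode_body(gzip.compress(payload), "gzip"))  # {'_id': '1017', 'symbol': 'CDK2'}
print(decode_body(payload, None))                   # {'_id': '1017', 'symbol': 'CDK2'}
```

Against a live endpoint, one would open `build_request(url)` with `urllib.request.urlopen` and pass `resp.read()` and `resp.headers.get("Content-Encoding")` to `decode_body`.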
  13. Aggregate everything about genes and variants:
     • MyGene.info: >15M genes for ~17K species; ~200 annotation fields
     • MyVariant.info: >334M variants; ~500 annotation fields from 14 sources: ClinVar, dbNSFP, dbSNP, …
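Aggregating many sources into one document per entity can be sketched as a merge keyed on `_id`, with each source's fields nested under the source's name (as the `cosmic` key is in the variant document on slide 6). This is a simplified illustration, not the BioThings aggregation code; `example_source` and its fields are hypothetical.

```python
from collections import defaultdict


def aggregate(sources):
    """Merge per-source records into one document per _id.

    `sources` maps a source name to a list of records, each with an "_id".
    Each record's remaining fields are nested under the source name. Sketch
    only: conflicts are ignored, and a later record for the same _id within
    one source simply overwrites an earlier one.
    """
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            doc = merged[record["_id"]]
            doc["_id"] = record["_id"]
            doc[source_name] = {k: v for k, v in record.items() if k != "_id"}
    return dict(merged)


sources = {
    # Values taken from the variant document on slide 6.
    "cosmic": [{"_id": "chr1:g.196659237C>T", "cosmic_id": "COSM424915", "tumor_site": "breast"}],
    # Hypothetical second source with made-up fields and a made-up second variant.
    "example_source": [
        {"_id": "chr1:g.196659237C>T", "example_score": 0.5},
        {"_id": "chrX:g.100A>G", "example_score": 0.1},
    ],
}

docs = aggregate(sources)
print(sorted(docs))
print(docs["chr1:g.196659237C>T"])
```

Nesting by source name keeps provenance explicit and lets each source be replaced independently when it is re-imported.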
  14. Keep up-to-date (same coverage as slide 13):
     • MyGene.info: updated weekly
     • MyVariant.info: updated monthly
  15. High-performance and scalable: >95% of queries return a response in <30 ms.
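A claim like ">95% of queries respond in <30 ms" describes the latency distribution: the share of requests under the threshold, or equivalently the 95th-percentile latency. The sketch computes both on a made-up sample of 20 response times, using the nearest-rank percentile definition (one of several common definitions).

```python
import math


def fraction_under(latencies_ms, threshold_ms):
    """Share of requests answered in less than threshold_ms."""
    return sum(1 for t in latencies_ms if t < threshold_ms) / len(latencies_ms)


def percentile(latencies_ms, pct):
    """Nearest-rank percentile, for pct in (0, 100]."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical sample of 20 response times in milliseconds.
sample = [8, 9, 10, 11, 12, 12, 13, 14, 15, 15, 16, 17, 18, 19, 20, 21, 22, 24, 27, 45]
print(fraction_under(sample, 30))  # 0.95
print(percentile(sample, 95))      # 27
```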
  16. High-performance and scalable
  17. High availability:
     • MyGene.info: 99.999% over the last year
     • MyVariant.info: 99.87% over the last 6 months
     Availability tracked by [monitoring-service logo not reproduced]
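To put these availability figures in perspective, the downtime they allow can be worked out directly. The sketch treats a year as 365 days and six months as 182.5 days.

```python
from datetime import timedelta


def allowed_downtime(availability_pct, period):
    """Downtime implied by an availability percentage over a period."""
    return period * (1 - availability_pct / 100)


one_year = timedelta(days=365)
six_months = timedelta(days=182.5)  # approximation of half a year

print(allowed_downtime(99.999, one_year))   # about 5.3 minutes
print(allowed_downtime(99.87, six_months))  # about 5.7 hours
```

So 99.999% over a year corresponds to roughly five minutes of total downtime, and 99.87% over six months to under six hours.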
  18. Who is using it? Live applications: MinePath.org, Gene Wiki, JBrowse
  19. Who is using it? Many users call the APIs in their daily analysis pipelines, or simply cache the annotations locally.
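Caching annotations locally can be as simple as a file per ID. In this hypothetical sketch, `fetch` stands in for whatever call retrieves a document (for example, a GET on the gene endpoint); the demo uses a fake fetch function so it runs offline.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_fetch(key, fetch, cache_dir):
    """Return JSON for `key` from a local file cache, calling `fetch(key)` only on a miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the key so IDs like "chr1:g.196659237C>T" become safe file names.
    path = cache_dir / (hashlib.sha256(key.encode("utf-8")).hexdigest() + ".json")
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch(key)
    path.write_text(json.dumps(data), encoding="utf-8")
    return data


# Offline demo with a stand-in fetch function that records every call.
calls = []


def fake_fetch(key):
    calls.append(key)
    return {"_id": key}


with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch("1017", fake_fetch, tmp)
    second = cached_fetch("1017", fake_fetch, tmp)  # served from disk

print(first == second, calls)  # True ['1017']
```

A cache like this never expires entries; since the data sources are refreshed weekly or monthly, a real pipeline would want to re-fetch on a similar schedule.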
  20. MyGene.info recent usage stats (over 40M requests in six months):
     Month     Requests   Unique IPs
     Jan-16   3,885,192        2,498
     Feb-16   5,313,950        2,786
     Mar-16   3,362,354        3,121
     Apr-16  10,918,104        3,065
     May-16  10,776,858        3,803
     Jun-16   6,396,148        3,940
     Breakdown: 39% direct calls, 38% mygene.py, 14% mygene.R, 9% BioGPS
  21. MyVariant.info recent usage stats (~4.5M requests in six months):
     Month     Requests   Unique IPs
     Jan-16      83,519        1,330
     Feb-16   3,054,191        1,192
     Mar-16     272,424        1,771
     Apr-16     701,526        1,500
     May-16      89,642        1,891
     Jun-16     213,767        1,924
     Breakdown: 21% direct calls, 23% myvariant.py, 50% myvariant.R, 6% myvariant.js
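The six-month totals on slides 20 and 21 can be checked by summing the monthly counts; the numbers below are copied from those two slides.

```python
# Monthly request counts copied from slides 20 and 21 (Jan-Jun 2016).
mygene = {
    "Jan-16": 3_885_192, "Feb-16": 5_313_950, "Mar-16": 3_362_354,
    "Apr-16": 10_918_104, "May-16": 10_776_858, "Jun-16": 6_396_148,
}
myvariant = {
    "Jan-16": 83_519, "Feb-16": 3_054_191, "Mar-16": 272_424,
    "Apr-16": 701_526, "May-16": 89_642, "Jun-16": 213_767,
}

for name, counts in (("MyGene.info", mygene), ("MyVariant.info", myvariant)):
    total = sum(counts.values())
    busiest = max(counts, key=counts.get)
    print(f"{name}: {total:,} requests; busiest month {busiest}")
# MyGene.info: 40,652,606 requests; busiest month Apr-16
# MyVariant.info: 4,415,069 requests; busiest month Feb-16
```

The MyGene.info total confirms "over 40M"; the MyVariant.info months sum to about 4.4M, close to the slide's ~4.5M figure.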
  22. Generalized BioThings SDK: the BioThings SDK generalizes the building blocks of MyVariant.info and MyGene.info:
     • JSON data aggregation mechanism
     • High-performance query engine
     • Well-designed REST API pattern
     • JSON-LD-enabled Linked Data
     • Data-updating scheduler
     • Python/R clients
     • …
  23. BioThings SDK: a tutorial is available here (more docs are coming): http://biothingsapi.readthedocs.io/en/latest/
  24. APIs built with the BioThings SDK:
     • g.biothings.io (gene): alias to MyGene.info
     • v.biothings.io (variant): alias to MyVariant.info
     • s.biothings.io: species/taxonomy
  25. BioThings API for species/taxonomy. Request: http://s.biothings.io/v1/species/9606?include_children=true
     Response: { "_id": "9606", "_version": 1, "authority": [ "homo sapiens linnaeus, 1758" ], "children": [ 63221, 741158 ], "common_name": "man", "genbank_common_name": "human", "has_gene": true, "lineage": [ 9606, 9605, 207598, …, 131567, 1 ], "parent_taxid": 9605, "rank": "species", "scientific_name": "homo sapiens", "taxid": 9606, "uniprot_name": "homo sapiens" }
  26. BioThings API for species/taxonomy. Query: http://s.biothings.io/v1/query?q=rank:phylum AND common_name:gram-positive
     Response: { "hits": [ { "_id": "1239", "_score": 10.971453, "common_name": […], "genbank_common_name": "gram-positive bacteria", "has_gene": false, "lineage": [ 1239, 1783272, 2, 131567, 1 ], "parent_taxid": 1783272, "rank": "phylum", "scientific_name": "firmicutes", "taxid": 1239, "uniprot_name": "firmicutes" } ], "max_score": 10.971453, "took": 12, "total": 1 }
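The two request forms on slides 25 and 26 can be built with proper URL encoding (the query's spaces and colons must be percent-encoded before sending), and the response parsed like any JSON. The host name is taken from the 2016 slides; the sample response is the one shown above, with the elided `common_name` list omitted.

```python
import json
from urllib.parse import urlencode

BASE = "http://s.biothings.io/v1"


def species_url(taxid, include_children=False):
    """Annotation endpoint for one taxid, optionally asking for its children."""
    params = "?include_children=true" if include_children else ""
    return f"{BASE}/species/{taxid}{params}"


def query_url(q):
    """Query endpoint; spaces and ':' in q are percent-encoded."""
    return f"{BASE}/query?{urlencode({'q': q})}"


# Response body from slide 26, with the elided "common_name" list omitted.
response = json.loads("""{"hits": [{"_id": "1239", "_score": 10.971453,
  "genbank_common_name": "gram-positive bacteria", "has_gene": false,
  "lineage": [1239, 1783272, 2, 131567, 1], "parent_taxid": 1783272,
  "rank": "phylum", "scientific_name": "firmicutes", "taxid": 1239,
  "uniprot_name": "firmicutes"}], "max_score": 10.971453, "took": 12, "total": 1}""")

print(query_url("rank:phylum AND common_name:gram-positive"))
for hit in response["hits"]:
    print(hit["taxid"], hit["scientific_name"], hit["rank"])  # 1239 firmicutes phylum
```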
  27. Species API used in MyGene.info: you can now query for genes across a whole taxonomic group, not just a single species. Q: Give me all lytic enzymes for any Firmicutes.
     • http://mygene.info/v2/query?q=lytic enzyme&species=1239 → 0 hits
     • http://mygene.info/v2/query?q=lytic enzyme&species=1239&include_tax_tree=true → 5 hits
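The jump from 0 to 5 hits follows from the data model: Firmicutes (taxid 1239) is a phylum with `has_gene: false`, so genes are attached only to species below it. The sketch expands the requested taxid into its whole subtree before filtering. It illustrates the idea only and is not MyGene.info's implementation; apart from 1239, the taxids and gene names are made up.

```python
from collections import deque

# Toy "children" map in the spirit of the species documents' "children" field.
# 1239 is Firmicutes (from the species slides); the other taxids are made up.
CHILDREN = {
    1239: [900001, 900002],
    900001: [900003],
    900002: [900004],
}


def tax_tree(root):
    """Return root plus all of its descendant taxids, breadth-first."""
    found, queue = [], deque([root])
    while queue:
        taxid = queue.popleft()
        found.append(taxid)
        queue.extend(CHILDREN.get(taxid, []))
    return found


# Hypothetical gene hits, each tagged with the taxid of its species.
GENES = [("geneA", 900003), ("geneB", 900004), ("geneC", 9606)]


def genes_for(species, include_tax_tree=False):
    taxids = set(tax_tree(species)) if include_tax_tree else {species}
    return [name for name, taxid in GENES if taxid in taxids]


print(genes_for(1239))                         # []
print(genes_for(1239, include_tax_tree=True))  # ['geneA', 'geneB']
print(tax_tree(1239))                          # [1239, 900001, 900002, 900003, 900004]
```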
  28. Very minimal code for building a species API
  29. Have the flexibility to customize your query
  30. The BioThings family built on the BioThings SDK:
     • g.biothings.io (gene): alias to MyGene.info
     • v.biothings.io (variant): alias to MyVariant.info
     • s.biothings.io: species/taxonomy
     • c.biothings.io: drugs/compounds
     • d.biothings.io: diseases
     • …
  31. BioThings APIs:
     • A collection of data APIs (data as a service)
     • A framework for building new APIs (software as a service)
     Got a new type of “BioThings”? We can help you build, or even host, your BioThings API.
  32. BioThings team
     Funding and support: U01HG008473, U54GM114833
     • TSRI: Chunlei Wu, Andrew Su, Jiwen Xin, Cyrus Afrasiabi, Sebastien Lelong, Ginger Tsueng, Julee Adesara, Mike Mayers
     • U. Washington: Sean Mooney, Moritz Juchler, Nikhil Gopal
     More at ISMB: talk TT13, poster G09
  33. Source code:
     • MyGene.info: https://github.com/sulab/mygene.info
     • MyVariant.info: https://github.com/sulab/myvariant.info
     • BioThings API for species/taxonomy: https://github.com/sulab/biothings.species
     • BioThings SDK: https://github.com/sulab/biothings.api

Editor's Notes


  • A high-performance query engine for aggregated variant annotations.
