Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

My slides at HeartBD2K weekly technical conference call.

Published in: Science

  1. From MyGene.info and MyVariant.info towards BioThings API. Chunlei Wu, Ph.D. (cwu@scripps.edu, @chunleiwu), Associate Professor of Molecular Medicine, Dept. of Molecular Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA. 01/22/2016
  2. MyGene.info and MyVariant.info recap: gene and variant annotations (aggregated) as a web service (high-performance, real-time)
  3. So many variant annotation resources, including dbNSFP and The Exome Aggregation Consortium (ExAC)
  4. Annotations centered around bio-entities: Gene (G), Variant (V), Pathway (P), Disease (D), Metabolite (M)
  5. Simple JSON-based aggregation mechanism. Each data source contributes a document keyed by the same _id: { "_id": "chr1:g.196659237C>T", "dbsnp": { "snpclass": "single", "rsid": "rs1061170", "func": "missense" } } { "_id": "chr1:g.196659237C>T", "cosmic": { "tumor_site": "breast", "mut_freq": 0.49 } } { "_id": "chr1:g.196659237C>T", "dbnsfp": { "sift": { "breast": "tolerated", "val": 1 } } } (likewise for "cadd", "clinvar", "evs", "mutdb", …). These are merged into one variant object: { "_id": "chr1:g.196659237C>T", "cadd": { … }, "clinvar": { … }, "cosmic": { … }, "dbsnp": { … }, "dbnsfp": { … }, "evs": { … }, "emv": { … }, "mutdb": { … }, "gwassnp": { … }, "snpedia": { … }, "wellderly": { … } }
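The aggregation step on this slide can be sketched in a few lines of Python. This is only an illustration of merging per-source documents by their shared `_id`; the function name `aggregate_by_id` is hypothetical and is not taken from the MyVariant.info code base.

```python
from typing import Any, Dict, Iterable


def aggregate_by_id(source_docs: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Merge per-source documents into one object per variant _id.

    Hypothetical helper: each input document carries an "_id" plus one or
    more top-level keys named after its data source (e.g. "dbsnp").
    """
    merged: Dict[str, Dict[str, Any]] = {}
    for doc in source_docs:
        variant_id = doc["_id"]
        target = merged.setdefault(variant_id, {"_id": variant_id})
        for key, value in doc.items():
            if key != "_id":
                # Each source owns its own top-level key, so sources never collide.
                target[key] = value
    return merged


# Example documents copied from the slide.
docs = [
    {"_id": "chr1:g.196659237C>T",
     "dbsnp": {"snpclass": "single", "rsid": "rs1061170", "func": "missense"}},
    {"_id": "chr1:g.196659237C>T",
     "cosmic": {"tumor_site": "breast", "mut_freq": 0.49}},
]
merged = aggregate_by_id(docs)
```

Because every source writes under its own key, adding a new source does not require touching the fields of the existing ones.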
  6. Keep data always up-to-date: each data source is updated individually. (Figure: schematic view of MyVariant.info architecture; colors indicate the sources' different updating schedules.)
  7. High-performance web service APIs. (Figure: schematic view of MyVariant.info architecture.)
  8. MyVariant.info for end users: http://MyVariant.info (currently v1 API, two endpoints). http://MyVariant.info/v1/query?q=<query> takes any query term(s) and returns matching variant hits; http://MyVariant.info/v1/variant/<variantid> takes HGVS id(s) and returns the matching variant object(s). Both support batch mode via POST. Simple API. No sign-up. No API key. Try our live API and documentation.
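As a rough sketch of how a client might address the two v1 endpoints named on this slide, the helpers below build the request URLs with the Python standard library. The helper names are hypothetical, and the percent-encoding of the `>` in HGVS ids is an assumption about safe URL construction rather than something the slide specifies.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://MyVariant.info/v1"  # v1 base URL as given on the slide


def query_url(term: str) -> str:
    """URL for the query endpoint: any query term(s) -> matching variant hits."""
    return f"{BASE_URL}/query?{urlencode({'q': term})}"


def variant_url(hgvs_id: str) -> str:
    """URL for the variant endpoint: HGVS id -> variant object."""
    # Keep ':' readable; characters such as '>' are percent-encoded.
    return f"{BASE_URL}/variant/{quote(hgvs_id, safe=':')}"


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


example_query = query_url("rs1061170")
example_variant = variant_url("chr1:g.196659237C>T")
```

Calling `fetch_json(example_variant)` would perform the actual request; it is left uncalled here so the sketch runs offline.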
  9. MyGene.info for end users: http://MyGene.info (currently v2 API, two endpoints). http://MyGene.info/v2/query?q=<query> takes any query term(s) and returns matching gene hits; http://MyGene.info/v2/gene/<geneid> takes gene id(s) and returns the matching gene object(s). Both support batch mode via POST. Simple API. No sign-up. No API key. Try our live API and documentation.
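Batch mode via POST could look roughly like the sketch below, which prepares (but does not send) a request to the v2 gene endpoint. The form field name `ids` and the comma-separated id list are assumptions made for illustration; the slide only says that batch mode goes through POST, so check the API documentation for the exact parameters.

```python
from typing import Iterable
from urllib.parse import urlencode
from urllib.request import Request


def batch_gene_request(gene_ids: Iterable[object],
                       base_url: str = "http://MyGene.info/v2") -> Request:
    """Prepare a POST request for several gene ids at once.

    Assumption: ids are sent as one comma-separated form field named "ids".
    """
    body = urlencode({"ids": ",".join(str(g) for g in gene_ids)}).encode("utf-8")
    return Request(
        f"{base_url}/gene",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Two arbitrary example ids; the request is built but not sent.
req = batch_gene_request([1017, 1018])
```

Passing `req` to `urllib.request.urlopen` would send it; the client libraries mentioned later in the talk wrap this kind of call.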
  10. MyGene.info usage updates: monthly hits in millions, last year 2M, this year 3M
  11. Usage spikes (5M hits/day) during X-Mas 2014
  12. Increased client adoption. Requests by MyGene.info clients, 10/07/2015-01/05/2016 (pie chart shares: 30%, 9%, 35%, 26%). Highlights: • mygene Python client usage now surpasses BioGPS usage • mygene R client usage has increased to 9% from <1%
  13. Increased client adoption: mygene Python client hosted in PyPI; mygene R client hosted in Bioconductor
  14. MyVariant.info updates: over 334 million annotated variants in total. New additions and updates include The Exome Aggregation Consortium (ExAC) and dbNSFP.
  15. MyVariant.info updates: 1 million requests in 3 months, 10/07/2015-01/05/2016 (pie chart shares: 30%, 68%, 2%)
  16. MyVariant.info official Python/R clients: myvariant Python client hosted in PyPI (initial release in Aug 2015); myvariant R client hosted in Bioconductor (initial release in Oct 2015)
  17. A Node.js client made by a passionate user
  18. Next? MyVariant.info, MyGene.info
  19. Make our APIs serve Linked Data via
  20. Why Linked Data? Gene (G), Variant (V), Pathway (P), Disease (D), Metabolite (M)
  21. Linked Data for data aggregation: variants (V) in MyVariant.info linked to variants (V) in another variant API
  22. Linked Data for data aggregation. MyVariant.info returns: { "_id": "chr1:g.196659237C>T", "cosmic": { "tumor_site": "breast", "mut_freq": 0.49 }, "clinvar": {…}, "dbsnp": {…}, … } Another variant API returns: { "pop": "GWD", "nobs": 226, "freq": 0.371681415929, … } Aggregated result: { "_id": "chr1:g.196659237C>T", "cosmic": { "tumor_site": "breast", "mut_freq": 0.49 }, "clinvar": {…}, "dbsnp": {…}, "new_src": { "pop": "GWD", "nobs": 226, "freq": 0.371681415929 }, … }
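The aggregation on this slide amounts to nesting the other API's record under a new source key of the same variant object. A minimal sketch, assuming both services have already been matched on the same variant; `attach_source` is a hypothetical helper, and `new_src` is the placeholder key used on the slide.

```python
import copy
from typing import Any, Dict


def attach_source(variant_doc: Dict[str, Any],
                  source_name: str,
                  external_record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of variant_doc with external_record nested under source_name."""
    if source_name in variant_doc:
        raise KeyError(f"source key already present: {source_name}")
    merged = copy.deepcopy(variant_doc)
    merged[source_name] = copy.deepcopy(external_record)
    return merged


# Values copied from the slide (elided fields omitted).
myvariant_doc = {
    "_id": "chr1:g.196659237C>T",
    "cosmic": {"tumor_site": "breast", "mut_freq": 0.49},
}
other_api_record = {"pop": "GWD", "nobs": 226, "freq": 0.371681415929}

aggregated = attach_source(myvariant_doc, "new_src", other_api_record)
```

The inputs are copied rather than mutated, so the original responses from both services stay available for comparison.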
  23. JSON + context = JSON-LD { "@context": { "clinvar": "http://schema.myvariant.info/datasource/clinvar", "rcv": "http://schema.myvariant.info/datanode/rcv", "gene": "http://schema.myvariant.info/datanode/gene", "_id": "@id" }, "_id": "chr6:g.26093141G>A", "clinvar": { "@context": { "uniprot": "http://identifiers.org/uniprot/", "omim": "http://identifiers.org/omim/" }, "chrom": "6", "alt": "A", "ref": "G", "allele_id": 15048, "rsid": "rs1800562", "rcv": { "@context": { "accession": "http://identifer.org/clinvar" }, "accession": "RCV000000020", "origin": "germline", "clinical_significance": "risk factor" }, "gene": { "@context": { "symbol": "http://identifiers.org/hgnc.symbol/" }, "id": "3077", "symbol": "HFE" }, "omim": "613609.0001", "variant_id": 9 } }
  24. Processed JSON-LD. JSON-LD N-Quads output: <chr6:g.26093141G>A> <http://schema.myvariant.info/datasource/clinvar> _:b0 . _:b0 <http://identifiers.org/omim/> "613609.0001" . _:b0 <http://schema.myvariant.info/datanode/gene> _:b1 . _:b0 <http://schema.myvariant.info/datanode/rcv> _:b2 . _:b1 <http://identifiers.org/hgnc.symbol/> "HFE" . _:b2 <http://identifer.org/clinvar> "RCV000000020" . JSON-LD compacted output: { "@id": "chr6:g.26093141G>A", "http://schema.myvariant.info/datasource/clinvar": { "http://identifiers.org/omim/": "613609.0001", "http://schema.myvariant.info/datanode/gene": { "http://identifiers.org/hgnc.symbol/": "HFE" }, "http://schema.myvariant.info/datanode/rcv": { "http://identifer.org/clinvar": "RCV000000020" } } }
  25. In a nutshell, what does a JSON-LD context do? It maps values in a JSON object to defined URIs: "http://identifer.org/clinvar" → clinvar.rcv.accession
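The mapping idea can be illustrated without a JSON-LD library. The sketch below walks a nested document, lets inner `@context` blocks extend the outer ones, and reports each scalar value whose key is mapped to a URI. It is a simplified illustration, not a conforming JSON-LD processor (lists and expanded term definitions are ignored), and `iter_mapped_values` is a hypothetical name.

```python
from typing import Any, Dict, Iterator, Optional, Tuple


def iter_mapped_values(node: Dict[str, Any],
                       inherited: Optional[Dict[str, Any]] = None,
                       path: str = "") -> Iterator[Tuple[str, str, Any]]:
    """Yield (dotted path, URI, value) for scalar values mapped by a context."""
    context = dict(inherited or {})
    context.update(node.get("@context", {}))  # inner contexts extend outer ones
    for key, value in node.items():
        if key == "@context":
            continue
        key_path = f"{path}.{key}" if path else key
        if isinstance(value, dict):
            yield from iter_mapped_values(value, context, key_path)
            continue
        uri = context.get(key)
        # Skip unmapped keys and keyword aliases such as "_id": "@id".
        if isinstance(uri, str) and not uri.startswith("@"):
            yield key_path, uri, value


# Trimmed copy of the slide's example; URIs are reproduced verbatim.
doc = {
    "@context": {"clinvar": "http://schema.myvariant.info/datasource/clinvar",
                 "_id": "@id"},
    "_id": "chr6:g.26093141G>A",
    "clinvar": {
        "@context": {"omim": "http://identifiers.org/omim/"},
        "rsid": "rs1800562",
        "rcv": {"@context": {"accession": "http://identifer.org/clinvar"},
                "accession": "RCV000000020"},
        "omim": "613609.0001",
    },
}
mapped = list(iter_mapped_values(doc))
```

For this document, `mapped` pairs `clinvar.rcv.accession` and `clinvar.omim` with their URIs, while unmapped fields such as `rsid` are left out.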
  26. JSON-LD context makes your data "Linkable"; downstream processing libraries make it "Linked"
  27. A Python library for processing JSON-LD data, by Kevin Xin. In [1]: fetch_value_source_for_variant("chr6:g.26093141G>A","http://identifiers.org/dbsnp/") Out[1]: ['rs1800562 http://schema.myvarint.info/datasource/dbnsfp', 'rs1800562 http://schema.myvarint.info/datasource/clinvar', 'rs1800562 http://schema.myvarint.info/datasource/dbsnp', 'rs1800562 http://schema.myvarint.info/datasource/evs', 'rs1800562 http://schema.myvarint.info/datasource/gwassnps', 'rs1800562 http://schema.myvarint.info/datasource/mutdb']
  28. Need to define an API spec to enable data exchange via JSON-LD: • Output as a JSON object with a defined _id. • "jsonld=true/false" toggle for the inclusion of the JSON-LD context. • Support the retrieval of a single entity via GET (use case: individual data aggregation on the fly). • Support the retrieval of a list of entities via POST (use case: routine data aggregation in batches). • Output should indicate entity existence: GET /variant/<unknown_id> → 404; POST /variant/ id1, <unknown_id>, id3 → [id1: {…}, <unknown_id>: "notfound", id3: {…}]
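The behaviour listed on this slide can be mocked up with an in-memory store. The sketch below is one possible reading of the spec, not the BioThings implementation: `get_entity`, `post_entities`, `STORE`, and `CONTEXT` are hypothetical names, and the status codes are returned as plain integers instead of going through a web framework.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

# Hypothetical JSON-LD context and store, using ids from earlier slides.
CONTEXT = {"_id": "@id"}
STORE: Dict[str, Dict[str, Any]] = {
    "chr1:g.196659237C>T": {"_id": "chr1:g.196659237C>T",
                            "dbsnp": {"rsid": "rs1061170"}},
    "chr6:g.26093141G>A": {"_id": "chr6:g.26093141G>A",
                           "clinvar": {"rsid": "rs1800562"}},
}


def render(doc: Dict[str, Any], jsonld: bool) -> Dict[str, Any]:
    """Copy a stored document, adding the context when jsonld=true."""
    out = dict(doc)
    if jsonld:
        out["@context"] = CONTEXT
    return out


def get_entity(entity_id: str, jsonld: bool = False) -> Tuple[int, Optional[Dict[str, Any]]]:
    """GET /variant/<id>: 200 with the object, or 404 for an unknown id."""
    doc = STORE.get(entity_id)
    if doc is None:
        return 404, None
    return 200, render(doc, jsonld)


def post_entities(ids: Iterable[str], jsonld: bool = False) -> Dict[str, Any]:
    """POST /variant/: one entry per requested id, "notfound" for unknown ids."""
    return {i: render(STORE[i], jsonld) if i in STORE else "notfound" for i in ids}


status, body = get_entity("chr6:g.26093141G>A", jsonld=True)
missing_status, _ = get_entity("unknown_id")
batch = post_entities(["chr1:g.196659237C>T", "unknown_id", "chr6:g.26093141G>A"])
```

Returning an explicit "notfound" marker per id keeps a batch response aligned with the request, so a client can aggregate partial results without a second round trip.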
  29. BioThings API: MyVariant.info, MyGene.info. By Cyrus Afrasiabi
  30. BioThings API (MyVariant.info, MyGene.info): JSON data aggregation mechanism; high-performance query engine; well-designed REST API pattern; JSON-LD enabled Linked Data; data-updating scheduler; Python/R clients; …
  31. Data-sharing via Web API is trending. Making a single web service is trivial, but making a sustainable/scalable web API is non-trivial. We would like to help other groups create their own hosted web APIs for sharing their data.
  32. Action item 1: BioThings API whitepaper. Also an action item from the last BD2K CA consortium meeting and from the API working group at last year's NIH BD2K AHM
  33. Action item 2: BioThings API framework. NIH commons; Infrastructure as a Service; Software as a Service: BioThings API
  34. Action item 3: expansion to other "BioThings": Disease (D), Drugs (D); MyDrug.info, MyDisease.info (need an alt. name here)
  35. Acknowledgement. Funding and Support: U54GM114833, U01HG008473. Washington U: Ben Ainscough, Obi Griffith. TSRI: Andrew Su, Jiwen Xin, Cyrus Afrasiabi, Ginger Tsueng, Adam Mark, Greg Stupp, Tim Putman. STSI: Eric Topol, Ali Torkamani, Galina Erikson. U. Washington: Sean Mooney, Moritz Juchler, Nikhil Gopal. OICR: Robin Haw. UC Berkeley: Chris Mungall. UCSD: Trish Whetzel. MyVariant.info, MyGene.info