My talk about the BioThings API project at ISMB 2018 in Chicago, as part of the BD2K special session. The BioThings API project provides a collection of high-performance APIs (MyGene.info, MyVariant.info, MyChem.info), an SDK for building new biomedical APIs (BioThings SDK), and a JSON-LD and OpenAPI based solution for cross-API interoperability and knowledge exploration.
2. Biomedical Data API
API – Application Programming Interface
An API is a way to abstract the data-access layer.
3. Why bioinformaticians need APIs
It's about:
Modularization
Reusability (DRY principle)
photo credits: http://www.edmentum.com/sites/edmentum.com/files/solutions/content/building_0.jpg
http://www.howcsharp.com/img/0/68/dont-repeat-yourself-dry-300x211.jpg
http://blog.capinc.com/wp-content/uploads/2013/02/Recycle_Logo_by_Har1-300x263.png
4. Biomedical APIs and FAIR matrix
Findable: APIs are not quite findable
Accessible: APIs are naturally accessible, but enterprise-grade biomedical APIs are still few
Interoperable: often not interoperable across APIs
Reusable: APIs serve reusable pieces of data, but more can be made reusable in API development
7. Enterprise-grade API via Simple interface
http://mygene.info/v3/gene/1017
http://mygene.info/v3/gene/1017?fields=symbol,name,pathway,uniprot
http://mygene.info/v3/query?q=CDK2
http://mygene.info/v3/query?q=name:kinase&species=human
http://mygene.info/v3/query?q=name:kinase AND _exists_:pathway
http://mygene.info/v3/query?q=pathway.kegg.name:wnt&fields=entrezgene,symbol,taxid,interpro
Simple to use
Always up-to-date (weekly updated)
Comprehensive (23M genes from 22K species)
High-performance and scalable
High-availability
Python, R, JavaScript clients
Developer-friendly (support CORS, gzip, https, msgpack, etc.)
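The URL patterns on this slide can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper names (gene_url, query_url, fetch_json) are hypothetical and not part of any official client, and the base URL is copied from the slide.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://mygene.info/v3"  # base URL as shown on the slide


def gene_url(gene_id, fields=None):
    """Build the annotation URL for one gene, optionally limited to some fields."""
    url = f"{BASE_URL}/gene/{gene_id}"
    if fields:
        # safe="," keeps the comma-separated field list unescaped
        url += "?" + urlencode({"fields": ",".join(fields)}, safe=",")
    return url


def query_url(q, **params):
    """Build a query URL; extra keyword arguments become query parameters."""
    return f"{BASE_URL}/query?" + urlencode({"q": q, **params}, safe=":,")


def fetch_json(url):
    """Fetch a URL and decode its JSON body (hypothetical helper; needs network access)."""
    with urlopen(url) as resp:
        return json.load(resp)


print(gene_url(1017, ["symbol", "name", "pathway", "uniprot"]))
print(query_url("name:kinase", species="human"))
```

Passing either URL to fetch_json would return the decoded JSON response; it is not called here so the sketch runs without network access.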
8. A collection of “BioThings APIs”
Aggregates annotations for 93 million drugs/chemicals from 11 resources
I have a list of drug/chemical IDs and want to get annotations about them:
Drug/chemical annotation service:
GET /v1/drug/<drugid>
POST /v1/drug/ (batch mode)
I want to get drugs/chemicals matching my query term(s):
Drug/chemical query service:
GET /v1/query/?q=<query>
POST /v1/query/ (batch mode)
http://mygene.info: ~8 M requests from ~10,000 unique IPs every month
http://myvariant.info: ~1 M requests from 1,000 unique IPs every month
http://mychem.info: Just launched!
Aggregates annotations for 22 million genes from 30 resources
I have a list of gene IDs and want to get annotations about them:
Gene annotation service:
GET /v3/gene/<geneid>
POST /v3/gene/ (batch mode)
I want to get genes matching my query term(s):
Gene query service:
GET /v3/query/?q=<query>
POST /v3/query/ (batch mode)
Aggregates annotations for 874 million variants from 19 resources
I have a list of variant IDs and want to get annotations about them:
Variant annotation service:
GET /v1/variant/<hgvsid>
POST /v1/variant/ (batch mode)
I want to get variants matching my query term(s):
Variant query service:
GET /v1/query/?q=<query>
POST /v1/query/ (batch mode)
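The batch-mode endpoints accept a POST instead of one GET per ID. The sketch below only builds such a request, without sending it. The form parameter names "ids" and "fields" and the helper batch_request are assumptions for illustration; the slide shows only the endpoint paths.

```python
from urllib.parse import urlencode
from urllib.request import Request


def batch_request(base_url, entity, ids, fields=None):
    """Build (but do not send) a batch-mode POST request for a list of IDs."""
    # Assumed parameter names: "ids" (comma-separated) and "fields"
    params = {"ids": ",".join(str(i) for i in ids)}
    if fields:
        params["fields"] = ",".join(fields)
    body = urlencode(params).encode("utf-8")
    return Request(
        f"{base_url}/{entity}",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = batch_request("http://mygene.info/v3", "gene", [1017, 1018], ["symbol", "name"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The same helper would cover the variant and drug endpoints by changing the base URL and entity (for example "variant" under /v1), since all three services follow the same URL pattern.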
9. Who is using BioThings API
Many users use our APIs in their daily analysis pipelines or simply to cache annotations locally
MinePath.org
Gene Wiki
JBrowse
Allele Registry
Variant Curation Interface
GNOMICS
10. Others can build their own APIs with BioThings SDK
http://biothings.io
Data Hub (MongoDB + Elasticsearch): src monitor, scheduler, data merger, data indexer
Web API (Python/Tornado): URL pattern, JSONP, CORS, compression, JSON-LD, tracking, unit tests
Cloud Deployment (Amazon AWS): cluster setup, data deploy, cluster scaling, load-balancing
Done by users: data parsers for individual resources, optional query customization
Abstracted in SDK: the Data Hub, Web API, and Cloud Deployment components
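The part left to users is a data parser for each individual resource. A minimal sketch of what such a parser might look like follows: a generator that turns rows of a source file into JSON-style documents. The function name load_data, the "_id" key, the column names, and the sample rows are all assumptions made for illustration; the slide does not show the SDK's actual parser interface.

```python
import csv
import io


def load_data(lines):
    """Yield one JSON-style document per tab-separated input row."""
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        # "_id" is assumed here to be the key that identifies a document
        doc = {"_id": row["gene_id"], "symbol": row["symbol"]}
        if row.get("aliases"):
            # Split a pipe-separated column into a list
            doc["alias"] = row["aliases"].split("|")
        yield doc


# Made-up sample input standing in for a downloaded resource file
sample = io.StringIO(
    "gene_id\tsymbol\taliases\n"
    "1017\tCDK2\tALIAS_A|ALIAS_B\n"
    "1018\tCDK3\t\n"
)
for doc in load_data(sample):
    print(doc)
```

Writing the parser as a generator keeps memory use flat for large source files, since documents are produced one at a time.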
11. A collection of high-performance APIs
fast, up-to-date, simple-to-use
Gene
Variant
Drug/Chemical
Taxonomy (http://T.biothings.io)
Disease (http://MyDisease.info)
An SDK for building your own APIs
Abstraction of API building/deployment: Your data source → Your API
JSON data aggregation mechanism
High-performance query engine
Well-designed REST API pattern
JSON-LD enabled Linked Data
Data-updating scheduler
Python/R clients
…
Linked APIs via JSON-LD & SmartAPI
Enhancing API interoperability: JSON Object {...} + Semantic Context → Linked API
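The "JSON Object + Semantic Context" idea can be illustrated with a simplified sketch: each API publishes a context that maps its field names to URIs, and two APIs can be linked wherever their fields share a URI. This is not a full JSON-LD processor. The field names, the second API, and the identifiers.org-style URIs below are illustrative assumptions rather than the contexts the project actually ships.

```python
# Hypothetical contexts: field name -> URI describing what the field holds
GENE_API_CONTEXT = {
    "entrezgene": "http://identifiers.org/ncbigene/",
    "uniprot": "http://identifiers.org/uniprot/",
}
DRUG_API_CONTEXT = {
    "target_gene": "http://identifiers.org/ncbigene/",
    "chembl_id": "http://identifiers.org/chembl.compound/",
}


def annotate(doc, context):
    """Attach a semantic context to a plain JSON object."""
    return {"@context": context, **doc}


def find_links(context_a, context_b):
    """Return (field_a, field_b, uri) triples for fields that share a URI."""
    by_uri = {}
    for field, uri in context_b.items():
        by_uri.setdefault(uri, []).append(field)
    return [
        (field_a, field_b, uri)
        for field_a, uri in context_a.items()
        for field_b in by_uri.get(uri, [])
    ]


print(annotate({"entrezgene": 1017}, GENE_API_CONTEXT))
print(find_links(GENE_API_CONTEXT, DRUG_API_CONTEXT))
```

A shared URI indicates that the output of one API's field could serve as input to the other API, which is the kind of connection a linked-API network relies on.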
12. API-level data integration for translational research
Electronic Health Record (EHR)
Drugs, Proteins, Pathways, Genes, Variants
MyVariant.info, ClinVar, CIViC, …
MyGene.info, Ensembl, …
Reactome, WikiPathways, …
UniProt, …
MyChem.info, Clue.io, DrugBank, …
Pharos, Biolink, Wikidata, NDEx, …
20. A Real-world Translational Question
From NCATS Translator Hackathon in May 2018
Disease - Gene
Gene - Pathways
Pathways - Gene
Gene - Chemical
Symptom - Disease
21. BioThings Explorer
To explore the network of “SmartAPIs”
– Discover APIs for specific questions
– Automatically trigger API calls to construct a subset of the knowledge graph
http://biothings.io/explorer/
22. Find APIs that can get me from pathways to genes:
Pathways → Available APIs → Genes
23. Find drug compounds associated with gene LCK:
Nodes: LCK, UniProt:P06239, INCHIKEY:KKYYLKPGILUPOA-UHFFFAOYSA-N, CHEMBL223873, CHEMBL3707348
Edges: inhibits (via DGIdb API), equals (via MyGene API), targets (via MyChem API), equals (via MyChem API)
24. Accessible via BioThings Explorer’s own API
http://biothings.io/explorer/api/v2/directinput2output?input_prefix=hgnc.symbol&output_prefix=chembl.compound&input_value=LCK&format=translator
View the complete API documentation:
http://biothings.io/explorer/api/
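The request on this slide can be assembled from its parts. A minimal sketch, using only the endpoint and the four parameters shown above; the helper name direct_url is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint as shown on the slide
EXPLORER_ENDPOINT = "http://biothings.io/explorer/api/v2/directinput2output"


def direct_url(input_prefix, input_value, output_prefix, fmt="translator"):
    """Build a direct input-to-output query URL for BioThings Explorer."""
    params = {
        "input_prefix": input_prefix,
        "output_prefix": output_prefix,
        "input_value": input_value,
        "format": fmt,
    }
    return EXPLORER_ENDPOINT + "?" + urlencode(params)


# Reproduces the LCK example: from an HGNC gene symbol to ChEMBL compounds
url = direct_url("hgnc.symbol", "LCK", "chembl.compound")
print(url)
```

Changing the prefixes would target a different pair of identifier types, assuming the service supports that pair.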
25. More about BioThings Explorer
Video Tutorial
https://youtu.be/cPUKRsaTlhg
Demos in Jupyter Notebook:
BioThings Explorer Demo
BioThings Explorer Metadata
http://biothings.io/explorer/
26. BioThings as a FAIR API Ecosystem
Accessible
Findable
Interoperable
Reusable
27. Acknowledgement
Scripps Research
Andrew Su (sulab.org)
Cyrus Afrasiabi
Sebastien Lelong
Jiwen (Kevin) Xin
Marco Cano Alvarado
Ginger Tsueng
Byung Ryul Jeon
Greg Taylor
Maastricht Univ.
Michel Dumontier (dumontierlab.com)
Amrapali Zaveri
Kody Moodley
Trish Whetzel (T2 Labs)
Shima Dastgheib (NuMedii)
Ruben Verborgh (Ghent Univ.)
Paul Avillach (Harvard)
Gabor Korodi (Harvard)
Raymond Terryn (Univ. of Miami)
Kathleen Jagodnik (Mount Sinai)
Pedro Assis (Stanford)
Funding support from
NIH Data Commons
API interoperability working group
Univ. of Washington
Sean Mooney
Vikas R Pejaver
Editor's Notes
Up-to-date, high-performance, and highly available
SmartAPI tries to solve the interoperability problem with two established community standards: the JSON-LD and OpenAPI specifications.