Chunlei Wu, Ph.D.
cwu@scripps.edu
@chunleiwu
https://wulab.io
Associate Professor
Dept. of Integrative Structural and Computational Biology
The Scripps Research Institute
La Jolla, CA, USA
01/16/2019
NCI – CBIIT Speaker Series
Building a FAIR API Ecosystem for Biomedical Knowledge
http://biothings.io
Biomedical Data API
API – Application Programming Interface
API is a way to abstract the data-access layer.
APIs as a reusable data layer
Presentation Layer
Business logic Layer
Data Layer
Application 1
Presentation Layer
Business logic Layer
Data Layer
Application 2
View
Controller
Model
Repetitive data wrangling:
• Parsing dump files
• ID conversion
• Data merging
• Data transformation
• Source monitoring
• Download scheduler
• … …
Presentation Layer
Business logic Layer
Common Data Layer
Application 1
Presentation Layer
Business logic Layer
Data Layer
Application 2
Why bioinformaticians need APIs
It's about:
• Modularization
• Reusability
• DRY principle
photo credits: http://www.edmentum.com/sites/edmentum.com/files/solutions/content/building_0.jpg
http://www.howcsharp.com/img/0/68/dont-repeat-yourself-dry-300x211.jpg
http://blog.capinc.com/wp-content/uploads/2013/02/Recycle_Logo_by_Har1-300x263.png
Biomedical APIs and FAIR matrix
APIs are not quite findable
APIs are naturally accessible
But enterprise-grade Biomedical APIs are still few
Often not interoperable across APIs
APIs serve reusable pieces of data
But more can be made reusable in API development
?
?
Computer science is all about “Abstraction”
“Abstraction” is the simple guiding-principle for informaticians
Reducing
repetitive efforts
Opportunities
for informaticians
An example: abstracting the gene search box
http://biogps.org
MyGene.info API
http://mygene.info
Aggregated Gene annotations represented in JSON documents
{
  "_id": "1017",
  "symbol": "CDK2",
  "ensembl": "ENSG00000123374",
  "refseq": [
    "NM_001798",
    "NM_052827"
  ],
  "reporter": {
    "U95A": [
      "1792_g_at",
      "1833_at"
    ],
    "U133A": [
      "211804_s_at",
      "2045252_at",
      "211803_at"
    ]
  }
}
Source merging criteria:
matching NCBI or Ensembl Gene ids
HGNC
MGI
RGD
Refseq
Ensembl
UniProt
UniGene
Homologene
PantherDB
GO
Reactome
Wikipathways
KEGG
PDB
PFAM
Interpro
Prosite
PIR
Pharmgkb
UMLS
Wikipedia
Pharos
…
Gene-centric API via a simple interface
Get gene object(s) via either NCBI or Ensembl gene IDs:
http://mygene.info/v3/gene/1017
http://mygene.info/v3/gene/ENSG00000123374
http://mygene.info/v3/gene/1017?fields=symbol,name,pathway,uniprot
Find matching gene objects with any query terms:
http://mygene.info/v3/query?q=CDK2
http://mygene.info/v3/query?q=name:kinase&species=human
http://mygene.info/v3/query?q=name:kinase AND _exists_:pathway
http://mygene.info/v3/query?q=pathway.kegg.name:wnt&fields=entrezgene,symbol,taxid,interpro
Batch queries supported via POST
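The URL patterns above can be assembled programmatically. The following is a minimal sketch, not the official client: the helper names are made up for illustration, and the `ids` form field for batch POST requests is an assumption based on the public MyGene.info documentation rather than on this slide.

```python
from urllib.parse import urlencode

BASE_URL = "http://mygene.info/v3"  # base URL as shown on the slide


def gene_url(gene_id, fields=None):
    """Build the GET URL for one gene, optionally limited to some fields."""
    url = f"{BASE_URL}/gene/{gene_id}"
    if fields:
        # keep commas literal so the URL matches the slide's examples
        url += "?" + urlencode({"fields": ",".join(fields)}, safe=",")
    return url


def query_url(q, **params):
    """Build the GET URL for a query, e.g. q='name:kinase', species='human'."""
    return f"{BASE_URL}/query?" + urlencode({"q": q, **params}, safe=":,")


def batch_gene_payload(gene_ids, fields=None):
    """Form data for a batch POST to /v3/gene (assumed field name: 'ids')."""
    payload = {"ids": ",".join(str(g) for g in gene_ids)}
    if fields:
        payload["fields"] = ",".join(fields)
    return payload


print(gene_url(1017, ["symbol", "name", "pathway", "uniprot"]))
print(query_url("name:kinase", species="human"))
print(batch_gene_payload([1017, "ENSG00000123374"], ["symbol"]))
```

The payload would be sent as form data in a single POST, which avoids one round trip per gene.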
MyVariant.info API
{
  "_id": "chr1:g.196659237C>T",
  "cosmic": {
    "chrom": "1",
    "hg19": {
      "start": 196659237,
      "end": 196659237
    },
    "ref": "C",
    "alt": "T",
    "tumor_site": "breast",
    "mut_freq": 0.49,
    "mut_nt": "C>T",
    "cosmic_id": "COSM424915"
  }
}
{
  "_id": "chr1:g.196659237C>T",
  "cadd": { … },
  "clinvar": { … },
  "cosmic": { … },
  "dbsnp": { … },
  "dbnsfp": { … },
  "evs": { … },
  "emv": { … },
  "mutdb": { … },
  "gwassnp": { … },
  "snpedia": { … },
  "wellderly": { … }
}
Source merging criteria: matching HGVS names
Only genomic-based HGVS names are used (support both hg19 and hg38)
more at: http://docs.myvariant.info/en/latest/doc/data.html#id-field
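To make the id convention concrete, here is a small sketch that formats a single-nucleotide substitution as a genomic HGVS name of the kind used for `_id` above. The function name is hypothetical, and only simple substitutions are covered; insertions, deletions, and other variant types use different HGVS forms that this sketch does not attempt.

```python
def snv_hgvs_id(chrom, pos, ref, alt):
    """Format a single-nucleotide substitution as a genomic HGVS id."""
    chrom = str(chrom)
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom  # ids on the slides carry the "chr" prefix
    return f"{chrom}:g.{pos}{ref}>{alt}"


# values taken from the COSMIC record shown above
print(snv_hgvs_id("1", 196659237, "C", "T"))
```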
http://myvariant.info
A real example online
21 sources:
dbSNP
dbNSFP
CADD
UniProt
ClinVar
CIViC
CGI
DoCM
ExAC
gnomAD
EMV
EVS
GRASP
SnpEff
…
MyVariant.info API
Data source license and metadata:
{
  "_id": "chr1:g.196659237C>T",
  "cadd": {
    "_license": "http://bit.ly/2TIuab9",
    …
  },
  "clinvar": {
    "_license": "http://bit.ly/2SQdcI0",
    …
  },
  "civic": {
    "_license": "http://bit.ly/2FqS871",
    …
  },
  "dbnsfp": {
    "_license": "http://bit.ly/2VLnQBz",
    …
  },
  …
}
{
  "build_date": "2018-12-06T22:15:39.743302",
  "build_version": "20181206",
  "src": {
    "cadd": {
      "license_url": "http://cadd.gs.washington.edu/contact",
      "license_url_short": "http://bit.ly/2TIuab9",
      "stats": {
        "cadd": 226932858
      },
      "url": "http://cadd.gs.washington.edu/home",
      "version": "1.3"
    },
    "civic": {
      "licence": "CC0 1.0 Universal",
      "license_url": "https://creativecommons.org/publicdomain/zero/1.0/",
      "license_url_short": "http://bit.ly/2FqS871",
      "stats": {
        "civic": 1559
      },
      "url": "https://civicdb.org",
      "version": "201706"
    },
    …
  }
}
“_license” urls embedded in every response
Detailed source metadata at
http://myvariant.info/metadata
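Because each source block carries its own `_license` field, a consumer can list the licenses that apply to a given response. A minimal sketch, with a hypothetical helper name and a trimmed copy of the document above as input:

```python
def source_licenses(doc):
    """Return {source_name: license_url} for every source block in a document."""
    return {
        source: block["_license"]
        for source, block in doc.items()
        if isinstance(block, dict) and "_license" in block
    }


# trimmed copy of the response shown on the slide
doc = {
    "_id": "chr1:g.196659237C>T",
    "cadd": {"_license": "http://bit.ly/2TIuab9"},
    "clinvar": {"_license": "http://bit.ly/2SQdcI0"},
}
print(source_licenses(doc))
```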
MyChem.info API for chemicals and drugs
{
  "_id": "RRUDCFGSUDOHDG-UHFFFAOYSA-N",
  "chebi": {
    "id": "CHEBI:49029",
    "formulae": "C2H5NO2",
    "name": "N-hydroxyacetimidic acid",
    "smiles": "CC(O)=NO",
    "xrefs": {
      "pubchem": {
        "cid": "1990",
        "sid": "49693671"
      }
    }
  },
  "drugbank": {…},
  "drugcentral": {…}
}
Source merging criteria: matching InChIKey
more at: http://docs.mychem.info/en/latest/doc/data.html#id-field
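The merging rule can be pictured as grouping per-source records under a shared key. The sketch below is an illustration of that idea only, not the BioThings hub's actual merger; the function name and the `drugcentral` placeholder record are hypothetical.

```python
from collections import defaultdict


def merge_by_id(sources):
    """Merge per-source records into one document per shared "_id".

    `sources` maps a source name to a list of records; every record
    carries the shared key (here an InChIKey) under "_id".
    """
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            doc = merged[record["_id"]]
            doc["_id"] = record["_id"]
            # keep each source's fields under its own top-level key
            doc[source_name] = {k: v for k, v in record.items() if k != "_id"}
    return dict(merged)


KEY = "RRUDCFGSUDOHDG-UHFFFAOYSA-N"
sources = {
    "chebi": [{"_id": KEY, "id": "CHEBI:49029"}],
    "drugcentral": [{"_id": KEY, "placeholder": True}],  # hypothetical record
}
print(merge_by_id(sources))
```

Keeping each source under its own key means a source can be refreshed or dropped without touching the others.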
11 sources:
AEOLUS
ChEBI
ChEMBL
Drugbank
Drugcentral
GINAS
NDC
PharmGKB
PubChem
UNII
Collectively, we call them “BioThings APIs”

MyGene.info (http://mygene.info)
~10 M requests and ~20,000 unique IPs every month
Aggregates annotations for 25 million genes from 30 resources
I have a list of gene IDs and want to get annotations about them:
Gene annotation service:
GET /v3/gene/<geneid>
POST /v3/gene/ (batch mode)
I want to get matching genes with my query term(s):
Gene query service:
GET /v3/query/?q=<query>
POST /v3/query/ (batch mode)

MyVariant.info (http://myvariant.info)
~5 M requests and 8000 unique IPs every month
Aggregates annotations for 874 million variants from 21 resources
I have a list of variant IDs and want to get annotations about them:
Variant annotation service:
GET /v1/variant/<hgvsid>
POST /v1/variant/ (batch mode)
I want to get matching variants with my query term(s):
Variant query service:
GET /v1/query/?q=<query>
POST /v1/query/ (batch mode)

MyChem.info (http://mychem.info)
recently launched!
Aggregates annotations for 96 million drugs/chemicals from 11 resources
I have a list of drug/chemical IDs and want to get annotations about them:
Drug/chemical annotation service:
GET /v1/drug/<drugid>
POST /v1/drug/ (batch mode)
I want to get matching drugs/chemicals with my query term(s):
Drug/chemical query service:
GET /v1/query/?q=<query>
POST /v1/query/ (batch mode)
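The three services share one URL pattern: a versioned annotation endpoint named after the entity, plus a versioned query endpoint. A small table-driven sketch of that pattern follows; the lookup table and helper names are illustrative, with hosts and versions copied from the listings for the three services.

```python
# host, API version, and entity name for each service, as listed on the slide
API_PATTERNS = {
    "gene": ("http://mygene.info", "v3", "gene"),
    "variant": ("http://myvariant.info", "v1", "variant"),
    "drug": ("http://mychem.info", "v1", "drug"),
}


def annotation_endpoint(entity):
    """Endpoint for fetching annotations by id (GET one, POST many)."""
    host, version, name = API_PATTERNS[entity]
    return f"{host}/{version}/{name}"


def query_endpoint(entity):
    """Endpoint for searching with query terms (GET one, POST many)."""
    host, version, _ = API_PATTERNS[entity]
    return f"{host}/{version}/query"


for entity in API_PATTERNS:
    print(annotation_endpoint(entity), query_endpoint(entity))
```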
Who is using BioThings API
Many users call our APIs in their daily analysis pipelines, or simply use them to cache annotations locally
http://biothings.io/who-is-using
Who is using BioThings API
Top 10 organizations* and their requests (01/01/2018-12/31/2018)
* Orgs mapped to the general ISPs were removed

Organization              # of requests
Baylor College of Med     17,264,902
OHSU                      16,442,387
Google LLC                590,305
UNC                       480,168
Cincinnati Children       229,686
Université Laval          226,243
UCSD                      101,867
Rockefeller University    96,018
Illumina                  92,902
Yale Univ                 44,587

Organization              # of requests
NY Genome Center          3,502,635
UTexas-Austin             2,785,542
Stanford University       2,607,072
Univ of Colorado          1,325,650
Yale Univ                 1,054,124
Vanderbilt Univ           851,375
Univ of Chicago           614,891
Baylor College of Med     550,022
Oregon State Univ         525,350
Univ of Illinois - UC     507,421
BioThings API usage by numbers
Based on usage data (01/01/2018-12/31/2018)

MyGene.info
Total requests: 130M
Avg. monthly requests: 10.7M
Total unique IPs: 173K
Monthly unique IPs: ~19K
mygene Python client monthly downloads: ~4470
mygene R client monthly downloads: ~611
Availability tracked by UptimeRobot: 100%

MyVariant.info
Total requests: 55M
Avg. monthly requests: 4.6M
Total unique IPs: 86K
Monthly unique IPs: ~8K
myvariant Python client monthly downloads: ~3600
myvariant R client monthly downloads: ~164
Availability tracked by UptimeRobot: 100%
mygene and myvariant Python clients
Open source repositories depending on our python clients
mygene (total 29): https://libraries.io/pypi/mygene
myvariant (total 11): https://libraries.io/pypi/myvariant
Build Enterprise-grade Biomedical APIs
• Simple to use
• Always up-to-date (weekly updated)
• Comprehensive
- MyGene.info: 25M genes from 24K species
- MyVariant.info: 874M (700M observed)
- MyChem.info: 96M chemicals/drugs
• High-performance and scalable
• High-availability
• Python, R, JavaScript clients
• Developer-friendly (support CORS, gzip, https, msgpack, etc.)
• “fetch_all” feature for streaming large query results
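The “fetch_all” feature lets a client page through a large result set instead of receiving it in one response. The sketch below shows the general scroll-style loop under stated assumptions: the parameter names `fetch_all` and `scroll_id` and the response field `_scroll_id` are recalled from the public BioThings documentation and should be checked there, and `fetch_page` is a hypothetical caller-supplied function standing in for the HTTP request.

```python
def fetch_all(fetch_page, query):
    """Yield every hit for `query`, following scroll ids until exhausted.

    `fetch_page(params)` performs one GET against the query endpoint
    and returns the decoded JSON response as a dict.
    """
    page = fetch_page({"q": query, "fetch_all": "TRUE"})
    while page.get("hits"):
        yield from page["hits"]
        scroll_id = page.get("_scroll_id")
        if not scroll_id:
            break
        page = fetch_page({"scroll_id": scroll_id})


def fake_fetch_page(params):
    """Stand-in for the network so the sketch runs offline."""
    if "scroll_id" not in params:
        return {"_scroll_id": "abc", "hits": [{"_id": "a"}, {"_id": "b"}]}
    if params["scroll_id"] == "abc":
        return {"_scroll_id": "def", "hits": [{"_id": "c"}]}
    return {"hits": []}


ids = [hit["_id"] for hit in fetch_all(fake_fetch_page, "any query")]
print(ids)
```

Writing the loop as a generator keeps memory use flat: the caller processes one batch of hits while the next is still unfetched.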
A collection of high-performance APIs
http://T.biothings.io
fast, up-to-date, simple-to-use
Gene
Variant
Drug/Chemical
Taxonomy
http://MyDisease.info
Disease
What about other “BioThings”, with our limited bandwidth?
Can we further abstract the process of making APIs?
Help ourselves as well as others to build APIs.
Schematic view of MyVariant.info architecture
Web
module
Hub
module
Individual server node
* Colors indicate the different updating schedules
Others can build their own APIs with BioThings SDK
http://docs.biothings.io

Data Hub (MongoDB + Elasticsearch)
• done by users: data parsers for individual resources
• abstracted in SDK: src monitor, scheduler, data merger, data indexer

Web API (Python/Tornado)
• done by users: optional query customization
• abstracted in SDK: URL pattern, JSONP, CORS, compression, JSON-LD, tracking, unit tests

Cloud Deployment (Amazon AWS)
• abstracted in SDK: cluster setup, data deploy, cluster scaling, load-balancing
My data file
I will write a parser
Describe data schema for indexing
Set up Elasticsearch
Index JSON objects in Elasticsearch
Ready to serve
Your BioThings API is live!
Inspector
indexer
In [1]: from biothings.www import BiothingsAPIApp
In [2]: drug_api_app = BiothingsAPIApp(
   ...:     APP_LIST=[(r'/v1/drug/(.+)/?', 'BiothingHandler'),
   ...:               (r'/v1/drug/?$', 'BiothingHandler')],
   ...:     ES_INDEX='drug_databuild_20170708', ES_DOC_TYPE='drug')
In [3]: drug_api_app.start(port=8002)
INFO:root:Server is running on "0.0.0.0:8002"...
code snippet
user actions
done by SDK
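The "I will write a parser" step is the main piece of user-written code in this flow. As a rough sketch of what such a parser might look like, the function below reads a tab-delimited file and yields one JSON-ready document per row, keyed by `_id`. The function name, column names, and sample row are all hypothetical; the SDK tutorial defines the actual conventions.

```python
import csv
import io


def load_data(lines):
    """Yield one JSON-ready document per row of a tab-delimited source."""
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        yield {
            "_id": row["drugbank_id"],          # key used for lookup and merging
            "drugbank": {"name": row["name"]},  # source fields under a source key
        }


# hypothetical two-column input; a real parser would open the data file
sample = io.StringIO("drugbank_id\tname\nDB08571\texample name\n")
docs = list(load_data(sample))
print(docs)
```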
Scenario 1 - I have a data file, and I want to make it an API:
- Turn a data file into a high-quality API
http://docs.biothings.io/en/latest/doc/single_source_tutorial.html
- Unified API clients in Python/R/JS
# Access your live API from the unified Python client:
In [1]: from biothings_client import get_client
In [2]: mydrug = get_client("drug", url="http://localhost:8002/v1")
In [3]: mydrug.getdrug("DB08571")
In [4]: mydrug.query("drugbank.name:celecoxib")
# The same client works against the hosted APIs:
In [5]: mygene = get_client("gene")
In [6]: mygene.getgene("1017")
In [7]: mygene.query("symbol:cdk2")
In [8]: myvariant = get_client("variant")
In [9]: myvariant.getvariant("chr7:g.140453134T>C")
In [10]: myvariant.query("dbsnp.rsid:rs58991260")
User API
MyGene.info API
MyVariant.info API
biothings_client available in Python, R, JavaScript
https://biothings-clientpy.readthedocs.io
Scenario 2 - I need to aggregate multiple data sources and keep them up-to-date:
- Merging and keeping data sources in sync
A data source management console included in SDK
http://docs.biothings.io/en/latest/doc/hub_tutorial.html
BioThings Studio as web-based development environment
Contribute to the existing
BioThings APIs
Build your
own API
Biomedical
Data
Sources
(MyGene.info data sources shown in BioThings Studio)
https://github.com/biothings/biothings_studio
What about data schemas?
BioThings API and SDK are data-schema neutral, but can be
customized into a specialized API and SDK focusing on
particular schemas or vocabulary standards.
Schemas
Ontologies
Vocabularies Specialized API and SDK
Incentivize the adoption of standards
A collection of high-performance APIs
An SDK for building your own APIs
http://T.biothings.io
fast, up-to-date, simple-to-use
JSON data aggregation mechanism
High-performance query engine
Well-designed REST API pattern
JSON-LD enabled Linked Data
Data-updating scheduler
Python/R clients
…
Your data source
Your API
Abstraction of API building/deployment
Gene
Variant
Drug/Chemical
Taxonomy
http://MyDisease.info
Disease
What about other APIs?
How can APIs work together?
Use cases in NCATS Translator Program
NCATS Biomedical Data Translator Program
https://ncats.nih.gov/translator
Two proof-of-concept queries
For each of the drug-condition pairs listed
below, construct a clinical outcome
pathway that best explains how the drug
effects its action.
Drug Condition
METADOXINE Hepatitis, Alcoholic
MEMANTINE Alzheimer Disease
OXYMORPHONE Anxiety
… …
For each of the diseases listed below, list
which other genetic conditions observed in
the human population might offer
protection AND WHY.
Disease
Osteoporosis
Asthma
Ebola Virus Infection
…
API-level data integration for translational research
Electronic Health Record (EHR)
Drugs
Proteins
Pathways
Genes
Variants
MyVariant.info
ClinVar
CIViC
…
MyGene.info
Ensembl
…
Reactome
WikiPathways
…
UniProt
…
MyChem.info
Clue.io
DrugBank
…
Pharos
Biolink
Wikidata
NDEx
…
Cross-API data interoperability
Input
Output
1. Compacted
Format
2. Compacted
Format
3. Nquads Format
Semantically-aligned API output
The separation of data and its semantic context:
• Deal with data first, and semantics second
• Deal with data only; others can help with
the semantic annotations
Semantic relationship represented in JSON-LD
{
  "_id": "RZVAJINKPMORJF-UHFFFAOYSA-N",
  "indication": [
    {
      "concept_id": "37796009",
      "concept_name": "Migraine"
    },
    ...
  ]
}
{
  "@context": {
    "indication": {
      "@type": "@id",
      "@id": "assoc:treats",
      "@context": {
        "concept_name": {
          "@type": "@id",
          "@id": "attr:label",
          "@context": {
            "@base": "http://biothings.io/explorer/vocab/terms/disease-name/"
          }
        },
        "concept_id": {
          "@type": "@id",
          "@id": "attr:id",
          "@context": {
            "@base": "http://identifiers.org/snomedct/"
          }
        }
      }
    }
  }
}
acetaminophen Migraine
treats
JSON object
JSON-LD context
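To show how the context turns plain JSON into a semantic statement, the sketch below resolves the "indication" field by hand into (subject, predicate, object) triples. It is a hand-rolled illustration, not a JSON-LD processor and not the standard expansion algorithm. It assumes that `concept_id` holds the SNOMED CT code and `concept_name` the label, and the function name is hypothetical.

```python
# trimmed copy of the JSON-LD context from the slide
CONTEXT = {
    "@context": {
        "indication": {
            "@id": "assoc:treats",
            "@context": {
                "concept_id": {
                    "@id": "attr:id",
                    "@context": {"@base": "http://identifiers.org/snomedct/"},
                },
            },
        }
    }
}

# the plain JSON object served by the API
DOC = {
    "_id": "RZVAJINKPMORJF-UHFFFAOYSA-N",
    "indication": [{"concept_id": "37796009", "concept_name": "Migraine"}],
}


def indication_triples(doc, context):
    """Turn each "indication" entry into (subject, predicate, object IRI)."""
    field = context["@context"]["indication"]
    predicate = field["@id"]
    id_base = field["@context"]["concept_id"]["@context"]["@base"]
    return [
        (doc["_id"], predicate, id_base + entry["concept_id"])
        for entry in doc["indication"]
    ]


print(indication_triples(DOC, CONTEXT))
```

The data stays untouched; only the separately maintained context supplies the "treats" relationship and the identifier namespace.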
OpenAPI specifications for API metadata
Tells how an API works
SmartAPI built on community standards
http://smart-api.info
Adds the semantic
context for the data
served from an API
Tells how an API works
SmartAPI defines extensions for rich API metadata
Biological domain-specific
metadata fields
SmartAPI as an API registry
http://smart-api.info
Hosted interactive documentation for your API
http://myvariant.smart-api.info http://myvariant.info/v1/api
Project-specific API portals
NCATS Translator Project: https://smart-api.info/registry/translator
NIH Data Commons Project: https://smart-api.info/registry/nihdatacommons
A Real-world Translational Question
From NCATS Translator Hackathon in May 2018
Disease - Gene
Gene - Pathways
Pathways - Gene
Gene - Chemical
Symptom - Disease
To explore the network of “SmartAPIs”:
http://biothings.io/explorer/
http://biothings.io/explorer_beta/
Discover
APIs for
specific
tasks
Automatically
trigger API calls
to construct a
subset of the
knowledge graph
Downstream
analysis
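Chaining APIs in this way amounts to path-finding over a graph whose nodes are entity types and whose edges are the hops that some API can perform. A minimal breadth-first sketch follows, using the hop list from the hackathon question above. It is an illustration of the idea only, not the BioThings Explorer implementation, and the names are hypothetical.

```python
from collections import deque

# (input type, output type) pairs taken from the hop list on the slide
API_HOPS = [
    ("symptom", "disease"),
    ("disease", "gene"),
    ("gene", "pathway"),
    ("pathway", "gene"),
    ("gene", "chemical"),
]


def find_path(start, goal, hops=API_HOPS):
    """Breadth-first search for the shortest chain of types from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for source, target in hops:
            if source == path[-1] and target not in seen:
                seen.add(target)
                queue.append(path + [target])
    return None  # no chain of hops connects the two types


print(find_path("symptom", "chemical"))
```

In a real registry each edge would also record which API performs the hop, so the returned path could be executed as a sequence of calls.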
Find APIs that can get me from pathways to genes:
Pathways Available APIs Genes
biocarta
kegg
wikipathway
reactome
ncbigene
uniprot
Find drug compounds associated with gene LCK:
LCK CHEMBL3707348
LCK
inhibits
Via DGIDB API
INCHIKEY:KKYYLKPGILUPOA-UHFFFAOYSA-N
UniProt:P06239
equals
Via MyGene API
targets
Via MyChem API
CHEMBL223873
equals Via MyChem API
More about
Video Tutorial
https://youtu.be/cPUKRsaTlhg
BioThings Explorer API:
http://biothings.io/explorer/api/
Demos in Jupyter Notebook:
BioThings Explorer Demo
BioThings Explorer Metadata
http://biothings.io/explorer/
BioThings project as a FAIR API Ecosystem
Accessible
Findable
Interoperable
Reusable
If you want fast and up-to-date
access to gene, variant,
chemical, and drug data.
If you want to quickly turn
your data into a
high-performance API.
If you built your API and want
others to find your API and use
it together with other APIs for a
specific workflow.
Acknowledgement
Scripps Research
Andrew Su (sulab.org)
Cyrus Afrasiabi
Sebastien Lelong
Jiwen (Kevin) Xin
Marco Cano Alvarado
Ginger Tsueng
Byung Ryul Jeon
Greg Taylor
Xinhua (Jerry) Zhou
Nina Moore
Maastricht Univ.
Michel Dumontier
(dumontierlab.com)
Amrapali Zaveri
Kody Moodley
Trish Whetzel (EBI)
Shima Dastgheib (NuMedii)
Ruben Verborgh (Ghent Univ.)
Paul Avillach (Harvard)
Gabor Korodi (Harvard)
Raymond Terryn (Univ. of Miami)
Kathleen Jagodnik (Mount Sinai)
Pedro Assis (Stanford)
Funding support from
NIH Data Commons
API interoperability working group
Univ. of Washington
Sean Mooney
Vikas R Pejaver
Translator, CD2H

More Related Content

What's hot

Text and Data Mining explained at FTDM
Text and Data Mining explained at FTDMText and Data Mining explained at FTDM
Text and Data Mining explained at FTDMpetermurrayrust
 
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...MongoDB
 
SureChEMBL patent annotations in Open PHACTS
SureChEMBL patent annotations in Open PHACTSSureChEMBL patent annotations in Open PHACTS
SureChEMBL patent annotations in Open PHACTSGeorge Papadatos
 
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...open_phacts
 
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseDevelopment of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Databasenist-spin
 
EUGM15 - George Papadatos, Mark Davies, Nathan Dedman (EMBL-EBI): SureChEMBL:...
EUGM15 - George Papadatos, Mark Davies, Nathan Dedman (EMBL-EBI): SureChEMBL:...EUGM15 - George Papadatos, Mark Davies, Nathan Dedman (EMBL-EBI): SureChEMBL:...
EUGM15 - George Papadatos, Mark Davies, Nathan Dedman (EMBL-EBI): SureChEMBL:...ChemAxon
 
Content Mining of Science and Medicine
Content Mining of Science and MedicineContent Mining of Science and Medicine
Content Mining of Science and MedicineTheContentMine
 
Plant ontology web services on Araport
Plant ontology web services on AraportPlant ontology web services on Araport
Plant ontology web services on AraportAraport
 
SureChEMBL and Open PHACTS
SureChEMBL and Open PHACTSSureChEMBL and Open PHACTS
SureChEMBL and Open PHACTSGeorge Papadatos
 
BigDataEurope - Big Data & Health
BigDataEurope - Big Data & HealthBigDataEurope - Big Data & Health
BigDataEurope - Big Data & HealthBigData_Europe
 
ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika! ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika! TheContentMine
 
ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!petermurrayrust
 
Linking Linked Data CSHALS2013
Linking Linked Data CSHALS2013Linking Linked Data CSHALS2013
Linking Linked Data CSHALS2013Nadia Anwar
 

What's hot (16)

Text and Data Mining explained at FTDM
Text and Data Mining explained at FTDMText and Data Mining explained at FTDM
Text and Data Mining explained at FTDM
 
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
 
Overview of SureChEMBL
Overview of SureChEMBLOverview of SureChEMBL
Overview of SureChEMBL
 
SureChEMBL patent annotations in Open PHACTS
SureChEMBL patent annotations in Open PHACTSSureChEMBL patent annotations in Open PHACTS
SureChEMBL patent annotations in Open PHACTS
 
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
 
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseDevelopment of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
 
Jan2016 horizon GIAB
Jan2016 horizon GIABJan2016 horizon GIAB
Jan2016 horizon GIAB
 
EUGM15 - George Papadatos, Mark Davies, Nathan Dedman (EMBL-EBI): SureChEMBL:...
EUGM15 - George Papadatos, Mark Davies, Nathan Dedman (EMBL-EBI): SureChEMBL:...EUGM15 - George Papadatos, Mark Davies, Nathan Dedman (EMBL-EBI): SureChEMBL:...
EUGM15 - George Papadatos, Mark Davies, Nathan Dedman (EMBL-EBI): SureChEMBL:...
 
Content Mining of Science and Medicine
Content Mining of Science and MedicineContent Mining of Science and Medicine
Content Mining of Science and Medicine
 
Plant ontology web services on Araport
Plant ontology web services on AraportPlant ontology web services on Araport
Plant ontology web services on Araport
 
SureChEMBL and Open PHACTS
SureChEMBL and Open PHACTSSureChEMBL and Open PHACTS
SureChEMBL and Open PHACTS
 
BigDataEurope - Big Data & Health
BigDataEurope - Big Data & HealthBigDataEurope - Big Data & Health
BigDataEurope - Big Data & Health
 
ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika! ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!
 
ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!
 
Sourcing high quality online data resources for computational toxicology
Sourcing high quality online data resources for computational toxicologySourcing high quality online data resources for computational toxicology
Sourcing high quality online data resources for computational toxicology
 
Linking Linked Data CSHALS2013
Linking Linked Data CSHALS2013Linking Linked Data CSHALS2013
Linking Linked Data CSHALS2013
 

Similar to Build a FAIR API for Biomedical Knowledge

BioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge
BioThings API: Building a FAIR API Ecosystem for Biomedical KnowledgeBioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge
BioThings API: Building a FAIR API Ecosystem for Biomedical KnowledgeChunlei Wu
 
BioThings API: Promoting Best-practices via a Biomedical API Development Ecos...
BioThings API: Promoting Best-practices via a Biomedical API Development Ecos...BioThings API: Promoting Best-practices via a Biomedical API Development Ecos...
BioThings API: Promoting Best-practices via a Biomedical API Development Ecos...Chunlei Wu
 
MyVariant.info: Variant Annotation as a Service
MyVariant.info: Variant Annotation as a ServiceMyVariant.info: Variant Annotation as a Service
MyVariant.info: Variant Annotation as a ServiceChunlei Wu
 
Chunlei wu heart_bd2k_201602_ebi
Chunlei wu heart_bd2k_201602_ebiChunlei wu heart_bd2k_201602_ebi
Chunlei wu heart_bd2k_201602_ebiChunlei Wu
 
BioThings SDK: a toolkit for building high-performance data APIs in biology
BioThings SDK: a toolkit for building high-performance data APIs in biologyBioThings SDK: a toolkit for building high-performance data APIs in biology
BioThings SDK: a toolkit for building high-performance data APIs in biologyChunlei Wu
 
Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info
Chunlei Wu BD2K 201601 MyGene.info and MyVariant.infoChunlei Wu BD2K 201601 MyGene.info and MyVariant.info
Chunlei Wu BD2K 201601 MyGene.info and MyVariant.infoChunlei Wu
 
BioThings and SmartAPI: building an ecosystem of interoperable biological kno...
BioThings and SmartAPI: building an ecosystem of interoperable biological kno...BioThings and SmartAPI: building an ecosystem of interoperable biological kno...
BioThings and SmartAPI: building an ecosystem of interoperable biological kno...Chunlei Wu
 
Cool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical ResearchCool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical ResearchDavid Ruau
 
How can you access PubChem programmatically?
How can you access PubChem programmatically?How can you access PubChem programmatically?
How can you access PubChem programmatically?Sunghwan Kim
 
Computational Resources In Infectious Disease
Computational Resources In Infectious DiseaseComputational Resources In Infectious Disease
Computational Resources In Infectious DiseaseJoão André Carriço
 
BioIT Europe 2010 - BioCatalogue
BioIT Europe 2010 - BioCatalogueBioIT Europe 2010 - BioCatalogue
BioIT Europe 2010 - BioCatalogueBioCatalogue
 
Opportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocurationOpportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocurationBenjamin Good
 
Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Sanjay Padhi, Ph.D
 
Open chemistry registry and mapping platform based on open source cheminforma...
Open chemistry registry and mapping platform based on open source cheminforma...Open chemistry registry and mapping platform based on open source cheminforma...
Open chemistry registry and mapping platform based on open source cheminforma...Valery Tkachenko
 
Enhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort DataEnhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort DataBarry Smith
 
Implementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSImplementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSValery Tkachenko
 
PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...
PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...
PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...Araport
 

Similar to Build a FAIR API for Biomedical Knowledge (20)

BioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge
BioThings API: Building a FAIR API Ecosystem for Biomedical KnowledgeBioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge
BioThings API: Building a FAIR API Ecosystem for Biomedical Knowledge
 
BioThings API: Promoting Best-practices via a Biomedical API Development Ecos...
BioThings API: Promoting Best-practices via a Biomedical API Development Ecos...BioThings API: Promoting Best-practices via a Biomedical API Development Ecos...
BioThings API: Promoting Best-practices via a Biomedical API Development Ecos...
 
MyVariant.info: Variant Annotation as a Service
MyVariant.info: Variant Annotation as a ServiceMyVariant.info: Variant Annotation as a Service
MyVariant.info: Variant Annotation as a Service
 
Chunlei wu heart_bd2k_201602_ebi
Chunlei wu heart_bd2k_201602_ebiChunlei wu heart_bd2k_201602_ebi
Chunlei wu heart_bd2k_201602_ebi
 
BioThings SDK: a toolkit for building high-performance data APIs in biology
BioThings SDK: a toolkit for building high-performance data APIs in biologyBioThings SDK: a toolkit for building high-performance data APIs in biology
BioThings SDK: a toolkit for building high-performance data APIs in biology
 
Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info
Chunlei Wu BD2K 201601 MyGene.info and MyVariant.infoChunlei Wu BD2K 201601 MyGene.info and MyVariant.info
Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info
 
BioThings and SmartAPI: building an ecosystem of interoperable biological kno...
BioThings and SmartAPI: building an ecosystem of interoperable biological kno...BioThings and SmartAPI: building an ecosystem of interoperable biological kno...
BioThings and SmartAPI: building an ecosystem of interoperable biological kno...
 
Overview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data AnalysisOverview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data Analysis
 
Cool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical ResearchCool Informatics Tools and Services for Biomedical Research
Cool Informatics Tools and Services for Biomedical Research
 
How can you access PubChem programmatically?
How can you access PubChem programmatically?How can you access PubChem programmatically?
How can you access PubChem programmatically?
 
Computational Resources In Infectious Disease
Computational Resources In Infectious DiseaseComputational Resources In Infectious Disease
Computational Resources In Infectious Disease
 
BioIT Europe 2010 - BioCatalogue
BioIT Europe 2010 - BioCatalogueBioIT Europe 2010 - BioCatalogue
BioIT Europe 2010 - BioCatalogue
 
Opportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocurationOpportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocuration
 
Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021
 
Open chemistry registry and mapping platform based on open source cheminforma...
Open chemistry registry and mapping platform based on open source cheminforma...Open chemistry registry and mapping platform based on open source cheminforma...
Open chemistry registry and mapping platform based on open source cheminforma...
 
Enhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort DataEnhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort Data
 
Implementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSImplementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTS
 
PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...
PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...
PMR metabolomics and transcriptomics database and its RESTful web APIs: A dat...
 
Biothings presentation
Biothings presentationBiothings presentation
Biothings presentation
 
Harvester I
Harvester IHarvester I
Harvester I
 


Build a FAIR API for Biomedical Knowledge

  • 1. Chunlei Wu, Ph.D. cwu@scripps.edu @chunleiwu https://wulab.io Associate Professor Dept. of Integrative Structural and Computational Biology The Scripps Research Institute La Jolla, CA, USA 01/16/2019 NCI – CBIIT Speaker Series Building a FAIR API Ecosystem for Biomedical Knowledge http://biothings.io
  • 2. Biomedical Data API API – Application Programming Interface API is a way to abstract the data-access layer.
  • 3. APIs as a reusable data layer Presentation Layer Business logic Layer Data Layer Application 1 Presentation Layer Business logic Layer Data Layer Application 2 View Controller Model Repetitive data wrangling: • Parsing dump files • ID conversion • Data merging • Data transformation • Source monitoring • Download scheduler • … … Presentation Layer Business logic Layer Common Data Layer Application 1 Presentation Layer Business logic Layer Data Layer Application 2
  • 4. Why bioinformaticians need APIs It's about Modularization, Reusability, and the DRY principle. Photo credits: http://www.edmentum.com/sites/edmentum.com/files/solutions/content/building_0.jpg http://www.howcsharp.com/img/0/68/dont-repeat-yourself-dry-300x211.jpg http://blog.capinc.com/wp-content/uploads/2013/02/Recycle_Logo_by_Har1-300x263.png
  • 5. Biomedical APIs and FAIR matrix APIs are not quite findable. APIs are naturally accessible, but enterprise-grade biomedical APIs are still few. Often not interoperable across APIs. APIs serve reusable pieces of data, but more can be made reusable in API development.
  • 6. Computer science is all about “Abstraction” “Abstraction” is the simple guiding-principle for informaticians Reducing repetitive efforts Opportunities for informaticians
  • 7. An example: abstracting the gene search box http://biogps.org
  • 9. Aggregated Gene annotations represented in JSON documents { "_id": "1017", "symbol": "CDK2", "ensembl": "ENSG00000123374", "refseq": [ "NM_001798", "NM_052827" ], "reporter": { "U95A": [ "1792_g_at", "1833_at" ], "U133A": [ "211804_s_at", "2045252_at", "211803_at" ] } } Source merging criteria: matching NCBI or Ensembl Gene ids HGNC MGI RGD Refseq Ensembl UniProt UniGene Homologene PantherDB GO Reactome Wikipathways KEGG PDB PFAM Interpro Prosite PIR Pharmgkb UMLS Wikipedia Pharos …
  • 10. Gene-centric API via a simple interface Get gene object(s) via either NCBI/Ensembl gene ids: http://mygene.info/v3/gene/1017 http://mygene.info/v3/gene/ENSG00000123374 http://mygene.info/v3/gene/1017?fields=symbol,name,pathway,uniprot Find matching gene objects with any query terms: http://mygene.info/v3/query?q=CDK2 http://mygene.info/v3/query?q=name:kinase&species=human http://mygene.info/v3/query?q=name:kinase AND _exists_:pathway http://mygene.info/v3/query?q=pathway.kegg.name:wnt&fields=entrezgene,symbol,taxid,interpro Batch queries supported via POST
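The URL patterns on slide 10 can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper names `gene_url` and `query_url` are illustrative and are not part of any BioThings client. It builds the URLs without sending a request.

```python
from urllib.parse import urlencode

BASE_URL = "http://mygene.info/v3"  # base URL as shown on the slide


def gene_url(gene_id, fields=None):
    """Build the annotation URL for one NCBI or Ensembl gene id."""
    url = f"{BASE_URL}/gene/{gene_id}"
    if fields:
        # Keep commas readable, matching the slide's "fields=a,b,c" form.
        url += "?" + urlencode({"fields": ",".join(fields)}, safe=",")
    return url


def query_url(q, **params):
    """Build a query URL; extra keyword arguments become URL parameters."""
    # Colons and commas are left unescaped so fielded queries stay readable.
    return f"{BASE_URL}/query?" + urlencode({"q": q, **params}, safe=":,")


print(gene_url("1017", fields=["symbol", "name", "pathway", "uniprot"]))
print(query_url("name:kinase", species="human"))
```

Spaces in a query such as `name:kinase AND _exists_:pathway` are encoded as `+` by `urlencode`, which is a standard form encoding for query strings.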
  • 11. MyVariant.info API { "_id": "chr1:g.196659237C>T", "cosmic": { "chrom": "1", "hg19": { "start": 196659237, "end": 196659237 }, "ref": "C", "alt": "T", "tumor_site": "breast", "mut_freq": 0.49, "mut_nt": "C>T", "cosmic_id": "COSM424915" } } { "_id": "chr1:g.196659237C>T", "cadd": { … }, "clinvar": { … }, "cosmic": { … }, "dbsnp": { … }, "dbnsfp": { … }, "evs": { … }, "emv": { … }, "mutdb": { … }, "gwassnp": { … }, "snpedia": { … }, "wellderly": { … } } Source merging criteria: matching HGVS names Only genomic-based HGVS names are used (support both hg19 and hg38) more at: http://docs.myvariant.info/en/latest/doc/data.html#id-field http://myvariant.info A real example online 21 sources: dbSNP dbNSFP CADD UniProt ClinVar CIVIC CGI DOCM ExAC GNOMAD EMV EVS Grasp SNPEFF …
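The merging criterion on slide 11 (per-source documents joined on a shared HGVS name in `_id`) can be illustrated with a small sketch. This is not the BioThings Hub merger; `merge_by_id` is a hypothetical helper, and the `clinvar` block below is a placeholder rather than real data.

```python
from collections import defaultdict


def merge_by_id(source_docs):
    """Merge per-source documents that share the same "_id" (HGVS name)."""
    merged = defaultdict(dict)
    for doc in source_docs:
        hgvs_id = doc["_id"]
        merged[hgvs_id]["_id"] = hgvs_id
        for key, value in doc.items():
            if key != "_id":
                # Each source contributes its own top-level key.
                merged[hgvs_id][key] = value
    return dict(merged)


docs = [
    {"_id": "chr1:g.196659237C>T", "cosmic": {"cosmic_id": "COSM424915"}},
    {"_id": "chr1:g.196659237C>T", "clinvar": {"note": "illustrative placeholder"}},
]
merged = merge_by_id(docs)
print(sorted(merged["chr1:g.196659237C>T"]))
```

Because every source writes under its own key, the merged object keeps the provenance of each annotation visible, which is the shape shown in the second JSON document on the slide.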
  • 12. MyVariant.info API Data source license and metadata: { "_id": "chr1:g.196659237C>T", "cadd": { "_license": "http://bit.ly/2TIuab9", … }, "clinvar": { "_license": "http://bit.ly/2SQdcI0", … }, "civic": { "_license": "http://bit.ly/2FqS871", … }, "dbnsfp": { "_license": "http://bit.ly/2VLnQBz", … }, … } { "build_date": "2018-12-06T22:15:39.743302", "build_version": "20181206", "src": { "cadd": { "license_url": "http://cadd.gs.washington.edu/contact", "license_url_short": "http://bit.ly/2TIuab9", "stats": { "cadd": 226932858 }, "url": "http://cadd.gs.washington.edu/home", "version": "1.3" }, "civic": { "licence": "CC0 1.0 Universal", "license_url": "https://creativecommons.org/publicdomain/zero/1.0/", "license_url_short": "http://bit.ly/2FqS871", "stats": { "civic": 1559 }, "url": "https://civicdb.org", "version": "201706" }, … }} “_license” urls embedded in every response Detailed source metadata at http://myvariant.info/metadata
  • 13. MyChem.info API for chemicals and drugs { "_id": "RRUDCFGSUDOHDG-UHFFFAOYSA-N", "chebi": { "id": "CHEBI:49029", "formulae": "C2H5NO2", "name": "N-hydroxyacetimidic acid", "smiles": "CC(O)=NO", "xrefs": { "pubchem": { "cid": "1990", "sid": "49693671" } } }, "drugbank": {…}, "drugcentral": {…} } Source merging criteria: matching InChiKey more at: http://docs.mychem.info/en/latest/doc/data.html#id-field 11 sources: AEOLUS ChEBI ChEMBL Drugbank Drugcentral GINAS NDC PharmGKB PubChem UNII
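Since slide 12 says a `_license` URL is embedded in every response, a consumer can collect the licenses that apply to a returned document. The sketch below assumes the layout shown on the slide (one top-level block per source, each with a `_license` key); `collect_licenses` is a hypothetical helper name.

```python
def collect_licenses(doc):
    """Return {source_name: license_url} for every top-level source block."""
    return {
        source: block["_license"]
        for source, block in doc.items()
        if isinstance(block, dict) and "_license" in block
    }


# Trimmed-down version of the response shown on the slide.
doc = {
    "_id": "chr1:g.196659237C>T",
    "cadd": {"_license": "http://bit.ly/2TIuab9"},
    "clinvar": {"_license": "http://bit.ly/2SQdcI0"},
}
licenses = collect_licenses(doc)
print(licenses)
```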
  • 14. Collectively, we call them “BioThings APIs” http://mygene.info (~10 M requests, ~20,000 unique IPs every month): Aggregates annotations for 25 million genes from 30 resources. I have a list of gene ids, want to get annotations about them? Gene annotation service: GET /v3/gene/<geneid> POST /v3/gene/ (batch mode). I want to get matching genes with my query term(s). Gene query service: GET /v3/query/?q=<query> POST /v3/query/ (batch mode). http://myvariant.info (~5 M requests, ~8,000 unique IPs every month): Aggregates annotations for 874 million variants from 21 resources. I have a list of variant ids, want to get annotations about them? Variant annotation service: GET /v1/variant/<hgvsid> POST /v1/variant/ (batch mode). I want to get matching variants with my query term(s). Variant query service: GET /v1/query/?q=<query> POST /v1/query/ (batch mode). http://mychem.info (recently launched!): Aggregates annotations for 96 million drugs/chemicals from 11 resources. I have a list of drug/chemical ids, want to get annotations about them? Drug/chemical annotation service: GET /v1/drug/<drugid> POST /v1/drug/ (batch mode). I want to get matching drugs/chemicals with my query term(s). Drug/chemical query service: GET /v1/query/?q=<query> POST /v1/query/ (batch mode).
  • 15. Who is using BioThings API Many users use our APIs in their daily analysis pipelines or simply to cache annotations locally http://biothings.io/who-is-using
  • 16. Who is using BioThings API Top 10 organizations* and their # of requests (01/01/2018-12/31/2018). (1) Baylor College of Med 17,264,902; OHSU 16,442,387; Google LLC 590,305; UNC 480,168; Cincinnati Children 229,686; Université Laval 226,243; UCSD 101,867; Rockefeller University 96,018; Illumina 92,902; Yale Univ 44,587. (2) NY Genome Center 3,502,635; UTexas-Austin 2,785,542; Stanford University 2,607,072; Univ of Colorado 1,325,650; Yale Univ 1,054,124; Vanderbilt Univ 851,375; Univ of Chicago 614,891; Baylor College of Med 550,022; Oregon State Univ 525,350; Univ of Illinois - UC 507,421. * Orgs mapped to the general ISPs were removed
  • 17. BioThings API usage by numbers Based on usage data (01/01/2018-12/31/2018). MyGene.info: Total requests 130M; Avg. monthly requests 10.7M; Total unique IPs 173K; Monthly unique IPs ~19K; mygene Python client monthly downloads ~4470; mygene R client monthly downloads ~611; Availability tracked by UptimeRobot 100%. MyVariant.info: Total requests 55M; Avg. monthly requests 4.6M; Total unique IPs 86K; Monthly unique IPs ~8K; myvariant Python client monthly downloads ~3600; myvariant R client monthly downloads ~164; Availability tracked by UptimeRobot 100%.
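The batch mode on slide 14 (POST to the annotation endpoint) can be sketched as a prepared request. The endpoint path comes from the slide; the form parameter names `ids` and `fields` are my assumption about the batch interface and should be checked against the MyGene.info documentation. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def batch_gene_request(gene_ids, fields=None, base_url="http://mygene.info/v3"):
    """Prepare (but do not send) a batch POST to the gene annotation service."""
    # "ids" and "fields" are assumed parameter names, not taken from the slides.
    form = {"ids": ",".join(gene_ids)}
    if fields:
        form["fields"] = ",".join(fields)
    body = urlencode(form).encode("utf-8")
    return Request(
        f"{base_url}/gene",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = batch_gene_request(["1017", "1018"], fields=["symbol", "name"])
print(req.get_method(), req.full_url, req.data)
```

Sending the prepared request would be a matter of passing `req` to `urllib.request.urlopen`; that step is left out so the sketch has no network dependency.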
  • 18. mygene and myvariant Python clients Open source repositories depending on our python clients (total 29) (total 11) https://libraries.io/pypi/mygene https://libraries.io/pypi/myvariant
  • 19. Build Enterprise-grade Biomedical APIs • Simple to use • Always up-to-date (updated weekly) • Comprehensive - MyGene.info: 25M genes from 24K species - MyVariant.info: 874M variants (700M observed) - MyChem.info: 96M chemicals/drugs • High-performance and scalable • High-availability • Python, R, JavaScript clients • Developer-friendly (support CORS, gzip, https, msgpack, etc.) • “fetch_all” feature for streaming large query results
  • 20. A collection of high-performance APIs http://T.biothings.io fast, up-to-date, simple-to-use Gene Variant Drug/Chemical Taxonomy http://MyDisease.info Disease What about other “BioThings”, with our limited bandwidth? Can we further abstract the process of making APIs? Help ourselves as well as others to build APIs.
  • 21. Schematic view of MyVariant.info architecture Web module Hub module Individual server node * Colors indicate the different updating schedules
  • 22. Others can build their own APIs with src monitor scheduler data merger data indexer URL pattern JSONP CORS compression JSON-LD Tracking unit tests cluster setup data deploy cluster scaling load-balancing Optional query customization Data Hub Web API Cloud Deployment data parsers for individual resources MongoDB + Elasticsearch Python/Tornado Amazon AWS http://docs.biothings.io BioThingsSDK done by Users abstracted in SDK
  • 23. My data file I will write a parser Describe data schema for indexing Setup Elasticsearch Index JSON objects in Elasticsearch Ready to serve Your BioThings API is live! LIVE Inspector indexer In [1]: from biothings.www import BiothingsAPIApp In [2]: drug_api_app = BiothingsAPIApp( ...: APP_LIST= [(r'/v1/drug/(.+)/?', 'BiothingHandler'), ...: (r'/v1/drug/?$', 'BiothingHandler')], ...: ES_INDEX='drug_databuild_20170708', ES_DOC_TYPE='drug') In [3]: drug_api_app.start(port=8002) INFO:root:Server is running on "0.0.0.0:8002"... code snippet user actions done by SDK Scenario 1 - I have a data file, and I want to make it an API: - Turn a data file into a high-quality API http://docs.biothings.io/en/latest/doc/single_source_tutorial.html
  • 24. - Unified API clients in Python/R/JS # Access your live API from the unified Python client: In [1]: from biothings_client import get_client In [2]: mydrug = get_client("drug", url="localhost:8002/v1") In [3]: mydrug.getdrug("DB08571") In [4]: mydrug.query("drugbank.name:celecoxib") In [5]: mygene = get_client("gene") In [6]: mygene.getgene("1017") In [7]: mygene.query("symbol:cdk2") In [8]: myvariant = get_client("variant") In [9]: myvariant.getvariant("chr7:g.140453134T>C") In [10]: myvariant.query("dbsnp.rsid:rs58991260") User API MyGene.info API MyVariant.info API biothings_client available in Python R Javascript https://biothings-clientpy.readthedocs.io
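The one user-written step on slide 23 is the parser that turns a data file into JSON objects. The sketch below shows the general idea for a tab-separated file; the function name `load_data`, the column names `drug_id` and `name`, and the sample row are all hypothetical, and the exact parser signature expected by the BioThings SDK should be taken from the tutorial linked on the slide.

```python
import csv
import io


def load_data(handle):
    """Yield one JSON-style document per row of a tab-separated file."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield {
            "_id": row["drug_id"],  # primary key used for indexing and lookup
            "drugbank": {"name": row["name"]},  # source-specific block
        }


# In-memory stand-in for a data file; the name value is a placeholder.
sample = io.StringIO("drug_id\tname\nDB08571\texample-name\n")
docs = list(load_data(sample))
print(docs)
```

Yielding documents one at a time keeps memory use flat for large dump files, and nesting fields under a source key mirrors the document layout used elsewhere in the deck.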
  • 25. - Merging and keeping data sources in-sync Scenario 2 - I need to aggregate multiple data sources, and keep them up-to-date: A data source management console included in SDK http://docs.biothings.io/en/latest/doc/hub_tutorial.html
  • 26. BioThings Studio as web-based development environment Contribute to the existing BioThings APIs Build your own API Biomedical Data Sources (MyGene.info data sources shown in BioThings Studio) https://github.com/biothings/biothings_studio
  • 27. What about data schemas? BioThings API and SDK are data-schema neutral, but can be customized to be a specialized API and SDK focusing on a particular schema or vocabulary standard. Schemas Ontologies Vocabularies Specialized API and SDK Incentivize the adoption of standards
  • 28. A collection of high-performance APIs An SDK for building your own APIs http://T.biothings.io fast, up-to-date, simple-to-use JSON data aggregation mechanism High-performance query engine Well-designed REST API pattern JSON-LD enabled Linked Data Data-updating scheduler Python/R clients … Your data source Your API Abstraction of API building/deployment Gene Variant Drug/Chemical Taxonomy http://MyDisease.info Disease What about other APIs? How can APIs work together?
  • 29. Use cases in NCATS Translator Program NCATS Biomedical Data Translator Program https://ncats.nih.gov/translator Two proof-of-concept queries For each of the drug-condition pairs listed below, construct a clinical outcome pathway that best explains how the drug effects its action. Drug Condition METADOXINE Hepatitis, Alcoholic MEMANTINE Alzheimer Disease OXYMORPHONE Anxiety … … For each of the diseases listed below, list which other genetic conditions observed in the human population might offer protection AND WHY. Disease Osteoporosis Asthma Ebola Virus Infection …
  • 30. API-level data integration for translational research Electronic Health Record (EHR) Drugs Proteins Pathways Genes Variants MyVariant.info ClinVar CiVIC … MyGene.info Ensembl … Reactome WikiPathways … UniProt … MyChem.info Clue.io DrugBank … Pharos Biolink Wikidata NDEx …
  • 32. Input Output 1. Compacted Format 2. Compacted Format 3. Nquads Format Semantically-aligned API output The separation of data and its semantic context: • Deal with data first, and semantics second • Deal with data only and others can help with the semantic annotations
  • 33. Semantic relationship represented in JSON-LD JSON object: { "_id": "RZVAJINKPMORJF-UHFFFAOYSA-N", "indication": [ { "concept_id": "37796009", "concept_name": "Migraine" }, ... ] } JSON-LD context: { "@context": { "indication": { "@type": "@id", "@id": "assoc:treats", "@context": { "concept_name": { "@type": "@id", "@id": "attr:label", "@context": { "@base": "http://biothings.io/explorer/vocab/terms/disease-name/" } }, "concept_id": { "@type": "@id", "@id": "attr:id", "@context": { "@base": "http://identifiers.org/snomedct/" } } } } } } acetaminophen treats Migraine
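To make the role of the `@base` entries on slide 33 concrete, the sketch below prefixes each field value with the base IRI that the context assigns to it, taking 37796009 as the SNOMED CT identifier and Migraine as the disease name. This is a deliberate simplification and not a JSON-LD processor; `BASES` and `to_iris` are hypothetical names, and a real application would use a JSON-LD library instead.

```python
# Base IRIs copied from the "@base" entries in the slide's JSON-LD context.
BASES = {
    "concept_id": "http://identifiers.org/snomedct/",
    "concept_name": "http://biothings.io/explorer/vocab/terms/disease-name/",
}


def to_iris(indication, bases=BASES):
    """Prefix each known field value with its base IRI; drop unknown fields."""
    return {
        field: bases[field] + str(value)
        for field, value in indication.items()
        if field in bases
    }


iris = to_iris({"concept_id": "37796009", "concept_name": "Migraine"})
print(iris["concept_id"])
```

The point of the separation is visible here: the API output stays plain JSON, and the mapping to resolvable identifiers lives entirely in the context.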
  • 34. OpenAPI specifications for API metadata Tells how an API works
  • 35. SmartAPI built on community standards http://smart-api.info Adds the semantic context for the data served from an API Tells how an API works
  • 36. SmartAPI defines extensions for rich API metadata Biological domain-specific metadata fields
  • 37. SmartAPI as an API registry http://smart-api.info
  • 38. Hosted interactive documentation for your API http://myvariant.smart-api.info http://myvariant.info/v1/api
  • 40. A Real-world Translational Question From NCATS Translator Hackathon in May 2018 Disease - Gene Gene - Pathways Pathways - Gene Gene - Chemical Symptom - Disease
  • 41. To explore the network of “SmartAPIs”: http://biothings.io/explorer/ http://biothings.io/explorer_beta/ Discover APIs for specific tasks Automatically trigger API calls to construct a subset of the knowledge graph Downstream analysis
  • 42. Find APIs that can get me from pathways to genes: Pathways Available APIs Genes biocarta kegg wikipathway reactome ncbigene uniprot
  • 43. Find associated drug compounds to gene LCK: LCK CHEMBL3707348 LCK inhibits Via DGIDB API INCHIKEY:KKYYLKPGILUPOA-UHFFFAOYSA-N UniProt:P06239 equals Via MyGene API targets Via MyChem API CHEMBL223873 equals Via MyChem API
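The API discovery step on slides 41 and 42 (finding which APIs connect one entity type to another) can be thought of as a path search over a registry of input and output types. The sketch below is an illustration of that idea only; the registry entries and API names are hypothetical and this is not how BioThings Explorer is implemented.

```python
from collections import deque

# Hypothetical registry: (api_name, input_type, output_type).
REGISTRY = [
    ("pathway-to-gene API", "pathway", "gene"),
    ("gene-to-chemical API", "gene", "chemical"),
    ("disease-to-gene API", "disease", "gene"),
]


def find_api_path(start, goal, registry=REGISTRY):
    """Breadth-first search for the shortest chain of APIs from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        current, path = queue.popleft()
        if current == goal:
            return path
        for name, src, dst in registry:
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [name]))
    return None  # no chain of registered APIs connects the two types


print(find_api_path("pathway", "gene"))
print(find_api_path("disease", "chemical"))
```

Breadth-first search returns a chain with the fewest API calls, which is a reasonable default when each hop is a network request.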
  • 44. More about Video Tutorial https://youtu.be/cPUKRsaTlhg BioThings Explorer API: http://biothings.io/explorer/api/ Demos in Jupyter Notebook: BioThings Explorer Demo BioThings Explorer Metadata http://biothings.io/explorer/
  • 45. BioThings project as a FAIR API Ecosystem Accessible Findable Interoperable Reusable If you want fast and up-to-date access to gene, variant, chemical, drug data. If you want to quickly turn your data into a high-performance API. If you built your API and want others to find your API and use it together with other APIs for a specific workflow.
  • 46. Acknowledgement Scripps Research Andrew Su (sulab.org) Cyrus Afrasiabi Sebastien Lelong Jiwen (Kevin) Xin Marco Cano Alvarado Ginger Tsueng Byung Ryul Jeon Greg Taylor Xinhua (Jerry) Zhou Nina Moore Maastricht Univ. Michel Dumontier (dumontierlab.com) Amrapali Zaveri Kody Moodley Trish Whetzel (EBI) Shima Dastgheib (NuMedii) Ruben Verborgh (Ghent Univ.) Paul Avillach (Harvard) Gabor Korodi (Harvard) Raymond Terryn (Univ. of Miami) Kathleen Jagodnik (Mount Sinai) Pedro Assis (Stanford) Funding support from NIH Data Commons API interoperability working group Univ. of Washington Sean Mooney Vikas R Pejaver Translator, CD2H

Editor's Notes

  1. Up-to-date and high-performance and high-availability