Chunlei Wu, Ph.D.
cwu@scripps.edu
@chunleiwu
Associate Professor of Molecular Medicine
Dept. of Molecular Experimental Medicine
The Scripps Research Institute
La Jolla, CA, USA
07/2016
High-performance web services for
gene and variant annotations
MyGene.info and MyVariant.info
Biological knowledge is a complex network
No one-size-fits-all database can capture
the entire knowledge space
Typical database representations
• Relational database: tables
• Document database: JSON objects, e.g. {"_id": 1017, "name": "CDK2", "taxid": 9606}
• RDF triplestore: triples
• Key-value store: key-value pairs
BioThings APIs are built on document databases
Why we picked document databases:
• Object representation
• Rich data structures, handles heterogeneous data very well
• Atomic operations, built for big-data scale
Gene and Variant annotations represented in JSON documents
{
  "_id": "chr1:g.196659237C>T",
  "cosmic": {
    "chrom": "1",
    "hg19": {
      "start": 196659237,
      "end": 196659237
    },
    "ref": "C",
    "alt": "T",
    "tumor_site": "breast",
    "mut_freq": 0.49,
    "mut_nt": "C>T",
    "cosmic_id": "COSM424915"
  }
}
{
  "_id": "1017",
  "Symbol": "CDK2",
  "Ensembl": "ENSG00000123374",
  "RefSeq": [
    "NM_001798",
    "NM_052827"
  ],
  "Reporter": {
    "U95A": [
      "1792_g_at",
      "1833_at"
    ],
    "U133A": [
      "211804_s_at",
      "2045252_at",
      "211803_at"
    ]
  }
}
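Because each annotation is a nested JSON object, a value deep inside a document can be addressed with a dotted path such as cosmic.hg19.start. The following is a minimal Python sketch of that idea, not code from MyVariant.info itself: the helper name get_field is hypothetical, and the sample document is an abridged copy of the COSMIC example on the slide.

```python
import json

# Abridged copy of the variant document shown on the slide.
VARIANT_JSON = """
{
  "_id": "chr1:g.196659237C>T",
  "cosmic": {
    "chrom": "1",
    "hg19": {"start": 196659237, "end": 196659237},
    "ref": "C",
    "alt": "T",
    "tumor_site": "breast",
    "cosmic_id": "COSM424915"
  }
}
"""


def get_field(doc, dotted_path, default=None):
    """Walk a nested dict along a dotted path (hypothetical helper)."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


variant = json.loads(VARIANT_JSON)
start = get_field(variant, "cosmic.hg19.start")
site = get_field(variant, "cosmic.tumor_site")
# A path that is not present in this document falls back to the default.
missing = get_field(variant, "dbsnp.rsid")
print(start, site, missing)
```

Returning a default for absent paths suits heterogeneous documents, where not every variant carries every annotation source.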
Keep data always up-to-date
Each data source is updated individually. Colors
indicate their different updating schedules.
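One way to picture per-source updates is to give every data source its own top-level key inside each variant document, so that refreshing one source never touches the others. The Python sketch below illustrates that idea under stated assumptions: the function upsert_source and the in-memory dict store are hypothetical stand-ins, not the actual MyVariant.info updating code, and the dbsnp value is a made-up placeholder.

```python
def upsert_source(store, source_name, records):
    """Write one source's annotations under its own key in each document.

    store:       dict mapping variant _id -> merged document (stand-in for a database)
    source_name: top-level key for this source, e.g. "cosmic"
    records:     iterable of (variant_id, annotation_dict) pairs
    """
    for variant_id, annotation in records:
        doc = store.setdefault(variant_id, {"_id": variant_id})
        doc[source_name] = annotation  # replaces only this source's sub-document
    return store


store = {}
vid = "chr1:g.196659237C>T"

# Initial load from two sources (the dbsnp content is a made-up placeholder).
upsert_source(store, "cosmic", [(vid, {"tumor_site": "breast", "mut_freq": 0.49})])
upsert_source(store, "dbsnp", [(vid, {"rsid": "rs-placeholder"})])

# Later, only COSMIC is refreshed on its own schedule.
upsert_source(store, "cosmic", [(vid, {"tumor_site": "breast", "mut_freq": 0.5})])
print(store[vid])
```

Keeping sources in separate sub-documents is what lets each one follow its own schedule in this sketch.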
Schematic view of MyVariant.info architecture
High-performance web service APIs
MyGene.info + MyVariant.info
MyGene.info (Gene)
/v2/gene/<geneid>
/v2/query?q=<query>
/v3/gene/<geneid>
/v3/query?q=<query>
MyVariant.info (Variant)
/v1/variant/<hgvsid>
/v1/query?q=<query>
single query on GET, batch query on POST
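The two-endpoint pattern can be sketched with the Python standard library alone. The example only builds the requests and does not send them. The URL paths come from the slide; the POST body parameters q and scopes, and the function names, are assumptions made for illustration.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "http://mygene.info/v3"


def single_gene_url(gene_id):
    """GET URL for one gene annotation object: /v3/gene/<geneid>."""
    return f"{BASE}/gene/{quote(str(gene_id), safe='')}"


def query_url(q):
    """GET URL for a single query: /v3/query?q=<query>."""
    return f"{BASE}/query?{urlencode({'q': q})}"


def batch_query_request(terms, scopes):
    """POST request carrying many query terms at once.

    The parameter names "q" and "scopes" are assumptions, not taken from the slide.
    """
    body = urlencode({"q": ",".join(terms), "scopes": scopes}).encode("ascii")
    return Request(
        f"{BASE}/query",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = batch_query_request(["1017", "1018"], "entrezgene")
print(single_gene_url(1017))
print(query_url("cdk2"))
print(request.get_method(), request.full_url, request.data)
```

Sending the request would be a matter of passing it to urllib.request.urlopen; that step is left out so the sketch runs offline.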
We focus on building APIs. We try to:
Make it really easy to use
• Just two endpoints
• No registration/sign-in
• No API key
Developer-friendly
• Python/R clients (also a JS client for MyVariant.info): search "mygene" and "myvariant" in PyPI and Bioconductor
• Supported: JSONP, CORS, HTTPS, msgpack, HTTP compression, HTTP caching, JSON-LD
Aggregate Everything about gene and variant
MyGene.info
• Support >15M genes for ~17K species
• ~200 annotation fields
MyVariant.info
• Support >334M variants
• ~500 annotation fields from 14 sources: ClinVar, dbNSFP, dbSNP, …
Keep up-to-date
• MyGene.info: updated weekly
• MyVariant.info: updated ~monthly
High-performance and scalable
• >95% of queries respond in < 30 ms
• A "stress test" suggests support for >5,000 concurrent users at ~10,000 requests per minute
High availability of MyGene.info and MyVariant.info
• 99.999% over last year
• 99.87% over last 6 months
Availability tracked by
Who is using
Live applications:
• MinePath.org
• Gene Wiki
• JBrowse
Many users use them in their daily analysis pipelines, or simply to cache annotations locally
MyGene.info recent usage stats
Month    Requests     Unique IPs
Jan-16    3,885,192   2,498
Feb-16    5,313,950   2,786
Mar-16    3,362,354   3,121
Apr-16   10,918,104   3,065
May-16   10,776,858   3,803
Jun-16    6,396,148   3,940
Requests by client: direct calls 39%, mygene.py 38%, mygene.R 14%, BioGPS 9%
Over 40M requests in six months
MyVariant.info recent usage stats
Month    Requests    Unique IPs
Jan-16      83,519   1,330
Feb-16   3,054,191   1,192
Mar-16     272,424   1,771
Apr-16     701,526   1,500
May-16      89,642   1,891
Jun-16     213,767   1,924
Requests by client: direct calls 21%, myvariant.py 23%, myvariant.R 50%, myvariant.js 6%
~4.5M requests in six months
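The six-month totals can be checked directly from the monthly request counts. This short Python sketch sums the figures from the two usage tables; the variable and function names are illustrative only.

```python
# Monthly request counts copied from the two usage tables.
MYGENE_REQUESTS = {
    "Jan-16": 3_885_192, "Feb-16": 5_313_950, "Mar-16": 3_362_354,
    "Apr-16": 10_918_104, "May-16": 10_776_858, "Jun-16": 6_396_148,
}
MYVARIANT_REQUESTS = {
    "Jan-16": 83_519, "Feb-16": 3_054_191, "Mar-16": 272_424,
    "Apr-16": 701_526, "May-16": 89_642, "Jun-16": 213_767,
}


def total_requests(monthly):
    """Sum the request counts over all listed months."""
    return sum(monthly.values())


def busiest_month(monthly):
    """Return the month label with the highest request count."""
    return max(monthly, key=monthly.get)


mygene_total = total_requests(MYGENE_REQUESTS)
myvariant_total = total_requests(MYVARIANT_REQUESTS)
print(f"MyGene.info:    {mygene_total:,} requests, peak in {busiest_month(MYGENE_REQUESTS)}")
print(f"MyVariant.info: {myvariant_total:,} requests, peak in {busiest_month(MYVARIANT_REQUESTS)}")
```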
Generalized BioThings SDK
BioThings SDK (generalized from MyGene.info and MyVariant.info):
• JSON data aggregation mechanism
• High-performance query engine
• Well-designed REST API pattern
• JSON-LD enabled Linked Data
• Data-updating scheduler
• Python/R clients
• …
BioThings SDK
A tutorial here (more docs are coming):
http://biothingsapi.readthedocs.io/en/latest/
BioThings SDK
• g.biothings.io: gene (alias to MyGene.info)
• v.biothings.io: variant (alias to MyVariant.info)
• s.biothings.io: species/taxonomy
BioThings API for species/taxonomy
{
"_id": "9606",
"_version": 1,
"authority": [
"homo sapiens linnaeus, 1758"
],
"children": [ 63221, 741158],
"common_name": "man",
"genbank_common_name": "human",
"has_gene": true,
"lineage": [ 9606, 9605, 207598, …,131567, 1],
"parent_taxid": 9605,
"rank": "species",
"scientific_name": "homo sapiens",
"taxid": 9606,
"uniprot_name": "homo sapiens"
}
http://s.biothings.io/v1/species/9606?include_children=true
BioThings API for species/taxonomy
{
"hits": [
{
"_id": "1239",
"_score": 10.971453,
"common_name": […],
"genbank_common_name": "gram-positive bacteria",
"has_gene": false,
"lineage": [1239, 1783272, 2, 131567, 1],
"parent_taxid": 1783272,
"rank": "phylum",
"scientific_name": "firmicutes",
"taxid": 1239,
"uniprot_name": "firmicutes"
}
],
"max_score": 10.971453,
"took": 12,
"total": 1
}
http://s.biothings.io/v1/query?q=rank:phylum AND common_name:gram-positive
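Query strings like the one above contain spaces and colons, so a client has to percent-encode them before sending. The sketch below builds both species API URLs shown on these slides with the Python standard library; the function names are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SPECIES_BASE = "http://s.biothings.io/v1"


def species_url(taxid, include_children=False):
    """URL for one taxonomy node, optionally asking for its children."""
    url = f"{SPECIES_BASE}/species/{int(taxid)}"
    if include_children:
        url += "?" + urlencode({"include_children": "true"})
    return url


def species_query_url(query):
    """URL for a fielded query; urlencode escapes spaces and colons."""
    return f"{SPECIES_BASE}/query?" + urlencode({"q": query})


raw_query = "rank:phylum AND common_name:gram-positive"
encoded_url = species_query_url(raw_query)
# Decoding the query string again should give back the original text.
round_trip = parse_qs(urlsplit(encoded_url).query)["q"][0]

print(species_url(9606, include_children=True))
print(encoded_url)
```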
Species API used in MyGene.info
You can now query for genes beyond species:
Q: Give me all lytic enzymes for any firmicutes
http://mygene.info/v3/query?q=lytic enzyme&species=1239
0 hits
http://mygene.info/v3/query?q=lytic enzyme&species=1239&include_tax_tree=true
5 hits
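The difference between the two queries can be pictured with the lineage field of the species documents: a gene annotated to a child taxon matches a higher-level taxid only when the whole tax tree is considered. The Python sketch below is a conceptual illustration, assuming tree expansion amounts to a lineage membership test; it is not the MyGene.info implementation, and the gene hits and the taxids 900001 and 900002 are made up.

```python
FIRMICUTES = 1239

# Made-up gene hits; 900001 and 900002 are hypothetical taxids.
# The tail of the first lineage reuses the firmicutes lineage from the slide.
GENE_HITS = [
    {"symbol": "toyLyt1", "taxid": 900001,
     "lineage": [900001, 1239, 1783272, 2, 131567, 1]},
    {"symbol": "toyLyt2", "taxid": 900002,
     "lineage": [900002, 2, 131567, 1]},
]


def filter_by_species(hits, species, include_tax_tree=False):
    """Keep hits for a taxid, optionally including all of its descendants."""
    if include_tax_tree:
        # A descendant lists the requested taxid somewhere in its lineage.
        return [h for h in hits if species in h["lineage"]]
    # Without tree expansion, only genes annotated directly to the taxid match.
    return [h for h in hits if h["taxid"] == species]


exact = filter_by_species(GENE_HITS, FIRMICUTES)
expanded = filter_by_species(GENE_HITS, FIRMICUTES, include_tax_tree=True)
print(len(exact), [h["symbol"] for h in expanded])
```

In this toy data no gene is annotated directly to the phylum, so the exact match is empty while the expanded match finds the gene from the descendant taxon.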
Very minimal code for building a species API
Have the flexibility to customize your query
BioThings SDK
• g.biothings.io: gene (alias to MyGene.info)
• v.biothings.io: variant (alias to MyVariant.info)
• s.biothings.io: species/taxonomy
• c.biothings.io: drugs/compounds
• d.biothings.io: disease
• ∙ ∙ ∙
BioThings APIs
• A collection of data APIs: data as a service
• A framework for building new APIs: software as a service
Got a new type of "BioThings"?
We can help you build, or even host, your BioThings API
BioThings TEAM
Funding and Support
U01HG008473
U54GM114833
TSRI:
Chunlei Wu
Andrew Su
Jiwen Xin
Cyrus Afrasiabi
Sebastien Lelong
Ginger Tsueng
Julee Adesara
Mike Mayers
U. Washington:
Sean Mooney
Moritz Juchler
Nikhil Gopal
Source code
• MyGene.info
https://github.com/sulab/mygene.info
• MyVariant.info
https://github.com/sulab/myvariant.info
• BioThings API for species/taxonomy
https://github.com/sulab/biothings.species
• BioThings SDK
https://github.com/sulab/biothings.api
DEMO time!
by Jiwen (Kevin) Xin
Filtering steps to prioritize candidate genes
Number of genes (in slide order): 2441, 2308, 1917, 18, 9, 5
Initial number of genes mutated in all four patients:
nVars <- countGenes(vars)
Filtering for sequencing coverage and strand bias:
filter1 <- lapply(vars, function(i) subset(i, DP > 8 & FS < 30 & QD > 2))
Filtering for nonsynonymous and splice site variants:
filter2 <- lapply(filter1, function(i) subset(i, cadd.consequence %in%
    c("NON_SYNONYMOUS", "STOP_GAINED", "STOP_LOST", "CANONICAL_SPLICE", "SPLICE_SITE")))
Filtering for rare variants based on allele frequencies from ExAC:
filter3 <- lapply(filter2, function(i) subset(i, exac.af < 0.01))
Filtering for rare variants based on allele frequencies from 1000 Genomes Project:
filter4 <- lapply(filter3, function(i) subset(i, sapply(dbnsfp.1000gp1.af,
    function(j) j < 0.01)))
Filtering by GO biological process annotation using MyGene.info:
goBP <- data.frame(queryMany(top.genes$Var1, scopes="symbol", species="human",
    fields=c("go.BP", "name", "MIM", "uniprot")))
# The Bioconductor package GO.db is used to find all genes with a GO biological process annotation
# that is a descendant of GO:0008152, the GO id for metabolic process.
miller.bp <- lapply(goBP$go.BP, function(i) unlist(i$id))
bp.ancestor <- lapply(miller.bp, function(i) sapply(i, function(j) "GO:0008152" %in%
    unlist(GOBPANCESTOR[[j]])))
candidate.genes <- top.genes$Var1[sapply(bp.ancestor, function(i) TRUE %in% i)]
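For readers more at home in Python, the four variant-level filters can be sketched on plain dictionaries. This is an illustrative translation using the thresholds from the R code, not the demo's actual code: the sample variants are made up, each record is a flat dict keyed by the same field names, and treating a missing allele frequency as a failed filter is an assumption about how the R subset() calls behave on NA values.

```python
# Consequence classes kept by the second filter (taken from the R code).
KEEP_CONSEQUENCES = {
    "NON_SYNONYMOUS", "STOP_GAINED", "STOP_LOST",
    "CANONICAL_SPLICE", "SPLICE_SITE",
}


def below(value, cutoff):
    """True only when a value is present and under the cutoff."""
    return value is not None and value < cutoff


def prioritize(variants):
    """Apply the four filters in order and return the survivors of each step."""
    step1 = [v for v in variants if v["DP"] > 8 and v["FS"] < 30 and v["QD"] > 2]
    step2 = [v for v in step1 if v.get("cadd.consequence") in KEEP_CONSEQUENCES]
    step3 = [v for v in step2 if below(v.get("exac.af"), 0.01)]
    step4 = [v for v in step3 if below(v.get("dbnsfp.1000gp1.af"), 0.01)]
    return [step1, step2, step3, step4]


# Made-up variants, one per outcome, for illustration only.
TOY_VARIANTS = [
    {"id": "a", "DP": 20, "FS": 1.2, "QD": 15, "cadd.consequence": "NON_SYNONYMOUS",
     "exac.af": 0.0004, "dbnsfp.1000gp1.af": 0.001},   # passes every filter
    {"id": "b", "DP": 5, "FS": 1.0, "QD": 12, "cadd.consequence": "STOP_GAINED",
     "exac.af": 0.0001, "dbnsfp.1000gp1.af": 0.001},   # low coverage
    {"id": "c", "DP": 30, "FS": 2.0, "QD": 10, "cadd.consequence": "SYNONYMOUS",
     "exac.af": 0.0001, "dbnsfp.1000gp1.af": 0.001},   # wrong consequence class
    {"id": "d", "DP": 25, "FS": 3.0, "QD": 9, "cadd.consequence": "SPLICE_SITE",
     "exac.af": 0.2, "dbnsfp.1000gp1.af": 0.15},       # common in ExAC
    {"id": "e", "DP": 18, "FS": 0.5, "QD": 11, "cadd.consequence": "STOP_LOST",
     "exac.af": 0.002},                                # no 1000 Genomes frequency
]

steps = prioritize(TOY_VARIANTS)
counts = [len(step) for step in steps]
print(counts, [v["id"] for v in steps[-1]])
```

Returning every intermediate list makes it easy to report how many variants survive each step, as the slide does for genes.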
Demos in Jupyter notebooks
• Using myvariant and mygene in R for variant prioritization
http://nbviewer.jupyter.org/github/SuLab/myvariant.info/blob/master/docs/ipynb/myvariant_R_miller.ipynb
• Access ClinVar data from myvariant in Python
http://nbviewer.jupyter.org/github/SuLab/myvariant.info/blob/master/docs/ipynb/myvariant_clinvar_demo.ipynb
• ID mapping using mygene module in Python
http://nbviewer.jupyter.org/gist/newgene/6771106

Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 

High-performance web services for gene and variant annotations

  • 1. Chunlei Wu, Ph.D. cwu@scripps.edu @chunleiwu Associate Professor of Molecular Medicine, Dept. of Molecular Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA. 07/2016. High-performance web services for gene and variant annotations: MyVariant.info, MyGene.info
  • 2. Biological knowledge is a complex network. No one-size-fits-all database can capture the entire knowledge space
  • 3. Typical database representations. Relational database: tables. Document database: JSON objects, e.g. { _id: 1017, name: CDK2, taxid: 9606 }. RDF triplestore: triples. Key-value store: key-value pairs
  • 4. BioThings APIs are built on document databases Why we picked document databases: • Object representation • Rich data structures, handles heterogeneous data very well • Atomic operations, built for big-data scale
  • 5. Gene and Variant annotations represented in JSON documents. Variant: { "_id": "chr1:g.196659237C>T", "cosmic": { "chrom": "1", "hg19": { "start": 196659237, "end": 196659237 }, "ref": "C", "alt": "T", "tumor_site": "breast", "mut_freq": 0.49, "mut_nt": "C>T", "cosmic_id": "COSM424915" } } Gene: { "_id": "1017", "Symbol": "CDK2", "Ensembl": "ENSG00000123374", "RefSeq": [ "NM_001798", "NM_052827" ], "Reporter": { "U95A": [ "1792_g_at", "1833_at" ], "U133A": [ "211804_s_at", "2045252_at", "211803_at" ] } }
  • 6. Keep data always up-to-date Each data source is updated individually. Colors indicate their different updating schedules. Schematic view of MyVariant.info architecture
  • 7. High-performance web service APIs Schematic view of MyVariant.info architecture
  • 9. We focus on building APIs. Try to …
  • 10. Make it really easy to use Just two endpoints No registration/sign-in No API key
  • 11. Developer-friendly. Python/R clients (also js client for myvariant): search “mygene” and “myvariant” in PyPI and Bioconductor. Supported: JSONP, CORS, HTTPS, msgpack, HTTP compression, HTTP caching, JSON-LD
  • 12. Aggregate everything about genes and variants. MyGene.info: supports >15M genes for ~17K species, ~200 annotation fields. MyVariant.info: supports >334M variants, ~500 annotation fields from 14 sources: ClinVar, dbNSFP, dbSNP, …
  • 13. Keep up-to-date. MyGene.info (updated weekly): supports >15M genes for ~17K species, ~200 annotation fields. MyVariant.info (updated ~monthly): supports >334M variants, ~500 annotation fields from 14 sources: ClinVar, dbNSFP, dbSNP, …
  • 14. High-performance and scalable: >95% of queries respond in <30 ms
  • 15. High-performance and scalable “Stress test” suggests support for >5,000 concurrent users for ~10,000 requests per minute
  • 17. High availability (MyVariant.info, MyGene.info): 99.999% over last year, 99.87% over last 6 months. Availability tracked by
  • 18. Who is using. Live applications: MinePath.org, Gene Wiki, JBrowse
  • 19. Who is using Many users use them in their daily analysis pipelines or simply caching annotations locally
  • 20. MyGene.info recent usage stats
      Month    Requests     Unique IPs
      Jan-16   3,885,192    2,498
      Feb-16   5,313,950    2,786
      Mar-16   3,362,354    3,121
      Apr-16   10,918,104   3,065
      May-16   10,776,858   3,803
      Jun-16   6,396,148    3,940
      Breakdown: 39% direct calls, 38% mygene.py, 14% mygene.R, 9% BioGPS
      Over 40M requests in six months
  • 21. MyVariant.info recent usage stats
      Month    Requests     Unique IPs
      Jan-16   83,519       1,330
      Feb-16   3,054,191    1,192
      Mar-16   272,424      1,771
      Apr-16   701,526      1,500
      May-16   89,642       1,891
      Jun-16   213,767      1,924
      Breakdown: 21% direct calls, 23% myvariant.py, 50% myvariant.R, 6% myvariant.js
      ~4.5M requests in six months
  • 22. Generalized BioThings SDK (from MyVariant.info and MyGene.info): JSON data aggregation mechanism; high-performance query engine; well-designed REST API pattern; JSON-LD enabled Linked Data; data-updating scheduler; Python/R clients; …
  • 23. BioThings SDK A tutorial here (more docs are coming): http://biothingsapi.readthedocs.io/en/latest/
  • 25. BioThings API for species/taxonomy { "_id": "9606", "_version": 1, "authority": [ "homo sapiens linnaeus, 1758" ], "children": [ 63221, 741158], "common_name": "man", "genbank_common_name": "human", "has_gene": true, "lineage": [ 9606, 9605, 207598, …,131567, 1], "parent_taxid": 9605, "rank": "species", "scientific_name": "homo sapiens", "taxid": 9606, "uniprot_name": "homo sapiens" } http://s.biothings.io/v1/species/9606?include_children=true
  • 26. BioThings API for species/taxonomy { "hits": [ { "_id": "1239", "_score": 10.971453, "common_name": […], "genbank_common_name": "gram-positive bacteria", "has_gene": false, "lineage": [1239, 1783272, 2, 131567, 1], "parent_taxid": 1783272, "rank": "phylum", "scientific_name": "firmicutes", "taxid": 1239, "uniprot_name": "firmicutes" } ], "max_score": 10.971453, "took": 12, "total": 1 } http://s.biothings.io/v1/query?q=rank:phylum AND common_name:gram-positive
  • 27. Species API used in MyGene.info. You can now query for genes beyond species. Q: Give me all lytic enzymes for any firmicutes. http://mygene.info/v3/query?q=lytic enzyme&species=1239 (0 hits); http://mygene.info/v3/query?q=lytic enzyme&species=1239&include_tax_tree=true (5 hits)
  • 28. Very minimal code for building a species API
  • 29. Have the flexibility to customize your query
  • 30. BioThings SDK: g.biothings.io (gene, alias to MyGene.info); v.biothings.io (variant, alias to MyVariant.info); s.biothings.io (species/taxonomy); c.biothings.io (drugs/compounds); d.biothings.io (disease); ∙ ∙ ∙
  • 31. BioThings APIs A collection of data APIs A framework for building new APIs Data as a service Software as a service Got a new type of “BioThings”? We can help you to build or even host your biothings API
  • 32. BioThings TEAM. TSRI: Chunlei Wu, Andrew Su, Jiwen Xin, Cyrus Afrasiabi, Sebastien Lelong, Ginger Tsueng, Julee Adesara, Mike Mayers. U. Washington: Sean Mooney, Moritz Juchler, Nikhil Gopal. Funding and Support: U01HG008473, U54GM114833
  • 33. Source code • MyGene.info https://github.com/sulab/mygene.info • MyVariant.info https://github.com/sulab/myvariant.info • BioThings API for species/taxonomy https://github.com/sulab/biothings.species • BioThings SDK https://github.com/sulab/biothings.api
  • 34. DEMO time! by Jiwen (Kevin) Xin
  • 35. Filtering steps to prioritize candidate genes. Number of genes: 2441, 2308, 1917, 18, 9, 5
      Initial number of genes mutated in all four patients:
        nVars <- countGenes(vars)
      Filtering for sequencing coverage and strand bias:
        filter1 <- lapply(vars, function(i) subset(i, DP > 8 & FS < 30 & QD > 2))
      Filtering for nonsynonymous and splice site variants:
        filter2 <- lapply(filter1, function(i) subset(i, cadd.consequence %in% c("NON_SYNONYMOUS", "STOP_GAINED", "STOP_LOST", "CANONICAL_SPLICE", "SPLICE_SITE")))
      Filtering for rare variants based on allele frequencies from ExAC:
        filter3 <- lapply(filter2, function(i) subset(i, exac.af < 0.01))
      Filtering for rare variants based on allele frequencies from 1000 Genomes Project:
        filter4 <- lapply(filter3, function(i) subset(i, sapply(dbnsfp.1000gp1.af, function(j) j < 0.01 )))
      Filtering by GO biological process annotation using MyGene.info:
        goBP <- data.frame(queryMany(top.genes$Var1, scopes="symbol", species="human", fields=c("go.BP", "name", "MIM", "uniprot")))
        # The Bioconductor package GO.db is used to find all genes with a GO biological process annotation that
        # is a descendant of GO:0008152 - the GO id for metabolic process.
        miller.bp <- lapply(goBP$go.BP, function(i) unlist(i$id))
        bp.ancestor <- lapply(miller.bp, function(i) sapply(i, function(j) "GO:0008152" %in% unlist(GOBPANCESTOR[[j]])))
        candidate.genes <- top.genes$Var1[sapply(bp.ancestor, function(i) TRUE %in% i)]
  • 36. Demos in Jupyter notebooks • Using myvariant and mygene in R for variant prioritization http://nbviewer.jupyter.org/github/SuLab/myvariant.info/blob/master/docs/ipynb/myvariant_R_miller.ipynb • Access ClinVar data from myvariant in Python http://nbviewer.jupyter.org/github/SuLab/myvariant.info/blob/master/docs/ipynb/myvariant_clinvar_demo.ipynb • ID mapping using mygene module in Python http://nbviewer.jupyter.org/gist/newgene/6771106
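
The variant filters on slide 35 are written in R. As a rough illustration of the same logic, the following is a minimal Python sketch that applies the quality, consequence, and rarity filters to in-memory records. The nested-dict layout (modelled on the cadd.consequence, exac.af, and dbnsfp.1000gp1.af field names), the helper names, and the four sample variants are assumptions made for this example and are not data returned by MyVariant.info. Every record is assumed to carry all fields; missing values are not handled.

```python
# Consequence classes kept by the slide's second filter.
DAMAGING = {
    "NON_SYNONYMOUS", "STOP_GAINED", "STOP_LOST",
    "CANONICAL_SPLICE", "SPLICE_SITE",
}


def passes_quality(v):
    """Coverage and strand-bias thresholds: DP > 8, FS < 30, QD > 2."""
    return v["DP"] > 8 and v["FS"] < 30 and v["QD"] > 2


def is_damaging(v):
    """Keep nonsynonymous, stop, and splice-site variants."""
    return v["cadd"]["consequence"] in DAMAGING


def is_rare(v, cutoff=0.01):
    """Allele frequency below cutoff in ExAC and in every 1000 Genomes value."""
    exac_ok = v["exac"]["af"] < cutoff
    kg_ok = all(af < cutoff for af in v["dbnsfp"]["1000gp1"]["af"])
    return exac_ok and kg_ok


def prioritize(variants):
    """Apply the three filters in the same order as the slide."""
    return [
        v for v in variants
        if passes_quality(v) and is_damaging(v) and is_rare(v)
    ]


def make_variant(vid, dp, consequence, exac_af):
    """Hypothetical helper that builds one sample record."""
    return {
        "id": vid, "DP": dp, "FS": 1.2, "QD": 15.0,
        "cadd": {"consequence": consequence},
        "exac": {"af": exac_af},
        "dbnsfp": {"1000gp1": {"af": [0.002]}},
    }


# Hypothetical sample data: only "v1" satisfies all three filters.
variants = [
    make_variant("v1", 20, "NON_SYNONYMOUS", 0.001),
    make_variant("v2", 5, "NON_SYNONYMOUS", 0.001),   # low coverage
    make_variant("v3", 20, "SYNONYMOUS", 0.001),      # not damaging
    make_variant("v4", 20, "STOP_GAINED", 0.2),       # common in ExAC
]

kept_ids = [v["id"] for v in prioritize(variants)]
print(kept_ids)
```

Splitting each filter into its own predicate keeps the thresholds visible and lets a single step be swapped out without touching the others.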

Editor's Notes

  1. A high-performance query engine for aggregated variant annotations.
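
To make the query pattern concrete, here is a small sketch that builds request URLs for the two endpoint types named in the slides: an annotation endpoint (/v3/gene/<geneid>, /v1/variant/<hgvsid>) and a query endpoint (/v3/query?q=<query>). It only constructs URLs and sends no requests. The helper names and the MyVariant.info base URL are assumptions for illustration; the example values come from the JSON documents and the species query shown in the slides.

```python
from urllib.parse import quote, urlencode

MYGENE = "http://mygene.info/v3"
MYVARIANT = "http://myvariant.info/v1"  # assumed base URL for the v1 variant API


def hgvs_snv_id(chrom, pos, ref, alt):
    """Format a single-nucleotide variant as an HGVS-style id, e.g. chr1:g.123C>T."""
    return f"chr{chrom}:g.{pos}{ref}>{alt}"


def annotation_url(base, kind, entity_id):
    """URL for the annotation endpoint; kind is 'gene' or 'variant'."""
    # Keep ':' readable, percent-encode everything else that is unsafe (e.g. '>').
    return f"{base}/{kind}/{quote(str(entity_id), safe=':')}"


def query_url(base, q, **params):
    """URL for the query endpoint, with optional extra parameters."""
    return f"{base}/query?{urlencode({'q': q, **params})}"


variant_id = hgvs_snv_id("1", 196659237, "C", "T")
gene_url = annotation_url(MYGENE, "gene", 1017)
variant_url = annotation_url(MYVARIANT, "variant", variant_id)
tree_url = query_url(MYGENE, "lytic enzyme", species=1239, include_tax_tree="true")

for url in (gene_url, variant_url, tree_url):
    print(url)
```

The Python and R clients mentioned in the slides wrap these calls, so a sketch like this is mainly useful for understanding what the clients send or for calling the service from a language without a client.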