Chunlei Wu, Ph.D.
cwu@scripps.edu
@chunleiwu
Associate Professor of Molecular Medicine
Dept. of Molecular Experimental Medicine
The Scripps Research Institute
La Jolla, CA, USA
01/22/2016
From MyGene.info and MyVariant.info towards BioThings API
MyGene.info and MyVariant.info recap
Gene and variant annotations as an (aggregated) (high-performance) (real-time) web service
So many variant annotation resources
dbNSFP
The Exome Aggregation Consortium (ExAC)
Annotations centered around bio-entities
Gene (G), Variant (V), Pathway (P), Disease (D), Metabolite (M)
Simple JSON-based Aggregation mechanism
{
  "_id": "chr1:g.196659237C>T",
  "cadd": { … },
  "clinvar": { … },
  "cosmic": { … },
  "dbsnp": { … },
  "dbnsfp": { … },
  "evs": { … },
  "emv": { … },
  "mutdb": { … },
  "gwassnp": { … },
  "snpedia": { … },
  "wellderly": { … }
}
{
  "_id": "chr1:g.196659237C>T",
  "dbsnp": {
    "snpclass": "single",
    "rsid": "rs1061170",
    "func": "missense"
  }
}
{
  "_id": "chr1:g.196659237C>T",
  "cosmic": {
    "tumor_site": "breast",
    "mut_freq": 0.49
  }
}
{
  "_id": "chr1:g.196659237C>T",
  "dbnsfp": {
    "sift": {
      "breast": "tolerated",
      "val": 1
    }
  }
}
"cadd" "clinvar" "evs" "mutdb" …
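The aggregation idea above can be sketched in a few lines of Python. This is only an illustration, not the MyVariant.info implementation: the function name aggregate_by_id is hypothetical, and the sketch assumes that every source contributes exactly one top-level key of its own, as in the example objects on the slide.

```python
from collections import defaultdict


def aggregate_by_id(source_docs):
    """Merge per-source documents that share an "_id" into one object per variant."""
    merged = defaultdict(dict)
    for doc in source_docs:
        variant_id = doc["_id"]
        merged[variant_id]["_id"] = variant_id
        for key, value in doc.items():
            if key != "_id":
                # Assumption: each source uses its own top-level key,
                # so a later source never overwrites an earlier one.
                merged[variant_id][key] = value
    return dict(merged)


docs = [
    {"_id": "chr1:g.196659237C>T",
     "dbsnp": {"snpclass": "single", "rsid": "rs1061170", "func": "missense"}},
    {"_id": "chr1:g.196659237C>T",
     "cosmic": {"tumor_site": "breast", "mut_freq": 0.49}},
]
variant = aggregate_by_id(docs)["chr1:g.196659237C>T"]
print(sorted(variant))  # ['_id', 'cosmic', 'dbsnp']
```

Because the sources do not share keys, adding a new data source only adds one more key to the merged object.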
Keep data always up-to-date
Each data source is updated individually. Colors indicate their different updating schedules.
Schematic view of MyVariant.info architecture
High-performance web service APIs
Schematic view of MyVariant.info architecture
MyVariant.info for the end users:
http://MyVariant.info
(currently v1 API, two endpoints)
http://MyVariant.info/v1/query?q=<query>
any query term(s) → matching variant hits
http://MyVariant.info/v1/variant/<variantid>
HGVS id(s) → matching variant object(s)
Both endpoints support batch mode via POST.
Simple API. No sign-up. No API key.
Try our live API and documentation.
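As a rough sketch, the two GET endpoints can be addressed from Python with the standard library alone. The helper names query_url and variant_url are hypothetical, and the code only builds the URLs without sending anything. Using the rsid from the earlier dbsnp example as a bare query term is an assumption about what the query endpoint accepts.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://MyVariant.info/v1"


def query_url(q):
    """URL for the query endpoint: query term(s) in, matching variant hits out."""
    return f"{BASE_URL}/query?{urlencode({'q': q})}"


def variant_url(hgvs_id):
    """URL for the annotation endpoint: an HGVS id in, a variant object out."""
    # ">" is not allowed unescaped in a URL path, so it is percent-encoded.
    return f"{BASE_URL}/variant/{quote(hgvs_id, safe=':')}"


print(query_url("rs1061170"))
print(variant_url("chr1:g.196659237C>T"))
```

Either URL could then be passed to urllib.request.urlopen and the response body parsed with json.load.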
MyGene.info for the end users:
http://MyGene.info
(currently v2 API, two endpoints)
http://MyGene.info/v2/query?q=<query>
any query term(s) → matching gene hits
http://MyGene.info/v2/gene/<geneid>
gene id(s) → matching gene object(s)
Both endpoints support batch mode via POST.
Simple API. No sign-up. No API key.
Try our live API and documentation.
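Batch mode could look roughly like the sketch below, which builds a POST request for several gene ids but does not send it. The slide only says that batch mode goes through POST. The form field name "ids", the comma-separated encoding and the helper name batch_gene_request are assumptions to check against the live documentation. Gene id 3077 is the HFE id from the ClinVar example later in the deck, and 1017 is purely illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request


def batch_gene_request(gene_ids):
    """Build (but do not send) a POST request for several gene ids at once."""
    # Assumption: the ids travel as one comma-separated "ids" form field.
    body = urlencode({"ids": ",".join(gene_ids)}).encode("utf-8")
    return Request(
        "http://MyGene.info/v2/gene",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = batch_gene_request(["3077", "1017"])
print(req.get_method(), req.full_url, req.data)
```

Passing req to urllib.request.urlopen would perform the actual call.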
MyGene.info usage updates
Monthly hits in millions: 2M last year, 3M this year
Usage spikes (5M hits/day) during Christmas 2014
Increased client adoption
Requests by MyGene.info clients, 10/07/2015-01/05/2016 (pie chart shares: 30%, 9%, 35%, 26%)
Highlights:
• mygene Python client usage now surpasses BioGPS usage
• mygene R client usage has now increased to 9% from <1%
mygene Python client hosted in PyPI
mygene R client hosted in Bioconductor
MyVariant.info updates
Over 334 million annotated variants in total
The Exome Aggregation Consortium (ExAC)
New additions:
dbNSFP
Updated:
MyVariant.info updates
1 million requests in 3 months (10/07/2015-01/05/2016)
Pie chart shares: 30%, 68%, 2%
MyVariant.info official Python/R Clients
myvariant Python client hosted in PyPI
(initial release in Aug 2015)
myvariant R client hosted in Bioconductor
(initial release in Oct 2015)
A Node.js client made by a user with passion
Next?
MyVariant.info
MyGene.info
Make our APIs serve Linked Data
via
Why Linked Data?
Gene (G), Variant (V), Pathway (P), Disease (D), Metabolite (M)
Linked Data for data aggregation
Figure: variant (V) nodes in MyVariant.info and in another variant API
Linked Data for data aggregation
MyVariant.info:
{
  "_id": "chr1:g.196659237C>T",
  "cosmic": {
    "tumor_site": "breast",
    "mut_freq": 0.49
  },
  "clinvar": {…},
  "dbsnp": {…},
  …
}
Another Variant API:
{
  "pop": "GWD",
  "nobs": 226,
  "freq": 0.371681415929,
  …
}
Aggregated object:
{
  "_id": "chr1:g.196659237C>T",
  "cosmic": {
    "tumor_site": "breast",
    "mut_freq": 0.49
  },
  "clinvar": {…},
  "dbsnp": {…},
  "new_src": {
    "pop": "GWD",
    "nobs": 226,
    "freq": 0.371681415929
  },
  …
}
JSON + context = JSON-LD
{
  "@context": {
    "clinvar": "http://schema.myvariant.info/datasource/clinvar",
    "rcv": "http://schema.myvariant.info/datanode/rcv",
    "gene": "http://schema.myvariant.info/datanode/gene",
    "_id": "@id"
  },
  "_id": "chr6:g.26093141G>A",
  "clinvar": {
    "@context": {
      "uniprot": "http://identifiers.org/uniprot/",
      "omim": "http://identifiers.org/omim/"
    },
    "chrom": "6",
    "alt": "A",
    "ref": "G",
    "allele_id": 15048,
    "rsid": "rs1800562",
    "rcv": {
      "@context": {
        "accession": "http://identifer.org/clinvar"
      },
      "accession": "RCV000000020",
      "origin": "germline",
      "clinical_significance": "risk factor"
    },
    "gene": {
      "@context": {
        "symbol": "http://identifiers.org/hgnc.symbol/"
      },
      "id": "3077",
      "symbol": "HFE"
    },
    "omim": "613609.0001",
    "variant_id": 9
  }
}
Processed JSON-LD
JSON-LD N-Quads output:
<chr6:g.26093141G>A> <http://schema.myvariant.info/datasource/clinvar> _:b0 .
_:b0 <http://identifiers.org/omim/> "613609.0001" .
_:b0 <http://schema.myvariant.info/datanode/gene> _:b1 .
_:b0 <http://schema.myvariant.info/datanode/rcv> _:b2 .
_:b1 <http://identifiers.org/hgnc.symbol/> "HFE" .
_:b2 <http://identifer.org/clinvar> "RCV000000020" .
JSON-LD compacted output:
{
  "@id": "chr6:g.26093141G>A",
  "http://schema.myvariant.info/datasource/clinvar": {
    "http://identifiers.org/omim/": "613609.0001",
    "http://schema.myvariant.info/datanode/gene": {
      "http://identifiers.org/hgnc.symbol/": "HFE"
    },
    "http://schema.myvariant.info/datanode/rcv": {
      "http://identifer.org/clinvar": "RCV000000020"
    }
  }
}
In a nutshell, what does a JSON-LD context do?
It maps values in a JSON object to defined URIs:
"http://identifer.org/clinvar" → clinvar.rcv.accession
A JSON-LD context makes your data "linkable".
Downstream processing libraries make it "linked".
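To make the mapping concrete, here is a small Python sketch that walks an object carrying nested "@context" entries and reports which dotted path is bound to which URI. It is a deliberate simplification, not a JSON-LD processor: the function name iter_uri_bindings is hypothetical, context values are assumed to be plain strings, and lists, prefixes and object-valued terms are ignored. The sample object is a trimmed copy of the ClinVar example above, with its URIs kept exactly as shown there.

```python
def iter_uri_bindings(node, context=None, path=()):
    """Yield (dotted_path, uri, value) for each leaf key the active context maps to a URI."""
    # A nested "@context" extends the context inherited from the parent object.
    context = {**(context or {}), **node.get("@context", {})}
    for key, value in node.items():
        if key == "@context":
            continue
        here = path + (key,)
        if isinstance(value, dict):
            yield from iter_uri_bindings(value, context, here)
        elif key in context and not context[key].startswith("@"):
            # Keywords such as "@id" are skipped; only real URIs are reported.
            yield ".".join(here), context[key], value


doc = {
    "@context": {
        "clinvar": "http://schema.myvariant.info/datasource/clinvar",
        "rcv": "http://schema.myvariant.info/datanode/rcv",
        "gene": "http://schema.myvariant.info/datanode/gene",
        "_id": "@id",
    },
    "_id": "chr6:g.26093141G>A",
    "clinvar": {
        "@context": {"omim": "http://identifiers.org/omim/"},
        "rsid": "rs1800562",
        "rcv": {
            "@context": {"accession": "http://identifer.org/clinvar"},
            "accession": "RCV000000020",
        },
        "gene": {
            "@context": {"symbol": "http://identifiers.org/hgnc.symbol/"},
            "symbol": "HFE",
        },
        "omim": "613609.0001",
    },
}

bindings = list(iter_uri_bindings(doc))
for dotted_path, uri, value in bindings:
    print(dotted_path, uri, value)
```

Keys without a context entry, such as "rsid" in this trimmed object, produce no binding, which mirrors how they are absent from the processed output above.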
A Python library for processing JSON-LD data
In [1]: fetch_value_source_for_variant("chr6:g.26093141G>A","http://identifiers.org/dbsnp/")
Out[1]:
['rs1800562 http://schema.myvarint.info/datasource/dbnsfp',
'rs1800562 http://schema.myvarint.info/datasource/clinvar',
'rs1800562 http://schema.myvarint.info/datasource/dbsnp',
'rs1800562 http://schema.myvarint.info/datasource/evs',
'rs1800562 http://schema.myvarint.info/datasource/gwassnps',
'rs1800562 http://schema.myvarint.info/datasource/mutdb']
By Kevin Xin
Need to define an API spec to enable data exchange via JSON-LD
• Output as a JSON object with a defined _id.
• "jsonld=true/false" toggle for the inclusion of the JSON-LD context.
• Support the retrieval of a single entity via GET
(use case: individual data aggregation on the fly)
• Support the retrieval of a list of entities via POST
(use case: routine data aggregation in batches)
• Output should indicate whether each entity exists:
GET /variant/<unknown_id> → 404
POST /variant/ id1, <unknown_id>, id3 →
[id1: {…},
<unknown_id>: "notfound",
id3: {…}]
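The behaviour listed in this spec can be sketched with plain Python functions over an in-memory store. Everything here is hypothetical scaffolding rather than BioThings code: STORE, CONTEXT, render, get_variant and post_variants are made-up names, the stored objects are placeholders, and the batch result is modelled as an ordered dict keyed by the requested id.

```python
# Placeholder data: two known ids, everything else is unknown.
STORE = {
    "id1": {"_id": "id1"},
    "id3": {"_id": "id3"},
}
CONTEXT = {"_id": "@id"}


def render(doc, jsonld):
    """Return a copy of the object, with "@context" only when the jsonld toggle is on."""
    return {"@context": CONTEXT, **doc} if jsonld else dict(doc)


def get_variant(variant_id, jsonld=False):
    """GET /variant/<id>: (200, object) for a known id, (404, None) otherwise."""
    doc = STORE.get(variant_id)
    return (404, None) if doc is None else (200, render(doc, jsonld))


def post_variants(variant_ids, jsonld=False):
    """POST /variant/: one entry per requested id, "notfound" marking unknown ones."""
    return {
        vid: render(STORE[vid], jsonld) if vid in STORE else "notfound"
        for vid in variant_ids
    }


print(get_variant("unknown_id"))
print(post_variants(["id1", "unknown_id", "id3"]))
print(get_variant("id1", jsonld=True))
```

Keeping the unknown id in the batch result, instead of dropping it, lets a client line up each answer with the id it asked for.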
BioThings API
MyVariant.info
MyGene.info
By Cyrus Afrasiabi
BioThings API
MyVariant.info
MyGene.info
• JSON data aggregation mechanism
• High-performance query engine
• Well-designed REST API pattern
• JSON-LD enabled Linked Data
• Data-updating scheduler
• Python/R clients
• …
Data-sharing via Web API is trending
Making a single web service is trivial, but making a sustainable/scalable web API is non-trivial.
We would like to help other groups create their own hosted web API for sharing their data.
Action item 1: BioThings API whitepaper
Also an action item from the last BD2K CA consortium meeting and from the API working group at last year's NIH BD2K AHM
Action item 2: BioThings API framework
NIH commons
Infrastructure as a Service:
Software as a Service:
BioThings API
Action item 3: expansion to other "BioThings"
Disease (D): MyDisease.info
Drugs (D): MyDrug.info
need an alt. name here
Acknowledgement
Funding and Support
U54GM114833
U01HG008473
Washington U:
Ben Ainscough
Obi Griffith
TSRI:
Andrew Su
Jiwen Xin
Cyrus Afrasiabi
Ginger Tsueng
Adam Mark
Greg Stupp
Tim Putman
STSI:
Eric Topol
Ali Torkamani
Galina Erikson
U. Washington:
Sean Mooney
Moritz Juchler
Nikhil Gopal
OICR:
Robin Haw
UC Berkeley:
Chris Mungall
UCSD:
Trish Whetzel
MyVariant.info MyGene.info

Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

  • 1. Chunlei Wu, Ph.D. cwu@scripps.edu @chunleiwu Associate Professor of Molecular Medicine Dept. of Molecular Experimental Medicine The Scripps Research Institute La Jolla, CA, USA 01/22/2016 From MyGene.info and MyVariant.info towards BioThings API
  • 2. As a MyGene.info and MyVariant.info recap Annotations Gene Variant (Aggregated) (high-performance) (real-time) Web Service
  • 3. So many variant annotation resources dbNSFP The Exome Aggregation Consortium (ExAC)
  • 4. Annotations centered around bio-entities Gene G Variant V Pathway P D Metabolite M Disease
  • 5. Simple JSON-based Aggregation mechanism { "_id": "chr1:g.196659237C>T", "cadd": { … }, "clinvar": { … }, "cosmic": { … }, "dbsnp": { … }, "dbnsfp": { … }, "evs": { … }, "emv": { … }, "mutdb": { … }, "gwassnp": { … }, "snpedia": { … }, "wellderly": { … } } { "_id": "chr1:g.196659237C>T", “dbsnp": { "snpclass": "single", "rsid": "rs1061170", "func": "missense" } } { "_id": "chr1:g.196659237C>T", “cosmic": { "tumor_site": "breast", "mut_freq": 0.49, } } { "_id": "chr1:g.196659237C>T", “dbnsfp": { “sift": { "breast“: “tolerated”, “val”: 1 } } } “cadd” “clinvar” “evs” “mutdb” …
  • 6. Keep data always up-to-date Each data source is updated individually. Colors indicate their different updating schedules. Schematic view of MyVariant.info architecture
  • 7. High-performance web service APIs Schematic view of MyVariant.info architecture
  • 8. MyVariant.info for the end users: http://MyVariant.info (currently v1 API, two endpoints). http://MyVariant.info/v1/query?q=<query>: any query term(s) → matching variant hits. http://MyVariant.info/v1/variant/<variantid>: HGVS id(s) → matching variant object(s). Both support batch mode via POST. Simple API. No sign-up. No API key. Try our live API and documentation.
  • 9. MyGene.info for the end users: http://MyGene.info (currently v2 API, two endpoints). http://MyGene.info/v2/query?q=<query>: any query term(s) → matching gene hits. http://MyGene.info/v2/gene/<geneid>: gene id(s) → matching gene object(s). Both support batch mode via POST. Simple API. No sign-up. No API key. Try our live API and documentation.
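Both services expose the same two URL shapes, so request URLs can be assembled with the Python standard library alone. The sketch below only builds the URLs and sends nothing; `build_query_url` and `build_entity_url` are hypothetical helper names, and the gene id `1017` is an arbitrary example value that does not come from the slides.

```python
from urllib.parse import quote, urlencode

# Base URLs and API versions as given on slides 8 and 9.
MYVARIANT = "http://MyVariant.info/v1"
MYGENE = "http://MyGene.info/v2"


def build_query_url(base, query):
    """Return the /query URL for a free-text query term."""
    return f"{base}/query?{urlencode({'q': query})}"


def build_entity_url(base, entity, entity_id):
    """Return the URL for one entity: /variant/<variantid> or /gene/<geneid>."""
    # Percent-encode the id so characters such as ":" and ">" in HGVS ids are URL-safe.
    return f"{base}/{entity}/{quote(entity_id, safe='')}"


print(build_query_url(MYVARIANT, "rs1061170"))
print(build_entity_url(MYVARIANT, "variant", "chr1:g.196659237C>T"))
print(build_entity_url(MYGENE, "gene", "1017"))  # example id, an assumption
```

The slides state that both endpoints also accept batches via POST but do not give the parameter names, so the batch form is left out here.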
  • 11. Usage spikes (5M hits/day) during X-Mas 2014
  • 12. Increased client adoption. Requests by MyGene.info clients, 10/07/2015-01/05/2016 (chart values: 30%, 9%, 35%, 26%). Highlights: • mygene Python client usage now surpasses BioGPS usage • mygene R client usage has increased to 9% from <1%
  • 13. Increased client adoption (chart values: 30%, 9%, 35%, 26%). mygene Python client hosted in PyPI; mygene R client hosted in Bioconductor
  • 14. MyVariant.info updates: over 334 million annotated variants in total. The Exome Aggregation Consortium (ExAC) New additions: dbNSFP Updated:
  • 16. MyVariant.info official Python/R Clients myvariant Python client hosted in PyPI (initial release in Aug 2015) myvariant R client hosted in Bioconductor (initial release in Oct 2015)
  • 17. A Node.js client made by a user with passion
  • 19. Make our APIs serve Linked Data via
  • 21. Linked Data for data aggregation MyVariant.info V Another Variant API V V
  • 22. Linked Data for data aggregation MyVariant.info Another Variant API { "_id": "chr1:g.196659237C>T", “cosmic": { "tumor_site": "breast", "mut_freq": 0.49, }, "clinvar": {…}, "dbsnp": {…}, … } { "pop": "GWD", "nobs": 226, "freq": 0.371681415929, … } { "_id": "chr1:g.196659237C>T", “cosmic": { "tumor_site": "breast", "mut_freq": 0.49, }, "clinvar": {…}, "dbsnp": {…}, "new_src": { "pop": "GWD", "nobs": 226, "freq": 0.371681415929 }, … }
  • 23. JSON + context = JSON-LD { "@context": { "clinvar": "http://schema.myvariant.info/datasource/clinvar", "rcv": "http://schema.myvariant.info/datanode/rcv", "gene": "http://schema.myvariant.info/datanode/gene", "_id": "@id" }, "_id": "chr6:g.26093141G>A", "clinvar": { "@context": { "uniprot": "http://identifiers.org/uniprot/", "omim": "http://identifiers.org/omim/" }, "chrom": "6", "alt": "A", "ref": "G", "allele_id": 15048, "rsid": "rs1800562", "rcv": { "@context": { "accession": "http://identifer.org/clinvar" }, "accession": "RCV000000020", "origin": "germline", "clinical_significance": "risk factor" }, "gene": { "@context": { "symbol": "http://identifiers.org/hgnc.symbol/" }, "id": "3077", "symbol": "HFE" }, "omim": "613609.0001", "variant_id": 9 } }
  • 24. Processed JSON-LD. JSON-LD N-Quads output: <chr6:g.26093141G>A> <http://schema.myvariant.info/datasource/clinvar> _:b0 . _:b0 <http://identifiers.org/omim/> "613609.0001" . _:b0 <http://schema.myvariant.info/datanode/gene> _:b1 . _:b0 <http://schema.myvariant.info/datanode/rcv> _:b2 . _:b1 <http://identifiers.org/hgnc.symbol/> "HFE" . _:b2 <http://identifer.org/clinvar> "RCV000000020" . JSON-LD compacted output: { "@id": "chr6:g.26093141G>A", "http://schema.myvariant.info/datasource/clinvar": { "http://identifiers.org/omim/": "613609.0001", "http://schema.myvariant.info/datanode/gene": { "http://identifiers.org/hgnc.symbol/": "HFE" }, "http://schema.myvariant.info/datanode/rcv": { "http://identifer.org/clinvar": "RCV000000020" } } }
  • 25. In a nutshell, what does a JSON-LD context do? It maps values in a JSON object to defined URIs: "http://identifer.org/clinvar" → clinvar.rcv.accession
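The mapping described on slides 23 to 25 can be imitated with a small recursive walk. This is only a sketch under simplifying assumptions, not a JSON-LD processor: `apply_context` is a hypothetical helper that understands just plain term-to-URI entries, nested `@context` blocks, and the `"_id": "@id"` alias, and it drops any key that has no mapping. Real data should go through a conforming JSON-LD library.

```python
def apply_context(node, inherited=None):
    """Replace mapped keys with their URIs, honouring nested "@context" blocks.

    Hypothetical, simplified helper: keys without a mapping are dropped, and
    only plain term -> URI entries are understood.
    """
    context = dict(inherited or {})
    context.update(node.get("@context", {}))
    result = {}
    for key, value in node.items():
        if key == "@context" or key not in context:
            continue
        if isinstance(value, dict):
            value = apply_context(value, context)
        result[context[key]] = value
    return result


# Trimmed version of the document on slide 23.
doc = {
    "@context": {
        "clinvar": "http://schema.myvariant.info/datasource/clinvar",
        "gene": "http://schema.myvariant.info/datanode/gene",
        "_id": "@id",
    },
    "_id": "chr6:g.26093141G>A",
    "clinvar": {
        "@context": {"omim": "http://identifiers.org/omim/"},
        "rsid": "rs1800562",
        "omim": "613609.0001",
        "gene": {
            "@context": {"symbol": "http://identifiers.org/hgnc.symbol/"},
            "id": "3077",
            "symbol": "HFE",
        },
    },
}

expanded = apply_context(doc)
print(expanded)
```

Unmapped fields such as `rsid` and `gene.id` disappear from the result, which mirrors the processed output on slide 24, where only the context-mapped fields remain.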
  • 26. JSON-LD context makes your data "Linkable" "Linked" Downstream processing libraries
  • 27. A Python library for processing JSON-LD data In [1]: fetch_value_source_for_variant("chr6:g.26093141G>A","http://identifiers.org/dbsnp/") Out[1]: ['rs1800562 http://schema.myvarint.info/datasource/dbnsfp', 'rs1800562 http://schema.myvarint.info/datasource/clinvar', 'rs1800562 http://schema.myvarint.info/datasource/dbsnp', 'rs1800562 http://schema.myvarint.info/datasource/evs', 'rs1800562 http://schema.myvarint.info/datasource/gwassnps', 'rs1800562 http://schema.myvarint.info/datasource/mutdb'] By Kevin Xin
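The slide shows only the call and its output, so the following is a guess at the idea behind such a lookup, not the library's actual code. It assumes the document's keys have already been replaced by URIs (as in the processed output on slide 24); `find_values` and `values_by_source` are hypothetical names, and the sample document is illustrative.

```python
def find_values(node, target_uri):
    """Yield every value stored under target_uri anywhere inside node."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == target_uri:
                yield value
            else:
                yield from find_values(value, target_uri)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, target_uri)


def values_by_source(expanded_doc, target_uri):
    """Return (value, source_uri) pairs, one per top-level data source that has the URI."""
    pairs = []
    for source_uri, content in expanded_doc.items():
        if source_uri == "@id":
            continue
        for value in find_values(content, target_uri):
            pairs.append((value, source_uri))
    return pairs


# Illustrative document; the source URIs follow the pattern used on slide 23.
DBSNP = "http://identifiers.org/dbsnp/"
expanded_doc = {
    "@id": "chr6:g.26093141G>A",
    "http://schema.myvariant.info/datasource/clinvar": {DBSNP: "rs1800562"},
    "http://schema.myvariant.info/datasource/dbsnp": {DBSNP: "rs1800562"},
    "http://schema.myvariant.info/datasource/cadd": {"other": 1},
}

print(values_by_source(expanded_doc, DBSNP))
```

Keying the search on a shared URI rather than on source-specific field names is what lets the same identifier be found across sources that label it differently.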
  • 28. Need to define an API spec to enable data exchange via JSON-LD: • Output as a JSON object with a defined _id. • "jsonld=true/false" toggle for the inclusion of JSON-LD context. • Support the retrieval of a single entity via GET (use case: individual data aggregation on the fly) • Support the retrieval of a list of entities via POST (use case: routine data aggregation in batches) • Output should indicate the entity existence: GET /variant/<unknown_id> → 404; POST /variant/ id1, <unknown_id>, id3 → [id1: {…}, <unknown_id>: "notfound", id3: {…}]
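The behaviour these bullets ask for can be sketched as plain Python functions over an in-memory store. Everything here is an assumption made for illustration: the store contents, the placeholder context, and the function names are hypothetical, and no web framework is involved.

```python
import copy

# Toy in-memory store standing in for a real backend (illustrative data).
STORE = {
    "id1": {"_id": "id1", "dbsnp": {"rsid": "rs1"}},
    "id3": {"_id": "id3", "dbsnp": {"rsid": "rs3"}},
}
# Placeholder context; a real service would publish its own term -> URI map.
CONTEXT = {"_id": "@id"}


def _render(doc, jsonld):
    """Copy a stored document, adding "@context" when jsonld is true."""
    out = copy.deepcopy(doc)
    if jsonld:
        out["@context"] = dict(CONTEXT)
    return out


def get_entity(entity_id, jsonld=False):
    """GET /variant/<id>: return (status, body); unknown ids give 404."""
    doc = STORE.get(entity_id)
    if doc is None:
        return 404, None
    return 200, _render(doc, jsonld)


def post_entities(entity_ids, jsonld=False):
    """POST /variant/: one entry per requested id, "notfound" for unknown ids."""
    results = {}
    for entity_id in entity_ids:
        if entity_id in STORE:
            results[entity_id] = _render(STORE[entity_id], jsonld)
        else:
            results[entity_id] = "notfound"
    return results


print(get_entity("missing"))
print(post_entities(["id1", "missing", "id3"]))
```

Returning an entry for every requested id, found or not, lets a batch client line results up with its input without a second round trip.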
  • 30. BioThings API (MyVariant.info, MyGene.info): • JSON data aggregation mechanism • High-performance query engine • Well-designed REST API pattern • JSON-LD enabled Linked Data • Data-updating scheduler • Python/R clients • …
  • 31. Data-sharing via Web API is trending Making a single web service is trivial, but making a sustainable/scalable web API is non-trivial. We would like to help other groups to create their own hosted web API for sharing their data.
  • 32. Action item 1: BioThings API whitepaper Also the action item from last BD2K CA consortium meeting and the API working group from last year's NIH BD2K AHM
  • 33. Action item 2: BioThings API framework NIH commons Infrastructure as a Service: Software as a Service: BioThings API
  • 34. Action item 3: expansion to other "BioThings" D Disease D Drugs MyDrug.info MyDisease.info need an alt. name here
  • 35. Acknowledgement. Funding and Support: U54GM114833, U01HG008473. Washington U: Ben Ainscough, Obi Griffith. TSRI: Andrew Su, Jiwen Xin, Cyrus Afrasiabi, Ginger Tsueng, Adam Mark, Greg Stupp, Tim Putman. STSI: Eric Topol, Ali Torkamani, Galina Erikson. U. Washington: Sean Mooney, Moritz Juchler, Nikhil Gopal. OICR: Robin Haw. UC Berkeley: Chris Mungall. UCSD: Trish Whetzel. MyVariant.info, MyGene.info

Editor's Notes

  1. A high-performance query engine for aggregated variant annotations.
  2. A high-performance query engine for aggregated variant annotations.
  3. Annotation data are fundamental. Gene annotation: no need for a slide to explain it, everyone needs it. Variant annotation: relatively new, and increasingly in demand due to the boom in NGS.