MyGene.info
Chunlei Wu, Ph.D.
The Scripps Research Institute
La Jolla, CA, USA
BOSC 2013
July 20, 2013
MyGene.info talk at ISMB/BOSC 2013
MyGene.info: Gene Annotation Query as a Service

talk at ISMB/BOSC 2013

Published in: Science, Technology
Transcript of "MyGene.info talk at ISMB/BOSC 2013"

  1. MyGene.info: Making Elastic Gene API. Chunlei Wu, Ph.D., The Scripps Research Institute, La Jolla, CA, USA. BOSC 2013, July 20, 2013.
  2. Being "Elastic"
     ● Fast
     ● Always ON
     ● Up-to-date
     ● Scalable
     ● Extensible
  3. Common procedure for gene data retrieval. [Diagram labels: data sources (Entrez, Ensembl, UniProt, ...); data parsing; internal gene object; update regularly]
  4. Common procedure for gene data retrieval: As A Service? [Diagram labels: data sources (Entrez, Ensembl, UniProt, ...); data parsing; internal gene object; update regularly]
  5. Working model - 1. [Diagram labels: data sources (Entrez, Ensembl, UniProt, ...); merging; query engine; user queries]
  6. Working model - 1. [Diagram labels: data sources (Entrez, Ensembl, UniProt, ...); merging; query engine]
     A dummy merging example:
       { "entrezgene": 1017 }
       { "ensemblgene": "ENSG00000123374" }
       { "uniprot": "P24941" }
     merged into:
       { "entrezgene": 1017, "ensemblgene": "ENSG00000123374", "uniprot": "P24941" }
  7. Gene object in NoSQL database ("key": "document")
       1017: {
         "Symbol": "CDK2",
         "Ensembl": "ENSG00000123374",
         "RefSeq": [ "NM_001798", "NM_052827" ],
         "Reporter": {
           "U95A": [ "1792_g_at", "1833_at" ],
           "U133A": [ "211804_s_at", "2045252_at", "211803_at" ]
         }
       }
  8. Syncing from data-hub to query instance. [Diagram labels: data sources (Entrez, Ensembl, UniProt, ...); merging; public data hub; public query instance; query engine; user queries]
  9. Syncing from data-hub to query instances. [Diagram labels: data sources (Entrez, Ensembl, UniProt, ...); merging; public data hub; public query instance; query engine; user queries]
  10. Public query instance: http://MyGene.info (currently v2 API, two endpoints)
      ● http://MyGene.info/v2/query?q=<query> takes any query term(s) and returns matching gene hits
      ● http://MyGene.info/v2/gene/<geneid> takes gene id(s) and returns matching gene objects
  11. Public query instance (http://MyGene.info)
      ● Supports ALL species from NCBI (>12K species, >13M genes)
      ● >40 annotation fields and expanding
      ● Updated weekly
      ● Flexible query interface
        ● Simple queries: ?q=cdk2
        ● Fielded queries: ?q=symbol:cdk2
        ● Wildcard queries: ?q=cdk*
        ● Genomic interval queries: ?q=chr1:1-100,000&species=human
        ● Species filter: ?q=cdk2&species=mouse,rat
        ● Returning fields filter: ?q=cdk2&fields=symbol,homologene
      ● Supports batch queries, JSONP, CORS
      ● Committed to long-term availability
  12. Public query instance: high-performance host (serving ~500K requests/day). http://MyGene.info
  13. Public query instance, third-party packages: MyGene.py, a Python wrapper. https://pypi.python.org/pypi/mygene (pip install mygene)
  14. Public query instance, third-party packages: MyGene.autocomplete, a gene query autocomplete widget. https://bitbucket.org/sulab/mygene.autocomplete
  16. Working model - 2. [Diagram labels: data sources (Entrez, Ensembl, UniProt, ...); merging (Merging 1, Merging 2, Merging 3); query engines; user queries]
  17. Syncing from data-hub to query instances. [Diagram labels: data sources (Entrez, Ensembl, UniProt, ...); public data hub (merging); public query instance (query engine, user queries); private data hub (merging); private query instances (Private 1, Private 2, ...; query engine, user queries)]
  18. Private query instance
      ● Dedicated host
      ● Same powerful query interface
      ● Third-party packages still work
      ● Public data still get synced
      ● Allows merging private data
  19. To reach us? Questions about the public query instance, or interested in setting up your own private query instance? Please let us know: help@mygene.info
  20. Code repositories
      ● Web front-end: https://bitbucket.org/sulab/mygene.info (Apache 2 licensed)
      ● Data hub: https://bitbucket.org/sulab/mygene.hub (GPL v3 licensed)
  21. Acknowledgement
      ● Funding and Support: R01GM083924
      ● Sulab: Andrew Su, Benjamin Good, Max Nannis, Salvatore Loguercio, Katie Fisch, Tobias Meissner
  23. Private query instance
      ● Same powerful query interface
      ● Third-party packages still work
      ● Public data get synced
      ● Allows merging private data
      [Diagram labels: data sources (Entrez, Ensembl, UniProt, ...); public data hub (merging); private data hub (merging); private query instances (Private 1, Private 2, ...); query engine]
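
The dummy merging example on slide 6 can be sketched in a few lines of Python. This is only an illustration: the function name `merge_records` is hypothetical, the records are assumed to be already grouped by gene, and the talk does not say how the real data hub resolves conflicting values.

```python
def merge_records(records):
    """Combine per-source records describing one gene into a single document."""
    merged = {}
    for record in records:
        for field, value in record.items():
            # Assumption: sources never disagree on a shared field.
            # The first value seen is kept; a real merger needs a conflict policy.
            merged.setdefault(field, value)
    return merged


# The three single-source records shown on slide 6.
sources = [
    {"entrezgene": 1017},
    {"ensemblgene": "ENSG00000123374"},
    {"uniprot": "P24941"},
]
gene = merge_records(sources)
print(gene)
```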
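
Slide 7 describes each gene as a document stored under a key. The sketch below uses a plain Python dict as a stand-in for the document store, since the talk does not name the database; the helper `get_field` and its dotted-path syntax are assumptions made for illustration. The document content is copied from the slide as-is.

```python
# In-memory stand-in for a key-document store (the real backend is not named).
gene_store = {
    1017: {
        "Symbol": "CDK2",
        "Ensembl": "ENSG00000123374",
        "RefSeq": ["NM_001798", "NM_052827"],
        "Reporter": {
            "U95A": ["1792_g_at", "1833_at"],
            "U133A": ["211804_s_at", "2045252_at", "211803_at"],
        },
    }
}


def get_field(store, key, path, default=None):
    """Walk a dotted path such as "Reporter.U95A" inside one document."""
    node = store.get(key)
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


print(get_field(gene_store, 1017, "Symbol"))
print(get_field(gene_store, 1017, "Reporter.U95A"))
```

Nested fields need no schema change: adding a new annotation is just another key in the document.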
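
The two v2 endpoints on slide 10 can be addressed with small URL builders. This sketch only constructs URLs and sends no requests; the function names are hypothetical, and the `/v2` prefix is taken from the slide, so it reflects the API at the time of the talk and may not match what the service exposes today.

```python
from urllib.parse import quote, urlencode

# Base URL as shown on the slide (v2 was current in July 2013).
BASE_URL = "http://mygene.info/v2"


def query_url(term):
    """URL for the query endpoint: query term(s) in, matching gene hits out."""
    return f"{BASE_URL}/query?{urlencode({'q': term})}"


def gene_url(gene_id):
    """URL for the gene endpoint: gene id in, matching gene object out."""
    return f"{BASE_URL}/gene/{quote(str(gene_id), safe='')}"


print(query_url("cdk2"))
print(gene_url(1017))
```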
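
The query styles listed on slide 11 differ only in the `q` term and the optional `species` and `fields` parameters. A minimal sketch of composing such query strings follows; the helper name `build_query_string` is an assumption, while the parameter names and example values come from the slide.

```python
from urllib.parse import urlencode


def build_query_string(q, species=None, fields=None):
    """Compose the part after '?' for a query-endpoint request."""
    params = {"q": q}
    if species:
        params["species"] = ",".join(species)  # e.g. mouse,rat
    if fields:
        params["fields"] = ",".join(fields)    # e.g. symbol,homologene
    # Keep ':', ',' and '*' unescaped so fielded, list and wildcard
    # syntax stays readable, as in the slide's examples.
    return urlencode(params, safe=":,*")


print(build_query_string("symbol:cdk2"))
print(build_query_string("cdk2", species=["mouse", "rat"]))
print(build_query_string("cdk2", fields=["symbol", "homologene"]))
```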
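
For the Python wrapper mentioned on slide 13, a usage sketch might look like the following. The helper `lookup_symbols` is hypothetical and takes the client as an argument, so it can be exercised without network access. It assumes the client exposes `getgene(geneid, fields=...)`, as current releases of the `mygene` package do; the release available at the time of the talk may have differed.

```python
def lookup_symbols(client, gene_ids):
    """Map each gene id to its symbol using a MyGeneInfo-like client."""
    symbols = {}
    for gene_id in gene_ids:
        # Ask only for the symbol field to keep the response small.
        doc = client.getgene(gene_id, fields="symbol")
        symbols[gene_id] = doc.get("symbol") if doc else None
    return symbols


# With the real package installed (pip install mygene) and network access:
#   import mygene
#   print(lookup_symbols(mygene.MyGeneInfo(), [1017]))
```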
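
Slide 18 says a private query instance keeps receiving public data while allowing private data to be merged in. One way to picture that is an overlay of private annotations on the synced public documents. Everything in this sketch is an assumption for illustration, including the function name, the `lab_note` field, and the rule that private values win on a clash; the talk does not describe the actual mechanism.

```python
import copy


def overlay_private(public_docs, private_docs):
    """Return public gene documents with private fields layered on top."""
    merged = copy.deepcopy(public_docs)  # leave the synced public data untouched
    for gene_id, extra in private_docs.items():
        # Assumption: on a field clash, the private value replaces the public one.
        merged.setdefault(gene_id, {}).update(copy.deepcopy(extra))
    return merged


public = {1017: {"symbol": "CDK2"}}
private = {1017: {"lab_note": "hypothetical in-house annotation"}}
combined = overlay_private(public, private)
print(combined)
```

Because the public documents are copied rather than modified, a later sync can replace them without losing the private layer.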