Published in: Technology

  1. 1. MyGene.Info: Gene Annotation as a Service - GAaaS
     Chunlei Wu, cwu@scripps.edu, 2011/07/16
  2. 2. A migration story for BioGPS http://biogps.org
  3. 3. A migration story for BioGPS http://biogps.org
  4. 4. A migration story for BioGPS http://biogps.org
  5. 5. Gene-centric annotation data http://biogps.org
     A simple view: Gene 1017
     → Symbol: CDK2
     → Ensembl: ENSG00000123374
     → RefSeq: NM_001798, NM_052827
     → Reporter:
        → U95A: 1792_g_at, 1833_at
        → U133A: 211804_s_at, 204252_at, 211803_at
  6. 6. A real example: Symbol, Name, Alias, Summary, Ensembl, RefSeq, UniGene, HomoloGene, GO, UniProt, InterPro, PDB, Prosite, IPI, and many more…
  7. 7. Relational database solutions
     Solution 1: “star” schema

     Master Table
     GeneID  Symbol
     1017    CDK2

     Ensembl Table
     GeneID  EnsemblID
     1017    ENSG00000250560

     Refseq Table
     GeneID  RefseqID
     1017    NM_001798
     1017    NM_052827

     Reporter Table
     GeneID  Platform  Reporter
     1017    U95A      1792_g_at
     1017    U95A      1833_at
     1017    U133A     211804_s_at
     1017    U133A     204252_at
     1017    U133A     211803_at
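The star schema on slide 7 can be tried out with an in-memory SQLite database. This is only a minimal sketch: the table and column names (`master`, `refseq`, `reporter`, `gene_id`) are illustrative choices, the Ensembl table is left out for brevity, and the reporter IDs follow the simple view on slide 5. It shows that reassembling one gene takes one query per satellite table.

```python
import sqlite3


def build_star_schema(conn):
    """Create a master table plus one satellite table per annotation type."""
    conn.executescript(
        """
        CREATE TABLE master   (gene_id INTEGER PRIMARY KEY, symbol TEXT);
        CREATE TABLE refseq   (gene_id INTEGER, refseq_id TEXT);
        CREATE TABLE reporter (gene_id INTEGER, platform TEXT, reporter TEXT);
        """
    )
    conn.execute("INSERT INTO master VALUES (1017, 'CDK2')")
    conn.executemany(
        "INSERT INTO refseq VALUES (?, ?)",
        [(1017, "NM_001798"), (1017, "NM_052827")],
    )
    conn.executemany(
        "INSERT INTO reporter VALUES (?, ?, ?)",
        [
            (1017, "U95A", "1792_g_at"),
            (1017, "U95A", "1833_at"),
            (1017, "U133A", "211804_s_at"),
            (1017, "U133A", "204252_at"),
            (1017, "U133A", "211803_at"),
        ],
    )


def fetch_gene(conn, gene_id):
    """Reassemble one gene by querying every table that may hold its data."""
    symbol = conn.execute(
        "SELECT symbol FROM master WHERE gene_id = ?", (gene_id,)
    ).fetchone()[0]
    refseq = [
        row[0]
        for row in conn.execute(
            "SELECT refseq_id FROM refseq WHERE gene_id = ? ORDER BY rowid",
            (gene_id,),
        )
    ]
    reporter = {}
    for platform, reporter_id in conn.execute(
        "SELECT platform, reporter FROM reporter WHERE gene_id = ? ORDER BY rowid",
        (gene_id,),
    ):
        reporter.setdefault(platform, []).append(reporter_id)
    return {"Symbol": symbol, "RefSeq": refseq, "Reporter": reporter}


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    print(fetch_gene(connection, 1017))
```

Every new annotation type in this layout means a new table and another query in `fetch_gene`.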
  8. 8. Relational database solutions
     Solution 2: “weakly-typed” schema

     Generic Data Table
     ID  Type      Value            Parent  Root
     1   GeneID    1017             NULL    1017
     2   Symbol    CDK2             1017    1017
     3   Ensembl   ENSG00000123374  1017    1017
     4   RefSeq    NM_001798        1017    1017
     5   RefSeq    NM_052827        1017    1017
     6   Platform  U95A             1017    1017
     7   Platform  U133A            1017    1017
     8   Reporter  1792_g_at        U95A    1017
     9   Reporter  1833_at          U95A    1017
     10  Reporter  211804_s_at      U133A   1017
     11  Reporter  204252_at        U133A   1017
     12  Reporter  211803_at        U133A   1017
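To see what the weakly-typed table on slide 8 costs at read time, the rows can be folded back into a nested structure. The sketch below keeps the rows as plain tuples instead of a real database table, and the function name `rows_to_document` is hypothetical. It assumes, as in the slide's table, that the Parent column holds the parent's value rather than its row ID.

```python
# (ID, Type, Value, Parent, Root) rows copied from the generic data table.
ROWS = [
    (1, "GeneID", "1017", None, "1017"),
    (2, "Symbol", "CDK2", "1017", "1017"),
    (3, "Ensembl", "ENSG00000123374", "1017", "1017"),
    (4, "RefSeq", "NM_001798", "1017", "1017"),
    (5, "RefSeq", "NM_052827", "1017", "1017"),
    (6, "Platform", "U95A", "1017", "1017"),
    (7, "Platform", "U133A", "1017", "1017"),
    (8, "Reporter", "1792_g_at", "U95A", "1017"),
    (9, "Reporter", "1833_at", "U95A", "1017"),
    (10, "Reporter", "211804_s_at", "U133A", "1017"),
    (11, "Reporter", "204252_at", "U133A", "1017"),
    (12, "Reporter", "211803_at", "U133A", "1017"),
]


def rows_to_document(rows, root):
    """Fold all rows sharing one Root value into a nested dictionary."""
    doc = {}
    for _row_id, row_type, value, parent, row_root in rows:
        if row_root != root or row_type == "GeneID":
            continue  # other genes, or the root row itself
        if row_type == "Platform":
            # A platform row only declares a container for its reporters.
            doc.setdefault("Reporter", {}).setdefault(value, [])
        elif row_type == "Reporter":
            # A reporter row hangs under the platform named in Parent.
            doc.setdefault("Reporter", {}).setdefault(parent, []).append(value)
        else:
            # Every other type is stored as a list, since the table
            # itself does not say which types are single-valued.
            doc.setdefault(row_type, []).append(value)
    return doc


if __name__ == "__main__":
    print(rows_to_document(ROWS, "1017"))
```

The types and the nesting rules live in application code here, not in the schema, which is the trade-off the "weakly-typed" label points at.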
  9. 9. “Document”-based database solution
     CDK2
     1017: {
         "Symbol": "CDK2",
         "Ensembl": "ENSG00000123374",
         "RefSeq": ["NM_001798", "NM_052827"],
         "Reporter": {
             "U95A": ["1792_g_at", "1833_at"],
             "U133A": ["211804_s_at", "204252_at", "211803_at"]
         }
     }
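The gene document on slide 9 maps directly onto a Python dictionary, which makes the contrast with the two relational layouts easy to see: one lookup returns the whole gene. A minimal sketch follows; `GENE_DOCS` and `all_reporters` are illustrative names, not part of any library, and the reporter IDs follow the simple view on slide 5.

```python
import json

# One document per gene, keyed by Entrez gene ID.
GENE_DOCS = {
    "1017": {
        "Symbol": "CDK2",
        "Ensembl": "ENSG00000123374",
        "RefSeq": ["NM_001798", "NM_052827"],
        "Reporter": {
            "U95A": ["1792_g_at", "1833_at"],
            "U133A": ["211804_s_at", "204252_at", "211803_at"],
        },
    }
}


def all_reporters(doc):
    """Flatten the per-platform reporter lists of one gene document."""
    return [
        reporter_id
        for platform_reporters in doc.get("Reporter", {}).values()
        for reporter_id in platform_reporters
    ]


if __name__ == "__main__":
    gene = GENE_DOCS["1017"]          # a single lookup, no joins
    print(json.dumps(gene, indent=2))  # serializes to JSON unchanged
    print(all_reporters(gene))
```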
  10. 10. What’s CouchDB
      Document-based (“schema-free”) database
      Index and query data in MapReduce fashion using JavaScript
      RESTful JSON API
      Bi-directional replicator
      Distributed
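CouchDB views are written in JavaScript, so the following Python code is not CouchDB code. It is a local imitation of the MapReduce idea under simplifying assumptions: a map function emits (key, value) pairs for each document, the emitted rows are kept sorted by key, and a reduce step aggregates them. All function names are hypothetical.

```python
from collections import Counter

DOCS = {
    "1017": {
        "Symbol": "CDK2",
        "Reporter": {
            "U95A": ["1792_g_at", "1833_at"],
            "U133A": ["211804_s_at", "204252_at", "211803_at"],
        },
    }
}


def map_reporters(doc_id, doc):
    """Map step: emit one (platform, reporter) pair per reporter."""
    for platform, reporter_ids in doc.get("Reporter", {}).items():
        for reporter_id in reporter_ids:
            yield platform, reporter_id


def build_view(docs, map_fn):
    """Run the map function over every document and sort the rows by key."""
    rows = [
        (key, value)
        for doc_id, doc in docs.items()
        for key, value in map_fn(doc_id, doc)
    ]
    return sorted(rows)


def query_view(view, key):
    """Return all values emitted under one key."""
    return [value for row_key, value in view if row_key == key]


def reduce_count(view):
    """Reduce step: count the emitted rows per key."""
    return dict(Counter(key for key, _value in view))


if __name__ == "__main__":
    view = build_view(DOCS, map_reporters)
    print(query_view(view, "U95A"))
    print(reduce_count(view))
```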
  11. 11. Load data into “CouchDB”
      NCBI “gene_info” file → create a bare-bones document for each gene → append more annotation to the “document”
      Sources appended to the gene “document”: “gene2refseq”, “gene2ensembl”, “u95a_annot”, “u133a_annot”, and more…
      Easy to add (append) a data type
      Easy to update incrementally
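The loading flow on slide 11 can be sketched without a database. The code below assumes each source has already been reduced to simple (gene ID, value) pairs; parsing the real NCBI files and writing to CouchDB are left out, and both function names are hypothetical. It shows why adding a data type or re-running a load is cheap in a document model.

```python
def create_bare_documents(gene_info_rows):
    """Create one minimal document per gene from (gene_id, symbol) pairs."""
    return {gene_id: {"Symbol": symbol} for gene_id, symbol in gene_info_rows}


def append_annotation(docs, field, pairs):
    """Append (gene_id, value) pairs to an existing document field.

    Values already present are skipped, so the same source can be
    loaded again for an incremental update without creating duplicates.
    Pairs for unknown genes are ignored.
    """
    for gene_id, value in pairs:
        doc = docs.get(gene_id)
        if doc is None:
            continue
        values = doc.setdefault(field, [])
        if value not in values:
            values.append(value)


if __name__ == "__main__":
    docs = create_bare_documents([("1017", "CDK2")])
    append_annotation(docs, "RefSeq", [("1017", "NM_001798"), ("1017", "NM_052827")])
    append_annotation(docs, "Ensembl", [("1017", "ENSG00000123374")])
    print(docs["1017"])
```

A new annotation source only needs another `append_annotation` call with a new field name; no schema change is involved.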
  12. 12. What’s behind Mygene.info
  13. 13. Gene Annotation as a Service http://MyGene.Info
      Gene annotation services go PUBLIC
      Gene query service: http://mygene.info/query?q=<query>
      Gene annotation service: http://mygene.info/gene/<geneid>
  14. 14. Gene Query Service
      user query → matching gene IDs/symbols/names (JSON output)
      http://mygene.info/query?q=<query>
      Examples:
      http://mygene.info/query?q=cdk2
      http://mygene.info/query?q=cdk2+AND+species:human
      http://mygene.info/query?q=cdk?
      http://mygene.info/query?q=p*
      http://mygene.info/query?q=entrezgene:1017
      http://mygene.info/query?q=ensemblgene:ENSG00000123374
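The query URLs on slide 14 can be assembled with the standard library. This sketch only builds the URL strings and makes no network call; the host and path are taken from the slides as they stood in 2011, so the live service may behave differently. `build_query_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "http://mygene.info"  # host as given on the slides


def build_query_url(query):
    """Build a gene query URL for a free-text or fielded query string.

    ':', '?' and '*' are left unescaped so the result matches the slide
    examples; spaces are encoded as '+'.
    """
    return f"{BASE_URL}/query?" + urlencode({"q": query}, safe=":?*")


if __name__ == "__main__":
    for query in ("cdk2", "cdk2 AND species:human", "cdk?", "p*", "entrezgene:1017"):
        print(build_query_url(query))
```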
  15. 15. Gene Annotation Service
      gene id → full or filtered gene annotation object (JSON output)
      http://mygene.info/gene/<geneid>
      Examples:
      http://mygene.info/gene/1017
      http://mygene.info/gene/ENSG00000123374
      http://mygene.info/gene/1017?filter=name,symbol,summary
      http://mygene.info/gene/1017?filter=name,symbol,refseq.rna
      Species supported: human, mouse, rat, fruitfly, nematode, zebrafish, thale cress, frog
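The `filter` parameter on slide 15 takes comma-separated field names, with a dot for nested fields such as `refseq.rna`. The sketch below builds such URLs and imitates the filtering locally on a dictionary. The local filter is an assumption about what a dotted filter means, not the service's implementation, and the sample document shape (a `refseq` object with `rna` and `protein` keys, plus the placeholder protein value) is made up for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://mygene.info"  # host as given on the slides


def build_gene_url(gene_id, fields=None):
    """Build a gene annotation URL, optionally restricted to some fields."""
    url = f"{BASE_URL}/gene/{gene_id}"
    if fields:
        url += "?" + urlencode({"filter": ",".join(fields)}, safe=",")
    return url


def filter_document(doc, fields):
    """Keep only the requested fields; 'a.b' selects key b inside key a."""
    result = {}
    for field in fields:
        parts = field.split(".")
        value = doc
        for key in parts:
            if not isinstance(value, dict) or key not in value:
                break  # field is absent, skip it
            value = value[key]
        else:
            target = result
            for key in parts[:-1]:
                target = target.setdefault(key, {})
            target[parts[-1]] = value
    return result


if __name__ == "__main__":
    sample = {  # hypothetical document shape
        "name": "cyclin-dependent kinase 2",
        "symbol": "CDK2",
        "refseq": {"rna": ["NM_001798", "NM_052827"], "protein": ["<placeholder>"]},
    }
    print(build_gene_url(1017, ["name", "symbol", "refseq.rna"]))
    print(filter_document(sample, ["symbol", "refseq.rna"]))
```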
  16. 16. Targeted use case: quickly build a gene-centric online resource without the need to maintain a local gene annotation database
      Use it in a web application:
      Server side: make direct HTTP calls
      Client side: set up a server-side proxy, use JSONP calls, or make cross-domain AJAX calls via CORS (Cross-Origin Resource Sharing)
      Demo and full documentation at http://mygene.info
      Source code: https://bitbucket.org/newgene/genedoc/src
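For the server-side case on slide 16, a direct HTTP call needs only the standard library. The sketch below is a minimal example under a few assumptions: the URL pattern is the one shown on the slides, the response body is JSON, and `fetch_annotation` with its `opener` parameter is a hypothetical helper. The `opener` argument exists so the function can be exercised without network access.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://mygene.info"  # host as given on the slides


def fetch_annotation(gene_id, opener=urlopen, timeout=10):
    """Fetch one gene annotation object and decode its JSON body.

    `opener` defaults to urllib's urlopen; any callable that accepts
    (url, timeout=...) and returns a context-managed, readable response
    can be passed in its place, for example in tests.
    """
    url = f"{BASE_URL}/gene/{gene_id}"
    with opener(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Performs a real HTTP request; the service may have changed since 2011.
    print(fetch_annotation(1017))
```

A web application would typically wrap this call with error handling and caching, which are omitted here.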
  17. 17. Acknowledgement
      Group members and GNF collaborators: Andrew Su, Camilo Orozco, Jon Huss, Ian MacLeod, Benjamin Good
      Past contributor: Eric Clarke, Marc Leglise
      ISMB travel support
      Funding and Support (NIH grant: R01GM083924)
      http://mygene.info