Published in: Technology

  • 1. MyGene.Info: Gene Annotation as a Service - GAaaS Chunlei Wu 2011/07/16
  • 2-4. A migration story for BioGPS
  • 5. Gene-centric annotation data (http://biogps.org). A simple view: Gene 1017 → Symbol: CDK2 → Ensembl: ENSG00000123374 → RefSeq: NM_001798, NM_052827 → Reporter: U95A: 1792_g_at, 1833_at; U133A: 211804_s_at, 204252_at, 211803_at
  • 6. A real example: Symbol, Name, Alias, Summary, Ensembl, Refseq, UniGene, Homologene, GO, UniProt, InterPro, PDB, Prosite, IPI, and many more…
  • 7. Relational database solutions. Solution 1: “star” schema
      Master Table
        GeneID  Symbol
        1017    CDK2
      Ensembl Table
        GeneID  EnsemblID
        1017    ENSG00000250560
      Refseq Table
        GeneID  RefseqID
        1017    NM_001798
        1017    NM_052827
      Reporter Table
        GeneID  Platform  Reporter
        1017    U95A      1792_s_at
        1017    U95A      1833_at
        1017    U133A     211804_s_at
        1017    U133A     204252_at
        1017    U133A     211803_at
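As a rough illustration of the “star” schema, the sketch below builds three of the tables in an in-memory SQLite database and reassembles one gene record. The table and column names (`master`, `refseq`, `reporter`, `gene_id`, and so on) and the helper `gene_record` are assumptions made for this example, not the schema BioGPS actually used. The point it tries to show is that every satellite table needs its own query (or join) to rebuild a single gene's annotation.

```python
import sqlite3

# In-memory database with a master table and two satellite tables
# (names are hypothetical, chosen to mirror the slide).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master   (gene_id INTEGER PRIMARY KEY, symbol TEXT);
CREATE TABLE refseq   (gene_id INTEGER, refseq_id TEXT);
CREATE TABLE reporter (gene_id INTEGER, platform TEXT, reporter TEXT);
INSERT INTO master VALUES (1017, 'CDK2');
INSERT INTO refseq VALUES (1017, 'NM_001798');
INSERT INTO refseq VALUES (1017, 'NM_052827');
INSERT INTO reporter VALUES (1017, 'U95A', '1833_at');
INSERT INTO reporter VALUES (1017, 'U133A', '204252_at');
INSERT INTO reporter VALUES (1017, 'U133A', '211803_at');
""")


def gene_record(conn, gene_id):
    """Reassemble one gene's annotation; one query per satellite table."""
    symbol = conn.execute(
        "SELECT symbol FROM master WHERE gene_id = ?", (gene_id,)
    ).fetchone()[0]
    refseq = [
        row[0]
        for row in conn.execute(
            "SELECT refseq_id FROM refseq WHERE gene_id = ? ORDER BY refseq_id",
            (gene_id,),
        )
    ]
    reporters = {}
    for platform, reporter in conn.execute(
        "SELECT platform, reporter FROM reporter WHERE gene_id = ? ORDER BY reporter",
        (gene_id,),
    ):
        reporters.setdefault(platform, []).append(reporter)
    return {"Symbol": symbol, "RefSeq": refseq, "Reporter": reporters}


record = gene_record(conn, 1017)
```

Adding a new annotation type under this layout means adding a new table plus another query in `gene_record`.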
  • 8. Relational database solutions. Solution 2: “weakly-typed” schema
      Generic Data Table
        ID  Type      Value            Parent  Root
        1   GeneID    1017             NULL    1017
        2   Symbol    CDK2             1017    1017
        3   Ensembl   ENSG00000123374  1017    1017
        4   RefSeq    NM_001798        1017    1017
        5   RefSeq    NM_052827        1017    1017
        6   Platform  U95A             1017    1017
        7   Platform  U133A            1017    1017
        8   Reporter  1792_g_at        U95A    1017
        9   Reporter  1833_at          U95A    1017
        10  Reporter  211804_s_at      U133A   1017
        11  Reporter  204252_at        U133A   1017
        12  Reporter  211803_at        U133A   1017
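To make the “weakly-typed” layout concrete, here is a minimal sketch that folds generic (ID, Type, Value, Parent, Root) rows back into a nested record. The function name `rebuild` and the rule that Reporter rows nest under their parent platform are assumptions for illustration. Because the generic table carries no cardinality information, the sketch stores every field as a list.

```python
# Rows copied from the generic data table: (ID, Type, Value, Parent, Root)
rows = [
    (1, "GeneID", "1017", None, "1017"),
    (2, "Symbol", "CDK2", "1017", "1017"),
    (3, "Ensembl", "ENSG00000123374", "1017", "1017"),
    (4, "RefSeq", "NM_001798", "1017", "1017"),
    (5, "RefSeq", "NM_052827", "1017", "1017"),
    (6, "Platform", "U95A", "1017", "1017"),
    (7, "Platform", "U133A", "1017", "1017"),
    (8, "Reporter", "1792_g_at", "U95A", "1017"),
    (9, "Reporter", "1833_at", "U95A", "1017"),
    (10, "Reporter", "211804_s_at", "U133A", "1017"),
    (11, "Reporter", "204252_at", "U133A", "1017"),
    (12, "Reporter", "211803_at", "U133A", "1017"),
]


def rebuild(rows, root):
    """Fold all rows sharing one Root value into a nested dict."""
    doc = {}
    for _row_id, type_, value, parent, row_root in rows:
        # Skip other genes; GeneID and Platform rows only provide structure.
        if row_root != root or type_ in ("GeneID", "Platform"):
            continue
        if type_ == "Reporter":
            # Reporter rows hang off their platform (the Parent column).
            doc.setdefault("Reporter", {}).setdefault(parent, []).append(value)
        else:
            doc.setdefault(type_, []).append(value)
    return doc


gene = rebuild(rows, "1017")
```

The table itself is simple, but the application has to know how each type nests, which is the trade-off this schema makes.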
  • 9. “Document”-based database solution
      CDK2 1017: {
        "Symbol": "CDK2",
        "Ensembl": "ENSG00000123374",
        "RefSeq": ["NM_001798", "NM_052827"],
        "Reporter": {
          "U95A": ["1792_g_at", "1833_at"],
          "U133A": ["211804_s_at", "204252_at", "211803_at"]
        }
      }
  • 10. What’s CouchDB? Document-based (“schema-free”) database; index and query data in MapReduce fashion using JavaScript; RESTful JSON API; bi-directional replicator; distributed
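CouchDB views are defined with JavaScript map (and optional reduce) functions. The sketch below only emulates that idea in plain Python, so the helpers `map_reporters`, `build_view`, and `reduce_count` are hypothetical and are not CouchDB's API. It indexes gene documents by reporter ID, the kind of lookup a view could serve.

```python
from collections import defaultdict

# One gene document, keyed by gene ID (content taken from the CDK2 example).
docs = {
    "1017": {
        "Symbol": "CDK2",
        "Reporter": {
            "U95A": ["1792_g_at", "1833_at"],
            "U133A": ["211804_s_at", "204252_at", "211803_at"],
        },
    }
}


def map_reporters(doc_id, doc):
    """Map step: emit one (reporter, gene ID) pair per reporter."""
    for reporters in doc.get("Reporter", {}).values():
        for reporter in reporters:
            yield reporter, doc_id


def build_view(docs, map_fn):
    """Run the map function over every document and group values by key."""
    view = defaultdict(list)
    for doc_id, doc in docs.items():
        for key, value in map_fn(doc_id, doc):
            view[key].append(value)
    return dict(sorted(view.items()))


def reduce_count(view):
    """Reduce step: count the emitted values under each key."""
    return {key: len(values) for key, values in view.items()}


view = build_view(docs, map_reporters)
counts = reduce_count(view)
```

With this index, `view["204252_at"]` returns the gene IDs associated with that reporter without scanning every document at query time.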
  • 11. Load data into “CouchDB”: NCBI “gene_info” file → create a bare-bones document for each gene → append more annotation to the gene “document” from “gene2refseq”, “gene2ensembl”, “u95a_annot”, “u133a_annot”, and more… Easy to add or append a data type; easy to update incrementally
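The loading steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the input rows are simplified (gene ID, value) pairs rather than the real NCBI file formats, a plain dict stands in for the database, and `load_gene_info` and `append_annotation` are hypothetical helper names.

```python
def load_gene_info(rows):
    """Create a bare-bones document per gene from (gene_id, symbol) rows."""
    return {gene_id: {"Symbol": symbol} for gene_id, symbol in rows}


def append_annotation(docs, field, rows):
    """Append (gene_id, value) rows to the matching documents under `field`.

    Values already present are skipped, so re-running a load acts as an
    incremental update instead of duplicating entries.
    """
    for gene_id, value in rows:
        doc = docs.get(gene_id)
        if doc is None:
            continue  # annotation for a gene that has no document yet
        values = doc.setdefault(field, [])
        if value not in values:
            values.append(value)


# Simplified stand-ins for "gene_info", "gene2refseq" and "gene2ensembl".
docs = load_gene_info([("1017", "CDK2")])
refseq_rows = [("1017", "NM_001798"), ("1017", "NM_052827")]
append_annotation(docs, "RefSeq", refseq_rows)
append_annotation(docs, "Ensembl", [("1017", "ENSG00000123374")])

# Loading the same source again leaves the document unchanged.
append_annotation(docs, "RefSeq", refseq_rows)
```

Supporting a new annotation source only requires another `append_annotation` call with a new field name; no schema change is involved.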
  • 12. What’s behind
  • 13. Gene Annotation as a Service (http://MyGene.Info): gene annotation services go PUBLIC. Gene query service: <query>. Gene annotation service: <geneid>
  • 14. Gene Query Service: user query → matching gene IDs/symbols/names (JSON output). <query> Examples: *
  • 15. Gene Annotation Service: gene id → full or filtered gene annotation object (JSON output). <geneid> Examples: ,symbol,summary,symbol,refseq.rna Species supported: human, mouse, rat, fruitfly, nematode, zebrafish, thale cress, frog
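The example fragments on this slide, such as `symbol,refseq.rna`, suggest that a filter is a comma-separated list of field names in which a dot reaches into a nested object. The sketch below shows one way such a filter could be applied to a gene document. It is a client-side illustration of that assumed behavior, not the service's implementation, and both the `filter_doc` helper and the sample document layout are hypothetical.

```python
def filter_doc(doc, fields):
    """Return a copy of `doc` holding only the requested fields.

    `fields` is a comma-separated string; dots address nested keys.
    Paths that do not exist in the document are silently ignored.
    """
    out = {}
    for path in fields.split(","):
        keys = path.strip().split(".")
        value = doc
        for key in keys:
            if not isinstance(value, dict) or key not in value:
                break  # path not present in this document
            value = value[key]
        else:
            # Path fully resolved: recreate its nesting in the output.
            target = out
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = value
    return out


# Hypothetical annotation object; only the CDK2 values come from the slides.
gene = {
    "symbol": "CDK2",
    "ensembl": "ENSG00000123374",
    "refseq": {
        "rna": ["NM_001798", "NM_052827"],
        "note": "hypothetical sibling field",
    },
}

filtered = filter_doc(gene, "symbol,refseq.rna")
```

Here `filtered` keeps `symbol` and the `rna` list nested under `refseq`, and drops everything else.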
  • 16. Targeted use case: quickly build a gene-centric online resource without needing to maintain a local gene annotation database. Use it in a web application:
      Server side: make direct HTTP calls
      Client side: set up a server-side proxy; JSONP calls; cross-domain AJAX calls via CORS (Cross-Origin Resource Sharing)
      Demo and full documentation at
      Source code:
  • 17. Acknowledgement. Group members / GNF collaborators: Andrew Su, Camilo Orozco, Jon Huss, Ian MacLeod, Benjamin Good. Past contributor: Eric Clarke, Marc Leglise. ISMB travel support. Funding and support: NIH grant R01GM083924