F01-Cloud-Mygene.info
Transcript

  • 1. MyGene.Info: Gene Annotation as a Service (GAaaS). Chunlei Wu, cwu@scripps.edu, 2011/07/16
  • 2. A migration story for BioGPS http://biogps.org
  • 5. Gene-centric annotation data (http://biogps.org). A simple view:
      Gene 1017
        → Symbol: CDK2
        → Ensembl: ENSG00000123374
        → RefSeq: NM_001798, NM_052827
        → Reporter:
            → U95A: 1792_g_at, 1833_at
            → U133A: 211804_s_at, 204252_at, 211803_at
  • 6. A real example: Symbol, Name, Alias, Summary, Ensembl, RefSeq, UniGene, HomoloGene, GO, UniProt, InterPro, PDB, PROSITE, IPI, and many more…
  • 7. Relational database solutions. Solution 1: “star” schema
      Master Table (GeneID, Symbol):
        1017  CDK2
      Ensembl Table (GeneID, EnsemblID):
        1017  ENSG00000123374
      RefSeq Table (GeneID, RefseqID):
        1017  NM_001798
        1017  NM_052827
      Reporter Table (GeneID, Platform, Reporter):
        1017  U95A   1792_g_at
        1017  U95A   1833_at
        1017  U133A  211804_s_at
        1017  U133A  204252_at
        1017  U133A  211803_at
  • 8. Relational database solutions. Solution 2: “weakly-typed” schema
      Generic Data Table
      ID  Type      Value            Parent  Root
      1   GeneID    1017             NULL    1017
      2   Symbol    CDK2             1017    1017
      3   Ensembl   ENSG00000123374  1017    1017
      4   RefSeq    NM_001798        1017    1017
      5   RefSeq    NM_052827        1017    1017
      6   Platform  U95A             1017    1017
      7   Platform  U133A            1017    1017
      8   Reporter  1792_g_at        U95A    1017
      9   Reporter  1833_at          U95A    1017
      10  Reporter  211804_s_at      U133A   1017
      11  Reporter  204252_at        U133A   1017
      12  Reporter  211803_at        U133A   1017
  • 9. “Document”-based database solution, CDK2:
      1017: {
        "Symbol": "CDK2",
        "Ensembl": "ENSG00000123374",
        "RefSeq": ["NM_001798", "NM_052827"],
        "Reporter": {
          "U95A": ["1792_g_at", "1833_at"],
          "U133A": ["211804_s_at", "204252_at", "211803_at"]
        }
      }
  • 10. What’s CouchDB?
      Document-based (“schema-free”) database
      Indexes and queries data in MapReduce fashion using JavaScript
      RESTful JSON API
      Bi-directional replicator
      Distributed
  • 11. Load data into CouchDB
      NCBI “gene_info” file → create a bare-bones document for each gene
      Append more annotation to each gene “document” from “gene2refseq”, “gene2ensembl”, “u95a_annot”, “u133a_annot”, and more…
      Easy to add new data types
      Easy to update incrementally
  • 12. What’s behind Mygene.info
  • 13. Gene Annotation as a Service: http://MyGene.Info
      Gene annotation services go public:
        Gene query service: http://mygene.info/query?q=<query>
        Gene annotation service: http://mygene.info/gene/<geneid>
  • 14. Gene Query Service: user query → matching gene IDs/symbols/names (JSON output)
      http://mygene.info/query?q=<query>
      Examples:
        http://mygene.info/query?q=cdk2
        http://mygene.info/query?q=cdk2+AND+species:human
        http://mygene.info/query?q=cdk?
        http://mygene.info/query?q=p*
        http://mygene.info/query?q=entrezgene:1017
        http://mygene.info/query?q=ensemblgene:ENSG00000123374
  • 15. Gene Annotation Service: gene ID → full or filtered gene annotation object (JSON output)
      http://mygene.info/gene/<geneid>
      Examples:
        http://mygene.info/gene/1017
        http://mygene.info/gene/ENSG00000123374
        http://mygene.info/gene/1017?filter=name,symbol,summary
        http://mygene.info/gene/1017?filter=name,symbol,refseq.rna
      Species supported: human, mouse, rat, fruit fly, nematode, zebrafish, thale cress, frog
  • 16. Targeted use case: quickly build a gene-centric online resource without needing to maintain a local gene annotation database.
      Use it in a web application:
        Server side: make direct HTTP calls
        Client side: set up a server-side proxy, use JSONP calls, or make cross-domain AJAX calls via CORS (Cross-Origin Resource Sharing)
      Demo and full documentation at http://mygene.info
      Source code: https://bitbucket.org/newgene/genedoc/src
  • 17. Acknowledgements
      Group members and GNF collaborators: Andrew Su, Camilo Orozco, Jon Huss, Ian MacLeod, Benjamin Good
      Past contributors: Eric Clarke, Marc Leglise
      ISMB travel support
      Funding and support: NIH grant R01GM083924
      http://mygene.info
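Slide 7's “star” schema can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not the BioGPS schema: the table and column names are illustrative assumptions. It shows one practical cost of the design. Joining every satellite table in a single query returns the cross product of the one-to-many relations, so the sketch instead issues one query per table and stitches the results together in application code.

```python
import sqlite3

# A minimal "star" schema (slide 7): one master table keyed by GeneID plus
# one satellite table per annotation type. Names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master   (gene_id INTEGER PRIMARY KEY, symbol TEXT);
    CREATE TABLE ensembl  (gene_id INTEGER, ensembl_id TEXT);
    CREATE TABLE refseq   (gene_id INTEGER, refseq_id TEXT);
    CREATE TABLE reporter (gene_id INTEGER, platform TEXT, reporter TEXT);
""")
conn.execute("INSERT INTO master VALUES (1017, 'CDK2')")
conn.execute("INSERT INTO ensembl VALUES (1017, 'ENSG00000123374')")
conn.executemany("INSERT INTO refseq VALUES (?, ?)",
                 [(1017, "NM_001798"), (1017, "NM_052827")])
conn.executemany("INSERT INTO reporter VALUES (?, ?, ?)", [
    (1017, "U95A", "1792_g_at"), (1017, "U95A", "1833_at"),
    (1017, "U133A", "211804_s_at"), (1017, "U133A", "204252_at"),
    (1017, "U133A", "211803_at"),
])

# One JOIN over all satellites multiplies the one-to-many relations:
# 1 Ensembl ID x 2 RefSeq IDs x 5 reporters = 10 rows for a single gene.
joined_rows = conn.execute("""
    SELECT m.symbol, e.ensembl_id, r.refseq_id, p.platform, p.reporter
    FROM master m
    JOIN ensembl  e ON e.gene_id = m.gene_id
    JOIN refseq   r ON r.gene_id = m.gene_id
    JOIN reporter p ON p.gene_id = m.gene_id
    WHERE m.gene_id = ?""", (1017,)).fetchall()

def gene_annotation(gene_id):
    """Assemble one gene's annotation with one query per table."""
    (symbol,) = conn.execute(
        "SELECT symbol FROM master WHERE gene_id = ?", (gene_id,)).fetchone()
    ensembl = [row[0] for row in conn.execute(
        "SELECT ensembl_id FROM ensembl WHERE gene_id = ?", (gene_id,))]
    refseq = [row[0] for row in conn.execute(
        "SELECT refseq_id FROM refseq WHERE gene_id = ? ORDER BY rowid", (gene_id,))]
    reporters = {}
    for platform, probe in conn.execute(
            "SELECT platform, reporter FROM reporter WHERE gene_id = ? ORDER BY rowid",
            (gene_id,)):
        reporters.setdefault(platform, []).append(probe)
    return {"Symbol": symbol, "Ensembl": ensembl, "RefSeq": refseq,
            "Reporter": reporters}

print(len(joined_rows))                            # 10
print(gene_annotation(1017)["Reporter"]["U95A"])   # ['1792_g_at', '1833_at']
```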
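For comparison, slide 8's “weakly-typed” generic table can be loaded the same way. In this sketch the column names and the rebuild rules are assumptions: every value is stored as text, and the knowledge of which Type nests under which (Reporter under Platform) lives in application code rather than in the schema.

```python
import sqlite3

# Slide 8's "weakly-typed" schema: every fact is one row in a generic table.
# Column names are illustrative; all values are stored as TEXT.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE generic_data (
    id INTEGER PRIMARY KEY, type TEXT, value TEXT, parent TEXT, root TEXT)""")
conn.executemany("INSERT INTO generic_data VALUES (?, ?, ?, ?, ?)", [
    (1, "GeneID", "1017", None, "1017"),
    (2, "Symbol", "CDK2", "1017", "1017"),
    (3, "Ensembl", "ENSG00000123374", "1017", "1017"),
    (4, "RefSeq", "NM_001798", "1017", "1017"),
    (5, "RefSeq", "NM_052827", "1017", "1017"),
    (6, "Platform", "U95A", "1017", "1017"),
    (7, "Platform", "U133A", "1017", "1017"),
    (8, "Reporter", "1792_g_at", "U95A", "1017"),
    (9, "Reporter", "1833_at", "U95A", "1017"),
    (10, "Reporter", "211804_s_at", "U133A", "1017"),
    (11, "Reporter", "204252_at", "U133A", "1017"),
    (12, "Reporter", "211803_at", "U133A", "1017"),
])

def rebuild_document(root):
    """Reassemble a nested gene document from the flat rows sharing one Root.

    The nesting rules (Reporter rows hang under their Platform) are
    hard-coded here because the schema itself does not express them.
    """
    doc, reporters = {}, {}
    for type_, value, parent in conn.execute(
            "SELECT type, value, parent FROM generic_data WHERE root = ? ORDER BY id",
            (root,)):
        if type_ == "GeneID":
            continue  # the root row only identifies the gene
        elif type_ == "Platform":
            reporters.setdefault(value, [])
        elif type_ == "Reporter":
            reporters.setdefault(parent, []).append(value)
        else:
            doc.setdefault(type_, []).append(value)
    if reporters:
        doc["Reporter"] = reporters
    return doc

print(rebuild_document("1017")["Reporter"]["U133A"])
# ['211804_s_at', '204252_at', '211803_at']
```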
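Slide 10 notes that CouchDB indexes and queries data in MapReduce fashion. Real CouchDB views are JavaScript map (and optional reduce) functions that call emit(); the Python below only emulates the idea on the slide 9 document. It maps each document to (key, value) rows kept sorted by key, looks rows up by exact key, and reduces by a key prefix, loosely mirroring CouchDB's group_level query option. The function names are hypothetical.

```python
from itertools import groupby

# The CDK2 document from slide 9, keyed by its document ID.
docs = {
    "1017": {
        "Symbol": "CDK2",
        "Ensembl": "ENSG00000123374",
        "RefSeq": ["NM_001798", "NM_052827"],
        "Reporter": {
            "U95A": ["1792_g_at", "1833_at"],
            "U133A": ["211804_s_at", "204252_at", "211803_at"],
        },
    },
}

def map_reporters(doc_id, doc):
    """Map: emit one ([platform, reporter], doc_id) row per probe."""
    for platform, probes in doc.get("Reporter", {}).items():
        for probe in probes:
            yield [platform, probe], doc_id

def build_view(docs, map_fn):
    """Apply the map to every document and keep the rows sorted by key."""
    rows = [row for doc_id, doc in docs.items() for row in map_fn(doc_id, doc)]
    return sorted(rows, key=lambda row: row[0])

def reduce_count(rows, group_level):
    """Reduce: count rows whose keys share the first `group_level` elements."""
    return {tuple(prefix): sum(1 for _ in group)
            for prefix, group in groupby(rows, key=lambda row: row[0][:group_level])}

view = build_view(docs, map_reporters)
# Query by exact key: which gene does probe 204252_at on U133A belong to?
print([value for key, value in view if key == ["U133A", "204252_at"]])  # ['1017']
print(reduce_count(view, group_level=1))  # {('U133A',): 3, ('U95A',): 2}
```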
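The loading strategy on slide 11 (bare-bones documents first, then appended annotation) might look like the sketch below. The inline tab-separated excerpts are simplified, hypothetical stand-ins for the NCBI gene_info, gene2refseq and gene2ensembl files, whose real headers and columns differ; the helper names are also hypothetical. Skipping values already present makes re-running an update harmless, which is one way to get the incremental updates the slide mentions.

```python
import csv
import io

# Simplified, hypothetical excerpts of the NCBI files; the real files have
# more columns and different header names.
GENE_INFO = "tax_id\tGeneID\tSymbol\n9606\t1017\tCDK2\n"
GENE2REFSEQ = "tax_id\tGeneID\tRNA\n9606\t1017\tNM_001798\n9606\t1017\tNM_052827\n"
GENE2ENSEMBL = "tax_id\tGeneID\tEnsembl\n9606\t1017\tENSG00000123374\n"

def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def load_bare_bones(text):
    """Step 1: create one bare-bones document per gene from gene_info."""
    return {row["GeneID"]: {"Symbol": row["Symbol"]} for row in read_tsv(text)}

def append_annotation(docs, text, field, column):
    """Step 2: append one annotation type to the existing gene documents."""
    for row in read_tsv(text):
        doc = docs.get(row["GeneID"])
        if doc is None:
            continue  # ignore genes that have no bare-bones document
        values = doc.setdefault(field, [])
        if row[column] not in values:  # skipping duplicates makes re-runs harmless
            values.append(row[column])

docs = load_bare_bones(GENE_INFO)
append_annotation(docs, GENE2REFSEQ, "RefSeq", "RNA")
append_annotation(docs, GENE2ENSEMBL, "Ensembl", "Ensembl")
print(docs)
# {'1017': {'Symbol': 'CDK2', 'RefSeq': ['NM_001798', 'NM_052827'], 'Ensembl': ['ENSG00000123374']}}
```

Adding another data type is one more append_annotation call; no schema change is involved.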
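A client can build the slide 14 query URLs with the standard library. BASE_URL is the 2011 address shown in the slides and may no longer be current. urlencode escapes characters such as ':' and '?' (for example, cdk2+AND+species%3Ahuman), which decode back to the same query the slide writes unescaped; the loop checks that round trip.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# 2011 address from the slides; the live service may use a different URL.
BASE_URL = "http://mygene.info"

def query_url(q):
    """Build a gene query service URL: /query?q=<query>."""
    return f"{BASE_URL}/query?{urlencode({'q': q})}"

examples = ["cdk2", "cdk2 AND species:human", "cdk?", "p*",
            "entrezgene:1017", "ensemblgene:ENSG00000123374"]
for q in examples:
    url = query_url(q)
    # Decoding the URL must give back exactly the query we meant to send.
    assert parse_qs(urlsplit(url).query)["q"] == [q]
    print(url)
```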
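Likewise for the slide 15 annotation service, including the comma-separated filter parameter. BASE_URL again reflects the 2011 slides. fetch_json is a hypothetical helper, not called here, illustrating the “server side: make direct HTTP calls” option from slide 16; it needs network access and assumes the endpoint returns JSON.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# 2011 address from the slides; the live service may use a different URL.
BASE_URL = "http://mygene.info"

def gene_url(gene_id, fields=None):
    """Build /gene/<geneid>, optionally restricted with ?filter=a,b,c."""
    url = f"{BASE_URL}/gene/{quote(str(gene_id), safe='')}"
    if fields:
        # keep commas literal so the URL matches the form used on slide 15
        url += "?" + urlencode({"filter": ",".join(fields)}, safe=",")
    return url

def fetch_json(url, timeout=10):
    """Direct server-side HTTP GET that parses a JSON response (needs network)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)

print(gene_url(1017))
print(gene_url("ENSG00000123374"))
print(gene_url(1017, ["name", "symbol", "refseq.rna"]))
# http://mygene.info/gene/1017?filter=name,symbol,refseq.rna
```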
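Slide 16 lists JSONP as one client-side option. The idea is that the server wraps its JSON in a call to a function named by the client, so the response can be loaded through a script tag across domains. The sketch below shows only that server-side wrapping; the callback-name whitelist is an assumption added for safety, not documented MyGene.info behavior. With CORS, the server instead returns plain JSON plus an Access-Control-Allow-Origin header, and no wrapping is needed.

```python
import json
import re

# Conservative whitelist for callback names (an assumption of this sketch).
CALLBACK_NAME = re.compile(r"[A-Za-z_$][A-Za-z0-9_$.]*")

def jsonp_response(obj, callback=None):
    """Return (content_type, body): plain JSON, or JSON wrapped as callback(...)."""
    payload = json.dumps(obj)
    if callback is None:
        return "application/json", payload
    if not CALLBACK_NAME.fullmatch(callback):
        raise ValueError(f"unsafe callback name: {callback!r}")
    # The browser evaluates this as a script, calling the page's own function.
    return "application/javascript", f"{callback}({payload});"

print(jsonp_response({"symbol": "CDK2"}, callback="showGene")[1])
# showGene({"symbol": "CDK2"});
```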