F01-Cloud-Mygene.info


  1. MyGene.Info: Gene Annotation as a Service (GAaaS). Chunlei Wu, cwu@scripps.edu, 2011/07/16
  2-4. A migration story for BioGPS (http://biogps.org)
  5. Gene-centric annotation data (http://biogps.org). A simple view of Gene 1017:
       → Symbol: CDK2
       → Ensembl: ENSG00000123374
       → RefSeq: NM_001798, NM_052827
       → Reporter:
           → U95A: 1792_g_at, 1833_at
           → U133A: 211804_s_at, 204252_at, 211803_at
  6. A real example: Symbol, Name, Alias, Summary, Ensembl, RefSeq, UniGene, HomoloGene, GO, UniProt, InterPro, PDB, Prosite, IPI, and many more…
  7. Relational database solutions. Solution 1: “star” schema
       Master Table
         GeneID  Symbol
         1017    CDK2
       Ensembl Table
         GeneID  EnsemblID
         1017    ENSG00000250560
       Refseq Table
         GeneID  RefseqID
         1017    NM_001798
         1017    NM_052827
       Reporter Table
         GeneID  Platform  Reporter
         1017    U95A      1792_g_at
         1017    U95A      1833_at
         1017    U133A     211804_s_at
         1017    U133A     204252_at
         1017    U133A     211803_at
  8. Relational database solutions. Solution 2: “weakly-typed” schema
       Generic Data Table
         ID  Type      Value            Parent  Root
         1   GeneID    1017             NULL    1017
         2   Symbol    CDK2             1017    1017
         3   Ensembl   ENSG00000123374  1017    1017
         4   RefSeq    NM_001798        1017    1017
         5   RefSeq    NM_052827        1017    1017
         6   Platform  U95A             1017    1017
         7   Platform  U133A            1017    1017
         8   Reporter  1792_g_at        U95A    1017
         9   Reporter  1833_at          U95A    1017
         10  Reporter  211804_s_at      U133A   1017
         11  Reporter  204252_at        U133A   1017
         12  Reporter  211803_at        U133A   1017
  9. “Document”-based database solution. CDK2 (gene 1017):
       {
         "Symbol": "CDK2",
         "Ensembl": "ENSG00000123374",
         "RefSeq": ["NM_001798", "NM_052827"],
         "Reporter": {
           "U95A": ["1792_g_at", "1833_at"],
           "U133A": ["211804_s_at", "204252_at", "211803_at"]
         }
       }
  10. What’s CouchDB?
       - Document-based (“schema-free”) database
       - Index and query data in MapReduce fashion using JavaScript
       - RESTful JSON API
       - Bi-directional replicator
       - Distributed
  11. Load data into CouchDB
       - Start from the NCBI “gene_info” file: create a bare-bones document for each gene
       - Append more annotation to each gene “document” from “gene2refseq”, “gene2ensembl”, “u95a_annot”, “u133a_annot”, and more…
       - Easy to add a new data type
       - Easy to update incrementally
  12. What’s behind MyGene.info
  13. Gene Annotation as a Service (http://MyGene.Info): gene annotation services go PUBLIC
       Gene query service: http://mygene.info/query?q=<query>
       Gene annotation service: http://mygene.info/gene/<geneid>
  14. Gene Query Service: user query → matching gene IDs/symbols/names (JSON output)
       http://mygene.info/query?q=<query>
       Examples:
         http://mygene.info/query?q=cdk2
         http://mygene.info/query?q=cdk2+AND+species:human
         http://mygene.info/query?q=cdk?
         http://mygene.info/query?q=p*
         http://mygene.info/query?q=entrezgene:1017
         http://mygene.info/query?q=ensemblgene:ENSG00000123374
  15. Gene Annotation Service: gene ID → full or filtered gene annotation object (JSON output)
       http://mygene.info/gene/<geneid>
       Examples:
         http://mygene.info/gene/1017
         http://mygene.info/gene/ENSG00000123374
         http://mygene.info/gene/1017?filter=name,symbol,summary
         http://mygene.info/gene/1017?filter=name,symbol,refseq.rna
       Species supported: human, mouse, rat, fruitfly, nematode, zebrafish, thale cress, frog
  16. Targeted use case: quickly build a gene-centric online resource without having to maintain a local gene annotation database
       Use it in a web application:
         Server side: make direct HTTP calls
         Client side: set up a server-side proxy, make JSONP calls, or make cross-domain AJAX calls via CORS (Cross-Origin Resource Sharing)
       Demo and full documentation at http://mygene.info
       Source code: https://bitbucket.org/newgene/genedoc/src
  17. Acknowledgement Group members: GNF collaborators: Andrew Su Camilo Orozco Jon Huss Ian MacLeod Benjamin Good Past contributor: Eric Clarke Marc Leglise ISMB travel support Funding and Support (NIH grant: R01GM083924) http://mygene.info
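
The step from the “weakly-typed” table on slide 8 to the nested document on slide 9 can be sketched in Python. This is only an illustration: the function name `rows_to_document` and the `MULTI_VALUED` set are hypothetical, and the rule that only RefSeq becomes a top-level list is an assumption chosen to match the document shown on slide 9, not something the slides specify.

```python
import json

# Rows copied from the generic data table on slide 8:
# (ID, Type, Value, Parent, Root)
ROWS = [
    (1, "GeneID", "1017", None, "1017"),
    (2, "Symbol", "CDK2", "1017", "1017"),
    (3, "Ensembl", "ENSG00000123374", "1017", "1017"),
    (4, "RefSeq", "NM_001798", "1017", "1017"),
    (5, "RefSeq", "NM_052827", "1017", "1017"),
    (6, "Platform", "U95A", "1017", "1017"),
    (7, "Platform", "U133A", "1017", "1017"),
    (8, "Reporter", "1792_g_at", "U95A", "1017"),
    (9, "Reporter", "1833_at", "U95A", "1017"),
    (10, "Reporter", "211804_s_at", "U133A", "1017"),
    (11, "Reporter", "204252_at", "U133A", "1017"),
    (12, "Reporter", "211803_at", "U133A", "1017"),
]

# Assumption: only RefSeq is stored as a list at the top level, as on slide 9.
MULTI_VALUED = {"RefSeq"}


def rows_to_document(rows, gene_id):
    """Fold the flat rows belonging to one gene into a nested document."""
    doc = {}
    for _row_id, kind, value, parent, root in rows:
        if root != gene_id or kind in ("GeneID", "Platform"):
            # The gene ID is the document key; platforms reappear as
            # keys under "Reporter" when their reporters are processed.
            continue
        if kind == "Reporter":
            doc.setdefault("Reporter", {}).setdefault(parent, []).append(value)
        elif kind in MULTI_VALUED:
            doc.setdefault(kind, []).append(value)
        else:
            doc[kind] = value
    return doc


cdk2 = rows_to_document(ROWS, "1017")
print(json.dumps(cdk2, indent=2))
```

Twelve rows with parent pointers collapse into a single object that can be read without any joins, which is the contrast slides 7 to 9 draw.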
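
The loading procedure on slide 11 (a bare-bones document per gene, then annotation appended source by source) might look roughly like the sketch below. A plain dictionary stands in for CouchDB, so no CouchDB client calls are shown, and the real NCBI files are not parsed; `create_bare_documents` and `append_annotation` are hypothetical names. For simplicity every appended field is stored as a list, which differs slightly from slide 9, where Ensembl is a single string.

```python
def create_bare_documents(gene_info_rows):
    """Create one minimal document per gene from (gene_id, symbol) pairs."""
    return {
        gene_id: {"_id": gene_id, "Symbol": symbol}
        for gene_id, symbol in gene_info_rows
    }


def append_annotation(store, field, pairs):
    """Append (gene_id, value) pairs under `field`.

    Unknown gene IDs are skipped and values already present are not
    duplicated, so the same source can be re-loaded incrementally.
    """
    for gene_id, value in pairs:
        doc = store.get(gene_id)
        if doc is None:
            continue
        values = doc.setdefault(field, [])
        if value not in values:
            values.append(value)


# Hypothetical in-memory stand-in for the database, seeded from "gene_info".
store = create_bare_documents([("1017", "CDK2")])

# Each additional source only touches its own field.
append_annotation(store, "RefSeq", [("1017", "NM_001798"), ("1017", "NM_052827")])
append_annotation(store, "Ensembl", [("1017", "ENSG00000123374")])

# Re-running a load changes nothing for values that are already stored.
append_annotation(store, "RefSeq", [("1017", "NM_001798")])
```

Adding a new data type is then one more `append_annotation` call with a new field name, with no schema change, which is the point of the last two bullets on that slide.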
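
The URL patterns on slides 14 and 15 can be assembled with Python's standard library. The helper names `query_url` and `gene_url` are hypothetical, and the host and paths are taken from the slides as they stood in 2011; whether the service still answers at these paths is not something the slides can confirm.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://mygene.info"  # host as shown on the slides


def query_url(query):
    """Build a gene query URL such as .../query?q=cdk2+AND+species:human."""
    # Keep ':', '*' and '?' literal so field prefixes and wildcards
    # appear exactly as in the slide examples; spaces become '+'.
    return f"{BASE_URL}/query?" + urlencode({"q": query}, safe=":*?")


def gene_url(gene_id, fields=None):
    """Build a gene annotation URL, optionally restricted to some fields."""
    url = f"{BASE_URL}/gene/{quote(str(gene_id), safe='')}"
    if fields:
        url += "?filter=" + ",".join(quote(field, safe="") for field in fields)
    return url


print(query_url("cdk2 AND species:human"))
print(gene_url(1017, ["name", "symbol", "refseq.rna"]))
```

Leaving `fields` empty requests the full annotation object; passing a list reproduces the `filter=` form from slide 15.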
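
For the server-side option on slide 16 (“direct HTTP calls”), a minimal sketch using only the standard library follows. `fetch_json` and `fake_opener` are hypothetical names, and the JSON payload returned by `fake_opener` is invented so the example runs offline; it is not the service's documented response format.

```python
import io
import json
from urllib.request import urlopen


def fetch_json(url, opener=urlopen, timeout=10):
    """GET `url` and decode the JSON body.

    `opener` defaults to urllib's urlopen and can be swapped out,
    for example in tests that must not touch the network.
    """
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def fake_opener(url, timeout=None):
    """Offline stand-in for urlopen; the payload below is made up."""
    return io.BytesIO(b'{"symbol": "CDK2"}')


record = fetch_json("http://mygene.info/gene/1017?filter=symbol", opener=fake_opener)
print(record["symbol"])
```

Dropping the `opener` argument makes the same call go over the network, which is all a server-side integration needs; the client-side routes on the slide (proxy, JSONP, CORS) exist only to work around the browser's same-origin policy.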
