Unknown Genes, Community Profiling, & Biotorrents.net

  1. Unknown Genes, Community Profiling, & Biotorrents.net
     Morgan Langille, UC Davis
  2. Genes with unknown function
  3. Questions
     • If we wanted to start studying a gene of unknown function, which one(s) should we study first?
     • How many unannotated genes could be annotated?
     • What proportion of unknown genes (hypothetical proteins) are probably not real proteins (e.g. pseudogenes, mispredicted ORFs)?
     • What proportion of unknown gene families are probably phage-related?
     • Can some of these families (hopefully the top-ranking ones) be characterized using non-similarity-based bioinformatic approaches?
  4. Outline of project
  5. Community Profiling
  6. Phylogenetic profiling
     • Wu, et al., PLOS Genetics, 2005
     • For C. hydrogenoformans, identified the presence or absence of homologs in all other completely sequenced genomes
     • Identified many hypothetical proteins that had the same profile as sporulation proteins
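The profiling idea on this slide can be sketched in a few lines. This is a minimal illustration, not the method of Wu et al.: the genome names and the presence/absence data are made up, and an uncharacterized gene (`hyp_0001`, a hypothetical label) is flagged only because its profile is identical to that of two sporulation genes.

```python
from collections import defaultdict

# Hypothetical data: gene -> set of genomes that contain a homolog.
homologs = {
    "spo0A": {"genome_A", "genome_B"},
    "spoIIE": {"genome_A", "genome_B"},
    "hyp_0001": {"genome_A", "genome_B"},
    "recA": {"genome_A", "genome_B", "genome_C"},
}
genomes = ["genome_A", "genome_B", "genome_C"]

def profile(gene):
    """Binary presence/absence vector over the fixed genome list."""
    return tuple(int(g in homologs[gene]) for g in genomes)

def group_by_profile(genes):
    """Group genes that share exactly the same presence/absence profile."""
    groups = defaultdict(list)
    for gene in genes:
        groups[profile(gene)].append(gene)
    return dict(groups)

groups = group_by_profile(homologs)
for prof, genes in groups.items():
    print(prof, genes)
```

Exact matching is the simplest possible comparison; a real analysis would likely tolerate a few mismatches between profiles.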
  7. Community Profiling
     • KEGG
     • COG
     • DeLong, et al., Science, 2006
  8. Community Profiling
     • Look across multiple metagenomic samples
     • Gene families that have similar profiles may have similar functions
     • Similar to using co-expression to identify genes with similar functions
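One way to make "similar profiles" concrete is a correlation between the abundance vectors of two gene families across samples. The sketch below assumes Pearson correlation and uses invented family names and counts; the talk does not say which similarity measure was used at this step.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-variation signal
    return cov / math.sqrt(var_x * var_y)

# Hypothetical hit counts for three families across four metagenomic samples.
profiles = {
    "family_a": [10, 0, 5, 20],
    "family_b": [20, 0, 10, 40],
    "family_c": [0, 15, 3, 1],
}

r_ab = pearson(profiles["family_a"], profiles["family_b"])
r_ac = pearson(profiles["family_a"], profiles["family_c"])
print(round(r_ab, 3), round(r_ac, 3))
```

Here `family_a` and `family_b` rise and fall together across samples, so they would be candidates for a shared function, while `family_c` follows a different pattern.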
  9. So what have I done?
     • "all metagenomics peptides" from CAMERA
     • 43M sequences (mostly GOS)
     • Searched against 11,000 Pfams using HMMER 3
     • Used “cluster” to group genes and samples
  10. Results
      • Figure axes: metagenomic samples and Pfams
      • Red = above-average number of Pfams
      • Green = below-average number of Pfams
      • Have not normalized for the number of sequences per sample or for the number of Pfams
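The slide notes that the counts were not normalized. As an illustration only, one simple option is to divide each Pfam's hit count by the number of sequences in the sample; the function name, the Pfam labels, and the numbers below are all assumptions made for this sketch.

```python
def normalize_by_sample(counts, seqs_per_sample):
    """Convert raw hit counts into hits per sequence for each sample."""
    return {
        pfam: [count / total for count, total in zip(row, seqs_per_sample)]
        for pfam, row in counts.items()
    }

# Hypothetical raw counts for two Pfams in two samples of very different size.
counts = {"pfam_x": [10, 100], "pfam_y": [5, 50]}
seqs_per_sample = [1000, 10000]

normalized = normalize_by_sample(counts, seqs_per_sample)
print(normalized)
```

In this toy case the raw counts differ tenfold between samples, but the per-sequence rates are the same, which is the kind of sample-size effect normalization is meant to remove.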
  11. Example of phage Pfams clustering together
  12. Measuring functional relatedness
      • Need to measure community profiling performance
      • The hierarchical clusters were broken into 575 groups using a correlation cutoff of 0.90 or above
      • Pfams were mapped to GO terms using pfam2GO
      • 1893 Pfams had no associated GO term
      • 695 of these were Domains of Unknown Function (DUFs)
      • 3377 Pfams had one or more associated GO terms and could be used for further analysis
      • Only 67 (of 575) clusters contained 4 or more Pfams with at least one GO term
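The grouping and filtering steps on this slide can be sketched as follows. This is a simplification under stated assumptions: it joins items by single linkage with a union-find structure, whereas the talk cut an existing hierarchical clustering whose linkage method is not given. The item names, correlations, and GO labels are placeholders.

```python
def cut_by_correlation(corr, cutoff=0.90):
    """Group items linked by any pairwise correlation >= cutoff (single linkage)."""
    items = sorted({item for pair in corr for item in pair})
    parent = {item: item for item in items}

    def find(item):
        # Follow parent links to the group root, compressing the path.
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for (a, b), r in corr.items():
        if r >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())

def usable_clusters(clusters, pfam_to_go, min_annotated=4):
    """Keep clusters with at least min_annotated members that have a GO term."""
    return [
        cluster for cluster in clusters
        if sum(1 for pfam in cluster if pfam_to_go.get(pfam)) >= min_annotated
    ]

# Hypothetical pairwise correlations and GO annotations.
corr = {
    ("A", "B"): 0.95, ("B", "C"): 0.92, ("A", "C"): 0.50,
    ("A", "D"): 0.10, ("B", "D"): 0.20, ("C", "D"): 0.30,
}
pfam_to_go = {"A": ["go_term_1"], "B": ["go_term_2"], "C": []}

clusters = cut_by_correlation(corr)
kept = usable_clusters(clusters, pfam_to_go, min_annotated=2)
print(clusters, kept)
```

The toy example lowers `min_annotated` to 2 so that a four-item input can pass the filter; the slide's threshold of 4 is the default.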
  13. Measuring GO similarity
      • G-SESAME
      • Measures the semantic similarity of any two GO terms
      • Not downloadable, so queries had to be made to their web server (not fun)
      • Pairwise similarity was measured for each pair of GO terms in each cluster
      • Had to check whether terms were in the same namespace
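The per-cluster scoring described here can be sketched as an average over same-namespace term pairs. The similarity function is passed in as a parameter because the real scores came from the G-SESAME web server; the stand-in lookup table, the term labels, and the score below are placeholders, not G-SESAME output.

```python
from itertools import combinations

def average_cluster_score(terms, namespace_of, similarity):
    """Mean similarity over all same-namespace term pairs; None if no pair qualifies."""
    scores = [
        similarity(a, b)
        for a, b in combinations(sorted(terms), 2)
        if namespace_of[a] == namespace_of[b]  # skip cross-namespace pairs
    ]
    return sum(scores) / len(scores) if scores else None

# Placeholder terms: two in one namespace, one in another.
namespace_of = {
    "term_1": "molecular_function",
    "term_2": "molecular_function",
    "term_3": "biological_process",
}
lookup = {frozenset(("term_1", "term_2")): 0.8}

def stand_in_similarity(a, b):
    """Stand-in for a semantic similarity service (hypothetical values)."""
    return lookup[frozenset((a, b))]

score = average_cluster_score(
    ["term_1", "term_2", "term_3"], namespace_of, stand_in_similarity
)
print(score)
```

Only the `term_1`/`term_2` pair is scored, because `term_3` sits in a different namespace and is never compared.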
  14. Results
      • Average G-SESAME scores for each cluster
      • The average of all cluster averages was 0.484
      • 10 clusters had a score of 0.60 or greater
      • The data were then randomized by placing the same GO terms in different random clusters, which gave a score of 0.412-0.420 over 4 iterations
      • Each of the 4 iterations had only 1 or 0 clusters with a score of 0.60 or greater
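The randomized control can be sketched as a shuffle of terms across clusters. This version assumes the random clusters keep the original cluster sizes, which the slide does not state; the term labels are placeholders.

```python
import random

def shuffle_clusters(clusters, rng):
    """Reassign the same terms to random clusters with the original sizes."""
    pool = [term for cluster in clusters for term in cluster]
    rng.shuffle(pool)
    shuffled, start = [], 0
    for cluster in clusters:
        shuffled.append(pool[start:start + len(cluster)])
        start += len(cluster)
    return shuffled

# Placeholder clusters of GO terms.
clusters = [["t1", "t2", "t3"], ["t4"], ["t5", "t6"]]
rng = random.Random(0)  # fixed seed so a run can be repeated

# Four iterations, as on the slide; each would then be scored like the real clusters.
iterations = [shuffle_clusters(clusters, rng) for _ in range(4)]
for shuffled in iterations:
    print(shuffled)
```

Scoring each shuffled set with the same per-cluster average gives the baseline that the observed cluster scores are compared against.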
  15. Community Profiling Results
      • Average of all clusters = 0.49
      • 10 clusters are > 0.60
  16. Random Results
      • Average of all clusters (4 iterations) = 0.41-0.42
      • 1 or 0 clusters are > 0.60
  17. BioTorrents
  18. BitTorrent
      • A peer-to-peer file sharing protocol
      • ~27-55% of all Internet traffic
      • Mostly illegal file sharing
      • Files are shared in small pieces between several users
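The "small pieces" point can be illustrated with a short sketch. In the original BitTorrent protocol, a file is split into fixed-size pieces and each piece is identified by its SHA-1 hash, so a downloader can verify pieces received from different peers. The function name and the 256 KiB default piece size are assumptions for this example.

```python
import hashlib

def piece_hashes(data: bytes, piece_length: int = 262144):
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]

# Tiny demonstration: 10 bytes in 4-byte pieces gives pieces of 4, 4 and 2 bytes.
hashes = piece_hashes(b"x" * 10, piece_length=4)
print(len(hashes), [h.hex()[:8] for h in hashes])
```

Because every piece is verified independently, a client can fetch different pieces from different users at the same time, which is where the speed-up for large datasets comes from.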
  19. Torrents for Biology
      • Why use torrent technology?
      • Download large datasets much faster
      • Searchable central listing
      • Decentralization of data
  20. What is BioTorrents?
      • A legal file sharing website for scientists
      • Users can upload their own research results, data, software
      • Users can browse or search through all datasets
      • Data is not hosted on BioTorrents
  21. www.biotorrents.net
  22. Browse & Search
  23. Details
  24. Sign Up
  25. Upload
  26. Other Features
      • Forum
      • RSS Feed
      • Top 10
      • FAQ
      • Links
  27. Who will upload data?
      • Everyone!
      • Realistically:
      • Large organizations (e.g. NCBI, CAMERA), which may need some convincing to host their data via torrents in addition to FTP, HTTP, etc.
      • Scientists who really support open science, sharing data before it is formally complete and published
  28. Technical Challenges
      • Many institutions frown on BitTorrent technology
      • A port must be opened/forwarded
      • Client program and computer must be left running
      • Ensuring data is legal, virus-free, etc.
      • Users who upload many legitimate torrents will give downloaders more confidence
      • Making downloading and uploading easy
