Unknown Genes, Community Profiling, & Biotorrents.net

    Unknown Genes, Community Profiling, & Biotorrents.net - Presentation Transcript

    1. Unknown Genes, Community Profiling, & Biotorrents.net
      Morgan Langille
      UC Davis
    2. Genes with unknown function
    3. Questions
      If we wanted to start studying a gene of unknown function, which one(s) should we study first?
      How many un-annotated genes could be annotated?
      What proportion of unknown genes (hypothetical proteins) are probably not real proteins (i.e. pseudogenes, mispredicted ORFs, etc.)?
      What proportion of unknown gene families are probably phage-related?
      Can some of these families (hopefully the top ranking ones) be characterized using non-similarity based bioinformatic approaches?
    4. Outline of project
    5. Community Profiling
    6. Phylogenetic profiling
      Wu, et al., PLOS Genetics, 2005
      C. hydrogenoformans: identified presence or absence of homologs in all other completely sequenced genomes
      Identified many hypothetical proteins that had the same profile as other sporulation proteins
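A minimal sketch of the phylogenetic profiling idea, assuming the homolog search has already been done. The gene names, genome labels, and the `homologs` table are hypothetical, and the grouping uses exact profile matches, which is a simplification.

```python
from collections import defaultdict

# Hypothetical input: gene -> set of genomes that contain a homolog.
homologs = {
    "spoA": {"g1", "g2", "g4"},
    "spoB": {"g1", "g2", "g4"},
    "hypo1": {"g1", "g2", "g4"},
    "hypo2": {"g3"},
}
genomes = ["g1", "g2", "g3", "g4"]


def profile(gene):
    """Binary presence/absence vector of a gene across all genomes."""
    return tuple(int(g in homologs[gene]) for g in genomes)


def group_by_profile(genes):
    """Group genes that share exactly the same phylogenetic profile."""
    groups = defaultdict(list)
    for gene in genes:
        groups[profile(gene)].append(gene)
    return dict(groups)


groups = group_by_profile(sorted(homologs))
for prof, members in groups.items():
    print(prof, members)
```

Here the hypothetical protein `hypo1` lands in the same group as the two made-up sporulation genes, which is the kind of signal the slide describes.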
    7. Community Profiling
      KEGG
      COG
      DeLong, et al., Science, 2006
    8. Community Profiling
      Look across multiple metagenomic samples
      Gene families that have similar profiles may have similar function
      Similar to using co-expression to identify similar functioning genes
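The comparison of profiles across samples could be sketched as follows. The talk does not say which similarity measure was used, so Pearson correlation is an assumption here, and the family names and counts are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length count profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-variation signal
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counts of three gene families in four metagenomic samples.
counts = {
    "PF_A": [10, 20, 30, 40],
    "PF_B": [1, 2, 3, 4],
    "PF_C": [40, 30, 20, 10],
}

r_ab = pearson(counts["PF_A"], counts["PF_B"])  # profiles rise together
r_ac = pearson(counts["PF_A"], counts["PF_C"])  # profiles move in opposite directions
print(r_ab, r_ac)
```

Families like `PF_A` and `PF_B`, whose abundance rises and falls together across samples, would be the candidates for shared function.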
    9. So what have I done?
      "all metagenomics peptides" from CAMERA
      43M sequences (mostly GOS)
      Searched against 11,000 Pfams using HMMER 3
      Used “cluster” to group genes and samples
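As a rough illustration of the step between the HMMER 3 search and the clustering, the sketch below builds a Pfam-by-sample count matrix. It assumes the search hits have already been reduced to (sample, Pfam) pairs; the sample names, family names, and the `count_matrix` helper are hypothetical.

```python
from collections import Counter

# Hypothetical hits: one (sample, pfam) pair per sequence-to-Pfam match.
hits = [
    ("sampleA", "PF_X"),
    ("sampleA", "PF_X"),
    ("sampleA", "PF_Y"),
    ("sampleB", "PF_X"),
]


def count_matrix(hits):
    """Return the sample order and a dict of pfam -> counts per sample."""
    counts = Counter(hits)
    samples = sorted({sample for sample, _ in hits})
    pfams = sorted({pfam for _, pfam in hits})
    matrix = {p: [counts[(s, p)] for s in samples] for p in pfams}
    return samples, matrix


samples, matrix = count_matrix(hits)
print(samples)
print(matrix)
```

Each row of `matrix` is the profile of one Pfam across samples, which is the input a clustering tool would group.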
    10. Results
      Heatmap of Pfams across metagenomic samples
      Red = above avg. number of Pfams
      Green = below avg. number of Pfams
      Have not normalized for number of sequences per sample or for number of Pfams
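The slide notes that the counts were not normalized. One possible normalization, shown only as an assumption and not as what was done in the talk, is to divide by the number of sequences in each sample and then center each Pfam on its own mean, so that positive values correspond to "red" and negative values to "green". All names and numbers are hypothetical.

```python
def normalize(matrix, seqs_per_sample):
    """Convert raw counts to per-sample fractions, then center each Pfam on its mean."""
    centered = {}
    for pfam, row in matrix.items():
        fractions = [count / total for count, total in zip(row, seqs_per_sample)]
        mean = sum(fractions) / len(fractions)
        centered[pfam] = [f - mean for f in fractions]
    return centered


# Hypothetical raw counts for two Pfams in two samples of very different size.
matrix = {"PF_X": [2, 10], "PF_Y": [4, 4]}
seqs_per_sample = [10, 100]

centered = normalize(matrix, seqs_per_sample)
for pfam, row in centered.items():
    print(pfam, ["above avg" if v > 0 else "below avg" for v in row])
```

In this toy case `PF_X` has the larger raw count in the second sample but is relatively more abundant in the first, which is the distortion normalization is meant to remove.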
    11. Example of phage Pfams clustering together
    12. Measuring functional relatedness
      Need to measure community profiling performance
      The hierarchical clusters were broken into 575 groups using a correlation cutoff of 0.90 or above.
      Pfams were mapped to GO terms using pfam2GO
      1893 Pfams had no associated GO term
      695 of these were Domains of Unknown Function (DUFs)
      3377 Pfams had one or more associated GO terms and could be used for further analysis
      Only 67 (of 575) clusters contained 4 or more Pfams with at least one GO term
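The mapping and filtering steps above might look roughly like this. The line layout fed to `parse_pfam2go` is an assumption about the mapping file, the accessions and GO IDs are made up, and both helper functions are hypothetical.

```python
import re
from collections import defaultdict


def parse_pfam2go(lines):
    """Map each Pfam accession to its set of GO IDs (line layout is assumed)."""
    mapping = defaultdict(set)
    for line in lines:
        if line.startswith("!"):  # skip header/comment lines
            continue
        pfam = re.search(r"PF\d{5}", line)
        go = re.search(r"GO:\d{7}", line)
        if pfam and go:
            mapping[pfam.group()].add(go.group())
    return mapping


def annotated_clusters(clusters, mapping, min_annotated=4):
    """Keep clusters with at least min_annotated Pfams that have a GO term."""
    kept = {}
    for cluster_id, pfams in clusters.items():
        annotated = [p for p in pfams if mapping.get(p)]
        if len(annotated) >= min_annotated:
            kept[cluster_id] = annotated
    return kept


# Made-up mapping lines and clusters, for illustration only.
lines = [
    "! illustrative header line",
    "Pfam:PF99991 famA > GO:term one ; GO:9900001",
    "Pfam:PF99992 famB > GO:term two ; GO:9900002",
    "Pfam:PF99993 famC > GO:term three ; GO:9900003",
    "Pfam:PF99994 famD > GO:term four ; GO:9900004",
]
clusters = {
    "c1": ["PF99991", "PF99992", "PF99993", "PF99994", "PF99995"],
    "c2": ["PF99991", "PF99995"],
}

mapping = parse_pfam2go(lines)
kept = annotated_clusters(clusters, mapping)
print(kept)
```

Only `c1` survives the filter, since `c2` has a single annotated Pfam; `PF99995` plays the role of a family with no GO term.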
    13. Measuring GO similarity
      G-SESAME
      Measures the semantic similarity of any two GO terms
      Not downloadable so queries had to be made to their web server (not fun)
      Pairwise similarity was measured for each pair of GO terms in each cluster
      Had to check that both terms were in the same namespace
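A sketch of the per-cluster scoring described above. Because G-SESAME was only available through its web server, the `similarity` argument is a stand-in: here it is a made-up lookup table, and the term labels, namespaces, and scores are all hypothetical.

```python
from itertools import combinations


def cluster_score(terms, namespace, similarity):
    """Average pairwise similarity over term pairs that share a namespace."""
    scores = [
        similarity(a, b)
        for a, b in combinations(sorted(terms), 2)
        if namespace[a] == namespace[b]  # skip pairs from different namespaces
    ]
    return sum(scores) / len(scores) if scores else None


# Hypothetical terms, namespaces, and similarity values.
namespace = {
    "GO:A": "molecular_function",
    "GO:B": "molecular_function",
    "GO:C": "biological_process",
    "GO:D": "biological_process",
}
toy_table = {("GO:A", "GO:B"): 0.8, ("GO:C", "GO:D"): 0.4}


def toy_similarity(a, b):
    """Stand-in for a semantic similarity query; returns a made-up score."""
    return toy_table[(a, b)]


score = cluster_score(["GO:A", "GO:B", "GO:C", "GO:D"], namespace, toy_similarity)
print(score)
```

Of the six possible pairs, only the two same-namespace pairs are scored, and the cluster score is their mean.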
    14. Results
      Average G-SESAME scores for each cluster
      The average of all cluster averages was 0.484
      10 clusters had a score of 0.60 or greater.
      The data were then randomized by assigning the same GO terms to different random clusters, which gave an average score of 0.412-0.420 over 4 iterations
      Each of the 4 iterations had only 1 or 0 clusters with a score of 0.60 or greater
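The randomization control could be sketched like this. It assumes cluster sizes are kept fixed while members are reassigned at random, which the slide does not state explicitly; `toy_score` is a made-up stand-in for the G-SESAME cluster average, and the labels are hypothetical.

```python
import random
from itertools import combinations


def shuffle_clusters(clusters, rng):
    """Reassign the same items to random clusters, preserving cluster sizes."""
    pool = [item for members in clusters.values() for item in members]
    rng.shuffle(pool)
    shuffled, start = {}, 0
    for cluster_id, members in clusters.items():
        shuffled[cluster_id] = pool[start:start + len(members)]
        start += len(members)
    return shuffled


def mean_cluster_score(clusters, score):
    """Average of the per-cluster scores."""
    values = [score(members) for members in clusters.values()]
    return sum(values) / len(values)


def toy_score(members):
    """Stand-in score: fraction of pairs whose labels share a first letter."""
    pairs = list(combinations(members, 2))
    return sum(1 for a, b in pairs if a[0] == b[0]) / len(pairs)


# Hypothetical clusters in which every member shares a "function" letter.
clusters = {"c1": ["a1", "a2", "a3"], "c2": ["b1", "b2", "b3"]}

observed = mean_cluster_score(clusters, toy_score)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
random_scores = [
    mean_cluster_score(shuffle_clusters(clusters, rng), toy_score)
    for _ in range(4)  # 4 iterations, as on the slide
]
print(observed, random_scores)
```

Comparing `observed` against the spread of `random_scores` mirrors the comparison between the real cluster average and the randomized averages.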
    15. Community Profiling Results
      • Average of all clusters = 0.49
      • 10 clusters are > 0.60
    16. Random Results
      • Average of all clusters (4 iterations) = 0.41 - 0.42
      • 1 or 0 clusters are > 0.60
    17. BioTorrents
    18. BitTorrent
      A peer-to-peer file sharing protocol
      ~ 27-55% of all Internet traffic
      Mostly illegal file sharing
      Files are shared in small pieces between several users
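To illustrate the "small pieces" idea: in the original BitTorrent protocol a file is split into fixed-size pieces and each piece is checked against a SHA-1 hash listed in the torrent file. The sketch below shows only that splitting and checking; the helper names and the tiny piece length are illustrative assumptions, and real torrents use far larger pieces.

```python
import hashlib


def split_into_pieces(data: bytes, piece_length: int):
    """Cut data into consecutive pieces; the last piece may be shorter."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(pieces):
    """SHA-1 digest of each piece, as a downloader would expect to see them."""
    return [hashlib.sha1(piece).hexdigest() for piece in pieces]


def verify_piece(piece: bytes, expected_hash: str) -> bool:
    """Check a piece received from any peer against its published hash."""
    return hashlib.sha1(piece).hexdigest() == expected_hash


data = bytes(range(10))              # stand-in for a large dataset
pieces = split_into_pieces(data, 4)  # tiny piece length for illustration
hashes = piece_hashes(pieces)

# Pieces can arrive from different peers in any order and still be verified.
print([verify_piece(p, h) for p, h in zip(pieces, hashes)])
```

Because every piece is verified independently, a downloader can fetch different pieces from several users at once and reject any piece that fails its check.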
    19. Torrents for Biology
      Why use torrent technology?
      Download large datasets much faster
      Searchable central listing
      Decentralization of data
    20. What is BioTorrents?
      A legal file sharing website for scientists
      Users can upload their own research results, data, software
      Users can browse or search through all datasets
      Data is not hosted on BioTorrents
    21. www.biotorrents.net
    22. Browse & Search
    23. Details
    24. Sign Up
    25. Upload
    26. Other Features
      Forum
      RSS Feed
      Top 10
      FAQ
      Links
    27. Who will upload data?
      Everyone!
      Realistically,
      Large organizations (e.g. NCBI, CAMERA, etc.)
      May need some convincing to host their data via torrents in addition to FTP, HTTP, etc.
      Scientists that really support open science
      Sharing data before it is formally complete and published
    28. Technical Challenges
      Many institutions frown on BitTorrent technology
      A port must be opened/forwarded
      Client program and computer must be left running
      Ensuring data is legal, virus free, etc.
      Users that upload many legitimate torrents will provide more confidence to people downloading
      Making downloading and uploading easy
    CC Attribution-NonCommercial-ShareAlike License