SlideShare a Scribd company logo
unknown genes, Community Profiling,& Biotorrents.net Morgan Langille  UC Davis
Genes with unknown function
Questions If we wanted to start studying a gene of unknown function, which one(s) should we study first? How many un-annotated genes could be annotated? What proportion of unknown genes (hypothetical proteins) are probably not real proteins (i.e. pseudo-genes, mis-predicted orfs, etc.) ? What proportion of unknown gene families are probably phage-related? Can some of these families (hopefully the top ranking ones) be characterized using non-similarity based bioinformatic approaches?
Outline of project
Community Profiling
Phylogenetic profiling Wu, et al., PLOS Genetics, 2005 C. hydrogenoformansidentified presence or absence of homologs in all other completely sequence genomes Identified many hypothetical proteins that had the same profile as other sporulation proteins
Community Profiling KEGG COG Delong, et al., Science, 2006
Community Profiling Look across multiple metagenomic samples Gene families that have similar profiles may have similar function Similar to using co-expression to identify similar functioning genes
So what have I done?	 "all metagenomics peptides" from CAMERA  43M sequences (mostly GOS) Searched against 11,000 Pfams using HMMER 3 Used “cluster” to group genes and samples
Results Metagenomic Samples Red = above avg. number of pfams Green = below avg. number of pfams Have not normalized Number of sequences per sample For number of pfams Pfams
Example of phage Pfams clustering together
Measuring functional relatedness  Need to measure community profiling performance The hierarchal clusters were broken into 575 groups using a correlation cutoff of 0.90 or above.  PFams were mapped to GO terms using pfam2GO 1893 PFams had no associated GO term  695 of these were Domains of Unknown Function:DUFs 3377 PFams had one or more associated GO terms and could be used for further analysis  Only 67 (of 575) clusters contained 4 or more PFams with at least one GO term
Measuring GO similarity G-SESAME  Measures the semantic similarity of any two GO terms Not downloadable so queries had to be made to their web server (not fun) Pair-wise similarity was measure for each pair of GO terms in each cluster  had to check if terms were in same namespace
Results Average G-Sesame scores for each cluster The average of all cluster averages was 0.484  10 clusters had a score of 0.60 or greater.  The data was then randomized by using the same GO terms but in different random clusters and a score of 0.412-0.420 over 4 iterations  Each of the 4 iterations had only 1 or 0 clusters with a score of 0.60 or greater
Community Profiling Results ,[object Object]
 10 clusters are > 0.60,[object Object]
  1 or 0 clusters are > 0.60,[object Object]
Bittorrent A peer-to-peer file sharing protocol ~ 27-55% of all Internet traffic Mostly illegal file sharing Files are shared in small     pieces between several     users
Torrents for Biology Why use torrent technology? Download large datasets much faster Searchable central listing Decentralization of data
What is BioTorrents? A legal file sharing website for scientists Users can upload their own research results, data, software Users can browse or search through all datasets Data is not hosted on BioTorrents
www.biotorrents.net
Browse & Search
Details
Sign Up
Upload
Other Features Forum RSS Feed Top 10 FAQ Links
Who will upload data? Everyone!  Realistically, Large organizations (e.g. NCBI, CAMERA, etc.)  May need some convincing to host their data via torrents in addition to FTP, HTTP, etc.  Scientists that really support open science  Sharing data before formally complete and published
Technical Challenges  Many institutions frown on BitTorrent technology A port must be opened/forwarded Client program and computer must be left running Ensuring data is legal, virus free, etc. Users that upload many legitimate torrents will provide more confidence to people downloading Making downloading and uploading easy

More Related Content

What's hot

Repeatable plant pathology bioinformatic analysis: Not everything is NGS data
Repeatable plant pathology bioinformatic analysis: Not everything is NGS dataRepeatable plant pathology bioinformatic analysis: Not everything is NGS data
Repeatable plant pathology bioinformatic analysis: Not everything is NGS dataLeighton Pritchard
 
Fairport domain specific metadata using w3 c dcat & skos w ontology views
Fairport domain specific metadata using w3 c dcat & skos w ontology viewsFairport domain specific metadata using w3 c dcat & skos w ontology views
Fairport domain specific metadata using w3 c dcat & skos w ontology viewsTim Clark
 
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Alejandra Gonzalez-Beltran
 
150219 agbt giab_poster_marc
150219 agbt giab_poster_marc150219 agbt giab_poster_marc
150219 agbt giab_poster_marcGenomeInABottle
 
Metabolic Network Analysis
Metabolic Network AnalysisMetabolic Network Analysis
Metabolic Network AnalysisMas Kot
 
Full text
Full textFull text
Full textbutest
 
TGAC Browser bosc 2014
TGAC Browser bosc 2014TGAC Browser bosc 2014
TGAC Browser bosc 2014Anil Thanki
 
Ondex: Data integration and visualisation
Ondex: Data integration and visualisationOndex: Data integration and visualisation
Ondex: Data integration and visualisationBiogeeks
 
Ontologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyOntologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyMelanie Courtot
 
Gene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialGene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialDmitry Grapov
 
UniProt-GOA
UniProt-GOAUniProt-GOA
UniProt-GOAEBI
 
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...Enrico Glaab
 
140127 Performance Metrics WG
140127 Performance Metrics WG140127 Performance Metrics WG
140127 Performance Metrics WGGenomeInABottle
 
Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriMonica Munoz-Torres
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchEuropean Bioinformatics Institute
 
Gene Ontology Project
Gene Ontology ProjectGene Ontology Project
Gene Ontology Projectvaibhavdeoda
 

What's hot (20)

Repeatable plant pathology bioinformatic analysis: Not everything is NGS data
Repeatable plant pathology bioinformatic analysis: Not everything is NGS dataRepeatable plant pathology bioinformatic analysis: Not everything is NGS data
Repeatable plant pathology bioinformatic analysis: Not everything is NGS data
 
Fairport domain specific metadata using w3 c dcat & skos w ontology views
Fairport domain specific metadata using w3 c dcat & skos w ontology viewsFairport domain specific metadata using w3 c dcat & skos w ontology views
Fairport domain specific metadata using w3 c dcat & skos w ontology views
 
The Chemtools LaBLog
The Chemtools LaBLogThe Chemtools LaBLog
The Chemtools LaBLog
 
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
 
Overview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data AnalysisOverview of Next Gen Sequencing Data Analysis
Overview of Next Gen Sequencing Data Analysis
 
150219 agbt giab_poster_marc
150219 agbt giab_poster_marc150219 agbt giab_poster_marc
150219 agbt giab_poster_marc
 
Metabolic Network Analysis
Metabolic Network AnalysisMetabolic Network Analysis
Metabolic Network Analysis
 
Full text
Full textFull text
Full text
 
TGAC Browser bosc 2014
TGAC Browser bosc 2014TGAC Browser bosc 2014
TGAC Browser bosc 2014
 
Ondex: Data integration and visualisation
Ondex: Data integration and visualisationOndex: Data integration and visualisation
Ondex: Data integration and visualisation
 
Ontologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyOntologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontology
 
Sequence assembly
Sequence assemblySequence assembly
Sequence assembly
 
Gene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -TutorialGene Ontology Enrichment Network Analysis -Tutorial
Gene Ontology Enrichment Network Analysis -Tutorial
 
MicrobeDB Overview
MicrobeDB OverviewMicrobeDB Overview
MicrobeDB Overview
 
UniProt-GOA
UniProt-GOAUniProt-GOA
UniProt-GOA
 
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
EnrichNet: Graph-based statistic and web-application for gene/protein set enr...
 
140127 Performance Metrics WG
140127 Performance Metrics WG140127 Performance Metrics WG
140127 Performance Metrics WG
 
Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citri
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven Research
 
Gene Ontology Project
Gene Ontology ProjectGene Ontology Project
Gene Ontology Project
 

Viewers also liked

司馬光 對話方塊
司馬光 對話方塊司馬光 對話方塊
司馬光 對話方塊honan4108
 
International Group Work For Sustainable Development
International Group Work For Sustainable DevelopmentInternational Group Work For Sustainable Development
International Group Work For Sustainable DevelopmentKatherine Haxton
 
Infolit day 24_may2016
Infolit day 24_may2016Infolit day 24_may2016
Infolit day 24_may2016HELIGLIASA
 
Comunicado de la oficina del coordinador residente de naciones unidas
Comunicado de la oficina del coordinador residente de naciones unidasComunicado de la oficina del coordinador residente de naciones unidas
Comunicado de la oficina del coordinador residente de naciones unidasCasa de la Mujer
 
Quarterly Technology Briefing, Manchester, UK September 2013
Quarterly Technology Briefing, Manchester, UK September 2013Quarterly Technology Briefing, Manchester, UK September 2013
Quarterly Technology Briefing, Manchester, UK September 2013Thoughtworks
 
Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis ProposalParallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis ProposalGianmario Spacagna
 
和菓子ここだけの話
和菓子ここだけの話和菓子ここだけの話
和菓子ここだけの話stucon
 
The Latest SEO Statistics for SEOs, Tweeted at SMX West 2013
The Latest SEO Statistics for SEOs, Tweeted at SMX West 2013The Latest SEO Statistics for SEOs, Tweeted at SMX West 2013
The Latest SEO Statistics for SEOs, Tweeted at SMX West 2013seoinhouse
 
User experience for drupal
User experience for drupalUser experience for drupal
User experience for drupalAnne Stefanyk
 
How High Tech CEOs Can Increase Sales and Marketing Effectiveness and Reduce ...
How High Tech CEOs Can Increase Sales and Marketing Effectiveness and Reduce ...How High Tech CEOs Can Increase Sales and Marketing Effectiveness and Reduce ...
How High Tech CEOs Can Increase Sales and Marketing Effectiveness and Reduce ...Paul R. DiModica
 

Viewers also liked (19)

司馬光 對話方塊
司馬光 對話方塊司馬光 對話方塊
司馬光 對話方塊
 
Final presentation 2012
Final presentation 2012Final presentation 2012
Final presentation 2012
 
International Group Work For Sustainable Development
International Group Work For Sustainable DevelopmentInternational Group Work For Sustainable Development
International Group Work For Sustainable Development
 
Infolit day 24_may2016
Infolit day 24_may2016Infolit day 24_may2016
Infolit day 24_may2016
 
Comunicado de la oficina del coordinador residente de naciones unidas
Comunicado de la oficina del coordinador residente de naciones unidasComunicado de la oficina del coordinador residente de naciones unidas
Comunicado de la oficina del coordinador residente de naciones unidas
 
Quarterly Technology Briefing, Manchester, UK September 2013
Quarterly Technology Briefing, Manchester, UK September 2013Quarterly Technology Briefing, Manchester, UK September 2013
Quarterly Technology Briefing, Manchester, UK September 2013
 
Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis ProposalParallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
 
Evolver Architects
Evolver ArchitectsEvolver Architects
Evolver Architects
 
Squaw Lake
Squaw LakeSquaw Lake
Squaw Lake
 
Shimla Kullu Manali Dalhousie
Shimla Kullu Manali DalhousieShimla Kullu Manali Dalhousie
Shimla Kullu Manali Dalhousie
 
和菓子ここだけの話
和菓子ここだけの話和菓子ここだけの話
和菓子ここだけの話
 
Spring3.1 aop-mvc
Spring3.1 aop-mvcSpring3.1 aop-mvc
Spring3.1 aop-mvc
 
The Latest SEO Statistics for SEOs, Tweeted at SMX West 2013
The Latest SEO Statistics for SEOs, Tweeted at SMX West 2013The Latest SEO Statistics for SEOs, Tweeted at SMX West 2013
The Latest SEO Statistics for SEOs, Tweeted at SMX West 2013
 
¿Hablamos de futuro?
¿Hablamos de futuro?¿Hablamos de futuro?
¿Hablamos de futuro?
 
User experience for drupal
User experience for drupalUser experience for drupal
User experience for drupal
 
How High Tech CEOs Can Increase Sales and Marketing Effectiveness and Reduce ...
How High Tech CEOs Can Increase Sales and Marketing Effectiveness and Reduce ...How High Tech CEOs Can Increase Sales and Marketing Effectiveness and Reduce ...
How High Tech CEOs Can Increase Sales and Marketing Effectiveness and Reduce ...
 
Chuong 1 new
Chuong 1 newChuong 1 new
Chuong 1 new
 
Fall Simmer Pot Recipes
Fall Simmer Pot RecipesFall Simmer Pot Recipes
Fall Simmer Pot Recipes
 
NFS: para la gestion de espacios de trabajo
NFS: para la gestion de espacios de trabajoNFS: para la gestion de espacios de trabajo
NFS: para la gestion de espacios de trabajo
 

Similar to Unknown Genes, Community Profiling, & Biotorrents.net

Genome science intermine
Genome science intermineGenome science intermine
Genome science intermineELIXIR UK
 
Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Syed Lokman
 
Investigating plant systems using data integration and network analysis
Investigating plant systems using data integration and network analysisInvestigating plant systems using data integration and network analysis
Investigating plant systems using data integration and network analysisCatherine Canevet
 
BIOLINK 2008: Linking database submissions to primary citations with PubMe...
BIOLINK 2008:    Linking database submissions to primary citations with PubMe...BIOLINK 2008:    Linking database submissions to primary citations with PubMe...
BIOLINK 2008: Linking database submissions to primary citations with PubMe...Heather Piwowar
 
Plant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In SequencesPlant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In SequencesLeighton Pritchard
 
PhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenesPhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenesPhoenix Bioinformatics
 
RDA Wheat Data Interoperability Cookbook and last developments
RDA Wheat Data Interoperability Cookbook and last developmentsRDA Wheat Data Interoperability Cookbook and last developments
RDA Wheat Data Interoperability Cookbook and last developmentsCIARD Movement
 
GlyGen Warren Workshop in Boston
GlyGen Warren Workshop in BostonGlyGen Warren Workshop in Boston
GlyGen Warren Workshop in BostonGlyGen
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GenomeInABottle
 
Computing on the shoulders of giants
Computing on the shoulders of giantsComputing on the shoulders of giants
Computing on the shoulders of giantsBenjamin Good
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Webebiquity
 
provenance of microarray experiments
provenance of microarray experimentsprovenance of microarray experiments
provenance of microarray experimentsHelena Deus
 
Interpret gene expression results 2013
Interpret gene expression results 2013Interpret gene expression results 2013
Interpret gene expression results 2013Elsa von Licy
 
Build your own gene panels 2013
Build your own gene panels 2013Build your own gene panels 2013
Build your own gene panels 2013Elsa von Licy
 
The Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in BiologyThe Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in Biologyrobertstevens65
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataPhilip Cheung
 
Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02PILLAI ASWATHY VISWANATH
 
Improving VIVO search through semantic ranking.
Improving VIVO search through semantic ranking.Improving VIVO search through semantic ranking.
Improving VIVO search through semantic ranking.Deepak K
 
Zmasek TOPSAN Biohackathon 2011
Zmasek TOPSAN Biohackathon 2011Zmasek TOPSAN Biohackathon 2011
Zmasek TOPSAN Biohackathon 2011cmzmasek
 

Similar to Unknown Genes, Community Profiling, & Biotorrents.net (20)

Genome science intermine
Genome science intermineGenome science intermine
Genome science intermine
 
Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Basic BLAST (BLASTn)
Basic BLAST (BLASTn)
 
Investigating plant systems using data integration and network analysis
Investigating plant systems using data integration and network analysisInvestigating plant systems using data integration and network analysis
Investigating plant systems using data integration and network analysis
 
BIOLINK 2008: Linking database submissions to primary citations with PubMe...
BIOLINK 2008:    Linking database submissions to primary citations with PubMe...BIOLINK 2008:    Linking database submissions to primary citations with PubMe...
BIOLINK 2008: Linking database submissions to primary citations with PubMe...
 
Plant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In SequencesPlant Pathogen Genome Data: My Life In Sequences
Plant Pathogen Genome Data: My Life In Sequences
 
PhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenesPhoenixBio 2020 Stanford Workshop on PhyloGenes
PhoenixBio 2020 Stanford Workshop on PhyloGenes
 
RDA Wheat Data Interoperability Cookbook and last developments
RDA Wheat Data Interoperability Cookbook and last developmentsRDA Wheat Data Interoperability Cookbook and last developments
RDA Wheat Data Interoperability Cookbook and last developments
 
GlyGen Warren Workshop in Boston
GlyGen Warren Workshop in BostonGlyGen Warren Workshop in Boston
GlyGen Warren Workshop in Boston
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005
 
Computing on the shoulders of giants
Computing on the shoulders of giantsComputing on the shoulders of giants
Computing on the shoulders of giants
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Web
 
provenance of microarray experiments
provenance of microarray experimentsprovenance of microarray experiments
provenance of microarray experiments
 
Text and data integration
Text and data integrationText and data integration
Text and data integration
 
Interpret gene expression results 2013
Interpret gene expression results 2013Interpret gene expression results 2013
Interpret gene expression results 2013
 
Build your own gene panels 2013
Build your own gene panels 2013Build your own gene panels 2013
Build your own gene panels 2013
 
The Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in BiologyThe Past, Present and Future of Knowledge in Biology
The Past, Present and Future of Knowledge in Biology
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadata
 
Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02
 
Improving VIVO search through semantic ranking.
Improving VIVO search through semantic ranking.Improving VIVO search through semantic ranking.
Improving VIVO search through semantic ranking.
 
Zmasek TOPSAN Biohackathon 2011
Zmasek TOPSAN Biohackathon 2011Zmasek TOPSAN Biohackathon 2011
Zmasek TOPSAN Biohackathon 2011
 

More from Morgan Langille

GLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopGLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopMorgan Langille
 
Leveraging ancestral state reconstruction to infer community function from a ...
Leveraging ancestral state reconstruction to infer community function from a ...Leveraging ancestral state reconstruction to infer community function from a ...
Leveraging ancestral state reconstruction to infer community function from a ...Morgan Langille
 
Inferring microbial community function from taxonomic composition
Inferring microbial community function from taxonomic compositionInferring microbial community function from taxonomic composition
Inferring microbial community function from taxonomic compositionMorgan Langille
 
Characterizing Protein Families of Unknown Function
Characterizing Protein Families of Unknown FunctionCharacterizing Protein Families of Unknown Function
Characterizing Protein Families of Unknown FunctionMorgan Langille
 
HMMER 3 & Community Profiling
HMMER 3 & Community ProfilingHMMER 3 & Community Profiling
HMMER 3 & Community ProfilingMorgan Langille
 
Computational prediction and characterization of genomic islands: insights i...
Computational prediction and characterization of genomic islands: insights i...Computational prediction and characterization of genomic islands: insights i...
Computational prediction and characterization of genomic islands: insights i...Morgan Langille
 
Microbial Genomics 2008 Conference Review
Microbial Genomics 2008 Conference ReviewMicrobial Genomics 2008 Conference Review
Microbial Genomics 2008 Conference ReviewMorgan Langille
 
A graduate student's experience in bioinformatics
A graduate student's experience in bioinformaticsA graduate student's experience in bioinformatics
A graduate student's experience in bioinformaticsMorgan Langille
 

More from Morgan Langille (8)

GLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics WorkshopGLBIO/CCBC Metagenomics Workshop
GLBIO/CCBC Metagenomics Workshop
 
Leveraging ancestral state reconstruction to infer community function from a ...
Leveraging ancestral state reconstruction to infer community function from a ...Leveraging ancestral state reconstruction to infer community function from a ...
Leveraging ancestral state reconstruction to infer community function from a ...
 
Inferring microbial community function from taxonomic composition
Inferring microbial community function from taxonomic compositionInferring microbial community function from taxonomic composition
Inferring microbial community function from taxonomic composition
 
Characterizing Protein Families of Unknown Function
Characterizing Protein Families of Unknown FunctionCharacterizing Protein Families of Unknown Function
Characterizing Protein Families of Unknown Function
 
HMMER 3 & Community Profiling
HMMER 3 & Community ProfilingHMMER 3 & Community Profiling
HMMER 3 & Community Profiling
 
Computational prediction and characterization of genomic islands: insights i...
Computational prediction and characterization of genomic islands: insights i...Computational prediction and characterization of genomic islands: insights i...
Computational prediction and characterization of genomic islands: insights i...
 
Microbial Genomics 2008 Conference Review
Microbial Genomics 2008 Conference ReviewMicrobial Genomics 2008 Conference Review
Microbial Genomics 2008 Conference Review
 
A graduate student's experience in bioinformatics
A graduate student's experience in bioinformaticsA graduate student's experience in bioinformatics
A graduate student's experience in bioinformatics
 

Recently uploaded

Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backElena Simperl
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2DianaGray10
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIES VE
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Julian Hyde
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...Product School
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...Product School
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupCatarinaPereira64715
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...Sri Ambati
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...Product School
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...CzechDreamin
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Product School
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeCzechDreamin
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesThousandEyes
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...Product School
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCzechDreamin
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaRTTS
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka DoktorováCzechDreamin
 

Recently uploaded (20)

Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 

Unknown Genes, Community Profiling, & Biotorrents.net

  • 1. unknown genes, Community Profiling,& Biotorrents.net Morgan Langille UC Davis
  • 3. Questions If we wanted to start studying a gene of unknown function, which one(s) should we study first? How many un-annotated genes could be annotated? What proportion of unknown genes (hypothetical proteins) are probably not real proteins (i.e. pseudo-genes, mis-predicted orfs, etc.) ? What proportion of unknown gene families are probably phage-related? Can some of these families (hopefully the top ranking ones) be characterized using non-similarity based bioinformatic approaches?
  • 6. Phylogenetic profiling Wu, et al., PLOS Genetics, 2005 C. hydrogenoformansidentified presence or absence of homologs in all other completely sequence genomes Identified many hypothetical proteins that had the same profile as other sporulation proteins
  • 7. Community Profiling KEGG COG Delong, et al., Science, 2006
  • 8. Community Profiling Look across multiple metagenomic samples Gene families that have similar profiles may have similar function Similar to using co-expression to identify similar functioning genes
  • 9. So what have I done? "all metagenomics peptides" from CAMERA 43M sequences (mostly GOS) Searched against 11,000 Pfams using HMMER 3 Used “cluster” to group genes and samples
  • 10. Results Metagenomic Samples Red = above avg. number of pfams Green = below avg. number of pfams Have not normalized Number of sequences per sample For number of pfams Pfams
  • 11. Example of phage Pfams clustering together
  • 12. Measuring functional relatedness Need to measure community profiling performance The hierarchal clusters were broken into 575 groups using a correlation cutoff of 0.90 or above. PFams were mapped to GO terms using pfam2GO 1893 PFams had no associated GO term 695 of these were Domains of Unknown Function:DUFs 3377 PFams had one or more associated GO terms and could be used for further analysis Only 67 (of 575) clusters contained 4 or more PFams with at least one GO term
  • 13. Measuring GO similarity G-SESAME Measures the semantic similarity of any two GO terms Not downloadable so queries had to be made to their web server (not fun) Pair-wise similarity was measure for each pair of GO terms in each cluster had to check if terms were in same namespace
  • 14. Results Average G-Sesame scores for each cluster The average of all cluster averages was 0.484 10 clusters had a score of 0.60 or greater. The data was then randomized by using the same GO terms but in different random clusters and a score of 0.412-0.420 over 4 iterations Each of the 4 iterations had only 1 or 0 clusters with a score of 0.60 or greater
  • 15.
  • 16.
  • 17.
  • 18. Bittorrent A peer-to-peer file sharing protocol ~ 27-55% of all Internet traffic Mostly illegal file sharing Files are shared in small pieces between several users
  • 19. Torrents for Biology Why use torrent technology? Download large datasets much faster Searchable central listing Decentralization of data
  • 20. What is BioTorrents? A legal file sharing website for scientists Users can upload their own research results, data, software Users can browse or search through all datasets Data is not hosted on BioTorrents
  • 26. Other Features Forum RSS Feed Top 10 FAQ Links
  • 27. Who will upload data? Everyone! Realistically, Large organizations (e.g. NCBI, CAMERA, etc.) May need some convincing to host their data via torrents in addition to FTP, HTTP, etc. Scientists that really support open science Sharing data before formally complete and published
  • 28. Technical Challenges Many institutions frown on BitTorrent technology A port must be opened/forwarded Client program and computer must be left running Ensuring data is legal, virus free, etc. Users that upload many legitimate torrents will provide more confidence to people downloading Making downloading and uploading easy