Fungal ITS meeting presentation

My talk from the Fungal ITS meeting in Boulder, Colorado (sponsored by the Sloan Foundation). I discuss metagenomic tools for fungal studies and how we can increase support for fungal researchers within the computational pipelines being developed at UC Davis.

Published in: Education

  1. 1. Metagenomic tools for the fungal community. Holly Bik, UC Davis. 19 October 2012
  2. 2. http://phylosift.wordpress.com
  3. 3. Explicitly Phylogenetic Approaches: aligned environmental sequences + guide tree; evolutionary placement of short reads
  4. 4. We provide:
     • Support for paired-end (raw) Illumina data
     • Marker gene data for Bacteria, Archaea, Eukaryotes, Viruses
     • Taxonomy assignments based on probability distributions over a reference phylogeny
     • Complement to existing tools – QIIME/VAMPS
       – Inputs/outputs will be compatible for use with other software tools
  5. 5. Markers
     • PMPROK – Dongying Wu's Bac/Arch markers
     • Eukaryotic Orthologs – Parfrey 2011 paper
     • 16S/18S rRNA
     • Mitochondria - protein-coding genes
     • Viral Markers – Markov clustering on genomes
     • Codon Subtrees – finer scale taxonomy
     • Extended Markers – plastids, gene families
  6. 6. Reference Marker Genes
  7. 7. The Monkey – Build Marker Packages
     Inputs:
     • Mapping File (sequence name, NCBI taxon ID)
     • PD cutoff
     • Alignment File (marker sequences in FASTA format)
     Execute build_marker mode:
     • hmmbuild (ssu-build): Create profile HMMs (or CMs for rRNA data) using input sequences
     • Generate unique IDs for input sequences
     • FastTree: Build tree and collapse topology according to a user-specified PD cutoff (e.g. 99%)
     • Tree Reconciliation: Reconcile NCBI taxonomy IDs with phylogenetic topology. Quantitative metric (minimum Hamming distance) used to match edges between NCBI taxon tree and molecular phylogeny
     • Built Marker Packages: Clean and package new marker genes. New marker gene packages placed into shared PhyloSift marker directory
     • Built PhyloSift Marker package: tree, HMM profile (CMs for rRNA), representative sequences, taxon map, alignment
     Execute index mode:
     • Index Marker Database: Indexes the marker databases needed for LAST and Bowtie
     • Locally indexed marker packages will not interfere with automatic updates to PhyloSift core markers
     NOTE: New marker packages are named according to input file names (e.g. MarkerAlignment.fasta). Core marker data will be overwritten during new marker builds if input files do not have unique names compared to existing PhyloSift markers.
  8. 8. The Kangaroo – Simulation Data
     Input: Genome Directory
     • Define the number of genomes to pick (default = 10) and number of reads to generate per file (default = 100,000)
     Execute sim mode:
     • PD on concatenated tree: Determines PD contributions for taxa present in concatenated guide tree in PhyloSift marker directory
     • Select Taxa: Two separate approaches used:
       1. Select some number of taxa that contribute to PD (user input, default = 10 taxa)
       2. Sample taxa uniformly without replacement
     • Compute metrics between target and remaining taxa: Calculated metrics include the distance to nearest neighbors, connecting branch lengths, and the number of sampled nodes within various PD units of connecting nodes
     • Knockout Swaths of Taxa: Workflow plugs into updateDB to remove genomes which have been used to simulate metagenome data, as well as a swath of related taxa
     • Generated Simulated Reads: Grinder algorithm randomly generates reads from selected genomes, outputs simulated PE-Illumina and 454 datasets
     • Simulation Marker Directory: A new marker directory is created, where simulated genomes have been knocked out from marker packages
  9. 9. DBupdate – Mining new genomes
     Inputs: EBI Genomes, NCBI Genomes, JGI Genomes, Private Genomes
     Execute phylosift_dbupdate.pl:
     • Run PhyloSift (search + align): Add new sequences to marker packages
     • Infer Updated Tree: Amino Acid Tree, Nucleotide Tree
     • Prune Tree: A taxa set is selected with a max PD cutoff of 0.02 and a new tree is inferred
     • Codon Subtrees: PD metric used to split guide tree into smaller subtrees; subsets of taxa are selected such that no branch connecting them has length > 0.X for some value of X
     • Update reference sequences with new data: New sequences added at 0.25 PD for amino acid tree; higher PD threshold enables more aggressive searches of reference database, since LAST searching is faster with fewer sequences
     • Tree Reconciliation: Reconcile NCBI taxonomy IDs with phylogenetic topologies, for both amino acid tree and codon subtrees
     • Package Markers
     • Automated Download to PhyloSift Users: Users' local marker databases are automatically scanned each time PhyloSift is run and any new updates are automatically downloaded if available
  10. 10. Tree Reconciliation in PhyloSift: Environmental Sequences, Named Taxa
  11. 11. Great! Not Bad. Getting Tricky…
  12. 12. Tree Placement: Fat Tree - Guppy
  13. 13. Marine Metagenome: chemoautotrophic bacteria – oxidize ammonia into nitrite; alveolate protists; common seawater Archaea
  14. 14. Tree Placement: Tog Tree - Guppy
  15. 15. Marine Metagenome
  16. 16. Marine Metagenome. Tree Placement: Sing Tree - Guppy
  17. 17. Linking with the Fungal ITS community
     • How does fungal ITS sequence data relate to your project?
       – PhyloSift has the capability to add any marker gene reference packages that are relevant for specific taxonomic communities
     • What fungal ITS data does your project currently provide?
       – None – but we do mine other marker genes from fungal genomes
     • What fungal ITS data is your project hoping to provide?
       – We wouldn't provide data, but can work with users to increase support for fungal analyses
  18. 18. Linking with the Fungal ITS community
     • Is your project involved with curating fungal ITS sequences?
       – No, but we would curate alignments and marker packages of ITS sequences mined from public databases
     • If so, what curation strategies are being implemented for your project?
       – Alignment filtering and masking, pruning reference trees
     • What tools for working with fungal ITS sequences does your project currently provide?
       – None so far – but can be implemented if given a reference dataset (e.g. alignment)
  19. 19. Linking with the Fungal ITS community
     • What tools are you developing / planning to develop?
       – Current focus is on multisample comparisons
       – Gene tree reconciliation
       – Probability distribution over tree topology to delimit OTUs (Phylogenetic OTUs)
     • What framework of fungal taxonomy does your project use?
       – NCBI-derived taxonomy (because of tree mapping/reconciliation issues)
  20. 20. Satellite meeting: Eukaryotic Metagenomics, March/April 2013, UC Davis
  21. 21. Acknowledgements
     UC Davis:
     • Jonathan Eisen
     • Aaron Darling
     • Guillaume Jospin
     • Dongying Wu
     • David Coil
     PhyloSift Software Development on Github: https://github.com/gjospin/PhyloSift
     Google Group for user support: https://groups.google.com/d/forum/phylosift
     Twitter: @PhyloSift
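
Slide 7 mentions that tree reconciliation uses a minimum Hamming distance to match edges between the NCBI taxon tree and the molecular phylogeny. The slides do not show how this is implemented, so the following is only a minimal sketch of the idea, not PhyloSift's code. It assumes each edge can be represented by the set of sequences below it (a rooted-tree simplification), so that the Hamming distance between two edges is the size of the symmetric difference of their leaf sets. The function names, taxon labels, and edge IDs are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple


def hamming(a: FrozenSet[str], b: FrozenSet[str]) -> int:
    """Hamming distance between two leaf sets viewed as 0/1 membership vectors."""
    return len(a ^ b)


def match_edges(
    taxonomy_edges: Dict[str, FrozenSet[str]],
    phylogeny_edges: Dict[str, FrozenSet[str]],
) -> Dict[str, Tuple[str, int]]:
    """Map each taxonomy edge to the phylogeny edge with the smallest distance.

    Ties are broken by edge ID so the result is deterministic.
    """
    matches = {}
    for taxon, tax_leaves in taxonomy_edges.items():
        best = min(
            phylogeny_edges,
            key=lambda e: (hamming(tax_leaves, phylogeny_edges[e]), e),
        )
        matches[taxon] = (best, hamming(tax_leaves, phylogeny_edges[best]))
    return matches


# Hypothetical toy data: sequences s1..s5, two named taxa, three tree edges.
taxonomy_edges = {
    "Fungi": frozenset({"s1", "s2", "s3", "s4"}),
    "Ascomycota": frozenset({"s1", "s2"}),
}
phylogeny_edges = {
    "e1": frozenset({"s1", "s2"}),
    "e2": frozenset({"s1", "s2", "s3"}),  # s4 placed elsewhere in the gene tree
    "e3": frozenset({"s4", "s5"}),
}

matches = match_edges(taxonomy_edges, phylogeny_edges)
for taxon, (edge, dist) in matches.items():
    print(f"{taxon} -> {edge} (distance {dist})")
```

In this toy case "Ascomycota" matches an edge exactly, while "Fungi" matches an edge at distance 1 because one sequence sits outside the clade in the gene tree, which is the kind of imperfect mapping that makes reconciliation tricky.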
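
Slide 8 lists two ways of choosing genomes for simulation: selecting taxa that contribute to phylogenetic diversity (PD), and sampling taxa uniformly without replacement. The slides do not say how PD contributions are turned into a selection, so the sketch below assumes a simple greedy strategy that repeatedly adds the taxon that increases PD the most. The toy tree, its branch lengths, and all function names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical guide tree: node -> (parent, branch length to parent).
TREE: Dict[str, Tuple[Optional[str], float]] = {
    "root": (None, 0.0),
    "A": ("root", 0.1),
    "B": ("root", 0.3),
    "t1": ("A", 0.2),
    "t2": ("A", 0.05),
    "t3": ("B", 0.4),
    "t4": ("B", 0.1),
}
TAXA = ["t1", "t2", "t3", "t4"]


def path_to_root(tree, leaf: str) -> Set[str]:
    """Nodes whose parent edge lies on the path from the leaf to the root."""
    nodes = set()
    node = leaf
    while tree[node][0] is not None:
        nodes.add(node)
        node = tree[node][0]
    return nodes


def phylogenetic_diversity(tree, taxa: List[str]) -> float:
    """Total branch length of the subtree connecting the taxa to the root."""
    covered: Set[str] = set()
    for taxon in taxa:
        covered |= path_to_root(tree, taxon)
    return sum(tree[node][1] for node in covered)


def select_by_pd(tree, taxa: List[str], k: int = 10) -> List[str]:
    """Greedy assumption: add the taxon giving the largest PD at each step."""
    chosen: List[str] = []
    remaining = list(taxa)
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda t: phylogenetic_diversity(tree, chosen + [t]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def sample_uniform(taxa: List[str], k: int = 10, seed: Optional[int] = None) -> List[str]:
    """Sample up to k taxa uniformly without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(taxa), min(k, len(taxa)))


pd_choice = select_by_pd(TREE, TAXA, k=2)
uniform_choice = sample_uniform(TAXA, k=2, seed=0)
print("PD-based:", pd_choice)
print("Uniform:", uniform_choice)
```

The default of 10 taxa mirrors the default on the slide; the PD-based picks favor long, distinct branches, whereas uniform sampling ignores tree shape entirely.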
