#ievobio Keynote - June 26, 2013

  1. Visualizing biodiversity in the era of high-throughput sequencing. Holly Bik, UC Davis, @Dr_Bik
  2. Our ability to visualize high-throughput sequencing data is as bad as my title slide
  3. $250k, 1 year: “A Research-Driven Data Visualization Framework for High-Throughput Environmental Sequence Data”
  4. http://pitchinteractive.com @pitchinc
  5. “Pitch Interactive dissects large data sets in search of meaningful and often hidden patterns that serve to determine the shape and form that best tells a story.”
  6. Diverse marine community! EASY! EASY! EASY! VERY Difficult!!
  7. Mark Rothko, No. 14, 1960: rectangles of orange and purple with soft edges
  8. http://pippascabinet.blogspot.com/2012/11/on-true-love.html
  9. Challenge 1: Environmental data is terrible at revealing fine-scale taxonomic patterns
  10. Overarching Community Patterns! [Figure: samples plotted on axes PC1 (13.03%), PC2 (12.21%), PC3 (10.54%). Sample labels: ShallowGulf, ShallowCalif, Atlantic22#1, Atlantic25#2, Atlantic29, Atlantic43, Atlantic45, Pacific128, Pacific528, Pacific422, Pacific321, Pacific237.] Bik et al. 2012, Molecular Ecology, 21(5):1048-59
  11. [Figure: axis from 0 to 1 in steps of 0.1. Labels: Post-spill, Fungal Dominance, Nematode Dominance, Pre-spill.] Bik et al. 2012, PLoS ONE, 7(6):e38550
  12. Grand Isle, Louisiana. [Figure legend: Algae; Environmental; Fungi; Metazoa: Annelida; Metazoa: Arthropoda; Metazoa: Gastrotricha; Metazoa: Nematoda; Metazoa: Platyhelminthes; No Match; Stramenopiles; Unicellular Eukaryotes; Metazoa: Acanthocephala; Metazoa: Brachiopoda; Metazoa: Bryozoa; Metazoa: Chordata; Metazoa: Cnidaria; Metazoa: Echiura; Metazoa: Entoprocta; Metazoa: Mollusca. Annotation: Fungi.] Bik et al. 2012, PLoS ONE, 7(6):e38550
  13. Exploring Trees: Ecologically, what are these reference taxa doing??!
  14. Pertinent info for biological interpretations of DNA data!!!
  15. Challenge 2: Taxonomic, phylogenetic, and ecological knowledge is imperative for making meaningful interpretations of high-throughput sequence datasets
  16. Enoplus spp., Daptonema spp., Robbea spp., Caenorhabditis elegans, Actinomyces spp., Clostridium spp., Listeria spp., Synechococcus spp.
  17. Challenge 3: Extreme bioinformatics bottleneck for microbial eukaryote data
  18. rDNA copy number & genome size in eukaryotes. Prokopowich CD, Gregory TR, Crease TJ. (2003) Genome, 46(1):48–50.
  19. …and in ONE genus of nematodes: Caenorhabditis brenneri ~323 rRNA gene copies; Caenorhabditis briggsae ~56 rRNA gene copies. Bik et al., in revision
  20. Intragenomic variation in Eukaryotic rRNA: Head-Tail Pattern in Nematode OTUs (Head! Tail!). Artificial control community containing known nematode species, all with corresponding full length reference 18S. Porazinska et al. 2010 Zootaxa

      OCTU  Reads  OCTU Length  Bit Score  E-Value   Match bp  Total bp  % Similarity  Chimera  DB match
      27    63     266          525        e-146     265       265       100           -1       B. seani 175
      12    9      265          500        e-138     261       264       98.86         -1       B. seani 175
      170   8      264          496        e-137     261       264       98.86         0        B. seani 175
      513   1      264          494        e-136     259       262       98.85         -2       B. seani 175
      579   2      263          492        e-136     258       261       98.85         -2       B. seani 175
      570   1      262          492        e-136     258       261       98.85         -1       B. seani 175
      394   1      263          490        e-135     260       264       98.48         1        B. seani 175
      19    2      269          488        e-135     264       269       98.14         0        B. seani 175
      658   1      266          486        e-134     260       265       98.11         -1       B. seani 175
      412   2      264          480        e-132     260       265       98.11         1        B. seani 175
      465   9      254          478        e-132     251       254       98.82         0        B. seani 175
      1164  1      268          478        e-132     261       267       97.75         -1       B. seani 175
      304   1      261          474        e-130     255       260       98.08         -1       B. seani 175
      868   1      244          460        e-126     242       245       98.78         1        B. seani 175
      514   2      274          458        e-126     263       272       96.69         -2       B. seani 175
      683   1      250          426        e-116     241       249       96.79         -1       B. seani 175
      627   1      230          422        e-115     223       226       98.67         -4       B. seani 175
      171   3      212          400        e-108     209       211       99.05         -1       B. seani 175
      1223  1      202          355        5.00E-95  198       204       97.06         2        B. seani 175
  21. OTUs as ‘Clouds’: 99% cutoff vs. 97% cutoff. How to correlate OTUs with biological species?
  22. Sparse Databases for Eukaryotes. SILVA 108 Ref rRNA Database (16S/18S): Bacteria: 530,197; Archaea: 25,658; Eukaryotes: 62,587
  23. Ambiguous Taxonomy. Deep sea and shallow water marine sediment; 1.2 million reads, 454 GS FLX Titanium. Bik et al. 2012, Molecular Ecology, 21(5):1048-59

      Taxa                    Region 1 95%  Region 2 95%  Region 1 99%  Region 2 99%
      Metazoa (20 Phyla)      1360          1461          43255         25668
      Nematoda                765           879           27020         15518
      Annelida                217           197           7073          3869
      Arthropoda              128           178           2280          2323
      Unicellular eukaryotes  738           1257          15198         22020
      Environmental isolates  774           686           12687         9775
      No match                480           354           11345         1868
      Fungi                   225           163           9984          2445
      Stramenopiles           137           146           1771          1583
      Algae                   111           96            975           861
      Total (all taxa)        3825          4163          95215         64220
  24. Goal 1: A web-based, scalable visualization framework for standard data formats
  25. Tier One: Standard outputs from bioinformatic pipelines
  26. BIOM (json) files: OTU tables, metagenome datasets. Tab-delimited metadata files
  27. http://explore.climbsf.com
  28. Goal 2: Destroy biologists’ addiction to pie charts
  29. A pie chart is not the most informative way to interpret biodiversity data!
  30. Tier Two
  31. [Mockup: circles labeled Bacteria, Archaea, Nematodes, Ciliates, Crustaceans.] Circle size = species abundance. Circle color = metadata (sample, temperature, pH, etc.). Mockup example taken from http://www.wefeelfine.org/
  32. Goal 4: Find intuitive ways to visualize new data outputs
  33. Explicitly Phylogenetic Approaches! Aligned environmental sequences; Guide Tree; Evolutionary Placement of short reads
  34. http://phylosift.wordpress.com
  35. [Workflow diagram.] Input Sequences: each input sequence scanned against both workflows (rRNA workflow, protein workflow). Box labels: LAST (fast candidate search; search input against references); hmmalign (multiple alignment; profile HMMs used to align candidates to reference alignment); Infernal (multiple alignment); <600 bp and >600 bp branches; parallel option; pplacer (phylogenetic placement). Outputs: Taxonomic Summaries; Sample Analysis & Comparison; Krona plots; Number of reads placed for each marker gene; Edge PCA; Tree visualization; Bayes factor tests
  36. Probability Distributions: when a pie chart is not a pie chart
  37. Great! Not Bad! Getting Tricky…
  38. Marine Metagenome Tree Placement: Sing Tree (Guppy)
  39. Goal 5: Pester other people. Solicit case study participants
  40. Goal 6: (Phase 2) Build a user and developer community
  41. Acknowledgements: Jonathan Eisen, Aaron Darling, Guillaume Jospin, Dongying Wu, David Coil. Further Information: hbik@ucdavis.edu; @Dr_Bik (updates posted to Twitter); Grant proposal now posted on Figshare!
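
Slide 26 names the two Tier One inputs: BIOM (json) files and tab-delimited metadata files. As a rough illustration of what a visualization front end would have to read, here is a minimal sketch that parses a BIOM 1.0 style JSON table, converts counts to per-sample relative abundances, and joins them to sample metadata. The OTU names, sample names, counts, and the `Depth_m` column are made up for the example and are not taken from the talk.

```python
import csv
import io
import json

# Hypothetical BIOM 1.0 (JSON) OTU table: 3 OTUs x 2 samples, sparse matrix.
BIOM_TEXT = json.dumps({
    "id": None,
    "format": "Biological Observation Matrix 1.0.0",
    "type": "OTU table",
    "matrix_type": "sparse",
    "matrix_element_type": "int",
    "shape": [3, 2],
    "rows": [
        {"id": "OTU_1", "metadata": {"taxonomy": ["Metazoa", "Nematoda"]}},
        {"id": "OTU_2", "metadata": {"taxonomy": ["Fungi"]}},
        {"id": "OTU_3", "metadata": {"taxonomy": ["Metazoa", "Annelida"]}},
    ],
    "columns": [
        {"id": "SampleA", "metadata": None},
        {"id": "SampleB", "metadata": None},
    ],
    # Sparse entries are [row index, column index, count]; zeros are omitted.
    "data": [[0, 0, 30], [0, 1, 5], [1, 1, 15], [2, 0, 10]],
})

# Hypothetical tab-delimited metadata file, one row per sample.
METADATA_TEXT = "SampleID\tDepth_m\nSampleA\t20\nSampleB\t2500\n"


def relative_abundance(biom):
    """Return {sample: {otu: fraction of that sample's reads}}."""
    otu_ids = [row["id"] for row in biom["rows"]]
    sample_ids = [col["id"] for col in biom["columns"]]
    counts = {s: {o: 0 for o in otu_ids} for s in sample_ids}
    if biom["matrix_type"] == "sparse":
        for r, c, value in biom["data"]:
            counts[sample_ids[c]][otu_ids[r]] = value
    else:  # dense: one list of per-sample counts for each OTU row
        for r, row in enumerate(biom["data"]):
            for c, value in enumerate(row):
                counts[sample_ids[c]][otu_ids[r]] = value
    result = {}
    for sample, per_otu in counts.items():
        total = sum(per_otu.values())
        result[sample] = {
            otu: (value / total if total else 0.0)
            for otu, value in per_otu.items()
        }
    return result


def read_metadata(text):
    """Index tab-delimited metadata rows by their SampleID column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["SampleID"]: row for row in reader}


biom = json.loads(BIOM_TEXT)
abundance = relative_abundance(biom)
metadata = read_metadata(METADATA_TEXT)
for sample, fractions in abundance.items():
    print(sample, metadata[sample]["Depth_m"], fractions)
```

The resulting per-sample fractions and metadata values are the kind of inputs the slide 31 mockup would map to circle size and circle color.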
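
Slide 21 contrasts 97% and 99% identity cutoffs and asks how OTUs relate to biological species. The toy sketch below shows why the cutoff matters: the same set of near-identical reads collapses into fewer OTUs at 97% than at 99%. It uses a simple greedy centroid clustering on equal-length, pre-aligned strings, which is an assumption made for illustration and not the pipeline used in the talk; the sequences are synthetic.

```python
def identity_at_least(a, b, cutoff_percent):
    """True if equal-length sequences match at >= cutoff_percent of sites."""
    if len(a) != len(b):
        raise ValueError("toy example expects equal-length, pre-aligned sequences")
    matches = sum(x == y for x, y in zip(a, b))
    # Integer comparison avoids floating-point issues at the exact cutoff.
    return matches * 100 >= cutoff_percent * len(a)


def greedy_otus(reads, cutoff_percent):
    """Assign each read to the first centroid it matches, else start a new OTU."""
    centroids = []
    clusters = []
    for name, seq in reads:
        for i, centroid in enumerate(centroids):
            if identity_at_least(seq, centroid, cutoff_percent):
                clusters[i].append(name)
                break
        else:
            centroids.append(seq)
            clusters.append([name])
    return clusters


def mutate(seq, positions):
    """Substitute a different base at each listed position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Synthetic 100 bp reference plus variants with 1, 2, 3, and 5 substitutions.
BASE = "ACGT" * 25
READS = [
    ("ref", BASE),
    ("v1", mutate(BASE, [10])),
    ("v2", mutate(BASE, [10, 20])),
    ("v3", mutate(BASE, [10, 20, 30])),
    ("v5", mutate(BASE, [10, 20, 30, 40, 50])),
]

otus_97 = greedy_otus(READS, 97)
otus_99 = greedy_otus(READS, 99)
print(len(otus_97), otus_97)
print(len(otus_99), otus_99)
```

In this toy case the five reads form two OTUs at 97% and three at 99%, even though all of them could plausibly be variants of one rRNA gene, which is the "clouds" problem the slide describes.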