GenGIS 2
New approaches to understand the
geography of our microbial world
Rob Beiko
Donovan Parks
Timothy Mankowski
Mike Porter
Brett O'Donnell
demo: the GenGIS environment
2-24  GenGIS v1: Parks et al. (2009) Genome Res
GenGIS v1 overview
• GUI (wxPython)
• Scripting interface (Python, R)
• Core application (C++)
• Output: saved image files
• Data:
  – Map – many formats (GDAL)
  – Samples – CSV
  – Sequences – CSV
  – Trees – Newick
• Crossing minimization + statistical test
• Supported platforms: Windows XP, Vista, 7; OS X 10.4, 10.5, 10.6
• Open source: Creative Commons Attribution – Share Alike 3.0
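The "crossing minimization + statistical test" item can be illustrated with a minimal sketch. It assumes the tree's leaf order and the order of sites along a geographic axis are already known, counts crossings as pairwise inversions between the two orders, and estimates a p-value by shuffling the leaf labels. The function names and toy site labels are illustrative assumptions, not GenGIS code, and the minimization step itself (searching over tree layouts) is not attempted here.

```python
import random
from itertools import combinations


def count_crossings(tree_order, geo_order):
    """Count crossing lines between leaves (in tree order) and sites (in geographic order).

    Two connecting lines cross exactly when the pair of leaves appears in
    opposite relative order in the two lists, so this is an inversion count.
    """
    position = {name: i for i, name in enumerate(geo_order)}
    ranks = [position[name] for name in tree_order]
    return sum(1 for a, b in combinations(ranks, 2) if a > b)


def permutation_p_value(tree_order, geo_order, n_perm=1000, seed=0):
    """Monte Carlo estimate of how often random leaf labels give as few crossings."""
    rng = random.Random(seed)
    observed = count_crossings(tree_order, geo_order)
    shuffled = list(tree_order)
    as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_crossings(shuffled, geo_order) <= observed:
            as_good += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (as_good + 1) / (n_perm + 1)


# Hypothetical example: four sites ordered west to east, leaves in tree order.
geo = ["A", "B", "C", "D"]
tree = ["A", "B", "D", "C"]
observed, p_value = permutation_p_value(tree, geo)
```

A small number of crossings with a small p-value would suggest that the tree topology tracks the chosen geographic axis more closely than expected by chance.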
what's new in v2
• GUI (wxPython)
• Scripting interface (Python, R)
• Python plugins
• Core application (C++)
• Output: saved image files, save / restore sessions
• Data:
  – Map – many formats (GDAL)
  – Samples – CSV
  – Sequences – CSV
  – Trees – Newick
  – External files
• Stability improvements, various things now work properly on the Mac
• Interface updates (legends, data visualizations)
• Linear axes analysis
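One plausible reading of "linear axes analysis" is a sweep over axis orientations: project every site onto an axis at a given angle, count tree/geography crossings for that ordering, and repeat for all angles. The sketch below works under that assumption, with hypothetical planar site coordinates and a fixed leaf order; the function names are invented for illustration and are not the GenGIS API.

```python
import math
from itertools import combinations


def crossings_for_angle(sites, tree_order, angle_deg):
    """Crossings when sites are ordered by their projection onto an axis at angle_deg."""
    theta = math.radians(angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    projection = {name: x * ux + y * uy for name, (x, y) in sites.items()}
    ranks = [projection[name] for name in tree_order]
    # A pair of leaves crosses when its tree order disagrees with its axis order.
    return sum(1 for a, b in combinations(ranks, 2) if a > b)


def sweep_axes(sites, tree_order, step_deg=10):
    """Map each axis angle in [0, 360) to its crossing count."""
    return {
        angle: crossings_for_angle(sites, tree_order, angle)
        for angle in range(0, 360, step_deg)
    }


# Hypothetical sites lying on a west-to-east line (planar x, y coordinates).
sites = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (2.0, 0.0), "D": (3.0, 0.0)}
tree_order = ["A", "B", "C", "D"]
profile = sweep_axes(sites, tree_order)
best_angle = min(profile, key=profile.get)
```

Plotting `profile` against angle would show which gradient directions agree best with the tree.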
bringing map data into GenGIS
• Maps:
  – MapMaker (included application)
  – Digital elevation data (Geobase.ca, NASA Shuttle Topography data, etc.)
  – Images (.png, .tif, etc.)
three views of the LineP transect
Original data: Jody Wright, Steven Hallam
diversity and depth
clustering based on Canberra beta-diversity
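For reference, the Canberra distance between two abundance profiles sums |a − b| / (|a| + |b|) over taxa, skipping taxa absent from both samples. The sketch below computes it for a few hypothetical samples and picks the closest pair, which is the first merge a hierarchical clustering would make. The sample names and counts are invented, and whether GenGIS applies any further normalization is not stated on the slide.

```python
from itertools import combinations


def canberra(u, v):
    """Canberra distance between two equal-length abundance vectors."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom > 0:  # taxa absent from both samples contribute nothing
            total += abs(a - b) / denom
    return total


def pairwise_distances(profiles):
    """Distance for every unordered pair of samples, keyed by sorted name pair."""
    return {
        (s, t): canberra(profiles[s], profiles[t])
        for s, t in combinations(sorted(profiles), 2)
    }


# Hypothetical taxon counts for three samples along a depth gradient.
profiles = {
    "shallow": [12, 0, 3],
    "mid": [10, 1, 3],
    "deep": [0, 8, 1],
}
distances = pairwise_distances(profiles)
closest_pair = min(distances, key=distances.get)
```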
relative abundance of SUP05
demo: plugins and R scripts
• Linear regression of group frequencies
• Heatmap RPy2 script
10-29  Original data: Costello et al. Science 326:1694-1697
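A linear regression of group frequencies can be sketched as an ordinary least-squares fit of one group's relative frequency against a per-site covariate. The function below is a plain-Python illustration with made-up numbers; the covariate, the data, and the function name are assumptions, and the actual plugin may compute or report the fit differently.

```python
def linear_regression(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept, with R^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # If y is constant the line fits exactly, so report R^2 = 1.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


# Hypothetical data: a covariate per site and one group's relative frequency.
covariate = [0.0, 1.0, 2.0, 3.0]
frequency = [0.10, 0.18, 0.33, 0.39]
slope, intercept, r_squared = linear_regression(covariate, frequency)
```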
clustering of fecal samples

Female subjects: F1 – F3
Male subjects: M1 – M3

Two sampling methods:
       - TP
       - Direct from feces

Two time points

2 sampling methods × 2 time points = 4 samples per individual. Do each
individual's samples cluster with each other?
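One simple way to put a number on that question is to ask, for each sample, whether its nearest neighbour comes from the same individual. The sketch below does this with a Manhattan distance on toy profiles. The sample names, counts, and the choice of distance are illustrative assumptions, and only two samples per subject are shown to keep the example short.

```python
def manhattan(u, v):
    """Sum of absolute differences between two equal-length profiles."""
    return sum(abs(a - b) for a, b in zip(u, v))


def same_subject_nn_fraction(profiles, subject_of, distance=manhattan):
    """Fraction of samples whose nearest other sample is from the same subject."""
    hits = 0
    for name, profile in profiles.items():
        others = [other for other in profiles if other != name]
        nearest = min(others, key=lambda other: distance(profile, profiles[other]))
        if subject_of[nearest] == subject_of[name]:
            hits += 1
    return hits / len(profiles)


# Hypothetical taxon counts: two samples each from subjects F1 and M1.
profiles = {
    "F1_t1": [10, 0, 2],
    "F1_t2": [9, 1, 2],
    "M1_t1": [0, 10, 5],
    "M1_t2": [1, 9, 4],
}
subject_of = {name: name.split("_")[0] for name in profiles}
fraction = same_subject_nn_fraction(profiles, subject_of)
```

A fraction near 1 would indicate that samples group by individual rather than by sampling method or time point.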
Wood Buffalo National Park


      • Canada’s largest National Park

      • UNESCO World Heritage status (Boreal Forest)

      • Threatened by encroaching development
          – Oil Sands mining (Alberta)
          – Metal mining (NWT)
          – Hydro-electric dams (Peace River, BC)

      • Natural resources sustain traditional use by Métis and
      First Nations peoples


Photos: D Baird
biomonitoring 2.0
    what is being collected

•   Benthic invertebrates (COI, 28S) – kick sample
•   Water (16S, 18S, 28S) – 1 L volume
•   Soil (16S, COI, ITS, 18S, 28S, rbcL) – cores
•   Terrestrial arthropods (COI, 28S) – Malaise / pitfall traps



• All samples replicated 3 times
• 5 time points in initial study
• Lots of metadata (soil chemistry,
  flooding, etc.)
biomonitoring 2.0
replication results – 2010 trial
biomonitoring 2.0
sampling progress


• August 2011
   • Samples collected, starting analysis of sequences
   • 'Traditional' taxonomy where applicable (arthropods yes, bacteria no)
• June 2012
   • Samples collected
• Future sampling: August 2012, June – August 2013
biomonitoring 2.0
our three-year mission (and beyond)

• Develop robust sampling techniques for sequence-
  based biomonitoring
• Develop and apply different approaches for
  assessing biodiversity (taxon-based and taxon-
  free), and compare their performance on WBNP data
• Identify whether “reference conditions” can be
  established against which future samples can be
  compared
call for collaborators
• Currently underway:
  – Combined axis tests (Many trees, one optimal gradient)
  – Regional tests of diversity
  – Canonical correlation analysis and related methods
  – Bio2.0 analysis
• Goals:
  – Integrate with online data sources
  – Support more data types (especially vector data)
  – More plugins!
the long-term goal
• Online data sources with APIs + local data
• Automated dataset generation / visualization
• Analysis:
  – Geo gradients
  – Diversity vs. habitat
  – Diversity networks
  – Functional models
acknowledgments
• GenGIS developers (Dal): Donovan Parks, Mike Porter, Timothy Mankowski, Brett O'Donnell, Kathryn Dunphy, Sylvia Churcher, Suwen Wang, Harman Clair, Greg Smolyn, Stephen Brooks, Christian Blouin, Jacqueline Whalley (Auckland U Tech)
• LineP (UBC): Jody Wright, Steven Hallam
• Bio2.0: Mehrdad Hajibabaei (Guelph); Donald Baird, Wendy Monk (UNB); Brian Golding (McMaster); Jeff Shatford (Parks Canada)
New Zealand fungus beetle (Agyrtodes labralis)
COI phylogeny
Ecological niche modelling suggests several glacial refugia; phylogenies suggest transalpine migration
Marske et al. Mol Ecol (2009); data shown in GenGIS
map
locations
sequence summaries
tree vs geography
axes test
body site data
linear regression
heatmaps using R
