Introduction
Clustering techniques can play an important role in analyzing
ecological data. Our goal was to apply these clustering techniques to a
large dataset consisting of plant species information from the US and
Canada and group the states, territories, and provinces into
geographic regions according to the distribution of plant species. We
used the Kulczynski dissimilarity coefficient to measure the similarity
and dissimilarity among the regions. These measures, along with the
average linkage clustering method, were then used to group the
regions. In order to visualize the clusters, we created a dendrogram
and identified 13 distinct clusters that best indicate plant species
trends across our target geographic area. Finally, we displayed the
clustered regions on a geographic map and compared them with
typical forest maps.
Preparing the Data
● We began with a data set as pictured below (Figure 1) where the
rows represented the genus and species of each of the plants, and
the columns represented each of the states that the plants were
present in.
● We wrote a program to convert this raw data into a binary
presence-absence matrix which could be handled much more
naturally by R (Figure 2).
● Many of the species were present in only a few locations, which
would provide minimal information for clustering. We decided to
eliminate any plants that appear in three or fewer locations, so
that the remaining plants show stronger ties between geographic
regions.
● These steps reduced the dataset from 34,000 to 12,000 plants
(rows).
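The preparation steps above can be sketched in Python. This is only an illustration under stated assumptions: the analysis itself was done in R, the raw table is assumed to be available as a mapping from species name to the list of locations where it occurs, and the function name, parameter names, and sample records are all hypothetical.

```python
def presence_absence(records, min_locations=4):
    """Build a binary presence-absence matrix from species -> locations records.

    records: dict mapping a species name to the list of locations where it
             occurs (an assumed stand-in for the raw table in Figure 1).
    min_locations: keep only species found in at least this many locations;
                   the default of 4 drops species present in three or fewer.
    Returns (species, locations, matrix), where matrix[i][j] is 1 if
    species[i] occurs in locations[j] and 0 otherwise.
    """
    # De-duplicate locations per species, then drop the rare species.
    kept = {sp: set(locs) for sp, locs in records.items()
            if len(set(locs)) >= min_locations}
    species = sorted(kept)
    # Columns cover every location seen in the raw data, even if all of
    # its species were filtered out.
    locations = sorted({loc for locs in records.values() for loc in locs})
    matrix = [[1 if loc in kept[sp] else 0 for loc in locations]
              for sp in species]
    return species, locations, matrix


# Made-up records, for illustration only.
records = {
    "species_a": ["KY", "OH", "IN", "TN", "ON"],
    "species_b": ["KY", "OH", "IN", "TN"],
    "species_c": ["KY", "OH", "IN"],   # three locations: removed
    "species_d": ["HI"],               # one location: removed
}
species, locations, matrix = presence_absence(records)
for name, row in zip(species, matrix):
    print(name, row)
```

Here rows are species and columns are locations, matching the layout described for Figure 2; clustering the locations then compares columns rather than rows.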
Figure 1
Figure 2
A Cluster Analysis of North American Locations Based on Plant Species
Sally Dufek and Parker Kain
Faculty Mentors: Dhanuja Kasturiratna and Aimee Krug
Kulczynski
To establish a quantifiable distance between each pair of states, we used the
R package “prabclus”, which provides a function that calculates the Kulczynski
distance for every pair of states. We chose Kulczynski distance because
it is well suited to presence-absence data, particularly when there are
significantly more zeroes than ones. Kulczynski distance uses the following
formula, where A1 and A2 are the sets of plant species present in each of
two states, and A1 ∩ A2 is the set of species shared by the two states:
$$d(A_1, A_2) = 1 - \frac{1}{2}\left(\frac{|A_1 \cap A_2|}{|A_1|} + \frac{|A_1 \cap A_2|}{|A_2|}\right)$$
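As a worked illustration, the Kulczynski dissimilarity can be sketched in a few lines of Python. It assumes the usual definition, one minus the average of the two proportions of shared species, |A1 ∩ A2| / |A1| and |A1 ∩ A2| / |A2|. The function names and the sample species sets are hypothetical and are not the prabclus API.

```python
def kulczynski(a1, a2):
    """Kulczynski dissimilarity between two sets of species.

    d = 1 - 0.5 * (|A1 & A2| / |A1| + |A1 & A2| / |A2|)
    """
    a1, a2 = set(a1), set(a2)
    if not a1 or not a2:
        raise ValueError("each location needs at least one species")
    shared = len(a1 & a2)
    return 1.0 - 0.5 * (shared / len(a1) + shared / len(a2))


def distance_table(species_by_location):
    """Pairwise Kulczynski dissimilarities as (names, square matrix)."""
    names = sorted(species_by_location)
    table = [[kulczynski(species_by_location[p], species_by_location[q])
              for q in names] for p in names]
    return names, table


# Made-up species sets, for illustration only.
flora = {
    "state_x": {"a", "b", "c", "d"},
    "state_y": {"c", "d"},          # a subset of state_x
    "state_z": {"e", "f"},          # nothing in common with the others
}
names, table = distance_table(flora)
print(kulczynski(flora["state_x"], flora["state_y"]))  # 0.25
print(kulczynski(flora["state_x"], flora["state_z"]))  # 1.0
```

In the first pair, every species in state_y also occurs in state_x, so the dissimilarity is only 0.25 even though state_x has twice as many species. This tolerance for unequal species counts is one reason the measure suits sparse presence-absence data.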
After creating a table of Kulczynski distances between states, we could cluster
the states using hierarchical clustering methods. We chose average linkage
because it measures the distance between two clusters as the average of all
pairwise distances between their members, rather than using only the closest
or furthest pair of points. We chose 13 clusters so that each one was distinct
and tightly packed while limiting the number of one- and two-state clusters
(Figure 3). Some of the clusters are significantly larger than others, while
the smallest contains a single state, Hawaii. From here, we used the “maps”
package in R to create a colored map of our areas of interest to best visualize
the clusters, and compared it to a topographical map of US forests (Figure
5). The two maps are very similar, differing only in minor border regions.
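Average-linkage clustering on a table of pairwise distances can be sketched as follows. This is a plain Python illustration, not the R code used for the analysis; the function name and the small distance matrix are hypothetical. At each step it merges the two clusters whose members have the smallest mean pairwise distance, and it stops once the requested number of clusters remains.

```python
def average_linkage(dist, k):
    """Agglomerative clustering with average linkage, stopped at k clusters.

    dist: symmetric matrix (list of lists) of pairwise dissimilarities.
    k: number of clusters to keep.
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")
    clusters = [[i] for i in range(n)]

    def between(c1, c2):
        # Average linkage: mean of all distances across the two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the pair of clusters with the smallest average distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = between(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Made-up distances: items 0-1 are close, items 2-3 are close, item 4 is isolated.
dist = [
    [0.0, 0.1, 0.6, 0.7, 0.9],
    [0.1, 0.0, 0.5, 0.6, 0.9],
    [0.6, 0.5, 0.0, 0.2, 0.9],
    [0.7, 0.6, 0.2, 0.0, 0.9],
    [0.9, 0.9, 0.9, 0.9, 0.0],
]
print(average_linkage(dist, 3))  # [[0, 1], [2, 3], [4]]
print(average_linkage(dist, 2))  # [[0, 1, 2, 3], [4]]
```

The isolated item stays in its own cluster even at k = 2, in the same way that a location with a very distinct flora can remain a single-member cluster.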
Results
Our 13 clusters (Figure 4) were consistent with various topographical
forest maps of the US and Canada when overlaid on them (Figure 5). Our
cluster map reproduced typical forest maps rather accurately, with only
small variations in borders, and in some places it produced smaller,
more tightly packed regions than the forest maps.
Acknowledgements
We would like to thank the UR-STEM program at NKU for
funding our research project.
Figure 3
Figure 4
Figure 5
References
"Facts and Information about the Continent of North America." Natural History on the Net.
N.p., 07 July 2016. Web. 27 July 2016.
Hennig, Christian, and Bernhard Hausdorf. Design of Dissimilarity Measures: A New
Dissimilarity between Species Distribution Areas. UCL - London's Global University. UCL,
n.d. Web. 15 July 2016.
Tan, Pang-Ning, Michael Steinbach, and Vipin Kumar. Introduction to Data Mining. Boston:
Pearson Addison Wesley, 2005. Print.
