Imaging spectrometers housed on satellites are used to obtain data on vegetated surfaces by measuring reflectance from the Earth’s surface. These data are very useful as they provide information on changes in vegetation over time on global scales, which is important to assess the impacts of changes in weather and climate, and the effects of agricultural practices. However, the information provided by these data can be limited due to the resolution of the sensors and inhibiting factors such as cloud cover. In this project we will use two remote sensing sources of the Enhanced Vegetation Index (EVI) to analyze vegetation over Nebraska. The first, Landsat EVI, is available at fine spatial resolution, but is sparse in time. The second, MODIS EVI, is obtained regularly in time, but is available at a much coarser spatial resolution. We will use these data to explore the relationships between vegetation and changes in temperature and landcover (e.g. corn fields versus grasslands), as well as to classify the landcover in unknown regions.
Group members: Samuel Hood, Zhihan Lu, Rita Pradhudesai, Thomas Rechtman, Meghana Tatneni, Ganlin Ye
Undergraduate Modeling Workshop - Vegetation Working Group Final Presentation, May 25, 2018
1. Classifying Vegetation in Nebraska using Landsat data
Mentor: Maggie Johnson
Group members: Samuel Hood, Riya Prabbhudesai, Thomas Rechtman, Zhihan Lu, Meghana Tatineni, Ganlin Ye
2. Introduction
Land cover: what covers the surface of the ground (i.e., types of vegetation and water)
Knowledge of land cover is important because:
● Vegetation affects climate and climate affects vegetation
● Land cover can be an important input into various models (e.g., climate change, air pollution)
● Being able to identify changes in land cover helps us understand changes in agricultural practices, the implications of deforestation, and how bodies of water change over time.
3. Data Available
Enhanced Vegetation Index (EVI): a measure of vegetation greenness, related to how much chlorophyll is present
EVI is computed from reflectance values in the near-infrared, red, and blue bands
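The EVI formula itself is not reproduced in this transcript. As a minimal sketch, assuming the standard MODIS coefficients (gain 2.5, aerosol terms 6 and 7.5, canopy adjustment 1), the index can be computed from the three reflectance bands as shown. The function and parameter names are illustrative and not taken from the project code.

```python
def enhanced_vegetation_index(nir, red, blue,
                              gain=2.5, c1=6.0, c2=7.5, canopy_l=1.0):
    """Return EVI from NIR, red, and blue surface reflectance (0 to 1 scale).

    Defaults are the standard MODIS EVI coefficients; they are an
    assumption here, since the slide's formula is not in the transcript.
    """
    denominator = nir + c1 * red - c2 * blue + canopy_l
    if denominator == 0:
        raise ZeroDivisionError("EVI is undefined when the denominator is zero")
    return gain * (nir - red) / denominator


# Made-up reflectance values for a densely vegetated pixel (not project data).
example_evi = enhanced_vegetation_index(nir=0.5, red=0.1, blue=0.05)
```

Dense vegetation reflects strongly in the near-infrared and absorbs red light, so larger NIR minus red differences give larger EVI values.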
Other Data available:
● Dates Corresponding to when the EVI value was calculated
● Land cover type of each location
● Temperature over the region for each day
● X,Y coordinates of each location
● Longitude, Latitude coordinates of each location
4. Goal
To determine whether remote sensing data can be used to classify the land cover of a region in Nebraska at high spatial resolution.
We’ll accomplish this goal by training our models using the USDA NASS 2008 cropland data layer
10. Introduction to Features
Feature Ideas
● Latitude
● Longitude
● Duration of the season
● Maximum EVI
● Time of maximum EVI
● Max-Min EVI
● Blue average reflectance
● Green average reflectance
● NIR average reflectance
● Max-Min blue reflectance
● Max-Min green reflectance
● Max-Min NIR reflectance
● Red average reflectance
● Temperature at max EVI
● Temperature at min EVI
● Rate of spring green-up
● Starting date of the season
● Starting EVI of the season
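Several of these features are simple summaries of one location's EVI time series. The sketch below shows how a few of them (maximum EVI, time of maximum EVI, Max-Min EVI, and temperature at max/min EVI) could be derived. The function name, the input layout, and the sample values are assumptions for illustration, not the project's actual code or data.

```python
def evi_features(days, evi, temperature):
    """Summarise one location's EVI time series into a few features.

    days, evi, and temperature are parallel lists: day of year,
    EVI value on that day, and temperature on that day.
    """
    if not (len(days) == len(evi) == len(temperature)) or not evi:
        raise ValueError("inputs must be non-empty and of equal length")
    i_max = max(range(len(evi)), key=evi.__getitem__)  # index of peak EVI
    i_min = min(range(len(evi)), key=evi.__getitem__)  # index of lowest EVI
    return {
        "max_evi": evi[i_max],
        "time_of_max_evi": days[i_max],
        "max_minus_min_evi": evi[i_max] - evi[i_min],
        "temp_at_max_evi": temperature[i_max],
        "temp_at_min_evi": temperature[i_min],
    }


# Made-up values for a single location (not project data).
days = [100, 116, 132, 148, 164, 180, 196]
evi = [0.15, 0.20, 0.35, 0.60, 0.75, 0.70, 0.50]
temperature = [8.0, 12.0, 17.0, 21.0, 25.0, 27.0, 26.0]
features = evi_features(days, evi, temperature)
```

Features such as the season's start date or the rate of spring green-up need a definition of when the season begins, which the slide does not give, so they are left out of this sketch.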
11. Logistic Regression
Background
- Generalized linear model with Bernoulli random component and logit link function
- Binary outcome
- Formula
- Advantages: simple
- Disadvantages: limited to a binary outcome
Model Outcome
- Two outcomes: open water and vegetation
- One covariate used: average green reflectance
- Error rate of 0.82%
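For a model with a Bernoulli random component and a logit link, the standard form with the single covariate used here can be written as follows. The symbols ($Y_i$, $p_i$, $x_i$, $\beta_0$, $\beta_1$) and the choice of which class is coded as 1 are assumptions for illustration; the slide does not state them.

```latex
% Y_i = 1 if location i is open water, 0 if vegetation (assumed coding)
% x_i = average green reflectance at location i
\[
  Y_i \sim \mathrm{Bernoulli}(p_i), \qquad
  \log\frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_i
  \quad\Longleftrightarrow\quad
  p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i)}}
\]
```

A location is then assigned to the class coded as 1 when its fitted $p_i$ exceeds a chosen threshold, commonly 0.5.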
13. Multinomial Regression
- Extension of logistic regression
- Outcomes can fall into more than two categories
- Forward and backward selection used to select features
- Advantages: simple
- Disadvantages: linear predictor, risk of overfitting
- The final model uses 13 of the 19 candidate features
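The way a multinomial model turns one linear predictor per class into class probabilities can be sketched as below. The softmax form, the class labels, the two features, and all coefficient values are hypothetical and chosen only to illustrate the mechanics; fitted software often fixes one class as a reference instead, which gives the same probabilities.

```python
import math


def softmax_probabilities(features, coefficients, intercepts):
    """Return one probability per class from per-class linear predictors."""
    scores = [
        b0 + sum(b * x for b, x in zip(beta, features))
        for beta, b0 in zip(coefficients, intercepts)
    ]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(features, coefficients, intercepts, labels):
    """Return the most probable label together with all class probabilities."""
    probs = softmax_probabilities(features, coefficients, intercepts)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical classes, features (max EVI, average green reflectance),
# and coefficients; none of these come from the fitted project model.
labels = ["corn", "grassland", "open water"]
coefficients = [[4.0, -1.0], [2.0, 0.0], [-5.0, 1.0]]
intercepts = [0.0, 0.0, 0.0]
label, probs = predict_class([0.8, 0.1], coefficients, intercepts, labels)
```

Because each class score is linear in the features, the boundaries between classes are linear too, which is the "linear predictor" limitation listed above.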
14. Random Forest
A machine learning algorithm built from decision trees.
Since a single decision tree would overfit our data, the predictions of many randomized trees are combined.
The procedure is similar to tree bagging.
Random forest differs in that it considers a new random subset of the features at every split.
In our models we looked at classifying with a single random forest, as well as chaining random forests in steps by grouping our land cover classes.
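Tree bagging, which this slide compares random forests to, can be written as follows for classification. The notation ($B$ trees, $T_b$ for the tree grown on the $b$-th bootstrap sample, $k$ for a land cover class) is an assumption for illustration and is not taken from the slides.

```latex
% T_b(x): class predicted for input x by the tree fitted to bootstrap sample b
% The bagged prediction is the class that receives the most votes.
\[
  \hat{C}_{\mathrm{bag}}(x)
  = \arg\max_{k} \sum_{b=1}^{B} \mathbf{1}\{\, T_b(x) = k \,\}
\]
```

A random forest uses the same vote, but each tree is additionally restricted to a random subset of the features at every split, which makes the trees less correlated with one another.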
15. Cross Validation
- Separate the data into n non-overlapping sets of equal size.
- Train on n-1 of these sets and test on the remaining set.
- Repeat so that each set is used once for testing, then average the accuracy values for the final assessment of the model.
- Benefit: reduces bias from relying on a single training/testing split.
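The procedure above can be sketched in a few lines. This is a minimal illustration, not the project's code: the function names, the `fit`/`predict` callables, and the toy majority-class model are assumptions, and folds differ in size by at most one when the data do not divide evenly.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Split indices 0..n_samples-1 into k shuffled, non-overlapping folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_error(X, y, k, fit, predict, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the error."""
    errors = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        wrong = sum(predict(model, X[i]) != y[i] for i in held_out)
        errors.append(wrong / len(held_out))
    return sum(errors) / len(errors)


# Toy stand-in model: always predict the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


# Made-up data: 15 vegetation and 5 open-water locations.
X = [[i] for i in range(20)]
y = ["vegetation"] * 15 + ["open water"] * 5
folds = k_fold_indices(len(X), 5)
error = cross_validated_error(X, y, 5, fit_majority, predict_majority)
```

Any classifier can be plugged in through `fit` and `predict`; the majority-class model is used only so the sketch runs without external libraries.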
16. Model Comparison
Method | Error rate on randomized test data | Mean error rate in 10-fold cross validation
Multinomial Logistic Regression | 32.48% | 33.38%
Random Forests | 16.94% | 15.62%
Multi-Layer Random Forest | 16.71% | 15.81%
21. Future Work
There is missing data in the Landsat data, which is collected only every 16 days.
Missing data can be filled in with MODIS data, but the MODIS data has low spatial resolution.
Abrupt changes in EVI and temperature between acquisitions may not be recorded in the Landsat data.
Editor's Notes
Ultimately we want a global land cover map, so categorize land cover using remote sensing data
Landsat and time series example: corn and something else
How remote sensing works
1 km of data- google map image of region
Add Landsat data
qplot(ylim=c(0,1))
Building the model
(Give the percentage for each model, to conclude that xxxx might be the best.) We reached 99% accuracy with binomial logistic regression on the water classification.