This presentation covers open datasets that can be used to qualitatively classify the extent of urbanization. Open datasets such as NASA Landsat imagery, night-light data, and OpenStreetMap building data can be correlated with one another to arrive at a methodology for classifying urbanization.
This analysis lets businesses overlay their sales datasets and look for patterns in their sales. These patterns can point to potential new markets and help manage supply chains more efficiently. Open datasets and open source tools allow these analytical outputs to be integrated into existing workflows.
Once the sales data is integrated with spatial data, machine learning algorithms can be deployed for predictive modeling. Population census data plays a major role in such predictive modeling. The presenter is working on developing solutions using such large-scale data available in the open domain.
Qualitative Analysis of urbanization using Open datasets
1. Qualitative analysis of urbanization using open datasets
- Ajay Kumar Mulakala
Presented at FOSS4G Asia held at IIIT H, India.
2. What do our customers ask us?
● Suggest places where they can look for potential markets
– Their sales could be toys, groceries
● To give them a solution we need demographic data
– At a resolution of around 250 m x 250 m
– Typical census data with number of households, children, women, etc.
3. Available Open datasets
Data source | Data type | Challenges | Update frequency
Census | Population (spreadsheets) | Spatial data not available in open domain | Once in 10 years
OSM | Buildings (vector) | Rural areas data availability is an issue | Whenever possible
Landsat | Land use (raster) | Less resolution than what we need | Every month
Gridded Population | Population (raster) | Less resolution than what we need | Generally once in five years or in between
4. What do we want to do?
OSM + Landsat -> Build correlation -> Gridded population which can be updated as per the needs of our clients
In this presentation we will take you through our process.
You are welcome to suggest better methods of doing this.
5. Step 1: Raster data pre-processing
Inputs: Red band and SWIR band
NDBI = (SWIR - Red) / (SWIR + Red)
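The index above can be sketched per pixel as follows. This is a minimal illustration, not the presenter's actual processing chain: the function name `ndbi` and the sample reflectance values are hypothetical, and the two bands are assumed to be co-registered grids of equal shape. The commonly cited NDBI definition uses the NIR band in place of Red; the sketch follows the formula as given on this slide.

```python
def ndbi(swir, red):
    """Return (SWIR - Red) / (SWIR + Red) for two equally sized 2D grids.

    Pixels where SWIR + Red == 0 are returned as None (no data).
    """
    if len(swir) != len(red):
        raise ValueError("bands must have the same number of rows")
    result = []
    for swir_row, red_row in zip(swir, red):
        if len(swir_row) != len(red_row):
            raise ValueError("bands must have the same number of columns")
        row = []
        for s, r in zip(swir_row, red_row):
            total = s + r
            # Avoid division by zero on empty / masked pixels.
            row.append((s - r) / total if total != 0 else None)
        result.append(row)
    return result


# Hypothetical 2x2 reflectance samples, for illustration only.
swir_band = [[0.30, 0.10], [0.25, 0.0]]
red_band = [[0.10, 0.30], [0.25, 0.0]]
index = ndbi(swir_band, red_band)
```

In practice the same arithmetic would usually be applied to whole raster arrays at once; plain lists are used here only to keep the sketch dependency-free.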
6. Step 2: Testing for seasonal variation
Little variation across seasons, except during the rainy season
7. Step 3: Normalization of the raster data into two classes
The NDBI raster is normalized into two classes, blue and green. These have to be correlated to OSM data.
Blue – scattered
Green – abundant
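One simple way to produce two classes is a single threshold on the NDBI value. The slides do not say how the split was chosen, so the sketch below is an assumption throughout: the function name `classify_ndbi`, the cut-off of 0.0, and the reading that higher NDBI means "abundant" built-up cover are all hypothetical.

```python
def classify_ndbi(index, threshold=0.0):
    """Label each NDBI pixel 'abundant' (>= threshold) or 'scattered'.

    `index` is a 2D list of floats; None marks no-data pixels and is kept.
    The default threshold is a placeholder, not a value from the slides.
    """
    classes = []
    for row in index:
        out_row = []
        for value in row:
            if value is None:
                out_row.append(None)
            elif value >= threshold:
                out_row.append("abundant")
            else:
                out_row.append("scattered")
        classes.append(out_row)
    return classes


# Hypothetical NDBI values, for illustration only.
sample_index = [[0.5, -0.5], [0.0, None]]
sample_classes = classify_ndbi(sample_index)
```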
8. Step 4: Vector data preparation
Divided the area of interest into 250 m x 250 m grid cells
Computed grid statistics for buildings taken from OSM
Classified grid cells into sparse and dense based on the statistics of each cell
PostgreSQL is used to make these calculations
Blue – sparse
Green – dense
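The gridding and per-cell statistics can be sketched as below. The slides state that PostgreSQL was used for this step; the Python version here is a stand-in to show the logic, not the presenter's queries. It assumes each building is reduced to a centroid in a projected, metre-based coordinate system plus a footprint area, and the names `grid_building_stats`, `classify_cells` and the 0.2 coverage cut-off are hypothetical.

```python
from collections import defaultdict

CELL_SIZE_M = 250  # grid cell edge length from the slide, in metres


def grid_building_stats(buildings, cell_size=CELL_SIZE_M):
    """Aggregate building count and footprint area per grid cell.

    `buildings` is an iterable of (x, y, area_m2) tuples, with x and y
    as centroid coordinates in metres.
    Returns {(col, row): {"count": int, "area_m2": float}}.
    """
    stats = defaultdict(lambda: {"count": 0, "area_m2": 0.0})
    for x, y, area in buildings:
        # Floor division assigns each centroid to exactly one cell.
        cell = (int(x // cell_size), int(y // cell_size))
        stats[cell]["count"] += 1
        stats[cell]["area_m2"] += area
    return dict(stats)


def classify_cells(stats, min_coverage=0.2, cell_size=CELL_SIZE_M):
    """Label a cell 'dense' when footprints cover at least `min_coverage`
    of the cell area, else 'sparse'. The cut-off is a placeholder."""
    cell_area = cell_size * cell_size
    return {
        cell: "dense" if s["area_m2"] / cell_area >= min_coverage else "sparse"
        for cell, s in stats.items()
    }


# Hypothetical centroids (x, y in metres) and footprint areas (m^2).
sample = [(10, 20, 8000.0), (100, 240, 6000.0), (300, 20, 150.0)]
cell_stats = grid_building_stats(sample)
cell_classes = classify_cells(cell_stats)
```

Cells that contain no buildings do not appear in the result; a full workflow would need to add them explicitly as sparse or as no-data, depending on how complete the OSM coverage is.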
9. Step 5: Correlation
The correlation between the classes from NDBI and OSM buildings is 0.625, which indicates a moderate positive linear correlation.
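A correlation of this kind can be computed as a Pearson coefficient over per-cell class codes. The slides do not state which coefficient or encoding was used, so this is a sketch under assumptions: classes are encoded as 1 (abundant / dense) and 0 (scattered / sparse), the function name `pearson` is hypothetical, and the sample vectors are made up and are not meant to reproduce the 0.625 figure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-cell classes: 1 = abundant/dense, 0 = scattered/sparse.
ndbi_class = [1, 1, 0, 0, 1, 0, 1, 0]
osm_class = [1, 0, 0, 0, 1, 0, 1, 1]
r = pearson(ndbi_class, osm_class)
```

For two binary variables the Pearson coefficient is the same quantity as the phi coefficient.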
10. How does this help our customers?
● Building footprint can act as a proxy for population
● A positive correlation helps in finding a linear relation between Landsat NDBI and the building density of OSM
● This helps us estimate population in places where there is little or no OSM building data
● We can then help them make decisions based on the demographics of an area
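The linear relation mentioned above could be fitted with ordinary least squares on cells where OSM data is good, then applied to cells where it is missing. This is a sketch under assumptions rather than the presenter's method: the names `fit_line` and `predict` are hypothetical, the per-cell inputs (mean NDBI and building coverage fraction) are an assumed choice of variables, and the numbers are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Estimate y for a new x using the fitted line."""
    return slope * x + intercept


# Hypothetical training cells with good OSM coverage:
# mean NDBI per cell and fraction of the cell covered by buildings.
ndbi_mean = [-0.2, 0.0, 0.2, 0.4]
coverage = [0.05, 0.15, 0.25, 0.35]
slope, intercept = fit_line(ndbi_mean, coverage)

# Estimate building coverage for a cell that has no OSM buildings mapped.
estimate = predict(0.1, slope, intercept)
```

Turning estimated building coverage into a population figure would need a further calibration step against census or gridded population data.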
11. Future work
● Regression analysis for estimating population from OSM buildings
● We are planning to use gridded population from NASA's Socioeconomic Data and Applications Center (SEDAC) for this analysis
● Yearly temporal updates for the whole country
12. References and data sources
1) Indian census data: http://www.dataforall.org/dashboard/censusinfoindia_pca/
2) Gridded Population of the World (GPW) v4, SEDAC: http://sedac.ciesin.columbia.edu/data/collection/gpw-v4
3) European Environment Agency data and maps: http://www.eea.europa.eu/data-and-maps
4) Built environment (Wikipedia): https://en.wikipedia.org/wiki/Built_environment
http://www.kaiinos.com/ http://blog.kaiinos.com/