TerraPop Goals
Lower barriers to conducting interdisciplinary human-environment interactions research by making data with different formats from different scientific domains easily interoperable.
Provide an organizational and technical framework to preserve, integrate, disseminate, and analyze global-scale spatiotemporal data describing population and the environment.
TerraPop in Context
Collaborating Organizations
• Data integration expertise
• Large census and survey
data collections & expertise
• Institutional foundation
• Human-environment
interactions research
expertise
• Environmentally-oriented
data collections & expertise
• Data preservation and
sustainability expertise
• Social science data
collections & expertise
• Major producers and distributors of data on both humans and their environment
• Major producers of tools for integrating and transforming data across formats
• Leaders in preservation and sustainability
Background
• Sustainable Digital Data Access and Preservation Network (DataNet)
  • Provide reliable digital preservation, access, integration, and analysis
  • Anticipate and adapt to technological change and user needs
  • Engage with frontiers of computer/information science and CI
  • Serve as component elements of interoperable data preservation and access network
established in 2009
established in 2011
TerraPop in Context
DataNet Cyberinfrastructure
Curated population and environment data collection
  • Exposed through DataONE, SEAD
  • Extracts exportable to DFC
Integration services
  • Potentially available through DFC, SEAD
  • Open source components and API
• Two domains: population & environment
• Three data structures
  • Microdata
  • Area-level data
  • Rasters
Source Data
Making disparate data formats interoperable
Microdata:
Characteristics of individuals
and households
Area-level data:
Characteristics of places defined
by boundaries
Raster data:
Values tied to spatial
coordinates
Location-Based Integration
Microdata
Area-level data
Rasters
Mix and match
variables originating in
any of the data structures
Obtain output in the
data structure most
useful to you
Location-Based Integration
Individuals and households
with their environmental
and social context
Microdata
Area-level data
Rasters
Age  Sex  Landcover
36   M    Forest
34   F    Forest
11   M    Forest
8    M    Forest
42   M    Grassland
39   F    Grassland
15   F    Grassland
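The idea on this slide, attaching environmental context to person records through their location, can be sketched in a few lines of Python. This is only an illustration, not TerraPop's implementation: the `unit` codes "A" and "B", the `attach_context` helper, and the lookup table are assumptions made for the example. The ages, sexes, and land cover values are the ones shown on the slide.

```python
# Microdata records from the slide. The "unit" code (the geographic unit each
# person lives in) is a hypothetical field added for this sketch.
people = [
    {"age": 36, "sex": "M", "unit": "A"},
    {"age": 34, "sex": "F", "unit": "A"},
    {"age": 11, "sex": "M", "unit": "A"},
    {"age": 8,  "sex": "M", "unit": "A"},
    {"age": 42, "sex": "M", "unit": "B"},
    {"age": 39, "sex": "F", "unit": "B"},
    {"age": 15, "sex": "F", "unit": "B"},
]

# Hypothetical area-level context, e.g. a land cover class summarized per unit.
landcover_by_unit = {"A": "Forest", "B": "Grassland"}


def attach_context(records, context, key="unit", name="landcover"):
    """Return copies of the records with a contextual variable looked up by location."""
    return [{**record, name: context.get(record[key])} for record in records]


enriched = attach_context(people, landcover_by_unit)
for person in enriched:
    print(person["age"], person["sex"], person["landcover"])
```

The location key is the only thing the two sources need to share, which is why the integration hinges on geography.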
Location-Based Integration
Summarized environmental and population
characteristics for administrative
districts
Microdata
Area-level data
Rasters
County ID
G01001
G01003
G01005
G01007
County ID  Mean Ann. Precip.  Median HH Income
G01001     768                50,500
G01003     589                48,500
G01005     867                51,000
G01007     701                50,750
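A minimal sketch of how such an area-level table could be assembled, assuming one summary per source has already been computed for each county. The `join_by_area` helper and the column names are hypothetical; the values are those shown on the slide.

```python
# Per-county summaries, keyed by the county ID from the slide.
mean_ann_precip = {"G01001": 768, "G01003": 589, "G01005": 867, "G01007": 701}
median_hh_income = {"G01001": 50500, "G01003": 48500, "G01005": 51000, "G01007": 50750}


def join_by_area(**tables):
    """Combine per-area summaries into one row per area ID.

    Only IDs present in every table are kept, so each row is complete.
    """
    shared_ids = sorted(set.intersection(*(set(t) for t in tables.values())))
    return [
        {"county_id": area_id, **{name: table[area_id] for name, table in tables.items()}}
        for area_id in shared_ids
    ]


rows = join_by_area(mean_ann_precip=mean_ann_precip, median_hh_income=median_hh_income)
for row in rows:
    print(row)
```

Keeping only the IDs shared by every source is a design choice for this sketch; a real workflow might instead keep all areas and mark missing values.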
Location-Based Integration
Rasters of
population and
environment
data
Microdata
Area-level data
Rasters
Why TerraPop?
• Data
  • Access
  • Preservation
  • Documentation
  • Creation
  • Transformations
Improved Data Access
Preservation
• Data producers have no preservation plan
  • GLI crops data
• Previous versions of data difficult or impossible to find
  • MODIS Land Cover Collection 4 superseded by Collection 5, but Collection 4 is unavailable
Documentation
• Data lacks sufficient (or any) metadata
http://gli.environment.umn.edu/
Documentation
• GLI crops data originally provided through an anonymous FTP site
• No metadata provided with the data files
• So, we wrote it!
http://www.earthstat.org/
http://www.earthstat.org/wp-content/uploads/METADATA_HarvestedAreaYield175Crops.pdf
Abaca – Harvested Area GeoTIFF Metadata
http://www.earthstat.org
Abaca – Harvested Area GeoTIFF Metadata
http://www.terrapop.org
Creation
• Historical subnational GIS data
  • Matched to census data
  • Aligned with most recent GIS data available for a given country
Photographed Countries
Census Bureau Library, Library of Congress, Harvard
Creation
• Historical subnational GIS data
  • Matched to census data
  • Aligned with most recent GIS data available for a given country
• Area-level data
  • Tabulated from census microdata
  • Obtained from census agencies as digital files, PDFs, or HTML tables
Transformations
Area-Level Summary of Raster Data
Continuous      Binary         Categorical
Min             Percent area   Mode
Max             Total area     Number of Classes
Mean
Count
Percent area*
Total area*
* Available for some continuous agricultural rasters
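The summaries listed for each raster type can be illustrated with a small zonal-statistics sketch. It is not TerraPop's code: it assumes the raster and a grid of zone IDs (one administrative unit per cell) have the same shape, that `None` marks no-data cells, and that every cell covers the same area (`cell_area` is a hypothetical parameter; real geographic rasters often need per-cell areas). All function names and sample grids are made up for the example.

```python
from collections import Counter, defaultdict
from statistics import mean


def cells_by_zone(raster, zones):
    """Group raster cell values by the zone ID found at the same grid position."""
    grouped = defaultdict(list)
    for raster_row, zone_row in zip(raster, zones):
        for value, zone in zip(raster_row, zone_row):
            if value is not None and zone is not None:  # skip no-data cells
                grouped[zone].append(value)
    return dict(grouped)


def summarize_continuous(values):
    """Min, max, mean, and cell count for a continuous raster."""
    return {"min": min(values), "max": max(values),
            "mean": mean(values), "count": len(values)}


def summarize_binary(values, cell_area=1.0):
    """Percent and total area of cells equal to 1, assuming equal-area cells."""
    return {"percent_area": 100.0 * sum(values) / len(values),
            "total_area": sum(values) * cell_area}


def summarize_categorical(values):
    """Most common class and number of distinct classes."""
    counts = Counter(values)
    return {"mode": counts.most_common(1)[0][0], "number_of_classes": len(counts)}


# Hypothetical 2 x 3 grids: zone IDs plus one raster of each type.
zones = [["A", "A", "B"],
         ["A", "B", "B"]]
precip = [[700, 800, 600],
          [750, 500, 640]]
cropland = [[1, 0, 1],
            [1, 1, 1]]
landcover = [["Forest", "Forest", "Grassland"],
             ["Grassland", "Grassland", "Grassland"]]

precip_summary = {z: summarize_continuous(v) for z, v in cells_by_zone(precip, zones).items()}
cropland_summary = {z: summarize_binary(v) for z, v in cells_by_zone(cropland, zones).items()}
landcover_summary = {z: summarize_categorical(v) for z, v in cells_by_zone(landcover, zones).items()}
```

Each result is a table with one row per zone, which is the area-level structure that can then be joined to census data by the zone's ID.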
Data in TerraPop
Completed GIS Boundary Files
GIS Boundary Files In Progress
Beta Raster data
• Global Landscapes Initiative (GLI)
  • Yield and harvested area for 175 crops
• Global Land Cover 2000 (GLC2000)
  • Land cover data, circa 2000, derived from the VEGETATION instrument on the SPOT 4 satellite
• WorldClim
  • Climate data describing temperature, precipitation, and bioclimatic variables, created from weather station data collected from approximately 1950-2000
New Raster Data
• MODIS Land Cover Type (MCD12Q1)
  • Yearly land cover data derived from the MODIS Terra and Aqua satellites, available for 2001-2012
  • 500 meter spatial resolution
  • Available in five land cover classifications
    • IGBP
    • University of Maryland
    • LAI/fPAR
    • Net Primary Productivity
    • Plant Functional Type
  • Now available on our staging site
Project Status
• Currently in project year 4
• Prepping a rollout of new data, but you can preview it at http://beta2.terrapop.org
• Prepping a new UI for summer 2015
• Always creating new data!

Terra Populus: Integrated Data on Population and Environment


Editor's Notes

  • #7 DataONE: University of New Mexico, UC Santa Barbara, Oak Ridge National Laboratory, and many more. DC: Johns Hopkins, National Snow and Ice Data Center, UIUC. SEAD: University of Michigan, Indiana University, Rensselaer, UIUC, ICPSR. DFC: UNC Chapel Hill, U of South Carolina, Drexel, Ocean Observatories Initiative. TerraPop: Minnesota Population Center, ICPSR, Institute on the Environment (UMN), CIESIN (Columbia).
  • #11-#14 Integration across domains and formats hinges on geography. Users get any type of data in a format useful to them. This requires boundary files, with boundaries harmonized over time.
  • #16  ----- Meeting Notes (4/28/15 09:34) ----- Have to download tiles of data and piece them together - just identifying the tiles is time consuming!
  • #18 2001 Census of Population, Croatia. The table contains multiple geographic levels. It reports total and female figures, so you have to subtract to get the male figures. Multiple data types (counts, percentages) and multiple dimensions are embedded in the table.
  • #19 Laos Census of Population, 2005. New variables (urban/rural, with and without roads) are embedded in the population by sex and province table.
  • #20 We are converting those HTML and PDF tables to machine-readable data files for end users.
  • #22 This is the site that the Institute on the Environment uses to disseminate its Global Landscapes Initiative crop data.
  • #25 This is their PDF metadata and technical documentation file that you can download. It doesn't follow any metadata standard and isn't machine readable.
  • #26 If you download one of their GeoTIFFs from EarthStat, this is the metadata file that comes with it.
  • #28 This is the metadata file that we provide through TerraPop.
  • #29 During the last few years, we've heard a lot about big data and the data deluge. But there are still some "deserts" in the deluge: key datasets that just don't exist. Subnational administrative boundaries, particularly historical administrative boundaries, are one of those datasets. So we have started creating these boundary files.
  • #33 1974 census of population and housing in Liberia – district map, with codes and names
  • #34 1993 census of population – Gabon – province and department boundaries
  • #41 TerraPop contains four different types of data: census and survey microdata, area-level data describing the characteristics of geographic entities, raster data describing land cover, land use, and climate, and GIS boundary files delineating first and second administrative levels.
  • #42 For area-level data, NEXT SLIDE
  • #44 Here you can see the countries we've completed and those that are in progress. We will continue to fill in this map as we work in more and more countries during the next two years! We have finished GIS boundary construction for 14 additional countries and have an additional 35 countries in progress.