ICPSR Data Sharing


This is Part II of a workshop presented by ICPSR at IASSIST 2011. This section focuses on data sharing of publicly available data.

  • DOJ, BJS, SAMHSA, National Institute on Aging, Robert Wood Johnson, Fenway Institute, NCAA. International Archive of Education Data taken down: Several years ago ICPSR discontinued support for the International Archive of Education Data (IAED) Web site. The site was sponsored through a contract with the National Center for Education Statistics, which was not renewed. All ICPSR data collections included in this site are still available to the membership (via the ICPSR Web site), and we will continue to respond to user questions about these data.
  • Why might you organize your collections in this way?
  • Note: the image does not capture all projects. Our data collections tend to have a unique purpose and sometimes unique ways of serving their particular audiences. They share a common hub, ICPSR, with the primary benefit being a common approach to processing, preserving, and, very importantly, surrounding data with metadata, including citations, the SSVD, and documentation. What is important to note is that as you consider how you might serve your data community and particular audiences within it, you need to think not only about processing and tagging data in a common way, but also about how you will staff your collection and what your outreach will look like. Let’s take a look at what this means. . .
  • If your archive or collection wants to assist non-researchers in this way, remember that you will need staffing strategies that provide hands-on service and training to novice data users: staff who not only understand the data but are also patient instructors for analysts who may not be well trained. Dedicated User Support!
  • DSDR partners: Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) Demographic & Behavioral Sciences (DBS) Branch; Carolina Population Center (CPC); Hopkins Population Center (HPC); Michigan Population Studies Center (PSC); Minnesota Population Center (MPC); RAND Population Research Center. Resource explanation: DSDR provides resources to demographic data producers and users, including confidentiality and disclosure review, restricted data contract development and data dissemination, a searchable index of important demography and population study data, and a catalogue of publications that use the indexed data.
  • Research Connections is funded by the Office of Child Care and the Office of Planning, Research and Evaluation, Administration for Children and Families, in the U.S. Department of Health and Human Services. In addition to data, RC provides reports (tagged with metadata), research opportunities (grant announcements), recurring training opportunities, and announcements relevant to this particular research and policy community. “Community engagement” is RC’s goal, and staff for this project must actively engage the community to meet that goal.
  • This collection has a specific goal in sharing its data: policy development. It is another collection whose core audience includes individuals who are not research scientists (including administrators and the media) and who will require more instruction on data use.
  • This is a foreshadowing (teaser!) of the last part of this workshop, Data Management, where we’ll provide insights on data sharing in secure environments and on administering a growing number of restricted data contracts. Let’s take a look at some of these collections.
  • Data preparation and processing are largely handled offsite. ICPSR serves as the Web host, and Fenway uses our processes and infrastructure as well as our distribution capabilities, including the ability to share its data with over 700 member institutions.
  • An initiative of the Office of Applied Studies, Substance Abuse and Mental Health Services Administration (SAMHSA), of the United States Department of Health and Human Services. The audience here prompted us to upgrade our online analysis approach from a simple data exploration tool to one that could handle more advanced statistical analysis, like that of SPSS or SAS. It also prompted the development of an online tool that prohibits the display of results when a sample size is too small and could therefore increase disclosure risk. This is known as Secure SDA. Currently, this collection is leading our efforts to develop the VDE, or virtual data enclave (more on that soon!), which allows a research scientist to analyze sensitive data virtually rather than, for example, coming physically to our enclave.
  • National Institute on Drug Abuse. Another automated system, RCS (another topic to be discussed shortly), has become necessary because the data in this archive, and much of the data coming in via DSDR, are restricted. That is, a researcher must submit his/her research team composition, obtain IRB approvals, provide data security plans, etc. before we can release the data to the researcher, either within the VDE or via removable media. To date, administration of our restricted contracts, which require periodic tracking of the research team until the data have been destroyed, has been manual. With the increase in the volume of contracts, an automated system needed to be developed.
  • IFSS offers data and tools for examining issues related to families and fertility in the United States spanning five decades. IFSS encompasses the Growth of American Families (GAF), National Fertility Surveys (NFS), and National Surveys of Family Growth (NSFG), as well as a single dataset of harmonized variables across all ten surveys. Analytic tools make it possible to quickly and easily explore the data and obtain information about changes in behaviors and attitudes across time. The Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD)
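
The SAMHDA note describes Secure SDA withholding output when a sample is too small. The general idea, small-cell suppression, can be sketched as follows. This is a minimal illustration only: the 10-case threshold and the function names are hypothetical, and the note does not say which rule Secure SDA actually applies.

```python
# Hypothetical small-cell suppression sketch; MIN_CELL_SIZE is an assumed
# value for illustration, not Secure SDA's actual rule.
MIN_CELL_SIZE = 10


def suppress_small_cells(table, min_n=MIN_CELL_SIZE):
    """Return a copy of a {category: count} table in which any count
    below min_n is replaced by None (i.e., withheld from display)."""
    return {
        category: (count if count >= min_n else None)
        for category, count in table.items()
    }


def render(table):
    """Format a (possibly suppressed) table, masking withheld cells."""
    lines = []
    for category, count in table.items():
        shown = "suppressed" if count is None else str(count)
        lines.append(f"{category}: {shown}")
    return "\n".join(lines)


if __name__ == "__main__":
    crosstab = {"18-25": 42, "26-34": 7, "35+": 120}
    print(render(suppress_small_cells(crosstab)))
```

A real disclosure-control rule would need more than this: if a row total is also displayed, a single withheld cell can be recovered by subtraction, so complementary suppression of at least one other cell is usually required.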


  • 1. ICPSR AT 50: Facilitating Research and Data Sharing
    Part II: Data Sharing
    IASSIST Vancouver, BC
    May 31, 2011
  • 2. “Public” Data Sharing begins at 10:45
  • 3. ICPSR’s Public Data
    Sharing Public Data - Agenda
    2010 US Census
    ICPSR’s “Public” Archives
    From the Office of Management and Budget (OMB) Policy Directive published in the Federal Register, Vol. 73, No. 46, Friday, March 7, 2008, Notices, pp. 12622-12626:
    “When appropriate to facilitate in-depth research, and feasible in the presence of resource constraints, statistical agencies should provide public access to microdata files with secure safeguards to protect the confidentiality of individually-identifiable responses and with readily accessible documentation, metadata, or other means to facilitate user access to and manipulation of the data.”
  • 6. U.S. CENSUS DATA – 2010: KEY DATES
    National Census Day: 1 April 2010
    April - July 2010: Census takers visit households that did not return a form by mail
    December 2010: By law, the Census Bureau delivers population information to the President for apportionment
    March 2011: By law, the Census Bureau completes delivery of redistricting data to states
  • 7.
    American FactFinder (AFF) is an online source for population, housing, economic and geographic data that presents the results from four key data programs:
    • Decennial Census of Housing and Population - 1990 and 2000
    • Economic Census - 1997, 2002, and 2007
    • American Community Survey - 1-Year Estimates and 3-Year Estimates
    • Population Estimates Program - July 1, 2006 to July 1, 2009
    Results from each of these data programs are provided in the form of data sets, tables, thematic maps, and reference maps. 
  • 13.
    Direct File Access through Download FTP Center at Census Bureau
    Free Access to all PUBLIC-USE DATA FILES
    First Release of Data (February – March 2011)
    2010 Census Redistricting Data Summary File (P.L. 94-171):
    State and sub-state population counts down to the block level for the total population and the population 18 years and over: by 63 race groups, and by Hispanic or Latino and not Hispanic or Latino origin by 63 race groups
    State and sub-state housing unit counts down to the block level by occupancy status (occupied units, vacant units)
    Quickly followed by (April 2011):
    National Summary File of Redistricting Data: Contains the same data tables as the state files, but the geographic levels include the U.S., regions, divisions, other areas that cross state boundaries, and a small subset of the geographic areas shown in the state files.
    SUMMARY FILE 1 (SF 1):
    This file shows detailed tables on age, sex, households, families, relationship to householder, housing units, detailed race and Hispanic or Latino origin groups, and group quarters. Most tables are shown down to the block or census tract level. Some tables are repeated for nine race/Hispanic or Latino origin groups. The nine groups are (1) White alone, (2) Black or African American alone, (3) American Indian and Alaska Native alone, (4) Asian alone, (5) Native Hawaiian and Other Pacific Islander alone, (6) Some Other Race alone, (7) Two or More Races, (8) Hispanic or Latino, and (9) White alone, Not Hispanic or Latino. (Release: June-August 2011)
    The SF 1 National Update File contains the same data tables as the state files, but the geographic levels include the U.S., regions, divisions, and other areas that cross state boundaries. (Release: November 2011)
    The SF 1 Urban/Rural Update File provides users with urban/rural population and housing unit counts (down to block) and characteristics for urbanized areas and urban clusters. (Release: October 2012)
    The SF 1 Redefined Core Based Statistical Areas Update File contains the same data tables as the state files for redefined CBSAs as defined by OMB following the 2010 Census. (Release: August 2013)
    SUMMARY FILE 2 (SF 2):
    This file shows detailed tables on age, sex, households, families, relationship to householder, housing units, and group quarters. Most tables are shown down to the census tract level. Tables are repeated by 141 race groups, 98 American Indian and Alaska Native tribes/tribal groupings, and 39 Hispanic or Latino origin groups. For any of the tables for a specific group to be shown in SF 2, the data must meet a minimum population threshold: the tables are repeated for a group only if there are at least 100 people of that group in a particular geographic area. (Release: December 2011-April 2012)
    The SF 2 National Update File contains the same data tables as the state files, but the geographic levels include the U.S., regions, divisions, and other areas that cross state boundaries. (Release: May 2012)
    The SF 2 Urban/Rural Update File provides users with urban/rural population and housing unit counts (down to census tract) and characteristics for urbanized areas and urban clusters. (Release: January 2013)
    Congressional District Summary File – This file is a re-tabulation of Summary File 1 for newly redistricted Congressional Districts for the 113th Congress. State-based files will be released in January 2013 and every 2 years thereafter for states where congressional redistricting occurs.
    State Legislative District Summary File – This file is a re-tabulation of Summary File 1 for State Legislative Districts drawn following the 2010 Census. State-based files will be released in June 2013 and every 2 years thereafter for states where legislative redistricting occurs.
    American Indian and Alaska Native (AIAN) Summary File – This is a national-level file showing the same content as Summary File 2. Tables are repeated for the total population, the total AIAN population, the total American Indian population, the total Alaska Native population, and for numerous American Indian and Alaska Native tribes. For any of the tables for a specific group to be shown, the data must meet a minimum population threshold of at least 100 people of that specific group in a particular geographic area. (Release: April 2013)
    Public Use Microdata Sample (PUMS) Files – The PUMS files contain state-level 2010 Census data consisting of individual records for a 10 percent sample of people and housing units. Data will be included for age, sex, race, Hispanic or Latino origin, household type and relationship, and tenure, with identifying information removed, for PUMAs of 100,000 or more population. (Release: TBD)
    Of lesser importance than 2000?
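
The 100-person rule stated above for SF 2 and the AIAN Summary File can be sketched as a filter over per-area group counts. This is a minimal illustration, assuming the counts arrive as nested dictionaries; the function name, the tract and group labels, and the counts themselves are hypothetical, not taken from any Census file.

```python
# Minimum group population stated for SF 2 and the AIAN Summary File.
MIN_GROUP_POPULATION = 100


def groups_with_tables(group_counts, threshold=MIN_GROUP_POPULATION):
    """Given {geography: {group: population}}, return {geography: [groups]}
    listing (sorted) the groups whose tables would be repeated there."""
    result = {}
    for geography, counts in group_counts.items():
        result[geography] = sorted(
            group for group, population in counts.items()
            if population >= threshold  # "at least 100" includes exactly 100
        )
    return result


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    counts = {
        "Tract A": {"Group 1": 250, "Group 2": 99, "Group 3": 100},
        "Tract B": {"Group 1": 40},
    }
    print(groups_with_tables(counts))
```

Note that a geography can end up with no group tables at all (Tract B), which is why SF 2 coverage for small groups is much thinner than for SF 1.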
  • 22. Decennial Census
    In Census 2000, the Census Bureau used two forms:
    “short” form – asked for basic demographic and housing information, such as age, sex, race, how many people lived in the housing unit, and if the housing unit was owned or rented by the resident
    “long” form – collected the same information as the short form but also collected more in-depth information such as income, education, and language spoken at home
    Only a small portion of the population, called a sample, received the long form.
  • 23. 2010 Census and American Community Survey
    • The 2010 Census will focus on counting the U.S. population
    • The sample (long-form) data are now collected in the ACS
    • Puerto Rico is the only U.S. territory where the ACS is conducted
    • The 2010 Census will have a long form for U.S. territories such as Guam and the U.S. Virgin Islands
    • The ACS asks the same “short form” questions
  • American Community Survey: 2008 Content Changes
    • Three new questions
    Health Insurance Coverage
    Veteran’s Service-connected Disability
    Marital History
    • Deletion of one question
    Time and main reason for staying at the address
    • Changes in some wording and format
  • American Community Survey Methodology
    • Sample includes about 3 million addresses each year
    • Three modes of data collection
    mail
    telephone
    personal visit
    • Data are collected continuously throughout the year
  • American Community Survey: Target Population
    • Resident population of the United States and Puerto Rico
    Living in housing units and group quarters
    • Current residents at the selected address
    “Two month” rule
  • 29. American Community Survey: Group Quarters
    • A place where people live or stay that is normally owned or managed by an entity or organization providing housing or services for the residents
    • 2 categories of group quarters: institutional and noninstitutional
  • 31. American Community Survey Period Estimates
    • ACS estimates are period estimates, describing the average characteristics over a specified period
    • Contrast with point-in-time estimates that describe the characteristics of an area on a specific date
    • 1-year, 3-year, and 5-year estimates will be released for geographic areas that meet specific population thresholds
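
To make the period-estimate idea and the population thresholds concrete, here is a small sketch. It assumes the thresholds the Census Bureau documented while all three series were produced (65,000 or more people for 1-year estimates, 20,000 or more for 3-year estimates, all areas for 5-year estimates); those values are supplied here and are not stated on the slide. The `period_average` function is a deliberate simplification: actual ACS estimates come from a pooled, weighted sample, not a plain mean of monthly figures. All names are hypothetical.

```python
# Assumed population thresholds for each ACS series (not from the slide).
THRESHOLDS = {
    "1-year": 65_000,
    "3-year": 20_000,
    "5-year": 0,  # 5-year estimates cover all areas, down to block groups
}

# Length of the collection period behind each series.
PERIOD_MONTHS = {"1-year": 12, "3-year": 36, "5-year": 60}


def available_estimates(population):
    """Return the ACS series published for an area of this population."""
    return [name for name, minimum in THRESHOLDS.items() if population >= minimum]


def period_average(monthly_values, series):
    """Simplified period estimate: the mean over every month in the period,
    in contrast to a point-in-time count taken on a single date."""
    expected = PERIOD_MONTHS[series]
    if len(monthly_values) != expected:
        raise ValueError(
            f"{series} estimate needs {expected} months, got {len(monthly_values)}"
        )
    return sum(monthly_values) / expected


if __name__ == "__main__":
    for pop in (150_000, 30_000, 5_000):
        print(pop, available_estimates(pop))
```

The contrast with Census 2000 is in `period_average`: a decennial count answers "what was true on April 1", whereas the ACS answers "what was typical across these 12, 36, or 60 months".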
  • American Community Survey Data Products Release Schedule
    * Five-year estimates will be available for areas as small as census tracts and block groups.
    Source: US Census Bureau
  • 34. American Community Survey: Data Products
    Data Profiles
    Narrative Profiles
    Comparison Profiles
    Selected Population Profiles
    Detailed Tables
    Subject Tables
    Ranking Tables
    Geographic Comparison Tables
    Thematic Maps
    Public Use Microdata Sample (PUMS) Files
  • 35. American Community Survey: Similarities with Census 2000
    • Same questions and many of the same basic statistics
    • 5-year estimates will be produced for the same broad set of geographic areas, including census tracts and block groups
  • American Community Survey: Key Differences from Census 2000
    • Beginning in 2010, data for small geographic areas will be produced every year versus once every 10 years
    • Data for larger areas are available now, and data for mid-sized areas will be available in December 2008
    • Census 2000 data described the population and housing as of April 1, 2000, while ACS data describe a period of time and require 12, 36, or 60 months of data
  • American Community Survey: Key Differences from Census 2000
    • The goal of the ACS is to produce data comparable to the Census 2000 long form data
    • These estimates will cover the same small areas as Census 2000 but with smaller sample sizes
    • Smaller sample sizes for 5-year ACS estimates result in less reliable estimates
  • Cooperative Agreements
    Close collaboration with the Bureau over the years in making data available to the academic research community.
    Since the 1980s, ICPSR has sought outside funding to work with Census data and has entered into joint statistical agreements with the Bureau to facilitate its distribution and use.
    Importance in 1990: High cost of raw data ($175 per reel of tape; the entire Census comprised about 2,000 tapes, or c. $350,000).
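
The 1990 cost figure on this slide is a straightforward product of the per-reel price and the approximate tape count:

```latex
\text{total cost} \approx \$175\ \text{per reel} \times 2{,}000\ \text{reels} = \$350{,}000
```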
  • 41. Cooperative Agreements
    Data available at no cost to member institutions, without any rights to redistribute or resell.
    Joint annual summer workshops to offer training on the new Census data products.
    One-week training sessions held in 1991-1994 and 2001-2004
    Census Bureau staff participated extensively in these courses
    Attracted both researchers and ICPSR Official Representatives who attended to learn how to provide assistance to faculty and students on their campuses
  • 42. The Decennial In(di)gestion
    Census Data: Collected regularly since the 1960s.
    The number of files and bytes has grown exponentially with every new Census.
    Main reason for the rapid growth in the numbers of data files archived and disseminated by ICPSR.
    How much and how rapid?
  • 43. The Decennial In(di)gestion
    Another access point, focused on the social science research community, to Census data and documentation
    Original Census data available from the 1960s onward as well as special samples created for earlier years
    TIGER Line Files
    American Community Survey
    Many of the newer files are available in a variety of formats:
    ASCII text files
  • 45. Special Census Subsets
    These files report population and housing data for national and specific sub-national geographical entities, for example:
    The entire nation
    Each individual state
    Metropolitan Areas
    Census Tracts
  • 46. Contextual File
    Based largely on Census data
    Provides information at the ‘county’ level in the U.S. (subunits of states numbering more than 3,100 in all)
    Contains data from other government and private sources at the same geographic level
    Under certain circumstances, can be merged with survey data
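
Merging the Contextual File with survey data is, in effect, a many-to-one join on a county identifier. The sketch below assumes both sources carry a five-digit county FIPS code in a field named `county_fips`; that field name, the variable names, and the placeholder codes are hypothetical, and it uses plain Python dictionaries rather than any particular statistical package.

```python
def merge_contextual(survey_records, county_records, key="county_fips"):
    """Attach county-level contextual variables to each survey record.

    survey_records: list of dicts (one per respondent), each with a `key` field.
    county_records: list of dicts (one per county), each with a `key` field.
    This is a left join: respondents whose county is absent from the
    contextual data keep their survey fields and get no contextual fields.
    """
    by_county = {}
    for row in county_records:
        if row[key] in by_county:
            # A many-to-one merge requires exactly one row per county.
            raise ValueError(f"duplicate county {row[key]!r} in contextual data")
        by_county[row[key]] = row

    merged = []
    for record in survey_records:
        context = by_county.get(record.get(key), {})
        combined = dict(context)
        combined.update(record)  # survey values win on field-name collisions
        merged.append(combined)
    return merged


if __name__ == "__main__":
    # Placeholder codes and values for illustration only.
    survey = [{"resp_id": 1, "county_fips": "00001"},
              {"resp_id": 2, "county_fips": "00009"}]
    context = [{"county_fips": "00001", "pct_unemployed": 5.0}]
    for row in merge_contextual(survey, context):
        print(row)
```

The duplicate check matters because a contextual file with two rows for one county would otherwise silently overwrite values; raising an error makes the problem visible before analysis.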
  • 47. Contextual File - 2
    Population by age, sex, race, and Hispanic origin
    Labor force size and unemployment
    Personal income
    Earnings and employment by industry
    Land surface form topography
    Government revenue and expenditures
    Crimes reported to police
    Presidential election results
    Housing authorized by building permits
    Medicare enrollment
    Health profession shortage areas
  • 48. Preservation
    ICPSR provides another location to preserve data and documentation files produced by the Census Bureau
    ICPSR keeps multiple copies of these files both at its home location at the University of Michigan and at other sites in the United States
    Copies are continually checked and updated when necessary
    Considerable interest in historical Census data by demographers, historians, and economists.
  • 49. Current Happenings with ACS and Plans for Census 2010
    Consulted with Collection Development Committee of ICPSR Council:
    Advised to continue ICPSR precedent of acquiring Census 2010 since the membership and the research community in general have traditionally come to ICPSR for their Census data needs.
    Suggestion that the data files need not be archived right away since all public-use data will be available directly from the Census Bureau.
    Emphasis should center on archiving the most important Census data products once it can best be determined that final versions have been created.
    The Committee also suggested that ICPSR consider holding training workshops on Census data once again, as it did during the last decade, and decide how best to finance them within the context of the Summer Program.
  • 50. Current Happenings with ACS and Plans for Census 2010
    Suggestion to study whether SDA functionality could produce subsets of Census data instead of creating specific data products to do so.
    Emphasis placed on partnerships, for example working with the University of Minnesota Population Center and its National Historical Geographic Information System (NHGIS), which is expected to be able to produce subsets of 2010 Census data.
    Determine in general from membership and user community what value-added features might make sense for academic researchers as greater amounts of Census 2010 data become available.
  • 51. Current Happenings with ACS and Plans for Census 2010
    Select files archived at ICPSR beginning with 1996 ACS:
    Emphasis on PUMS files at first
    Greater interest in Summary Files as more data is released and, in particular, with the recent appearance of the first 5-year Estimates File covering calendar years 2005-2009
  • 52. Current Happenings with ACS and Plans for Census 2010
    TIGER files (Topologically Integrated Geographic Encoding and Referencing System)
    2010 extracts containing geographic and cartographic information from the Census Bureau's MAF/TIGER® (Master Address File/Topologically Integrated Geographic Encoding and Referencing) database.
    These files support the 2010 Census Redistricting Data (P. L. 94-171) and the National Summary File of Redistricting Data/Summary File 1 releases.
    The files provide the digital map base for a Geographic Information System or mapping software. The files do not contain any mapping software.
  • 53. Current Happenings with ACS and Plans for Census 2010
    TIGER files (Topologically Integrated Geographic Encoding and Referencing System)
    All legal boundaries and names are as of January 1, 2010. The boundaries shown are for Census Bureau statistical data collection and tabulation purposes only; their depiction and designation for statistical purposes does not constitute a determination of jurisdictional authority or rights of ownership or entitlement.
    The geographic entity codes needed to link the Census Bureau's demographic data to the geography are included in the files. The TIGER/Line Shapefiles do not contain any demographic or economic data; data can be downloaded separately using American FactFinder.
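
The geographic entity codes that link demographic data to TIGER geography are hierarchical. For a 2010 census block, the identifier is the concatenation of the state FIPS code (2 digits), county code (3), census tract (6), and block number (4), 15 characters in all. The helpers below build and split such identifiers; the function names are hypothetical, and the example codes are placeholders rather than a block chosen for its data.

```python
# Fixed widths of each component of a 2010 census block identifier.
WIDTHS = (("state", 2), ("county", 3), ("tract", 6), ("block", 4))


def build_block_geoid(state, county, tract, block):
    """Zero-pad each numeric component to its width and concatenate."""
    parts = {"state": state, "county": county, "tract": tract, "block": block}
    geoid = ""
    for name, width in WIDTHS:
        value = str(parts[name])
        if not value.isdigit() or len(value) > width:
            raise ValueError(f"invalid {name} code: {value!r}")
        geoid += value.zfill(width)
    return geoid


def split_block_geoid(geoid):
    """Inverse of build_block_geoid: return the components as strings."""
    total = sum(width for _, width in WIDTHS)
    if len(geoid) != total or not geoid.isdigit():
        raise ValueError(f"expected {total} digits, got {geoid!r}")
    result, start = {}, 0
    for name, width in WIDTHS:
        result[name] = geoid[start:start + width]
        start += width
    return result


if __name__ == "__main__":
    geoid = build_block_geoid(1, 3, 100, 7)
    print(geoid, split_block_geoid(geoid))
```

Keeping the codes as zero-padded strings, rather than integers, is the important design choice: reading them as numbers drops leading zeros (e.g., state "01"), and the identifiers then fail to match between tabulated data and the shapefiles.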
  • 54. Current Happenings with ACS and Plans for Census 2010
    TIGER files (Topologically Integrated Geographic Encoding and Referencing System)
    Differences between shape files and line files
    Data stored at ICPSR through designated Web site
    Maintain archival copies as older versions of TIGER files cease to be distributed by Census Bureau
  • 55. ICPSR’s Public Archives
  • 56. ICPSR’s Public Archives
    Three Differentiating Characteristics of a “Public Archive”
    Funding Sources
  • 57. Funding Sources & Long Term Access
    ICPSR’s public archives are funded by entities including:
    Government agencies
    Other Organizations
    And if the funding ceases:
    ICPSR commitment to support access
    Access generally reverts to membership-only after some time period
  • 58. Why are Funders using ICPSR?
    An Archive’s Reasons for Being
    Dissemination Infrastructure
    Systems & Search = technology, security, & metadata
    Data Community Base (700 immediate members to share with)
    Community Outreach/engagement expertise
    Fulfillment of Data Management Plan (Grant) Requirements
    Ability to Measure & Report Dissemination Statistics
  • 59. Data Search within our Public Archives
    A search for data/documents from within a public archive defaults to searching materials (data) within that archive
    A strategy to help users narrow their scope
    All materials are publicly available
  • 60. The Relationship Visual
    A common hub, yet each unique
  • 61. NACJD: National Archive of Criminal Justice Data
    Study topic: criminal justice
    Funders: BJS, OJJDP, NIJ
    Unique attribute: staff routinely assist non-researchers (police departments) in data use
  • 62. DSDR: Data Sharing for Demographic Research
    Study topic: demography
    Partnership of several institutions
    Unique attribute: as much a resource for data producers as a mechanism for dissemination
  • 63. NACDA: National Archive of Computerized Data on Aging
    Study topic: Aging – gerontological research
    Funder: National Institute on Aging
    Unique attribute: largest library of electronic data on aging in the US
  • 64. Research Connections: Child Care and Early Education
    Study topic: early education
    Funder: US Dept. of Health & Human Services
    Unique attribute: goal is more than data – to be the destination for child care & early education research
  • 65. NCAA Student-Athlete Experiences Data Archive
    Study topic: intercollegiate athletics and higher education
    Funder: NCAA
    Unique attribute: to assist in the development of national athletics policies
  • 66. Health and Mental Health Collections
    Enhanced sensitivity in the area of disclosure risk
    From ingest of data to storage of data to analysis of data
    Has driven ICPSR, as the hub, to heighten its computing and data sharing environments
    Increasing demand has led to a need to automate, in a secure manner
  • 67. Center for Population Research in LGBT Health
    Partner: Fenway Institute
    Unique attribute: data is processed offsite – ICPSR acts as the host
  • 68. SAMHDA: Substance Abuse & Mental Health Data Archive
    Funder: SAMHSA
    Unique attribute: driving our online services and virtual analysis capabilities
  • 69. NAHDAP: National Addiction & HIV Data Archive Program
    Funder: NIDA
    Unique attribute: driving restricted contract system
  • 70. IFSS: Integrated Fertility Survey Series
    Funder: NICHD
    Unique attribute: data harmonization
  • 71. Let’s Take a Break: Return at 11:45