Welcome to the Life of a Dataset Webinar Series. In this webinar, you will learn about the new resource available on the ICPSR website on “Life of a Dataset.” We’ll talk about ICPSR’s commitment to archiving, the types of data that are the focus of the archive, and the guiding principles for archiving at ICPSR. We will review the training, materials, and support that are available to those who wish to deposit data with ICPSR and how ICPSR enhances deposited data, and we will look at several tools on the ICPSR site that promote the dataset, inform the depositor, and provide an easy gateway to depositing data. We will be joined by Piper Simons, Acquisitions Coordinator. But let’s begin with the “Life of a Dataset.”
The name of this webinar series is derived from a tour of ICPSR’s home during the OR Meeting in 2011. This meeting is held every other year on the campus of the University of Michigan in Ann Arbor. ICPSR invites its campus representatives from around the world to visit here for training. These representatives are known as Official Representatives, or ORs for short. At the last meeting, a tour of the Perry Building, ICPSR’s home, was offered to the ORs. They were introduced to staff who work with data at all levels and to how the staff gathers, prepares, preserves, and disseminates data. Staff from Acquisitions talked about the details of acquiring data and expanding the holdings of the general archive, which is funded through members’ dues. Staff from Data Preservation talked about the responsibility that each member of the staff has to ensure the confidentiality and security of the data. While this is an important part of the Life of a Dataset, it is a shared responsibility throughout the process that data go through prior to dissemination and archiving. Similarly, CNS, Computer and Network Services, plays a very important part in preparing data for dissemination and archiving by building tools and providing expertise in technical areas. In this webinar series, we will talk about confidentiality, data security and preservation, and the tools that are used to take the data from acquisition to dissemination.
Life of a Dataset was transformed into a poster session for the 50th Anniversary Open House held last year at ICPSR. The staff from each of the units in the tour created posters on their unit’s role in the data life cycle. This is the poster created by the Acquisitions unit. At the ORs’ request, a web site has been built with a short description of the Life of a Dataset and the posters, which are downloadable there. The brochure from the tour is also available on the site.
This series of webinars was created in response to requests from the ORs, who wish to be able to share this information with faculty and students on their campuses. When completed, the webinars will reside on the ICPSR YouTube Channel and will be linked from the Life of a Dataset page. Each of the webinars focuses on one of the three stages of the data life cycle: Deposit, Process, and Dissemination. Review new site using link on bar. Point out the request for hard copies of brochures, larger version of thumbnail picture and the link to the posters that can be downloaded and printed. If you need help with this page, please feel free to contact me. My information will be on a slide at the end of the webinar.
This seal, the Data Seal of Approval, 2010, is prominently displayed on our main page at www.icpsr.umich.edu. It means that ICPSR complies with the guidelines of the Data Seal of Approval Board, which was established by DANS – Data Archiving and Networked Services, an institute of the Royal Netherlands Academy of Arts and Sciences (KNAW), and supported by the Netherlands Organization for Scientific Research (NWO). The objectives of the Data Seal of Approval are to safeguard data, ensure high quality, and guide reliable management of research data for the future. To secure the seal, an archive must comply with sixteen guidelines for the application and verification of quality aspects of the creation, storage, use, and reuse of digital research data in the social sciences and humanities. These guidelines are distilled from a number of national and international guidelines for digital data archiving and serve as assurance to data users, data producers, and other repositories that the holder has demonstrated its commitment to maintaining a sustainable data archive. ICPSR was among the first six data repositories internationally to be awarded the Data Seal.
ICPSR seeks data that address current or emerging research topics or statistical techniques, fall within core social science areas, or have demonstrated importance to the social science community. Presently, ICPSR is particularly interested in acquiring the following kinds of data:
Diversity Data - data that foster understanding of the experiences of racial and ethnic minorities and other marginalized peoples living in the U.S.
Complex Data - longitudinal research, survey research, and non-standard types: biological data, administrative records, video data, spatial data, remotely sensed data, and relational databases.
Mixed Method Data - data that support both qualitative and quantitative analyses; data resulting from concurrent, sequential, or conversion mixed method study designs.
Interdisciplinary Data - data from interdisciplinary studies, and data resulting from studies using the research methods of multiple disciplines.
International Data - data originating outside the United States and data that support cross-national, comparative research. We are especially interested in data from countries and regions of the world that do not have a national structure for archiving, disseminating, and preserving research data.
ICPSR created the LEADS database with support from the National Digital Information Infrastructure and Preservation Partnership program at the Library of Congress to document information about the thousands of social science studies that have been conducted over the last 40 years. It is so named because each of the records is a “lead” describing potential data for archiving at ICPSR. The goal of this database is, in part, to document the scope of social science research data that are “at risk” of being lost. The largest share of social science research is conducted with federal support. The National Science Foundation and the National Institutes of Health historically have supported a significant share of social science data collections. By focusing on gathering information from grant awards made by NSF and NIH, it is possible to identify many of the social science data collections that exist today.
Archiving data helps researchers meet the requirements of NIH and NSF data management plans. Data used for secondary analysis are published more widely than data that are not shared. Sharing data extends the research productivity of the primary investigator.
Data Curation - ICPSR enhances and adds value to data by making them easier to use. We also describe data fully for Web discovery and protect respondent privacy.
Long-Term Preservation - ICPSR ensures long-term data availability.
Worldwide Dissemination - ICPSR offers data in the major statistical package formats and online analysis. Usage statistics are available on request.
User Support - ICPSR staff are available to answer questions about downloading and using data.
Levels of Access - ICPSR offers restricted access data services and a secure data enclave in addition to offering data for download from our website.
Aggregation of Publications - ICPSR creates a database of citations based on analyses of your data, which is included in the Bibliography of Data-Related Literature. This database is harvested by social science indexes, such as the Web of Science, which also provides credit to the original investigator.
ICPSR assigns DOIs to each study we hold; we also encourage their use for journal publications and other articles to make it easier for researchers to find relevant work.
A Digital Object Identifier, or DOI, is a unique persistent identifier for a published digital object, such as an article or a study. A DOI also links to an article or study. ICPSR maintains DOIs for data depositors so that the link will always work. On most Web sites, DOIs are clickable objects.
DOIs are also part of an integrated network of linkages between articles and datasets that is maintained by publishers and archives through registration agencies like CrossRef.
The inclusion of DOIs in citations makes it much easier for us to see how a report or dataset generates other research, which in turn assists researchers in demonstrating the value and scientific impact of their work.
ICPSR provides users the ability to analyze data on the Web without downloading entire datasets for use with statistical packages like SAS, Stata, or SPSS. This lets researchers assess whether a dataset is relevant to their needs relatively quickly. We use Survey Documentation and Analysis (SDA) software developed and maintained by the Computer-Assisted Survey Methods Program at the University of California, Berkeley.
SDA allows users to:
• Search for variables of interest in a dataset
• Review frequencies or summary statistics of key variables to determine what further analyses are appropriate
• Review frequencies or summary statistics for missing data
• Produce simple summary statistics for reports
• Create statistical tables and charts from raw data
• Create a subset of cases or variables from a large collection to save time and storage space when downloading to a personal computer
• Browse the electronic codebook
More sophisticated analysis options are also possible in SDA.
More than 750 of ICPSR's datasets are available in SDA.
Bibliography of Data-Related Literature
First published in 2002, ICPSR's Bibliography of Data-Related Literature is a searchable database of more than 60,000 citations of published and unpublished works resulting from analyses of data held in the ICPSR archive.
The project was developed with support from the National Science Foundation. Its goal is to facilitate the use of ICPSR holdings by providing potential users a means to investigate previous research that used ICPSR data.
The Bibliography makes it possible to:
• Identify much of the research that has already been undertaken using a given ICPSR dataset
• Replicate analyses in order to understand, evaluate, and build upon others' findings
• Determine the usage patterns of data resources
• Investigate the life cycle of data and the types of analyses undertaken
• Learn more about methodological issues, some of which are covered solely in the published literature
• Understand the limitations as well as the research potential of the data, by seeing the data in use and reading the observations and findings of other researchers
• Avoid accidentally duplicating, in whole or in part, an analysis that has already been done
• Identify cross-disciplinary implications and uses of the data
Users can access information from the bibliography in two ways. First, the project has a search engine that searches all included citations. Second, each home page for ICPSR datasets contains a link to citations from the bibliography based on the data.
ICPSR regularly searches a variety of databases and journals to find published articles to include in the bibliography.
Social Science Variables Database
The Social Science Variables Database is a searchable database of nearly 2 million variables from approximately 2,600 studies held at ICPSR.
The database allows users to find variables that may be relevant to their research across multiple studies.
The database began in 2003 as a pilot project funded by the National Science Foundation. The continuing additions to the database take advantage of the XML tagging recommended by the Data Documentation Initiative, which makes searches much easier.
In 2010, advanced searches of the database became possible, including geographic and time-period facets, as well as the ability to refine searches by fields or combinations of fields in the variable descriptions.
Utilization report on next page.
Through utilization reports that became available in 2011, ICPSR provides data depositors with information on the usage of their datasets. The reports show how many times a dataset has been viewed and downloaded by data users, as well as information on users' academic status (i.e., faculty, graduate student, or undergraduate) and institutions. Identities of individual users are not disclosed.
This is an example of a utilization report for the Community Supervision in Minnesota, 1990-1992 dataset.
Reinforces open scientific inquiry. When data are widely available, the self-correcting features of science work most effectively.
Encourages diversity of analysis and opinions. Researchers having access to the same data can challenge each other’s analyses and conclusions.
Promotes new research and allows for the testing of new or alternative methods. Examples of data being used in ways that the original investigators had not envisioned are numerous.
Improves methods of data collection and measurement through the scrutiny of others. Making data publicly available allows the scientific community to reach consensus on methods.
Reduces costs by avoiding duplicate data collection efforts. Some standard datasets, such as the General Social Survey and the National Election Studies, have produced thousands of papers that could not have been published if the authors had to collect their own data.
Archiving makes known to the field what data have been collected so that additional resources are not spent to gather essentially the same information.
Provides an important resource for training in research. Secondary data are extremely valuable to students, who then have access to high-quality data as a model for their own work.
ICPSR has prepared the Guide to Social Science Data Preparation and Archiving for researchers who wish to deposit data with ICPSR. It provides detailed information on how to prepare data for deposit and how researchers can ensure access to their data by others in the future. The Guide walks the researcher through each step in the process, from developing a data management plan to creating files and preparing the data for sharing. According to James Jacobs and Charles Humphrey in 2004, “Data archiving is a process, not an end state where data are simply turned over to an archive at the conclusion of a study. Rather, data archiving should begin early in a project and incorporate a schedule for depositing products over the course of a project’s life cycle and the creation and preservation of accurate metadata, ensuring the usability of the research data itself. Such practices would incorporate archiving as part of the research method.” This diagram illustrates key considerations in archiving at each step in the data creation process. The actual process may not be as linear as the diagram suggests, but it is important to develop a plan to address the archival considerations that come into play across all stages of the data life cycle. The ICPSR Summer Program offers a workshop on the principles of data management that covers the creation of data management plans, effective data documentation practices, and maintaining the confidentiality of data. If a depositor has questions during the process, staff are available to provide assistance. Just contact “Deposit” or the Acquisitions Coordinator. Contact information for both of these will be on the last slide.
ICPSR is especially interested in securing data that are in formats at risk of being lost. Those types include legacy formats, which generally means punched cards and magnetic tapes. Here is an example of each. The cards are general
ICPSR owns and maintains a magnetic tape reader and a punch card reader for data stored in these formats. The picture on the left is a magnetic tape drive and the picture on the right shows a punch card reader in the center. The person writing a program would punch everything on cards using a keypunch machine and load all of the cards into a feeder tray that would be snapped into the top of the card reader. These may not be the same models of tape drive and card reader, but ICPSR has one of each on site. ICPSR is especially interested in retrieving data in these formats and will work with you to accomplish that. However, be aware that the media themselves, the cards and tapes, may be subject to some degradation because of their age. The tapes are especially vulnerable to decline. In the best-case scenario, all of the documentation and data will be retrieved, processed, and disseminated. If not all of the data can be retrieved, we will archive what can be retrieved until such time as new techniques for retrieval emerge. If the documentation is missing, the data can be archived, but not disseminated, until the documentation can be located. Please contact the Acquisitions Coordinator for more information.
Go to the Deposit Form
The Data Deposit Form is the secure method that ICPSR has developed to accept deposits to all archives. It gathers all relevant information on the data and allows the documentation and the data to be uploaded to ICPSR. The URL here links to the page that contains the link to the form and has detailed information on each of the steps on the deposit form. Here’s the link to access the deposit form; this page also has information on what you should have ready before starting the deposit form. The form is secure, and to access it you will need to log into your MyData account. If you are depositing legacy materials, you are directed to send an email to the unit. Also, if you are depositing legacy materials, you still need to complete the data deposit form.
From there the page explains the information you will need to have at hand to complete the deposit form. It also specifies the appropriate data formats for depositing data. The page asks the question “What if I need my data publicly available quickly?” If you have a request from a publisher for access to the data immediately, the data can be deposited into the Publications Related Archive, a free and public archive. The data held in this archive are not processed or enhanced as data in other ICPSR archives are. We accept the data and documentation and post them as submitted in the PRA. Also, ICPSR does not offer support for users of data in the Publications Related Archive.
Let’s click on the link to go to the form.
Data for all ICPSR archives are accepted through this form. In the first step, the depositor chooses the archive in which to deposit the data. OPEN THE DROP DOWN. Then you start entering your information about the data. Once you work through the form, sign the deposit agreement, and submit the files for deposit, you will receive a confirmation of your deposit.
Page 1 is in the center and outlined in green.
Page 2 is on the bottom left and outlined in yellow.
Page 3 is on the bottom right and outlined in dark blue.
This is a very long web page, so I divided it into 4 parts and colored the edges differently.
Page 1 is the very top and has a logo at the top.
Page 2 is to the left and below and has a peach-colored outline.
Page 3 is beneath and above page 2 and to the right and outlined in yellow.
Page 4 is beneath page 3 and outlined in dark blue. The Deposit Number is generated when the first page is submitted and is displayed on this page.
If you have questions about the process of archiving and depositing data with ICPSR, please contact Piper for help. If you wish to receive a hard copy of the Guide to Social Science Data Preparation and Archiving, please contact User Support. If you have questions about the Life of a Dataset, please contact me. And just a reminder that this webinar will be available through the Life of a Dataset website and on the ICPSR YouTube Channel.
The next webinar in this series will be on the Processing aspect of the Life of a Dataset and will be held on Wednesday, January 23, at 1 pm. The link to register for that event is included in the invitation for this webinar. Thank you for joining us today!
Welcome! Life of a Dataset: Deposit Wednesday, January 16, 2013
Program Outline
• Life of a Dataset
  – Posters
  – New Web site
• ICPSR
  – A Trusted Repository
  – Depositing data at ICPSR
    • Goal of archive
    • Depositing legacy data
    • Depositing electronic data
Life of a Dataset
• A tour of the Perry Building, home of ICPSR, offered during the OR Meeting, October 5, 2011.
• The goal of the tour was to familiarize ORs with how ICPSR acquires, archives and disseminates a typical study.
• During the tour, attendees visited Acquisitions, General Archive, Data Preservation, CNS, and Collection Delivery, met members of the staff and learned about each unit’s role in bringing data to our users.
Life of a Dataset – the Posters
• Poster Session at the 50th Anniversary Open House and Celebration
Life of a Dataset – the Web site
• www.icpsr.umich.edu/icpsrweb/content/datamanagement/life-of-dataset.html
ICPSR: A Trusted Digital Repository www.datasealofapproval.org
Data at ICPSR
• ICPSR seeks data that
  – have demonstrated importance to the social science community;
  – support its mission;
  – are in core social science substantive areas;
  – are useful in utilization of current and emerging research and statistical techniques.
• Focus of archive
  • Diversity Data
  • Complex Data
  • Mixed Method Data
  • Interdisciplinary Data
  • International Data
• LEADS Database
Features of Data Archived at ICPSR
• Data Curation
• Long-term Preservation
• Worldwide Dissemination
• User Support
• Levels of access
• Aggregation of Publications
Additional Services for Deposited Data
• Digital Object Identifiers (DOI)
• Online Analysis
• Bibliography of Data-Related Literature
• Social Science Variables Database
• Utilization Reports
BENEFITS OF DATA SHARING FOR THE RESEARCH COMMUNITY
• Reinforces open scientific inquiry.
• Encourages diversity of analysis and opinions.
• Promotes new research and allows for the testing of new or alternative methods.
• Improves methods of data collection and measurement through the scrutiny of others.
• Reduces costs by avoiding duplicate data collection efforts.
• Archiving makes known to the field what data have been collected.
• Provides an important resource for training in research.
Preparing Data for Deposit
• Guide to Social Science Data Preparation and Archiving, 5th Edition
  • www.icpsr.umich.edu/icpsrweb/content/deposit/guide
  • Also available in hard copy upon request.
• Summer Program Workshop on Data Curation
  • http://www.icpsr.umich.edu/icpsrweb/sumprog/index