Collecting original research data can be rewarding but time-consuming, and you may not always have the capacity to collect the data you really want. There are other options, however: many existing research datasets, ranging from historical to contemporary and across disciplines, are available to be leveraged for new analysis. This session from the Scholarly Communication Librarian at Sam Houston State University explores finding these datasets, making sense of them, and understanding how you can re-use them for your own research, either alone or in combination with new data.
• UNDATA
• DATA.GOV
• FEDERAL RESERVE ECONOMIC DATA (FRED)
• TEXAS DATA REPOSITORY
• HARVARD DATAVERSE
• INTER-UNIVERSITY CONSORTIUM FOR POLITICAL AND SOCIAL RESEARCH (ICPSR)
• REGISTRY OF RESEARCH DATA REPOSITORIES (RE3DATA)
• ACT UP
• RADAR
• CRAAP
• EVALUATING DATA SETS
Unmute or type in chat: Have you ever run into barriers like these when trying to collect data you wanted?
Demo data.gov – search for obesity
Demo ICPSR – obviously keyword, but also show Find Data -> Find Data -> Topics, Series, and Thematic Collections
The key is to think critically about who collected the data, why, and how. Determine whether there are any problems of bias, inclusion, or completeness that would negatively impact your research purpose.
Any questions about data evaluation before we move on?
(Source: Carnegie Classification, 2021 Update Public File, https://carnegieclassifications.acenet.edu/downloads.php)
(after data dictionary snip) In this particular case, the data dictionary doesn’t take the additional step of defining what the “basic classification” actually is – I would need to refer back to the main Carnegie Classifications website to read the definition for that framework. With well-documented datasets, these definitions will be more integrated into the data documentation so that it is comprehensible without all of these additional references to an external website, which the dataset may actually outlive.
(after codebook snip) Again, in this case, the codebook doesn’t take the additional step of defining what “Doctoral University: Very High Research Activity” means. What characteristics of SHSU resulted in that classification? I would again have to refer back to the main website.
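(optional aside) If anyone asks what "using the codebook" looks like in practice, here is a minimal Python sketch. The column name `basic_class`, the codes, and the labels are all made up for illustration; they are not the actual Carnegie values. The idea is simply that the codebook is a lookup table from stored codes to human-readable labels, and that codes missing from the codebook should stay visible rather than be silently dropped.

```python
# Hypothetical codebook: maps numeric codes in a made-up column to labels.
# These codes and labels are illustrative, not taken from any real dataset.
CODEBOOK = {
    "basic_class": {
        1: "Example category A",
        2: "Example category B",
    }
}


def decode_row(row, codebook):
    """Return a copy of row with coded values replaced by their labels.

    Codes that are missing from the codebook are kept unchanged, so gaps
    in the documentation remain visible in the output.
    """
    decoded = dict(row)
    for column, labels in codebook.items():
        if column in decoded:
            decoded[column] = labels.get(decoded[column], decoded[column])
    return decoded


rows = [
    {"name": "Institution X", "basic_class": 1},
    {"name": "Institution Y", "basic_class": 9},  # code not in the codebook
]
decoded_rows = [decode_row(r, CODEBOOK) for r in rows]
```

An undecoded value like the 9 above is a prompt to go back to the documentation, which is the same step described in the notes when the codebook stops short of a full definition.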
Unmute or type in chat: Have you had a situation where you could have reused existing data like this?
(Source: Carnegie Classification, Downloads page, in page footer - https://carnegieclassifications.acenet.edu/downloads.php)
Any questions on the licensing aspect of this before we move on?
"Author: Name(s) of each individual or organizational entity responsible for the creation of the dataset."
"Title: Complete title of the dataset, including the edition or version number, if applicable."
"Date of Publication: Year the dataset was published or disseminated."
"Publisher and/or Distributor: Organizational entity that makes the dataset available by archiving, producing, publishing, and/or distributing the dataset."
"Electronic Location or Identifier: Web address or unique, persistent, global identifier used to locate the dataset (such as a DOI). Append the date retrieved if the title and locator are not specific to the exact instance of the data you used."