The document discusses a presentation about using data from the Inter-University Consortium for Political and Social Research (ICPSR). It provides an overview of ICPSR, including its history as one of the oldest social science data archives, current holdings of over 8,200 studies and 68,700 datasets, and membership of about 735 institutions worldwide. The presentation demonstrates how to search ICPSR's database to find relevant datasets, download data, and conduct basic analysis using the online tools. Attendees are encouraged to explore ICPSR's website and resources for additional information and support.
1. ICPSR Data Services @ Kenyon College
Frederique Laubepin, Instructional Resources
David Thomas, Resource Center for Minority Data
2. Agenda
• Intro of the day and agenda
• ICPSR in the Kenyon Classroom
– Professor Corker
– Professor Kohlman
• ICPSR Background
• Searching ICPSR Holdings
• ICPSR and data in the classroom
• Questions?
3. What is ICPSR? Then and Now
• One of the world’s oldest and largest social science data archives, est. 1962
• Data distributed on punch cards, then reel-to-reel tape, now:
– Data available on demand
– Over 8,200 studies with over 68,700 data sets
• Began as a membership organization of 21 universities, now:
– Currently about 735 members worldwide
– Federal funding of public collections
4. What We Do – It’s About Data!
• Seek research data and pertinent documents from researchers (PIs, research agencies, government)
• Process and preserve the data and documents
• Disseminate data
• Provide education, training, & instructional resources
5. Why People Use ICPSR
• Write articles, papers, or theses using real research data
• Conduct secondary research to support findings of current research or to generate new findings
• Use as intro material in grant proposals
• Preserve/disseminate primary research data
– Fulfill data management plan (grant) requirements
• Study or teach quantitative methods
6. “Shopping” for Data: The MyData Account
• MyData account: handles authentication and works like a shopping cart!
• Authenticate on campus once every six months, and you can carry your account with you
7. Supporting the Data
• Free user support
• The HELP Page offers:
– User support (at ICPSR): email and phone contact information
– Data User Help Center: Short Tutorials & Webinars available 24/7
– Local Support: Who to contact at your local institution
– Glossary of Terms
– Social Networks: Where you can find us on YouTube, Facebook, Twitter, SlideShare, and more
8. The Bibliography of Data-related Literature
It’s really a searchable database . . .
containing over 62,500 citations of known published and unpublished works resulting from analyses of data archived at ICPSR
. . . that can generate study bibliographies associating each study with the literature about it
. . . included in the integrated search on the ICPSR Web site
11. Assessing the data in the collection
• Searching for and Downloading data
• Simple crosstab
• Codebook
• Full Descriptives
• Online Analysis Functionality
12. Study Search Behaviors
In practice, we encounter three typical search behaviors from our users:
• A user has a research question in mind.
• A user is looking for a dataset that contains
specific variables.
• A user is looking for a specific dataset and has
the study title or investigator name.
14. Natural Language Searching
• Does juvenile drug use lead to delinquency?
• juvenile “drug use” delinquency
• “juvenile drug use” delinquency
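The difference between keyword and phrase searching can be illustrated with a toy matcher. This is a minimal Python sketch, not ICPSR's or Solr's actual matching logic: it assumes a simple rule in which every loose term and every quoted phrase must appear in a study title, and the three titles in `DOCS` and all function names are invented for illustration. It shows why quoting "juvenile drug use" as one phrase returns fewer matches than quoting only "drug use".

```python
import re

# Three invented study titles, for illustration only (not real ICPSR studies).
DOCS = [
    "Drug use among juvenile offenders and delinquency outcomes",
    "Juvenile drug use and delinquency in urban schools",
    "Predicting delinquency from adolescent drug use",
]


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def parse_query(query: str) -> tuple[list[list[str]], list[str]]:
    """Separate double-quoted phrases from the remaining loose terms."""
    phrases = [tokens(p) for p in re.findall(r'"([^"]+)"', query)]
    loose = tokens(re.sub(r'"[^"]+"', " ", query))
    return phrases, loose


def contains_phrase(words: list[str], phrase: list[str]) -> bool:
    """True if the phrase's words appear consecutively and in order."""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def matches(query: str, document: str) -> bool:
    """Assumed AND semantics: every phrase and every loose term must be present."""
    words = tokens(document)
    phrases, loose = parse_query(query)
    return all(contains_phrase(words, p) for p in phrases) and all(
        t in words for t in loose
    )


for q in ['juvenile "drug use" delinquency', '"juvenile drug use" delinquency']:
    hits = [d for d in DOCS if matches(q, d)]
    print(f"{q!r}: {len(hits)} match(es)")
```

With these invented titles, the first query matches two titles and the second matches only one, because the longer phrase must appear word for word. A real engine such as Solr ranks results by relevance rather than applying a strict yes/no filter, so this only models the narrowing effect of phrases.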
17. Search Conclusion
• Sorting by “Variable Relevance”:
– ranks the variable text (questions, labels, categories) highly so that top results contain the variable concepts
– displays matching variables on the search results screen
– allows you to check variables and compare them side-by-side
– provides direct links to the full variable description
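The “Variable Relevance” sort can be pictured as field weighting: the same query terms are scored against the study title, subject terms, and variable text, and the variable sort shifts weight toward the variables. The Python sketch below assumes that model; the weights, the two studies, and all function names are hypothetical, chosen only to show how changing the weights can reorder the same results.

```python
import re
from dataclasses import dataclass


@dataclass
class Study:
    title: str
    subject_terms: list[str]
    variable_text: list[str]  # question text, variable labels, category labels


# Hypothetical weights for illustration; these are not ICPSR's actual values.
METADATA_WEIGHTS = {"title": 3.0, "subjects": 2.0, "variables": 1.0}
VARIABLE_WEIGHTS = {"title": 1.0, "subjects": 1.0, "variables": 5.0}


def words(text: str) -> set[str]:
    """Lowercased set of alphabetic words in a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def hits(term: str, texts: list[str]) -> int:
    """Number of strings in texts that contain term as a whole word."""
    return sum(term in words(t) for t in texts)


def score(study: Study, terms: list[str], weights: dict[str, float]) -> float:
    """Weighted count of term matches across title, subject terms, and variables."""
    total = 0.0
    for term in (t.lower() for t in terms):
        total += weights["title"] * (term in words(study.title))
        total += weights["subjects"] * hits(term, study.subject_terms)
        total += weights["variables"] * hits(term, study.variable_text)
    return total


def rank(studies: list[Study], terms: list[str], weights: dict[str, float]) -> list[Study]:
    """Studies sorted from highest to lowest weighted score."""
    return sorted(studies, key=lambda s: score(s, terms, weights), reverse=True)


def matching_variables(study: Study, terms: list[str]) -> list[str]:
    """Variables to display on the results screen: those matching any term."""
    wanted = {t.lower() for t in terms}
    return [v for v in study.variable_text if wanted & words(v)]


# Two invented studies for illustration only.
STUDIES = [
    Study("Race and Voting Behavior Survey", ["race", "voting behavior"],
          ["Party affiliation", "Voted in last election"]),
    Study("Health Survey", ["health"],
          ["Respondent age", "Respondent gender", "Respondent race"]),
]
TERMS = ["age", "gender", "race"]

print([s.title for s in rank(STUDIES, TERMS, METADATA_WEIGHTS)])
print([s.title for s in rank(STUDIES, TERMS, VARIABLE_WEIGHTS)])
print(matching_variables(STUDIES[1], TERMS))
```

Under `METADATA_WEIGHTS`, the study with “race” in its title and subject terms ranks first; under `VARIABLE_WEIGHTS`, the study whose variables cover age, gender, and race ranks first, and `matching_variables` lists the variables that would be shown for side-by-side comparison.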
18. • Separating search terms with commas treats each as a distinct variable/concept.
– drug abuse, gender, race
– newspaper in home, voting, party affiliation
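One way to picture what the commas do is sketched below in Python. It assumes that each comma-separated part of the query becomes a separate concept and that a study covers a concept when some single variable contains all of that concept's words. This is an illustrative model, not ICPSR's implementation; the function names and the variable labels are hypothetical.

```python
import re


def words(text: str) -> set[str]:
    """Lowercased set of alphabetic words in a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def split_concepts(query: str) -> list[set[str]]:
    """Treat each comma-separated part of the query as one concept."""
    return [words(part) for part in query.split(",") if words(part)]


def concepts_covered(variables: list[str], concepts: list[set[str]]) -> int:
    """Count concepts matched by at least one variable containing all their words."""
    return sum(any(c <= words(v) for v in variables) for c in concepts)


# Invented variable labels for one hypothetical study.
VARIABLES = ["Drug abuse in past year", "Gender of respondent", "Race of respondent"]

with_commas = split_concepts("drug abuse, gender, race")
without_commas = split_concepts("drug abuse gender race")
print(len(with_commas), concepts_covered(VARIABLES, with_commas))
print(len(without_commas), concepts_covered(VARIABLES, without_commas))
```

With commas, the query yields three concepts, each matched by a different variable. Without them, all four words form a single concept that no one variable contains, so under this model the study covers nothing.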
38. For More Info:
• Explore the website - www.icpsr.umich.edu
• Sign up for our email announcements -
www.icpsr.umich.edu/icpsrweb/membership/lists/index.jsp
• “Like” ICPSR on Facebook/follow ICPSR on Twitter
• Attend or view our webinars (open to the public!)
• Find our presentations on www.slideshare.net – user: icpsr
• Contact user support – netmail@icpsr.umich.edu
Editor's Notes
As of September 2012, over 68,700 datasets (over 585,000 files) were available for download. To give a sense of download volume, over 1,172,304 datasets were downloaded or accessed in FY 2012 (4,765,641). Also in FY 2012, about 35,345 MyData accounts (19,600 members) were active, meaning they downloaded or accessed something.
ICPSR supports students, faculty, researchers, and policymakers.
As you’ve seen, ICPSR doesn’t just deliver data. We surround that data with tools and services that support its use and interpretation.
What’s in the collection?
- Resources using data in the ICPSR holdings as the primary data source
- Resources using ICPSR data in a comparison with the primary dataset investigated
- Resources "about" an ICPSR dataset or study series
Know of reports, articles, publications connected to our data? Contact us!
We have several different use cases for our citation search:
- Users who are looking for facts/tables rather than raw data
- Users who want to see how a particular dataset has been used
- Researchers and funding agencies who want to gauge the impact of a particular study by seeing how often it has been cited
The citation search is somewhat limited in that it searches only the citation, not the full text. Because recent court rulings in the Google copyright case favor indexing copyrighted content and showing users snippets on the search results page, we are investigating the possibility of expanding this service to include full text.
One of the strengths of this utility is that we provide links to full text whenever possible. We retain DOIs when we can locate them, and we provide dynamic links to WorldCat and Google Scholar when we don’t know the specific location of the article/report, making it that much easier for users to get to the full text.
Our study search indexes the metadata record that ICPSR creates, as well as the full text of the documentation files, including all the variable markup and question and answer text. In the study search, the metadata record is heavily weighted, especially the study title and the subject terms.
In practice, we encounter three typical search behaviors from our users, and they dictate which search options best meet the user’s needs.
To better understand user search behaviors, we’re going to change the Find & Analyze Data site to see how often each of these three styles is used. We’re adding a short survey that adjusts the search based on the user’s selection and displays relevant search tips. Those choices will also be recorded in Google Analytics so we can see what percentage of our users favors each style. We hope that in six months or so we’ll have the data to know where to focus our efforts to improve the search. [Briefly demonstrate how the survey choices cause different search tips to appear. Explain that the second option changes the sort to “variable relevance.”]
Solr has natural language search capabilities. For some users, finding the right dataset is as simple as typing in the research question. Unfortunately, poor search engines are so common that users seldom enter more than one or two query terms, assuming that additional terms would narrow the results to nothing. The challenge for us is educating our users and making sure our metadata is sufficient to the task.
For example, let’s do a natural language search on the NACJD website using a simple research question, such as “Does juvenile drug use lead to delinquency?” The results are pretty good and tend to be better than those of concise keyword searches such as “juvenile ‘drug use’ delinquency,” which returns studies that match only two of the three concepts or that focus on predicting delinquency rather than on delinquency itself. If I searched for “juvenile drug use” as a phrase, I’d get only one result. That’s one of the big limitations of phrase searching. [Go to the front page of NACJD and search on the question “Does juvenile drug use lead to delinquency?”]
One relatively recent addition to our website is the variable relevance sort. When you run a search and then sort by variable relevance, we weight the variables more heavily and display matching variables on the search results page, along with instructions to separate different concepts with commas. This makes it relatively easy for a user to find a dataset that has a specific combination of variables.
The variable display also includes a checkbox beside each variable, and our “compare variable” utility lets users view selected variables side by side. [Search on age gender race, then sort by Variable Relevance. Use the tool to add commas. Briefly demo the compare variables function.]
If a user is searching for a particular investigator, the search is relatively easy unless the person’s name is very common (e.g., Smith). Our site offers a “Browse by Author” page to make the task easier and to let users see variations on the name if they run into trouble. Site visitors can use quotation marks for phrase searching, but this can be problematic if the user is mistaken about the word order or gets a word in the title wrong (for example, “study” versus “survey”). In general, searching by title is very effective, and we don’t need to give the user additional guidance.
The new search function is simple and straightforward. When searching for studies that have specific variables, separate your concepts with commas and choose “Variable Relevance” as your sort option on the search results screen. If you have comments or suggestions, please email web-support@icpsr.umich.edu.