Data Reference Interview
Stuart Macdonald
CISER Data Services Librarian
Email: srm262@cornell.edu
CISER & the Data Reference Interview

  • Data, documentation and associated files (e.g. SAS, SPSS, Stata) are housed on the CISER file server. Files are downloaded from the catalog in ZIP compressed format. Cross-National Time Series data.
  • Undergraduate (UG) – more general enquiries; summary statistics rather than raw data; what they ask for is often not what they really need.
  • Postgraduate (PG) – nature of enquiry more specific; again, more often summary statistics. May be raw data as the PhD progresses. Often data collection may be involved, to be used in conjunction with other sources, visualized etc.
  • Teacher – teaching datasets or sample data, or data subsets (NGO, IGO).
  • Researcher – has a better idea as to what data they need, usually raw data; needs to identify variables, help with codebook / questionnaire. Use of statistical analysis packages, GIS.
  • As CISER is an ICPSR member, researchers can gain access to data held in those CESSDA archives that are themselves ICPSR members. CESSDA member organisations adhere to a Trans-border Data Access Agreement.
  • CISER & the Data Reference Interview

    1. Data Reference Interview. Stuart Macdonald, CISER Data Services Librarian. Email: srm262@cornell.edu
    2. Data Archive - Collection and Services
       • Established over 30 years ago
       • Collection of numeric datasets to support quantitative research: c. 27,000 online files in addition to thousands of studies on CD/DVD
       • Emphasis on demography (state/federal censuses), economics, health, labor, election studies, attitudinal and behavioral studies, family life etc.
    3. • Consulting services to match user needs with appropriate data
       • Finding, accessing and using data
       • Current Cornell researchers can download archive files from the online catalog (search & browse) in formats compatible with statistical software
       • Data files are identified by a ‘traffic light’ icon that indicates usage level:
         • Green – downloadable by anyone
         • Yellow – downloadable from links in the catalog with CUWebAuth authentication (for use within the CISER research computing environment, CISERRSCH); Cornell researchers can apply for a computing account
         • Red – restricted-use data (subject to conditions imposed by the data provider)
       • Cornell Restricted Access Data Center
    4. • Provides Cornell social science researchers with a repository for sharing and providing long-term preservation of their numeric/statistical research data
       • Participates in Cornell’s Research Data Management Service Group
       • Assists Cornell social science researchers with Research Data Management (RDM) plans
       • Provides Cornell social science researchers with support and expertise in obtaining and using restricted data
    5. Believe it or not – not all data are the same …
       • Data means different things to different people (informatics, geography, art history, systems biology, architecture, archaeology etc)
       • The definition and value of data in a commercial sense differ from those in an academic sense
       • Data requirements differ for the undergraduate, postgraduate, teacher, researcher
       • Data catalogs, data libraries, gateways, portals exist for a range of disciplinary domains
    6. Research data may include all of the following:
       • Text or Word documents, spreadsheets
       • Laboratory notebooks, field notebooks, diaries
       • Questionnaires, transcripts, codebooks
       • Audiotapes, videotapes
       • Photographs, films
       • Slides, artifacts, specimens, samples
       • Database contents including video, audio, text, images
       • Models, algorithms, scripts
       • Contents of an application such as input, output, log files for analysis software, simulation software, schemas
       • Methodologies and workflows
       • Standard operating procedures and protocols
       Formats, size, volume, openness, confidentiality, complexity, flat files – all factors to consider as part of the reference interview (computing capabilities, software dependencies, copyright and ethical considerations)
    7. Data Reference Interview - establish what the user actually needs (not what they think they may need!):
       • Statistics or data? Summary statistics, secondary use datasets, raw or derived data
       • Software requirements, contingencies
       • What is the subject or topic? Health, unemployment, deprivation
       • Type of analysis? Visualization, map, statistical analysis, modelling
       • What is the unit of analysis? Individual, family, county-level, country-level
       • Geographic constraints?
       • Time constraints? Range of years, daily, monthly, quarterly, annual
       • Cross-sectional or longitudinal?
       • Data type? Historic, demographic, financial, administrative, geospatial
    8. Sets the goals and structure for the data interview and helps articulate any decisions made by the data librarian
       Establishes the ‘learning stage of the user’ and helps put them at ease
       Observations:
       • Establish time-line for research and data needs (can buy data librarian time, set priorities, allow time for further investigation)
       • Fine balance between assistance and exploitation!
       • Recognition that data finding, data handling etc may be the learning objective itself (e.g. identifying variables and using a codebook)
       • All data queries should be viewed as new. It will soon become evident if the request has similarities with previous enquiries.
    9. • Important not to use too much jargon and to double-check understanding of unfamiliar terms – often we use the same word to mean different things; conversely, we can use different words to mean the same thing
       • Users will sometimes say they understand when they don’t. If there’s any doubt, ask and explain again.
       • Keep up-to-date user guides to hand
       • Call Management Systems are great knowledge banks
       • Be familiar with available expertise (colleagues, organization, national, international)
       • Google is a friend. A very good friend.
    10. Two recent examples:
        Q. Grad student wanting the number of plastic surgery clinics in Seoul, South Korea from 1990–2009
        A. The International Society of Aesthetic Plastic Surgery (ISAPS, http://www.isaps.org/), in particular the ISAPS International Survey on Aesthetic/Cosmetic Procedures – there’s data for 2010 and 2011 (http://www.isaps.org/isaps-global-statistics.html).
        Process:
        • Check NGO sources (World Bank, UN etc)
        • Check Google – deep searching into results using a variety of related terms. Time consuming but often productive.
        • Searches often find references in literature or discussion forums which can be followed up.
    11. User needs statistical data about agrarian violence (originating from land disputes)
        • Variables include: food riots, assassinations (if they occurred as a result of a land dispute), imprisonments etc
        • Unit of investigation: country-year
        • Area of interest: Latin American countries
        • Period: from 1960 until now, yearly
        Process:
        • Not likely to be available through NGO sources
        • Try deep searching through Google – this finds literature sources with summary statistics about land disputes for individual countries, but no time series
        Responded:
        • Check the Latin American Network Information Center (LANIC) at the University of Texas at Austin
        • Speak with our Cornell colleague Sean Knowlton, who has expertise in Latin American statistical resources
        • Check CEPALSTAT – gateway to statistical information on Latin American and Caribbean countries published by the Economic Commission for Latin America and the Caribbean
    12. Social Science research data resources
        • Inter-University Consortium for Political and Social Research (ICPSR)
        • National Archive of Criminal Justice Data
        • Minority Data Resource Center
        • National Archive of Computerized Data on Aging
        • Roper Center for Public Opinion Archives
        • International Data Archives e.g. CESSDA, UKDA, Eurostat
          • CESSDA catalog (DDI) provides a multi-lingual interface to datasets from member social science data archives across Europe
          • Study description and online documentation are free
        • Non-Governmental Organizations
        • National / Governmental Statistical Agencies
    13. Social science statistical data on the internet:
        • CISER Internet Data Sources: https://ciser.cornell.edu/info/datasource.shtml
        • MIT Data Sources: http://libguides.mit.edu/ssds/any-subject
        • Columbia University Social Science Data: http://library.columbia.edu/locations/dssc/data/socsc.html
        • University of California, San Diego – Data on the Web: http://3stages.org/idata/
        Most research-driven universities have similar listings via Data Library webpages
    14. Location & hours:
        • CISER Data Archive is located at 391 Pine Tree Road, Ithaca
        • CISER is open 8.30am – 4.30pm (Mon-Fri). Walk-in assistance is not always available, so appointments are recommended.
        Contacts:
        • Tel.: (607) 255 4801
        • Email: ciser@cornell.edu
