Data Reference Interview
Stuart Macdonald
CISER Data Services Librarian
Email: srm262@cornell.edu
Data Archive - Collection and Services
• Established over 30 years ago
• Collection of numeric datasets to support quantitative research
  • c. 27,000 online files, in addition to thousands of studies on CD/DVD

• Emphasis on demography (state/federal censuses), economics, health, labor, election studies, attitudinal and behavioral studies, family life, etc.
• Consulting services to match user needs with appropriate data:
  • finding, accessing and using data

• Current Cornell researchers can download archive files from the online catalog (search & browse) in formats compatible with statistical software

• Data files are identified by a ‘traffic light’ icon that indicates their access level:
  • Green – downloadable by anyone
  • Yellow – downloadable from links in the catalog with CUWebAuth authentication, for use within the CISER research computing environment (CISERRSCH); Cornell researchers can apply for a computing account
  • Red – restricted-use data (subject to conditions imposed by the data provider)

• Cornell Restricted Access Data Center
• Provides Cornell social science researchers with a repository for sharing and long-term preservation of their numeric/statistical research data
• Participates in Cornell’s Research Data Management Service Group
• Assists Cornell social science researchers with Research Data Management (RDM) plans
• Provides Cornell social science researchers with support and expertise in obtaining and using restricted data
Believe it or not – not all data are the same …
• Data means different things to different people (informatics, geography, art history, systems biology, architecture, archaeology, etc.)
• The definition and value of data in a commercial sense differ from those in an academic sense
• Data requirements differ for the undergraduate, postgraduate,
teacher, researcher

• Data catalogs, data libraries, gateways, portals exist for a range
of disciplinary domains
Research data may include all of the following:
• Text or Word documents, spreadsheets
• Laboratory notebooks, field notebooks, diaries
• Questionnaires, transcripts, codebooks
• Audiotapes, videotapes
• Photographs, films
• Slides, artifacts, specimens, samples
• Database contents including video, audio, text, images
• Models, algorithms, scripts
• Contents of an application such as input, output, log files for analysis software, simulation
software, schemas
• Methodologies and workflows
• Standard operating procedures and protocols
Formats, size, volume, openness, confidentiality, complexity, flat files: all factors to consider as part of the reference interview (computing capabilities, software dependencies, copyright and ethical considerations)
Data Reference Interview: establish what the user actually needs (not what they think they may need!):
• Statistics or data? Summary statistics, secondary-use datasets, raw or derived data
• Software requirements, contingencies
• What is the subject or topic? Health, unemployment, deprivation
• Type of analysis? Visualization, map, statistical analysis, modelling
• What is the unit of analysis? Individual, family, county-level, country-level
• Geographic constraints?
• Time constraints? Range of years; daily, monthly, quarterly or annual
• Cross-sectional or longitudinal?
• Data type? Historic, demographic, financial, administrative, geospatial
Sets the goals and structure for the data interview and helps articulate any
decisions made by the data librarian

Establishes the ‘learning stage of the user’ and helps put them at ease

Observations:
Establish a timeline for the research and data needs (this can buy the data librarian time, set priorities and allow time for further investigation)
Fine balance between assistance and exploitation!!
Recognition that data finding, data handling, etc. may be the learning objective itself (e.g. identifying variables and using a codebook)
All data queries should be treated as new. It will soon become evident if a request has similarities with previous enquiries.
It is important not to use too much jargon and to double-check understanding of unfamiliar terms: often we use the same word to mean different things and, conversely, different words to mean the same thing.
Users will sometimes say they understand when they don’t. If there’s any doubt, ask and explain again.

Keep a supply of up-to-date user guides to hand
Call management systems are great knowledge banks
Be familiar with available expertise (colleagues, organization, national,
international)
Google is a friend. A very good friend.
Two recent examples:
Q. A grad student wanted the number of plastic surgery clinics in Seoul, South Korea, from 1990 to 2009.
A. The International Society of Aesthetic Plastic Surgery (ISAPS, http://www.isaps.org/), in particular the ISAPS International Survey on Aesthetic/Cosmetic Procedures; there are data for 2010 and 2011 (http://www.isaps.org/isaps-global-statistics.html).
Process:
Check NGO and IGO sources (World Bank, UN, etc.)
Check Google: search deep into the results using a variety of related terms. Time-consuming but often productive. Searches often turn up references in the literature, or discussion forums, that can be followed up.
Q. A user needs statistical data on agrarian violence arising from land disputes. Variables include food riots, assassinations (where they occurred as a result of a land dispute), imprisonments, etc. Unit of analysis: country-year; area of interest: Latin American countries; period: 1960 to the present, yearly.
Process:
Not likely to be available through NGO sources
Try deep searching through Google: this finds literature sources with summary statistics on land disputes for individual countries, but no time series
Response:
Check the Latin American Network Information Center (LANIC) at the University of Texas at Austin
Speak with our Cornell colleague Sean Knowlton, who has expertise in Latin American statistical resources
Check CEPALSTAT, the gateway to statistical information on Latin American and Caribbean countries published by the Economic Commission for Latin America and the Caribbean (ECLAC)
Social science research data resources
• Inter-University Consortium for Political and Social Research (ICPSR), including:
  • National Archive of Criminal Justice Data
  • Minority Data Resource Center
  • National Archive of Computerized Data on Aging
• Roper Center for Public Opinion Research
• International data archives, e.g. CESSDA, UKDA, Eurostat
  • The CESSDA catalog (DDI) provides a multilingual interface to datasets from member social science data archives across Europe
  • Study descriptions and online documentation are free
• Non-Governmental Organizations
• National/governmental statistical agencies
Social science statistical data on the internet:
CISER Internet Data Sources:
https://ciser.cornell.edu/info/datasource.shtml

MIT Data Sources:
http://libguides.mit.edu/ssds/any-subject
Columbia University Social Science Data:
http://library.columbia.edu/locations/dssc/data/socsc.html
University of California, San Diego – Data on the Web:
http://3stages.org/idata/

Most research-driven universities have similar listings via Data Library webpages
Location & hours:
CISER Data Archive is located at 391 Pine Tree Road, Ithaca
CISER is open 8.30am to 4.30pm (Mon–Fri). Walk-in assistance is not always available, so appointments are recommended.

Contacts:
Tel.: (607) 255 4801
Email: ciser@cornell.edu


Editor's Notes

  • #4 Data, documentation and associated files (e.g. SAS, SPSS, Stata) are housed on the CISER file server. Files are downloaded from the catalog in ZIP-compressed format. Cross-National Time Series data
  • #6 UG – more general enquiries; summary statistics rather than raw data; what they ask for is often not what they really need.
    PG – nature of enquiry more specific, but again more often summary statistics; may be raw data as the PhD progresses. Data collection may often be involved, to be used in conjunction with other sources, visualized, etc.
    Teacher – teaching datasets or sample data, or data subsets (NGO, IGO).
    Researcher – has a better idea of what data they need, usually raw data; needs to identify variables, help with codebook/questionnaire, use of statistical analysis packages, GIS.
  • #13 As CISER is an ICPSR member, researchers can gain access to data held in those CESSDA archives that are themselves ICPSR members. CESSDA member organisations adhere to a Trans-border Data Access Agreement.