Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Finding Datasets and Statistics
What is Data? 
da·ta noun plural but singular or plural in construction, often 
attributive ˈdā-tə, ˈda- also ˈdä- 
1: fac...
Data can be 
• Observational: Captured in real-time, 
typically outside the lab 
– Examples: Sensor readings, survey resul...
Data can be 
• Text: field or laboratory notes, 
survey responses 
• Numeric: tables, counts, 
measurements 
• Audiovisual...
Microdata 
• Data directly observed or collected from a 
specific unit of observation. 
• Contain individual cases, usuall...
Aggregate Data 
Is higher-level data that have been compiled 
from smaller units of data. 
Examples: inflation rate, consu...
Statistics 
are numerical data that has 
been organized and 
interpreted, usually 
displayed in tables.
Datasets 
• A dataset or study is 
made up of the raw data 
file and any related files, 
usually the codebook 
and setup f...
Repositories 
• A data repository is a collection of 
datasets that have been 
deposited for storage and 
findability. 
• ...
To recap: 
• Data are raw ingredients 
from which statistics are 
created. 
• Statistical analysis can be 
performed on da...
Finding Datasets
1. Think about who might 
collect the data. 
• Could it have been collected by a government 
agency? 
• A nonprofit or non...
2. Look for publications that use 
the kind of data you’re looking for 
and that cite the dataset 
In other words, is the ...
3. Once you know that what you want 
exists, it's time to hunt it down. 
• Is it freely available on the web? 
• Or part o...
Upcoming SlideShare
Loading in …5
×

Data 2014

"What is data" slideshow for Libguide

  • Be the first to comment

Data 2014

  1. 1. Finding Datasets and Statistics
  2. 2. What is Data? da·ta noun plural but singular or plural in construction, often attributive ˈdā-tə, ˈda- also ˈdä- 1: factual information (as measurements or statistics) used as a basis for reasoning, discussion, or calculation 2: information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful 3: information in numerical form that can be digitally transmitted or processed Merriam-Webster (http://www.merriam-webster.com/dictionary/data)
  3. 3. Data can be • Observational: Captured in real-time, typically outside the lab – Examples: Sensor readings, survey results, images, audio, video • Experimental: Typically generated in the lab or under controlled conditions – Examples: test results • Simulation: Machine generated from test models – Examples: climate models, economic models • Derived /Compiled: Generated from existing datasets – Examples: text and data mining, compiled database, 3D models
  4. 4. Data can be • Text: field or laboratory notes, survey responses • Numeric: tables, counts, measurements • Audiovisual: images, sound recordings, video • Models, computer code, geospatial data • Discipline-specific: FITS in astronomy, CIF in chemistry • Instrument-specific: equipment outputs
  5. 5. Microdata • Data directly observed or collected from a specific unit of observation. • Contain individual cases, usually individual people, or in the case of Census data, individual households Examples: • Census: the unit of observation is probably an individual, a household or a family. • Survey or poll: the responses of a single respondent
  6. 6. Aggregate Data Is higher-level data that have been compiled from smaller units of data. Examples: inflation rate, consumer price index, demographic data for city or state
  7. 7. Statistics are numerical data that has been organized and interpreted, usually displayed in tables.
  8. 8. Datasets • A dataset or study is made up of the raw data file and any related files, usually the codebook and setup files. • Most data sets require at least basic statistical analysis (Stata, SPSS, R, etc.) or spreadsheet programs (Excel) to use.
  9. 9. Repositories • A data repository is a collection of datasets that have been deposited for storage and findability. • They are often – discipline specific and/or – affiliated with a research institution • Examples – ICPSR – Harvard Dataverse Network – UC San Diego Digital Collections
  10. 10. To recap: • Data are raw ingredients from which statistics are created. • Statistical analysis can be performed on data to show relationships among the variables collected. • Through secondary data analysis, many different researchers can re-use the same data set for different purposes.
  11. 11. Finding Datasets
  12. 12. 1. Think about who might collect the data. • Could it have been collected by a government agency? • A nonprofit or nongovernmental organization? • A private business or industry group? • Academic researchers?
  13. 13. 2. Look for publications that use the kind of data you’re looking for and that cite the dataset In other words, is the data you want mentioned in scholarly articles or government reports or some other source?
  14. 14. 3. Once you know that what you want exists, it's time to hunt it down. • Is it freely available on the web? • Or part of a package to which the library already subscribes? • Is it something we can buy? (And is it within the library's budget and can the purchase be made quickly enough to fit your timeframe?) • Can it be requested directly from the researcher?

×