Data 101: A Gentle Introduction

This webinar was prepared for and hosted by the SLA Social Science and Transportation divisions and the Upstate New York chapter. Presented on August 14, 2013.

Published in: Technology, Education

Data 101: A Gentle Introduction

  1. Data 101: A Gentle Introduction
     14 August 2013
     Presented by Kimberly Silk, MLS, Data Librarian, Martin Prosperity Institute, Rotman School of Management, University of Toronto
  2. Our Agenda
     • Defining data librarianship
     • Basic terminology
     • Common data sources
     • Our challenge: data management, preservation, discovery and access
     • What are “big data”?
     • What are data visualizations?
     • Sources
     • Q & A
  3. Defining Data Librarianship
     • Data librarianship is a relatively new area of practice, emerging with the growth of digital media since the 1970s;
     • Data librarians are professional library staff engaged in managing research data as a resource, and supporting researchers in these activities;
     • We support our institutions and researchers in the areas of data management, metadata management, and teaching how to use data as a resource;
     • Many of us work in the social sciences, but there is growth in the natural sciences and humanities as well.
  4. Basic Terminology
     • Data – plural! Think: Squirrels!!
     • Microdata – raw data, individual records consisting of rows of numbers (for example, in an Excel spreadsheet);
     • Statistics – summarized tables and cross-tabulations that have been formulated from the raw data;
     • Aggregate data – statistical summaries organized in a data file structure (for example, Excel) that permits further analysis;
     • PUMF – Public Use Microdata File – raw data that are available for public use; some data may be filtered and geographies suppressed to ensure personal privacy;
     • Variables – a set of factors, traits or conditions that describes a unit of analysis; for instance, sex, age, marital status, etc.;
     • Frequencies – the number of times an observation occurs in the data.
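The terms on the Basic Terminology slide can be illustrated with a minimal Python sketch. The records, variable names and helper functions below are invented for illustration and do not come from the presentation or from any real survey. The sketch starts from a few rows of microdata and derives frequencies, a cross-tabulation (statistics) and a small aggregate summary.

```python
from collections import Counter
from statistics import mean

# Hypothetical microdata: one record per respondent (values invented for illustration).
microdata = [
    {"sex": "F", "age": 34, "marital_status": "married"},
    {"sex": "M", "age": 29, "marital_status": "single"},
    {"sex": "F", "age": 51, "marital_status": "married"},
    {"sex": "M", "age": 42, "marital_status": "married"},
    {"sex": "F", "age": 23, "marital_status": "single"},
]


def frequencies(records, variable):
    """Frequencies: how many times each value of one variable occurs."""
    return Counter(r[variable] for r in records)


def crosstab(records, row_var, col_var):
    """Cross-tabulation: record counts for each combination of two variables."""
    return Counter((r[row_var], r[col_var]) for r in records)


def mean_by(records, group_var, value_var):
    """Aggregate data: mean of value_var within each group of group_var."""
    groups = {}
    for r in records:
        groups.setdefault(r[group_var], []).append(r[value_var])
    return {group: mean(values) for group, values in groups.items()}


freq = frequencies(microdata, "marital_status")
table = crosstab(microdata, "sex", "marital_status")
avg_age = mean_by(microdata, "sex", "age")

print(freq)
print(table)
print(avg_age)
```

The individual records are the microdata; everything computed from them is a summary, which is why summaries can often be released publicly when the underlying records cannot.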
  5. Common Data Sources
     • Government-collected surveys
       – US Census (American FactFinder)
       – Bureau of Labor Statistics, Bureau of Economic Analysis
       – Statistics Canada
       – International sources such as UK Data Archive, Swedish National Data Service, Australian Data Archive, etc.
       – OECD iLibrary
       – World DataBank
       – Pew Research Center
       – Gallup
       – Thomson
  6. Other International Data Sources
     • Some countries do not gather data, have not been gathering data for very long, or else limit or filter available data;
     • For instance, Russia, India, China and other developing countries may not gather, preserve or release their data;
     • The BRICs (Brazil, Russia, India, China) will struggle with this issue as their economies grow.
  7. Uncommon Data Sources
     • Data can come from everywhere;
     • Occasionally, the MPI acquires data from unusual sources, such as:
       – Rolling Stone magazine
       – MySpace social media site for bands
       – CrunchBase database of technology companies
  8. Data Management, Preservation, Discovery & Access
     • We’ve conquered print collections, but data present a new challenge;
     • As with all digital files, metadata are necessary to describe data assets;
     • As with images, a single data set can mean many things to many people;
     • How do we manage these data to make sure they are discoverable, accessible, and preserved?
     • Traditionally, data files have been stored on network drives, and shared or restricted according to the groups who need to use them;
     • Network drives are difficult to search, can be hard to share and restrict, and don’t deal with metadata well;
     • Web pages with links have been a common way to distribute data sets;
     • We needed new tools – a new kind of catalogue that is designed for the specialized needs of data.
  9. Data Discovery Platforms
     • Nesstar – developed in Norway by the Norwegian Social Science Data Services, used by Statistics Canada, UK Data Archive, NORC at the University of Chicago;
     • ODESI – proprietary system developed and used by Scholars Portal;
     • Dataverse – open source system developed by the Institute for Quantitative Social Science (IQSS) at Harvard, used by NBER and ICPSR.
  10. Dataverse
      • We installed an instance of Dataverse at the University of Toronto, in our “cloud”, and I manage my data collections myself;
      • As an open source solution, it’s cost-effective, and my colleagues at Scholars Portal support it for me and other Ontario universities;
      • The data are associated with studies; several data sets can be associated with a single study;
      • The world can see the metadata for each data collection, but access to the data sets themselves is restricted to those who contact me to get permission.
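The arrangement described on the Dataverse slide (one study holding several data sets, with public metadata and restricted files) can be sketched as a small data structure. This is a conceptual illustration only: the class names, fields and methods are hypothetical and are not Dataverse's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """One data file belonging to a study (hypothetical structure)."""
    filename: str
    description: str


@dataclass
class Study:
    """A study with public metadata and restricted data sets (hypothetical structure)."""
    title: str
    producer: str
    data_sets: list = field(default_factory=list)
    approved_users: set = field(default_factory=set)

    def public_metadata(self):
        """Metadata that anyone may see, without exposing the files themselves."""
        return {
            "title": self.title,
            "producer": self.producer,
            "data_set_descriptions": [d.description for d in self.data_sets],
        }

    def can_download(self, user):
        """Only users who have been granted permission may download the data sets."""
        return user in self.approved_users


# Invented example values.
study = Study(title="Example survey", producer="Example producer")
study.data_sets.append(DataSet("wave1.csv", "Responses, wave 1"))
study.data_sets.append(DataSet("wave2.csv", "Responses, wave 2"))
study.approved_users.add("approved.researcher@example.org")

print(study.public_metadata())
print(study.can_download("approved.researcher@example.org"))
print(study.can_download("anonymous.visitor@example.org"))
```

Keeping the metadata separate from the access decision is what lets a catalogue stay fully discoverable while the files remain controlled.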
  11. What are Big Data?
      • Big Data are data that are too large for the average database management tool (Access and Excel, for instance);
      • Examples come from meteorology, genomics and physics. At MPI we wrestle with large GIS data sets (maps and satellite data), and deal with data at the terabyte (1 trillion bytes) level;
      • Larger data sets deal with petabytes (1 quadrillion bytes) and exabytes (1 quintillion bytes).
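To make the sizes on the Big Data slide concrete, here is a minimal Python sketch that converts a byte count into the units mentioned above. It assumes decimal (SI) units, which match the slide's figures (1 terabyte = 1 trillion bytes); the function name and the sample values are invented for illustration.

```python
# Decimal (SI) units, largest first, matching the figures on the slide.
UNITS = [
    ("EB", 10**18),  # exabyte: 1 quintillion bytes
    ("PB", 10**15),  # petabyte: 1 quadrillion bytes
    ("TB", 10**12),  # terabyte: 1 trillion bytes
    ("GB", 10**9),
    ("MB", 10**6),
    ("kB", 10**3),
]


def human_size(num_bytes):
    """Express a byte count in the largest unit that is not bigger than it."""
    for name, size in UNITS:
        if num_bytes >= size:
            return f"{num_bytes / size:.1f} {name}"
    return f"{num_bytes} B"


# How many terabytes fit in one exabyte?
terabytes_per_exabyte = 10**18 // 10**12

print(human_size(2_500_000_000_000))
print(human_size(10**15))
print(terabytes_per_exabyte)
```

Each step up the list is a factor of one thousand, so an exabyte is a million terabytes.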
  12. Data Visualizations
      • The visual representation of data: literally, a picture can say a thousand [numbers];
      • Edward Tufte is a key pioneer: http://www.edwardtufte.com/tufte/
      • Fantastic examples at Flowing Data: http://flowingdata.com/
      • RSA Animate: http://www.thersa.org/
  13. Sources
      • International Association for Social Science Information Services & Technology (IASSIST) - http://www.iassistdata.org/
      • OECD iLibrary - http://www.oecd-ilibrary.org/
      • World Bank Data - http://data.worldbank.org/
      • UK Data Archive - http://data-archive.ac.uk/
      • Nesstar - http://www.nesstar.com/
      • Dataverse - http://thedata.org/
  14. Q & A (and, Thank You!)
      17 September 2012
      Kimberly Silk, MLS, Data Librarian, Martin Prosperity Institute, University of Toronto
      kimberly.silk@martinprosperity.org