APLIC 2012: Discovering & Dealing with Data


A presentation to the Association of Parliamentary Libraries in Canada, Toronto, September 2012.

Published in: Technology, Education

  1. Discovering & Dealing with Data. Presented by Kimberly Silk, MLS, Data Librarian, Martin Prosperity Institute, University of Toronto. 17 September 2012
  2. Agenda • The MPI information environment • Common data sources & authority • Data management, discovery and access • What is Open Data? Big Data? • Fun with data visualization • Q&A
  3. About the MPI • The Martin Prosperity Institute is an economic think tank; we are part of the Rotman School within the University of Toronto • My client group consists of grad students, post-docs, visiting faculty and researchers who use social-science data to support their research • To support their research process, I procure, curate, preserve and make discoverable data sets • The MPI has its own data repository, which has grown to 4 TB in size
  4. Data Sources • Common and very authoritative sources – StatsCan via the Data Liberation Initiative – Bureau of Labor Statistics, Bureau of Economic Analysis, American FactFinder (Census) – OECD iLibrary – World Bank – International sources such as UK Data Archive, Swedish National Data Service, etc. – Pew Research Center – Gallup
  5. More data sources • Less authoritative? – Chinese Data Center – Rolling Stone – MySpace – CrunchBase
  6. Data Challenge: Discovery • Lots of research data being collected and added, but no method to manage it, catalogue it, or make it findable • Demands from various clients: faculty, students, researchers, staff, administration • The shared network drive was no longer effective
  7. Show & Share… • We want the world to see our data catalogue • But we don’t want the world to be able to copy or change what’s in the catalogue, or the catalogue itself • We need to manage access to our data: Who are you? Where are you from? Why do you want the data? What are you going to do with it? Will you share your results?
  8. Data Discovery Platforms • I reviewed several platforms that would work in an academic environment: – Nesstar: developed in Norway by Norwegian Social Science Data Services; used by StatsCan, UK Data Archive, NORC at UChicago – Islandora: open source system based on Fedora, developed at UPEI – ODESI: proprietary system developed and used by Scholars Portal – Dataverse: open source system developed by the Institute for Quantitative Social Science at Harvard; used by NBER and many academic think tanks
  9. Dataverse • Dataverse was a good choice since we could install an instance at UToronto, in the UToronto cloud, and I could manage it myself • It was free, and my colleagues at Scholars Portal were interested in installing it; I was the perfect guinea pig • Slowly, I am cataloguing my data collection; I have set up a lending agreement, and it’s working very well • Demo:
  10. Open Data • Open data is the idea that certain data should be freely available to everyone to use, reuse, and redistribute without restriction • Governments around the world have begun to “open up” some of their data: US, UK, New Zealand, Norway, Russia, Australia, Morocco, Netherlands, Chile, Spain, Uruguay, France, Brazil, Estonia, Portugal, etc. • State and municipal governments have also created open data sites
  11. Open Data Opportunities… • Governments open up their data to foster better citizenship and improve transparency • Open Data can spur grass-roots innovation: citizens use open data in software programs to solve problems, such as finding a local daycare, knowing when the next bus will come, reporting crime on the fly, or watching congressional proceedings in real time
  12. … and Challenges • Open Data takes commitment. Successful implementations have a dedicated team of people who decide what data to release according to usefulness and demand • The data must be anonymized, cleansed and released in a non-proprietary format • Organizations must be prepared to listen to citizens, be responsive, and troubleshoot • Open data is a public service
  13. Big Data • Big Data is a collection of data sets too large for the average desktop data tool (Access and Excel, for instance) to handle • Examples come from meteorology, genomics and physics. At MPI we wrestle with large GIS data sets (maps and satellite data), and deal with data at the terabyte (1 trillion bytes) level • Larger data sets run to petabytes (1 quadrillion bytes) and exabytes (1 quintillion bytes)
  14. Data Visualizations • The visual representation of data: literally, a picture can say a thousand [numbers] • Edward Tufte is a key pioneer: • Fantastic examples at Flowing Data: • RSA Animate:
  15. Q&A (and thank you!) Kimberly Silk, MLS, Data Librarian, Martin Prosperity Institute, University of Toronto, September 2012
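
The byte units quoted on the Big Data slide (slide 13) can be checked with a little arithmetic. The following is a minimal sketch in Python, assuming the decimal (SI) definitions the slide gives; the helper names `to_bytes` and `humanize` are illustrative only and do not come from the presentation.

```python
# Decimal (SI) byte units, matching the definitions on the Big Data slide.
UNITS = {
    "TB": 10**12,  # terabyte: 1 trillion bytes
    "PB": 10**15,  # petabyte: 1 quadrillion bytes
    "EB": 10**18,  # exabyte: 1 quintillion bytes
}


def to_bytes(value: int, unit: str) -> int:
    """Convert a whole number of TB, PB or EB into bytes."""
    return value * UNITS[unit]


def humanize(n_bytes: int) -> str:
    """Express a byte count in the largest unit that fits (hypothetical helper)."""
    for unit in ("EB", "PB", "TB"):
        if n_bytes >= UNITS[unit]:
            return f"{n_bytes / UNITS[unit]:g} {unit}"
    return f"{n_bytes} bytes"


# The MPI repository size quoted on slide 3.
repository = to_bytes(4, "TB")
print(repository)            # size in bytes
print(humanize(repository))  # back to a readable unit

# How many repositories of that size would fit in one petabyte.
print(to_bytes(1, "PB") // repository)
```

On these definitions, a single petabyte holds 250 repositories the size of the MPI's 4 TB collection, which gives a sense of the gap between the terabyte level and the larger data sets the slide mentions.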