APLIC 2012: Discovering & Dealing with Data

Hamilton Public Library
Discovering & Dealing with Data

                                       Presented by


                           Kimberly Silk, MLS, Data Librarian,
                    Martin Prosperity Institute, University of Toronto




17 September 2012
Agenda
    • The MPI information environment
    • Common data sources & authority
    • Data management, discovery and access
    • What is Open Data? Big Data?
    • Fun with data visualization
    • Q&A



About the MPI
• The Martin Prosperity Institute is an economic
  think-tank; we are part of the Rotman School
  within the University of Toronto
• My client group consists of grad students, post-
  docs, visiting faculty and researchers who use
  social-science data to support their research
• To support their research process, I procure,
  curate, preserve and make discoverable data sets.
• The MPI has its own data repository, which has
  grown to 4 TB in size.
Data Sources
    • Common & very authoritative sources
      – StatsCan via the Data Liberation Initiative
      – Bureau of Labor Statistics, Bureau of Economic
        Analysis, American FactFinder (Census)
      – OECD eLibrary
      – World Bank
      – Int’l sources such as UK Data Archive, Swedish
        National Data Service, etc.
      – Pew Research Center
      – Gallup
More data sources
    • Less authoritative??
      – Chinese Data Center
      – Rolling Stone
      – MySpace
      – CrunchBase




Data Challenge: Discovery

• Lots of research data
  being collected and
  added, but no method
  to manage it, catalogue
  it, or make it findable
• Demands from various
  clients: faculty,
  students, researchers,
  staff, administration
• The shared network
  drive was no longer
  effective
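
A minimal sketch of what "findable" could mean in practice, assuming each data set gets a small descriptive record. The field names, file paths and sample titles below are made up for illustration; they are not the MPI's actual catalogue.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """One catalogue entry. All field names here are hypothetical."""
    title: str
    source: str
    path: str
    keywords: list = field(default_factory=list)


def find(catalogue, term):
    """Return records whose title, source, or keywords mention the term."""
    needle = term.lower()
    return [
        rec for rec in catalogue
        if needle in rec.title.lower()
        or needle in rec.source.lower()
        or any(needle in kw.lower() for kw in rec.keywords)
    ]


# Illustrative entries only; titles and paths are invented.
catalogue = [
    DatasetRecord("Labour force survey extract", "StatsCan",
                  "/data/lfs", ["employment", "labour"]),
    DatasetRecord("Occupational employment tables", "Bureau of Labor Statistics",
                  "/data/oes", ["employment", "wages"]),
    DatasetRecord("Regional GDP series", "OECD eLibrary",
                  "/data/gdp", ["gdp", "regions"]),
]

for rec in find(catalogue, "employment"):
    print(rec.source, "-", rec.title)
```

Even a record this small answers the question a shared network drive cannot: which files are about a given topic, and where they came from.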




Show & Share…
    • We want the world to see our data catalogue
    • But, we don’t want the world to be able to
      copy or change what’s in the catalogue, or the
      catalogue itself
    • We need to manage access to our data; who
      are you? Where are you from? Why do you
      want the data? What are you going to do with
      it? Will you share your results?
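
The access questions above could be captured as a simple request form. This is only a sketch: the class and field names are assumptions made for illustration, not part of any real system described in the talk.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AccessRequest:
    """One field per question asked of a requester (hypothetical names)."""
    name: str = ""                              # Who are you?
    affiliation: str = ""                       # Where are you from?
    purpose: str = ""                           # Why do you want the data?
    intended_use: str = ""                      # What are you going to do with it?
    will_share_results: Optional[bool] = None   # Will you share your results?


def unanswered(request):
    """List the questions the requester has not answered yet."""
    missing = []
    for f in fields(request):
        value = getattr(request, f.name)
        if value is None or value == "":
            missing.append(f.name)
    return missing


request = AccessRequest(name="A. Researcher", affiliation="Example University")
print(unanswered(request))
```

A request would only be passed on for approval once `unanswered` returns an empty list; an explicit "no" to sharing results still counts as an answer.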

Data Discovery Platforms
    • I reviewed several platforms that would work in
      an academic environment:
      – Nesstar – developed in Norway by Norwegian Social
        Science Data Services, used by StatsCan, UK Data
        Archive, NORC at UChicago
      – Islandora – Open source system based on Fedora
        developed at UPEI
      – ODESI – proprietary system developed and used by
        Scholars Portal
      – Dataverse – Open source system developed by the
        Institute for Quantitative Social Science at Harvard,
        used by NBER, and many academic think tanks.

Dataverse
    • Dataverse was a good choice since we could
      install an instance at UToronto, in the UToronto
      cloud, and I could manage it myself
    • It was free, and my colleagues at Scholars Portal
      were interested in installing it – I was the perfect
      guinea pig
    • Slowly, I am cataloguing my data collection; I
      have set up a lending agreement, and it’s working
      very well.
    • Demo:
      http://dataverse.scholarsportal.info/dvn/dv/mpi

Open Data
 • Open data is the idea that certain data should be
   freely available to everyone to use, reuse, and
   redistribute without restriction.
 • Governments around the world have begun to
   “open up” some of their data: US, UK, New
   Zealand, Norway, Russia, Australia, Morocco,
   Netherlands, Chile, Spain, Uruguay, France, Brazil,
   Estonia, Portugal, etc.
 • State and municipal levels of government have
   also created open data sites.

Open Data Opportunities…
 • Governments open up their data to foster
   better citizenship and improve transparency
 • Open Data can spur grass-roots innovation:
   citizens access open data to use in software
   programs to solve problems, such as finding a
   local daycare, knowing when the next bus will
   come, reporting crime on-the-fly, or watching
   congressional proceedings in real time.

… and Challenges
 • Open Data takes commitment. Successful
   implementations have a dedicated team of
   people who decide what data to release
   according to usefulness and demand
 • The data must be anonymized, cleansed and
   in a non-proprietary format
 • Organizations must be prepared to listen to
   the citizens, be responsive, and trouble-shoot.
 • Open data is a public service.
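
A rough sketch of the "anonymized, cleansed and in a non-proprietary format" step, using only the Python standard library. The column names and sample rows are invented, and dropping identifying columns is only the simplest form of anonymization; a real release would need a proper privacy review.

```python
import csv
import io

# Hypothetical columns that identify a person and must not be released.
IDENTIFYING = {"name", "email"}


def prepare_for_release(rows, identifying=IDENTIFYING):
    """Drop identifying columns, trim whitespace, skip incomplete rows."""
    cleaned = []
    for row in rows:
        kept = {k: v.strip() for k, v in row.items() if k not in identifying}
        if all(kept.values()):  # discard rows with any empty value
            cleaned.append(kept)
    return cleaned


def to_csv(rows):
    """Write rows as CSV text, a plain non-proprietary format."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Invented sample data for illustration.
raw = [
    {"name": "Pat", "email": "pat@example.org", "ward": " 5 ", "visits": "12"},
    {"name": "Sam", "email": "sam@example.org", "ward": "7", "visits": ""},
]
released = prepare_for_release(raw)
print(to_csv(released))
```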

Big Data
 • Big Data is a collection of data sets that is too
   large for everyday desktop data tools
   (Access and Excel, for instance).
 • Examples come from meteorology, genomics and
   physics. At MPI we wrestle with large GIS data
   sets (maps and satellite data), and deal with data
   at the terabyte (1 trillion bytes) level.
 • Larger data sets deal with petabytes (1
   quadrillion bytes) and exabytes (1 quintillion
   bytes).
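
To make the scale concrete, here is a small worked example using the decimal definitions given above (1 terabyte = 1 trillion bytes, and so on). The helper name is made up; the 4 TB figure is the MPI repository size mentioned earlier in the talk.

```python
# Decimal byte units, as defined on the slide.
UNITS = {
    "terabyte": 10**12,   # 1 trillion bytes
    "petabyte": 10**15,   # 1 quadrillion bytes
    "exabyte": 10**18,    # 1 quintillion bytes
}


def to_bytes(amount, unit):
    """Convert an amount in the given unit to bytes."""
    return amount * UNITS[unit]


repository = to_bytes(4, "terabyte")
print(repository)                               # 4000000000000
print(to_bytes(1, "petabyte") // repository)    # 250
```

In other words, a single petabyte would hold 250 repositories the size of the MPI's.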

Data Visualizations
 • The visual representation of data: literally,
   a picture can say a thousand [numbers]
 • Edward Tufte is a key pioneer:
   http://www.edwardtufte.com/tufte/
 • Fantastic examples at Flowing Data:
   http://flowingdata.com/
 • RSA Animate: http://www.thersa.org/


Q&A

                                (and, Thank You!)



                           Kimberly Silk, MLS, Data Librarian,
                    Martin Prosperity Institute, University of Toronto
                          kimberly.silk@martinprosperity.org




17 September 2012