Data 101: A Gentle Introduction

Hamilton Public Library
Librarian & Library Value Researcher at Hamilton Public Library
14 August 2013
Data 101:
A Gentle Introduction
Presented by
Kimberly Silk, MLS,
Data Librarian, Martin Prosperity Institute,
Rotman School of Management, University of Toronto
Our Agenda
• Defining data librarianship
• Basic terminology
• Common data sources
• Our challenge: data management, preservation,
discovery and access
• What are “big data”?
• What are data visualizations?
• Sources
• Q & A
Defining Data Librarianship
• Data librarianship is a relatively new area of practice,
emerging with the growth of digital media since the
1970s;
• Data librarians are professional library staff engaged in
managing research data as a resource, and supporting
researchers in these activities;
• We support our institutions and researchers in the
areas of data management, metadata management,
and teaching how to use data as a resource;
• Many of us work in the social sciences, but there is
growth in the natural sciences and humanities as well.
Basic Terminology
• Data – plural! Think: Squirrels!! 
• Microdata – raw data: individual records stored as rows of values
(for example, in an Excel spreadsheet);
• Statistics – summarized tables and cross-tabulations that have
been formulated from the raw data;
• Aggregate data – statistical summaries organized in a data file
structure (Excel) that permits further analysis;
• PUMF – Public Use Microdata File – raw data that are available for
public use; some data may be filtered and geographies suppressed
to ensure personal privacy;
• Variables – a set of factors, traits or conditions that describes a
unit of analysis; for instance, sex, age, marital status, etc.
• Frequencies – the number of times an observation occurs in the
data;
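The terms above can be illustrated with a minimal sketch. The records below are invented for illustration and do not come from any real survey; the sketch only shows how microdata (one row per respondent) can be summarized into frequencies and a simple cross-tabulation.

```python
from collections import Counter

# Hypothetical microdata: one row per respondent (values invented for illustration).
# The keys "sex", "age" and "marital_status" play the role of variables.
microdata = [
    {"sex": "F", "age": 34, "marital_status": "married"},
    {"sex": "M", "age": 41, "marital_status": "single"},
    {"sex": "F", "age": 29, "marital_status": "single"},
    {"sex": "F", "age": 52, "marital_status": "married"},
    {"sex": "M", "age": 38, "marital_status": "married"},
]

# Frequencies: how many times each value of one variable occurs.
frequencies = Counter(row["marital_status"] for row in microdata)

# Cross-tabulation: counts for each combination of two variables.
crosstab = Counter((row["sex"], row["marital_status"]) for row in microdata)

print(frequencies)
print(crosstab)
```

The two counters are statistics derived from the raw rows; saving them in a file structure that allows further analysis would make them aggregate data in the sense used above.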
Common Data Sources
• Government-collected surveys
– US Census (American FactFinder)
– Bureau of Labor Statistics, Bureau of Economic Analysis
– Statistics Canada
– International sources such as UK Data Archive, Swedish
National Data Service, Australian Data Archive, etc.
– OECD iLibrary
– World DataBank
– Pew Research Center
– Gallup
– Thomson
Other International Data Sources
• Some countries do not gather data, have not
been gathering data for very long, or else limit
or filter available data
• For instance, Russia, India, China and other
developing countries may not gather, preserve
or release their data;
• The BRICs (Brazil, Russia, India, China) will
struggle with this issue as their economies
grow.
Uncommon Data Sources
• Data can come from everywhere;
• Occasionally, the MPI acquires data from
unusual sources, such as:
– Rolling Stone magazine
– MySpace social media site for bands
– CrunchBase database of technology companies
Data Management, Preservation, Discovery & Access
• We’ve conquered print collections,
but data present a new challenge;
• As with all digital files, metadata are
necessary to describe data assets;
• Like images, a single data set can
mean many things to many people;
• How do we manage these data to
make sure they are discoverable,
accessible, and preserved?
• Traditionally, data files have been
stored on network drives, and shared
or restricted according to the groups
who need to use them;
• Network drives are difficult to search,
can be hard to share and restrict, and
don’t deal with metadata well;
• Web pages with links have been a
common way to distribute data sets;
• We needed new tools – a new kind of
catalogue that is designed for the
specialized needs of data.
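To make the contrast with a network drive concrete, here is a minimal sketch of a metadata catalogue. It is not how any of the platforms named in this talk work; the `DatasetRecord` fields, the `search` helper, and the example entries are all hypothetical and chosen only to show that descriptive metadata, rather than a file name, is what makes a data set discoverable.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record describing one data file."""
    title: str
    producer: str
    year: int
    keywords: list = field(default_factory=list)
    file_path: str = ""  # where the file sits on the network drive


def search(catalogue, term):
    """Return records whose title or keywords contain the search term."""
    term = term.lower()
    return [
        record for record in catalogue
        if term in record.title.lower()
        or any(term in keyword.lower() for keyword in record.keywords)
    ]


# Invented entries: the file names alone say little about the contents.
catalogue = [
    DatasetRecord("Labour force survey extract", "Example agency", 2011,
                  ["employment", "wages"], "shared/lfs_v2_final.xlsx"),
    DatasetRecord("Music venue listings", "Example magazine", 2010,
                  ["bands", "cities"], "shared/venues.csv"),
]

for record in search(catalogue, "employment"):
    print(record.title, "->", record.file_path)
```

Searching the drive for "employment" would miss `lfs_v2_final.xlsx`; searching the catalogue finds it because the keyword lives in the metadata.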
Data Discovery Platforms
• Nesstar – developed in Norway by Norwegian
Social Science Data Services, used by Statistics
Canada, UK Data Archive, NORC at the
University of Chicago
• ODESI – proprietary system developed and
used by Scholars Portal
• Dataverse – Open source system developed by
the Institute for Quantitative Social Science
(IQSS) at Harvard, used by NBER and ICPSR
Dataverse
• We installed an iteration
of Dataverse at the
University of Toronto, in
our “cloud”, and I manage
my data collections myself;
• As an open source
solution, it’s cost-effective
and my colleagues at
Scholars Portal support it
for me and other Ontario
universities.
• The data are associated
with studies; several data
sets can be associated
with a single study;
• The world can see the
metadata for each data
collection, but access to
the data sets themselves
is restricted to those
who contact me to get
permission.
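The arrangement described above (several data sets per study, public metadata, restricted files) can be sketched as a small data model. This is an illustration only and does not use or reproduce the Dataverse software or its API; the `Study` and `DataFile` classes, their fields, and the example user names are assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical data set attached to a study."""
    name: str
    restricted: bool = True


@dataclass
class Study:
    """Hypothetical study: public metadata plus one or more data sets."""
    title: str
    description: str
    files: list = field(default_factory=list)
    approved_users: set = field(default_factory=set)

    def public_metadata(self):
        # Anyone can see what the study is and which data sets it holds.
        return {
            "title": self.title,
            "description": self.description,
            "files": [f.name for f in self.files],
        }

    def can_download(self, user, data_file):
        # Restricted files are only available to users granted permission.
        return (not data_file.restricted) or user in self.approved_users


study = Study(
    title="Example city indicators",
    description="Invented study used only to illustrate the model.",
    files=[DataFile("indicators_2010.csv"), DataFile("codebook.pdf", restricted=False)],
)
study.approved_users.add("approved_researcher")

print(study.public_metadata())
print(study.can_download("anonymous_visitor", study.files[0]))
print(study.can_download("approved_researcher", study.files[0]))
```

Here granting permission is modelled as adding a name to `approved_users`, standing in for the step where a researcher contacts the data librarian.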
What are Big Data?
• Big Data are data that are too large for the
average database management tool (Access and
Excel, for instance).
• Examples come from meteorology, genomics and
physics. At MPI we wrestle with large GIS data
sets (maps and satellite data), and deal with data
at the terabyte (1 trillion bytes) level.
• Larger data sets deal with petabytes (1
quadrillion bytes) and exabytes (1 quintillion
bytes).
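As a rough aid for the unit sizes above, the following sketch expresses a byte count in terabytes, petabytes or exabytes. It uses the decimal definitions given on the slide (1 trillion, 1 quadrillion and 1 quintillion bytes); the `human_readable` helper and the sample figure are hypothetical.

```python
# Decimal unit sizes as given above.
TERABYTE = 10**12   # 1 trillion bytes
PETABYTE = 10**15   # 1 quadrillion bytes
EXABYTE = 10**18    # 1 quintillion bytes


def human_readable(num_bytes):
    """Express a byte count in the largest unit that fits (hypothetical helper)."""
    for name, size in (("EB", EXABYTE), ("PB", PETABYTE), ("TB", TERABYTE)):
        if num_bytes >= size:
            return f"{num_bytes / size:.1f} {name}"
    return f"{num_bytes} bytes"


# Invented example: a GIS collection of 2.5 trillion bytes.
print(human_readable(2_500_000_000_000))

# Each step up is a factor of one thousand.
print(PETABYTE // TERABYTE, EXABYTE // PETABYTE)
```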
Data Visualizations
• The visual representation of data: literally,
a picture can say a thousand [numbers];
• Edward Tufte is a key pioneer:
http://www.edwardtufte.com/tufte/
• Fantastic examples at Flowing Data:
http://flowingdata.com/
• RSA Animate: http://www.thersa.org/
Sources
• International Association for Social Science
Information Services & Technology (IASSIST) -
http://www.iassistdata.org/
• OECD iLibrary - http://www.oecd-ilibrary.org/
• World Bank Data - http://data.worldbank.org/
• UK Data Archive - http://data-archive.ac.uk/
• Nesstar - http://www.nesstar.com/
• Dataverse - http://thedata.org/
17 September 2012
Q & A
(and, Thank You!)
Kimberly Silk, MLS, Data Librarian,
Martin Prosperity Institute, University of Toronto
kimberly.silk@martinprosperity.org