Computers in Libraries 2012 - Discovering Data: Cataloguing Data Collections
Cataloguing Data Collections
Kimberly Silk, Data Librarian, Martin Prosperity Institute, University of Toronto
Steve Marks, Digital Preservation Policy Librarian, University of Toronto Libraries
Setting the Stage
• Steve works for OCUL, a consortium of Ontario’s
university libraries, which is housed at the
University of Toronto.
• Kim works for MPI, a think tank at the University of Toronto.
• We both have a lot of data to manage, but (until
recently) had no way to do so.
• This is the story of how we met, and how we’re
figuring out how to manage our data.
The University of Toronto is
Canada’s largest university, with almost
80,000 undergraduate students and over
15,000 graduate students
30 physical libraries make up the UT
Library system, plus various specialized
With over 18 million holdings, UTL is
the third-largest system in North America.
The MPI is a think tank
within the Rotman School of
Management at the
University of Toronto.
We study the role of
location, place and city
regions in global economic
OCUL is a consortium
of Ontario’s 21 university libraries.
Goal: to support and
enhance research and to
create rich learning
Kim Met Steve
Kim:
• “Since I began working at the MPI in 2008, I have been building systems and introducing tools to manage our information, and our research process;
• BUT, I had a growing collection of over 4 TB of data to deal with; lots of data, and no way to manage or search it;
• How do I catalogue
Steve:
• “We were starting to hear concerns from OCUL schools that they were starting to field questions from faculty and administration about what they were doing to support research data.
• A lot of these schools don't have the IT resources to rapidly launch a program like that, so we saw it as an opportunity to help them out.
• It also gave us a controlled way to start exploring some of the ideas around research data
Lots of research data, and no way
to manage it.
Demands from various audiences
(researchers, faculty, students,
Kim came across Dataverse,
which looked like a very
promising solution, and a much
better alternative to this
unwieldy, awkward, network drive
• Used by leading data repositories, including
ICPSR at UMichigan, UCLA’s Social Science
Data Archive, and the National Bureau for
• Open source, so free; not that easy to install,
but the documentation is now much better
and Steve got lots of support from IQSS staff
• Take a look at http://thedata.org
This is how it happened:
Hey Steve, I’ve been
looking at Dataverse, and the guys
told me you’re playing with it.
Yeah, I am. I’m in the
middle of installing it.
trying to find a way to
catalogue my data. Can I give
Dataverse a try?
Sure! Would you mind being a
test case for a presentation I’m
making to OCUL?
Would love to! Let me
know when it’s ready, and I’ll add
And now, a demo.
• MPI Dataverse:
What I Like
• Very powerful, lots of metadata options available
• Fairly easy to use
• Lots of other Dataverses to look at as models
• Can control access, and create multiple access/permission
• Usage statistics on downloads, web site traffic via Google
• Great marketing tool for our data collection - we can show
the world what we have (but not necessarily let the world
• It takes a LOT of planning and time
• Planning - because you want your records to
• Time - because it simply takes time to create the
records; there’s no WorldCat for data!
• Every data collection is unique; can look at
other models to inform your own design.
• We’re still at the beginning, in pilot mode.
Thanks for Listening
Kim Silk, Data Librarian, Martin Prosperity Institute