Research data services have become a common fixture in academic libraries, yet many libraries still struggle to develop an appropriate and in-demand mix of services to support their research community. While an elite few offer seemingly endless curatorial assistance, the majority of libraries are building basic to mid-level services such as DMP support, workshops, and consultations. This case study provides a detailed look at the University of Utah Marriott Library’s data services, the rationale behind our current service model, the results of our campus data needs assessment, and how we plan to grow our technical infrastructure into the future. In addition to an overview of our data service mix, we will look closely at one current initiative, the Entertainment, Arts, and Engineering (EAE) Thesis Preservation Project, which highlights curation challenges such as irregular and proprietary file formats, copyright restrictions, long-term preservation, and a lack of appropriate metadata standards. This presentation will highlight the Marriott Library’s data curation accomplishments to date alongside an honest assessment of ongoing challenges.
Level Up! Building data services at the Marriott Library
1. LEVEL UP!: BUILDING DATA SERVICES AT THE J. WILLARD MARRIOTT LIBRARY
REBEKAH CUMMINGS, RESEARCH DATA MANAGEMENT LIBRARIAN
MARRIOTT LIBRARY, UNIVERSITY OF UTAH
NISO DATA CURATION VIRTUAL CONFERENCE
AUGUST 31, 2016
2. OVERVIEW
“Sum of Two Parts” Stallman Studio. Image from http://stallmanstudio.com/project/sum-of-two-parts/
• Data services at the University of Utah Marriott Library
• Entertainment, Arts, and Engineering (EAE) Thesis
Preservation Project
7. OUTREACH AND EDUCATION
• Library workshops
• Research Administration Training Series
• Undergraduate Research Workshop Series
• Graduate student workshop series
• Research and Learning Services Forum
• Spring Research Series
• All Staff – “How to organize digital files”
• Zotero workshops
• ICPSR workshops
• Monthly Data Journal Club
• 1:1 Consultations
8. RELATIONSHIPS AND DMP SUPPORT
• The Kem C. Gardner Policy Institute
• The College of Architecture and
Planning
• The College of Humanities
• The Alzheimer’s Institute
• Entertainment, Arts, and Engineering
• Office of Undergraduate Research
• Grant development services
• VP for Research
• Center for High Performance
Computing
9. ASSESSMENT AND PILOTING REPOSITORY
SERVICES
• Campus Data Services Group
• Summer/Fall 2016 - Assessment
• University of Utah Campus Data Services Survey
• Focus Groups - September 19th and October 3rd
• Winter 2016 – Data portal
• Spring 2017
• Call for participation
• Pilot new infrastructure and workflows
• Modeled after 2013 University of Minnesota Data Curation Pilot
13. MOTIVATIONS OF THE EAE DEPARTMENT
• “Erie” - Flagship
game of EAE
• Shown by vlogger
PewDiePie
• Almost 7 million
views on YouTube
• Built for Windows 7,
needs to migrate to
Windows 10
• “Bad archiving is
starting to be
injurious to us.” – AJ
18. CHALLENGES TO PRESERVING GAMES
• Unavailability of original game
• Complex versioning issues surrounding software and
operating systems
• Storage media obsolescence and fragility
• Degradation of game play due to the use of emulation or
migration
• Lack of documentation of file formats
• Intellectual property issues
From “Preserving Virtual Worlds Final Report”
19. METADATA
• “Language” – C Sharp, C++, Action
script
• Platform – medium the game was
played on, console game, web
browser
• Engine – FLASH, Java, etc.
• Words that are instantly informative
to the gaming community – “console
2D platform” means something like a
Mario game. First person shooter.
Adventure game vs. RPG (role playing
game)
21. COPYRIGHT
•Highly creative and commercial work
•Multiple authors
•Multiple intellectual “objects”
•Some, however, are highly motivated to share
work
•Education component
25. THANK YOU!
Rebekah Cummings
Research Data Management Libra
Rebekah.cummings@utah.edu
801-581-7701
@RebekahCummings
QUESTIONS?
Anne Morrow, Head DSS
Tawnya Keller, Dig Pres
Ambra Gagliardi, Grants
A.J. Dimick, Game
Studio/ Alumni
Relations, EAE
Daureen Nesdill, Data
Lib
Harish Maringanti, AD
Editor's Notes
Hi everyone. My name is Rebekah Cummings, and I’m the Research Data Management Librarian at the University of Utah.
This presentation has two main parts to it.
In the first half, I’m going to provide a case study of data services at the Marriott Library, specifically our approach to building data services from basic to higher level data services.
In the second part of this presentation, I’m going to focus in on one specific project that we’re working on, the EAE Thesis Preservation Project, that to me really highlights the challenges that we’re facing at the intersection of data management, digital preservation, metadata, and scholarly communications. This project is still in its early stages, but hopefully our work puzzling through some of the difficult questions around the EAE Project will be useful to this audience.
To give you a little context, this is the library where I work, the J. Willard Marriott Library at the University of Utah. We’re the main library on campus and serve a population of about 35,000 students and faculty.
I was hired as a data librarian here in April 2015, so almost a year and a half ago. My focus is on data management for the social sciences and humanities, and I work very closely with Daureen Nesdill, the data librarian for the sciences. The two of us, and our Associate Dean for IT Services, Harish Maringanti, comprise the Marriott Library data team.
When I was hired as the data librarian, there was one line in my job description that seemed to encapsulate the major function of my job, and that was to “Explore and pilot baseline data services” at the U. Of course this raises the question: What are baseline data services? What services should a large, academic research library provide? Sometimes it’s daunting to compare our services to universities that are on the absolute cutting edge like Johns Hopkins or the University of Michigan.
I’ve found this image from the Journal of eScience Librarianship to be extremely useful in conceptualizing our data service model and in helping me visualize where we stand in regard to data services.
The first thing I did in my new role was get the lay of the land and to become familiar with the constellation of data services currently available on campus.
Because my role focused on data services for the social sciences and humanities, the first thing I did was reach out to all the subject liaisons in the library and set up 30 minute meetings with each of them.
Conduct a mini-interview on the data practices – five questions
Find out how the liaisons wanted me to interact with their departments. Would they prefer I reach out to their departments directly or work through them?
Solicit their help in promoting data services. Letting them know that I was available for DMP reviews, consultations, and workshops.
Have you ever had a faculty member talk to you about their data?
Have you heard them talk about data services that might be useful for them?
What types of data do your researchers collect?
Do you know if there is a repository for data that is used in their field?
How would you like to see my new position support the work that you do?
My second order of business as a new data librarian was to enhance our web presence for data services. While my colleague Daureen had created incredibly informative subject guides on data management, electronic lab notebooks, repositories, and agency responses to the OSTP memo, there wasn’t anything on the website that promoted our data services or informed researchers about them unless they went to the subject guides.
I did an environmental scan of different webpages that I liked at other universities and ultimately was inspired by the simple and clean design from the University of Wisconsin-Madison website for research data services. Thanks to Brianna Marshall for her great work there and for allowing us to mimic some of the elements of her design. Now our web presence isn’t just better on the website but also on Google when people search for data services at the University of Utah.
One thing that became apparent to me early on was that there was a need and a desire for data management training across campus. A big part of my job has been providing outreach and education through a variety of workshop series in the library and across campus. Anytime I see a workshop series for research, I reach out and ask if they would be interested in adding a course on data management.
Daureen and I now teach four sessions a year for the Research Administration Training Series, the group that gives “Responsible Research” certification on campus. I also teach every two months in the Undergraduate Research Workshop Series, which has been particularly satisfying since our undergraduates haven’t yet formed poor data management practices.
Another notable experience for us has been through a monthly data journal club that Daureen and I organize. Through the school year, we meet once a month in the library and invite a faculty member from campus to come and speak about their data. We ask the participants to prepare by reading an article or two and use the time to discuss data and data practices. This has been a great way to keep the rest of our librarians involved and informed about data and to engage our faculty and subject liaisons.
Relationships and DMP support.
I originally thought I would talk more about our survey…
This Spring, we surveyed over 200 researchers across campus. 60% of them had never written a DMP.
Of the 40% of them that had written a DMP, 60% of them used no resources to help them write a DMP.
23% used a DMP template online
10% used a librarian consultation
9% used a grant writer
5% used the DMP Tool.
This means that out of the 203 people that took our survey, only 4 of them had ever consulted with a librarian to help write or edit their DMP. There is still so much room for growth. One of my goals moving forward is to work more closely with grant development services to make sure data management is in the workflow for researchers submitting grants on campus.
At this point, we still hadn’t made the leap to really providing full-fledged data repository services to researchers.
For a while we really considered whether or not we should. With so many subject based and general data repositories opening up, did we really need to have an institutional data repository as well? Or is it better if we just continue to provide consultation services to help people find the right place to put their data? Could we just rely on things like Dryad, OpenICPSR, Dataverse, FigShare, and various subject based repositories?
We started a campus data group that has been meeting since March to think about this question. We embarked on assessment with a survey that targeted faculty and graduate students and we have two follow-up focus groups scheduled in September and October.
Data portal – looking at great examples such as the ones at Purdue, University of Minnesota, Virginia Tech
Minnesota – Thank you to Lisa Johnston for doing that great work there. http://conservancy.umn.edu/handle/11299/162338
As a recap before we dive into our case study of the EAE, this is where we stand today, almost a year and a half after I started my new position and was charged with exploring and piloting baseline data services.
Data Curation – I don’t think we or most libraries are at that point yet, but hopefully we are headed in that direction.
Switching gears.
Now that we’ve gone over a general overview of data services at the University of Utah and our rationale for building those services, I’m going to look closely at one of the projects that we’ve been working on that highlights the complexities of data management in a library setting.
Early this year, I was approached by Anne Morrow, the Head of Digital Scholarship Services at the Library, to figure out how to archive the theses that come out of the EAE department.
Game development program established in 2007, interdisciplinary program between the College of Engineering and the College of Fine Arts
In 2015, our master’s and bachelor’s programs in this department were named the #1 and #2 game development programs in the country by Princeton Review.
As you might imagine, a thesis in the EAE department doesn’t look like a thesis in other departments. An EAE thesis is a game, which is included as part of a wrap kit. These wrap kits are relatively large and come in a wide variety of file formats, many of which are proprietary. Unlike traditional theses, which are almost always sole authored, wrap kits are compiled by teams of EAE students and, again unlike most of the theses generated on campus, some of these wrap kits have a high commercial value.
Up to this point, no one has been preserving the EAE theses, not in the department or through the Graduate School or through the Library. Whereas most programs turn the theses and dissertations over to the Graduate School, EAE only submits a document for each student saying that they have done the required work. The wrap kits themselves are not submitted.
From the library’s point of view, the motivation behind archiving the EAE theses is relatively straightforward. There is a university mandate to keep all theses and dissertations produced on campus as part of the academic record and a record of achievement for the degree. It’s a requirement of certification like keeping grades or transcripts. The library keeps a physical and digital copy of all the theses and dissertations to fulfill this mandate, which is the primary role of our institutional repository.
The timing of this was also serendipitous, as we have been thinking a lot about preserving born digital materials. These theses are a perfect use case for us to develop workflows and to come up with a strategy for preserving new and unusual forms of digital scholarship.
EAE has been equally motivated to come up with an archival strategy for these theses. As EAE has matured as a program they want to start aligning with typical scholarly practices in order to add legitimacy to the program. But more importantly, they are just tired of not being able to find things.
Our contact in the EAE department, AJ Dimick, is responsible for fielding requests for wrap kits and has actually been on a mission to find old wrap kits. The first cohorts just submitted their wrap kits on CDs and those CDs were subject to whatever the professor chose to do with them. In the hunt for wrap kits that he thought one professor might have, AJ actually went into his office and started hunting for disks. He found some unlabeled ones that were indeed wrap kits. They also have a shared server where people commit code, and AJ has been able to find components of wrap kits but not entire wrap kits. They have kept the executable versions of most of the games, but not all the associated files necessary to keep migrating the games forward.
For AJ, the tipping point was when they were unable to find the source code for what he considers to be the flagship game of the EAE program. One of the students from their first cohort created a game named “Erie” that was shared online by a famous YouTube vlogger named PewDiePie. One video of Erie received almost 4.3 million views on YouTube and a second video of the game was viewed an additional 1.6 million times.
Just to put that in perspective, can you imagine one of our other theses being viewed almost 7 million times? This was a very important game for the EAE program.
Problem was the game was built in 2008 for Windows 7 and now, in order to migrate the game to Windows 10, the creator needs the source code. But he can’t find it and neither can EAE. If they had the source code, they could migrate it in a day. As it is, the flagship game of the EAE program will likely be unplayable as the operating system that it was built for becomes obsolete.
It just so happened that when Anne formed this working group, the newest cohort of EAE students was about to graduate and they were hosting an EAE open house where anyone could stop in, see, and play all the work that had been produced that year. Anne and I went to the open house to get a sense of the type of material that we would be archiving.
You can see in these images students crowded around one of the games and in another, an EAE staff member testing out a virtual reality game. And there’s Anne and I very excited next to the EAE poster.
Police Training Simulator – A simulation game that trains police officers in de-escalation techniques. This is an example of what AJ would call a “serious” game as opposed to a game for entertainment. Gaming of course is growing far beyond entertainment as it’s now being used for education, training, or even pain management in hospitals.
Art – “assets” of the game – 3D models, textures, characters, sounds, all of the final versions but also often the earlier versions of these art assets. Could be considered the “skin”.
One sheet w/ attribution.
A wrap kit can also include earlier versions of the game but not always.
This high level explanation of what was in a wrap kit made sense to us, but none of us knew what that would look like under the hood.
EAE doesn’t put any restrictions on what type of files their students can use because they don’t want to inhibit creativity.
Good news!
Storage was not as bad as we thought. 40-50 theses a year that are on average 5-10 MB, the cost is about $70 - $100/year for storage, a cost which isn’t prohibitive to the library.
Doesn’t include dependencies like the environment in which the game is played like an Xbox or an operating system.
Luckily, there are some best practices in place for digital preservation of video games. We’ve been relying on this amazing report, “Preserving Virtual Worlds,” and the knowledge of our digital preservation team to figure out how and where we are going to store these files.
The challenges of preserving video games are similar in a lot of ways to our common challenges preserving academic research data. Here are a few pulled from the Preserving Virtual Worlds report. You can see some of the challenges that we’ve already talked about.
Unavailability of original game or original source code (Erie). We can handle that moving forward but we’ve already accepted that some of the wrap kits from the first 8 cohorts are gone forever. Like legacy data in research data management.
Complex environments that require an interconnected system of hardware and software, and operating systems.
Similar challenges that we see in “normal” research data management.
Engine – what tool did they use to create the game?
Words that are instantaneously informative – console 2D platformer means something like a Mario game. 2D iOS game platformer – those words paint a picture of what the game is. Might be meaningless to nongamers but it tells what boxes the project falls into. Adventure game vs. RPG (role playing game) – used to mean the same thing but now they have different meanings.
It was obvious to us that these games would require a combination of library metadata with student supplied metadata. The way they describe their game is going to be very useful to us.
But of course, we want to think about secondary users as well.
There are some controlled vocabularies emerging around video games, which we are hoping to rely on as well. The Game Metadata and Citation Project, which is funded by IMLS, is developing controlled vocabularies for game platforms and game media. It would be great to see controlled subject terms and ontologies developed for gaming as well.
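To make the mix of student supplied and library assigned metadata a little more concrete, here is a minimal sketch of what one wrap kit record might look like. The field groupings (language, platform, engine, descriptive genre terms) follow the categories on the slide, but the class, the field names, and the example values are all assumptions for illustration, not the schema we have adopted.

```python
from dataclasses import dataclass, field


@dataclass
class WrapKitRecord:
    """Hypothetical descriptive record for one EAE wrap kit."""

    title: str
    cohort: int                      # graduation year of the team
    authors: list[str]               # wrap kits are team-authored
    # Student-supplied fields: the vocabulary gamers actually use.
    languages: list[str] = field(default_factory=list)   # e.g. "C#", "C++"
    platform: str = ""               # console, web browser, ...
    engine: str = ""                 # tool used to build the game
    genre_terms: list[str] = field(default_factory=list)
    # Library-assigned fields, added later by catalogers.
    subject_terms: list[str] = field(default_factory=list)

    def keywords(self) -> list[str]:
        """Merge student and cataloger terms, lowercased, without duplicates."""
        seen: dict[str, None] = {}
        for term in self.genre_terms + self.subject_terms:
            seen.setdefault(term.strip().lower(), None)
        return list(seen)


example = WrapKitRecord(
    title="Example Game",            # placeholder, not a real thesis
    cohort=2016,
    authors=["Student A", "Student B"],
    languages=["C#"],
    platform="console",
    engine="Example Engine",         # placeholder engine name
    genre_terms=["2D platformer", "Console"],
    subject_terms=["Video games", "console"],
)
print(example.keywords())
```

Keeping the student terms and the cataloger terms in separate fields means either set could later be swapped for a controlled vocabulary without losing the other.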
Just like we’ve struggled with the variety of formats in academic research data, video games contain many, many different file formats, many of which are proprietary.
Very different than the pdf and TIFF world our digital libraries are used to.
Current solution is that we are going to have to upload to Rosetta in zipped files because Rosetta does not accept all the file formats in the wrap kit.
From Tawnya:
There are a lot of formats that are not available in Rosetta so some further discussion with AJ is needed to understand some of the extensions and then some further conversation with Ex Libris needed for recommending how to move forward. Likely at this point, it would mean preserving a zip file and not ingesting all formats on their own. Not as ideal but would likely be fine for a few years with a plan in place to expand the file format list of acceptable formats in future.
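The zip-and-preserve approach can be sketched in a few lines. This is only an illustration using the Python standard library: the function names are hypothetical, it does not call Rosetta or any Ex Libris API, and the checksum step is an assumption about what a fixity check might use rather than a description of our workflow. The extension count is there because an inventory of formats is exactly what the follow-up conversation with AJ and Ex Libris would need.

```python
import hashlib
import zipfile
from collections import Counter
from pathlib import Path


def package_wrap_kit(kit_dir: Path, zip_path: Path) -> Counter:
    """Zip every file under kit_dir and return a count of file extensions."""
    extensions: Counter = Counter()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(kit_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the kit so the zip unpacks cleanly.
                archive.write(path, path.relative_to(kit_dir).as_posix())
                extensions[path.suffix.lower() or "(none)"] += 1
    return extensions


def sha256_of(path: Path) -> str:
    """Checksum of the finished zip, for later fixity checks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The zip file should be written outside the wrap kit folder, otherwise the walk over the folder would pick up the archive itself.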
Copyright might be a bigger hurdle.
Students always retain copyright over their work, but the EAE theses have some copyright considerations we haven’t encountered.
Education component – data management in the spring in their second semester at the beginning of their projects, and a copyright educational component when they get closer to completing their projects.
Thinking through a lot of these issues and limitations has informed the technical infrastructure we are going to use to get the theses, archive them, describe them, and make them available at least in a limited way.
The students or the department will submit the wrap kits to the library through Ubox, the campus storage system. The student will also need to fill out an online form, possibly through Survey Monkey, with metadata fields, descriptions, and keywords for their games.
We decided early on that CONTENTdm is not a good solution for these wrap kits. The library is actually moving away from CONTENTdm, but in the meantime, our preservation system Rosetta is a viable place where these files can live safely. Just by putting the files in Rosetta, we’re fulfilling the University mandate to preserve the theses. Rosetta isn’t set up for access and discovery, but we can set up a catalog record in Alma, our library catalog system.
We’ve decided to use Wordpress as a front end engine for discovery and access because Wordpress can talk to Alma, which can link to Rosetta. It may not be the most elegant solution in the world, but it ensures that the games will be preserved and available. Access and copyright permissions can be handled in Alma. Metadata can be robust in Alma. And Wordpress provides a much better user interface for discovery and access.
- Trailers in the library YouTube Channel, with those videos embedded in Wordpress.
JPGs of screenshots in Wordpress, PPTs, marketing materials.
Executable and source code will be in Rosetta and a catalog record will be available in Alma.
We recently reserved the URL for the EAE Wrap Kit Archive, which will be discoverable at eae.lib.utah.edu.
Here the wrap kits will be searchable by cohort, by author, by title, by acclaim (whether or not something is an award winner), or by keyword.
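As a rough illustration of that kind of faceted search, the sketch below filters a small list of records by any combination of cohort, author, title, acclaim, and keyword. It is a stand-alone example with placeholder data and a hypothetical `search` function; the real site would rely on Wordpress categories and tags rather than code like this.

```python
from typing import Optional

# Hypothetical catalog entries; titles and names are placeholders.
WRAP_KITS = [
    {"title": "Example Game One", "cohort": 2015, "authors": ["Student A"],
     "award_winner": True, "keywords": ["2d platformer", "console"]},
    {"title": "Example Game Two", "cohort": 2016,
     "authors": ["Student B", "Student C"],
     "award_winner": False, "keywords": ["serious game", "simulation"]},
]


def search(kits, cohort: Optional[int] = None, author: Optional[str] = None,
           title: Optional[str] = None, award_winner: Optional[bool] = None,
           keyword: Optional[str] = None):
    """Return the kits matching every facet that was supplied."""
    results = []
    for kit in kits:
        if cohort is not None and kit["cohort"] != cohort:
            continue
        if author is not None and author not in kit["authors"]:
            continue
        if title is not None and title.lower() not in kit["title"].lower():
            continue
        if award_winner is not None and kit["award_winner"] != award_winner:
            continue
        if keyword is not None and keyword.lower() not in kit["keywords"]:
            continue
        results.append(kit)
    return results


print([kit["title"] for kit in search(WRAP_KITS, keyword="simulation")])
```

Facets that are left out are simply ignored, so a search with no arguments returns every wrap kit.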
From Anne:
>We will use the library's YouTube channel to upload trailers and any
>other accompanying video. We'll close access to those so they don't show
>up in YouTube searches (unless the students specify a desire for that)
>and embed the videos in Wordpress.
>
>We'll include JPEGs of screenshots in Wordpress. We'll make any
>PowerPoint, marketing, and game play directions documents available (I.e.
>Store) in Wordpress.
>
>We are not likely to see a scenario where we have the executable hosted
>there, whether or not that is what the student wants. Access to the
>executable, and the entire set of files, would occur via the Alma record
>as a parent record, linking to the Rosetta record as a child.
>
>Wordpress itself would be regarded as another child record of the Alma
>record. Wordpress will use metadata contributed by students and assigned
>by original catalogers (working in Alma) to populate a Wordpress record
>with categories and tags in addition to descriptive metadata.
Our #1 priority is figuring out how to archive and when appropriate share the graduate student degree bearing work from our institution.
Use this as a prototype for a consortial data repository for game design work. This feels pretty far into the future, but if we see a need among other universities that have gaming programs this might be the subject of a future grant. As a data librarian, I think that the best place for data is in a subject based repository and as the home of the best game design program, maybe it will be up to us to start something like that if we see a need.
Thank you so much for your time. I want to thank a few more people who have been working with me on many of the projects I’ve talked about today…
Please contact me if you have any questions.