There is more to RDM services than the technical skills necessary for data management. Soft skills and other non-technical skills are very important when setting up RDM services, and they continue to be important to the sustainability of those services. Reference skills, relationship building, negotiation, listening, facilitating access to decentralized resources, policy knowledge, and assessment are all important to the success of a service. Margaret Henderson will discuss these skills and show you how to start RDM services, even if you don’t feel confident about your statistical skills or knowledge of R.
SEA lecture script
Title: Hello everyone. First, I must congratulate Tony on his new position as SEA director. I’ve
known him since before he moved to Maryland, and he will do a wonderful job for the region.
Thank you, Tony, for the invitation to speak today, and thank you to everyone listening. I hope today
I can show those of you who are uncertain about your analysis and coding skills that you can
still provide research data management support at your institution. While my observations are
from academic libraries, they should be applicable to other settings. I’ve included links in my
slides where appropriate, and I’ve also included a bibliography at the end. Tony will make the
slides available in a week or so, and you’ll be able to follow up on anything I mention.
Bourg blog image: I want to start by clarifying something from my past writings about libraries. I
have used the word neutral to describe libraries, but after reading Chris Bourg’s remarks from
the 2018 ALA Midwinter President’s Program, I realized that what I meant by it isn’t
what it means to others.
PBmeme: I thought of neutral as non-judgemental. For example, a couple of days after a
meeting discussing reproducible science and teaching R and the Open Science Framework, one of
the faculty members on the committee called me up and asked me all sorts of questions about what R
was, why it should be used, and what open source was. As the person not affiliated with a
particular school, I was a safe person to ask these questions. He knew I wouldn’t judge him for
not knowing R. That is the way I want all the faculty, staff, and students I work with to feel. But
neutral isn’t the best way to convey that.
Neutral Paint: I wasn’t sure what to replace neutral with. Some ALA documents suggested post-neutral,
but that really doesn’t help explain what I was trying to convey. Recently I heard Patti
Brennan, the Director of the National Library of Medicine, speak to the first Biomedical and
Health Research Data Management for Librarians cohort at the capstone Summit at the NIH (hi
to those of you listening). She used the word trusted when she spoke of librarians, which I think
is perfect.
Cat meme: I think we need to remember this trust when we start research data management
services. We are trusted to find the best information and provide a communal resource. We are
trusted to find accurate information, facts, not fake news. When we help with research data
management, we are being trusted to help with things researchers and students are worried
about, so we need to be sure we are giving the best information on policies, best practices, data
collection protocols, analysis or visualization options, and preservation.
LIBER quote: Although this Association of European Research Libraries report is slightly old,
given the growth of RDM services in libraries, its ten recommendations are still valid, and its
advice not to try to do all things at once is still very important. I’m sure you have listened to other
talks about RDM or taken courses to learn skills, and you have finished with lots of enthusiasm
to start doing things at your place of work. But then nothing happens, or you talk about doing something
but never quite get the plans moving. You don’t have to roll out a fully functioning data service
all at once. Offer a class, start networking, look for infrastructure. Small steps. Hopefully, I’ll be
able to give you a few other options for starting in this talk.
Teamwork photo: At this point, I also want to encourage you all to be part of a team providing
RDM services. One person cannot do it all: not the skills, not the networking, not the meetings,
not the consultations. Form a team. Maybe a group of subject liaisons who are getting more
data questions. Maybe a dedicated data librarian with help from a couple of subject liaisons and
a digital collections person. Maybe a couple of librarians and somebody from institutional IT or
the research office. Whatever the configuration, a group with varying skills is valuable when
trying to start RDM services.
List of skills: In my initial discussions with Tony about this talk, I thought of a few skills to cover.
But when I started reviewing the work I have done involving RDM, I realized there were
quite a few more. Even this isn’t an exhaustive list, since I haven’t touched on the
skills needed for metadata, curation, and preservation. Hopefully, as I go over these skills, you’ll
find a couple that you feel comfortable with, and you can use those to move forward.
1. Reference. Batgirl: I have always maintained that reference skills are our superpower. Finding
out what is really needed is a big part of what we do. For example, when I was working on a
practicum with people in the bioinformatics core, I had a chance to learn about searching
electronic health records. There was no interface for clinicians to extract data, so they
requested it through an intermediary, one of the core programmers who worked on the
database. A clinician would make a request, and he would run a search and send a dataset.
The clinician would send it back because it didn’t have quite what they needed, he would
try again with the new information, and they would send it back again with more specifics about
what they wanted and why the dataset wasn’t right, and so on. I hope you can see
how a good reference interview could have helped in this situation. In fact, when they were
researching a new data warehouse for the EHR data, the request for proposals was going to
include a data concierge to help people find the correct data!
2. Relationships: P&P relationship map: I think our reference skills naturally lead into the ability
to develop good relationships with different groups on our campus. We ask the right questions
to learn more about what they are doing and what they need to do their work. We work to
support what they are doing, in research or teaching. Building relationships was such an integral
part of my work as Director for RDM that I have come to see relationship maps as an
excellent way to show progress when assessing RDM services. These maps can express the
strength of a relationship quantitatively, using lines of different widths or colours to show the types of
contact (meetings, emails, consultations) and how many of each. This map, based on one of my
favourite books, shows types of relationships in colour, and arrows indicate whether a relationship
is one-way or two-way.
This map shows the imaginary first year of a library data service, with contact out to Sponsored
Programs and the provost, emails to one or two people, but no further interactions. But
institutional IT, the research office, and a researcher have darker lines with two-way arrows, showing
interactions through emails and meetings with multiple people. With proper captioning, and maybe
animation using a relationship mapping program like NodeXL, you could show changes in
relationships over time.
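If you would rather build a map like this from your own records than draw it by hand, here is a minimal sketch in Python. It rests on a few assumptions. The contact log, partner names, and counts are hypothetical, invented to mirror the imaginary first-year map. The output format is Graphviz DOT, used in place of NodeXL. In the drawing, line width (`penwidth`) is the number of contacts, colour is the most frequent type of contact, and `dir=both` puts arrowheads at both ends when contact went both ways. The colour choices are arbitrary.

```python
from collections import Counter, defaultdict

# Hypothetical contact log for an imaginary first year of a data service:
# (who initiated the contact, who received it, type of contact).
CONTACTS = [
    ("Library", "Sponsored Programs", "email"),
    ("Library", "Provost", "email"),
    ("Library", "Institutional IT", "email"),
    ("Institutional IT", "Library", "meeting"),
    ("Library", "Institutional IT", "meeting"),
    ("Library", "Research Office", "meeting"),
    ("Research Office", "Library", "email"),
    ("Library", "Researcher", "consultation"),
    ("Researcher", "Library", "email"),
]

# Arbitrary colour for each type of contact.
COLOURS = {"email": "gray", "meeting": "blue", "consultation": "darkgreen"}


def summarize(contacts):
    """Group contacts by pair of parties, counting contact types and directions."""
    type_counts = defaultdict(Counter)  # pair -> Counter of contact types
    directions = defaultdict(set)       # pair -> {(source, target), ...}
    for source, target, kind in contacts:
        pair = tuple(sorted((source, target)))
        type_counts[pair][kind] += 1
        directions[pair].add((source, target))
    return type_counts, directions


def to_dot(contacts):
    """Return a Graphviz DOT digraph: width = number of contacts,
    colour = most frequent contact type, dir=both for two-way relationships."""
    type_counts, directions = summarize(contacts)
    lines = ["digraph relationships {"]
    for pair in sorted(type_counts):
        total = sum(type_counts[pair].values())
        main_type = type_counts[pair].most_common(1)[0][0]
        if len(directions[pair]) == 2:
            # Contact went both ways: draw arrowheads at both ends.
            source, target = pair
            direction = "both"
        else:
            # Only one direction was ever logged: keep that direction.
            source, target = next(iter(directions[pair]))
            direction = "forward"
        lines.append(
            f'  "{source}" -> "{target}" '
            f'[penwidth={total}, color="{COLOURS[main_type]}", dir={direction}];'
        )
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(CONTACTS))
```

You could save the printed output to a file such as `map.dot` and render it with Graphviz, for example `dot -Tpng map.dot -o map.png`. Keeping one log per year would give you the snapshots needed to show how relationships change over time.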
3. Negotiation: It is inevitable, when working with stakeholders from many areas and disciplines,
that people will not see eye to eye on what is needed for RDM. Of course, question negotiation
skills help (there’s that reference superpower again), but collaborating with librarians who have
experience negotiating with vendors, whether of resources or services, can also be helpful.
Learning to find the pain points as well as the major goals of the various groups you
deal with will help the negotiations. The personas used by many in UX design can be a big help
when trying to empathise with the people you are working with.
4. Listening. Frasier, I’m Listening: Listening can be one of the hardest things to do, especially when you
already have some RDM services in place. You might be interviewing a researcher or just
chatting with a faculty member you meet in the hall. They might complain about what they can’t
get from the university or library, and how they need help with a DMP. You want to jump in
and tell them that DMPTool is available, IT has storage, and the university has a policy, but that won’t
help. You need to let them finish, show empathy, and then paraphrase what they said so
they know you understand. Then you can suggest the solutions you have and offer your help. If
you act defensively and cut them off, you’ll never learn everything they need. And remember that people
don’t really notice what the library offers until they actually need to do something, so keep up
the marketing efforts.
5. Data literacy: Many aspects of information literacy can be useful when dealing with data.
Where did the data come from, who collected it, when was it collected, and is the raw data available?
Basics of statistics, like understanding percentages and standard deviation, can be covered
without knowing all about analysis. Those who have taught evidence-based medicine or nursing
will have some experience with deciphering what the results of a trial really mean when looking
at patient outcomes, and these skills can be applied to teaching data literacy. With these tools,
students can learn to critically review the numbers in a newspaper article about the benefits of
coffee or the adverse effects of smoking.
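The Python sketch below works through the numbers behind a headline like that. It uses only the standard library, and every figure in it is hypothetical, invented for the example. It shows two checks students can make. First, a dramatic-sounding relative change can be tiny in absolute terms; the example also computes what evidence-based medicine calls the number needed to treat. Second, two groups with the same average can differ a lot in spread, which is what the standard deviation captures.

```python
import statistics


def risk_comparison(events_a, n_a, events_b, n_b):
    """Compare outcome rates in two groups; group b is the comparison group.

    A relative reduction can sound impressive even when the absolute
    difference (in percentage points) is very small.
    """
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    return {
        "risk_a_pct": 100 * risk_a,
        "risk_b_pct": 100 * risk_b,
        "relative_reduction_pct": 100 * (risk_b - risk_a) / risk_b,
        "absolute_reduction_pct_points": 100 * (risk_b - risk_a),
        # In a trial this is the "number needed to treat".
        "number_needed_for_one_fewer_event": 1 / (risk_b - risk_a),
    }


# Hypothetical headline: "Coffee cuts risk by 50%!"
# 2 of 1,000 coffee drinkers vs 4 of 1,000 non-drinkers had the outcome.
coffee = risk_comparison(2, 1000, 4, 1000)

# Hypothetical measurements with the same mean but very different spread.
group_x = [118, 120, 122, 119, 121]
group_y = [100, 140, 110, 130, 120]

if __name__ == "__main__":
    for name, value in coffee.items():
        print(f"{name}: {value:.2f}")
    for label, values in (("group_x", group_x), ("group_y", group_y)):
        # statistics.stdev is the sample standard deviation (n - 1 denominator).
        print(label, statistics.mean(values), round(statistics.stdev(values), 2))
```

The "50%" relative reduction here is a difference of 0.2 percentage points, or one fewer event per 500 people.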
In a recent article, Meryl Brodsky used the ACRL Framework as a basis for assessing the
syllabi and assignments from business school classes, to reveal stated and implied data literacy
competencies and to learn where data literacy instruction might be needed. This could be done
with any faculty member or discipline you are working with.
6. Facilitation: Over my 30-plus years in libraries, one of the things I’ve regularly done at
the reference desk is help people find something they need that isn’t a library resource.
Helping students find tutoring services, faculty find research equipment, or a visitor to
campus find the right building are all part of facilitating access to services. In all of my jobs,
one of my first tasks has been to find out what the other departments in the library can help with,
and then which departments around the institution can help me or the people I’m helping at
the reference desk or in consultations. Some libraries have developed guides or databases of
the services around their institution that provide research support. I also form relationships (see skill
2) with the people in those areas, so I can refer people easily. Once, I worked with an
Emergency Room research group that wanted help formatting multiple Excel spreadsheets.
They didn’t think REDCap would work, but I was pretty sure it would, so I put them in touch with
the REDCap administrators, and they were able to set things up in a much easier format than the
original Excel. Even though I didn’t do much, I was pleased to be able to send people to the
best services.
This is part of the Research Navigator I developed using LibGuides. I included different
subject pages, and within each subject, tabs to further divide resources and make it easier for
people to find what they needed. Outside resources that fit specified criteria, which are listed on
the About page, were also included to give researchers a larger range of tools to work with. As
you can see on this page, all institutional resources had an icon attached to make local resources
easy to identify, even when they were on a subject page.
7. Reproducibility: Reproducibility is a big issue right now in many sciences and social sciences.
Data librarians should understand the problems and the basics of reproducibility and be
prepared to help researchers find information on how to design reproducible studies and where
to register trials and experiments if that is part of their field. Understanding disciplinary
standards and the basics of the process is helpful. Some researchers who are
using analysis programs will need to know how to document their methodology, so data
librarians who don’t work in this area need to be able to facilitate access to services that can
help (see the previous skill).
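For researchers who run their own analyses, one small, concrete documentation habit is to save a provenance record next to each set of results. The sketch below is a hypothetical Python example. It is not a standard, and it is not a tool any particular service provides. It records the Python version, the operating system, a timestamp, and a SHA-256 checksum of each input file. With that record, someone can later confirm they are re-running the same analysis on the same data. R users can capture similar session details with `sessionInfo()`.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Checksum a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(input_files, description):
    """Collect basic facts about how, when, and on what an analysis was run."""
    return {
        "description": description,
        "run_at_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "inputs": {str(path): sha256_of(path) for path in input_files},
    }


def write_record(record, out_path):
    """Save the record as human-readable JSON alongside the results."""
    Path(out_path).write_text(json.dumps(record, indent=2), encoding="utf-8")
```

A call such as `write_record(provenance_record(["survey.csv"], "Table 1 statistics"), "provenance.json")` would save the record next to the outputs. Both file names are placeholders.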
At the University of Minnesota, librarians hosted a reproducibility event and then partnered with
research support groups on campus to develop this portal. It includes guidelines as well as the
various services around campus that help with reproducibility. At the end of the presentation,
I’ve included a bibliography that gives links to a couple of articles by Franklin Sayre and Amy
Riegelman at the University of Minnesota about the reproducibility work they have done.
8. Policy and compliance: Nobody likes dealing with policies and regulations, be they federal or
state, institutional, publisher, or funder. Having a librarian who can review researchers’ plans, advise
on the policies they need to follow, and make sure those policies are being followed can be a great relief
for many researchers. Librarians can also advise grants offices or administrators, and an
understanding of outside policies and best practices can be helpful when working on institutional
policies. It is important to make sure institutional policies don’t hamper any sharing or access
mandated by journals or funders.
Here are some of the current NIH policies for public access to articles and data sharing.
Remember that a new NIH data sharing policy is expected soon. Keeping up with these policies
is important if you are helping with data management plans.
And as a reminder, even when there is uncertainty about some policies, like the OSTP memo,
good RDM is needed because it helps research, not because it is required by policy.
Equally important is being aware of the threat posed by PubPeer and journal policies. In fact, I have
found that the threat of retraction is a better motivator than funder compliance. I recently had a
Twitter conversation with Kristin Briney, Margaret Janz, and Thomas Padilla about using data
horror stories. Ideally, we want researchers and students to manage their data because it helps
them and makes their work easier, but getting them to break out of bad habits is hard. The
realization that others have had their papers retracted, or worse, can sometimes be the
wake-up call that is needed. Papers can be investigated because of questions raised on PubPeer, like
this one, which was 10 years old when the retraction was issued because there was no original
data (and saying there were no rules was no excuse).
Or, in the case of the PACE trial, there were comments in The Lancet and PubMed Commons.
The data was requested, as per the journal policy, to verify the claims in the article, and the
authors were eventually ordered by the court to release the data. No researcher or institution
wants this kind of notoriety, so helping researchers secure all the data related to an article, and
preserve it, is an important step in data management.
9. Scholarly Communication: Helping researchers see data as a research product that, like an article,
should be cited and shared is important not only for future research but also for pushing for
changes in promotion and tenure policies. Open access, open science, and open data are part
of this conversation as well.
FORCE11 has been working on standardized ways to make data
accessible. Its Data Citation Principles make sure data can be found, but also credited to a
researcher the way an article is. This type of acknowledgement can then be used in promotion
dossiers and grant applications.
And the FAIR Principles give guidelines for making sure data is available for others to use and
build upon. The Interoperable part of FAIR is actually very important. A study of 704 NSF PIs
found that the top unmet data analysis need was training on the integration of multiple data
types, which depends on metadata, something librarians can help with. Being able to pull together
data from various sources to discover new insights is an important part of open data.
One good way to help researchers understand the value of open science is to show them
articles by other researchers, or recommendations from major science journals, rather than from librarians.
This short article from Nature this year promotes data management plans and data sharing, and
explains how it can help improve a researcher’s work. Of course it is a librarian’s nightmare
because the online and print versions have different titles!
10. Assessment: Some sort of assessment is necessary to know if data services are actually
helping people, just like any library service. While there are some basic consultation and
instruction assessment instruments available from liaison and teaching librarians, other
parts of data librarianship are harder to assess. Researchers have many needs around data,
and making sure they are all met can be a challenge. Librarians who understand how to use
different assessment methods will be able to help justify data services and eventually expand
them. I have an article coming out soon in the Journal of Librarianship and Scholarly
Communication with Heather Coates, Jake Carlson, Ryan Clement, Lisa Johnston, and
Yasmeen Shorish, entitled “How are we Measuring Up? Evaluating Research Data Services.”
The article is based on a panel we all presented at RDAP 2016, which Heather Coates moderated.
It covers the ways five different institutions, of varying types and at different stages of
data services, have worked on assessment. Assessment for a new service needs to be more
than just keeping track of numbers.
I realize this is more self-promotion, but in a talk I gave at the Center for Evidence Based Library
and Information Practice at the University of Saskatchewan in Saskatoon, I advocated for using
assessment as research, to help solidify the evidence base of our profession. As an aside, if
you don’t read this journal, you should, and there were many other excellent presentations at
the symposium.
So when it comes time to do surveys to find out what the researchers at your institution are
doing, read the existing literature and use the same questions so you can compare across
institutions. For example, I compared the results on the data formats faculty are collecting with those
at Northwestern, from a 2015 report on a survey by Cunera Buys and Pamela Shaw. As you can
see, text and spreadsheets are the most used data formats.
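Comparing results across institutions only works if the question wording matches. Once it does, the arithmetic is simple enough to script. The sketch below uses invented counts for a multi-select question about data formats; these are not the Northwestern or Emory figures. It converts counts into the percentage of respondents selecting each format and reports the difference in percentage points. Respondents can pick more than one format, so the percentages are not expected to add up to 100.

```python
# Hypothetical response counts for a multi-select question
# ("Which data formats do you collect?"). These are not real survey results.
LOCAL = {"respondents": 120, "counts": {"Text": 96, "Spreadsheets": 90, "Images": 40}}
PEER = {"respondents": 300, "counts": {"Text": 210, "Spreadsheets": 225, "Images": 150}}


def percentages(survey):
    """Percentage of respondents who selected each option (multi-select)."""
    n = survey["respondents"]
    return {fmt: round(100 * count / n, 1) for fmt, count in survey["counts"].items()}


def compare(local, peer):
    """Return rows of (format, local %, peer %, difference in percentage points).

    A format asked about in only one survey gets None for the missing values.
    """
    local_pct, peer_pct = percentages(local), percentages(peer)
    rows = []
    for fmt in sorted(set(local_pct) | set(peer_pct)):
        a, b = local_pct.get(fmt), peer_pct.get(fmt)
        diff = None if a is None or b is None else round(a - b, 1)
        rows.append((fmt, a, b, diff))
    return rows


if __name__ == "__main__":
    for fmt, a, b, diff in compare(LOCAL, PEER):
        print(f"{fmt:<13} local {a}%  peer {b}%  difference {diff} points")
```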
And responses on how much data is being stored, and where it is being stored, can be
compared with results from Katherine Akers and Jennifer Doty at Emory in 2013. These results
show that few people are working with large amounts of data, and some don’t even know how
much data they have.
Abigail Goben and Tina Griffin presented their review of the current survey literature at
IASSIST, and a paper should be out soon. Their findings show that most researchers are
worried about storage, sharing, and issues that revolve around long-term access. They also
point out some gaps in the literature that you might be able to fill. So I hope you will look
at what has already been done and build on it when you decide to conduct a data survey at your
institution. It is helpful to understand your institutional needs, but please also try to contribute to the
evidence base.
And be sure to look into other methods when you do assessment. An article by Ullah and
Ameen showed that surveys were heavily used in LIS research: 33% of the papers reviewed
used surveys, each of the other research methods in Table 6 was used in fewer than 10% of the papers,
and the methods in Table 7 were only used once or twice. Take a look at the article
and consider some other form of research or assessment, for example a review of syllabi and
assignments like Meryl Brodsky’s.
11. Sustainability: Replacing somebody who leaves, no matter what the job, is always a
problem. Many institutions want to hire less experienced people to save money. And even when
positions are filled, research and data management keep advancing and changing. For data
services to be sustainable, librarians need to be lifelong learners, and library management
needs to invest in training. When I received my MLIS back in 1986, there was no WWW. Some
of the first programs I used for searching, like BRS, Gopher, and Grateful Med, no longer exist,
and I’ve had to learn new ones in their place.
Patti Brennan’s most recent blog post on June 12 was about Training for Lifelong Learning. She
was writing about the biomedical informatics training funded by NLM, but what she says applies
to librarians as well: “in such a rapidly changing field, never-ending curiosity and
unrelenting inquiry are absolutely essential. Trainees and fellows must be prepared for an
ever-changing world and embrace the idea that their current training programs are launch
pads, not tool belts. Content mastery will get them only so far.”
12. Collaboration: I mentioned earlier that librarians need to work together as a team to develop
data services, but they need to collaborate with people outside of the library as well. The
number one need researchers report in surveys is data storage, and most libraries can’t provide
that. So we must collaborate with institutional IT. When I was running an RDM service, I
collaborated with university and school IT services to make sure everyone had the collaboration
space, backup, and long-term storage they needed, but I also collaborated with the Statistics
and Analysis service of the Department of Statistical Science and Operations Research to
provide R training. I also collaborated with Sponsored Programs on grant writing workshops
that included a session I taught on data management plans. These are the sorts of
collaborations you should be considering.
If you don’t have departments to help, consider having post-docs lecture about the topics
you need covered. Successful programs at the University of Pittsburgh and my previous place
of work, Virginia Commonwealth University, have not only provided information about new tools
but have also given post-docs teaching experience.
13. Diversity, equality, inclusion. Heart: Diversity, equality, and inclusion are essential to all library
services and all institutions. But because data involves statistics and analysis,
and can be manipulated to support a particular point, it is especially
important that we learn to view data critically and help people understand what is being
represented by the data at hand. Data librarians need to use all the skills I’ve been discussing
to ensure that people have access to data and the means to understand what it is saying.
I don’t know how many of you were at MLA or have access to the plenary talks online, but I
listened to Elaine Martin’s Janet Doe Lecture, and I really felt she touched on things we need to
consider as data librarians in the health sciences. Her lecture was titled “Social Justice and the
Medical Librarian,” and she pointed out that diversity is important for achieving social justice. I
was struck by this quote: “Social justice librarianship involves the development of a personal
and professional approach in which the practice of medical librarianship puts the library’s users
health information interests and needs front and center. Similar to US medicine, we have been
slow to realize our social responsibility to those we serve.” I think this applies equally to data
librarianship and all the different groups we serve.
14. Ethics. George Joseph tweet and article: Finally, I’d like to touch on an area that relates to
diversity, equality, and inclusion, and that makes my initial clarification about neutrality important. We
can’t be neutral when it comes to data ethics. We can’t assume that every use of data is good
for the community we serve, and we can’t stand by and say nothing when a researcher falsifies data.
We need to use the knowledge we have to push for a census that will provide unbiased and
adequate data to plan for schools, hospitals, and other social services. And we need to teach people to
question algorithms as well as data sources, as in this example about the LAPD using data and
Palantir software to target “probable offenders.” Read Safiya Noble’s Algorithms of Oppression:
How Search Engines Reinforce Racism, and use her ideas to talk to students about what is
going on with their data, social media, and Google searches. We shouldn’t be neutral when
it comes to the privacy of the people we help and the ways their data can be misused. You might
be the only one in a faculty senate meeting pointing out that student data should be secure and
private, but somebody has to do it.
And of course, since most of this group is involved with medical libraries in some way, learning
about the ins and outs of patient data is a big help. Understanding how sharing patient data can
help, how to make sure privacy is maintained, and where to find the best information for
advising patients and researchers takes some understanding of data security, but it can be
invaluable for those we help.
Sunset/environmental scan: So now that I’ve gone over a bevy of skills you can use in RDM
services, I suggest that a good first step for any new service is to conduct an environmental
scan to find out where the resources are at your institution, and where there are gaps. As I
showed earlier, an environmental scan helps you make accurate referrals to
facilitate access, but it can also lead to a LibGuide that collects institutional research
resources, or to a guide to reproducibility support. And talking to people to find out more about
what services are offered can help with relationship building and lead to collaborations. You
also need to scan the environment outside your institution to keep up with data
management developments. Everyone has different ways of managing the growing mass of
information added to the internet every day. I have found that using Twitter to keep up is the
easiest for me. I follow the #medlibs and #datalibs hashtags, and as I come across groups that work
with data, or important journals with Twitter feeds, I follow those. I use TweetDeck so I have
columns for the most important feeds, which makes it easier to go back in time and not miss things.
Whatever you find works for you, try to keep up with developments as best you can.
Gardens: But remember, every place is different, and you will find that your services and
expertise will be different from those of other libraries, and that’s okay. I have loved this garden metaphor
for library data services from Jamene Brooks-Kieffer since she presented it at the Midwest Data
Librarians Symposium back in 2014, and I think it is still important to realize that every place will
implement RDM services in a different way. Do what is best for your institution, based on
assessment (another one of the skills). I tried holding hour-long workshops at various times, and
eventually discovered that longer bootcamps when there were no classes were a better way to
deliver services to graduate students and faculty. I also ran some collaborative bootcamps with
data librarians at other universities in the state, but after a couple of years I noticed that the
universities that had been running the bootcamps before I started were drawing fewer and fewer people.
They had lots of people in other sessions, but the basic bootcamps weren’t as important. I’m
sure I would have eventually found the same thing. Developing a service is an iterative process,
and you have to be flexible to meet the needs of your researchers.
Reference desk women: Finally, I’m not advocating that we ignore data science. But, I believe
that if we give up the parts of our profession that differentiate us from data science, we devalue
ourselves. We are bowing to the people who think librarianship isn’t valuable because it is a
female profession, and who see computer and data science as male and therefore more important. Our
combination of soft skills AND computing and data skills is what makes us more useful than a
programmer or statistician who can’t ask the right questions to figure out what a researcher
needs. And because we have a holistic view of the whole research process, we can be of more
help than the grants administrator who just worries about policy compliance. We know how to
work with all the groups involved with RDM and bring them together as a service that truly
meets the needs of researchers.
Thank you for joining the webinar today. I am happy to answer any questions now, or in the
future. Please think of this as the start of a conversation on data librarianship.