The Research Data Services Landscape
Beyond Filling Gaps
NISO Webinar: Labor and Capacity for Research Data Management
March 11, 2020
Rebecca Springer | rebecca.springer@ithaka.org | @rsspringer1
The Research Support Services program examines the practices and support needs of scholars by discipline or thematic area using a unique collaborative, qualitative methodology.
We provide research and strategic guidance to help the academic and cultural communities serve the public good and navigate economic, technological, and demographic change.
Published and upcoming issue briefs and blog posts synthesize research on topics like data communities and the organization of research data services.
In this talk I will…
1. Characterize the current landscape of research data services
2. Point out some areas where we need to think beyond “filling gaps”
The Research Data Services Landscape
Three Models to Drive Data Sharing
Institution Driven
Researchers across disciplines are encouraged to deposit their datasets in institutional repositories. Collaborative groups are working to share curation expertise and make datasets discoverable across institutions.
Compliance Driven
Funders and publishers require researchers to deposit datasets when their articles are published. Generalist repositories are targeting this type of data sharing.
Community Driven
Researchers form fluid and informal communities around the sharing and reuse of certain types of data. Community-centric sharing usually takes place via domain repositories.
Three Models for Data Sharing
Institution Driven, Compliance Driven, Community Driven
Three Models for Data Sharing
Institution Driven
Strengths: long-term preservation, potential for high-quality curation
Weaknesses: siloed discovery, limited uptake
Compliance Driven
Strengths: widespread uptake, publication-driven discovery, sustainability
Weaknesses: low quality of deposits leading to limited reusability, lock-in potential
Community Driven
Strengths: responsive to researcher needs, discovery through familiarity
Weaknesses: long-term sustainability is difficult to achieve
Challenges and opportunities
1. Researchers are confused by the profusion of options; they are getting mixed messages.
2. Are there opportunities for proponents of different models to cooperate toward maximizing strengths and minimizing weaknesses?
Looking Ahead (Beyond Filling Gaps)
When we think about “filling the gaps,” we start to assume that our goal is static and uniform across fields. In order to succeed, we must work together and change our paradigms.
Low-Hanging Fruit
Collaboration Problems
Collaboration Problems (/Opportunities)
Around a quarter of R1 universities don’t have any dedicated data librarians on staff (July 2019).
Building Data Capacity
So… can I have your data?
(Workflow diagram. Labeled elements: write code, write more code, write even more code, run this other model, raw data, clean data, analytic model, run the model to create virtual sensors, collect sensor data, another analytic model, results.)
Challenges and opportunities
1. Pick the low-hanging fruit first.
2. Engagement is the #1 most important skill for research data professionals.
3. Be prepared for the possibility that our solutions don’t fit their problems.
Thank You


Editor's Notes

  • #5 The data communities concept is a way to address two questions: scale and motivation.
  • #6 NB: I’m saying “data sharing” because usually when we’re talking about research data management (RDM), it’s for the purpose of eventually sharing responsibly, or even “sharing with one’s self,” i.e., reusing data later. There are currently three models through which institutions and stakeholders are trying to drive data sharing. This typology is focused on where the data eventually ends up, because that tends to be the motivating factor.
  • #7 NB I’m saying “data sharing” because usually when we’re talking about RDM it’s for the purpose of eventually sharing responsibly, or even “sharing with one’s self,” i.e. reusing data later. This typology is focused on where the data eventually ends up, because that tends to be the motivating factor.
  • #11 What I mean is that data-driven research is changing fast. The needs of researchers vary by field, but also from researcher to researcher. If we are just thinking, “OK, how do we get everyone to this level of compliance with data sharing or RDMPs or whatever,” we will get left behind. EVERYONE is in the same boat here. It’s not a library problem or a publisher problem or a research computing problem. We really have to work together and change our thinking to succeed here.
  • #12 CEE. Several interviewees reported using ImageJ, a software program that integrates with MATLAB, to extract numerical values from published graphs when better data was not available. Obviously there are problems with accuracy here. To drive home how serious this is: one interviewee actually reported using a printout of an article, a ruler, and a pencil to manually guesstimate the data from a published article. And not back when they were doing their PhD in the ’80s; this is something they still sometimes have to do.
  • #14 Map of all the research data services offered by a particular R1 university. The challenge is duplication and lack of coordination: researchers get confused and get mixed messages. The opportunity is that with a decentralized model, services can be embedded in departments and centers, really close to where researchers do their work, socially as well as physically, so the researcher can have someone who understands what they’re doing and whom they trust to get advice from. Of course, not every school has this. Then it’s a question of how we use the resources we have efficiently, and how we partner with other schools to share resources.
  • #15 These numbers have probably changed. Hiring data librarians is one trend, but others include: training existing staff, since every liaison, scholarly communications, and reference librarian will need data familiarity; hiring nontraditional data staff, i.e., staff who don’t have an MLIS degree, which is happening especially in Europe; and, again, recognizing that the library is not the only player here. The library doesn’t have to have all the data expertise in house; it can also be a partner in directing people to the campus resources they need.
  • #16 The third topic I wanted to bring up is how data science and big data increasingly do not fit into our data management paradigms. We often assume that you do research and you create data. Or you put data in, do something to it, and get data out. It’s not even that simple. In some experimental fields it is, but increasingly that is not so; operations research is one example. How do we figure this out? We have to talk to the researchers. Get them to explain step by step and to answer the questions we feel stupid asking. Not everyone will talk to you, but many will. They actually love it when people are interested in their work. What do we do once we figure it out?