[Unclear] words are denoted in brackets
Webinar: Trusted Data Repositories
13 March 2018
Video & slides available from ANDS website
START OF TRANSCRIPT
Andrew Treloar: Okay, good afternoon, or good morning, depending on what time zone
you're in. Welcome to this webinar on the ANDS - and I'll explain why it's
ANDS in a minute - trusted data repository program. My name's Andrew
Treloar. I work for ANDS and I was responsible for the trusted data
repository program itself.
So I'm going to start by providing a bit of context for the program, and an
overview of - a very high-level overview of CoreTrustSeal. We then have
three case studies. These were three projects that we funded, one from
CSIRO, one from the National Imaging Facility and one from the Australian
Data Archive, and I'll explain why we picked those and the different
perspectives they provide, and then we have a slightly more freewheeling
Q&A session at the end, so there should be plenty of time for you to ask any
questions that you have.
Trusted data repositories was a program that ANDS funded in its 2016/17
annual business plan. Now, 2016/17 seems like a very long time ago now,
and a number of these projects - for reasons that they will explain, ran after
the end of the 2016/17 financial year, and that's one of the reasons why
we're talking about them now.
It also means that this was a program that we started when ANDS and
Nectar and RDS were largely separate activities. Since that time, ANDS,
Nectar and RDS have been progressively aligning what they do, our 2017/18
activities, which are of course still running, are being undertaken under an
integrated business plan, and so, while it seems slightly strange to be talking
about an ANDS-only program - for me, at least - it is very much something
that's been embraced by RDS and Nectar. So this overall trust agenda is very
much something that is reflected in the 2017/18 business plan and is tied
into a wider concern around research quality and trustedness.
So, please see this as entirely consistent with what the three projects, as
they come together, care about, even though it started under an ANDS-only
umbrella.
I was trying to think about the best way to provide some context for the
concern with trustedness, and I decided to go back - not to beer - although
the beer is relevant - but go back to an article that was very formative for
me in the early '90s, written by a guy called John Perry Barlow, who in fact
died last month.
So he was a guy who was, among other things, a lyricist for the band, the
Grateful Dead, but was one of the early thinkers around intellectual
property and ideas and wrote a very influential article called The Economy of
Ideas, subtitled Everything you know about intellectual property is wrong,
where, among other things, he distinguished between container and contents.
So the way I want us to think about trust for the purposes of this seminar, is
to distinguish between the contents of a repository and the repository itself.
So, what we all want is we want contents - in this case, something that we
can consume without fear, and one of the ways that we're happy to, in this
case, drink from this delightful glass of beer, is by looking at the container.
So the container has some characteristics that make the contents more
trustworthy. One of the characteristics - and I hadn't actually thought about
this joke until now - one of the characteristics is the seal on top of the
container, in this case. You can see that the beer bottle has not been
unsealed and so we're prepared to believe that it hasn’t been tampered
with between the brewery and getting poured.
Another of the elements of the trust that we might have in this particular
beer is the label on the bottle. So we look at this and we say, ah yes, I've
heard of Batemans - I think that's what it says. I've heard of
Batemans. They produce trustworthy beer. There might be more
information on the back. It might say, brewed in - somewhere. So there's
some brand information that's associated with the container that leads us to
trust it more, and there may even be some provenance information about
how the beer was produced, where the ingredients came from, and so on.
So the distinction that I want you to take out of this image is not, mm, it's
lunchtime - I'd really like a beer now, but that distinction between container
and contents.
In fact, the ANDS trusted data repository projects were focused on the
container. We had a separate set of projects, called trusted research
outputs, which were focused much more around the process of producing
the outputs and the provenance associated with producing the outputs. In
this webinar, we want to focus on the container - the trusted data
repository projects.
So we selected a small number of pilot projects, which were deliberately
designed to cover as wide a range of settings as possible. So they covered a
range of disciplines - human and non-human imaging, in the case of NIF,
social sciences, mostly quantitative social science in the case of ADA, water
and a number of other physical sciences in the case of CSIRO, a number of
different organisational settings - in some cases universities, in some cases a
national archive, in some cases a research institution, and a number of
different provision models - and you'll hear more about each of those in the
case studies.
The point was to try and get information about what it takes to implement a
trusted data repository.
Part of what we were doing in the program was saying, look, we'd like
you to use this - what was then called the data seal of approval - now called
the CoreTrustSeal certification as the approach to follow, to determine
whether or not this was a trusted data repository.
So, for those of you that are unfamiliar with CoreTrustSeal, this started out
as a thing called the data seal of approval and was one of a number of
certification schemes. If you look at the bottom of this slide, originally it was
part of a three-level hierarchy. So the simplest was data seal of approval.
Next level up was a German standard, which was looking at standardisation
of digital resources, and the most advanced level was the thing that some of
you may have heard of, called the TRAC criteria, or ISO 16363.
What happened was that the World Data System - WDS - worked with the
Research Data Alliance via a working group on repository audit and
certification, took the DANS Data Seal of Approval, tweaked it a bit, added
some additional questions, and essentially agreed on this as what was for a
while called the DSA/WDS criteria, and then just recently has morphed into an
organisation called CoreTrustSeal. So there is now an organisation called
CoreTrustSeal.org, which has taken over responsibility for this certification
program.
It's relatively straightforward at the high level. There are 16 criteria that you
use to assess your repository. Some of these are around your organisation
and the characteristics of your organisation. Some of these are around the
way your repository does digital object management - and you have varying
levels of compliance possible for a number of these criteria, so you can
assess how well you're doing. In that respect it's a little bit like a maturity
model assessment.
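To make that maturity-model idea concrete, here is a minimal illustrative sketch - not an official CoreTrustSeal tool, and not something shown in the webinar - of how a repository might record its own self-assessment. Each of the 16 requirements gets a claimed compliance level on the 0-4 scale the presenters refer to later (level 3 being the implementation phase and level 4 fully implemented), plus links to public evidence. The requirement titles and URL shown are placeholders.

```python
# Illustrative sketch only - hypothetical structures, not an official CoreTrustSeal tool.
from dataclasses import dataclass, field

@dataclass
class RequirementAssessment:
    number: int          # 1..16
    title: str           # e.g. "Documented storage procedures"
    level: int           # claimed compliance level, 0-4 (3 = implementation phase, 4 = fully implemented)
    evidence: list[str] = field(default_factory=list)  # links to public evidence

@dataclass
class SelfAssessment:
    repository: str
    requirements: list[RequirementAssessment]

    def still_in_progress(self) -> list[RequirementAssessment]:
        """Requirements not yet at 'fully implemented' (level 4)."""
        return [r for r in self.requirements if 0 < r.level < 4]

# Example in the spirit of the ADA self-assessment discussed later in the webinar
# (placeholder evidence URL).
assessment = SelfAssessment(
    repository="Example Data Archive",
    requirements=[
        RequirementAssessment(9, "Documented storage procedures", 3,
                              ["https://example.org/storage-procedures"]),
        RequirementAssessment(15, "Technical infrastructure", 3),
        # ...the remaining requirements would be listed here
    ],
)
print([r.number for r in assessment.still_in_progress()])  # -> [9, 15]
```

Structuring a self-assessment like this also makes it easy to list the requirements that still need an implementation timeline, which the ADA presenters note the newer guidelines ask for.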
The thing I want to stress, before we get into the case studies, is that this is
not just technology, and in fact, the technology is almost the least
important bit. A lot of it is to do with the organisational processes and
the kind of organisation that's standing behind this trusted data repository.
You can use the criteria to do a self-assessment, and you'll hear a number of
the presenters talk about a self-assessment, or you can then submit that
assessment for certification - so external reviewers will look at that and
maybe come back and ask some questions. You'll then get a CoreTrustSeal
tick, which will last three years, and there's a business model that sits
behind that. You have to pay for that, but you can of course just do the self-
assessment as an exercise for yourself at zero cost.
So there's more information there and links to the criteria, but if you go to
CoreTrustSeal.org that will be enough.
That's all I really wanted to provide by way of introduction. What I'd now
like to do is pass to the first of our presenters. The first of our presenters is
Mikaela Lawrence who would like to talk to us about the CSIRO experience
in working with this approach.
Mikaela Lawrence: Okay, so today I'm just going to talk about CSIRO's trusted data repository
project. So I'll guide you through the aims of the project, some background
about the data access portal, the requirements of self-assessment as a TDR,
gathering the evidence, applying for certification, and also another aspect of
our project, which was looking into hosting externally-owned data.
The aims of the trusted data repository project were to investigate certifying
the data access portal as a trusted data repository; to develop a plan to
implement changes to policies and procedures to support CSIRO business
requirements and certification; to develop a plan to implement system
changes that may be required to the DAP infrastructure; to engage with
external entities to host externally-owned data as a test case; and to
prepare an application for certification.
This is our data access portal. So just a bit of background information of the
repository, and this'll provide some of the context that relates to the first
section of the application.
The data access portal is currently an institutional repository and when we
submitted our application that's what we submitted our application as.
Deposit is by self-service and is accessible to CSIRO staff, using their
institutional username and login. We have approximately 2100 publicly
available collections and storage of the data is over one petabyte.
The subject matter includes a broad range of sciences, with 17 of the 22
fields of research codes represented.
The software and storage infrastructure of the DAP - which is what we term
our data access portal - are developed and managed by CSIRO Information
Management and Technology. We have a data deposit checklist, which
ensures depositors can see the key quality and legal issues prior to deposit.
A science leader then approves the collection after assessing for quality and
legal issues.
The repository - we offer a few different curation levels, based on depositor
needs. So the content can be distributed as deposited. We may offer some
basic curation - brief checking or addition of basic metadata - or enhanced
curation - such as conversion to new formats.
The community or data users of the data access portal are researchers,
industry, policymakers, general public and students. The data users can
download the majority of collections without a user log in, and a smaller
number of collections will require registration to access the files.
So the requirements of self-assessment as a TDR - when we went through
the process, to help with understanding - there are 16 requirements of the
self-assessment - we read other organisations' applications and considered
the evidence they had used. Applications under the CoreTrustSeal are now
open, with certified repositories' applications available on their website.
Applications that were useful for us to read were those from DANS, as they're part of
the secretariat of the CoreTrustSeal and have been involved in developing
the requirements, and also the UK Data Archive. They had a well-
organised application with detailed evidence.
To help with the next step of gathering and determining what evidence to
use for CSIRO, an analysis was undertaken of the types of evidence used in a
few of the published applications. We've included a list of references we
used to inform our understanding of the requirements in the appendix of
our report to ANDS. This will be published on their website.
There also is a useful extended guidance document and webinar available
on the CoreTrustSeal website that discusses the requirements and
reviewers' expectations.
Gathering the evidence. The certifying body have a preference for evidence
that is public, and we found this a major challenge. In this table are some
examples of the evidence we used for the first part of the requirements
which were organisationally related, from requirements one to six. This gives
an idea of the new evidence we developed, such as the mission statement, and also of
the difficulty of providing publicly available evidence. It also provides
information about the departments we consulted for expert guidance within
our organisation, such as legal, business development and staff from within
our own Information Management and Technology department.
We have attempted to overcome the challenge of providing public
evidence with the development of collection development principles,
preservation principles and an update of the data management policy we've got.
These provide a summary of the processes for requirements seven to
16, which cover digital object management and technology. These public
documents are available from the CSIRO DAP help page.
Next - what stage are we up to with applying for certification? So the
Data Seal of Approval ceased applications in October 2017, and we missed
this deadline. However, our application was submitted to the
CoreTrustSeal in February 2018, as part of their soft launch to test their
system. Processing of our application will begin when the CoreTrustSeal
legal entity is finalised. So we're currently waiting to pay the administration
fee of €1000 and then our application will be processed.
We found getting an account for the application management tool gave
access to a staff member, who promptly answered our questions.
A word of warning - once an application is submitted, it is locked; however,
we found the helpful staff member could amend a small error we had made.
One aspect of our project looked at investigating policies, procedures and
system changes to host externally-owned data.
So why was this part of our strategy? As an organisation, we understand the
value of new research possibilities in drawing together research data,
produced by organisations beyond CSIRO and across the research
community. Also, researchers from our land and water business unit are
interested in investigating a trusted repository for water research data. This
vision is to bring together nationally significant data from a wide range of
organisations for the benefit of industry, policy and research.
What did we implement as part of this part of the project? We defined the
scope for accepting data in the collection development principles. For
example, data should be aligned with CSIRO's function as set out in Section 9
of the Science and Industry Research Act 1949.
Terms and conditions were developed into an agreement to be signed by
the depositing organisation, called the Data Deposit Conditions. Some
examples of the terms and conditions include that data is free from embargo,
it has not previously been published with a DOI, data is owned by the
depositing organisation, data complies with ethics, privacy, confidentiality,
contractual licensing and copyright obligations, and data will have a CC BY
licence applied.
A data deposit form was developed for the data depositor, to provide
metadata. We developed some procedures for depositing externally-owned
data.
The DAP is a self-service repository, with access to deposit by CSIRO staff
only, so the research data service will liaise with external data owners to
facilitate the deposit of data. Then the CSIRO science leader
with the main knowledge of the data will be the approver of the collection.
This is part of the risk management framework that all public data
collections in the DAP are subject to. It involves the checking of data
quality and legal issues prior to publishing.
Some future enhancements to the DAP include the ability to customise a
collection landing page, such as the addition of logos for external
organisations, automation of the data deposit conditions within the existing
DAP software and to develop a self-serve deposit interface for external
organisations.
We found that this project had some immediate benefits for us; for example,
when applying for recommended repository status with journal publishers
and funders, we had information ready to use to meet
those requirements.
We've also had enquiries from researchers regarding publishing externally-
owned data, and we now have a response with policies and procedures in
place.
So thank you. There were a lot of people involved in this project within CSIRO
- too many to list, but a thank you to all of them as well.
Andrew Treloar: Okay, so next, another Andrew - Andrew Mehnert, to talk about the NIF
experience.
Andrew Mehnert: All right. So I'm going to talk about national trusted data repositories for the
National Imaging Facility. My name's Andrew Mehnert. I'm a NIF Informatics
Fellow at the Centre for Microscopy, Characterisation and Analysis, at the
University of Western Australia.
Very quickly, what is NIF? The Australian National Imaging Facility is a $130
million project, providing state of the art imaging capability of animals,
plants and materials for the Australian research community. The little map
there to the right shows the various nodes of the National Imaging Facility
around the country.
Now, why is NIF interested in trusted data repositories? Well, the imaging
equipment such as MRI, PET and CT scanners is capable of producing vast
amounts of valuable research data. So we're interested in maximising those
research outcomes, and to do so, the data must be stored securely, it must
have its quality verified and should be accessible to the wider research
community.
From the CoreTrustSeal point of view, why trusted data repositories? Well,
firstly, to be able to share data, secondly, to preserve the initial investment
in collecting that data, thirdly, to ensure that the data remain useful, and
meaningful into the future. The last one, importantly, is that funding
authorities are increasingly requiring continued access to data that's
produced by projects they fund.
All right, now I want to talk specifically about the NIF/RDS/ANDS trusted
data repositories project, officially titled Delivering Durable, Reliable, High-
Quality Image Data for the National Imaging Facility.
Now, the broad aim of the project was to enhance the quality, durability and
reliability of data that is generated by the NIF.
By quality, we mean that data has to be captured by what we call the NIF-
agreed process. Durable means that the data has to have guaranteed
availability for 10 years. Reliable means that the data has to be useful for
future researchers. So it has to be stored in one or more open data formats,
and with sufficient evidential metadata, so we know how it was created,
what the state of the instrument was at the time of creation, and so on.
The NIF nodes involved were the University of Western Australia, University
of Queensland, University of New South Wales and Monash University.
In the project, we limited our scope to MRI data, but essentially, the results
are generalisable to other modalities, and in fact we've already progressed
to micro CT.
Key outcomes from the project include the NIF-agreed process to obtain
trusted data from NIF instruments. I'll talk more about that shortly. The
second is requirements necessary and sufficient for a basic NIF trusted data
repository service. The third was exemplar repository services across all
four participating nodes, and then the last one was self-assessments against
the Core Trustworthy Data Repositories Requirements, from CoreTrustSeal.
The NIF-agreed process for acquiring high-quality data - this essentially lists
requirements that have to be satisfied to obtain high-quality data - which
we call NIF-certified data - which is then suitable for ingestion into a NIF trusted
data repository service.
We mandate that repository data must be organised by project ID, because
project IDs will persist over time, whereas user IDs don't - users come and
go.
Now, to be NIF-certified, the data must have been acquired on a NIF-
compliant instrument - more about that shortly. It has to possess NIF-
minimal metadata - so that includes cross-reference to relevant instrument
quality control data. It has to include the native data generated by the
instrument in proprietary format and include conversions to one or more
open data formats.
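As a purely illustrative sketch of those checks - with hypothetical field names, not the project's actual implementation - the NIF-certified conditions could be expressed as a simple validation step run before ingestion:

```python
# Illustrative sketch only - hypothetical field names and checks, not the project's code.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DatasetPackage:
    project_id: str               # persistent project identifier (projects outlive user accounts)
    instrument_id: str            # identifier of the instrument the data came from
    acquired_at: datetime         # date and time of acquisition
    qc_project_ref: str | None    # cross-reference to the instrument's quality control project
    native_files: list[str]       # native (proprietary-format) data generated by the instrument
    open_format_files: list[str]  # conversions to one or more open data formats

def is_nif_certified(pkg: DatasetPackage, compliant_instruments: set[str]) -> bool:
    """True only if the package satisfies the NIF-agreed process as summarised in
    the webinar: acquired on a NIF-compliant instrument, carries the minimal
    metadata including a QC cross-reference, and includes native data plus
    at least one open-format conversion."""
    return (
        pkg.instrument_id in compliant_instruments
        and bool(pkg.project_id)
        and pkg.qc_project_ref is not None
        and len(pkg.native_files) > 0
        and len(pkg.open_format_files) > 0
    )
```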
So the requirements for a NIF trusted data repository service. We drew
upon the CoreTrustSeal requirements in the left column that you see there,
and additionally added some NIF requirements. One of them you've seen
already, the project ID requirement, but we also have an instrument ID
requirement, a quality control requirement, an Australian Access Federation
authentication requirement, interoperability - that is, we should be able
to upload data from one repository to another - redeployability - it should be
possible to deploy the service from one NIF node to another - and a service
requirement - essentially, that we have a help desk responding to requests
regarding the repository.
So in a nutshell, if we have a look at this diagram and concentrate on the
right-hand side, we've got the four sites - UWA, UQ, UNSW and
Monash - so TruDat@ that particular site represents the trusted data
repository. Login is via the Australian Access Federation, so it means that on
any of the sites it will direct you back to your institutional login page and use
institutional credentials.
As I mentioned before, data sets are organised by project ID. A data set is
associated with an instrument, and provided the NIF-agreed process has
been followed, then a NIF certification flag, indicating that it is certified, is
also included with the data set.
The repository has a record for the instrument. The instrument itself is
linked to another special project, called the Quality Control Project, and also
a handle to a record in Research Data Australia.
Looking at the bottom of the screen, you can see Research Data Australia is
a data and service discovery portal, provided by ANDS. So we put into that
an instrument description that's both hardware and software, and there's a
unique handle to that record.
If we look at the top left now, at the instrument PC, or client PC, data is
uploaded according to the NIF-agreed process - so, the top box above NIF-
agreed process - the user data set has to have minimal metadata: the
project ID, instrument ID, date and time the data was acquired, implicit
metadata that’s in the proprietary data, the native data from the instrument
and conversions to one or more open data formats.
The instrument operator can also upload data to the quality control project,
which includes the [quality stamp], quality control standard operating
procedure, which of course can be updated over time, and quality control
data. So what this means is that when a user uploads data to the repository
there's an automatic link to the quality control project, and so it's possible
to know the state of the instrument at the time that the data was acquired.
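As an illustrative sketch only (hypothetical structures, not the TruDat or MyTardis code), that linkage amounts to looking up the most recent quality control record for the instrument at or before the acquisition time:

```python
# Illustrative sketch only - hypothetical structures, not the TruDat/MyTardis implementation.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class QCRecord:
    instrument_id: str        # which instrument this quality control record belongs to
    recorded_at: datetime     # when the QC data / SOP version was captured
    sop_version: str          # version of the quality control standard operating procedure
    data_files: list[str]     # the quality control data captured at that time

def qc_state_at_acquisition(qc_records: list[QCRecord],
                            instrument_id: str,
                            acquired_at: datetime) -> QCRecord | None:
    """Most recent QC record for the given instrument at or before the acquisition
    time, i.e. the instrument state that applies to a newly uploaded data set."""
    candidates = [r for r in qc_records
                  if r.instrument_id == instrument_id and r.recorded_at <= acquired_at]
    return max(candidates, key=lambda r: r.recorded_at, default=None)
```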
This is what the portal looks like for TruDat@UWA. We have based this on
the MyTardis platform, which originated at Monash, with several extensions
developed during the project, and we use Docker technology to be able to
easily deploy different sites. So this allows easy instrument integration,
simple data sharing and user-controlled publishing of data sets.
Okay, now I come to the comparison of all the self-assessments against the
CoreTrustSeal requirements. All four sites did their own self-assessments for
their respective repositories. What we can see here in this table - also this
shows the first eight such requirements - is that essentially, we
independently arrived at a fairly similar level of assessment, except for the
cases we've marked in blue there.
So the third one, which is about continuity of access: Monash here believed
that at that point in time it was not assured, whereas the other three
sites did. I should point out this self-assessment is a statement of the reality
of the situation at the point in time that the self-assessment was completed.
Then there was a difference as well at row four, which is requirement four,
confidentiality and ethics. Monash have this fully implemented, whereas the
other three sites are in various stages of getting this to be implemented.
Then, with the remaining requirements, there were some
differences with respect to data storage - documented storage procedures,
workflows, and data discovery and identification.
Post-funding. The project hasn’t finished just because the funding has
finished. We intend to maintain the services for 10 years now, and we plan
to meet quarterly, to make sure that this happens. We are integrating
additional instruments - as I said, we're adding micro CT instruments at the
moment. We will create a project web portal, so we have a single landing
page for all these trusted data repository services.
We're planning new national and international service deployments,
including one in Turku, Finland. We're refining and improving the trusted
data repository portal and we intend progressing to CoreTrustSeal
certification.
Very quickly, benefits of the NIF trusted data repository services. For NIF
users and the broader community it means reliable, durable access to data,
improved reliability of research outputs and the provenance associated with them,
making NIF data more FAIR, easier linkages between publications and data,
and stronger research partnerships.
For NIF it means improved data quality, improved international reputation,
ability to run multi-centre trials.
For the various research institutions, enhanced reputation management, a
means by which to comply with the draft code for responsible research, and
the enhanced ability to engage in multi-centre imaging research projects.
With that, I thank you. I list on the page here the various project leads at the
various nodes. So thank you very much.
Andrew Treloar: Okay, thank you Andrew Mehnert. So that's two quite different perspectives
on trusted data repositories. The third perspective comes from Heather
Leasor and Steve McEachern, from the Australian Data Archive.
Heather Leasor: We're at the Australian Data Archive, which is a social science research data archive.
Our mission is to be a national service for the collection and preservation of
digital research data, and to make these available to academics, government
and other researchers.
We hold about 5000 datasets in over 1500 studies, on all areas of social
science, from social attitude surveys, censuses, aggregate statistics,
administrative data, and many other sources, both qualitative and
quantitative.
Our data holdings are sourced from academics, government and private
sector.
We undertook the process with ANDS as part of the trusted data repositories program.
We originally started under the Data Seal of Approval, before they had
actually combined fully with the World Data System.
Originally, we were under the DSA and then we became the DSA/WDS. When we
found out that they were moving to the CoreTrustSeal, we delayed our
start while deciding which guidelines to take. We officially started the
DSA/WDS in March of 2017 and submitted our application in April of 2017.
We were due to have a review from our reviewers in May, but it didn't
actually arrive until August. Then we made our corrections to this. We sent
it back in, and then we got another set of corrections, did the second
round of corrections, and submitted and finalised in February of 2018. So,
after slightly less than a year of process, we are a CoreTrustSeal
repository now.
We did use the November 2016 DSA/WDS guidelines, which weren't
as detailed as what is given in the CoreTrustSeal, and there was no one
to look to for reference - as in, Mikaela had said she looked at others for
reference, but there was no one to look to for reference for this new
CoreTrustSeal - so we worked from what people had done in the DSA/WDS and
flew blind for a bit.
When we went through the process, which was a very useful process for
self-assessment, we identified four of the guidelines which we set at a level
three, which is the implementation phase of the process: data
integrity and authenticity, guideline 10, which is preservation
planning, guideline 15, which is technical infrastructure, and guideline 16.
Later, in assessment with one of our reviewers, we also changed guideline
nine, which is documented storage, down to a level three. Everything else
we had set at a level four for our repository.
Our repository has been around for about 35 years - coming up on 40 years,
so we do have quite a few procedures in place.
Some of the challenges that we found, doing the CoreTrustSeal process.
When we initially undertook it there was no recommendation of what a
minimum requirement would be for any of the guidelines, so we didn’t
know, if we set it at a three, whether that meant we wouldn't be able to get a
CoreTrustSeal or not. Or if you set it at a one, can you still get a
CoreTrustSeal? There doesn’t seem to be a minimum requirement that we
have ever found. The extended guidelines do detail things a little bit better
nowadays, for those who are undertaking it in the future.
We weren't sure if you had to respond to every aspect of all of the sub-
questions in a guideline, or just to the overarching guideline.
We also found that there was a complex interplay between the relevant
documents required for a guideline and those for other guidelines, so that
one document may respond to up to four different guidelines, or it may
respond to only one guideline.
Also, we found it difficult to provide evidence from documents which are
not in the public domain. Like the other two, we had to go through our own
websites and find out what we did have, forward-facing, what we had
internally-facing and which aspects of those we feel we can now put into an
outward-facing website or Wiki page.
Ideally, all aspects should be outward-facing, but if things have to
be inward-facing, there seems to be some basis on which the CoreTrustSeal can
deal with that.
The assessors did not indicate, in our original guidelines, that you had to
have a timeline for things that were in process. The new guidelines do state
that you have to list a timeline of when you plan to have your
implementation in place by, so we had to add to our final version
when we planned to have these items forward-facing and our new website
up and running. We had no idea, when we originally
started the process, what the process entailed, and what time frames it was
going to take. We were unclear if it was going to take a few months or a
year. It ended up taking us a year, but the CoreTrustSeal does seem to be
coming along much better as an organisation, so timelines should
move a bit quicker now.
So, from our experience, we found that doing as Mikaela and Andrew had
done - going through and finding out what is in the public domain already,
and what can safely be put into the public domain - is a good first step for any
repository undertaking the CoreTrustSeal.
How to cite items which are out of the public domain, and private elements,
is still an open question, which the CoreTrustSeal is dealing with.
We also would like to know how to deal with items that are out of our direct
control, such as funding models, infrastructure and governance - being part
of a larger university, or, as CSIRO is, part of a governmental body, or, as with
Andrew, being part of multiple institutions. How do you fit into their
governance models? How do you fit into the infrastructure, and how is this
relayed to the CoreTrustSeal, with these complexities?
Also, we found the risk management section of the CoreTrustSeal a bit
difficult, because it kept referencing what are almost ISO-standard requirements,
and undertaking an ISO-standard risk assessment just to do a base
CoreTrustSeal seemed a bit of an overkill for us. So finding some risk management
standards that are free and in the public domain would be very useful.
We actually answered the final one, which is that the guidelines are freely
available for self-assessment, without paying to obtain a seal. You can just
undertake the CoreTrustSeal as a self-assessment for your repository, so you
can define what your repository is, where your boundaries are, and
undertake the assessment.
So, in the Australian context - though these aren't necessarily only Australian
issues; we did find that they relate to other repositories worldwide - there is
the complexity of how institutions and repositories relate: one institution may
encompass multiple repositories, or one repository may encompass multiple
institutions, and how this affects your governance, your funding, your
security and all those aspects, as well as things that are in the national
framework. So there are things that are involved in our national roadmaps - how
these play into the CoreTrustSeal, and how they're also out of the control of
the individual repositories - and infrastructure frameworks: infrastructure that is
provided by your host institution and the government frameworks of host
institutions, which are not easily explainable in a CoreTrustSeal.
So these are not necessarily, as I said, Australian specific, but more to do
with the repository sector, because the repository sector is a very varied
sector, with multiple institutions, multiple repositories, playing different
roles.
Andrew Treloar: Okay. Thank you, Steve and Heather. We've now heard from three separate
experiences of engaging with trusted data repositories and the
CoreTrustSeal. We now have 15 minutes or so for questions.
The first question is from Nick, who says that they got Data Seal of Approval
in 2012, so very early on. Is there any advantage in going through the WDS
process? Would any of the three panellists like to weigh in on that one?
Steve McEachern: I'll take that one, Andrew.
Andrew Treloar: Thanks, Steve.
Steve McEachern: My sense would be probably - I mean, the DSA - the three-year certification
in and of itself - so it's a question - I mean from the point of view of being
ongoing certification, I suppose there's a consideration there.
What I would say to you, having been through - as I say, we were familiar
with the DSA in its original version and what it morphed into in the
CoreTrustSeal - there is probably a heavier expectation on some of the risk
management and preservation requirements than there was in the past, and
the emphasis has shifted somewhat, I would say.
The other point I would probably make is that the review process itself - I
mean, we were flying blind, as Heather pointed out. This - our experience is
probably not reflective of everyone as a whole. I think the CoreTrustSeal
organisation itself was developing, and the reviewing that was going on
probably - the reviews and so on were probably a bit different, as they
brought together what was - the Data Seal of Approval was a social science
standard to begin with, extending into the humanities, as Nick's does as well.
It has - I think the WDS side of this is more the physical and life
sciences in particular, or the earth sciences. So there's probably a shift in
emphasis there. I think it would be a good experience, but it might be a bit
different from what you went through in the DSA experience, is probably
how I'd reflect on that.
Andrew Treloar: Okay, thanks, Steve. Comments from either of the presenters, if you want to
weigh in on that?
Okay, no - so it looks like a no. Maybe if I could ask a question that builds on
that question from Nick and Steve's answer. Under CoreTrustSeal, the idea
is that you would apply for certification, you would get certification. That
certification would run for three years. I know that they've talked about a
lightweight recertification down the track, if you want to get recertified in
three years' time. Would any of the presenters like to comment on the
question of the time length for the certification - or rather the expiry time
for certification - and whether that is a reasonable thing to do? Is it - in
your view, does it seem sensible that your certification would slowly
evaporate over a three-year period and that there be value in applying again
in three years' time? Anyone want to…
Andrew Mehnert: I guess I might chip in there and say, given the amount of effort in getting
the original certification through, I'd say it would be worth it to keep this
going into the future, and it should be - three years would seem reasonable,
and it should be a fairly lightweight exercise to get that recertification.
That's having not yet achieved certification the first time around.
Steve McEachern: I would say there, Andrew, I think three years is the right sort of cycle as
well, so long as the certification process itself - the time frame - shortens. Our
application was in April 2017. Our certification was in February 2018. Our
certification will end in December 2019. So I think the cycle is right, given
the context of what the content is, but - and this is, I think,
partly a function of the organisation itself evolving and sorting itself out -
the process itself has to speed up somewhat, in order to make that three-
year cycle an appropriate one.
I think that's the right time frame, but they have to speed up the process.
Andrew Treloar: Yeah, that makes sense, and I - sorry, go on.
Heather Leasor: I believe that, yeah, as soon as you do have most of your documents
together, and you know which ones need to evolve, the recertification
should go a bit quicker, because you can just copy and paste, pretty much,
and iterate what new developments have happened to your institution or
your repository in that time.
Andrew Treloar: Yeah, that makes sense. So as in most new things that one does, the first
time's a bit painful, and then it gets easier.
Heather Leasor: Yeah.
Andrew Treloar: Two questions from Carmella. The first question is, what is ANDS' long-term
plan to include university repositories - to have all the university repositories
meet the CoreTrustSeal? That is an interesting combination of issues there.
Firstly, it would now be ANDS/Nectar/RDS, as we continue to merge
towards a new organisation, and ANDS/Nectar/RDS is not really in a
position to require university repositories to do anything. University
repositories will apply for CoreTrustSeal if they see value in doing it
themselves. In the case of the projects that presented here, we provided
some funding to help ADA and NIF and CSIRO do something. That's
something that they really wanted to do anyway. So I don't think that - yeah,
I think it's going to depend on the drivers for the individual repositories as to
whether they see value in this.
Then the second question was on the duration of the data to be preserved.
Carmella's comment there was, 10 years of data preservation seems like it's
a relatively short period of time, especially in the case of clinical trials. I'll
leave it to the three presenters to comment on that, but I would just say
that I suspect the 10 years is a consensus number and is not a 'you have to
throw it away after 10 years' number. It's an 'at least that' number. I'm pretty
sure the NHMRC Australian Code for the Responsible Conduct of Research says
either seven or 10, so it's not inconsistent with that - but would any of the
presenters like to weigh in on this subject of 10 years, before I move to the
next question?
Andrew Mehnert: I might answer first, if I can? I think you're right - the NHMRC/ARC requirement is
for seven years of retaining data. The figure of 10 was essentially in the
original research proposal as something reasonable each of the NIF nodes
and the associated institutions were happy to support, but that doesn’t
mean we won't support beyond the 10-year period. That was just something
we agreed in consensus that collectively we could do, to ensure - 10 years is
a long time to guarantee a service is running, but the plan at each of the
nodes is that we would continue into the future. The 10 years is proof of
concept that we can indeed do this over a long period of time.
Andrew Treloar: Yeah.
Andrew Mehnert: In the case of NIF, this was establishing some new repository services, so
that's been a challenge unto itself, and guaranteeing 10 years of running
service is no mean feat.
Andrew Treloar: Yeah, 10 years sounds like a long time.
Graham Galloway: It's Graham - Graham Galloway here, who was the lead for that, the NIF
trusted data repository. I think one of the things we need to recognise is that
over 10 years the nature of the repository is going to change. We don't
know where we're going to be storing data in 10 years. So for the
institutions to guarantee more than 10 years at this point in time is going to
be difficult. National Imaging Facility, as Andrew has already said, is
committed to providing mechanisms for data storage, but in 10
years' time those repositories could be existing on Amazon or on other public
domain services.
So, we - yeah, we wanted a commitment from the partners at the time we
signed the contract that they would guarantee to maintain that storage for
that 10 years, but we're committed to looking at the mechanisms beyond
that, to ensure - and then of course, you've got to look at migration of
data between those repositories, and that's an issue we'll have to address.
Andrew Treloar: Yeah, indeed. For the benefit of those people who are unfamiliar with the
august presence of Graham Galloway: he is the director of the National
Imaging Facility.
Steve McEachern: Andrew, just on that - Graham, can I just make a quick comment which is,
this is one of the things that we were referencing at the end here, which
actually creates the complexity of responding to the guidelines, in terms of
saying, how long will you maintain a service, or how long - you're making -
potentially making commitments you can't realistically fulfil.
Andrew Treloar: Yeah.
Steve McEachern: In some ways some of the expectations in the seal needed to account for
that, I think, a little bit more than they probably did. It's unrealistic for us to
say that we - yeah, other than being able to say, look, we've been running
for 35 years - we'll probably still be here in 10 years' time. But as I say, in
terms of [IFA] commitment, I could say the ANU will be here in 10 years'
time. Or if I'm the National Archives, I can say that. There aren't many
organisations who'd actually make such claims - or parts of organisations.
Andrew Treloar: Yeah, that's actually a really good point, Steve, that I think the Data Seal of
Approval certainly came out of one environment where they - the players
involved at the time were largely national archives and they saw the world
through that set of lenses - we've been around forever and we're going to
be around forever. Trying to apply that as you move out into the wider data
repository space - it gets harder and harder to make those kinds of
commitments.
Steve McEachern: Yeah.
Andrew Treloar: I realise we're running close on time, so I might just skip over an
observation, and maybe finish with this question, which is, what are the
costs, in terms of human and non-human resources, to get certification? Did
any of you try and add up how much time and effort you put into this? Or
did you choose not to, because the number was just going to be too scary?
Steve McEachern: Well, we can say, in terms of human - well, one of the human resources in
this room - so Heather - as I say, part of our funding was for Heather to
contribute to that. I'd say that you were working a fair chunk of about nine
months, probably, to put that together. There's a reasonable proportion of
my time, as well - not at that level, but probably half a day a week, for
several months - and bits and pieces of other parts of our organisation as
well.
There aren't too many non-human resources, because we were certifying
the existing facility, so realistically, it was a document-gathering or creation
exercise. So it really was primarily staff time that was the involvement there,
plus a little bit of developer - development, but not much.
I think the experience will be different in Andrew's case in particular, where
you are - where it'll be a new service. So to reflect on - to certify the existing
service, it really is staff time. It depends on how good your documentation is
already.
In that way, it actually is a useful exercise, that experience, because it
reminds you of what you haven't done. So as I say, there is real value in that,
but as I said, there was a - there is time that's involved in that as well.
Andrew Treloar: Yeah, all right. Thank you. Mikaela, is it possible for you to respond to that?
Mikaela Lawrence: Yes. Certainly we had a similar commitment to the ADA. We had myself and
another data librarian working on this, and then, given that we were looking
at hosting externally available data, we also had some of our researchers
talking to external organisations as well. However, to cost the time is a difficult
thing. Also, our legal counsel put significant time and effort into
developing new procedures and policies for hosting externally-owned data.
Andrew Treloar: Okay, thank you. Andrew Mehnert, did you want to weigh in on that?
Andrew Mehnert: Yes. So similar experiences. We had the challenge, I guess, of having four
different sites. We had a project manager at each site and a little team
around that project manager, to address some of these issues - talk to IT
services, talk to library services, to resolve some of the questions.
Then, as the overall project manager, I was consulting with each of the other
project managers and trying to come to consensus - even talking to yourself,
Andrew, to understand some of the questions and how to respond, and I
guess at the end of the day it was giving an honest response, with the best
information you have at hand, and that is the honest status of your - how
you've addressed each of the requirements.
Andrew Treloar: Yeah. All right, thank you, and I'm afraid we're going to have to leave it
there. We're over time. My apologies to those people who have questions in
the question panel that we haven't got to yet, but my - in particular, my
thanks to the presenters - to Mikaela, to Heather, to Steve, to Andrew
Mehnert and to Graham for weighing in. Thank you for sharing your
experiences with us. I hope that's been of benefit to the community.
We look forward to seeing or hearing from some of you on our next
webinar. Thank you all.
END OF TRANSCRIPT

NCRIS and the health domainNCRIS and the health domain
NCRIS and the health domain
 
International perspective for sharing publicly funded medical research data
International perspective for sharing publicly funded medical research dataInternational perspective for sharing publicly funded medical research data
International perspective for sharing publicly funded medical research data
 
Clinical trials data sharing
Clinical trials data sharingClinical trials data sharing
Clinical trials data sharing
 
Clinical trials and cohort studies
Clinical trials and cohort studiesClinical trials and cohort studies
Clinical trials and cohort studies
 
Introduction to vision and scope
Introduction to vision and scopeIntroduction to vision and scope
Introduction to vision and scope
 
FAIR for the future: embracing all things data
FAIR for the future: embracing all things dataFAIR for the future: embracing all things data
FAIR for the future: embracing all things data
 
ARDC 2018 state engagements - Nov-Dec 2018 - Slides - Ian Duncan
ARDC 2018 state engagements - Nov-Dec 2018 - Slides - Ian DuncanARDC 2018 state engagements - Nov-Dec 2018 - Slides - Ian Duncan
ARDC 2018 state engagements - Nov-Dec 2018 - Slides - Ian Duncan
 
Skilling-up-in-research-data-management-20181128
Skilling-up-in-research-data-management-20181128Skilling-up-in-research-data-management-20181128
Skilling-up-in-research-data-management-20181128
 
Research data management and sharing of medical data
Research data management and sharing of medical dataResearch data management and sharing of medical data
Research data management and sharing of medical data
 
Findable, Accessible, Interoperable and Reusable (FAIR) data
Findable, Accessible, Interoperable and Reusable (FAIR) dataFindable, Accessible, Interoperable and Reusable (FAIR) data
Findable, Accessible, Interoperable and Reusable (FAIR) data
 
Applying FAIR principles to linked datasets: Opportunities and Challenges
Applying FAIR principles to linked datasets: Opportunities and ChallengesApplying FAIR principles to linked datasets: Opportunities and Challenges
Applying FAIR principles to linked datasets: Opportunities and Challenges
 
How to make your data count webinar, 26 Nov 2018
How to make your data count webinar, 26 Nov 2018How to make your data count webinar, 26 Nov 2018
How to make your data count webinar, 26 Nov 2018
 
Ready, Set, Go! Join the Top 10 FAIR Data Things Global Sprint
Ready, Set, Go! Join the Top 10 FAIR Data Things Global SprintReady, Set, Go! Join the Top 10 FAIR Data Things Global Sprint
Ready, Set, Go! Join the Top 10 FAIR Data Things Global Sprint
 
How FAIR is your data? Copyright, licensing and reuse of data
How FAIR is your data? Copyright, licensing and reuse of dataHow FAIR is your data? Copyright, licensing and reuse of data
How FAIR is your data? Copyright, licensing and reuse of data
 
Peter neish DMPs BoF eResearch 2018
Peter neish DMPs BoF eResearch 2018Peter neish DMPs BoF eResearch 2018
Peter neish DMPs BoF eResearch 2018
 

Recently uploaded

MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxabhijeetpadhi001
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 

Recently uploaded (20)

MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptx
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 

Transcript - Trusted Data Repositories - 13 March 2018

Andrew Treloar: …unsealed, and so we're prepared to believe that it hasn't been tampered with between the brewery and getting poured. Another element of the trust that we might have in this particular beer is the label on the bottle. So we look at this and we say, ah yes, I've heard of Batemans - I think that's what it says. I've heard of Batemans; they produce trustworthy beer. There might be more information on the back. It might say, brewed in - somewhere. So there's some brand information associated with the container that leads us to trust it more, and there may even be some provenance information about how the beer was produced, where the ingredients came from, and so on.

So the distinction that I want you to take out of this image is not, mm, it's lunchtime - I'd really like a beer now, but that distinction between container and contents. In fact, the ANDS trusted data repository projects were focused on the container. We had a separate set of projects, called trusted research outputs, which were focused much more on the process of producing the outputs and the provenance associated with producing them. In this webinar, we want to focus on the container - the trusted data repository projects.

So we selected a small number of pilot projects, which were deliberately designed to cover as wide a range of settings as possible. They covered a range of disciplines - human and non-human imaging in the case of NIF, social sciences (mostly quantitative social science) in the case of ADA, and water and a number of other physical sciences in the case of CSIRO - a number of different organisational settings - in some cases universities, in some cases a national archive, in some cases a research institution - and a number of different provision models, and you'll hear more about each of those in the case studies.
The point was to try and get information about what it takes to implement a trusted data repository. Part of what we were doing in the program was saying, look, we'd like you to use this - what was then called the Data Seal of Approval, now called the CoreTrustSeal certification - as the approach to follow, to determine whether or not this was a trusted data repository.

So, for those of you that are unfamiliar with CoreTrustSeal, this started out as a thing called the Data Seal of Approval and was one of a number of certification schemes. If you look at the bottom of this slide, originally it was part of a three-level hierarchy. The simplest was the Data Seal of Approval. The next level up was a German standard, which was looking at standardisation of digital resources, and the most advanced level was the thing that some of you may have heard of, called the [TRAC] criteria, or ISO 16363.

What happened was that the World Data System - WDS - worked with the Research Data Alliance, via a working group on repository audit and certification, took the DANS Data Seal of Approval, tweaked it a bit, added some additional questions, and essentially agreed on this as what was for a while called the [DSA-WDS] criteria, and then just recently has morphed into an organisation called CoreTrustSeal. So there is now an organisation called CoreTrustSeal.org, which has taken over responsibility for this certification program.

It's relatively straightforward at the high level. There are 16 criteria that you use to assess your repository. Some of these are around your organisation and the characteristics of your organisation. Some of these are around the way your repository does digital object management - and you have varying levels of compliance possible for a number of these criteria, so you can assess how well you're doing. In that respect it's a little bit like a maturity model assessment.
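To make that maturity-model flavour concrete, here is a minimal, purely illustrative sketch of summarising a self-assessment against the 16 requirements. The split of requirements 1-6 (organisation-related) versus 7-16 (digital object management and technology) follows how the CSIRO presenters describe them later in this webinar; the level numbers are invented, and the code is not part of any CoreTrustSeal tooling.

```python
# Illustrative only: a toy summary of a CoreTrustSeal-style self-assessment.
# Requirements 1-6 are about the organisation; 7-16 cover digital object
# management and technology. Levels follow the presenters' usage
# (3 = in the implementation phase, 4 = fully implemented); the numbers
# below are made up.

levels = {1: 4, 2: 4, 3: 3, 4: 3, 5: 4, 6: 4,
          7: 4, 8: 4, 9: 3, 10: 3, 11: 4, 12: 4,
          13: 4, 14: 4, 15: 3, 16: 3}

GROUPS = {
    "Organisational infrastructure (R1-R6)": range(1, 7),
    "Digital object management and technology (R7-R16)": range(7, 17),
}

for group, requirements in GROUPS.items():
    scores = [levels[r] for r in requirements]
    print(f"{group}: minimum level {min(scores)}, "
          f"{sum(s == 4 for s in scores)}/{len(scores)} fully implemented")
```

Run as-is, this simply reports, for each group, the lowest compliance level and how many requirements are already fully implemented - the same kind of gap analysis the presenters describe doing by hand.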
The thing I want to stress, before we get into the case studies, is that this is not just technology - in fact, the technology is almost the least important bit. A lot of it is to do with the organisational processes and the kind of organisation that's standing behind this trusted data repository.

You can use the criteria to do a self-assessment, and you'll hear a number of the presenters talk about a self-assessment, or you can then submit that assessment for certification - so external reviewers will look at it and maybe come back and ask some questions. You'll then get a CoreTrustSeal tick, which will last three years, and there's a business model that sits behind that. You have to pay for that, but you can of course just do the self-assessment as an exercise for yourself at zero cost. So there's more information there and links to the criteria, but if you go to CoreTrustSeal.org that will be enough.

That's all I really wanted to provide by way of introduction. What I'd now like to do is pass to the first of our presenters, Mikaela Lawrence, who is going to talk to us about the CSIRO experience in working with this approach.

Mikaela Lawrence: Okay, so today I'm just going to talk about CSIRO's trusted data repository project. I'll guide you through the aims of the project, some background about the Data Access Portal, the requirements of self-assessment as a TDR, gathering the evidence, applying for certification, and also another aspect of our project, which was looking into hosting externally-owned data.

The aims of the trusted data repository project were to investigate certifying the Data Access Portal as a trusted data repository; to develop a plan to implement changes to policies and procedures to support CSIRO business requirements and certification; to develop a plan to implement system changes that may be required to the DAP infrastructure; to engage with external entities to host externally-owned data as a test case; and to prepare an application for certification.

This is our Data Access Portal. So just a bit of background information about the repository, which will provide some of the context that relates to the first section of the application.
The Data Access Portal is currently an institutional repository, and that is what we submitted our application as. Deposit is by self-service and is accessible to CSIRO staff, using their institutional username and login. We have approximately 2,100 publicly available collections, and storage of the data is over one petabyte. The subject matter includes a broad range of sciences, with 17 of the 22 Fields of Research codes represented. The software and storage infrastructure of the DAP - which is what we call our Data Access Portal - are developed and managed by CSIRO Information Management and Technology.

We have a data deposit checklist, which ensures depositors consider the key quality and legal issues prior to deposit. A science leader then approves the collection after assessing it for quality and legal issues. We offer a few different curation levels, based on depositor needs: the content can be distributed as deposited; we may offer some basic curation - brief checking or addition of basic metadata - or enhanced curation, such as conversion to new formats.

The community of data users of the Data Access Portal comprises researchers, industry, policymakers, the general public and students. Data users can download the majority of collections without a user login, and a smaller number of collections require registration to access the files.

So, the requirements of self-assessment as a TDR. When we went through the process, to help with understanding - there are 16 requirements in the self-assessment - we read other organisations' applications and considered the evidence they had used. Applications within the CoreTrustSeal are now open, with certified repository applications available on their website. Applications that were useful for us to read were DANS's, as they're part of the secretariat of the CoreTrustSeal and have been involved in developing the requirements, and also the UK Data Archive's. They had a well-organised application with detailed evidence.
To help with the next step of gathering and determining what evidence to use for CSIRO, an analysis was undertaken of the types of evidence used in a few of the published applications. We've included a list of references we used to inform our understanding of the requirements in the appendix of our report to ANDS, which will be published on their website. There is also a useful extended guidance document and webinar available on the CoreTrustSeal website that discusses the requirements and reviewers' expectations.

Gathering the evidence. The certifying body has a preference for evidence that is public, and we found this a major challenge. In this table are some examples of the evidence we used for the first part of the requirements, which are organisation-related - requirements one to six. This gives an idea of the new evidence we developed, such as the mission statement, and also of the difficulty of providing publicly available evidence. It also shows the departments we consulted for expert guidance within our organisation, such as legal, business development and staff from within our own Information Management and Technology department.

We have attempted to overcome the challenge of providing public evidence with the development of collection development principles, preservation principles and an update of the data management documentation we've got. These provide a summary of the processes for requirements seven to 16, which cover digital object management and technology. These public documents are available from the CSIRO DAP help page.

Next - what stage are we up to with applying for certification? The Data Seal of Approval ceased taking applications in October 2017, and we missed this deadline. However, our application was submitted to the CoreTrustSeal in February 2018, as part of their soft launch to test their system. Processing of our application will begin when the CoreTrustSeal legal entity is finalised, so we're currently waiting to pay the administration fee of €1,000, and then our application will be processed.
We found that getting an account for the application management tool gave us access to a staff member who promptly answered our questions. A word of warning: once an application is submitted, it is locked; however, we found the helpful staff member could amend a small error we had made.

One aspect of our project looked at investigating the policies, procedures and system changes needed to host externally-owned data. So why was this part of our strategy? As an organisation, we understand the value of the new research possibilities in drawing together research data produced by organisations beyond CSIRO and across the research community. Also, researchers from our Land and Water business unit are interested in investigating a trusted repository for water research data. The vision is to bring together nationally significant data from a wide range of organisations for the benefit of industry, policy and research.

What did we implement as part of this part of the project? We defined the scope for accepting data in the collection development principles. For example, data should be aligned with CSIRO's function as set out in Section 9 of the Science and Industry Research Act 1949. Terms and conditions were developed into an agreement to be signed by the depositing organisation, called the Data Deposit Conditions. Some examples of the terms and conditions are that the data is free from embargo, that it has not previously been published with a DOI, that the data is owned by the depositing organisation, that the data complies with ethics, privacy, confidentiality, contractual, licensing and copyright obligations, and that the data will have a CC BY licence applied.
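As a rough, purely illustrative sketch of how deposit conditions like those could be screened programmatically, the snippet below checks a proposed deposit against the conditions just listed. The field names and the check itself are hypothetical - the actual DAP process described here relies on a signed agreement and human review, not on code like this.

```python
# Hypothetical sketch: screening an external deposit against the kinds of
# conditions listed in the Data Deposit Conditions (embargo-free, no
# existing DOI, ownership and compliance confirmed, CC BY licence).
# Field names are illustrative, not the DAP's actual schema.

REQUIRED_CONDITIONS = {
    "free_from_embargo": True,
    "previously_published_with_doi": False,
    "owned_by_depositing_organisation": True,
    "complies_with_ethics_privacy_copyright": True,
    "licence": "CC BY",
}

def deposit_problems(deposit: dict) -> list[str]:
    """Return the conditions the proposed deposit does not satisfy."""
    problems = []
    for field, expected in REQUIRED_CONDITIONS.items():
        if deposit.get(field) != expected:
            problems.append(f"{field}: expected {expected!r}, got {deposit.get(field)!r}")
    return problems

example = {
    "free_from_embargo": True,
    "previously_published_with_doi": True,   # already has a DOI, so this fails
    "owned_by_depositing_organisation": True,
    "complies_with_ethics_privacy_copyright": True,
    "licence": "CC BY",
}
print(deposit_problems(example))
```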
A data deposit form was developed for the data depositor to provide metadata, and we developed procedures for depositing externally-owned data. The DAP is a self-service repository, with deposit access for CSIRO staff only, so the research data service will liaise with external data owners to facilitate the deposit of data. The CSIRO science leader with the main knowledge of the data will then be the approver of the collection. This is part of the risk management framework that all public data collections in the DAP are subject to; it involves checking the data quality and legal issues prior to publishing.

Some future enhancements to the DAP include the ability to customise a collection landing page, such as the addition of logos for external organisations; automation of the Data Deposit Conditions within the existing DAP software; and the development of a self-serve deposit interface for external organisations.

We found that this project had some immediate benefits for us. For example, when applying for recommended repository status with journal publishers and funders, we found that we had information ready to use to meet those requirements. We've also had enquiries from researchers regarding publishing externally-owned data, and we now have a response, with policies and procedures in place. So thank you. There were a lot of people involved in this project within CSIRO - too many to list, but a thank you to all of them as well.

Andrew Treloar: Okay, so next, another Andrew - Andrew Mehnert - to talk about the NIF experience.

Andrew Mehnert: All right. So I'm going to talk about national trusted data repositories for the National Imaging Facility. My name's Andrew Mehnert. I'm a NIF Informatics Fellow at the Centre for Microscopy, Characterisation and Analysis at the University of Western Australia.

Very quickly, what is NIF? The Australian National Imaging Facility is a $130 million project providing state-of-the-art imaging capability of animals, plants and materials for the Australian research community. The little map there to the right shows the various nodes of the National Imaging Facility around the country.
Now, why is NIF interested in trusted data repositories? Well, imaging equipment such as MRI, PET and CT scanners is capable of producing vast amounts of valuable research data. We're interested in maximising those research outcomes, and to do so, the data must be stored securely, it must have its quality verified and it should be accessible to the wider research community.

From the CoreTrustSeal point of view, why trusted data repositories? Firstly, to be able to share data; secondly, to preserve the initial investment in collecting that data; and thirdly, to ensure that the data remain useful and meaningful into the future. The last one, importantly, is that funding authorities are increasingly requiring continued access to data that's produced by projects they fund.

All right, now I want to talk specifically about the NIF/RDS/ANDS trusted data repositories project, officially titled Delivering Durable, Reliable, High-Quality Image Data for the National Imaging Facility. The broad aim of the project was to enhance the quality, durability and reliability of data that is generated by the NIF. By quality, we mean that the data has to be captured by what we call the NIF-agreed process. Durable means that the data has to have guaranteed availability for 10 years. Reliable means that the data has to be useful for future researchers, so it has to be stored in one or more open data formats, and with sufficient evidential metadata, so we know how it was created, what the state of the instrument was at the time of creation, and so on.

The NIF nodes involved were the University of Western Australia, the University of Queensland, the University of New South Wales and Monash University. In the project we limited our scope to MRI data, but essentially the results are generalisable to other modalities, and in fact we've already progressed to micro CT.
Key outcomes from the project include the NIF-agreed process to obtain trusted data from NIF instruments - I'll talk more about that shortly. The second is the requirements necessary and sufficient for a basic NIF trusted data repository service. The third was exemplar repository services across all four participating nodes, and the last one was self-assessments against the Core Trustworthy Data Repositories Requirements from CoreTrustSeal.

The NIF-agreed process for acquiring high-quality data essentially lists the requirements that have to be satisfied to obtain high-quality data - which we call NIF-certified data - that is then suitable for ingestion into a NIF trusted data repository service. We mandate that repository data must be organised by project ID, because project IDs persist over time, whereas user IDs don't - users come and go. Now, to be NIF-certified, the data must have been acquired on a NIF-compliant instrument - more about that shortly. It has to possess NIF-minimal metadata, which includes a cross-reference to the relevant instrument quality control data. It has to include the native data generated by the instrument in its proprietary format, and include conversions to one or more open data formats.
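As a rough sketch of that certification rule, the check below flags a data set as NIF-certified only if it meets the conditions just listed. The field names and the function are hypothetical - they illustrate the rule as described in this talk, not the actual TruDat implementation.

```python
# Hypothetical sketch of the NIF-certification check described above: a data
# set is flagged NIF-certified only if it was acquired on a NIF-compliant
# instrument, carries the NIF-minimal metadata (including a cross-reference
# to instrument quality-control data), and includes both the native
# proprietary files and at least one open-format conversion.
# Field names are illustrative, not TruDat's actual schema.

REQUIRED_METADATA = ("project_id", "instrument_id", "acquisition_datetime", "qc_project_ref")

def is_nif_certified(dataset: dict) -> bool:
    has_metadata = all(dataset.get(field) for field in REQUIRED_METADATA)
    return (
        dataset.get("instrument_nif_compliant", False)
        and has_metadata
        and bool(dataset.get("native_files"))
        and bool(dataset.get("open_format_files"))
    )

example = {
    "project_id": "PROJ-0042",
    "instrument_id": "MRI-UWA-01",
    "acquisition_datetime": "2017-08-14T10:32:00",
    "qc_project_ref": "QC-MRI-UWA-01",
    "instrument_nif_compliant": True,
    "native_files": ["scan.dcm"],
    "open_format_files": ["scan.nii"],
}
print(is_nif_certified(example))  # True under these assumptions
```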
So, the requirements for a NIF trusted data repository service. We drew upon the CoreTrustSeal requirements, in the left column that you see there, and additionally added some NIF requirements. One of them you've seen already - the project ID requirement - but we also have an instrument ID requirement, a quality control requirement, an authentication by Australian Access Federation requirement, interoperability - that is, we should be able to upload data from one repository to another - redeployability - it should be possible to deploy the service from one NIF node to another - and a service requirement, which is essentially that we have a help desk responding to requests regarding the repository.

So, in a nutshell, if we have a look at this diagram and concentrate on the right-hand side: we've got the four sites - UWA, UQ, UNSW and Monash - so TruDat@ that particular site represents the trusted data repository. Login is via the Australian Access Federation, which means that at any of the sites it will direct you back to your institutional login page and use your institutional credentials. As I mentioned before, data sets are organised by project ID. A data set is associated with an instrument, and provided the NIF-agreed process has been followed, a NIF certification flag, indicating that it is certified, is also included with the data set.

The repository has a record for the instrument. The instrument itself is linked to another special project, called the Quality Control Project, and also to a handle for a record in Research Data Australia. Looking at the bottom of the screen, you can see Research Data Australia is a data and service discovery portal provided by ANDS. So we put into that an instrument description - both hardware and software - and there's a unique handle to that record.

If we look at the top left now, at the instrument PC, or client PC, data is uploaded according to the NIF-agreed process - so, in the top box above "NIF-agreed process", the user data set has to have the minimal metadata: the project ID, instrument ID, date and time the data was acquired, the implicit metadata that's in the proprietary data, the native data from the instrument, and conversions to one or more open data formats. The instrument operator can also upload data to the Quality Control Project, which includes the [quality stamp], the quality control standard operating procedure - which of course can be updated over time - and quality control data. So what this means is that when a user uploads data to the repository there's an automatic link to the Quality Control Project, and so it's possible to know the state of the instrument at the time that the data was acquired.
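A minimal sketch of those relationships might look like the snippet below. The class and field names are hypothetical - this is not the TruDat or MyTardis data model - but it captures the linkage described: a data set belongs to a project and points at an instrument, and the instrument points at both its Quality Control project and its Research Data Australia record.

```python
# Hypothetical sketch of the relationships described for the TruDat design.
# Class and field names are illustrative only.

from dataclasses import dataclass

@dataclass
class Instrument:
    instrument_id: str
    qc_project_id: str          # special Quality Control project for this instrument
    rda_handle: str             # handle of the instrument record in Research Data Australia

@dataclass
class DataSet:
    project_id: str             # data sets are organised by project, not by user
    instrument: Instrument
    acquisition_datetime: str
    nif_certified: bool         # set only if the NIF-agreed process was followed

mri = Instrument("MRI-UWA-01", qc_project_id="QC-MRI-UWA-01",
                 rda_handle="hdl:102.100.100/XXXX")  # placeholder handle
scan = DataSet("PROJ-0042", mri, "2017-08-14T10:32:00", nif_certified=True)

# Because the data set references the instrument, and the instrument references
# its QC project, the instrument's state at acquisition time can be traced.
print(scan.instrument.qc_project_id)
```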
This is what the portal looks like for TruDat@UWA. We have based this on the MyTardis platform, which originated at Monash, with several extensions developed during the project, and we use Docker technology to be able to easily deploy to the different sites. So this allows easy instrument integration, simple data sharing and user-controlled publishing of data sets.

Okay, now I come to the comparison of all the self-assessments against the CoreTrustSeal requirements. All four sites did their own self-assessments for their respective repositories. What we can see here in this table - this shows the first eight such requirements - is that essentially we independently arrived at a fairly similar level of assessment, except for the cases marked in blue. So for the third one, continuity of access, Monash believed that at this point in time that was not assured, whereas the other three sites did. I should point out this self-assessment is a statement of the reality of the situation at the point in time that the self-assessment was completed. Then there was a difference as well at row four, which is requirement four, confidentiality and ethics. Monash have this fully implemented, whereas the other three sites are in various stages of getting it implemented. Then there were some other differences with the remaining requirements - some differences with respect to data storage (documented storage procedures), workflows, and data discovery and identification.

Post-funding: the project hasn't finished just because the funding has finished. We intend to maintain the services for 10 years, and we plan to meet quarterly to make sure that this happens. We are integrating additional instruments - as I said, we're adding micro CT instruments at the moment. We will create a project web portal, so we have a single landing page for all these trusted data repository services. We're planning new national and international service deployments, including one in Turku, Finland. We're refining and improving the trusted data repository portal, and we intend progressing to CoreTrustSeal certification.
Very quickly, the benefits of the NIF trusted data repository services. For NIF users and the broader community it means reliable, durable access to data, improved reliability of research outputs and the provenance associated with them - making NIF data more FAIR - easier linkages between publications and data, and stronger research partnerships. For NIF it means improved data quality, improved international reputation and the ability to run multi-centre trials. For the various research institutions, it means enhanced reputation management, a means by which to comply with the draft code for responsible research, and an enhanced ability to engage in multi-centre imaging research projects. With that, I thank you. I list on the page here the various project leads at the various nodes. So thank you very much.

Andrew Treloar: Okay, thank you Andrew Mehnert. So that's two quite different perspectives on trusted data repositories. The third perspective comes from Heather Leasor and Steve McEachern, from the Australian Data Archive.

Heather Leasor: We're at the [centre] archive, which is a social science research data archive. Our mission is to be a national service for the collection and preservation of digital research data, and to make these available to academics, government and other researchers. We hold about 5,000 datasets in over 1,500 studies, on all areas of social science, from social attitude surveys, censuses, aggregate statistics and administrative data to many other sources, both qualitative and quantitative. Our data holdings are sourced from academics, government and the private sector.

We undertook the process with ANDS as part of the trusted data repositories program. We originally started under the Data Seal of Approval, before they had actually combined fully with the World Data System. Originally we were under the DSA, and then we became the DSA/WDS.
When we found out that they were moving to the CoreTrustSeal, we delayed our decision on which guidelines to take up. We officially started the DSA/WDS process in March 2017 and submitted our application in April 2017. We were due to have a review from our reviewers in May, but it didn't actually arrive until August. Then we made our corrections, sent it back in, got another set of corrections, did that further round of corrections, and submitted and finalised in February 2018. So, in slightly less than a year's process, we are a CoreTrustSeal repository now.

We did use the November 2016 DSA/WDS guidelines, which weren't as detailed as what is given in the CoreTrustSeal, and there was no one to look at for reference - as Mikaela said, she looked at others for reference, but there was no one to look at for this new CoreTrustSeal - so we worked from what people had done in the DSA/WDS and flew blind for a bit.

When we went through the process, which was a very useful process for self-assessment, we identified four of the guidelines which we set at level three, which is the implementation phase: data integrity and authenticity; guideline 10, which is preservation planning; guideline 15, which is technical infrastructure; and guideline 16. Later, in assessment with one of our reviewers, we also changed guideline nine, which is documented storage, down to a level three. Everything else we had set at a level four for our repository. Our repository has been around for about 35 years - coming up on 40 years - so we do have quite a few procedures in place.
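For readers following along, here is a small, purely illustrative way of recording the level assignments Heather describes - a handful of guidelines at level three (implementation phase) and the rest at level four. The guideline numbers and names are the ones mentioned in this talk; the code is not an ADA or CoreTrustSeal tool, and, as Heather notes shortly, anything still at level three would now also need a stated timeline for completion.

```python
# Illustrative only: recording the guideline levels described in this talk.
# 3 = in the implementation phase, 4 = fully implemented. Names abbreviated.

ada_levels = {
    "Data integrity and authenticity": 3,
    "G9  Documented storage procedures": 3,   # revised down to 3 with a reviewer
    "G10 Preservation planning": 3,
    "G15 Technical infrastructure": 3,
    "G16": 3,
    # every other guideline was set at level 4
}

needs_timeline = [name for name, level in ada_levels.items() if level < 4]
print("Guidelines still needing an implementation timeline:", needs_timeline)
```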
Some of the challenges we found in doing the CoreTrustSeal process: when we initially undertook it, there was no recommendation of what a minimum requirement would be for any of the guidelines, so we didn't know whether, if we set one at a three, that meant we wouldn't be able to get a CoreTrustSeal or not. Or if you set it at a one, can you still get a CoreTrustSeal? There doesn't seem to be a minimum requirement that we have ever found. The extended guidelines do detail things a little bit better nowadays, for those who are undertaking it in the future.

We weren't sure if you had to respond to every aspect of all of the sub-questions in a guideline, or just to the overarching guideline. We also found that there was a complex interplay between the documents relevant to one guideline and those for other guidelines, so that one document may respond to up to four different guidelines, or it may respond to only one.

Also, we found it difficult to provide evidence from documents which are not in the public domain. Like the other two projects, we had to go through our own websites and find out what we had forward-facing, what we had internally facing, and which aspects of those we felt we could now put onto an outward-facing website or wiki page. Ideally all aspects should be outward-facing, but if things have to be inward-facing, there seems to be some basis on which the CoreTrustSeal can deal with that.

The assessors did not indicate, in our original guidelines, that you had to have a timeline for things that were in process. The new guidelines do state that you have to list a timeline for when you plan to have your implementation in place, so we had to add to our final version when we planned to have these items forward-facing and our new website up and running.

We also had no idea, when we originally started the process, what the process entailed and what time frames it was going to take. We were unclear whether it was going to take a few months or a year. It ended up taking us a year, but the CoreTrustSeal does seem to be coming along as an organisation much better, so timelines should move a bit quicker now.
So, from our experience, we found that doing as Mikaela and Andrew had done - going through and finding out what is in the public domain already, and what can safely be put into the public domain - is a good first step for any repository undertaking the CoreTrustSeal. How to cite the items which are out of the public domain, and private elements, is still an open question, which the CoreTrustSeal is dealing with.

We also would like to know how to deal with items that are out of our direct control, such as funding models, infrastructure and governance - being part of a larger university, or, as with CSIRO, part of a governmental body, or, as with Andrew, part of multiple institutions. How do you fit into their governance models? How do you fit into the infrastructure, and how is this relayed to the CoreTrustSeal, with these complexities?

Also, we found the risk management section of the CoreTrustSeal a bit difficult, because it kept referencing what are almost ISO-standard requirements, and undertaking an ISO standard for a risk assessment in order to do a base CoreTrustSeal seemed a bit of an overkill for us. So finding some risk management standards that are free and in the public domain would be very useful.

We did answer the final point, which is that the guidelines are freely available for self-assessment, without paying to obtain a seal. You can just undertake the CoreTrustSeal as a self-assessment for your repository, so you can define what your repository is, where your boundaries are, and undertake the assessment.

So, in the Australian context - although these aren't necessarily only in the Australian context; we did find that they relate to other repositories worldwide - there is the complexity of how institutions and repositories relate: one institution may encompass multiple repositories, or one repository may encompass multiple institutions, and this affects your governance, your funding, your security and all those aspects, as well as things that are in the national framework. So there are things that are involved in our national roadmaps - how these play into the CoreTrustSeal, and how they're also out of the control of the individual repositories.
There are also infrastructure frameworks - infrastructure that is provided by your host institution, and the government frameworks of host institutions - which are not easily explainable in a CoreTrustSeal. So these are not necessarily, as I said, Australian-specific, but more to do with the repository sector, because the repository sector is a very varied sector, with multiple institutions and multiple repositories playing different roles.

Andrew Treloar: Okay. Thank you, Steve and Heather. We've now heard from three separate experiences of engaging with trusted data repositories and the CoreTrustSeal. We now have 15 minutes or so for questions. The first question is from Nick, who says that they got the Data Seal of Approval in 2012, so very early on. Is there any advantage in going through the WDS process? Would any of the three panellists like to weigh in on that one?

Steve McEachern: I'll take that one, Andrew.

Andrew Treloar: Thanks, Steve.

Steve McEachern: My sense would be probably - I mean, the DSA - the three-year certification in and of itself - so it's a question - I mean, from the point of view of having ongoing certification, I suppose there's a consideration there. What I would say to you, having been through it - as I say, we were familiar with the DSA in its original version and with what it morphed into in the CoreTrustSeal - is that there is probably a heavier expectation on some of the risk management and preservation requirements than there was in the past, and the emphasis has shifted somewhat, I would say.

The other point I would probably make is about the review process itself - I mean, we were flying blind, as Heather pointed out. Our experience is probably not reflective of everyone's as a whole. I think the CoreTrustSeal organisation itself was developing, and the reviewing that was going on - the reviews and so on were probably a bit different, as they brought together what was - the Seal of Approval was a social science standard to begin with, extending into the humanities, as Nick's does as well.
I think the WDS side of this is more the physical and life sciences in particular, or the earth sciences. So there's probably a shift in emphasis there. I think it would be a good experience, but it might be a bit different from what you went through in the DSA experience, is probably how I'd reflect on that.

Andrew Treloar: Okay, thanks, Steve. Comments from either of the other presenters, if you want to weigh in on that? Okay, no - so it looks like a no. Maybe I could ask a question that builds on that question from Nick and Steve's answer. Under CoreTrustSeal, the idea is that you would apply for certification, you would get certification, and that certification would run for three years. I know that they've talked about a lightweight recertification down the track, if you want to get recertified in three years' time. Would any of the presenters like to comment on the question of the time length for the certification - or rather the expiry time for the certification - and whether that is a reasonable thing to do? In your view, does it seem sensible that your certification would slowly evaporate over a three-year period and that there be value in applying again in three years' time? Anyone want to…

Andrew Mehnert: I guess I might chip in there and say, given the amount of effort in getting the original certification through, I'd say it would be worth it to keep this going into the future, and three years would seem reasonable - and it should be a fairly lightweight exercise to get that recertification. That's having not yet achieved certification the first time around.

Steve McEachern: I would say there, Andrew, I think three years is the right sort of cycle as well, so long as the certification process itself - the time frame - shortens. Our application was in April 2017. Our certification was in February 2018. Our certification will end in December 2019. So I think the cycle is right, given the context of what the content is.
But - and this is, I think, partly a function of the organisation itself evolving and sorting itself out - the process itself has to speed up somewhat, in order to make that three-year cycle an appropriate one. I think that's the right time frame, but they have to speed up the process.

Andrew Treloar: Yeah, that makes sense, and I - sorry, go on.

Heather Leasor: I believe that, yeah, as soon as you do have most of your documents together, and you know which ones need to evolve, the recertification should go a bit quicker, because you can pretty much just copy and paste, and iterate on whatever new developments have happened to your institution or your repository in that time.

Andrew Treloar: Yeah, that makes sense. So, as with most new things that one does, the first time's a bit painful, and then it gets easier.

Heather Leasor: Yeah.

Andrew Treloar: Two questions from Carmella. The first question is, what is ANDS' long-term plan for university repositories - to have all the university repositories meet the CoreTrustSeal? That is an interesting combination of issues. Firstly, it would now be ANDS/Nectar/RDS, as we continue to merge towards a new organisation, and ANDS/Nectar/RDS is not really in a position to require university repositories to do anything. University repositories will apply for CoreTrustSeal if they see value in doing it themselves. In the case of the projects that presented here, we provided some funding to help ADA and NIF and CSIRO do something that they really wanted to do anyway. So I think it's going to depend on the drivers for the individual repositories as to whether they see value in this.

Then the second question was on the duration for which the data is to be preserved. Carmella's comment there was that 10 years of data preservation seems like a relatively short period of time, especially in the case of clinical trials. I'll leave it to the three presenters to comment on that, but I would just say that I suspect the 10 years is a consensus number, and is not a "you have to throw it away after 10 years" number - it's an "at least that" number. I'm pretty sure the NHMRC Australian Code for the Responsible Conduct of Research says either seven or 10, so it's not inconsistent with that - but would any of the presenters like to weigh in on this subject of 10 years, before I move to the next question?
Andrew Mehnert: I might answer first, if I can. I think you're right - the NHMRC/ARC requirement is for seven years of retaining data. The figure of 10 was essentially in the original research proposal as something reasonable that each of the NIF nodes and the associated institutions were happy to support, but that doesn't mean we won't support it beyond the 10-year period. That was just something we agreed by consensus that collectively we could do. Ten years is a long time to guarantee a service is running, but the plan at each of the nodes is that we would continue into the future. The 10 years is a proof of concept that we can indeed do this over a long period of time.

Andrew Treloar: Yeah.

Andrew Mehnert: In the case of NIF, this was establishing some new repository services, so that's been a challenge unto itself, and guaranteeing 10 years of a running service is no mean feat.

Andrew Treloar: Yeah, 10 years sounds like a long time.

Graham Galloway: It's Graham - Graham Galloway here, who was the lead for that NIF trusted data repository project. I think one of the things we need to recognise is that over 10 years the nature of the repository is going to change. We don't know where we're going to be storing data in 10 years. So for the institutions to guarantee more than 10 years at this point in time is going to be difficult. The National Imaging Facility, as Andrew has already said, is committed to providing mechanisms for data storage, but in 10 years' time those repositories could be existing on Amazon or on other publicly available services. So we wanted a commitment from the partners, at the time we signed the contract, that they would guarantee to maintain that storage for that 10 years, but we're committed to looking at the mechanisms beyond that, to ensure - and then of course you've got to look at migration of data between those repositories, and that's an issue we'll have to address.
But we're committed to looking at the mechanisms beyond that - and then, of course, you've got to look at migration of data between those repositories, and that's an issue we'll have to address.

Andrew Treloar: Yeah, indeed. For the benefit of those people who are unfamiliar with the august presence of Graham Galloway, he is the director of the National Imaging Facility.

Steve McEachern: Andrew - Graham - can I just make a quick comment? This is one of the things we were referencing at the end here, which actually creates the complexity of responding to the guidelines: in saying how long you will maintain a service, you're potentially making commitments you can't realistically fulfil.

Andrew Treloar: Yeah.

Steve McEachern: In some ways, some of the expectations in the seal needed to account for that a little bit more than they probably did, I think. It's unrealistic for us to say much more than, look, we've been running for 35 years, so we'll probably still be here in 10 years' time. But, as I say, in terms of [IFA] commitment, I could say the ANU will be here in 10 years' time, or if I'm the National Archives, I can say that. There aren't many organisations - or parts of organisations - who could actually make such claims.

Andrew Treloar: Yeah, that's actually a really good point, Steve. The Data Seal of Approval certainly came out of an environment where the players involved at the time were largely national archives, and they saw the world through that set of lenses: we've been around forever and we're going to be around forever. As you move out into the wider data repository space, it gets harder and harder to make those kinds of commitments.

Steve McEachern: Yeah.

Andrew Treloar: I realise we're running close on time, so I might just skip over an observation and maybe finish with this question, which is: what are the costs, in terms of human and non-human resources, to get certification?
Did any of you try and add up how much time and effort you put into this? Or did you choose not to, because the number was just going to be too scary?

Steve McEachern: Well, we can say, in terms of human resources - well, one of those human resources is in this room, so, Heather - as I say, part of our funding was for Heather to contribute to that, and I'd say you were working a fair chunk of about nine months, probably, to put that together. There was a reasonable proportion of my time as well - not at that level, but probably half a day a week for several months - and bits and pieces from other parts of our organisation. There aren't too many non-human resources, because we were certifying the existing facility, so realistically it was a document-gathering or document-creation exercise. So it was primarily staff time, plus a little bit of development work, but not much. I think the experience will be different in Andrew's case in particular, where it'll be a new service. To certify an existing service, it really is staff time, and it depends on how good your documentation is already. In that way, it actually is a useful exercise, because it reminds you of what you haven't done. So, as I say, there is real value in it, but there is time involved as well.

Andrew Treloar: Yeah, all right. Thank you. Mikaela, is it possible for you to respond to that?

Mikaela Lawrence: Yes. Certainly we had a similar commitment to the ADA. We had myself and another data librarian working on this, and then, given that we were looking at hosting externally available data, we also had some of our researchers talking to external organisations. However, costing that time is a difficult thing. Our legal counsel also put significant time and effort into developing new procedures and policies for hosting externally-owned data.

Andrew Treloar: Okay, thank you. Andrew Mehnert, did you want to weigh in on that?
Andrew Mehnert: Yes. So, similar experiences. We had the challenge, I guess, of having four different sites. We had a project manager at each site, and a little team around that project manager, to address some of these issues - talk to IT services, talk to library services, resolve some of the questions. Then, as the overall project manager, I was consulting with each of the other project managers and trying to come to a consensus - even talking to yourself, Andrew, to understand some of the questions and how to respond. I guess at the end of the day it was about giving an honest response, with the best information you have at hand, about how you've addressed each of the requirements.

Andrew Treloar: Yeah. All right, thank you, and I'm afraid we're going to have to leave it there. We're over time. My apologies to those people who have questions in the question panel that we haven't got to yet, but my thanks in particular to the presenters - to Mikaela, to Heather, to Steve, to Andrew Mehnert and to Graham - for weighing in. Thank you for sharing your experiences with us. I hope that's been of benefit to the community. We look forward to seeing or hearing from some of you at our next webinar. Thank you all.

END OF TRANSCRIPT