Transcript - Trusted Data Repositories - 13 March 2018
[Unclear] words are denoted in brackets
Webinar: Trusted Data Repositories
13 March 2018
Video & slides available from ANDS website
START OF TRANSCRIPT
Andrew Treloar: Okay, good afternoon, or good morning, depending on what time zone
you're in. Welcome to this webinar on the ANDS trusted data repository
program - and I'll explain why it's ANDS in a minute. My name's Andrew
Treloar. I work for ANDS and I was responsible for the trusted data
repository program itself.
So I'm going to start by providing a bit of context for the program, and an
overview of - a very high-level overview of CoreTrustSeal. We then have
three case studies. These were three projects that we funded, one from
CSIRO, one from the National Imaging Facility and one from the Australian
Data Archive, and I'll explain why we picked those and the different
perspectives they provide, and then we have a slightly more freewheeling
Q&A session at the end, so there should be plenty of time for you to ask any
questions that you have.
Trusted data repositories was a program that ANDS funded in its 2016/17
annual business plan. Now, 2016/17 seems like a very long time ago now,
and a number of these projects - for reasons that they will explain, ran after
the end of the 2016/17 financial year, and that's one of the reasons why
we're talking about them now.
Page 2 of 24
It also means that this was a program that we started when ANDS and
Nectar and RDS were largely separate activities. Since that time, ANDS,
Nectar and RDS have been progressively aligning what they do, our 2017/18
activities, which are of course still running, are being undertaken under an
integrated business plan, and so, while it seems slightly strange to be talking
about an ANDS-only program - for me, at least - it is very much something
that's been embraced by RDS and Nectar. So this overall trust agenda is very
much something that is reflected in the 2017/18 business plan and is tied
into a wider concern around research quality and trustedness.
So, please see this as entirely consistent with what the three projects, as
they come together, care about, even though it started under an ANDS-only
program.
I was trying to think about the best way to provide some context for the
concern with trustedness, and I decided to go back - not to beer - although
the beer is relevant - but go back to an article that was very formative for
me in the early '90s, written by a guy called John Perry Barlow, who in fact
died last month.
So he was a guy who was, among other things, a lyricist for the band, the
Grateful Dead, but was one of the early thinkers around intellectual
property and ideas and wrote a very influential article called The Economy of
Ideas, subtitled Everything you know about intellectual property is wrong,
where, among other things, he distinguished between the container and the contents.
So the way I want us to think about trust for the purposes of this seminar, is
to distinguish between the contents of a repository and the repository itself.
So, what we all want is we want contents - in this case, something that we
can consume without fear, and one of the ways that we're happy to, in this
case, drink from this delightful glass of beer, is by looking at the container.
So the container has some characteristics that make the contents more
trustworthy. One of the characteristics - and I hadn't actually thought about
this joke until now - one of the characteristics is the seal on top of the
container, in this case. You can see that the beer bottle has not been
unsealed and so we're prepared to believe that it hasn’t been tampered
with between the brewery and getting poured.
Another of the elements of the trust that we might have in this particular
beer is the label on the bottle. So we look at this and we say, ah yes, I've
heard of Batemans - I think that's what it says. I've heard of
Batemans. They produce trustworthy beer. There might be more
information on the back. It might say, brewed in - somewhere. So there's
some brand information that's associated with the container that leads us to
trust it more, and there may even be some provenance information about
how the beer was produced, where the ingredients came from, and so on.
So the distinction that I want you to take out of this image is not, mm, it's
lunchtime - I'd really like a beer now, but that distinction between container
and contents.
In fact, the ANDS trusted data repository projects were focused on the
container. We had a separate set of projects, called trusted research
outputs, which were focused much more around the process of producing
the outputs and the provenance associated with producing the outputs. In
webinar, we want to focus on the container - the trusted data repository.
So we selected a small number of pilot projects, which were deliberately
designed to cover as wide a range of settings as possible. So they covered a
range of disciplines - human and non-human imaging, in the case of NIF,
social sciences, mostly quantitative social science in the case of ADA, water
and a number of other physical sciences in the case of CSIRO, a number of
different organisational settings - in some cases universities, in some cases a
national archive, in some cases a research institution, and a number of
different provision models - and you'll hear more about each of those in the
case studies that follow.
The point was to try and get information about what it takes to implement a
trusted data repository.
Part of what we were doing in the program was we're saying, look, we'd like
you to use this - what was then called the data seal of approval - now called
the CoreTrustSeal certification as the approach to follow, to determine
whether or not this was a trusted data repository.
So, for those of you that are unfamiliar with CoreTrustSeal, this started out
as a thing called the data seal of approval and was one of a number of
certification schemes. If you look at the bottom of this slide, originally it was
part of a three-level hierarchy. So the simplest was data seal of approval.
Next level up was a German standard, which was looking at standardisation
of digital resources, and the most advanced level was the thing that some of
you may have heard of, called the TRAC criteria, or ISO 16363.
What happened was that the World Data System - WDS - worked with the
Research Data Alliance via a working group on repository audit and
certification, took the DANS data seal of approval, tweaked it a bit, added
some additional questions, and essentially agreed on this as what was for a
while called the DSA/WDS criteria, and then just recently has morphed into an
organisation called CoreTrustSeal. So there is now an organisation called
CoreTrustSeal.org, which has taken over responsibility for this certification
scheme.
It's relatively straightforward at the high level. There are 16 criteria that you
use to assess your repository. Some of these are around your organisation
and the characteristics of your organisation. Some of these are around the
way your repository does digital object management - and you have varying
levels of compliance possible for a number of these criteria, so you can
assess how well you're doing. In that respect it's a little bit like a maturity
model.
The thing I want to stress, before we get into the case studies, is that this is
not just technology, and in fact, the technology is almost the least
important bit. A lot of it is to do with the organisational processes and
the kind of organisation that's standing behind this trusted data repository.
You can use the criteria to do a self-assessment, and you'll hear a number of
the presenters talk about a self-assessment, or you can then submit that
assessment for certification - so external reviewers will look at that and
maybe come back and ask some questions. You'll then get a CoreTrustSeal
tick, which will last three years, and there's a business model that sits
behind that. You have to pay for that, but you can of course just do the self-
assessment as an exercise for yourself at zero cost.
So there's more information there and links to the criteria, but if you go to
CoreTrustSeal.org that will be enough.
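As a rough illustration of the self-assessment just described, the varying compliance levels per requirement could be recorded in a simple structure like the sketch below. This is hypothetical, not official CoreTrustSeal tooling; the 0-4 compliance scale follows the published guidance, but the requirement keys (R1..R16) are shorthand, not the official requirement titles.

```python
# Hypothetical sketch: recording a CoreTrustSeal-style self-assessment.
# The 0-4 scale mirrors the published compliance levels; R1..R16 are
# placeholder keys for the 16 requirements.

COMPLIANCE_LEVELS = {
    0: "Not applicable",
    1: "Not yet considered",
    2: "Theoretical concept",
    3: "In the implementation phase",
    4: "Fully implemented",
}

def summarise(assessment):
    """Return (requirement, level) pairs that are not yet fully implemented."""
    return sorted(
        (req, level) for req, level in assessment.items() if level < 4
    )

# Example: everything at level 4 except four requirements at level 3,
# similar in shape to the ADA self-assessment described later in this webinar.
assessment = {f"R{i}": 4 for i in range(1, 17)}
assessment.update({"R9": 3, "R10": 3, "R15": 3, "R16": 3})

for req, level in summarise(assessment):
    print(req, "-", COMPLIANCE_LEVELS[level])
```

A structure like this makes it easy to see, at a glance, which requirements still need organisational or technical work before seeking certification.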
That's all I really wanted to provide by way of introduction. What I'd now
like to do is pass to the first of our presenters. The first of our presenters is
Mikaela Lawrence who would like to talk to us about the CSIRO experience
in working with this approach.
Mikaela Lawrence: Okay, so today I'm just going to talk about CSIRO's trusted data repository
project. So I'll guide you through the aims of the project, some background
about the data access portal, the requirements of self-assessment as a TDR,
gathering the evidence, applying for certification and also another aspect of
our project was looking into hosting externally-owned data.
The aims of the trusted data repository project were to investigate certifying
the data access portal as the trusted data repository, to develop a plan to
implement changes to policies and procedures, to support CSIRO business
requirements, and certification. To develop a plan to implement systems
changes that may be required to the DAP infrastructure, to engage with
external entities to host externally-owned data as a test case, and to
prepare an application for certification.
This is our data access portal. So just a bit of background information of the
repository, and this'll provide some of the context that relates to the first
section of the application.
The data access portal is currently an institutional repository and when we
submitted our application that's what we submitted our application as.
Deposit is by self-service and is accessible to CSIRO staff, using their
institutional username and login. We have approximately 2,100 publicly
available collections and storage of the data is over one petabyte.
The subject matter includes a broad range of sciences, with 17 of the 22
fields of research codes represented.
The software and storage infrastructure of the DAP - which is what we term
our data access portal - are developed and managed by CSIRO information
management and technology. We have a data deposit checklist, which
ensures depositors can see the key quality and legal issues prior to deposit.
A science leader then approves the collection after assessing for quality and
legal issues.
The repository - we offer a few different curation levels, based on depositor
needs. So the content can be distributed as deposited. We may offer some
basic curation - brief checking or addition of basic metadata - or enhanced
curation - such as conversion to new formats.
The community or data users of the data access portal are researchers,
industry, policymakers, general public and students. The data users can
download the majority of collections without a user log in, and a smaller
number of collections will require registration to access the files.
So the requirements of self-assessment as a TDR - when we went through
the process, to help with understanding - there are 16 requirements of the
self-assessment - we read other organisations' applications and considered
the evidence they had used. Applications within the CoreTrustSeal are now
open, with certified repository applications available on their website.
Applications that were useful to us in reading were DANS, as they're part of
the secretariat of the CoreTrustSeal, and have been involved in developing
their requirements, and also the UK Data Archive. They had a well-
organised application with detailed evidence.
To help with the next step of gathering and determining what evidence to
use for CSIRO, an analysis was undertaken of the types of evidence used in a
few of the published applications. We've included a list of references we
used to inform our understanding of the requirements in the appendix of
our report to ANDS. This will be published on their website.
There also is a useful extended guidance document and webinar available
on the CoreTrustSeal website that discusses the requirements and the
evidence expected.
Gathering the evidence. The certifying body have a preference for evidence
that is public, and we found this a major challenge. In this table are some
examples of the evidence we used for the first part of the requirements
which were organisational related, from requirement one to six. This gives
an idea of new evidence we developed, such as the mission statement. Also,
the difficulty with providing publicly available evidence. It also provides
information about the departments we consulted for expert guidance within
our organisation, such as legal, business development and staff from within
our own information management and technology department.
We have attempted to overcome the challenge with providing public
evidence with the development of collection development principles,
preservation principles and an update of our existing data management documentation.
These provide a summary of the processes for requirements seven to
16, which cover digital object management and technology. These public
documents are available from the CSIRO DAP help page.
The next - what stage are we up to with applying for certification? So the
data seal of approval ceased applications in October 2017, and we missed
this deadline. However, our application was submitted with the
CoreTrustSeal in February 2018, as part of their soft launch to test their
system. Processing of our application will begin when the CoreTrustSeal
legal entity is finalised. So we're currently waiting to pay the administration
fee of €1000 and then our application will be processed.
We found getting an account for the application management tool gave
access to a staff member, who promptly answered our questions.
A word of warning - once an application is submitted, it is locked, however,
we found the helpful staff member could amend a small error we had made.
One aspect of our project looked at investigating policies, procedures and
system changes to host externally-owned data.
So why was this part of our strategy? As an organisation, we understand the
value of new research possibilities in drawing together research data,
produced by organisations beyond CSIRO and across the research
community. Also, researchers from our land and water business unit are
interested in investigating a trusted repository for water research data. This
vision is to bring together nationally significant data from a wide range of
organisations for the benefit of industry, policy and research.
What did we implement as part of this part of the project? We defined the
scope for accepting data in the collection development principles. For
example, data should be aligned with CSIRO's functions as set out in Section 9
of the Science and Industry Research Act 1949.
Terms and conditions were developed into an agreement to be signed by
the depositing organisation, called the Data Deposit Conditions. Some
examples of the terms and conditions include that data is free from embargo,
it has not previously been published with a DOI, data is owned by the
depositing organisation, data complies with ethics, privacy, confidentiality,
contractual licensing and copyright obligations, and data will have a CC BY
licence.
A data deposit form was developed for the data depositor, to provide
metadata. We developed some procedures for depositing externally-owned
data.
The DAP is a self-service repository, with access to deposit by CSIRO staff
only, so the research data service will liaise with external data owners to
facilitate the deposit of data. Then the CSIRO science leader
with the main knowledge of the data will be the approver of the collection.
This is part of the risk management framework that all public data
collections in the DAP are subject to. It involves checking the data
quality and legal issues prior to publishing.
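As a rough illustration, the kinds of checks described in the Data Deposit Conditions could be expressed as a simple pre-deposit validation function. This is a sketch only: the field names below are invented for the example and are not CSIRO's actual deposit schema.

```python
# Illustrative sketch of a pre-deposit check mirroring the kinds of
# conditions described for the Data Deposit Conditions. Field names
# are invented for this example, not CSIRO's actual schema.

def check_deposit_conditions(deposit):
    """Return a list of problems; an empty list means the checks pass."""
    problems = []
    if deposit.get("under_embargo"):
        problems.append("data must be free from embargo")
    if deposit.get("existing_doi"):
        problems.append("data must not already be published with a DOI")
    if not deposit.get("owned_by_depositor"):
        problems.append("data must be owned by the depositing organisation")
    if deposit.get("licence") != "CC-BY":
        problems.append("data must carry a CC BY licence")
    return problems

print(check_deposit_conditions({
    "under_embargo": False,
    "existing_doi": None,
    "owned_by_depositor": True,
    "licence": "CC-BY",
}))  # an empty list means the deposit passes these checks
```

In practice these checks sit alongside human review, since the science leader's quality and legal assessment cannot be fully automated.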
Some future enhancements to the DAP include the ability to customise a
collection landing page, such as the addition of logos for external
organisations, automation of the data deposit conditions within the existing
DAP software and to develop a self-serve deposit interface for external
depositors.
We found that this project had some immediate benefits for us, such as
when applying for recommended repository status with journal publishers
and funders, we've found that we had information ready to use to meet
their criteria.
We've also had enquiries from researchers regarding publishing externally-
owned data, and we now have a response with policies and procedures in
place.
So thank you. There were a lot of people involved in this project within CSIRO
- too many to list, but a thank you to all of them as well.
Andrew Treloar: Okay, so next, another Andrew - Andrew Mehnert, to talk about the NIF
experience.
Andrew Mehnert: All right. So I'm going to talk about national trusted data repositories for the
National Imaging Facility. My name's Andrew Mehnert. I'm a NIF Informatics
Fellow at the Centre for Microscopy, Characterisation and Analysis, at the
University of Western Australia.
Very quickly, what is NIF? The Australian National Imaging Facility is a $130
million project, providing state of the art imaging capability of animals,
plants and materials for the Australian research community. The little map
there to the right shows the various nodes of the National Imaging Facility
around the country.
Now, why is NIF interested in trusted data repositories? Well, the imaging
equipment such as MRI, PET, CT scanners are capable of producing vast
amounts of valuable research data. So we're interested in maximising those
research outcomes, and to do so, the data must be stored securely, it must
have its quality verified and should be accessible to the wider research
community.
From the CoreTrustSeal point of view, why trusted data repositories? Well,
firstly, to be able to share data, secondly, to preserve the initial investment
in collecting that data, thirdly, to ensure that the data remain useful, and
meaningful into the future. The last one, importantly, is that funding
authorities are increasingly requiring continued access to data that's
produced by projects they fund.
All right, now I want to talk specifically about the NIF/RDS/ANDS trusted
data repositories project, officially titled Delivering Durable, Reliable, High-
Quality Image Data for the National Imaging Facility.
Now, the broad aim of the project was to enhance the quality, durability and
reliability of data that is generated by the NIF.
By quality, we mean that data has to be captured by what we call the NIF-
agreed process. Durable means that the data has to have guaranteed
availability for 10 years. Reliable means that the data has to be useful for
future researchers. So it has to be stored in one or more open data formats,
and with sufficient evidential metadata, so we know how it was created,
what the state of the instrument was at the time of creation, and so on.
The NIF nodes involved were the University of Western Australia, University
of Queensland, University of New South Wales and Monash University.
In the project, we limited our scope to MRI data, but essentially, the results
are generalisable to other modalities, and in fact we've already progressed
to micro CT.
Key outcomes from the project include the NIF-agreed process to obtain
trusted data from NIF instruments. I'll talk more about that shortly. The
second is requirements necessary and sufficient for a basic NIF-trusted data
repository service. The third was exemplar repository services across all
four participating nodes, and the last was self-assessments against
the core trustworthy data repositories requirements, from CoreTrustSeal.
The NIF-agreed process for acquiring high-quality data essentially lists the
requirements that have to be satisfied to obtain high-quality data - which
we call NIF-certified data - which is then suitable for ingestion into a NIF
trusted data repository service.
We mandate that repository data must be organised by project ID, because
project IDs will persist with time, whereas user IDs don't - users come and
go.
Now, to be NIF-certified, the data must have been acquired on a NIF-
compliant instrument - more about that shortly. It has to possess NIF-
minimal metadata - so that includes cross-reference to relevant instrument
quality control data. It has to include the native data generated by the
instrument in proprietary format and include conversions to one or more
open data formats.
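The NIF-certified requirements just listed lend themselves to a simple pre-ingest check. The sketch below is a hypothetical illustration only: the field names are assumptions for this example, and the NIF-agreed process remains the authoritative definition.

```python
# Minimal sketch of a pre-ingest check for NIF-certified data, based on
# the requirements just described. Field names are assumptions for
# illustration; the NIF-agreed process defines the authoritative list.

REQUIRED_METADATA = {"project_id", "instrument_id", "acquired_at"}

def is_nif_certifiable(dataset):
    """A dataset qualifies if it carries the minimal metadata (including a
    cross-reference to instrument quality control data), the native data in
    the instrument's proprietary format, and at least one open-format copy."""
    metadata = dataset.get("metadata", {})
    has_metadata = REQUIRED_METADATA <= set(metadata)
    has_qc_link = bool(metadata.get("qc_project_ref"))
    has_native = bool(dataset.get("native_files"))
    has_open = bool(dataset.get("open_format_files"))
    return has_metadata and has_qc_link and has_native and has_open

example = {
    "metadata": {
        "project_id": "P0001",
        "instrument_id": "MRI-7T-01",
        "acquired_at": "2017-06-01T10:30:00",
        "qc_project_ref": "QC-MRI-7T-01",
    },
    "native_files": ["scan.dcm"],
    "open_format_files": ["scan.nii"],
}
print(is_nif_certifiable(example))  # expected: True
```

Because the check is mechanical, a repository service could run it automatically at upload time and set the NIF certification flag only when all conditions hold.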
So the requirements for a NIF trusted data repository service. We drew
upon the CoreTrustSeal requirements in the left column that you see there,
and additionally added some NIF requirements. One of them you've seen
already, the project ID requirement, but we also require an instrument ID
requirement, a quality control requirement, authentication by Australian
Access Federation requirement, interoperability - that is we should be able
to upload data from one repository to another. Redeployability - it should be
possible to deploy the service from one NIF node to another - and a service
requirement that essentially, we have a help desk responding to requests
regarding the repository.
So in a nutshell, if we have a look at this diagram, and concentrate on the
right-hand side, we've got the four sites, UWA, UQ, UNSW and
Monash, so TruDat@ that particular site represents the trusted data
repository. Login is via the Australian Access Federation, so it means that on
any of the sites it will direct you back to your institutional login page and use
your institutional credentials.
As I mentioned before, data sets are organised by project ID. A data set is
associated with an instrument, and provided the NIF-agreed process has
been followed, then a NIF certification flag, indicating that it is certified, is
also included with the data set.
The repository has a record for the instrument. The instrument itself is
linked to another special project, called the Quality Control Project, and also
a handle to a record in Research Data Australia.
Looking at the bottom of the screen, you can see Research Data Australia is
a data and service discovery portal, provided by ANDS. So we put into that
an instrument description that's both hardware and software, and there's a
unique handle to that record.
If we look at the top left now, at the instrument PC, or client PC, data is
uploaded according to the NIF-agreed process. So, in the top box above NIF-
agreed process, the user data set has to have minimal metadata: the
project ID, instrument ID, date and time the data was acquired, implicit
metadata that's in the proprietary data, the native data from the instrument
and conversions to one or more open data formats.
The instrument operator can also upload data to the quality control project,
which includes the [quality stamp], quality control standard operating
procedure, which of course can be updated over time, and quality control
data. So what this means is that when a user uploads data to the repository
there's an automatic link to the quality control project, and so it's possible
to know the state of the instrument at the time that the data was acquired.
This is what the portal looks like for TruDat@UWA. We have based this on
the MyTardis platform, which originated at Monash, with several extensions
developed during the project, and we use docker technology to be able to
easily deploy different sites. So this allows easy instrument integration,
simple data sharing and user-controlled publishing of data sets.
Okay, now I come to the comparison of all the self-assessments against the
CoreTrustSeal requirements. All four sites did their own self-assessments for
their respective repositories. What we can see here in this table - this
shows the first eight such requirements - is that essentially, we
independently arrived at a fairly similar level of assessment, except for the
cases marked in blue.
So the third one, requirement three, is about continuity of access. Monash
believed that at that point in time this was not assured, whereas the other
three sites did. I should point out that a self-assessment is a statement of the
situation at the point in time that the self-assessment was completed.
Then there was a difference as well at row four, which is requirement four,
confidentiality and ethics. Monash have this fully implemented, whereas the
other three sites are in various stages of getting this to be implemented.
Then the other differences, with the remaining requirements, some
differences with respect to data storage - documented storage procedures,
work flows and data discovery and identification.
Post-funding. The project hasn’t finished, just because the funding has
finished. We intend to maintain the services for 10 years now, and we plan
to meet quarterly, to make sure that this happens. We are integrating
additional instruments - as I said, we're adding micro CT instruments at the
moment. We will create a project web portal, so we have a single landing
page for all these trusted data repository services.
We're planning new national and international service deployments,
including one in Turku, Finland. We're refining and improving the trusted
data repository portal and we intend progressing to CoreTrustSeal
certification.
Very quickly, benefits of the NIF trusted data repository services. For NIF
users and the broader community it means reliable, durable access to data,
improved reliability of research outputs and provenance associated with it,
making NIF data more FAIR. Easier linkages between publications and data
and stronger research partnerships.
For NIF it means improved data quality, improved international reputation,
ability to run multi-centre trials.
For the various research institutions, enhanced reputation management, a
means by which to comply with the draft code for responsible research, and
the enhanced ability to engage in multi-centre imaging research projects.
With that, I thank you. I list on the page here the various project leads at the
various nodes. So thank you very much.
Andrew Treloar: Okay, thank you Andrew Mehnert. So that's two quite different perspectives
on trusted data repositories. The third perspective comes from Heather
Leasor and Steve McEachern, from the Australian Data Archive.
Heather Leasor: We're the Australian Data Archive, which is a social science research data archive.
Our mission is to be a national service for the collection and preservation of
digital research data, and to make these available to academics, government
and other researchers.
We hold about 5000 datasets in over 1500 studies, on all areas of social
science, from social attitude surveys, censuses, aggregate statistics,
administrative data, and many other sources, both qualitative and
quantitative.
Our data holdings are sourced from academics, government and the private
sector.
We undertook the process with ANDS as part of the trusted data repositories
program.
We originally started under the Data Seal of Approval, before they had
fully combined with the World Data System.
Originally, we were the DSA and then we became the DSA/WDS. When we
found out that they were moving to the CoreTrustSeal, we delayed our
decision on which guidelines to adopt. We officially started the
DSA/WDS in March of 2017 and submitted our application in April of 2017.
We were due to have a review from our reviewers in May, but it didn't
actually arrive until August. Then we made our corrections. We sent
it back in, got another set of corrections, did another
round of corrections, and submitted and finalised in February of 2018. So,
after slightly less than a year of process, we are a CoreTrustSeal certified
repository.
We did use the November 2016 DSA/WDS guidelines, which weren't
as detailed as what is given in the CoreTrustSeal, and there was no one
to look at for reference - Mikaela said she looked at others for
reference, but there was no one to look at for reference for this new
CoreTrustSeal - so we worked from what people had done in the DSA/WDS and
flew blind for a bit.
When we went through the process, which was a very useful process for
self-assessment, we identified four of the guidelines which we set at a level
three, which is the implementation phase: guideline seven, which is data
integrity and authenticity; guideline 10, which is preservation planning;
guideline 15, which is technical infrastructure; and guideline 16, which is security.
Later, in assessment with one of our reviewers, we also changed guideline
nine, which is documented storage, down to a level three. Everything else
we had set at a level four, for our repository.
Our repository has been around for about 35 years - coming up on 40 years,
so we do have quite a few procedures in place.
Some of the challenges that we found, doing the CoreTrustSeal process.
When we initially undertook it there was no recommendation of what a
minimum requirement would be for any of the guidelines, so we didn’t
know if we set it at a three, if that meant we wouldn't be able to get a
CoreTrustSeal or not. Or if you set it at a one, can you still get a
CoreTrustSeal? There doesn’t seem to be a minimum requirement that we
have ever found. The extended guidelines do detail things a little bit better
nowadays, for those who are undertaking it in the future.
We weren't sure if you had to respond to every aspect of all of the sub-
questions in a guideline, or just to the overarching guideline.
We also found that there was a complex interplay between the relevant
documents required for a guideline and those for other guidelines, so that
one document may respond to up to four different guidelines, or it may
respond to only one guideline.
Also, we found it difficult to provide evidence from documents which are
not in the public domain. Like the other two, we had to go through our own
websites and find out what we did have, forward-facing, what we had
internally-facing and which aspects of those we feel we can now put into an
outward-facing website or Wiki page.
There should be - all aspects should be outward-facing, but if things had to
be inward-facing, there seems to be some basis on which the CoreTrustSeal can
deal with that.
The assessors did not indicate, in our original guidelines, that you had to
have a timeline for things that were in process. The new guidelines do state
that you have to list a timeline of when you plan to have your
implementation in place, so we had to add, in our final version,
when we planned to have these items forward-facing and our new website
up and running. We had to come up - we had no idea when we originally
started the process, what the process entailed, and what time frames it was
going to take. We were unclear if it was going to take a few months or a
year. It ended up taking us a year, but the CoreTrustSeal does seem to be
coming along as an organisation much better, so timelines should
move a bit quicker now.
So, from our experience, we found that doing as Mikaela and Andrew had
done, going through and finding out what is in the public domain already,
and what can safely be put into the public domain is a good first step for any
repository undertaking the CoreTrustSeal.
How to cite items which are out of the public domain, and private elements,
is still an open question, which the CoreTrustSeal is dealing with.
We also would like to know how to deal with items that are out of our direct
control, such as funding models, infrastructure and governance. Being part
of a larger university, or, as with CSIRO, part of a governmental body, or, as
with Andrew, part of multiple institutions, how do you fit into their
governance models? How do you fit into the infrastructure, and how is this
relayed to the CoreTrustSeal, with these complexities?
Also, we found the risk management section of the CoreTrustSeal a bit
difficult, because it kept referencing ISO standard requirements,
and undertaking an ISO standard risk assessment to do a base
CoreTrustSeal seemed a bit of overkill for us. So finding some risk management
standards that are free and in the public domain, would be very useful.
We actually answered the final one ourselves, which is that the guidelines
are freely available for self-assessment, without paying to obtain a seal.
You can undertake the CoreTrustSeal as a self-assessment for your
repository: you can define what your repository is and where your boundaries
are, and undertake the assessment.
So in the Australian context - though these aren't necessarily only
Australian issues; we did find that they relate to other repositories
worldwide - there is the complexity of how institutions and repositories
relate. One institution may encompass multiple repositories, or one
repository may encompass multiple institutions, and this affects your
governance, your funding, your security and all those aspects, as well as
things that sit in the national framework. So things that are involved in
our national roadmaps - how do these play into the CoreTrustSeal, given that
they are also out of the control of the individual repositories? The same
goes for infrastructure frameworks: infrastructure that is provided by your
host institution, and the government frameworks of host institutions, which
are not easily explainable in a CoreTrustSeal.
So these are not necessarily, as I said, Australia-specific, but more to do
with the repository sector, because the repository sector is a very varied
sector, with multiple institutions and multiple repositories playing
different roles.
Andrew Treloar: Okay. Thank you, Steve and Heather. We've now heard from three separate
experiences of engaging with trusted data repositories and the
CoreTrustSeal. We now have 15 minutes or so for questions.
The first question is from Nick, who says that they got Data Seal of Approval
in 2012, so very early on. Is there any advantage in going through the WDS
process? Would any of the three panellists like to weigh in on that one?
Steve McEachern: I'll take that one, Andrew.
Andrew Treloar: Thanks, Steve.
Steve McEachern: My sense would be - I mean, the DSA was a three-year certification in and
of itself, so from the point of view of having ongoing certification, I
suppose there's a consideration there.
What I would say to you, having been through - as I say, we were familiar
with the DSA in its original version and what it morphed into in the
CoreTrustSeal - there is probably a heavier expectation on some of the risk
management and preservation requirements than there was in the past, and
the emphasis has shifted somewhat, I would say.
The other point I would probably make is about the review process itself. I
mean, we were flying blind, as Heather pointed out, and our experience is
probably not reflective of everyone's. I think the CoreTrustSeal
organisation itself was developing, and the reviews and so on were probably
a bit different, as they brought together what was originally a social
science standard - the Data Seal of Approval - extending into the
humanities, as Nick's repository does as well.
I think the WDS side of this is more the physical and life sciences in
particular, or the earth sciences. So there's probably a shift in emphasis
there. I think it would be a good experience, but it might be a bit
different from what you went through in the DSA experience, is probably how
I'd reflect on that.
Andrew Treloar: Okay, thanks, Steve. Comments from either of the presenters, if you want to
weigh in on that?
Okay, no - so it looks like a no. Maybe if I could ask a question that builds on
that question from Nick and Steve's answer. Under CoreTrustSeal, the idea
is that you would apply for certification, you would get certification. That
certification would run for three years. I know that they've talked about a
lightweight recertification down the track, if you want to get recertified in
three years' time. Would any of the presenters like to comment on the
length of the certification period - or rather the expiry time for
certification - and whether that is a reasonable thing to do? In your view,
does it seem sensible that your certification would slowly evaporate over a
three-year period, and would there be value in applying again in three
years' time? Anyone want to…
Andrew Mehnert: I guess I might chip in there and say, given the amount of effort in
getting the original certification through, I'd say it would be worth it to
keep this going into the future. Three years would seem reasonable, and it
should be a fairly lightweight exercise to get that recertification. That's
having not yet achieved certification the first time around.
Steve McEachern: I would say there, Andrew, I think three years is the right sort of cycle as
well, so long as the certification process itself - the time frame shortens. Our
application was for April 2017. Our certification was in February 2018. Our
certification will end in December 2019. So I think the cycle is right, given
the context of what the content is, but I - as I say, I - and so this is, I think,
partly a function of the organisation itself evolving and sorting itself out, but
the process itself has to speed up somewhat, in order to make that three-
year cycle an appropriate one.
I think that's the right time frame, but they have to speed up the process.
Andrew Treloar: Yeah, that makes sense, and I - sorry, go on.
Heather Leasor: I believe that, yeah, as soon as you do have most of your documents
together, and you know which ones need to evolve, the recertification
should go a bit quicker, because you can just copy and paste, pretty much,
and iterate what new developments have happened to your institution or
your repository in that time.
Andrew Treloar: Yeah, that makes sense. So as in most new things that one does, the first
time's a bit painful, and then it gets easier.
Heather Leasor: Yeah.
Andrew Treloar: Two questions from Carmella. The first question is, what is ANDS' long-term
plan to have all the university repositories meet the CoreTrustSeal? That is
an interesting combination of issues. Firstly, it would now be
ANDS/Nectar/RDS, as we continue to merge towards a new organisation, and
ANDS/Nectar/RDS is not really in a position to require university
repositories to do anything. University repositories will apply for the
CoreTrustSeal if they see value in doing it themselves. In the case of the
projects that presented here, we provided some funding to help ADA and NIF
and CSIRO do something that they really wanted to do anyway. So I think it's
going to depend on the drivers for the individual repositories as to whether
they see value in this.
Then the second question was on the duration of the data to be preserved.
Carmella's comment there was, 10 years of data preservation seems like a
relatively short period of time, especially in the case of clinical trials.
I'll leave it to the three presenters to comment on that, but I would just
say that I suspect the 10 years is a consensus number, not a you-have-to-
throw-it-away-after-10-years number. It's an at-least-that number. I'm
pretty sure the NHMRC Australian Code for the Responsible Conduct of
Research says either seven or 10, so it's not inconsistent with that - but
would any of the presenters like to weigh in on this subject of 10 years,
before I move to the -
Andrew Mehnert: I might answer first, if I can? I think you're right - the NHMRC/ARC
requirement is for seven years for retaining data. The figure of 10 was
essentially in the original research proposal as something reasonable that
each of the NIF nodes and the associated institutions were happy to support,
but that doesn’t mean we won't support beyond the 10-year period. That was
just something we agreed by consensus that collectively we could do. 10
years is a long time to guarantee a service is running, but the plan at each
of the nodes is that we would continue into the future. The 10 years is
proof of concept that we can indeed do this over a long period of time.
Andrew Treloar: Yeah.
Andrew Mehnert: In the case of NIF, this was establishing some new repository services, so
that's been a challenge unto itself, and guaranteeing 10 years of running
service is no mean feat.
Andrew Treloar: Yeah, 10 years sounds like a long time.
Graham Galloway: It's Graham - Graham Galloway here, who was the lead for that, the NIF
trusted data repository. I think one of the things we need to recognise is
that over 10 years the nature of the repository is going to change. We don't
know where we're going to be storing data in 10 years, so for the
institutions to guarantee more than 10 years at this point in time is going
to be difficult. The National Imaging Facility, as Andrew has already said,
is committed to providing mechanisms for data storage, but in 10 years' time
those repositories could be sitting on Amazon or on other public services.
So we wanted a commitment from the partners, at the time we signed the
contract, that they would guarantee to maintain that storage for that 10
years, but we're committed to looking at the mechanisms beyond that - and
then of course you've got to look at migration of data between those
repositories, and that's an issue we'll have to address.
Andrew Treloar: Yeah, indeed. For the benefit of those people who are unfamiliar with the
august presence of Graham Galloway, he is the director of the National -
Steve McEachern: Andrew, just on that - Graham, can I just make a quick comment, which is:
this is one of the things that we were referencing at the end there, which
actually creates the complexity of responding to the guidelines, in terms of
saying how long you will maintain a service - you're potentially making
commitments you can't realistically fulfil.
Andrew Treloar: Yeah.
Steve McEachern: In some ways, some of the expectations in the seal needed to account for
that, I think, a little bit more than they probably did. It's unrealistic
for us to say much more than, look, we've been running for 35 years, so
we'll probably still be here in 10 years' time. But as I say, in terms of
[IFA] commitment, I could say the ANU will be here in 10 years' time. Or if
I'm the National Archives, I can say that. There aren't many organisations -
or parts of organisations - who could actually make such claims.
Andrew Treloar: Yeah, that's actually a really good point, Steve. I think the Data Seal of
Approval certainly came out of one environment where the players involved at
the time were largely national archives, and they saw the world through that
set of lenses - we've been around forever and we're going to be around
forever. As you move out into the wider data repository space, it gets
harder and harder to make those kinds of claims -
Steve McEachern: Yeah.
Andrew Treloar: I realise we're running close on time, so I might just skip over an
observation, and maybe finish with this question, which is, what are the
costs, in terms of human and non-human resource, to get certification? Did
any of you try and add up how much time and effort you put into this? Or
did you choose not to, because the number was just going to be too scary?
Steve McEachern: Well, we can say, in terms of human resources - well, one of the human
resources is in this room, Heather. As I say, part of our funding was for
Heather to contribute to that, and I'd say you were working a fair chunk of
about nine months, probably, to put that together. There was a reasonable
proportion of my time as well - not at that level, but probably half a day a
week for several months - and bits and pieces of other parts of our
organisation as well.
There weren't too many non-human resources, because we were certifying the
existing facility, so realistically it was a document-gathering or creation
exercise. It really was primarily staff time that was involved, plus a
little bit of development, but not much.
I think the experience will be different in Andrew's case in particular,
where it'll be a new service. To certify an existing service, it really is
staff time, and it depends on how good your documentation is.
In that way, it actually is a useful exercise, because it reminds you of
what you haven't done. So, as I say, there is real value in that, but there
is time involved as well.
Andrew Treloar: Yeah, all right. Thank you. Mikaela, is it possible for you to respond to that?
Mikaela Lawrence: Yes. Certainly we had a similar commitment to the ADA's. We had myself and
another data librarian working on this, and then, given that we were looking
at hosting externally available data, we also had some of our researchers
talking to external organisations as well. However, costing that time is a
difficult thing. Our legal counsel also put significant time and effort into
developing new procedures and policies for hosting externally-owned data.
Andrew Treloar: Okay, thank you. Andrew Mehnert, did you want to weigh in on that?
Andrew Mehnert: Yes, similar experiences. We had the challenge, I guess, of having four
different sites. We had a project manager at each site, and a little team
around that project manager, to address some of these issues - talk to IT
services, talk to library services, to resolve some of the questions.
Then, as the overall project manager, I was consulting with each of the
other project managers and trying to come to consensus - even talking to
yourself, Andrew, to understand some of the questions and how to respond. I
guess, at the end of the day, it was about giving an honest response with
the best information you have at hand, and that is the honest status of how
you've addressed each of the requirements.
Andrew Treloar: Yeah. All right, thank you, and I'm afraid we're going to have to leave it
there. We're over time. My apologies to those people who have questions in
the question panel that we haven't got to yet, but my - in particular, my
thanks to the presenters - to Mikaela, to Heather, to Steve, to Andrew
Mehnert and to Graham for weighing in. Thank you for sharing your
experiences with us. I hope that's been of benefit to the community.
We look forward to seeing or hearing from some of you on our next
webinar. Thank you all.
END OF TRANSCRIPT