
Transcript - Trusted Data Repositories - 13 March 2018

Full webinar recording, slides and transcript will be available from the ANDS website:
http://www.ands.org.au/news-and-events/presentations/2018

[Unclear] words are denoted in brackets

Webinar: Trusted Data Repositories - 13 March 2018
Video & slides available from ANDS website

START OF TRANSCRIPT

Andrew Treloar: Okay, good afternoon, or good morning, depending on what time zone you're in. Welcome to this webinar on the ANDS - and I'll explain why it's ANDS in a minute - trusted data repository program. My name's Andrew Treloar. I work for ANDS and I was responsible for the trusted data repository program itself. So I'm going to start by providing a bit of context for the program, and a very high-level overview of CoreTrustSeal. We then have three case studies. These were three projects that we funded, one from CSIRO, one from the National Imaging Facility and one from the Australian Data Archive, and I'll explain why we picked those and the different perspectives they provide. Then we have a slightly more freewheeling Q&A session at the end, so there should be plenty of time for you to ask any questions that you have.

Trusted Data Repositories was a program that ANDS funded in its 2016/17 annual business plan. Now, 2016/17 seems like a very long time ago now, and a number of these projects - for reasons that they will explain - ran after the end of the 2016/17 financial year, and that's one of the reasons why we're talking about them now.
Page 2 of 24

It also means that this was a program that we started when ANDS and Nectar and RDS were largely separate activities. Since that time, ANDS, Nectar and RDS have been progressively aligning what they do. Our 2017/18 activities, which are of course still running, are being undertaken under an integrated business plan, and so, while it seems slightly strange to be talking about an ANDS-only program - for me, at least - it is very much something that's been embraced by RDS and Nectar. So this overall trust agenda is very much something that is reflected in the 2017/18 business plan and is tied into a wider concern around research quality and trustedness. So, please see this as entirely consistent with what the three projects, as they come together, care about, even though it started under an ANDS-only umbrella.

I was trying to think about the best way to provide some context for the concern with trustedness, and I decided to go back - not to beer, although the beer is relevant - but to an article that was very formative for me in the early '90s, written by a guy called John Perry Barlow, who in fact died last month. He was a guy who was, among other things, a lyricist for the band the Grateful Dead, but was one of the early thinkers around intellectual property and ideas, and wrote a very influential article called The Economy of Ideas, subtitled Everything you know about intellectual property is wrong, where among other things he distinguished between the container and the contents. So the way I want us to think about trust for the purposes of this seminar is to distinguish between the contents of a repository and the repository itself. So, what we all want is contents - in this case, something that we can consume without fear - and one of the ways that we're happy to, in this case, drink from this delightful glass of beer, is by looking at the container. So the container has some characteristics that make the contents more trustworthy.
One of the characteristics - and I hadn't actually thought about this joke until now - one of the characteristics is the seal on top of the
container, in this case. You can see that the beer bottle has not been unsealed, and so we're prepared to believe that it hasn't been tampered with between the brewery and getting poured. Another of the elements of the trust that we might have in this particular beer is the label on the bottle. So we look at this and we say, ah yes, I've heard of Batemans - I think that's what it says. I've heard of Batemans. They produce trustworthy beer. There might be more information on the back. It might say, brewed in - somewhere. So there's some brand information associated with the container that leads us to trust it more, and there may even be some provenance information about how the beer was produced, where the ingredients came from, and so on. So the distinction that I want you to take out of this image is not, mm, it's lunchtime - I'd really like a beer now - but that distinction between container and contents.

In fact, the ANDS trusted data repository projects were focused on the container. We had a separate set of projects, called trusted research outputs, which were focused much more around the process of producing the outputs and the provenance associated with producing them. In this webinar, we want to focus on the container - the trusted data repository projects. So we selected a small number of pilot projects, which were deliberately designed to cover as wide a range of settings as possible. They covered a range of disciplines - human and non-human imaging in the case of NIF, social sciences (mostly quantitative social science) in the case of ADA, water and a number of other physical sciences in the case of CSIRO - a number of different organisational settings - in some cases universities, in some cases a national archive, in some cases a research institution - and a number of different provision models, and you'll hear more about each of those in the case studies.
The point was to try and get information about what it takes to implement a trusted data repository. Part of what we were doing in the program was saying, look, we'd like you to use this - what was then called the Data Seal of Approval, now called the CoreTrustSeal certification - as the approach to follow, to determine whether or not this was a trusted data repository. So, for those of you that are unfamiliar with CoreTrustSeal, this started out as a thing called the Data Seal of Approval and was one of a number of certification schemes. If you look at the bottom of this slide, originally it was part of a three-level hierarchy. The simplest was the Data Seal of Approval. The next level up was a German standard, which was looking at standardisation of digital resources, and the most advanced level was the thing that some of you may have heard of, called the TRAC criteria, or ISO 16363. What happened was that the World Data System - WDS - worked with the Research Data Alliance, via a working group on repository audit and certification, took the DANS Data Seal of Approval, tweaked it a bit, added some additional questions, and essentially agreed on this as what was for a while called the DSA-WDS criteria, and then just recently it has morphed into an organisation called CoreTrustSeal. So there is now an organisation called CoreTrustSeal.org, which has taken over responsibility for this certification program. It's relatively straightforward at the high level. There are 16 criteria that you use to assess your repository. Some of these are around your organisation and the characteristics of your organisation. Some of these are around the way your repository does digital object management - and you have varying levels of compliance possible for a number of these criteria, so you can assess how well you're doing. In that respect it's a little bit like a maturity model assessment.
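The maturity-model style of assessment described here is easy to represent in a small data structure. The Python sketch below is purely illustrative: the 0-4 compliance scale and its level names follow the published CoreTrustSeal guidance, but the class, function names and example values are hypothetical and not part of any official tool.

```python
from dataclasses import dataclass, field

# Compliance levels used in CoreTrustSeal self-assessments (0-4 scale).
LEVELS = {
    0: "Not applicable",
    1: "Not yet considered",
    2: "Theoretical concept",
    3: "In implementation phase",
    4: "Fully implemented",
}

@dataclass
class Requirement:
    number: int                # one of the 16 requirements
    title: str
    level: int                 # compliance level on the 0-4 scale
    evidence: list = field(default_factory=list)  # links to public evidence

def summarise(requirements):
    """Return the lowest compliance level and which requirements sit at it."""
    lowest = min(r.level for r in requirements)
    weakest = [r.number for r in requirements if r.level == lowest]
    return lowest, weakest

# A fragment of a hypothetical self-assessment.
assessment = [
    Requirement(1, "Mission/Scope", 4, ["https://example.org/mission"]),
    Requirement(9, "Documented storage procedures", 3),
    Requirement(16, "Security", 3),
]
lowest, weakest = summarise(assessment)
print(lowest, weakest)  # -> 3 [9, 16]
```

A structure like this makes it easy to see where a repository is weakest before deciding whether to submit for external review.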
The thing I want to stress, before we get into the case studies, is that this is not just technology, and in fact, the technology is almost the least
important bit. A lot of it is to do with the organisational processes and the kind of organisation that's standing behind this trusted data repository. You can use the criteria to do a self-assessment, and you'll hear a number of the presenters talk about a self-assessment, or you can then submit that assessment for certification - so external reviewers will look at that and maybe come back and ask some questions. You'll then get a CoreTrustSeal tick, which will last three years, and there's a business model that sits behind that. You have to pay for that, but you can of course just do the self-assessment as an exercise for yourself at zero cost. So there's more information there and links to the criteria, but if you go to CoreTrustSeal.org that will be enough. That's all I really wanted to provide by way of introduction. What I'd now like to do is pass to the first of our presenters, Mikaela Lawrence, who would like to talk to us about the CSIRO experience in working with this approach.

Mikaela Lawrence: Okay, so today I'm just going to talk about CSIRO's trusted data repository project. So I'll guide you through the aims of the project, some background about the data access portal, the requirements of self-assessment as a TDR, gathering the evidence, applying for certification, and also another aspect of our project, which was looking into hosting externally-owned data. The aims of the trusted data repository project were to investigate certifying the data access portal as a trusted data repository; to develop a plan to implement changes to policies and procedures, to support CSIRO business requirements and certification; to develop a plan to implement systems changes that may be required to the DAP infrastructure; to engage with external entities to host externally-owned data as a test case; and to prepare an application for certification. This is our data access portal.
So just a bit of background information of the repository, and this'll provide some of the context that relates to the first section of the application.
The data access portal is currently an institutional repository, and when we submitted our application, that's what we submitted it as. Deposit is by self-service and is accessible to CSIRO staff, using their institutional username and login. We have approximately 2,100 publicly available collections, and storage of the data is over one petabyte. The subject matter includes a broad range of sciences, with 17 of the 22 fields of research codes represented. The software and storage infrastructure of the DAP - which is what we term our data access portal - are developed and managed by CSIRO Information Management and Technology. We have a data deposit checklist, which ensures depositors consider the key quality and legal issues prior to deposit. A science leader then approves the collection after assessing for quality and legal issues. The repository - we offer a few different curation levels, based on depositor needs. So the content can be distributed as deposited; we may offer some basic curation - brief checking or addition of basic metadata - or enhanced curation, such as conversion to new formats. The community of data users of the data access portal are researchers, industry, policymakers, the general public and students. The data users can download the majority of collections without a user login, and a smaller number of collections will require registration to access the files. So, the requirements of self-assessment as a TDR - when we went through the process, to help with understanding the 16 requirements of the self-assessment, we read other organisations' applications and considered the evidence they had used. Applications within the CoreTrustSeal are now open, with certified repository applications available on their website.
Applications that were useful to us in reading were DANS, as they're part of the secretariat of the CoreTrustSeal and have been involved in developing the requirements, and also the UK Data Archive, which had a well-organised application with detailed evidence.
To help with the next step of gathering and determining what evidence to use for CSIRO, an analysis was undertaken of the types of evidence used in a few of the published applications. We've included a list of references we used to inform our understanding of the requirements in the appendix of our report to ANDS. This will be published on their website. There is also a useful extended guidance document and webinar available on the CoreTrustSeal website that discusses the requirements and reviewers' expectations.

Gathering the evidence. The certifying body have a preference for evidence that is public, and we found this a major challenge. In this table are some examples of the evidence we used for the first part of the requirements, which were organisational-related, from requirements one to six. This gives an idea of the new evidence we developed, such as the mission statement, and also the difficulty with providing publicly available evidence. It also provides information about the departments we consulted for expert guidance within our organisation, such as legal, business development and staff from within our own information management and technology department. We have attempted to overcome the challenge of providing public evidence with the development of collection development principles, preservation principles and an update of our data management documentation. These provide a summary of the processes for requirements seven to 16, which covered digital object management and technology. These public documents are available from the CSIRO DAP help page. Next - what stage are we up to with applying for certification? So, the Data Seal of Approval ceased applications in October 2017, and we missed this deadline. However, our application was submitted with the CoreTrustSeal in February 2018, as part of their soft launch to test their system.
Processing of our application will begin when the CoreTrustSeal legal entity is finalised. So we're currently waiting to pay the administration fee of €1000 and then our application will be processed.
We found that getting an account for the application management tool gave us access to a staff member, who promptly answered our questions. A word of warning - once an application is submitted, it is locked; however, we found the helpful staff member could amend a small error we had made. One aspect of our project looked at investigating policies, procedures and system changes to host externally-owned data. So why was this part of our strategy? As an organisation, we understand the value of the new research possibilities in drawing together research data produced by organisations beyond CSIRO and across the research community. Also, researchers from our Land and Water business unit are interested in investigating a trusted repository for water research data. The vision is to bring together nationally significant data from a wide range of organisations for the benefit of industry, policy and research. What did we implement as part of this part of the project? We defined the scope for accepting data in the collection development principles. For example, data should be aligned with CSIRO's function as set out in Section 9 of the Science and Industry Research Act 1949. Terms and conditions were developed into an agreement to be signed by the depositing organisation, called the Data Deposit Conditions. Some examples of the terms and conditions include that data is free from embargo, it has not previously been published with a DOI, data is owned by the depositing organisation, data complies with ethics, privacy, confidentiality, contractual licensing and copyright obligations, and data will have a CC BY licence applied. A data deposit form was developed for the data depositor, to provide metadata. We developed some procedures for depositing externally-owned data. The DAP is a self-service repository, with access to deposit by CSIRO staff only, so the research data service will liaise with external data owners to facilitate the deposit of data.
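The Data Deposit Conditions just described amount to a short checklist that can be validated before a deposit proceeds. The sketch below is a hypothetical illustration of that idea - the condition names are invented for this example and do not come from the actual CSIRO agreement or DAP forms.

```python
# Conditions a depositing organisation must confirm (illustrative names only).
REQUIRED_CONDITIONS = [
    "free_of_embargo",
    "no_existing_doi",
    "owned_by_depositor",
    "compliance_confirmed",    # ethics, privacy, confidentiality, licensing
    "cc_by_licence_accepted",
]

def check_deposit(declaration: dict) -> list:
    """Return the conditions the depositor has not yet confirmed."""
    return [c for c in REQUIRED_CONDITIONS if not declaration.get(c, False)]

declaration = {
    "free_of_embargo": True,
    "no_existing_doi": True,
    "owned_by_depositor": True,
    "compliance_confirmed": False,
    "cc_by_licence_accepted": True,
}
print(check_deposit(declaration))  # -> ['compliance_confirmed']
```

In a workflow like the one described, the research data service could run such a check before the science leader's approval step, so that only fully declared deposits reach review.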
Then the CSIRO science leader
with the main knowledge of the data will be the approver of the collection. This is part of the risk management framework that all public data collections in the DAP are subject to. It involves checking the data quality and legal issues prior to publishing. Some future enhancements to the DAP include the ability to customise a collection landing page, such as the addition of logos for external organisations; automation of the data deposit conditions within the existing DAP software; and the development of a self-serve deposit interface for external organisations. We found that this project had some immediate benefits for us. For example, when applying for recommended repository status with journal publishers and funders, we found that we had information ready to use to meet those requirements. We've also had enquiries from researchers regarding publishing externally-owned data, and we now have a response, with policies and procedures in place. So thank you. There were a lot of people involved in this project within CSIRO - too many to list, but a thank you to all of them as well.

Andrew Treloar: Okay, so next, another Andrew - Andrew Mehnert - to talk about the NIF experience.

Andrew Mehnert: All right. So I'm going to talk about trusted data repositories for the National Imaging Facility. My name's Andrew Mehnert. I'm a NIF Informatics Fellow at the Centre for Microscopy, Characterisation and Analysis at the University of Western Australia. Very quickly, what is NIF? The Australian National Imaging Facility is a $130 million project, providing state-of-the-art imaging capability of animals, plants and materials for the Australian research community. The little map there to the right shows the various nodes of the National Imaging Facility around the country.
Now, why is NIF interested in trusted data repositories? Well, imaging equipment such as MRI, PET and CT scanners is capable of producing vast amounts of valuable research data. So we're interested in maximising those research outcomes, and to do so, the data must be stored securely, it must have its quality verified, and it should be accessible to the wider research community. From the CoreTrustSeal point of view, why trusted data repositories? Well, firstly, to be able to share data; secondly, to preserve the initial investment in collecting that data; thirdly, to ensure that the data remain useful and meaningful into the future. The last one, importantly, is that funding authorities are increasingly requiring continued access to data that's produced by projects they fund. All right, now I want to talk specifically about the NIF/RDS/ANDS trusted data repositories project, officially titled Delivering Durable, Reliable, High-Quality Image Data for the National Imaging Facility. Now, the broad aim of the project was to enhance the quality, durability and reliability of data that is generated by the NIF. By quality, we mean that data has to be captured by what we call the NIF-agreed process. Durable means that the data has to have guaranteed availability for 10 years. Reliable means that the data has to be useful for future researchers, so it has to be stored in one or more open data formats, with sufficient evidential metadata, so we know how it was created, what the state of the instrument was at the time of creation, and so on. The NIF nodes involved were the University of Western Australia, the University of Queensland, the University of New South Wales and Monash University. In the project, we limited our scope to MRI data, but essentially the results are generalisable to other modalities, and in fact we've already progressed to micro-CT. Key outcomes from the project include the NIF-agreed process to obtain trusted data from NIF instruments.
I'll talk more about that shortly. The
second is the requirements necessary and sufficient for a basic NIF trusted data repository service. The third was exemplar repository services across all four participating nodes, and the last was self-assessments against the Core Trustworthy Data Repositories Requirements from CoreTrustSeal. The NIF-agreed process for acquiring high-quality data essentially lists the requirements that have to be satisfied to obtain high-quality data - which we call NIF-certified data - that is then suitable for ingestion into a NIF trusted data repository service. We mandate that repository data must be organised by project ID, because project IDs will persist over time, whereas user IDs don't - users come and go. Now, to be NIF-certified, the data must have been acquired on a NIF-compliant instrument - more about that shortly. It has to possess NIF-minimal metadata - that includes a cross-reference to relevant instrument quality control data. It has to include the native data generated by the instrument in proprietary format, and include conversions to one or more open data formats. So, the requirements for a NIF trusted data repository service. We drew upon the CoreTrustSeal requirements in the left column that you see there, and additionally added some NIF requirements. One of them you've seen already, the project ID requirement, but we also have an instrument ID requirement, a quality control requirement, an authentication-via-Australian-Access-Federation requirement, interoperability - that is, we should be able to upload data from one repository to another - redeployability - it should be possible to deploy the service from one NIF node to another - and a service requirement, that essentially we have a help desk responding to requests regarding the repository. So, in a nutshell, have a look at this diagram, and concentrate on the right-hand side.
If we have - we've got the four sites, UWA, UQ, UNSW and Monash, so TruDat@ that particular site represents the trusted data
repository. Login is via the Australian Access Federation, which means that any of the sites will direct you back to your institutional login page to use your institutional credentials. As I mentioned before, data sets are organised by project ID. A data set is associated with an instrument, and provided the NIF-agreed process has been followed, a NIF certification flag, indicating that it is certified, is also included with the data set. The repository has a record for the instrument. The instrument itself is linked to another special project, called the Quality Control Project, and also to a handle to a record in Research Data Australia. Looking at the bottom of the screen, you can see Research Data Australia is a data and service discovery portal provided by ANDS. So we put into that an instrument description - both hardware and software - and there's a unique handle to that record. If we look at the top left now, at the instrument PC, or client PC, data is uploaded according to the NIF-agreed process. So, in the top box above NIF-agreed process, the user data set has to have minimal metadata: the project ID, instrument ID, date and time the data was acquired, implicit metadata that's in the proprietary data, the native data from the instrument and conversions to one or more open data formats. The instrument operator can also upload data to the Quality Control Project, which includes the [quality stamp], quality control standard operating procedure - which of course can be updated over time - and quality control data. So what this means is that when a user uploads data to the repository there's an automatic link to the Quality Control Project, and so it's possible to know the state of the instrument at the time that the data was acquired. This is what the portal looks like for TruDat@UWA.
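The NIF-minimal-metadata rule described above - project ID, instrument ID, acquisition date and time, native data and at least one open-format conversion - is the kind of thing that can be checked mechanically at upload time. This Python sketch is illustrative only; the key names are hypothetical, and the real TruDat/MyTardis schema may differ.

```python
# Metadata keys a data set must carry to be NIF-certifiable (illustrative).
NIF_MINIMAL_KEYS = {"project_id", "instrument_id", "acquired_at"}

def nif_certifiable(dataset: dict) -> bool:
    """True when the data set could carry the NIF certification flag:
    minimal metadata present, native (proprietary) data included, and
    at least one conversion to an open data format."""
    has_metadata = NIF_MINIMAL_KEYS <= dataset.keys()
    has_native = bool(dataset.get("native_files"))
    has_open = bool(dataset.get("open_format_files"))
    return has_metadata and has_native and has_open

dataset = {
    "project_id": "PROJ-0042",
    "instrument_id": "MRI-UWA-01",
    "acquired_at": "2018-03-13T10:30:00",
    "native_files": ["scan.PAR", "scan.REC"],
    "open_format_files": ["scan.nii.gz"],  # e.g. an open format for MRI
}
print(nif_certifiable(dataset))  # -> True
```

A check of this shape would let the repository set (or withhold) the certification flag automatically at ingest, rather than relying on manual review.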
We have based this on the MyTardis platform, which originated at Monash, with several extensions developed during the project, and we use Docker technology to be able to
easily deploy to different sites. So this allows easy instrument integration, simple data sharing and user-controlled publishing of data sets. Okay, now I come to the comparison of all the self-assessments against the CoreTrustSeal requirements. All four sites did their own self-assessments for their respective repositories. What we can see here in this table - which shows the first eight such requirements - is that essentially we independently arrived at a fairly similar level of assessment, except for the cases marked in blue. So for the third one, where we talk about continuity of access, Monash believed that at that point in time it was not assured, whereas the other three sites did. I should point out that a self-assessment is a statement of the reality of the situation at the point in time that the self-assessment was completed. Then there was a difference as well at row four, which is requirement four, confidentiality and ethics. Monash have this fully implemented, whereas the other three sites are in various stages of getting it implemented. Then there were some other differences, with the remaining requirements, with respect to data storage - documented storage procedures - workflows, and data discovery and identification. Post-funding: the project hasn't finished just because the funding has finished. We intend to maintain the services for 10 years, and we plan to meet quarterly to make sure that this happens. We are integrating additional instruments - as I said, we're adding micro-CT instruments at the moment. We will create a project web portal, so we have a single landing page for all these trusted data repository services. We're planning new national and international service deployments, including one in Turku, Finland. We're refining and improving the trusted data repository portal, and we intend progressing to CoreTrustSeal certification.
Very quickly, benefits of the NIF trusted data repository services. For NIF users and the broader community it means reliable, durable access to data,
improved reliability of research outputs and the provenance associated with them, making NIF data more FAIR, easier linkages between publications and data, and stronger research partnerships. For NIF it means improved data quality, improved international reputation and the ability to run multi-centre trials. For the various research institutions, it means enhanced reputation management, a means by which to comply with the draft code for responsible research, and an enhanced ability to engage in multi-centre imaging research projects. With that, I thank you. I list on the page here the various project leads at the various nodes. So thank you very much.

Andrew Treloar: Okay, thank you Andrew Mehnert. So that's two quite different perspectives on trusted data repositories. The third perspective comes from Heather Leasor and Steve McEachern, from the Australian Data Archive.

Heather Leasor: We're at the [centre] archive, which is a social science research data archive. Our mission is to be a national service for the collection and preservation of digital research data, and to make these available to academics, government and other researchers. We hold about 5,000 datasets in over 1,500 studies, on all areas of social science, from social attitude surveys, censuses, aggregate statistics, administrative data and many other sources, both qualitative and quantitative. Our data holdings are sourced from academics, government and the private sector. We undertook the process with ANDS as part of the trusted data repositories program. We originally started under the Data Seal of Approval, before it had actually combined fully with the World Data System. Originally we were the DSA, and then we became the DSA/WDS. When we found out that they were moving to the CoreTrustSeal, we delayed our
decision on which guidelines to take. We officially started the DSA/WDS in March of 2017 and submitted our application in April of 2017. We were due to have a review from our reviewers in May, but it didn't actually arrive until August. Then we made our corrections, sent it back in, got another set of corrections, did the other round of corrections, and submitted and finalised in February of 2018. So, at slightly less than a year's length of process, we are a CoreTrustSeal repository now. We did use the November 2016 DSA/WDS guidelines, which weren't as detailed as what is given in the CoreTrustSeal, and there was no one to look at for reference - as Mikaela had said, she looked at others for reference, but there was no one to look at for reference for this new CoreTrustSeal - so we worked from what people had done in the DSA/WDS and flew blind for a bit. When we went through the process, which was a very useful process for self-assessment, we identified four of the guidelines which we set at a level three, which is the implementation phase of the process: data integrity and authenticity; guideline 10, which is preservation planning; guideline 15, which is technical infrastructure; and guideline 16. Later, in assessment with one of our reviewers, we also changed guideline nine, which is documented storage, down to a level three. Everything else we had set at a level four for our repository. Our repository has been around for about 35 years - coming up on 40 years - so we do have quite a few procedures in place. Some of the challenges that we found doing the CoreTrustSeal process: when we initially undertook it, there was no recommendation of what a minimum requirement would be for any of the guidelines, so we didn't know, if we set it at a three, whether that meant we wouldn't be able to get a CoreTrustSeal or not. Or, if you set it at a one, can you still get a CoreTrustSeal?
There doesn’t seem to be a minimum requirement that we
have ever found. The extended guidelines do detail things a little better nowadays, for those who are undertaking it in the future. We weren't sure if you had to respond to every aspect of all of the sub-questions in a guideline, or just to the overarching guideline. We also found that there was a complex interplay between the relevant documents required for a guideline and those for other guidelines, so that one document may respond to up to four different guidelines, or it may respond to only one guideline. Also, we found it difficult to provide evidence from documents which are not in the public domain. Like the other two, we had to go through our own websites and find out what we did have forward-facing, what we had internally facing, and which aspects of those we felt we could now put onto an outward-facing website or wiki page. Ideally, all aspects should be outward-facing, but if things had to be inward-facing, it seems the CoreTrustSeal can deal with that. The assessors did not indicate, in our original guidelines, that you had to have a timeline for things that were in process. The new guidelines do state that you have to list a timeline of when you plan to have your implementation in place, so we had to add, in our final version, when we planned to have these items forward-facing and our new website up and running. We had no idea, when we originally started the process, what the process entailed and what time frames it was going to take. We were unclear if it was going to take a few months or a year. It ended up taking us a year, but the CoreTrustSeal does seem to be coming along as an organisation much better, so the timelines should move a bit quicker now. So, from our experience, we found that doing as Mikaela and Andrew had done - going through and finding out what is in the public domain already,
and what can safely be put into the public domain - is a good first step for any repository undertaking the CoreTrustSeal. How to cite items which are out of the public domain, and private elements, is still an open question which the CoreTrustSeal is dealing with. We would also like to know how to deal with items that are out of our direct control, such as funding models, infrastructure and governance: being part of a larger university, or, as with CSIRO, part of a government body, or, as with Andrew, part of multiple institutions - how do you fit into their governance models? How do you fit into their infrastructure, and how is this conveyed to the CoreTrustSeal, with these complexities?

Also, we found the risk management section of the CoreTrustSeal a bit difficult, because it kept referencing what were almost ISO standard requirements, and undertaking an ISO standard risk assessment for a base CoreTrustSeal seemed a bit of overkill for us. So finding some risk management standards that are free and in the public domain would be very useful. We did get an answer on the final point, which is that the guidelines are freely available for self-assessment, without paying to obtain a seal. You can just undertake the CoreTrustSeal as a self-assessment for your repository: you define what your repository is and where your boundaries are, and undertake the assessment.

So, in the Australian context - though these aren't necessarily only Australian issues; we did find that they relate to other repositories worldwide - there is the complexity of how institutions and repositories relate: one institution may encompass multiple repositories, or one repository may encompass multiple institutions, and this affects your governance, your funding, your security and all those aspects, as well as things that are in the national framework.
So: things that are involved in our national roadmaps - how these play into the CoreTrustSeal, and how they are also out of the control of the individual repositories. Infrastructure frameworks; infrastructure that is
provided by your host institution, and the government frameworks of host institutions, which are not easily explainable in a CoreTrustSeal. So these are not necessarily, as I said, Australian-specific, but more to do with the repository sector, because it is a very varied sector, with multiple institutions and multiple repositories playing different roles.

Andrew Treloar: Okay. Thank you, Steve and Heather. We've now heard three separate experiences of engaging with trusted data repositories and the CoreTrustSeal. We now have 15 minutes or so for questions. The first question is from Nick, who says that they got the Data Seal of Approval in 2012, so very early on. Is there any advantage in going through the WDS process? Would any of the three panellists like to weigh in on that one?

Steve McEachern: I'll take that one, Andrew.

Andrew Treloar: Thanks, Steve.

Steve McEachern: My sense would be probably - I mean, the DSA is a three-year certification in and of itself, so from the point of view of ongoing certification, I suppose there's a consideration there. What I would say, having been through it - as I say, we were familiar with the DSA in its original version and with what it morphed into in the CoreTrustSeal - is that there is probably a heavier expectation on some of the risk management and preservation requirements than there was in the past, and the emphasis has shifted somewhat, I would say. The other point I would make is about the review process itself - I mean, we were flying blind, as Heather pointed out. Our experience is probably not reflective of everyone's as a whole.
I think the CoreTrustSeal organisation itself was still developing, and the reviews were probably a bit different as they brought the two standards together - the Data Seal of Approval was a social science standard to begin with, extending into the humanities, as Nick's repository does as well.
I think the WDS side of this is more the physical and life sciences in particular, or the earth sciences. So there's probably a shift in emphasis there. I think it would be a good experience, but it might be a bit different from what you went through in the DSA process - that's probably how I'd reflect on that.

Andrew Treloar: Okay, thanks, Steve. Comments from either of the other presenters, if you want to weigh in on that? Okay - so it looks like a no. Maybe I could ask a question that builds on Nick's question and Steve's answer. Under the CoreTrustSeal, the idea is that you apply for certification, you get certification, and that certification runs for three years. I know they've talked about a lightweight recertification down the track, if you want to get recertified in three years' time. Would any of the presenters like to comment on the expiry time for certification, and whether that is a reasonable thing to do? In your view, does it seem sensible that your certification would slowly evaporate over a three-year period, and that there be value in applying again in three years' time? Anyone want to…

Andrew Mehnert: I guess I might chip in there and say, given the amount of effort in getting the original certification through, I'd say it would be worth it to keep it going into the future. Three years would seem reasonable, and it should be a fairly lightweight exercise to get that recertification - and that's said having not yet achieved certification the first time around.

Steve McEachern: I would say there, Andrew, I think three years is the right sort of cycle as well, so long as the time frame of the certification process itself shortens. Our application was in April 2017, our certification was in February 2018, and our certification will end in December 2019.
So I think the cycle is right, given the context of what the content is - and this is, I think, partly a function of the organisation itself evolving and sorting itself out - but
the process itself has to speed up somewhat in order to make that three-year cycle an appropriate one. I think that's the right time frame, but they have to speed up the process.

Andrew Treloar: Yeah, that makes sense, and I - sorry, go on.

Heather Leasor: I believe that, yeah, as soon as you have most of your documents together, and you know which ones need to evolve, the recertification should go a bit quicker, because you can pretty much copy and paste, and iterate on whatever new developments have happened to your institution or your repository in that time.

Andrew Treloar: Yeah, that makes sense. So as with most new things that one does, the first time's a bit painful, and then it gets easier.

Heather Leasor: Yeah.

Andrew Treloar: Two questions from Carmella. The first question is: what is ANDS' long-term plan for having all the university repositories meet the CoreTrustSeal? That is an interesting combination of issues. Firstly, it would now be ANDS/Nectar/RDS, as we continue to merge towards a new organisation, and ANDS/Nectar/RDS is not really in a position to require university repositories to do anything. University repositories will apply for the CoreTrustSeal if they see value in doing it themselves. In the case of the projects that presented here, we provided some funding to help ADA, NIF and CSIRO do something that they really wanted to do anyway. So I think it's going to depend on the drivers for the individual repositories as to whether they see value in this.

Then the second question was on the duration of the data to be preserved. Carmella's comment there was that 10 years of data preservation seems like a relatively short period of time, especially in the case of clinical trials.
I'll leave it to the three presenters to comment on that, but I would just say that I suspect the 10 years is a consensus number: it's not a "you have to throw it away after 10 years" number, it's an "at least 10 years" number. I'm pretty
sure the NHMRC Australian Code for the Responsible Conduct of Research says either seven or 10, so it's not inconsistent with that - but would any of the presenters like to weigh in on this subject of 10 years, before I move to the next question?

Andrew Mehnert: I might answer first, if I can? I think you're right - the NHMRC/ARC requirement is for seven years of retaining data. The figure of 10 was essentially in the original research proposal as something reasonable that each of the NIF nodes and the associated institutions were happy to support, but that doesn't mean we won't support data beyond the 10-year period. That was just something we agreed by consensus that collectively we could do. 10 years is a long time to guarantee a service is running, but the plan at each of the nodes is that we would continue into the future. The 10 years is proof of concept that we can indeed do this over a long period of time.

Andrew Treloar: Yeah.

Andrew Mehnert: In the case of NIF, this involved establishing some new repository services, so that's been a challenge unto itself, and guaranteeing 10 years of running service is no mean feat.

Andrew Treloar: Yeah, 10 years sounds like a long time.

Graham Galloway: It's Graham Galloway here - I was the lead for the NIF trusted data repository. I think one of the things we need to recognise is that over 10 years the nature of the repository is going to change. We don't know where we're going to be storing data in 10 years. So for the institutions to guarantee more than 10 years at this point in time is going to be difficult. The National Imaging Facility, as Andrew has already said, is committed to providing mechanisms for data storage, but in 10 years' time those repositories could exist on Amazon or on other publicly available services.
So, yeah, we wanted a commitment from the partners at the time we signed the contract that they would guarantee to maintain that storage for the 10 years, but we're committed to looking at the mechanisms beyond
that, to ensure - and then of course you've got to look at migration of data between those repositories, and that's an issue we'll have to address.

Andrew Treloar: Yeah, indeed. For the benefit of those people who are unfamiliar with the august presence of Graham Galloway: he is the director of the National Imaging Facility.

Steve McEachern: Andrew - Graham - can I just make a quick comment, which is that this is one of the things we were referencing at the end there, which actually creates the complexity of responding to the guidelines: in saying how long you will maintain a service, you're potentially making commitments you can't realistically fulfil.

Andrew Treloar: Yeah.

Steve McEachern: In some ways some of the expectations in the seal needed to account for that, I think, a little more than they probably did. It's unrealistic for us to say much other than: look, we've been running for 35 years - we'll probably still be here in 10 years' time. But as I say, in terms of [IFA] commitment, I could say the ANU will be here in 10 years' time. Or if I'm the National Archives, I can say that. There aren't many organisations - or parts of organisations - who'd actually make such claims.

Andrew Treloar: Yeah, that's actually a really good point, Steve. I think the Data Seal of Approval came out of an environment where the players involved at the time were largely national archives, and they saw the world through that set of lenses: we've been around forever and we're going to be around forever. As you move out into the wider data repository space, it gets harder and harder to make those kinds of commitments.

Steve McEachern: Yeah.
Andrew Treloar: I realise we're running close to time, so I might just skip over an observation and finish with this question: what are the costs, in terms of human and non-human resources, to get certification? Did
any of you try to add up how much time and effort you put into this? Or did you choose not to, because the number was just going to be too scary?

Steve McEachern: Well, we can say, in terms of human resources - well, one of the human resources is in this room, Heather. As I say, part of our funding was for Heather to contribute to that, and I'd say she was working a fair chunk of about nine months, probably, to put that together. There was a reasonable proportion of my time as well - not at that level, but probably half a day a week for several months - and bits and pieces from other parts of our organisation as well. There aren't too many non-human resources, because we were certifying the existing facility, so realistically it was a document-gathering or creation exercise. So it really was primarily staff time, plus a little bit of development, but not much. I think the experience will be different in Andrew's case in particular, where it'll be a new service. To certify an existing service, it really is staff time, and it depends on how good your documentation is already. In that way it actually is a useful exercise, because it reminds you of what you haven't done. So, as I say, there is real value in that, but there is time involved as well.

Andrew Treloar: Yeah, all right. Thank you. Mikaela, is it possible for you to respond to that?

Mikaela Lawrence: Yes. Certainly we had a similar commitment to the ADA's. We had myself and another data librarian working on this, and then, given that we were looking at hosting externally available data, we also had some of our researchers talking to external organisations as well. However, costing time is a difficult thing. Our legal counsel also put significant time and effort into developing new procedures and policies for hosting externally-owned data.
Andrew Treloar: Okay, thank you. Andrew Mehnert, did you want to weigh in on that?
Andrew Mehnert: Yes - similar experiences. We had the challenge, I guess, of having four different sites. We had a project manager at each site, and a little team around that project manager, to address some of these issues - talking to IT services, talking to library services, to resolve some of the questions. Then, as the overall project manager, I was consulting with each of the other project managers and trying to come to consensus - even talking to yourself, Andrew, to understand some of the questions and how to respond. I guess at the end of the day it was about giving an honest response, with the best information you have at hand, on the honest status of how you've addressed each of the requirements.

Andrew Treloar: Yeah. All right, thank you. I'm afraid we're going to have to leave it there - we're over time. My apologies to those people who have questions in the question panel that we haven't got to yet, but my thanks in particular to the presenters - to Mikaela, to Heather, to Steve, to Andrew Mehnert and to Graham - for weighing in. Thank you for sharing your experiences with us. I hope that's been of benefit to the community. We look forward to seeing or hearing from some of you on our next webinar. Thank you all.

END OF TRANSCRIPT
