The document discusses digital data curation and preservation. It covers topics like why data is important for science, different types of data, best practices for data curation, and challenges around long-term preservation and reusability of digital research data. The goal of data curation is to organize and manage data to be findable, accessible, interoperable and reusable over extended periods of time.
Create, curate, re-use: the expanding life course of digital research data
1. a centre of expertise in data curation and preservation
Create, curate, re-use:
the expanding life course of digital research data
Chris Rusbridge
EDUCAUSE Australasia May 2007
Funded by:
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License, excluding content property of others. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
2. a centre of expertise in data curation and preservation
Contents
• Science and digital curation
• Why are data important?
• What kinds of data?
• What to do with your data: frontiers of practice
• Repository frontiers
• Changing practice
EDUCAUSE Australasia 2007
3. a centre of expertise in data curation and preservation
Digital Curation Centre Mission
“The over-riding purpose of the DCC is to support and promote continuing improvement in the quality of data curation, and of associated digital preservation”
4. a centre of expertise in data curation and preservation
5. a centre of expertise in data curation and preservation
Summarising…
• Sustainability
• Creation or selection
• Growth, development
• Making available
• Access management
• Re-usability
• Linkage, context, metadata
• Authenticity, integrity, provenance
• Maintaining meaning over time
• Preserving, including past states
• De-selection…
• Extended time
• Budget and policy impacts
• People issues!
6. a centre of expertise in data curation and preservation
Science and curation
• Creating and managing data suitable for re-use
• Good curation supports good science (managing your data properly)
• Poor curation allows sloppy science?
• Data curation should save money
• Murray-Rust/Frey on interesting but fruitless experiments!
• Some science impossible without curation…
• QCD strong coupling constant prediction (Bethke)
• Viscosity of earth mantle from Shang Dynasty eclipse records (Pang et al)
• Science depending on past baselines (eg environmental, social sciences)
7. a centre of expertise in data curation and preservation
Records of science
• Data increasingly important as evidence
• Key part of the scholarly record (public good)
• Unrepeatable observations & experiments
• Experimental verifiability (the basis of science)
• Would Chang retractions have been reduced if his first data were available?
• Allows additional interpretations
• Legal and compliance
• See APSR/AERES report for good examples
8. a centre of expertise in data curation and preservation
What kinds of data?
• Observations
• eg UARS (Upper Atmosphere) Level 0: telemetry
• UARS Level 1: measured physical parameters (post calibration?)
• Derived data
• UARS Level 2: calculated geophysical? profiles
• UARS Level 3: gridded, interpolated?
• Combined data
• Crafted data
• Eg annotated gene/protein databases
• Descriptive (meta)data
9. a centre of expertise in data curation and preservation
Retaining research data means…
• Data secure against loss (within group)
• Communal repository (secure bit dump)
• Re-usable, sharable information
• As above, plus active curation (eg bio-informatics)
• Long term preservation of information
• Be clear what you are trying to do!
10. a centre of expertise in data curation and preservation
… or the data trajectory is…
• Hard drive → lost (crash)
• Hard drive → DVD → Cardboard box → Loft → Skip/dumpster → lost
• Sometimes this is a very bad thing
• Sometimes these are the right options!
11. a centre of expertise in data curation and preservation
Long term bit storage…
• A solved problem? Just requires well-understood good data management practices?
• Wrong! For very large datasets over very long time, there are significant problems…
BAKER, M., SHAH, M., ROSENTHAL, D. S. H., ROUSSOPOULOS, M., MANIATIS, P., GIULI, T. J. & BUNGALE, P. (2006) A Fresh Look at the Reliability of Long-term Digital Storage. EuroSys '06. Leuven, Belgium, ACM.
12. a centre of expertise in data curation and preservation
How Well Must We Preserve?
Keep a petabyte for a century
– With 50% chance of remaining completely undamaged
Consider each bit decaying independently
– Analogy with radioactive decay
That's a bit half-life of 10^18 years
– One hundred million times the age of the universe
That's a very demanding requirement
– Hard to measure
– Even very unlikely faults will matter a lot
Slide from David Rosenthal, LOCKSS
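The order of magnitude quoted on this slide can be checked with a short calculation. The sketch below is not from the slide: it assumes a petabyte of 10^15 bytes (8 × 10^15 bits), independent exponential decay of every bit, and an age of the universe of roughly 1.38 × 10^10 years. The function and constant names are illustrative.

```python
import math

BITS_PER_PETABYTE = 8 * 10**15      # assumption: 1 PB = 10**15 bytes
AGE_OF_UNIVERSE_YEARS = 1.38e10     # assumption: approximate estimate


def required_bit_half_life(num_bits, years, survival_probability=0.5):
    """Half-life (in years) each bit needs so that all bits stay undamaged.

    With independent decay, one bit survives `years` with probability
    2 ** (-years / half_life), so all `num_bits` survive with probability
    2 ** (-num_bits * years / half_life). Solving that equation for
    half_life gives the expression returned here.
    """
    return num_bits * years * math.log(2) / -math.log(survival_probability)


half_life = required_bit_half_life(BITS_PER_PETABYTE, years=100)
print(f"required bit half-life: {half_life:.1e} years")
print(f"multiples of the age of the universe: {half_life / AGE_OF_UNIVERSE_YEARS:.1e}")
```

Under these assumptions the required half-life comes out at about 8 × 10^17 years, which is of the order of 10^18 and tens of millions of times the assumed age of the universe, consistent with the slide's round figures.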
13. a centre of expertise in data curation and preservation
What to do about curation
• Build curation/reusability into your workflow
• Curation begins before creation
• What’s easy at first becomes (impossibly) hard later
• Describe your data (metadata schemas, “representation info”, etc)
• Keep experimental parameters (technical, who, what, when, where)
• Keep ability to process
• Keep data!
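One lightweight way to keep the who, what, when and where with the data is to write a small metadata file next to each data file at creation time. The sketch below is only an illustration under stated assumptions: the field names, the JSON layout, the `write_sidecar` helper and the example values are all hypothetical, not a DCC schema or a published standard.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_sidecar(data_path, who, what, where, parameters, data_format):
    """Write <data file>.meta.json beside the data file and return its path.

    Field names are hypothetical; a real project would follow the
    metadata schema agreed in its own discipline.
    """
    data_path = Path(data_path)
    record = {
        "file": data_path.name,
        "who": who,
        "what": what,
        "when": datetime.now(timezone.utc).isoformat(),
        "where": where,
        "format": data_format,
        "parameters": parameters,   # technical/experimental settings
    }
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


# Usage with made-up values, in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "run_042.csv"            # hypothetical file name
    data_file.write_text("t,value\n0,1.5\n", encoding="utf-8")
    sidecar_file = write_sidecar(
        data_file,
        who="A. Researcher",                          # hypothetical
        what="Calibration run",                       # hypothetical
        where="Lab 3",                                # hypothetical
        parameters={"temperature_K": 293},            # hypothetical
        data_format="CSV, UTF-8",
    )
    print(sidecar_file.read_text(encoding="utf-8"))
```

Writing the record at creation time follows the point that curation begins before creation: the same details are much harder to reconstruct later.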
14. a centre of expertise in data curation and preservation
What to do about curation - 2
• Use standard/agreed formats for data
• Make ownership & restrictions clear, & explain how to cite your data
• Offer for deposit in institutional or discipline repository
• Appraisal and selection essential
• Possible time-limited embargos
• “Publish” data in support of articles
15. a centre of expertise in data curation and preservation
Internet Archaeology: publication with data
16. a centre of expertise in data curation and preservation
Database as book…
• Buneman (early pilot) work on IUPHAR database
• MySQL to XML database
• Historic to logical schema
• XML via XSLT to LaTeX
17. a centre of expertise in data curation and preservation
The StORe vision
• Seamless transport from research data to research publications and vice versa
• Bi-directional links proven in social science e-research but capable of export to other disciplines
[Figure: diagram with three labelled layers: Source, Middleware, Output]
• http://jiscstore.jot.com/WikiHome/
Slide from Graham Pryor
18. a centre of expertise in data curation and preservation
What are the reusability issues?
• Data not neutral to hypothesis
• Hard to know the risks & pitfalls of a particular dataset
• Data not self-describing: hard to find appropriate data (but see Murray-Rust on Googling InChI etc)
• Hard to “understand” data once found
• Really need information, not data!
• Hard to use data once understood
19. a centre of expertise in data curation and preservation
Context
• Data meaningless without context
• Metadata of many kinds
• Representation information… from data to information
• Linkage and connection between datasets
• Use your workflow!
• Provenance
• Authenticity/integrity
• Computational lineage
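Integrity, in the sense of the authenticity/integrity bullet above, is commonly checked by recording a checksum for every file and recomputing it later. The sketch below uses SHA-256 from Python's standard library. The manifest layout and the function names are illustrative assumptions, and a checksum only detects change: it does not by itself establish provenance or authenticity.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to `directory`) to its digest."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(directory, manifest):
    """List files that were added, removed or altered since `manifest`."""
    current = build_manifest(directory)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )


# Usage on a throwaway directory with a made-up data file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "observations.csv"      # hypothetical file name
    sample.write_text("t,value\n0,1.5\n", encoding="utf-8")
    manifest = build_manifest(tmp)
    print(changed_files(tmp, manifest))          # nothing has changed yet
    sample.write_text("t,value\n0,9.9\n", encoding="utf-8")
    print(changed_files(tmp, manifest))          # the edited file is reported
```

Storing the manifest separately from the data it describes is the usual design choice, so that a single failure cannot silently damage both.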
20. a centre of expertise in data curation and preservation
NASA
[Figure: data lineage diagram. Recoverable labels: Csat 8-day composite and subscene; Csat 8-day composite subscene; PAR subscene; RPT; E0; SST and Pbopt calc; H; Ctot calc; Zeu calc; PPeu calc; University research group 1; University research group 2; University research group 3; local decision-making body]
Slide from Rajendra Bose
21. a centre of expertise in data curation and preservation
Access and re-use
• Ethics and rights control access
• Weak in expressing this long-term
• Collaboration tools
• Annotation, discussion, review (see DART…)
• Re-use leading to change and development
• “Publication”
• Not just in “print”
• Underlying data should be “published”, too
22. a centre of expertise in data curation and preservation
Database citation issues…
• Citation for human readers and machine use cases
• Granularity: database, record, item
• Citation of changing objects
• Version change (eg W3C practice: no version = latest, vs bibliographic: no version = first)
• An efficient way to reference and access “archived” past states of more rapidly changing dataset, eg Genomics… datasets that result from the combined work of curators, or contain opinions or facts likely to change (work in progress, Buneman et al)
• Standards conflict and immature (NLM best?)
• Citation ESSENTIAL for motivating quality academic work on data management and curation
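The version-change bullet describes two conflicting readings of a citation that names no version. A minimal sketch of that ambiguity follows. The `Citation` class, the `resolve_version` function, the convention names and the dataset name are all hypothetical and do not come from any citation standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    dataset: str
    version: Optional[int] = None   # None: the citation names no version


def resolve_version(citation, available_versions, convention):
    """Pick the dataset version that a citation refers to.

    convention="latest": an unversioned citation means the newest state
                         (the W3C-style reading on the slide).
    convention="first":  an unversioned citation means the original state
                         (the bibliographic reading on the slide).
    """
    versions = sorted(available_versions)
    if not versions:
        raise ValueError("no versions available")
    if citation.version is not None:
        if citation.version not in versions:
            raise KeyError(f"version {citation.version} is not archived")
        return citation.version
    if convention == "latest":
        return versions[-1]
    if convention == "first":
        return versions[0]
    raise ValueError(f"unknown convention: {convention!r}")


unversioned = Citation("example-genomics-db")    # hypothetical dataset name
archived = [1, 2, 3]
print(resolve_version(unversioned, archived, "latest"))
print(resolve_version(unversioned, archived, "first"))
print(resolve_version(Citation("example-genomics-db", version=2), archived, "latest"))
```

The same unversioned citation resolves to different states under the two conventions, which is why a citation to a changing dataset needs either an explicit version or an agreed default.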
23. a centre of expertise in data curation and preservation
Who does curation?
• Individuals
• Departments or groups
• Institutions, maybe through libraries
• Communities
• Disciplines
• Publishers
• National services
• Other 3rd parties…
24. a centre of expertise in data curation and preservation
Curation: Individual
• “Small science 2-3 times more data than Big science”, but much more at risk
• PhD student? RA? PI? Administrator? IT support?
• Data potentially on local hard drives, or at best shared network drives
• May be inadequately protected
• Liable for policy-led deletion on resignation
• Individual “knows” too much (tacit knowledge)
• Documentation/metadata unlikely to be adequate
• Future: gone!
26. a centre of expertise in data curation and preservation
Department: eCrystals
• Partnership with Institutional Repository
• Specialist department archive (& national service)
• Workflow recording of lab parameters (R4L)
• Public & private elements
• Trying to build eCrystals federation (eBank 3)
• Future: likely to continue
27. a centre of expertise in data curation and preservation
Data in institutional repositories
28. a centre of expertise in data curation and preservation
Institution: Cambridge Chemistry
• 175,000 small molecule structures in CML
• Alongside Archaeology, Manuscripts, Learning Materials, etc
• No library curation skills; dependent on research group enthusiast
• Collection isolated from other Chemistry
• (Only 5 UK institutional repositories claim to hold data)
• Future: assured…
29. a centre of expertise in data curation and preservation
Community: LOCKSS?
• Self-selected group of collectors: closest to genuine open activity (despite Alliance)?
• Traditionally libraries collecting eJournals
• Model respects IPR
• No domain expertise; rely on origins
• Data limitations…
• Future: potentially very persistent (low cost, high reliability, attack resistance, distributed)
30. a centre of expertise in data curation and preservation
Discipline: Atmospheric Science
• Strong believer in need for domain scientists as curators
• Significant participant in “community proxy” agenda-setting activities
• Internationally fragmented resources
• Future: mostly dependent on grant funding (but strong commitment)
31. a centre of expertise in data curation and preservation
Discipline: Pharmacology
• International Scientific Union
• Attempting to build credit for data contributions
• Future: extremely limited funding
32. a centre of expertise in data curation and preservation
Bio-informatics: Nature article 23 June 05
• Databases in Peril
• 51 out of 89 biological databases contacted reported they were struggling financially
• 7 have closed
• Several being updated in owner’s spare time
• (Notes that not all deserve long term support)
• [Nucleic Acids Research reports 968 databases in 2007!]
• Major issue: money
33. a centre of expertise in data curation and preservation
Publisher: Crystallography
• Publisher and Scientific Union
• Created key domain crystallographic standard (CIF)
• Strong motivator for deposit of structure data
• Consistent quality checks
• DOIs used for structure data
• Future: publishing business model
Slide from IUCr
34. a centre of expertise in data curation and preservation
National bodies: British Library
• Serious and robust approach
• Legal deposit powers & responsibilities as driver
• Oriented primarily towards “cultural heritage” (broadly interpreted)
• Little data, no science domain experience
• Future: strong future commitment
35. a centre of expertise in data curation and preservation
National bodies: TNA/NDAD
• Specialist archive for government datasets
• Understand government regulations, dynamics & requirements
• Subject generalists; disconnected from associated science
• Technology specialists (understand databases)
• Future: likely to pass eventually to The National Archives
36. a centre of expertise in data curation and preservation
3rd parties: Portico
• Specific area: eJournals
• Depends on publisher agreements
• No data or domain science expertise
• Future: commitment from Mellon + publishers + subscriptions, good funding mix
37. a centre of expertise in data curation and preservation
3rd Parties: Iron Mountain?
• Records management IS a curation problem
• Organisations like this very likely to branch out
• No domain science expertise
• Future: business case, viability, stock market…
38. a centre of expertise in data curation and preservation
3rd parties: Web 2.0 style, Swivel.com??
39. a centre of expertise in data curation and preservation
Institutions & the network
• Institutions have fundamental sustainability
• Disciplines have domain knowledge advantage but sustainability is an issue
• Can we get the best of both?
• Needs serious work to examine!
[Figure: grid of Discipline 1, Discipline 2, Discipline 3, etc. against Inst’n 1, Inst’n 2, Inst’n 3, with two cells marked X in each discipline row]
40. a centre of expertise in data curation and preservation
Who are the curation players?
41. a centre of expertise in data curation and preservation
Cultural change
• If we build it, will they come? NO!!
• Outreach important: communication with scientists and researchers is hard graft
• Cultural change to new approach requires more:
• Incentives, rewards and mandates
• Successful exemplars (well publicised)
• Discipline-oriented approach (one size does not fit all)
42. a centre of expertise in data curation and preservation
Australian context?
• In the emerging context of the Research Quality Framework, and the expected National Collaborative Research Infrastructure Strategy, curation can only increase in importance!
43. a centre of expertise in data curation and preservation
Thank you
• (Citations in paper in proceedings)