This document discusses the challenges of curating scientific data in repositories. It notes that data are increasingly important as evidence and for verifying scientific results, but lose meaning without proper context and without curation that begins in the research workflow. The document examines data formats, metadata, access and reuse, citation, and the technological challenges repositories face in dealing with diverse data. It also explores who performs data curation: individuals, institutions, communities, publishers and national services.
Talk at JISC Repositories conference intended for repository managers or research managers on some of the issues involved. Talk had to be originally given unaided because of a technology problem!
Curation of scientific data: Challenges for repositories
1. a centre of expertise in data curation and preservation
Curation of Scientific Data:
Challenges for Repositories
Chris Rusbridge
JISC Repositories Conference
5 June 2007, Manchester
Funded by:
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License, excluding content property of others. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
2. Contents
• Audience?
• Science and digital curation
• Why are data important?
• What kinds of data?
• What to do with data?
• Repository options
• Changing practice
JISC Repositories 2007
3. Audience
• I assume you are either…
  • A Repository Manager concerned about adding data to your collections of ePrints (most likely), or
  • A research data manager or other researcher, concerned about finding an appropriate repository to curate your data (possibly), or
  • Neither of the above, in the wrong room, just come in to get out of the sun…
4. Digital Curation Centre Mission
“The over-riding purpose of the DCC is to support and promote continuing improvement in the quality of data curation, and of associated digital preservation”
6. “The Records of Science”
• Data increasingly important as evidence
• Key part of the scholarly record (public good)
• Unrepeatable observations & experiments
• Value for public money (eg OECD)
• Experimental verifiability (the basis of science)
  • Would Chang retractions have been reduced if his first data were available?
    CHANG, G., ROTH, C. B., REYES, C. L., PORNILLOS, O., CHEN, Y.-J. & CHEN, A. P. (2006) Retraction of Pornillos et al., Science 310 (5756) 1950-1953. Retraction of Reyes and Chang, Science 308 (5724) 1028-1031. Retraction of Chang and Roth, Science 293 (5536) 1793-1800. Science Magazine, 314. http://www.sciencemag.org/cgi/content/full/314/5807/1875b
• Allows additional interpretations
• Legal and compliance (eg emerging RC mandates)
7. OECD declaration
• “…Work towards the establishment of access regimes for digital research data from public funding in accordance with the following objectives and principles:
  • Openness
  • Transparency
  • Legal conformity
  • Formal responsibility
  • Professionalism
  • Protection of intellectual property
  • Interoperability
  • Quality and security
  • Efficiency
  • Accountability”
8. Retaining research data means…
• Data secure against loss (within group)
• Communal repository (secure data store)
• Re-usable, sharable information
• As above, plus active curation (eg bioinformatics)
• Long term preservation of information
• Be clear what you are trying to do!
10. Long term bit storage…
• A solved problem? Just requires well-understood good data management practices?
• Wrong! For very large datasets over a very long time, there are significant problems…
BAKER, M., SHAH, M., ROSENTHAL, D. S. H., ROUSSOPOULOS, M., MANIATIS, P., GIULI, T. J. & BUNGALE, P. (2006) A Fresh Look at the Reliability of Long-term Digital Storage. EuroSys '06. Leuven, Belgium, ACM.
11. How Well Must We Preserve?
Keep a petabyte for a century
– With 50% chance of remaining completely undamaged
Consider each bit decaying independently
– Analogy with radioactive decay
That's a bit half-life of 10^18 years
– One hundred million times the age of the universe
That's a very demanding requirement
– Hard to measure
– Even very unlikely faults will matter a lot
Slide from David Rosenthal, LOCKSS
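The arithmetic behind the 10^18 figure can be checked with a short sketch. It assumes a decimal petabyte (10^15 bytes, so 8 × 10^15 bits), independent exponential decay of every bit, and an age of the universe of roughly 1.38 × 10^10 years. The function and constant names are illustrative and do not come from the original slide.

```python
import math

def required_bit_half_life(n_bits: float, years: float, survival_prob: float) -> float:
    """Half-life (in years) each bit needs so that all n_bits survive `years`
    with probability `survival_prob`, assuming independent exponential decay."""
    # One bit survives t years with probability 2 ** (-t / half_life).
    # All n bits survive with probability 2 ** (-n * t / half_life).
    # Setting that equal to survival_prob and solving for half_life gives:
    return n_bits * years / -math.log2(survival_prob)

PETABYTE_BITS = 8e15             # assumption: 1 PB = 10**15 bytes
AGE_OF_UNIVERSE_YEARS = 1.38e10  # assumption: approximate value

half_life = required_bit_half_life(PETABYTE_BITS, years=100, survival_prob=0.5)
print(f"required bit half-life: {half_life:.1e} years")
print(f"ratio to age of universe: {half_life / AGE_OF_UNIVERSE_YEARS:.1e}")
```

Under these assumptions the required half-life comes out at about 8 × 10^17 years, which the slide rounds to 10^18; the ratio to the age of the universe is of the same order as the slide's "one hundred million".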
12. What to do about curation
• Build curation/reusability into science workflow
• Curation begins before creation
• What’s easy at first becomes (impossibly) hard later
• Describe data (metadata schemas, “representation info”, etc)
• Keep experimental parameters (technical, who, what, when, where)
• Keep ability to process
• Keep data!
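One way to picture "curation begins before creation" is a small descriptive record captured alongside each dataset at the moment it is produced. The sketch below is purely illustrative: the class name, field names and example values are assumptions, not a schema from the DCC or from any metadata standard.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class DatasetRecord:
    """Hypothetical minimal description kept alongside a dataset."""
    title: str
    creator: str            # who
    description: str        # what
    created: str            # when (ISO 8601 date)
    location: str           # where
    file_format: str        # technical: format of the data files
    instrument_settings: dict = field(default_factory=dict)  # technical parameters
    processing_software: list = field(default_factory=list)  # keeps the ability to process

    def missing_fields(self) -> list:
        """Names of text fields left empty, so gaps are caught early."""
        return [name for name, value in asdict(self).items() if value in ("", None)]

record = DatasetRecord(
    title="Example spectra",                 # illustrative values only
    creator="A. Researcher",
    description="Raw mass spectrometry runs",
    created="2007-06-05",
    location="",                             # deliberately left blank
    file_format="text/csv",
    instrument_settings={"resolution": 0.1},
    processing_software=["analysis-script v1.0"],
)
print(json.dumps(asdict(record), indent=2))
print("missing:", record.missing_fields())
```

Checking for empty fields at creation time reflects the point that what is easy at first becomes hard later: the person running the experiment can fill in "where" immediately, while a curator years afterwards may not be able to.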
13. What to do about curation - 2
• Use standard/agreed formats for data
• Make ownership & restrictions clear, & explain how to cite data
• Offer for deposit in institutional or discipline repository
• Appraisal and selection essential
• Possible time-limited embargoes
• “Publish” data in support of articles
14. Internet Archaeology: publication with data
15. Database as book…
• Buneman (early pilot) work on IUPHAR database
• MySQL to XML database
• Historic to logical schema
• XML via XSLT to LaTeX
16. The StORe vision
• Seamless transport from research data to research publications and vice versa
• Bi-directional links proven in social science e-research but capable of export to other disciplines
[Diagram labels: Source, Middleware, Output]
http://jiscstore.jot.com/WikiHome/
Slide from Graham Pryor
17. StORe survey: linkage value?
The value of direct links from source to output data | University academic staff | University research assistant | PG student | Contract researcher | Independent researcher | Other | Totals
Significant advantage | 85 | 18 | 33 | 11 | 2 | 26 | 175
Useful | 78 | 9 | 41 | 5 | 4 | 9 | 146
Interesting | 24 | 4 | 5 | 3 | 0 | 5 | 41
Of no interest | 9 | 0 | 0 | 0 | 0 | 1 | 10
Not sure | 7 | 0 | 7 | 0 | 1 | 2 | 17
Other | 1 | 1 | 0 | 0 | 0 | 1 | 3
Totals | 204 | 32 | 86 | 19 | 7 | 44 | 392
But: “researchers’ attitudes to enabling access depend to a large extent on whether they are behaving as producers or users of data”
Slide from StORe project
18. What to do about data (3)
• Institutional repository managers
  • Make contact with emerging institutional data services
  • Start raising awareness of the need to curate rather than just dump data
  • Start thinking about the relationship of data to publications (especially e-theses)
  • Start thinking about the metadata needed to find and re-use data
  • Make contact with key researchers
  • Start thinking about their data…
19. What kinds of data?
• Observations
  • eg UARS (Upper Atmosphere) Level 0: telemetry
  • UARS Level 1: measured physical parameters (post calibration?)
• Derived data
  • UARS Level 2: calculated geophysical? profiles
  • UARS Level 3: gridded, interpolated?
• Combined data
• Crafted data
  • Eg annotated gene/protein databases
• Descriptive (meta)data
20. StORe: Source data formats
CAD/GIS: 39
Extensible markup language (XML): 35
Database files (e.g. Access, MySQL): 117
Flat files (e.g. FITS): 66
Hypertext markup language (HTML): 60
Image files (e.g. .jpg, .tif, .bmp, .gif): 228
Plain text (.txt): 179
Portable document format (.pdf): 156
Rich text files (.rtf): 53
Spreadsheets (e.g. Excel/.xls): 220
Statistical software: 75
Tables/catalogues: 102
Word processed files (e.g. Word/.doc): 220
Other (please specify): 76
Slide from StORe project
21. StORe: the other data formats?
They said the 76 other formats included:
+latex+.cc source code, .cif (crystallographic data), .pdb, .mtz, .pool, .root, .raw, .swf, .fla, .raw, .mpg, binary files, chemdraw cdx, xwin nmr files, .ps files, .fla, .swf, masslynx files, derived data in PAw-format ntuples, raw mass spectrometry data, X-ray diffraction data, kaleidagraphs, Atlas/ti hermeneutic unit files, C++/shell scripts, Fourier induction decay files, etc., etc., etc., etc………..
Slide from StORe project
22. StORe: the other data formats - more
They also said such things as:
“It is stored in a database, but nothing so simple as an Access file! It's one of the largest databases in the world! The format is Kanga/Root and previously was Objectivity. I think it's of the order of Picobytes in size.”
And:
“God preserve us from idiots who archive data in proprietary commercial formats (Excel spreadsheets and MS-word documents)!”
Slide from StORe project
23. What are the reusability issues?
• Data not neutral; highly contextual!
• Hard to know the risks & pitfalls of a particular dataset
• Data not self-describing: hard to find appropriate data (but see Murray-Rust on Googling InChI etc)
• Hard to “understand” data once found
• Really need information, not data!
• Hard to use data once understood
24. Context
• Data meaningless without context
• Metadata of many kinds
• Representation information… from data to information
• Linkage and connection between datasets
• Provenance
• Authenticity/integrity
• Computational lineage
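Integrity is commonly supported by recording a checksum at deposit and re-checking it later (a "fixity check"). The following is a minimal sketch using SHA-256 from Python's standard library; the function names, file name and sample contents are illustrative assumptions, not part of any repository platform.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_unchanged(path: Path, recorded_digest: str) -> bool:
    """True if the file still matches the digest recorded at deposit."""
    return sha256_of(path) == recorded_digest

with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "observations.csv"     # hypothetical dataset file
    data_file.write_bytes(b"time,value\n0,1.5\n")
    recorded = sha256_of(data_file)                # stored with the metadata at deposit

    intact_before = is_unchanged(data_file, recorded)
    data_file.write_bytes(b"time,value\n0,1.6\n")  # simulate silent corruption
    intact_after = is_unchanged(data_file, recorded)

print(intact_before, intact_after)
```

A checksum only detects that bits have changed. It says nothing about provenance or computational lineage, which need their own records.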
25. Access and re-use
• Ethics and rights control access
  • Weak in expressing this long-term
• Collaboration tools
  • Annotation, discussion, review (see DART…)
• Re-use leading to change and development
• “Publication”
  • Not just in “print”
  • Underlying data should be “published”, too
26. Data citation issues…
• Citation for human readers and machine use cases
• Granularity: database, record, item
• Citation of changing objects
  • Version change (eg W3C practice: no version = latest, vs bibliographic: no version = first)
  • An efficient way to reference and access “archived” past states of more rapidly changing datasets, eg Genomics… datasets that result from the combined work of curators, or contain opinions or facts likely to change (work in progress, Buneman et al)
• Standards conflict and are immature (NLM best?)
• Citation ESSENTIAL for motivating quality academic work on data management and curation
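The granularity and versioning points can be made concrete with a small sketch. The citation layout, the `DataCitation` class and every example value below are hypothetical; they do not follow NLM or any other published citation standard.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass(frozen=True)
class DataCitation:
    """Hypothetical structured citation for a dataset or a part of one."""
    creators: str
    year: int
    title: str
    repository: str
    identifier: str                  # persistent identifier or URL
    version: Optional[str] = None    # None means the version was not stated
    record: Optional[str] = None     # finer granularity: one record in a database
    accessed: Optional[str] = None   # access date, useful for changing objects

    def for_humans(self) -> str:
        """Render a readable citation string."""
        parts = [f"{self.creators} ({self.year}). {self.title}"]
        if self.version:
            parts.append(f"Version {self.version}")
        if self.record:
            parts.append(f"Record {self.record}")
        parts.append(self.repository)
        parts.append(self.identifier)
        text = ". ".join(parts) + "."
        if self.accessed:
            text += f" Accessed {self.accessed}."
        return text

    def for_machines(self) -> dict:
        """Return all fields; an unstated version stays None rather than
        being guessed as 'latest' or 'first'."""
        return asdict(self)

cite = DataCitation(
    creators="Example, A.",          # illustrative values only
    year=2007,
    title="Example receptor database",
    repository="Example Repository",
    identifier="https://example.org/dataset/42",
    version="3",
    record="R-17",
    accessed="2007-06-05",
)
print(cite.for_humans())
```

Keeping the version and record as separate fields lets the same citation serve both use cases: a formatted string for readers and a structured form that software can resolve to a specific past state of the dataset.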
27. Repository challenges
• Data are different: you’ll need access to some domain knowledge
• Appraisal/selection harder
• Broader range of formats
• Appropriate “standards” for longevity? XML-based?
• What metadata are needed?
  • Descriptive, to find the dataset
  • Context and background
  • Provenance
  • “Representation information” to connect data to information (whatever gives meaning to data for the “designated community”)
28. Repository challenges - 2
• May distort your repository
  • Size
  • Number of objects
  • Rate of deposit
  • Nature of use
• Databases may be dynamic
• Databases may need to be accessed in situ
• Rights and ethical limitations hard to describe and enforce
• Need to build links to publications (cf StORe)
• Need to build discipline links across repositories…
29. Repository challenges - 3
• Is your platform suitable?
• Most successful (ie older) data repositories are DIY
• Data also held in repositories built on DSpace, EPrints and Fedora
30. [Figure] Data from MIT DSpace Political Science
33. Who does data curation?
• Individuals
• Departments or groups
• Institutions, often through libraries
• Communities
• Disciplines
• Publishers
• National services
• Other 3rd parties…
34. Who are the curation players?
35. Disciplinary repositories…
• >900 Nucleic Acids datasets!
• ESDS/UKDA and NERC data centres, but…
• “AHRC Council has decided to cease funding the Arts and Humanities Data Service (AHDS) from March 2008. […] Grant holders must make materials they had planned to deposit with the AHDS available in an accessible depository for at least three years after the end of their grant”
  • AHRC Press Release 14/05/2007
  • (Note petition at http://petitions.pm.gov.uk/AHDSfunding/)
  • Does not apply to Archaeology: ADS still funded?
36. Institutional Repositories
• OpenDOAR: only 5 Institutional Repositories claim to include datasets
  • Bristol
  • Cambridge
  • Edinburgh
  • Leicester
  • Southampton
• …and some of these seem doubtful on inspection!
• … of course not all research data are “datasets”
37. Cultural change
• If we build it, will they come? NO!!
• Outreach important: communication with scientists and researchers is hard graft
• Cultural change to new approach requires more:
  • Incentives, rewards and mandates
  • Successful exemplars (well publicised)
  • Discipline-oriented approach (one size does not fit all)
38. Need for advocacy?
What functionality is missing from source repositories?
Columns: Academic staff, Research assistants, Postgraduates, Independent researchers
None 9 2 7
Don’t use 7 10 1
Lack of knowledge 3 4 2
Don’t know 5 3 13 1
No reply 129 20 45 13
Slide from StORe project
39. Need for advocacy?
What functionality is missing from output repositories?
Columns: Academic staff, Research assistants, Postgraduates, Independent researchers
None 3 2 5 1
Don’t use 1 1
Lack of knowledge 2 1
Don’t know 2 6 1
No reply 123 15 48 15
Slide from StORe project
40. Need for advocacy?
“The majority of academics do not know what repositories are nor are they familiar with the issues around new means of dissemination”
– UKOLN/Eduserv Foundation: Digital Repositories Roadmap: looking forward, April 2006
Slide from StORe project
41. Thank you
c.rusbridge@ed.ac.uk