The document discusses the role of research libraries in curating and preserving data. It notes that data is increasingly important as evidence in science but poses new challenges compared to traditional scholarly literature. Research libraries are well positioned to help address these challenges by building infrastructure to curate and provide access to institutional research data over the long term. However, doing so effectively will require skills in domain knowledge, metadata practices, and digital preservation that differ from traditional library roles.
DataCite and Campus Data Services
Paul Bracke, Associate Dean for Digital Programs and Information Services, Purdue University
Research libraries are increasingly interested in developing data services for their campuses. There are many perspectives, however, on how to develop services that are responsive to the many needs of scientists; sensitive to the concerns of scientists who are not always accustomed to sharing their data; and that are attractive to campus administrators. This presentation will discuss the development of campus-based data services programs, the centrality of data citation to these efforts, and the ways in which engagement with DataCite can enhance local programs.
Presentation given at CERN Workshop on Innovations in Scholarly Communication (OAI7) on 22nd June 2011
http://indico.cern.ch/conferenceDisplay.py?confId=103325
Lecture at the three-week Summer School "Data Curation" for archaeologists from Sudan, Yemen, Libya, Palestine and Tunisia at the German Archaeological Institute in Berlin from 16 July to 5 August 2017.
The workshop was planned together with the Arab League Educational, Cultural and Scientific Organization (ALECSO), as well as with the Sudanese Anti-National Service for NCAM (National Corporation for Antiquities and Museums).
University of Bath Research Data Management training for researchersJez Cope
Â
Slides from a workshop on Research Data Management for research staff and students at the University of Bath.
Part of the Research360 project (http://blogs.bath.ac.uk/research360).
Authors: Cathy Pink and Jez Cope, University of Bath
DataCite and Campus Data Services
Paul Bracke, Associate Dean for Digital Programs and Information Services, Purdue University
Research libraries are increasingly interested in developing data services for their campuses. There are many perspectives, however, on how to develop services that are responsive to the many needs of scientists; sensitive to the concerns of scientists who are not always accustomed to sharing their data; and that are attractive to campus administrators. This presentation will discuss the development of campus-based data services programs, the centrality of data citation to these efforts, and the ways in which engagement with DataCite can enhance local programs.
Presentation given at CERN Workshop on Innovations in Scholarly Communication (OAI7) on 22nd June 2011
http://indico.cern.ch/conferenceDisplay.py?confId=103325
Lecture at the three-week Summer School "Data Curation" for archaeologists from Sudan, Yemen, Libya, Palestine and Tunisia at the German Archaeological Institute in Berlin from 16 July to 5 August 2017.
The workshop was planned together with the Arab League Educational, Cultural and Scientific Organization (ALECSO), as well as with the Sudanese Anti-National Service for NCAM (National Corporation for Antiquities and Museums).
University of Bath Research Data Management training for researchersJez Cope
Â
Slides from a workshop on Research Data Management for research staff and students at the University of Bath.
Part of the Research360 project (http://blogs.bath.ac.uk/research360).
Authors: Cathy Pink and Jez Cope, University of Bath
Scientific discovery and innovation in an era of data-intensive science
William (Bill) Michener, Professor and Director of e-Science Initiatives for University Libraries, University of New Mexico; DataONE Principal Investigator
The scope and nature of biological, environmental and earth sciences research are evolving rapidly in response to environmental challenges such as global climate change, invasive species and emergent diseases. Scientific studies are increasingly focusing on long-term, broad-scale, and complex questions that require massive amounts of diverse data collected by remote sensing platforms and embedded environmental sensor networks; collaborative, interdisciplinary science teams; and new tools that promote scientific data preservation, discovery, and innovation. This talk describes the challenges facing scientists as they transition into this new era of data intensive science, presents current solutions, and lays out a roadmap to the future where new information technologies significantly increase the pace of scientific discovery and innovation.
Data Equivalence
Mark Parsons, Lead Project Manager, Senior Associate Scientist, National Snow and Ice Data Center
Data citation, especially using persistent identifiers like Digital Object Identifiers (DOIs), is an increasingly accepted scientific practice. Recently, several, respected organizations have developed guidelines for data citation. The different guidelines are largely congruent in that they agree on the basic practice and elements of data citation, especially for relatively static, whole data collections. There is less agreement on the more subtle nuances of data citation that are sometimes necessary to ensure precise reference and scientific reproducibility--the core purpose of data citation. We need to be sure that if you follow a data reference you get to the precise data that were used or at least their scientific equivalent. Identifiers such as DOIs are necessary but not sufficient for the precise, detailed, references necessary. This talk discusses issues around data set versioning, micro-citation, and scientific equivalence. I propose some interim solutions and suggest research strategies for the future.
Making your data work for you: Scratchpads, publishing & the biodiversity dat...Vince Smith
Â
This is a derivative of a talk I gave at the Linnean society on 20th Sept. 2012. This version was given at the i4Life Environmental Genomics workshop on 25th Sept. and refocused to look at the dark taxa problem and developing published descriptions of molecular sequence clusters.
Slides from keynote talk at Dawson Day 2012 (slightly revised)
Contains stats on LSE Library collection trends and overview on how we made the case for LSE Digital Library and how we are progressing with implementation.
DuraSpace is OPEN presented by:
Debra Hanken Kurtz, CEO Jonathan Markow, CSO at the
11th Annual International Conference on Open Repositories 2016, Dublin
Data Publishing and Institutional RepositoriesVarsha Khodiyar
Â
Slides presented at the Force16 panel discussion on 18th April 2016 "Libraries united in opening new scholarly platforms" https://www.force11.org/meetings/force2016/program/agenda/concurrent-session-libraries-united-opening-new-scholarly
Scientific discovery and innovation in an era of data-intensive science
William (Bill) Michener, Professor and Director of e-Science Initiatives for University Libraries, University of New Mexico; DataONE Principal Investigator
The scope and nature of biological, environmental and earth sciences research are evolving rapidly in response to environmental challenges such as global climate change, invasive species and emergent diseases. Scientific studies are increasingly focusing on long-term, broad-scale, and complex questions that require massive amounts of diverse data collected by remote sensing platforms and embedded environmental sensor networks; collaborative, interdisciplinary science teams; and new tools that promote scientific data preservation, discovery, and innovation. This talk describes the challenges facing scientists as they transition into this new era of data intensive science, presents current solutions, and lays out a roadmap to the future where new information technologies significantly increase the pace of scientific discovery and innovation.
Data Equivalence
Mark Parsons, Lead Project Manager, Senior Associate Scientist, National Snow and Ice Data Center
Data citation, especially using persistent identifiers like Digital Object Identifiers (DOIs), is an increasingly accepted scientific practice. Recently, several, respected organizations have developed guidelines for data citation. The different guidelines are largely congruent in that they agree on the basic practice and elements of data citation, especially for relatively static, whole data collections. There is less agreement on the more subtle nuances of data citation that are sometimes necessary to ensure precise reference and scientific reproducibility--the core purpose of data citation. We need to be sure that if you follow a data reference you get to the precise data that were used or at least their scientific equivalent. Identifiers such as DOIs are necessary but not sufficient for the precise, detailed, references necessary. This talk discusses issues around data set versioning, micro-citation, and scientific equivalence. I propose some interim solutions and suggest research strategies for the future.
Making your data work for you: Scratchpads, publishing & the biodiversity dat...Vince Smith
Â
This is a derivative of a talk I gave at the Linnean society on 20th Sept. 2012. This version was given at the i4Life Environmental Genomics workshop on 25th Sept. and refocused to look at the dark taxa problem and developing published descriptions of molecular sequence clusters.
Slides from keynote talk at Dawson Day 2012 (slightly revised)
Contains stats on LSE Library collection trends and overview on how we made the case for LSE Digital Library and how we are progressing with implementation.
DuraSpace is OPEN presented by:
Debra Hanken Kurtz, CEO Jonathan Markow, CSO at the
11th Annual International Conference on Open Repositories 2016, Dublin
Data Publishing and Institutional RepositoriesVarsha Khodiyar
Â
Slides presented at the Force16 panel discussion on 18th April 2016 "Libraries united in opening new scholarly platforms" https://www.force11.org/meetings/force2016/program/agenda/concurrent-session-libraries-united-opening-new-scholarly
Introductory talk for ANDS workshop on Institutional Repositories and data. The talk situates the topic within the field of scholarly communication before comparing the relative technical simplicity of running repositories of publications with the complexities that accompany a shift to data. The most-retweeted slide is the one viewing the response of repository managers to data through the lens of Elizabeth KĂźbler-Ross' stages of grieving.
The thorough integration of information technology and resources into scientific workflows has nurtured a new paradigm of data-intensive science. However, far too much research activity still takes place in silos, to the detriment of open scientific inquiry and advancement. Data-intensive science would be facilitated by more universal adoption of good data management practices ensuring the ongoing viability and usability of all legitimate research outputs, including data, and the encouragement of data publication and sharing for reuse. The centerpiece of such data sharing is the digital repository, acting as the foundation for external value-added services supporting and promoting effective data acquisition, publication, discovery, and dissemination. Since a general-purpose curation repository will not be able to offer the same level of specialized user experience provided by disciplinary tools and portals, a layered model built on a stable repository core is an appropriate division of labor, taking best advantage of the relative strengths of the concerned systems.
The Merritt repository, operated by the University of California Curation Center (UC3) at the California Digital Library (CDL), functions as a curation core for several data sharing initiatives, including the eScholarship open access publishing platform, the DataONE network, and the Open Context archaeological portal. This presentation with highlight two recent examples of external integration for purposes of research data sharing: DataShare, an open portal for biomedical data at UC, San Francisco; and Research Hub, an Alfresco-based content management system at UC, Berkeley. They both significantly extend Merrittâs coverage of the full research data lifecycle and workflows, both upstream, with augmented capabilities for data description, packaging, and deposit; and downstream, with enhanced domain-specific discovery. These efforts showcase the catalyzing effect that coupled integration of curation repositories and well-known public disciplinary search environments can have on research data sharing and scientific advancement.
Data Repositories: Recommendation, Certification and Models for Cost RecoveryAnita de Waard
Â
Talk at NITRD Workshop "Measuring the Impact of Digital Repositories" February 28 â March 1, 2017 https://www.nitrd.gov/nitrdgroups/index.php?title=DigitalRepositories
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016| www...EUDAT
Â
| www.eudat.eu | This webinar was co-organised by DANS, EUDAT and OpenAIRE and was held on 12th and 13th December 2016.
Everybody wants to play FAIR, but how do we put the principles into practice?
There is a growing demand for quality criteria for research datasets. In this webinar we will argue that the DSA (Data Seal of Approval for data repositories) and FAIR principles get as close as possible to giving quality criteria for research data. They do not do this by trying to make value judgements about the content of datasets, but rather by qualifying the fitness for data reuse in an impartial and measurable way. By bringing the ideas of the DSA and FAIR together, we will be able to offer an operationalization that can be implemented in any certified Trustworthy Digital Repository.
In 2014 the FAIR Guiding Principles (Findable, Accessible, Interoperable and Reusable) were formulated. The well-chosen FAIR acronym is highly attractive: it is one of these ideas that almost automatically get stuck in your mind once you have heard it. In a relatively short term, the FAIR data principles have been adopted by many stakeholder groups, including research funders.
The FAIR principles are remarkably similar to the underlying principles of DSA (2005): the data can be found on the Internet, are accessible (clear rights and licenses), in a usable format, reliable and are identified in a unique and persistent way so that they can be referred to. Essentially, the DSA presents quality criteria for digital repositories, whereas the FAIR principles target individual datasets.
In this webinar the two sets of principles will be discussed and compared and a tangible operationalization will be presented.
RDAP13 John Kunze: The Data Management EcosystemASIS&T
Â
John Kunze, University of California, Curation Center
California Digital Library (CDL)
The Data Management Ecosystem
Panel: Partnerships between institutional repositories, domain repositories, and publishers
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
Presentation to the ARROW repositories day, Brisbane, 2008, on suggestions for improving the rate of capture of documents in institutional repositories
Needs for Data Management & Citation Throughout the Information Lifecycle
Micah Altman, Director of Research and Head/Scientist, Program on Information Science for the MIT Libraries, Massachusetts Institute of Technology
This session will examine data management and data citation from an information lifecycle approach. The session will discuss the implications for data management of analyzing the needs, rights, and responsibilities of researchers and other stakeholders at each lifecycle stage. And the session will discuss data citation and other related mechanisms that are useful in linking services and aligning incentives across lifecycle stages and among stakeholders.
2013 DataCite Summer Meeting - Purdue University Research Repository (PURR) (...datacite
Â
2013 DataCite Summer Meeting - Making Research better
DataCite. Co-sponsored by CODATA.
Thursday, 19 September 2013 at 13:00 - Friday, 20 September 2013 at 12:30
Washington, DC. National Academy of Sciences
http://datacite.eventbrite.co.uk/
Research Data Management in GLAM: Managing Data for Cultural HeritageSarah Anna Stewart
Â
Presentation given at the 'Open Science Infrastructures for Big Cultural Data' - Advanced International Masterclass in Plovdiv, Bulgaria. Dec. 13-15, 2018
Stuart Macdonald talks about the Research Data Management programme at the University of Edinburgh Data Library, delivered at the ADP Workshop for Librarians: Open Research Data in Social Sciences and Humanities (ADP), Ljubljana, Slovenia, 18 June 2014
Supporting Libraries in Leading the Way in Research Data ManagementMarieke Guy
Â
Marieke Guy, Institutional Support Officer, Digital Curation Centre, UKOLN, University of Bath, UK presents on Supporting Libraries in Leading the Way in Research Data Management at Online Information, London 20th -21st November 2012
Presentation on the theme 'democratisation of knowledge' to RLUK in December 2010. Open Science, Open Access, Open Data, Research Libraries and research data...
A presentation to the Alliance for Permanent Access to the Records of Science on the ongoing work of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access
Talk at JISC Repositories conference intended for repository managers or research managers on some of the issues involved. Talk had to be originally given unaided because of a technology problem!
Dev Dives: Train smarter, not harder â active learning and UiPath LLMs for do...UiPathCommunity
Â
đĽ Speed, accuracy, and scaling â discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Miningâ˘:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing â with little to no training required
Get an exclusive demo of the new family of UiPath LLMs â GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
đ¨âđŤ Andras Palfi, Senior Product Manager, UiPath
đŠâđŤ Lenka Dulovicova, Product Program Manager, UiPath
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
Â
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
Â
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. Whatâs changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Â
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Â
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
Â
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
⢠The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
⢠Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
⢠Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
⢠Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Â
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overviewâ
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
Â
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more âmechanicalâ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Â
Saving private data, sharing Open Data? Role of libraries and institutional repositories in a data world
1. a centre of expertise in data curation and preservation
Saving private data, sharing
Open Data?
Role of libraries and institutional
repositories in a data world...
Chris Rusbridge
CURL/SCONUL e-Research Task
Force meeting 14 June 2007
Funded by:
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5
UK: Scotland License. To view a copy of this license, visit http://creativecommons
.org/licenses/by-nc-sa/2.5/scotland/ ; or, (b) send a letter to Creative Commons, 543 Howard
Street, 5th Floor, San Francisco, California, 94105, USA.
2. a centre of expertise in data curation and preservation
Welcome
2 CURL/SCONUL e-Research
3. a centre of expertise in data curation and preservation
Contents
⢠Role of libraries
⢠How we should start dealing with data
⢠AHDS decision implicationsâŚ
3 CURL/SCONUL e-Research
4. a centre of expertise in data curation and preservation
Digital Curation Centre Mission
âThe over-riding purpose of the DCC is to
support and promote continuing improvement
in the quality of data curation, and of
associated digital preservationâ
4 CURL/SCONUL e-Research
5. a centre of expertise in data curation and preservation
Role of Research Libraries?
⢠Research Collections
⢠Special collections
⢠Research space & facilities
⢠Services (eg ILL)
⢠+++
5 CURL/SCONUL e-Research
6. a centre of expertise in data curation and preservation
Role of Research Libraries?
⢠Research Collections⌠going virtual
⢠Special collections
⢠Research space & facilities⌠for students
⢠Services (eg ILL)⌠going virtual
⢠Diminishing strategic importance?
6 CURL/SCONUL e-Research
7. a centre of expertise in data curation and preservation
Role of Research Libraries?
⢠Major service organisation at heart of
University
⢠Addresses institutional priorities
⢠Skills in organisation of knowledge
⢠Significant budget
⢠âHearts & Mindsâ
7 CURL/SCONUL e-Research
8. a centre of expertise in data curation and preservation
Role of Research Libraries?
⢠The right place to build infrastructure for
institutional knowledge capital?
⢠Yes! But donât under-estimateâŚ
⢠The scale of change needed
⢠How much your legacy (and skills!) drag you back
⢠The need for domain knowledge
Data are different!
8 CURL/SCONUL e-Research
9. a centre of expertise in data curation and preservation
âThe Records of Scienceâ
⢠Data increasingly important as evidence
⢠Key part of the scholarly record (public good)
⢠Unrepeatable observations & experiments
⢠Value for public money (eg OECD)
⢠Experimental verifiability (the basis of science)
⢠Would Chang retractions have been reduced if his first data
were available?
CHANG, G., ROTH, C. B., REYES, C. L., PORNILLOS, O., CHEN, Y.-J. & CHEN, A. P. (2006)
Retraction of Pornillos et al., Science 310 (5756) 1950-1953. Retraction of Reyes and Chang,
Science 308 (5724) 1028-1031. Retraction of Chang and Roth, Science 293 (5536) 1793-1800.
Science Magazine, 314. http://www.sciencemag.org/cgi/content/full/314/5807/1875b
⢠Allows additional interpretations
⢠Legal and compliance (eg emerging RC mandates)
9 CURL/SCONUL e-Research
10. a centre of expertise in data curation and preservation
OECD declaration
⢠ââŚWork towards the establishment of access regimes for digital
research data from public funding in accordance with the following
objectives and principles:
⢠Openness
⢠Transparency
⢠Legal conformity
⢠Formal responsibility
⢠Professionalism
⢠Protection of intellectual property
⢠Interoperability
⢠Quality and security
⢠Efficiency
⢠Accountabilityâ
10 CURL/SCONUL e-Research
11. a centre of expertise in data curation and preservation
Repositories
⢠Document / article repositories
⢠simple metadata (discovery, description)
⢠ePrints, DSpace, Fedora, ePubsâŚ.
⢠e-Research repositories
⢠more complex metadata (discovery, description, usage control,
software parametersâŚ)
⢠âhomebrewâ systems â portals to research datasets and software
â˘Slide from Keith Jeffery
11 CURL/SCONUL e-Research
12. a centre of expertise in data curation and preservation
Not only work with
the e-literature
Scenario
repository but â˘Current Research Infrastructure Service
alsoâŚ.. ⢠project, person, organisational unit, research
output (products, patents, publications),
funding, facilities, equipment, eventsâŚâŚ
â˘e-Research repository
⢠research datasets, software
â˘e-Research
⢠control experiments, take data, visualisation,
application in-silico experiments (simulation)
middleware â˘e-Process
⢠Workflows, research applications, travel
requests, claims
â˘Slide from Keith Jeffery
12 CURL/SCONUL e-Research
13. a centre of expertise in data curation and preservation
Retaining research data meansâŚ
⢠Data secure against loss (within group)
⢠Communal repository (secure data store)
⢠Re-usable, sharable information
⢠As above, plus active curation (eg bio-
informatics)
⢠Long term preservation of information
⢠Be clear what you are trying to do!
13 CURL/SCONUL e-Research
14. a centre of expertise in data curation and preservation
⌠or the data trajectory isâŚ
⢠Hard drive â lost (crash)
⢠Hard drive âDVD âCardboard box âLoft âSkip/dumpster â
lost
⢠Sometimes this is a very bad thing
⢠Sometimes these are the right options!
14 CURL/SCONUL e-Research â˘ÂŠ Marita Bushell
15. a centre of expertise in data curation and preservation
Preservation risks
⢠Not caring enough to try
⢠No permissions to do it (or donât know what permissions we have!)
⢠Insufficient contextual information to interpret
⢠Human error
⢠Media failure
⢠Lack of money
⢠Policy failure
⢠Deliberate attack
⢠Obsolescence of format
15 CURL/SCONUL e-Research
16. a centre of expertise in data curation and preservation
Preservation
⢠âPreservation starts before creationâ
⢠Not in many IRs!
⢠Where must lifecycle involvement start?
16 CURL/SCONUL e-Research
17. a centre of expertise in data curation and preservation
17 CURL/SCONUL e-Research
18. a centre of expertise in data curation and preservation
What to do about curation
⢠Build curation/reusability into science workflow
⢠Curation begins before creation
⢠Whatâs easy at first becomes (impossibly) hard later
⢠Describe data (metadata schemas, ârepresentation infoâ, etc)
⢠Keep experimental parameters (technical, who, what, when, where)
⢠Keep ability to process
⢠Keep data!
18 CURL/SCONUL e-Research
19. a centre of expertise in data curation and preservation
What to do about curation - 2
⢠Use standard/agreed formats for data
⢠Make ownership & restrictions clear, &
explain how to cite data
⢠Offer for deposit in institutional or discipline
repository
⢠Appraisal and selection essential
⢠Possible time-limited embargos
⢠âPublishâ data in support of articles
19 CURL/SCONUL e-Research
20. a centre of expertise in data curation and preservation
Internet Archaeology: publication with
data
20 CURL/SCONUL e-Research
21. a centre of expertise in data curation and preservation
Database as bookâŚ
⢠Buneman (early pilot)
work on IUPHAR
database
⢠MySQL to XML
database
⢠Historic to logical
schema
⢠XML via XSLT to LaTeX
21 CURL/SCONUL e-Research
22. a centre of expertise in data curation and preservation
Institutional repositories and data
⢠Institutional repository managers
⢠Make contact with emerging institutional data services
⢠Start raising awareness of the need to curate rather than just dump
data
⢠Start thinking about the relationship of data to publications
(especially e-theses)
⢠Start thinking about the metadata needed to find and re-use data
⢠Make contact with key researchers
⢠Start thinking about their dataâŚ
22 CURL/SCONUL e-Research
23. a centre of expertise in data curation and preservation
What kinds of data?
⢠Observations
⢠eg UARS (Upper Atmosphere) Level 0: telemetry
⢠UARS Level 1: measured physical parameters (post
calibration?)
⢠Derived data
⢠UARS Level 2: calculated geophysical? profiles
⢠UARS level 3: gridded, interpolated?
⢠Combined data
⢠Crafted data
⢠Eg annotated gene/protein databases
⢠Descriptive (meta)data
23 CURL/SCONUL e-Research
24. a centre of expertise in data curation and preservation
StORe: Source data formats
CAD/GIS: 39
Extensible mark -up language (XML): 35
Database files (e.g. Access, MySQL): 117
Flat files (e.g. FITS): 66
Hypertext mark -up language (HTML): 60
Image files (e.g. .jpg, .tif, .bmp, .gif): 228
Plain text (.txt): 179
Portable document format (.pdf): 156
Rich text files (.rtf): 53
Spreadsheets (e.g. Excel/.xls): 220
Statistical software: 75
Tables/catalogues: 102
Word processed files (e.g. Word/.doc): 220
Other (please specify) : 76
24 â˘Slide
CURL/SCONUL e-Research from StORe project
25. a centre of expertise in data curation and preservation
StORe: the other data formats?
They said the 76 other formats included:
+latex+.cc source code, .cif (crystallographic data),
.pdb, .mtz, .pool, .root, .raw, .swf, .fla, .raw, .mpg,
binary files, chemdraw cdx, xwin nmr files, .ps files,
.fla, .swf, masslynx files, derived data in PAw-format
ntuples, raw mass spectrometry data, X-ray
diffraction data, kaleidagraphs, Atlas/ti hermeneutic
unit files, C++/shell scripts, Fourier induction decay
files, etc., etc., etc., etcâŚâŚâŚ..
25 â˘Slide
CURL/SCONUL e-Research from StORe project
26. a centre of expertise in data curation and preservation
StORe: the other data formats - more
They also said such things as:
âIt is stored in a database, but nothing so simple as an
Access file! It's one of the largest databases in the world!
The format is Kanga/Root and previously was
Objectivity. I think it's of the order of Picobytes in size.â
And:
âGod preserve us from idiots who archive data in
proprietary commercial formats (Excel spreadsheets and
MS-word documents)!â
26 â˘Slide
CURL/SCONUL e-Research from StORe project
27. a centre of expertise in data curation and preservation
Data resource stages
⢠Curated data is createdâŚ
⢠Observations? Fixed!
⢠Or AcquiredâŚ
⢠Data brought/bought from outside
⢠Ingest
⢠Development
⢠Derived, refined, combined, processed data
⢠Potentially many stages
27 CURL/SCONUL e-Research
28. a centre of expertise in data curation and preservation
What are the reusability issues?
⢠Data not neutral; highly contextual!
⢠Hard to know the risks & pitfalls of a particular
dataset
⢠Data not self-describing: hard to find
appropriate data (but see Murray-Rust on
Googling InChI etc)
⢠Hard to âunderstandâ data once found
⢠Really need information, not data!
⢠Hard to use data once understood
28 CURL/SCONUL e-Research
29. a centre of expertise in data curation and preservation
Context
⢠Data meaningless without context
⢠Metadata of many kinds
⢠Representation information⌠from data to
information
⢠Linkage and connection between datasets
⢠Provenance
⢠Authenticity/integrity
⢠Computational lineage
29 CURL/SCONUL e-Research
30. a centre of expertise in data curation and preservation
But the problem with metadata is
â˘It takes too much effort for the researcher to put it in
(many web-form-screens)
â˘So have to input incrementally, no repetition, using the
workflow..
â˘And not re-keying data stored already elsewhere in
other (linked-up) systems
â˘Slide from Keith Jeffery
30 CURL/SCONUL e-Research
31. a centre of expertise in data curation and preservation
Access and re-use
⢠Ethics and rights control access
⢠Weak in expressing this long-term
⢠Collaboration tools
⢠Annotation, discussion, review (see DARTâŚ)
⢠Re-use leading to change and development
⢠âPublicationâ
⢠Not just in âprintâ
⢠Underlying data should be âpublishedâ, too
31 CURL/SCONUL e-Research
32. a centre of expertise in data curation and preservation
Data citation issuesâŚ
⢠Citation for human readers and machine use cases
⢠Granularity: database, record, item
⢠Citation of changing objects
⢠Version change (eg W3C practice: no version = latest, vs bibliographic: no
version = first)
⢠An efficient way to reference and access âarchivedâ past states of more rapidly
changing dataset, eg Genomics⌠datasets that result from the combined work
of curators, or contain opinions or facts likely to change (work in progress,
Buneman et al)
⢠Standards conflict and immature (NLM best?)
⢠Citation ESSENTIAL for motivating quality academic work on data
management and curation
32 CURL/SCONUL e-Research
33. a centre of expertise in data curation and preservation
Institutional Repositories
⢠Now largely text
⢠OpenDOAR: only 5 Institutional Repositories claim to include
datasets
⢠Bristol
⢠Cambridge
⢠Edinburgh
⢠Leicester
⢠Southampton
⢠âŚand some of these seem doubtful on inspection!
⢠⌠of course not all research data are âdatasetsâ
33 CURL/SCONUL e-Research
34. a centre of expertise in data curation and preservation
ERA
34 CURL/SCONUL e-Research
35. a centre of expertise in data curation and preservation
Repository types
35 CURL/SCONUL e-Research
36. a centre of expertise in data curation and preservation
Repository challenges
⢠Data are different: youâll need access to some domain knowledge
⢠Appraisal/selection harder
⢠Broader range of formats
⢠Appropriate âstandardsâ for longevity? XML-based?
⢠What metadata are needed?
⢠Descriptive, to find the dataset
⢠Context and background
⢠Provenance
⢠âRepresentation informationâ to connect data to information (whatever
gives meaning to data for the âdesignated communityâ)
36 CURL/SCONUL e-Research
37. a centre of expertise in data curation and preservation
Repository challenges - 2
⢠May distort your repository
⢠Size
⢠Number of objects
⢠Rate of deposit
⢠Nature of use
⢠Databases may be dynamic
⢠Databases may need to be accessed in situ
⢠Rights and ethical limitations hard to describe and
enforce
⢠Need to build links to publications (cf StORe)
⢠Need to build discipline links across repositoriesâŚ
37 CURL/SCONUL e-Research
38. a centre of expertise in data curation and preservation
Repository challenges - 3
⢠Is your platform suitable?
⢠Most successful (ie older) data repositories
are DIY
⢠Data also held in repositories built on Dspace,
ePrints and Fedora
38 CURL/SCONUL e-Research
39. a centre of expertise in data curation and preservation
Repositories?
⢠SH: âThe trouble with many Institutional Repositories is
that they are not run by researchers but by permissions
professionals...â
⢠PMR: âI have had similar thoughts. I got the distinct
impression that some IRs are run like Victorian
museums - look but donât touch. The very word
repository suggests a funereal process - itâs no surprise
that having put much of my stuff into DSpace I find itâs
an enormous effort to get it out. Why donât we build
disseminatories instead?â
â˘From Peter Murray Rustâs blog
39 CURL/SCONUL e-Research
40. a centre of expertise in data curation and preservation
What is a Repository? - revisited
⢠from the perspective of the content consumer a
repository is just a Web site
⢠think existing Web presences⌠think BBC⌠think
museum⌠think FlickrâŚ
⢠think content management systems
⢠are these Web sites or repositories?
⢠who cares?
⢠but conceptualising the repository as a Web site
changes priorities
⢠Web architecture, Google, usability, accessibility, âŚ
â˘Slide from Andy Powell
40 CURL/SCONUL e-Research
41. a centre of expertise in data curation and preservation
Real Open Access?
⢠PMR: But the problem with the repositories is that
there is no indication that the actual thesis is
OpenAccess. The Edinburgh repository announces:
All items in ERA are protected by copyright, with all
rights reserved⌠which discourages the visitor for
looking for an Open Licence within the thesis.
⢠PMR: Here is a very simple idea: Add dc:rights to
the splash page and metadata and proudly proclaim
in large letters:
THIS THESIS CARRIES A CREATIVE COMMONS
LICENCE - ENJOY!
â˘From Peter Murray Rustâs blog
41 CURL/SCONUL e-Research
42. a centre of expertise in data curation and preservation
Open Data
⢠More than open accessâŚ
⢠Includes the right to process the content
⢠Plus the capability to process the content
⢠Implies data-oriented metadata within the content
⢠Microformats?
Datuments?
42 CURL/SCONUL e-Research
43. a centre of expertise in data curation and preservation
43 CURL/SCONUL e-Research
44. a centre of expertise in data curation and preservation
44 CURL/SCONUL e-Research
45. a centre of expertise in data curation and preservation
Who does data curation?
⢠Individuals
⢠Departments or groups
⢠Institutions, often through libraries
⢠Communities
⢠Disciplines
⢠Publishers
⢠National services
⢠Other 3rd partiesâŚ
45 CURL/SCONUL e-Research
46. a centre of expertise in data curation and preservation
Who are the curation players?
46 CURL/SCONUL e-Research
47. a centre of expertise in data curation and preservation
Who are the curation players?
47 CURL/SCONUL e-Research
48. a centre of expertise in data curation and preservation
AHDS
⢠âAHRC Council has decided to cease funding the Arts and
Humanities Data Service (AHDS) from March 2008. [âŚ] Grant
holders must make materials they had planned to deposit with the
AHDS available in an accessible depository for at least three years
after the end of their grantâ
⢠AHRC Press Release 14/05/2007
⢠(Note petition at http://petitions.pm.gov.uk/AHDSfunding/)
⢠Does not apply to Archaeology: ADS still funded?
⢠âCouncil believes that long term storage of digital materials and
sustainability is best dealt with by an active engagement with HEIs
rather than through a centralised serviceâ
⢠âJISC has decided that it is unable to fund the service alone and
that therefore its own funding of the service will, in its current form,
cease on the same date. [âŚ] exploring with the AHDS ⌠and the
wider community alternative approaches to maintaining strong
support for that community beyond March 2008â
⢠JISC Press Release 13/06/2007
48 CURL/SCONUL e-Research
49. a centre of expertise in data curation and preservation
Repatriating resources?
⢠Complexity:
multimedia
(text DB,
image DB,
video
interviews,
VRML
models)
49 CURL/SCONUL e-Research
50. a centre of expertise in data curation and preservation
Repatriating Edinburgh resource?
⢠Different content:
Access database
⢠âcollection of
3.5K
bibliographic
entries on
secondary
literature on
avant-garde and
neo-avant-garde
related themesâ
⢠Documentation
50 CURL/SCONUL e-Research
51. a centre of expertise in data curation and preservation
New challenge?
⢠Can we find a way to combine sustainability
(but generality) of institutional repositories
with science focus (aiming to reduce the high
risk) of domain repositories?
⢠Some sort of domain repository in the network
space?
51 CURL/SCONUL e-Research
52. a centre of expertise in data curation and preservation
Cultural change
⢠If we build it, will they come? NO!!
⢠Outreach important: communication with
scientists and researchers is hard graft
⢠Cultural change to new approach requires more:
⢠Incentives, rewards and mandates
⢠Successful exemplars (well publicised)
⢠Discipline-oriented approach (one size does not fit all)
52 CURL/SCONUL e-Research
53. a centre of expertise in data curation and preservation
Financial sustainability: the 8 pillars
of wisdom?
⢠Someone has to payâŚ
⢠Consumer pays: subscription or usage?
⢠Depositor pays (ie grant or institution)?
⢠Institution pays (IR, cf library/archive/museum)
⢠Community (discipline repository?) pays
⢠Government, or science funder
⢠Learned society?
⢠Volunteers (cf open source, social computing, LOCKSS)?
⢠Side effect (advertiser) pays (unlikely for much data?)
⢠Endowment or donor paysâŚ
⢠Diversity?
53 CURL/SCONUL e-Research
54. a centre of expertise in data curation and preservation
Role of libraries
⢠2-4% of university budgets (âThereâs plenty of
money⌠thereâs just not plenty of money for
everything!â Courant)?
⢠Traditional role in sustaining the raw material
of scholarship
⢠Looking for new roles in the digital world?
⢠Many unsaid assumptions from publishing
paradigm?
⢠Domain knowledge: wide but not deep
⢠Involvement in data creation low
54 CURL/SCONUL e-Research
55. a centre of expertise in data curation and preservation
Thank you
c.rusbridge@ed.ac.uk
55 CURL/SCONUL e-Research
Editor's Notes
Slide 24 - And also link up sources of information, internal (HR, LDAP..) and external (bibliographic databases, citebase, crossref, funder grant databasesâŚ) to make sure that if data is available online somewhere, the researcher does not have to rekey it. I know you go on to make this point in the context of CERIF, but the point is broader than that and we can make progress beyond CERIF-compliant systems.