The document discusses the need for an ecosystem to better manage research data through its entire lifecycle, from creation to publication to sharing and reuse. It proposes that libraries can play a key role in this ecosystem by providing services like curation repositories, identifiers, metadata, and tools to help researchers publish, share, and get credit for their data. The goal is to improve data discovery, access, attribution, and incentivize data sharing to make research data as integral to the scholarly record as journal articles.
Presented by Stuart Macdonald at the College of Science and Engineering "What's new for you in the Library" event, Murray Library, King's Buildings, University of Edinburgh, 28 May 2014.
Covers research data, research data management, funder policies and the University's RDM policy, RDM services and support, awareness raising, training, and progress so far.
Presentation given by Stuart Macdonald at the International Workshop on ICT and e-Knowledge for the Developing World at the Shanghai International Convention Center, Pudong, Shanghai.
Presented by Peter Burnhill at the ALA Annual Holdings Update Forum, Universal and repurposed holdings information -- Emerging initiatives and projects, Morial Convention Center, New Orleans, Louisiana, USA, 25 June 2011
Delivered by Peter Burnhill, Director of EDINA, at the PRELIDA Consolidation and Dissemination workshop on 17/18 October 2014 (http://prelida.eu/consolidation-workshop).
Summary: The web changes over time, and significant reference rot inevitably occurs. Web archiving delivers only a 50% chance of success. So in addition to the original URI, the link should be augmented with temporal context to increase robustness.
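The "temporal context" suggested here has since been formalized in the Robust Links convention from the same Memento/Hiberlink line of work. A minimal Python sketch, assuming the data-versiondate/data-versionurl attribute names from that convention; the snapshot URL is illustrative:

```python
from datetime import datetime, timezone
from typing import Optional

def robust_link(uri: str, text: str, snapshot_uri: Optional[str] = None) -> str:
    """Render a hyperlink augmented with temporal context, following the
    Robust Links convention (data-versiondate / data-versionurl)."""
    linked_on = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    attrs = [f'href="{uri}"', f'data-versiondate="{linked_on}"']
    if snapshot_uri:  # a web-archive capture made near the time of linking
        attrs.append(f'data-versionurl="{snapshot_uri}"')
    return f'<a {" ".join(attrs)}>{text}</a>'

# With the original URI plus a dated snapshot, a reader or tool can fall
# back to an archived copy when the live link rots.
print(robust_link(
    "http://prelida.eu/consolidation-workshop",
    "PRELIDA workshop",
    snapshot_uri="https://web.archive.org/web/20141017/http://prelida.eu/",
))
```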
Presented by Peter Burnhill, Director of EDINA, at Beyond Books: What STM & Social Science publishing should learn from each other, London, 22 April 2010.
Presentation given by Peter Burnhill, Director of EDINA, at #ReCon_15: Beyond the paper: publishing data, software and more. Edinburgh, 19 June 2015
Peter Burnhill
http://reconevent.com/
The International Federation of Library Associations and Institutions (IFLA) is responsible for the development and maintenance of International Standard Bibliographic Description (ISBD), UNIMARC, and the "Functional Requirements" family for bibliographic records (FRBR), authority data (FRAD), and subject authority data (FRSAD). ISBD underpins the MARC family of formats used by libraries world-wide for many millions of catalog records, while FRBR is a relatively new model optimized for users and the digital environment. These metadata models, schemas, and content rules are now being expressed in the Resource Description Framework language for use in the Semantic Web.
This webinar provides a general update on the work being undertaken. It describes the development of an Application Profile for ISBD to specify the sequence, repeatability, and mandatory status of its elements. It discusses issues involved in deriving linked data from legacy catalogue records based on monolithic and multi-part schemas following ISBD and FRBR, such as the duplication which arises from copy cataloging and FRBRization. The webinar provides practical examples of deriving high-quality linked data from the vast numbers of records created by libraries, and demonstrates how a shift of focus from records to linked-data triples can provide more efficient and effective user-centered resource discovery services.
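To make the record-to-triples shift concrete, here is a minimal Python (rdflib) sketch of one catalogue record expressed as linked-data triples. The ISBD namespace URI and the element P1004 ("has title proper") are assumptions about the registered IFLA vocabulary, and the resource URI is hypothetical:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS

# ISBD element set as registered on the IFLA namespace server; P1004
# ("has title proper") is assumed -- verify against iflastandards.info.
ISBD = Namespace("http://iflastandards.info/ns/isbd/elements/")

g = Graph()
g.bind("isbd", ISBD)
g.bind("dcterms", DCTERMS)

# One legacy catalogue record becomes a handful of triples about one
# resource; duplicates from copy cataloguing or FRBRization can then be
# merged simply by asserting the same triples about the same subject.
resource = URIRef("http://example.org/resource/1")  # hypothetical URI
g.add((resource, ISBD.P1004, Literal("On the origin of species")))
g.add((resource, DCTERMS.creator, Literal("Darwin, Charles")))

print(g.serialize(format="turtle"))
```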
A controversial discussion of the utility of DBpedia as authority data with examples from a project at the Library of Congress. Part of an ExLibris-sponsored panel discussion at ALA Chicago 2009.
Metadata plays an increasingly central role as a tool enabling the large-scale, distributed management of resources. However, metadata communities which have traditionally worked in relative isolation have struggled to make their specifications interoperate with others in the shared web environment.
This webinar explores how metadata standards with significantly different characteristics can productively coexist and how previously isolated metadata communities can work towards harmonization. The webinar presents a solution-oriented analysis of current issues in metadata harmonization with a focus on specifications of importance to the learning technology and library environments, notably Dublin Core, IEEE Learning Object Metadata, and W3C's Resource Description Framework. Providing concrete illustrations of harmonization problems and a roadmap for designing metadata for maximum interoperability, this webinar will provide a bird's-eye perspective on the respective roles of metadata syntaxes, formats, semantics, abstract models, vocabularies, and application profiles in achieving metadata harmonization.
70+ slides of highlights and quotes from all of MozCon Day #1. See all of our coverage at http://www.contentharmony.com/blog/mozcon-2013-coverage/ & http://www.contentharmony.com/blog/mozcon-2013-tools/
Functional and Architectural Requirements for Metadata: Supporting Discovery... (Jian Qin)
The tremendous growth in digital data has led to an increase in metadata initiatives for different types of scientific data, as evident in Ball's survey (2009). Although individual communities have specific needs, there are shared goals that need to be recognized if systems are to effectively support data sharing within and across all domains. This paper considers this need, and explores systems requirements that are essential for metadata supporting the discovery and management of scientific data. The paper begins with an introduction and a review of selected research specific to metadata modeling in the sciences. Next, the paper's goals are stated, followed by the presentation of essential systems requirements. The results include a base model with three chief principles: the principle of least effort, infrastructure service, and portability. The principles are intended to support "data user" tasks. Results also include a set of defined user tasks and functions, and application scenarios.
Slides from a talk I gave at the Perspectives Workshop on Semantic Web, http://www.dagstuhl.de/en/program/calendar/semhp/?semnr=09271 ... Dagstuhl, Germany, 2009-06-29. The title was from Jim Hendler!
Open data is a crucial prerequisite for inventing and disseminating the innovative practices needed for agricultural development. To be usable, data must not just be open in principle—i.e., covered by licenses that allow re-use. Data must also be published in a technical form that allows it to be integrated into a wide range of applications. The webinar will be of interest to any institution seeking ways to publish and curate data in the Linked Data cloud.
This webinar describes the technical solutions adopted by a widely diverse global network of agricultural research institutes for publishing research results. The talk focuses on AGRIS, a central and widely-used resource linking agricultural datasets for easy consumption, and AgriDrupal, an adaptation of the popular, open-source content management system Drupal optimized for producing and consuming linked datasets.
Agricultural research institutes share many of the constraints faced by libraries and other documentation centers, in developing countries and beyond: institutions are expected to expose their information on the Web in re-usable form on shoestring budgets, with technical staff working in local languages and continually lured away by higher-paying work in the private sector. Technical solutions must therefore be easy to adopt and freely available.
Slides from my talk "Unicorns and Other Wild Things" at the IA Summit 2013 in Baltimore, MD.
Audio and transcripts: http://library.iasummit.org/podcasts/unicorns-and-other-wild-things/
http://lanyrd.com/2013/iasummit/sccqpm/
About the Webinar
In May 2012, the Library of Congress announced a new modeling initiative focused on reflecting the MARC 21 library standard as a Linked Data model for the Web, with an initial model to be proposed by the consulting company Zepheira. The goal of the initiative is to translate the MARC 21 format to a Linked Data model while retaining the richness and benefits of existing data in the historical format.
In this webinar, Eric Miller of Zepheira will report on progress towards this important goal, starting with an analysis of the translation problem and concluding with potential migration scenarios for a broad-based transition from MARC to a new bibliographic framework.
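The target vocabulary of the new bibliographic framework was still open at the time, so the Python sketch below maps two familiar MARC 21 fields to Dublin Core terms purely to illustrate the shape of the translation problem; the record, field selection, and URI pattern are hypothetical, not Zepheira's model:

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

# A toy stand-in for one parsed MARC 21 record: tag -> subfield values.
marc_record = {
    "001": "ocm12345678",                        # control number
    "100": {"a": "Miller, Eric"},                # main entry, personal name
    "245": {"a": "Linked data for libraries"},   # title statement
}

g = Graph()
g.bind("dcterms", DCTERMS)
work = URIRef(f"http://example.org/work/{marc_record['001']}")  # hypothetical URI pattern
g.add((work, DCTERMS.creator, Literal(marc_record["100"]["a"])))
g.add((work, DCTERMS.title, Literal(marc_record["245"]["a"])))
print(g.serialize(format="turtle"))
```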
As described in the April NISO/DCMI webinar by Dan Brickley, schema.org is a search-engine initiative aimed at helping webmasters use structured data markup to improve the discovery and display of search results. Drupal 7 makes it easy to markup HTML pages with schema.org terms, allowing users to quickly build websites with structured data that can be understood by Google and displayed as Rich Snippets.
Improved search results are only part of the story, however. Data-bearing documents become machine-processable once you find them. The subject matter, important facts, calendar events, authorship, licensing, and whatever else you might like to share are there for the taking. Sales reports, RSS feeds, industry analysis, maps, diagrams, and process artifacts can now connect back to other data sets to provide linkage to context and related content. The key to this is the adoption of standards for both the data model (RDF) and the means of weaving it into documents (RDFa). Drupal 7 has become the leading content platform to adopt these standards.
This webinar will describe how RDFa and Drupal 7 can improve how organizations publish information and data on the Web for both internal and external consumption. It will discuss what is required to use these features and how they impact publication workflow. The talk will focus on high-level and accessible demonstrations of what is possible. Technical people should learn how to proceed while non-technical people will learn what is possible.
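As a rough illustration of what such markup yields, the Python (rdflib) sketch below builds the triples a crawler would extract from a schema.org-annotated page; the RDFa fragment in the comment and the event details are invented for the example:

```python
from rdflib import BNode, Graph, Literal
from rdflib.namespace import RDF, SDO  # SDO binds https://schema.org/

# The triples a crawler would extract from a Drupal 7 page whose template
# carries RDFa such as:
#   <div vocab="https://schema.org/" typeof="Event">
#     <span property="name">NISO/DCMI webinar</span>
#     <time property="startDate" datetime="2012-01-25">25 Jan 2012</time>
#   </div>
g = Graph()
event = BNode()
g.add((event, RDF.type, SDO.Event))
g.add((event, SDO.name, Literal("NISO/DCMI webinar on RDFa and Drupal 7")))
g.add((event, SDO.startDate, Literal("2012-01-25")))  # hypothetical date
print(g.serialize(format="turtle"))
```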
RDAP13 John Kunze: The Data Management Ecosystem (ASIS&T)
John Kunze, University of California, Curation Center
California Digital Library (CDL)
The Data Management Ecosystem
Panel: Partnerships between institutional repositories, domain repositories, and publishers
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving... (Sarah Anna Stewart)
Presentation given at the M25 Consortium of Academic Libraries, CPD25 Event on 'The Role of the Library in Supporting Research'. Provides an introduction to data, software and PIDs and a brief look at how libraries can enable researchers to gain impact and credit for their research data and software.
Research Data Management in GLAM: Managing Data for Cultural Heritage (Sarah Anna Stewart)
Presentation given at the 'Open Science Infrastructures for Big Cultural Data' - Advanced International Masterclass in Plovdiv, Bulgaria. Dec. 13-15, 2018
2013 DataCite Summer Meeting - Purdue University Research Repository (PURR)... (datacite)
2013 DataCite Summer Meeting - Making Research better
DataCite. Co-sponsored by CODATA.
Thursday, 19 September 2013 at 13:00 - Friday, 20 September 2013 at 12:30
Washington, DC. National Academy of Sciences
http://datacite.eventbrite.co.uk/
Enabling better science - Results and vision of the OpenAIRE infrastructure a... (Paolo Manghi)
Enabling better science: a presentation on the results and vision of the OpenAIRE infrastructure and the RDA Publishing Data Services Working Group in this direction.
The traditional process of achieving metadata standards has failed; I speak from experience with Dublin Core, BagIt, Z39.50, URLs, and ARKs.
We must think outside the box or we will keep failing. YAMZ (Yet Another Metadata Zoo) is not a standard. Instead it is a dictionary of terms, some fixed and others still evolving, that are meant to be selectively referenced by future standards. Terms are otherwise decoupled from standards that reference them. Each term is a kind of nano-specification with a unique persistent identifier that tracks the term from evolving to mature to deprecated.
YAMZ.net is a tool for taxonomy building. Metadata vocabulary standardization ranks among the most awful design-by-committee experiences, whether at the international standards level or at the working group level. YAMZ instead offers a crowdsourced metadata dictionary with reputation-based voting, in which every term gets a unique persistent identifier. The second half of the session consists of exercises to see how it all works in practice.
Two themes:
1. Proposed metadata for “persistence statements”
   - what you mean by persistence
   - informing user linking choices
2. Metadata hardened in the open yamz.net dictionary (see the sketch below)
   - crowdsourced, but with reputation-based voting
   - every term has a unique persistent identifier (PID)
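A minimal sketch of what such a dictionary entry might look like. This is a hypothetical shape, not the actual yamz.net schema, but it captures the ingredients named above: a PID per term, a maturity status that moves from evolving to mature to deprecated, and reputation-weighted votes:

```python
from dataclasses import dataclass

# Hypothetical shape of a yamz.net dictionary entry -- not the actual
# YAMZ schema, just the ingredients the talk names.
@dataclass
class Term:
    pid: str                    # e.g. an ARK that tracks the term for life
    name: str
    definition: str
    status: str = "evolving"    # evolving -> mature -> deprecated
    score: float = 0.0          # reputation-weighted up/down votes

    def promote_if_stable(self, threshold: float = 25.0) -> None:
        """Fix a term once community consensus is strong enough."""
        if self.status == "evolving" and self.score >= threshold:
            self.status = "mature"

t = Term(pid="ark:/99152/h1234",  # hypothetical PID
         name="dataset",
         definition="A coherent, identified collection of data.")
t.score = 30.0
t.promote_if_stable()
print(t.pid, t.status)
```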
An identifier's scheme determines almost nothing about its behavior; what matters is a resolver ready to map it to various services. When resolver infrastructure is shared across schemes instead of siloed, all schemes benefit. With suitable prefixing, dozens of well-known, so-called non-actionable schemes can become available from a single unified base URL. The idealized resolver would adopt a fully open infrastructure and support all schemes and the best features of modern resolvers -- deduplication, content negotiation, link checking, inflections, suffix passthrough, etc.
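A minimal sketch of that unified-base-URL idea, in the style of N2T (n2t.net); the example identifiers are illustrative:

```python
import urllib.parse

# Scheme-agnostic resolution through one shared base URL: the resolver,
# not the scheme, determines what services an identifier supports.
RESOLVER = "https://n2t.net/"

def resolve(identifier: str, inflection: str = "") -> str:
    """Build a resolvable URL for a prefixed identifier.

    Works identically for ark:, doi:, pmid:, etc.  Appending an
    inflection such as "?" asks the resolver about the object
    (e.g. for metadata) rather than for the object itself.
    """
    return RESOLVER + urllib.parse.quote(identifier, safe=":/") + inflection

print(resolve("ark:/13030/tf5p30086k"))          # ARK
print(resolve("doi:10.5063/F1HT2M7Q"))           # DOI via the same base URL
print(resolve("pmid:16894156", inflection="?"))  # inflection: ask about it
```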
A huge amount of incredibly diverse research data remains beyond the reach of internet search engines, peer review processes, and systematic cataloging. The ability by consumers to annotate data is an important mitigation, harnessing "the crowd" to make it easier for everyone to discover and re-use data.
1. The Data Management Ecosystem
4 April 2013
University of California Curation Center
California Digital Library
2. The research data problem
How research data compares with the journal article:
• Uniquely and persistently identified? Journal article: yes. Research data: nope.
• Concept of “publish”? Journal article: yes. Research data: not really.
• Multiple copies? Journal article: yes. Research data: typically one.
• Easily findable? Journal article: yes. Research data: difficult.
• Services (impact metrics, citation tracking, etc.)? Journal article: yes. Research data: nope.
Research data is seen as a second-class citizen in the scholarly record.
3. An ecosystem of inter-dependent partners
Besides data repository and publisher partners...
• researchers
• educators
• citizen science groups
• funders
• tenure and promotion committees
Libraries as neutral connection partners
4. Where can libraries make a difference?
[Cycle diagram: the Research & Scholarship Lifecycle, with Collect, Publish, Share, Save, and Research cycling around "Create Knowledge"]
5. Collect > Publish > Share > Save > Research
• Create, edit, share, and save data management plans (DMPTool)
• Open source curation add-in for Microsoft Excel (DataUp)
• Capture today's web; build tomorrow's archives (Web Archiving Service)
6. Collect > Publish > Share > Save > Research
• Create and manage persistent identifiers: ARKs, DOIs, etc.
• An infrastructure to publish and get credit for sharing research data
7. Collect > Publish > Share > Save > Research
• Curation repository: store, manage, preserve, and share research data (Merritt)
• Open deposit, open access repository for spreadsheet data
• Data Observation Network for Earth (DataONE)
8. Collect > Publish > Share > Save > Research
What's missing to complete the “incentive” circuit?
• Impact measures, citation tracking
“Connecting the data to the research it informs”
• Altmetrics tools to measure non-traditional products and uses, etc.
9. Stable storage: Merritt repository
• Curation repository open to the UC community and beyond
• Discipline / content agnostic
• Micro-services architecture
• Easy-to-use UI or API
• Hosted or locally deployed
10. EZID: Long term identifiers made easy
• Precise identification of a dataset (DOI or ARK)
• Credit to data producers and data publishers
• A link from the traditional literature to the data (DataCite)
• Exposure and research metrics for datasets (Web of Knowledge, Google)
Take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation.
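For a sense of how lightweight this is in practice, here is a sketch of minting a test ARK through the EZID HTTP API, based on its public documentation; the shoulder, metadata values, and credentials are placeholders:

```python
import requests

# Minting a test ARK via the EZID API: POST ANVL-formatted metadata to a
# shoulder.  ark:/99999/fk4 is a throwaway test shoulder; the credentials
# below are placeholders for a real EZID account.
anvl = "\n".join([
    "_target: http://example.org/my/dataset",  # where the ARK should resolve
    "erc.who: Kunze, John",
    "erc.what: Demonstration dataset",
])
r = requests.post(
    "https://ezid.cdlib.org/shoulder/ark:/99999/fk4",
    data=anvl.encode("utf-8"),
    headers={"Content-Type": "text/plain; charset=UTF-8"},
    auth=("username", "password"),  # your EZID account
)
print(r.status_code, r.text)  # e.g. "success: ark:/99999/fk4..."
```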
11. Discovery: the DataCite consortium
• Technische Informationsbibliothek (TIB), Germany
• Canada Institute for Scientific and Technical Information (CISTI)
• L’Institut de l’Information Scientifique et Technique (INIST), France
• Australian National Data Service (ANDS)
• The British Library
• Library of the ETH Zürich
• California Digital Library, USA
• Library of TU Delft, The Netherlands
• Office of Scientific and Technical Information, US Department of Energy
• Purdue University, USA
• Technical Information Center of Denmark
12. New distributed framework
A flexible, scalable, sustainable network of diverse institutions.
Coordinating Nodes:
• retain complete metadata catalog
• perform basic indexing
• provide network-wide services
• ensure data availability (preservation)
• provide replication services
Member Nodes:
• hold a subset of all data
• serve their local community
• provide resources for managing their data
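This is the DataONE coordinating/member node framework. As a sketch of one network-wide service, the snippet below lists the federation's nodes from a coordinating node, assuming the listNodes endpoint and base URL from the DataONE architecture docs:

```python
import requests

# Listing the nodes of the DataONE federation from a coordinating node
# (CNCore.listNodes).  The base URL and API version are assumptions to
# verify against the current DataONE documentation.
r = requests.get("https://cn.dataone.org/cn/v2/node", timeout=30)
r.raise_for_status()
print(r.text[:500])  # XML nodeList: member and coordinating nodes
```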
13. The rest of the story
www.cdlib.org/uc3
John.Kunze@ucop.edu
uc3@ucop.edu for service questions
Editor's Notes
Panel: Partnerships between institutional repositories, domain repositories, and publishers. 20-25 mins, 9:30-11am. The 'data management ecosystem' angle seems appropriate for the panel, but feel free to share some of the technical aspects with the audience, too. Partnerships happen via conventions and APIs, e.g. data citation conventions. Libraries are chipping away on several fronts to try to shrink this "data curation" problem to a more manageable size, and they are offering a great deal of support for data management planning, data citation, identifier and repository services, repository federation, and “data publication”.
Research data can be seen to fit in a kind of ecosystem of inter-dependent stakeholder niches, each niche depending on others. In a broad sense, partnerships are about dependencies. Besides explicit partnerships between publishers and institutional and domain repositories, there are other critical inter-dependencies, essentially implicit partnerships. Libraries act as neutral connectors to sub-partners in system development and collection building, linking with museums and archives.
Development partners: DMPTool: U Va, Smithsonian, DCC, et al. DataUp: MSRC, GBMF, D1. WAS: LC, UNT, NYU, et al. User partners (clients, patrons, customers): any.
Partners: JISC/EDINA, paying customers on two continents
D1 network partners all over the world
Partnering with eScholarship and UC campuses for collection building
Partnering with JISC/EDINA, DataCite, the Research Data Alliance
Each member partners with regional data repositories. DataCite partners with publishers (eg, T-R) for a data citation index: credit, discovery, impact tracking; helping data authors verify use of their data, and helping identify how others have used the data. With archiving: re-use and reproducibility.