OSFair2017 Training | FAIR metrics - Starring your data sets | Open Science Fair
Peter Doorn, Marjan Grootveld & Elly Dijk talk about FAIR data principles and present the assessment tool that DANS is developing for data repositories | OSFair2017 Workshop
Workshop title: FAIR metrics - Starring your data sets
Workshop overview:
Do you want to join our effort to put the FAIR data principles into practice? Come and explore the assessment tool that DANS, Data Archiving and Networked Services in the Netherlands, is developing for data repositories.
The aim of our work is to implement the FAIR principles in a data assessment tool, so that every dataset deposited in or reused from any digital repository can be scored on the principles Findable, Accessible, Interoperable, and Reusable, using a ‘FAIRness’ scale from 1 to 5 stars. In this interactive session participants can explore the pilot version of FAIRdat, the FAIR data assessment tool. The organisers will introduce the project and welcome all feedback on improving the tool or the metrics it uses.
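The abstract does not describe how FAIRdat actually computes its stars, but the idea of a separate 1-5 star rating per principle can be sketched as follows. The scoring rule (passed checks mapped onto stars) and the example checks are illustrative assumptions, not the tool's real metric:

```python
# Illustrative sketch of a 1-5 star 'FAIRness' rating per principle.
# The scoring rule here is invented for illustration; FAIRdat's actual
# metrics are not described in the abstract.

def star_rating(checks_passed: int, checks_total: int) -> int:
    """Map a fraction of passed checks onto a 1-5 star scale."""
    if checks_total == 0:
        return 1
    fraction = checks_passed / checks_total
    return max(1, min(5, 1 + round(fraction * 4)))

def assess(dataset_checks: dict) -> dict:
    """Rate each FAIR principle separately, as the abstract describes."""
    return {
        principle: star_rating(passed, total)
        for principle, (passed, total) in dataset_checks.items()
    }

# A hypothetical dataset that passes all Findability checks
# but only some of the others:
scores = assess({
    "Findable": (4, 4),
    "Accessible": (2, 4),
    "Interoperable": (1, 4),
    "Reusable": (3, 4),
})
print(scores)  # {'Findable': 5, 'Accessible': 3, 'Interoperable': 2, 'Reusable': 4}
```

Rating each principle independently, rather than producing a single aggregate number, matches the abstract's point that a dataset gets a score on each of Findable, Accessible, Interoperable, and Reusable.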
DAY 3 - PARALLEL SESSION 7
FAIR Data in Trustworthy Data Repositories Webinar - 12-13 December 2016 | EUDAT
| www.eudat.eu | This webinar was co-organised by DANS, EUDAT and OpenAIRE and was held on 12th and 13th December 2016.
Everybody wants to play FAIR, but how do we put the principles into practice?
There is a growing demand for quality criteria for research datasets. In this webinar we will argue that the DSA (Data Seal of Approval for data repositories) and FAIR principles get as close as possible to giving quality criteria for research data. They do not do this by trying to make value judgements about the content of datasets, but rather by qualifying the fitness for data reuse in an impartial and measurable way. By bringing the ideas of the DSA and FAIR together, we will be able to offer an operationalization that can be implemented in any certified Trustworthy Digital Repository.
In 2014 the FAIR Guiding Principles (Findable, Accessible, Interoperable and Reusable) were formulated. The well-chosen FAIR acronym is highly attractive: it is one of those ideas that almost automatically sticks in your mind once you have heard it. Within a relatively short time, the FAIR data principles have been adopted by many stakeholder groups, including research funders.
The FAIR principles are remarkably similar to the underlying principles of the DSA (2005): the data can be found on the Internet, are accessible (with clear rights and licences), are in a usable format, are reliable, and are identified in a unique and persistent way so that they can be referred to. Essentially, the DSA presents quality criteria for digital repositories, whereas the FAIR principles target individual datasets.
In this webinar the two sets of principles will be discussed and compared and a tangible operationalization will be presented.
An introduction to the FAIR principles and a discussion of key issues that must be addressed to ensure data are findable, accessible, interoperable and reusable. The session explored the role of the CDISC and DDI standards in addressing these issues.
Presented by Gareth Knight at the ADMIT Network conference, organised by the Association for Data Management in the Tropics, in Antwerp, Belgium on December 1st 2015.
Core Trust Seal for Trustworthy Data Repositories, 2018-04-19 | Ciarán Quinn
The CONUL Research Group sponsored a workshop on the certification of digital repositories, focusing on the CoreTrustSeal framework. The workshop was presented by Dr John B Howard of University College Dublin, a founding member of the CoreTrustSeal Board of Directors.
The workshop reviewed the concept of "trust" in the context of data management and Open Science, identifying stakeholders and the factors that make trustworthiness a significant concern for managers of digital repository services. An overview was given of initiatives and services that provide a basis for the certification of digital repositories, including the European Framework for Audit & Certification of Digital Repositories.
The focus was on the CoreTrustSeal, a framework that represents a merger of the previously separate Data Seal of Approval and ICSU World Data System assessment and certification approaches. Attendees were introduced to the assessment process and requirements, including a walkthrough of the 17 assessment categories in the CTS questionnaire.
This presentation was provided by Marilyn White, Katelynd Bucher, and Briget Wynne, all of NIST, during the NISO webinar, Engineering Access Under the Hood, Part Two, held on November 15, 2017.
How the Core Trust Seal (CTS) Enables FAIR Data | dri_ireland
Presentation by Natalie Harrower, Director of the Digital Repository of Ireland, on how the Core Trust Seal requirements and implementation process help prepare a digital repository for supporting FAIR data.
Presentation at the 'Services to Support FAIR data' workshop in Vienna on 24th April 2019. The workshop series was supported by OpenAIRE, the Research Data Alliance, FAIRsFAIR and EOSC-hub.
How core trust seal enables FAIR data - Natalie Harrower | OpenAIRE
How core trust seal enables FAIR data, presented by Natalie Harrower during the OpenAIRE workshop Services to support FAIR data, Vienna: https://www.openaire.eu/openaire-workshop-making-services-fair-vienna-april-24th-2019
This presentation was provided by Adam Rusbridge of EDINA during a NISO webinar on the topic of 'Providing Access: Ensuring What Libraries Have Licensed is What Users Can Reach' on Feb 8, 2017.
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT | Tony Ross-Hellauer
OpenAIRE and EUDAT co-present this webinar which aims to introduce researchers and others to the concept of research data management (RDM). As well as presenting the benefits of taking an active approach to research data management – including increased speed and ease of access, efficiency (fund once, reuse many times), and improved quality and transparency of research – the webinar will advise on strategies for successful RDM, resources to help manage data effectively, choosing where to store and deposit data, the EC H2020 Open Data Pilot and the basics of data management, stewardship and archiving.
Webinar recording available: http://www.instantpresenter.com/eifl/EB57D6888147
EUDAT Research Data Management | www.eudat.eu | EUDAT
| www.eudat.eu | The presentation gives an introduction to Research Data Management, explaining why it is important to manage and share data.
November 2016
The academic research data lifecycle. Session 1.4 of the RDMRose v3 materials.
The JISC funded RDMRose project (June 2012-May 2013) was a collaboration between the libraries of the University of Leeds, Sheffield and York, with the Information School at Sheffield to provide an Open Educational Resource for information professionals on Research Data Management. The materials were revised between November 2014 and February 2015 for the consortium of North West Academic Libraries (NoWAL).
http://www.sheffield.ac.uk/is/research/projects/rdmrose
Persistent Identifiers in EUDAT services | www.eudat.eu | EUDAT
| www.eudat.eu | The EUDAT data domain handles registered data. Each digital object should have a persistent identifier. This persistent identifier is used for: Replica identification; Identification of the repository of record (in the case of replication); Querying of additional information; Checksum (time stamped)...
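The kinds of information the slide lists for a persistent identifier (replica identification, repository of record, a time-stamped checksum) can be sketched as a minimal PID record. The field names, the handle prefix, and the record layout below are illustrative assumptions; they do not reproduce EUDAT's actual handle-record structure:

```python
# Minimal sketch of a persistent-identifier record carrying the kinds of
# information the slide lists: replica identification, the repository of
# record, and a time-stamped checksum. Field names and the handle prefix
# are invented for illustration, not EUDAT's actual record layout.
import hashlib
import time

def make_pid_record(prefix: str, suffix: str, data: bytes,
                    repository_of_record: str, replicas: list) -> dict:
    return {
        "pid": f"{prefix}/{suffix}",
        "repository_of_record": repository_of_record,
        "replicas": replicas,  # PIDs of replica copies
        "checksum": hashlib.sha256(data).hexdigest(),
        "checksum_timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ",
                                            time.gmtime()),
    }

record = make_pid_record(
    prefix="21.T12345",                 # hypothetical handle prefix
    suffix="abc-123",
    data=b"example dataset bytes",
    repository_of_record="repo.example.org",
    replicas=["21.T12345/abc-123-r1"],
)
print(record["pid"])  # 21.T12345/abc-123
```

Storing the checksum with a timestamp, as the slide suggests, lets a repository later verify that a replica still matches the bytes that were registered.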
Data Publishing Models by Sünje Dallmeier-Tiessen | datascienceiqss
Data Publishing is becoming an integral part of scholarly communication today. Thus, it is indispensable to understand how data publishing works across disciplines. Are there best practices others can learn from or even data publishing standards? How do they impact interoperability in the Open Science landscape? The presentation will look at a range of examples, and the main building blocks of data publishing today. The work has been conducted as part of the RDA Data Publishing Workflows group.
Leverage DSpace for an enterprise, mission-critical platform | Andrea Bollini
Conference: Open Repository, Indianapolis, 8-12 June 2015
Presenters: Andrea Bollini, Michele Mennielli
Cineca, Italy
We would like to share some useful tips with the DSpace community, starting with how to embed DSpace in a larger IT ecosystem that can add value to the information it manages. We will then show how publication data in DSpace - enriched through proper use of the authority framework - can be combined with information from the HR system. Thanks to this, the system can provide rich, detailed reports and analysis through a business intelligence solution based on Pentaho's Mondrian OLAP open-source data integration tools.
We will also present other use cases related to managing publication information for reporting purposes: the publication record has a longer lifecycle than in a basic IR; system load is much higher, especially for writes, since researchers need to be able to enrich their data whenever new requirements arrive from the government or the university research office; and data quality requires the ability to make distributed changes to a publication even after a validation workflow has concluded.
Finally, we will present our direct experience and the challenges we faced in making DSpace easily and rapidly deployable to more than 60 sites.
A presentation on FAIR, FAIRsharing and the FAIR ecosystem for the ENVRI-FAIR community on the 13th December 2019. This presentation covers the basics of what FAIR is, how FAIRsharing can help 'FAIRify' standards, repositories, knowledgebases and data policies, and then the connections FAIRsharing has with other initiatives, such as the FAIR Evaluator, Data Stewardship Wizard, our RDA WG, GO-FAIR and EOSC-Life.
Turning FAIR into Reality - Role for Libraries | dri_ireland
Presentation by Dr. Natalie Harrower, Director of the Digital Repository of Ireland and European Commission FAIR data expert group member, on the role librarians can play in the FAIR ecosystem. "Applying the FAIR data principles in day-to-day library practice" session by the Research Data Management Working Group, LIBER Steering Committee Research Infrastructures, LIBER2019, Dublin, 26 June 2019.
PARTHENOS Common Policies and Implementation Strategies | Parthenos
Presentation by Hella Hollander for the PARTHENOS workshop "Introducing PARTHENOS - Integrating the Digital Humanities" on 14 December 2016 in Prato, Italy.
Researchers require infrastructures that ensure a maximum of accessibility, stability and reliability to facilitate working with and sharing research data. Such infrastructures are increasingly summarised under the term Research Data Repositories (RDR). The project re3data.org – Registry of Research Data Repositories – began to index research data repositories in 2012 and offers researchers, funding organisations, libraries and publishers an overview of the heterogeneous research data repository landscape. In December 2014 re3data.org listed more than 1,030 research data repositories, which are described in detail using the re3data.org schema (http://dx.doi.org/10.2312/re3.003). Information icons help researchers easily identify an adequate repository for the storage and reuse of their data. This talk describes the heterogeneous RDR landscape and presents a typology of institutional, disciplinary, multidisciplinary and project-specific RDRs. Further, it outlines the features of re3data.org and shows current developments for integration into data management planning tools and other services.
By the end of 2015 re3data.org and Databib (Purdue University, USA) will merge their services, which will then be managed under the auspices of DataCite. The aim of this merger is to reduce duplication of effort and to serve the research community better with a single, sustainable registry of research data repositories. The talk will present this organisational development as a best practice example for the development of international research information services.
This slideshow was used in an Introduction to Research Data Management course taught for the Mathematical, Physical and Life Sciences Division, University of Oxford, on 2017-02-15. It provides an overview of some key issues, looking at both day-to-day data management, and longer term issues, including sharing, and curation.
Presentation by Hugo Leroux and Liming Zhu, CSIRO, to the 'Unlocking value from publicly funded Clinical Research Data' workshop, cohosted by ARDC and CSIRO at ANU on 6 March 2019.
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving... | Sarah Anna Stewart
Presentation given at the M25 Consortium of Academic Libraries, CPD25 Event on 'The Role of the Library in Supporting Research'. Provides an introduction to data, software and PIDs and a brief look at how libraries can enable researchers to gain impact and credit for their research data and software.
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources | Pistoia Alliance
The FAIR (Findable, Accessible, Interoperable and Reusable) principles aim to maximize the discovery and reuse of digital resources. Using recently developed software and metrics to assess FAIRness and supported through an ELIXIR Implementation Study, Michel worked with a subset of ELIXIR Core Data Resources to apply these technologies. In this webinar, he will discuss their approach, findings, and lessons learned towards the understanding and promotion of the FAIR principles.
FAIRsharing - Mapping the Landscape of Databases, Repositories, Standards and... | Peter McQuilton
A 15 minute slide set presented at two workshops at #biocuration2019; the first on ontologies and FAIRification, the second to map the landscape of biocuration.
This presentation introduced participants to the DC 101 course and was given at the Digital Curation and Preservation Outreach and Capacity Building Workshop in Belfast on September 14-15 2009.
http://www.dcc.ac.uk/events/workshops/digital-curation-and-preservation-outreach-and-capacity-building-workshop
OU Library Research Support webinar: Working with research data | IzzyChad
Slides from a webinar delivered on 31st January 2018 for OU research staff and students. Covers practical strategies for managing research data, including policies, file naming, information security, metadata and working with sensitive data.
Text-Fabric: how to do text research in a FAIR way.
Text is one of the simplest and most common data types in computer science.
But there is a lot in text that does not meet the eye, and so people have been annotating texts, century after century.
When you research texts, you consume and produce such annotations.
Suddenly you find yourself in the midst of a big fabric of thoughts, contributed by many authors.
Text-Fabric is a tool that helps you to follow the threads that came before you and to weave a few of your own and add them to the scholarly record.
I'll show you how that looks for clay tablets of the Uruk period (the oldest writing on earth), the much more recent Hebrew Bible, and the ultramodern General Missives of the VOC time.
Towards TextPy, a module for processing text.
If we define annotated text as a graph with additional structure, we can make text processing more efficient, in the same way that Pandas makes processing dataframes more efficient.
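The idea of annotated text as a graph with additional structure can be sketched with plain Python data structures. This is an illustrative model of the concept only, not Text-Fabric's actual data model or API; the example words and features are invented:

```python
# Illustrative model of annotated text as a graph: word slots are leaf
# nodes, higher nodes (phrases, clauses) are linked to the slots they
# contain, and features are annotations attached to any node.
# This mimics the idea, not Text-Fabric's actual API.

slots = ["In", "the", "beginning"]          # word nodes 0..2

# Higher-level nodes point at the slots they embed.
edges = {
    "phrase_1": [0, 1, 2],                  # one phrase spanning all words
}

# Features annotate nodes, e.g. a part-of-speech tag per slot.
features = {
    "pos": {0: "prep", 1: "art", 2: "subs"},
}

def text_of(node):
    """Reconstruct the surface text of a higher node from its slots."""
    return " ".join(slots[i] for i in edges[node])

print(text_of("phrase_1"))   # In the beginning
```

Because annotations live in separate feature tables keyed by node, new contributions can be added without touching the base text, which is the efficiency point the paragraph makes by analogy with Pandas dataframes.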
We demonstrate how Text-Fabric can handle the display of text and annotations, even when chunks of text are not properly embedded in each other. This demo contains examples from the Hebrew Bible and the Old Babylonian Letters (cuneiform clay tablets).
Researchers in ancient text corpora can take control over their data. We show a way to do so by means of Text-Fabric.
Co-production of Cody Kingham and Dirk Roorda
Biblia Hebraica Stuttgartensia Amstelodamensis. Coding the Hebrew Bible with an Open Science ethos: Text-Fabric.
Text-Fabric is two things: (1) a browser for ancient text corpora; (2) a Python 3 package for processing ancient corpora.
A corpus of ancient texts and linguistic annotations represents a large body of knowledge. Text-Fabric makes that knowledge accessible to non-programmers by means of a built-in search interface that runs in your browser.
From there the step to programming your own analytics is not so big anymore, because you can call the Text-Fabric API from your Python programs, and it works really well in Jupyter notebooks.
Developing a tool for handling text with linguistic annotations. Text-Fabric is meant to support researchers who want to contribute portions of the data, and it weaves those contributions into a meaningful whole. Currently, it is primarily meant for working with the Hebrew Bible, based on the ETCBC (Amsterdam) linguistic database.
Conference presentation for 2016 annual meeting of the Society of Biblical Literature, San Antonio. (https://www.sbl-site.org).
Authors: Janet Dyk (linguistic ideas) and Dirk Roorda (computational implementation).
A verb organizes the elements in a sentence. Different patterns of constituents affect the meaning of a verb in a given context. The potential of a verb to combine with patterns of elements is known as its valence. A single set of questions, organized as a flow chart, selects the relevant building blocks within the context of a verb. The resulting pattern provides a particular significance for the verb in question. Because all contexts are submitted to the same flow chart, similarities and differences between verbs come to light. For example, verbs of movement in their causative formation manifest the same patterns as transitive verbs with an object that gets moved. We apply this approach to the whole Hebrew Bible, using the database of the Eep Talstra Centre for Bible and Computer (ETCBC), which contains the relevant linguistic annotations. This allows us to have a complete listing of all patterns for all verbs. It provides the basis for consistent proposals for the significance of specific patterns occurring with a particular verb. The valence results are made available in SHEBANQ, an online research tool based on the ETCBC database. It presents the basic data, text and linguistic features, together with annotations by researchers. The valence results consist of a set of algorithmically generated annotations which show up between the lines of the text. The algorithm itself and its documentation can be found at https://shebanq.ancient-data.org/tools?goto=valence. By using SHEBANQ we achieve several goals with respect to the scholarly workflow: (1) all our results are openly accessible online, and other researchers may comment on them; (2) all resources needed to reproduce this research are available online and can be downloaded (Open Access).
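The flow-chart idea described above, a single fixed sequence of questions about the constituents around a verb occurrence, can be sketched as a small decision function. The questions and pattern labels below are invented for illustration; the real flow chart and its categories are in the SHEBANQ valence tool linked in the abstract:

```python
# Toy sketch of a valence flow chart: a fixed sequence of questions about
# the constituents around a verb occurrence yields a pattern label.
# The questions and labels are illustrative, not the ETCBC categories.

def valence_pattern(has_object: bool, has_indirect_object: bool,
                    has_complement: bool) -> str:
    if has_object and has_indirect_object:
        return "ditransitive"
    if has_object:
        return "transitive"
    if has_complement:
        return "complemented intransitive"
    return "plain intransitive"

# Every occurrence goes through the same questions, so the resulting
# patterns are directly comparable across verbs, as the abstract argues.
print(valence_pattern(True, False, False))   # transitive
```

Running every verb occurrence through one and the same decision procedure is what makes similarities and differences between verbs visible, which is the methodological point of the paragraph.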
Text as Data: processing the Hebrew Bible | Dirk Roorda
The merits of stand-off markup (LAF) versus inline markup (TEI) for processing text as data. Ideas applied to work with the Hebrew Bible, resulting in tools for researchers and end-users.
Data Management for Research: A Case Study | Dirk Roorda
How practices of data sharing can help researchers to produce more science.
Session in the data management course organized by RDNL (Research Data Netherlands).
Hebrew Bible as Data: Laboratory, Sharing, Lessons | Dirk Roorda
Recently, the Hebrew Bible has been published online as a database. We show what you can do with it, and how to share your results with others. Work by the Amsterdam scholars of the Eep Talstra Centre for Bible and Computer, supported by CLARIN-NL.
LAF-Fabric: a tool to process the ETCBC Hebrew Text Database in Linguistic Annotation Framework.
How researchers in theology and linguistics can create workflows to analyse the text of the Hebrew Bible and extract data for visualization. Those workflows can be written in Python, and run conveniently in the IPython Notebook.
Joint work with Martijn Naaijer (VU University).
With the Hebrew Bible encoded in Linguistic Annotation Framework (LAF-ISO), and with a new LAF processing tool, we demonstrate how you can do practical data analysis. The tool, LAF-Fabric, integrates with the ipython notebook approach. Our example here is lexeme cooccurrence analysis of bible books. For now, the road from data to visualization is more important than the exact visualization.
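The kind of lexeme cooccurrence analysis of bible books mentioned here can be sketched in a few lines. The books and lexemes below are a tiny invented stand-in; the actual LAF-Fabric workflow runs over the full ETCBC data in an IPython notebook:

```python
# Sketch of lexeme cooccurrence across "books": for each pair of books,
# count how many lexemes they share. A tiny illustrative stand-in for
# the LAF-Fabric workflow described above, which uses the ETCBC data.
from itertools import combinations

books = {
    "Genesis":  {"bara", "elohim", "shamayim", "erets"},
    "Psalms":   {"elohim", "shamayim", "halal"},
    "Proverbs": {"chokmah", "erets"},
}

shared = {
    (a, b): len(books[a] & books[b])
    for a, b in combinations(sorted(books), 2)
}
print(shared)
# {('Genesis', 'Proverbs'): 1, ('Genesis', 'Psalms'): 2, ('Proverbs', 'Psalms'): 0}
```

A matrix of such pairwise counts is exactly the kind of data that then feeds a visualization step, which is the road from data to visualization the abstract emphasises.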
The people of Punjab felt alienated from main stream due to denial of their just demands during a long democratic struggle since independence. As it happen all over the word, it led to militant struggle with great loss of lives of military, police and civilian personnel. Killing of Indira Gandhi and massacre of innocent Sikhs in Delhi and other India cities was also associated with this movement.
We all have good and bad thoughts from time to time and situation to situation. We are bombarded daily with spiraling thoughts(both negative and positive) creating all-consuming feel , making us difficult to manage with associated suffering. Good thoughts are like our Mob Signal (Positive thought) amidst noise(negative thought) in the atmosphere. Negative thoughts like noise outweigh positive thoughts. These thoughts often create unwanted confusion, trouble, stress and frustration in our mind as well as chaos in our physical world. Negative thoughts are also known as “distorted thinking”.
Model Attribute Check Company Auto PropertyCeline George
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
Instructions for Submissions thorugh G- Classroom.pptxJheel Barad
This presentation provides a briefing on how to upload submissions and documents in Google Classroom. It was prepared as part of an orientation for new Sainik School in-service teacher trainees. As a training officer, my goal is to ensure that you are comfortable and proficient with this essential tool for managing assignments and fostering student engagement.
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
The Roman Empire A Historical Colossus.pdfkaushalkr1407
The Roman Empire, a vast and enduring power, stands as one of history's most remarkable civilizations, leaving an indelible imprint on the world. It emerged from the Roman Republic, transitioning into an imperial powerhouse under the leadership of Augustus Caesar in 27 BCE. This transformation marked the beginning of an era defined by unprecedented territorial expansion, architectural marvels, and profound cultural influence.
The empire's roots lie in the city of Rome, founded, according to legend, by Romulus in 753 BCE. Over centuries, Rome evolved from a small settlement to a formidable republic, characterized by a complex political system with elected officials and checks on power. However, internal strife, class conflicts, and military ambitions paved the way for the end of the Republic. Julius Caesar’s dictatorship and subsequent assassination in 44 BCE created a power vacuum, leading to a civil war. Octavian, later Augustus, emerged victorious, heralding the Roman Empire’s birth.
Under Augustus, the empire experienced the Pax Romana, a 200-year period of relative peace and stability. Augustus reformed the military, established efficient administrative systems, and initiated grand construction projects. The empire's borders expanded, encompassing territories from Britain to Egypt and from Spain to the Euphrates. Roman legions, renowned for their discipline and engineering prowess, secured and maintained these vast territories, building roads, fortifications, and cities that facilitated control and integration.
The Roman Empire’s society was hierarchical, with a rigid class system. At the top were the patricians, wealthy elites who held significant political power. Below them were the plebeians, free citizens with limited political influence, and the vast numbers of slaves who formed the backbone of the economy. The family unit was central, governed by the paterfamilias, the male head who held absolute authority.
Culturally, the Romans were eclectic, absorbing and adapting elements from the civilizations they encountered, particularly the Greeks. Roman art, literature, and philosophy reflected this synthesis, creating a rich cultural tapestry. Latin, the Roman language, became the lingua franca of the Western world, influencing numerous modern languages.
Roman architecture and engineering achievements were monumental. They perfected the arch, vault, and dome, constructing enduring structures like the Colosseum, Pantheon, and aqueducts. These engineering marvels not only showcased Roman ingenuity but also served practical purposes, from public entertainment to water supply.
3. Overview
• Introduction and Theory
• qualities
• trust, simplicity
• guidelines
• Process and Demo
• assessment and review
• Discussion and Application
• CLARIN centers
• language resources
8. Quality control
• by the stakeholders
• data producers
• data custodians
• data consumers
• custodians = repositories
• substantial role for repositories
• guidelines for producers
• agreements for consumers
9. Quality issues
• metadata standards
• CMDI and www.isocat.org
• preferred formats
• TEI, XML
• referencing systems
• persistent identifiers
• long term preservation
• after the live environment has been retired
• interoperability
• OAI-PMH
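The interoperability point above names OAI-PMH, the harvesting protocol repositories expose so that metadata can be collected by aggregators. A minimal sketch of what a harvest looks like, assuming a hypothetical endpoint URL; the XML is an illustrative fragment, not output from a real repository:

```python
# Sketch of OAI-PMH harvesting. The endpoint URL is hypothetical and the
# sample response below is illustrative only.
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"

def extract_titles(oai_xml):
    """Collect dc:title values from a ListRecords response."""
    root = ET.fromstring(oai_xml)
    return [el.text for el in root.iter(DC_NS + "title")]

# Illustrative response fragment:
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Example corpus</dc:title>
    </oai_dc:dc>
  </metadata></record></ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.org/oai"))
print(extract_titles(sample))
```

Because every repository answers the same verbs (`ListRecords`, `GetRecord`, `Identify`) in the same XML envelope, one harvester works against any compliant archive, which is exactly the interoperability the guideline is after.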
10. Quality issues
• search engines
• CLARIN search and develop
• access rights
• comply with privacy law, copyright law
• respect the people from whom data is obtained
• accountability
• for all repository operations
11. Quality and Trust
• imperfection lurks everywhere
• trust works where certainty blocks
• trust is a process
• to greater quality
• to better relationships
• to more certainty
12. Quality and Simplicity
• reduce
• organize
• time
• learn
• differences
• context
• emotion
• trust
• failure
• focus: subtract what is obvious, add what is meaningful
http://lawsofsimplicity.com/
13. Guidelines: producers
http://www.datasealofapproval.org/
1. The data producer deposits the research data in a data repository with sufficient information for others to assess the scientific and scholarly quality of the research data and compliance with disciplinary and ethical norms.
2. The data producer provides the research data in formats recommended by the data repository.
3. The data producer provides the research data together with the metadata requested by the data repository.
14. Guidelines: consumers
http://www.datasealofapproval.org/
14. The data consumer complies with access regulations set by the data repository.
15. The data consumer conforms to and agrees with any codes of conduct that are generally accepted in higher education and research for the exchange and proper use of knowledge and information.
16. The data consumer respects the applicable licenses of the data repository regarding the use of the research data.
15. Guidelines: repositories
http://www.datasealofapproval.org/
4. The data repository has an explicit mission in the area of digital archiving and promulgates it.
5. The data repository uses due diligence to ensure compliance with legal regulations and contracts including, when applicable, regulations governing the protection of human subjects.
6. The data repository applies documented processes and procedures for managing data storage.
7. The data repository has a plan for long-term preservation of its digital assets.
16. Guidelines: repositories
http://www.datasealofapproval.org/
8. Archiving takes place according to explicit workflows across the data life cycle.
9. The data repository assumes responsibility from the data producers for access and availability of the digital objects.
10. The data repository enables the users to utilize the research data and refer to them.
11. The data repository ensures the integrity of the digital objects and the metadata.
12. The data repository ensures the authenticity of the digital objects and the metadata.
13. The technical infrastructure explicitly supports the tasks and functions described in internationally accepted archival standards like OAIS.
17. Guidelines: outsourcing
http://www.datasealofapproval.org/
repositories may outsource digital preservation to specialist repositories:
• implement all guidelines except 4, 6, 7, 8 and 13
• store a copy of the data in another Trustworthy Digital Repository (TDR) that has acquired the DSA logo by implementing each of the sixteen guidelines (including 4, 6, 7, 8 and 13)
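The outsourcing rule above is simple set arithmetic over the sixteen guideline numbers. A sketch, with helper names of our own invention; the guideline numbering follows these slides (1-3 producers, 4-13 repositories, 14-16 consumers):

```python
# Sketch of the DSA outsourcing rule from this slide; function names are ours.
ALL_GUIDELINES = set(range(1, 17))
OUTSOURCEABLE = {4, 6, 7, 8, 13}  # may be delegated to a specialist TDR

def own_obligations(outsourcing=False):
    """Guidelines the repository must implement itself."""
    return ALL_GUIDELINES - OUTSOURCEABLE if outsourcing else ALL_GUIDELINES

def partner_obligations(outsourcing=False):
    """Guidelines the specialist TDR holding the copy must implement."""
    return ALL_GUIDELINES if outsourcing else set()

print(sorted(own_obligations(outsourcing=True)))
# → [1, 2, 3, 5, 9, 10, 11, 12, 14, 15, 16]
```

Note the asymmetry: outsourcing shrinks the repository's own list, but the partner must still cover all sixteen, so no guideline is ever left unimplemented in the chain.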
18. Seal of Approval
• a repository shows it on its webpage
• if conditions are fulfilled
• as testified by a self-assessment with reviews
• on a yearly basis
• the exact level of compliance is transparently published under the seal
19. Assessment and review
minimum requirements; the threshold will go up as time proceeds

score | actions taken        | comments                | issues
*     | nothing done         | give a reason           |
**    | theoretical concept  | point to initiation doc | describe main issues
***   | implementation phase | point to definition doc | describe main issues
****  | fully implemented    | point to definition doc |
N/A   | not applicable       | give a reason           |
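The star rubric above is essentially a lookup table. A sketch of it as a data structure, with a hypothetical helper that is not part of any DSA tooling:

```python
# The assessment rubric from this slide as a lookup table.
# (score -> (actions taken, evidence document, comment requirement))
RUBRIC = {
    "*":    ("nothing done",         None,             "give a reason"),
    "**":   ("theoretical concept",  "initiation doc", "describe main issues"),
    "***":  ("implementation phase", "definition doc", "describe main issues"),
    "****": ("fully implemented",    "definition doc", None),
    "N/A":  ("not applicable",       None,             "give a reason"),
}

def required_fields(score):
    """List the supporting material a self-assessment entry must supply."""
    actions, evidence, comment = RUBRIC[score]
    needs = []
    if evidence:
        needs.append(f"point to {evidence}")
    if comment:
        needs.append(comment)
    return needs

print(required_fields("**"))
# → ['point to initiation doc', 'describe main issues']
```

Encoding the rubric this way makes the "threshold will go up" remark concrete: raising the bar is a change to the table, not to the review procedure around it.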
20. Organisation
• repositories represented by a board
• tools to facilitate the procedure
• modification record
• the DSA website links to compliant repositories
23. CLARIN centres
• A = provide infrastructure
• managing the federation
• B = provide services
• data and webservices
• C = provide metadata
• harvestable metadata
• R = respected = recognised
• offer LRT (language resources and technology) in whatever form
• E = external
• offer non-LRT resources or services
• identity federations
• national libraries
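The centre typology above is a small taxonomy, which can be captured as a data structure. A sketch under the assumption that the letter codes from this slide are the keys; the helper name is ours:

```python
# The CLARIN centre typology from this slide as a lookup table.
CENTRE_TYPES = {
    "A": "provides infrastructure (manages the federation)",
    "B": "provides services (data and webservices)",
    "C": "provides metadata (harvestable metadata)",
    "R": "respected/recognised (offers LRT resources in whatever form)",
    "E": "external (offers non-LRT resources or services, "
         "e.g. identity federations, national libraries)",
}

def describe(centre_type):
    """Look up a centre type by its letter code, case-insensitively."""
    return CENTRE_TYPES.get(centre_type.upper(), "unknown centre type")

print(describe("b"))
# → provides services (data and webservices)
```

This mirrors how the group assignment on the next slide is framed: each guideline proposal targets a subset of these letter codes.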
24. Group assignment
• P(roducers)
• invent p-guidelines for B/C centers
• R(epositories)
• invent r-guidelines for A/B centers
• C(onsumers)
• invent c-guidelines for B/C/R centers
Suggestions for
• assessment
• review
• modification record
25. Wrap-up: P-Group
metadata about background: information about the researchers (who, why, publications, DAI)
In IMDI it is difficult to update information (e.g. affiliation changes); use unique identifiers for participants when building a corpus, store records of people, and link from the metadata of resources to the records of people.
The usability of formats depends on the formats: formats may be standardised but still not usable for researchers. "I do not want to wrap my data in dead formats": repositories should support innovation in this respect when it is driven by researchers.
26. Wrap-up: C-group
the goal is finding information in a repository; we need:
• an overview of access rights
• a proper web connection to the repository
• a user-friendly interface
• a low threshold for feedback on new features
• to be part of the chain in the design of the access tools
Guidelines:
• we want all centers in the chain that provide us with the information we need to offer us transparency and verifiability on how their data is obtained, processed and controlled/managed
• we want tools with clear copyright permissions that have a
27. Wrap-up: R-group
• we provide infrastructure and management for data
• we want to standardize our materials
• we need knowledge: the right metadata about the material that is coming to us
• we want the materials in the right format, allowing for some flexibility
• retro-archiving: we offer tools for converting legacy data, so that producers may submit raw materials
• management of data concerning legal access: protect the providers, so that the providers can trust the consumers (licensing forms)
• share knowledge about the services we provide with potential users (people working in the field) and with other repositories
• we want a forum as an instrument for developing trust between producers and consumers: the community becomes more transparent
1. reduce (restrict to the most important issues; a few guidelines will do)
2. organize (group the guidelines in sections for producers, custodians, consumers)
3. time (save time by a smooth assessment process)
4. learn (use expertise in preservation)
5. differences (reintroduce complexity in a controlled way, because sometimes it is needed)
6. context (exploit knowledge of the community and the requirements of the users)
7. emotion (do not make it purely bureaucratic; keep the feeling of value and enjoy good relationships with stakeholders)
8. trust (trust by default, but know where your undo button is, even with the ones you trust)
9. failure (learn from failures; improve the guidelines and the assessment procedures)
10. focus (subtract what is obvious, add what is meaningful: this is not about the data in bank accounts, highly sensitive medical data, or company archives, but about research data; the scientific value is protected by the guidelines)