RDA and Adoption 
Early WG Report back session 
September 23, 2014
Happy Birthday! 2 
http://cdn.cakecentral.com/d/d3/900x900px-LL-d3548099_gallery6680631282672149.jpeg
3 
What did we learn? 
§ Motivated groups of people can do a lot 
§ But we are relying too much on volunteer labour 
contributed on top of over-full lives 
§ Looks like the RDA-challenge goal of 12-18 months is 
achievable 
§ But IGs also provide valuable space for longer-term 
interaction 
§ We need to reduce friction in our processes 
§ But the organisation is maturing rapidly
4 
RDA and Outputs 
§ RDA will only deliver on its promise if it produces 
deliverables, and those deliverables become adopted 
outside the groups that created them 
§ Consequential TAB foci: 
§ proposals for new groups – adoption plans? 
§ tracking groups underway – fit for purpose? 
§ monitoring of adoption once groups conclude – actually 
adopted? 
§ So, how can we most usefully think through the process 
of adoption?
5 
Diffusion and RDA 
§ Adoption can be seen as the end result of a diffusion 
process. This diffusion process involves 
§ awareness 
§ interest 
§ evaluation 
§ trial 
§ adoption 
§ RDA has a role to play in 
§ supporting each stage 
§ making the transitions from one stage to the next more likely
6 
Important questions 
1. How do we talk about data? 
2. How can we describe the data? 
3. Can we optimize addressing the data? 
4. How can we get trust in our infrastructure?
7 
What 
§ Base infrastructure 
§ (Coincidence, also social groups!) 
§ Let's agree on Terms. (DFT) 
§ Descriptions for Interoperability. (DTR) 
§ Scaling across PID systems. (PIT) 
§ Building policies into the infrastructure. (PP)
8 
These groups 
§ Amplify each other 
§ Use each other's outputs 
§ Have to interlock properly 
§ Will continue the effort after they finish.
Data Foundation and Terminology 
Chairs: Gary Berg-Cross, Raphael Ritz, Peter Wittenburg
Task 10 
Bob Kahn: 
You need to know where you are talking about. 
DFT mission: understand what the core of the data domain 
is, develop definitions of core terms based on data models. 
DFT is part of coming to an agreed culture in RDA. 
Scope: 
AND only speak about domain of registered data. 
§ knowing that there is a lot of non-registered data 
§ knowing that some disciplines are further away from 
what we are discussing as necessity
DFT WG Activities & Accomplishments 11 
§ Drafted 4 related Model Documents on core 
work: 
1. Data Models 1: Overview – 22+ models 
2. Data Models 2: Analysis & Synthesis 
3. Data Models 3: Term Snapshot 
4. Data Models 4: Use Cases 
(Work with other RDA WGs on use cases to 
illustrate data concepts) 
§ Developed Semantic Media Wiki Term Tool to 
capture initial list of terms and definitions for 
discussions, demo held at P3 
(open for others and “persistent”) 
Candidate List evolved to Consolidated List
12 
Our Core Terms in simple Words :-) 
§ digital object (DO) 
§ persistent identifier 
§ PID resolution system 
§ metadata 
§ aggregation 
§ digital collection 
§ (digital) repository 
§ bitstream 
§ state information 
Need to put the relations between terms into the documents. 
On purpose no formal ontology (yet) and no terminologist's exactness, 
since we made definitions for data practitioners first.
13 
Definitions & Process 
§ A digital collection is an aggregation of DOs that is identified by 
a PID and described by metadata. 
§ Note: A digital collection is a (complex) DO. 
§ Variants 
§ A collection is a form of aggregation of elements that has an identity of its own separate from the 
identity of the elements. 
§ Collection is defined as a “group of objects gathered together for some intellectual, artistic curatorial 
purpose.” 
§ A digital collection is a type of aggregation formed by a collection process on existing data and data 
sets where the collected data is in digital form. 
§ Collection is a type of aggregation obeying part-role relations and is a digital object since it has a 
PID to be referable and metadata describing its properties. 
§ A Digital Collection is an organized aggregation or other grouping of distinct DOs that are related by 
some criteria and where the collection is described by metadata. A Digital Collection may also be 
identified by a unique persistent identifier, in which case the collection may be construed as a DO. 
(Kahn et al.) 
§ Conclusion points 
§ purpose and process of aggregation/collection building and part relations are not 
relevant for the definition 
§ remember: only speak about domain of registered DOs.
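The working definition above (a digital collection is an aggregation of DOs that itself has a PID and metadata, and is therefore a DO) can be illustrated with a small Python sketch. This is only one possible reading, not a DFT deliverable: the class names, the field names and the PID strings are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """A registered digital object: a bitstream with a PID and metadata."""
    pid: str                  # persistent identifier (placeholder values below)
    metadata: dict            # descriptive metadata
    bitstream: bytes = b""    # the content itself


@dataclass
class DigitalCollection(DigitalObject):
    """An aggregation of DOs that is itself identified by a PID and
    described by metadata, so it is also a (complex) DO."""
    members: list = field(default_factory=list)  # PIDs of the member DOs

    def add(self, obj: DigitalObject) -> None:
        # The collection refers to its members by PID only.
        self.members.append(obj.pid)


# Hypothetical example values, not real identifiers.
scan = DigitalObject(pid="example/0001", metadata={"title": "Scan 1"})
collection = DigitalCollection(pid="example/coll-1", metadata={"title": "Scans"})
collection.add(scan)
```

Modelling the collection as a subclass mirrors the note that a digital collection is a (complex) DO; purpose and process of collection building are deliberately left out, in line with the conclusion points.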
Interactions with others 14 
• Interacted with RDA WGs and IGs. 
• Participated in Munich meeting and Chairs telcos. 
• Part of WG forum discussions 
• also “active” interactions with about 120 groups 
RDA/EU & EUDAT          Interviews   Interactions   Total 
Humanities & Soc Sci             8             13      21 
Environmental                    7              2       9 
Life Sciences                   10              7      17 
Natural Sciences                11             13      24 
Engineering & CS                 -             14      14 
Various disciplines              -             24      24 
others                           4              3       7 
                                40             74     114
Adoption 15 
• What does adoption mean in case of a set of terms? 
• it’s about the interaction process itself within and 
outside of RDA 
• it’s about influencing conceptualization and thus 
harmonizing “language” 
• it’s about changing cultures 
• we have done a lot – many departments & communities 
• why so relevant: 
• report from 120 interactions tells us that data practices 
are a nightmare (report is available) 
• data organizations are so different that data federation 
including “logical information” is too expensive 
• current data science is not reproducible
Objectives until/for P4 16 
1. Go out and intensify interaction based on Snapshot 
§ create condensed statements for different groups (2-page flyer) 
§ interact with other groups in RDA and early adopters 
§ interact with the many communities (outside RDA) we already contacted 
(in Europe ESFRI RI projects: 17th October, Brussels) 
§ encourage people to use the term wiki 
2. Come to new consolidated agreements 
§ consolidated definitions until P5 
§ present the consolidated definitions and tend core term set 
§ identify some people from communities that have adoption talks (no PR!) 
3. Finish some unsolved issues 
§ synthesis: generic flexible enough model to capture terms and their 
relationships 
§ add more use cases 
§ see how to continue maintenance
Thanks for your attention.
Data Type Registries WG 
Outcomes
19 
Problem: Implicit Assumptions in Data 
§ Data sharing requires that data can be parsed, 
understood, and reused by people and applications 
other than those that created the data 
§ How do we do this now? 
§ For documents – formats are enough, e.g., PDF, and then the 
document explains itself to humans 
§ This doesn’t work well with data – numbers are not self-explanatory 
§ What does the number 7 mean in cell B27? 
§ Data producers may not have explicitly specified certain 
details in the data: measurement units, coordinate 
systems, variable names, etc. 
§ Need a way to precisely characterize those assumptions 
such that they can be identified by humans and 
machines that were not closely involved in the data's creation
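To make the "number 7 in cell B27" problem concrete, here is a minimal Python sketch of attaching a type record to a bare value. It is not the Data Type Registry's actual data model or API: the TypeRecord fields, the in-memory registry dictionary and the identifier example-type/water-temperature are all hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeRecord:
    """Hypothetical type record that makes implicit assumptions explicit."""
    type_id: str       # identifier under which the type is registered
    name: str          # human-readable variable name
    unit: str          # measurement unit
    description: str   # free-text explanation for humans


# Stand-in for a registry lookup; a real registry would be a remote service.
REGISTRY = {
    "example-type/water-temperature": TypeRecord(
        type_id="example-type/water-temperature",
        name="water temperature",
        unit="degC",
        description="temperature measured at the sampling site",
    )
}


def describe(value, type_id, registry=REGISTRY):
    """Turn a bare number into a statement a third party can interpret."""
    record = registry[type_id]
    return f"{value} {record.unit} ({record.name}: {record.description})"


explained = describe(7, "example-type/water-temperature")
```

The bare value 7 carries no meaning on its own; once it references a registered type, both a human and a program that were not involved in creating the data can look up the unit and the variable name.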
20 
Goals: Explicate and Share Assumptions using 
Types and Type Registries 
§ Evaluate and identify a few assumptions in data that can 
be codified and shared in order to… 
§ Produce a functioning Registry system that can easily 
be evaluated by organizations before adoption 
§ Highly configurable for changing scope of captured and shared 
assumptions depending on the domain or organization 
§ Supports several Type record dissemination variations 
§ Design for allowing federation between multiple Registry 
instances 
§ The group’s emphasis is not on 
§ Identifying every possible assumption and data characteristic 
applicable for all domains 
§ Technology
21 
Results 
§ Produced a community consensus system – in this case the 
consensus was between the group members 
§ Input from folks from different backgrounds including 
technologists, scientists, policy analysts, etc., is considered 
§ Released a functioning prototype that can be adapted (with no s/w 
changes) for domain-specific use 
§ Not a turnkey solution 
§ Adapt - Evaluate – Adopt cycle is expected at each organization 
or community 
§ Federation between different instances is technically possible 
§ Organizational policies were not discussed due to the lack of 
time 
§ CNRI, a member of the group, has designed and implemented a 
prototype, the latest of which is at: http://typeregistry.org 
§ With the help of an RDA-provided scholar, we seeded the Registry 
with Types that pertain to the geosciences community
22 
Points to Keep in Mind 
§ Data Type Registry is neither a turnkey system 
nor an immediate ROI application 
§ Every organization should nominate a domain 
expert for defining the scope of Type records 
and for seeding their Registry instance 
§ Cross-domain interpretation beyond some basic 
computability needs social processes in place 
§ Data systems such as Type Registries are low-level 
infrastructure systems with wide 
applicability 
§ Network effect plays a significant role in the success of any 
infrastructure
23 
Adoption and Impact 
§ We expect multiple groups to put significant 
efforts into exercising the prototype: 
§ the EUDAT project in Europe, 
§ National Institute of Standards and Technology 
(NIST) in the US, 
§ the International DOI Foundation 
§ (Wo Chang, Digital Data Manager at NIST, 
shares his evaluation plans)
24 
Conclusion – For Now 
§ Adoption plans will continue 
§ The group, or some part of it, will continue to 
work, we hope with RDA’s blessing and maybe 
support. We will have more to say at P5 
§ Future-proofing data is hard work, but is 
essential for long-term data-driven science
WG PID Information Types 
Outcomes
26 
Problem & Goal 
§ PIDs are associated with additional information and this 
information needs to be typed 
§ Harmonization across disciplines and PID providers 
§ What are PID Information Types? 
§ Specify a framework for defining types 
§ Agree on some essential types 
§ Provide technical solutions for interaction with PID types 
§ Provide the tools first, then create types individually
27 
Results 
Insights gained: 
§ Types depend on use cases and semantics differ between 
disciplines 
§ There is no single set of types fitting all cases 
§ Community processes must define types from practical adoption 
Final deliverables available: 
§ Type examples and illustrating use cases 
§ Types registered in the Type Registry prototype 
§ API description and prototypic implementation 
§ Client demonstrator GUI
Registered types enable cross-services 28 
[Figure: Format, Checksum and Size fields shown on either side of a Verification service]
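The cross-service idea on this slide can be sketched in Python: if Size and Checksum are registered types with agreed semantics, a generic verification service can check any object without knowing its discipline. This is a rough sketch under stated assumptions, not the PIT API: the record layout (a plain dictionary), the function name verify and the choice of SHA-256 are assumptions; the slide does not name a checksum algorithm.

```python
import hashlib


def verify(data: bytes, pid_record: dict) -> dict:
    """Compare a bitstream against the typed values stored with its PID.

    `pid_record` is a hypothetical stand-in for the information a PID
    resolves to; it is assumed to hold 'size' and 'checksum' entries.
    """
    return {
        "size": len(data) == pid_record["size"],
        # Assumption: checksums are SHA-256 hex digests.
        "checksum": hashlib.sha256(data).hexdigest() == pid_record["checksum"],
    }


# Hypothetical example object and its PID record.
data = b"example bitstream"
record = {
    "format": "application/octet-stream",
    "size": len(data),
    "checksum": hashlib.sha256(data).hexdigest(),
}

ok = verify(data, record)                 # intact copy
damaged = verify(data + b"x", record)     # copy with one extra byte
```

Because the service only relies on the registered types, the same code could be pointed at records from different PID providers, provided they expose the same types.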
29 
Adoption & Impact 
§ Register your types so they can be adopted and reused, 
making it easier for others to use your data 
§ Information on how to register new types available in the report 
§ Adopt types already being used in your domain to 
increase interoperability 
§ Decouple object management from contents 
§ Simplify client access to data across domains, implementations 
and changes in information models 
§ More lightweight access to information on less accessible 
objects
30 
Possible follow-ups 
§ Adoption of these capabilities by PID infrastructure 
providers 
§ Discipline-specific types, preferably from practical 
adoption 
§ Establish a type ecosystem 
§ Refine data model 
§ Enhance REST API
31 
Conclusions 
§ Draft final report available via the website 
§ Demonstrator web GUI: 
http://smw-rda.esc.rzg.mpg.de/PitApiGui/
Practical Policies 
Outcomes
WG Practical Policies 33
Scenario 
§ Create research data repository 
§ Data: 2 TB, 500,000 files + growing 
+ integrity 
+ access (IG FIM) 
+ publish (publication+PID) 
+ … 
§ Some assertions: policies & rules attached to the data 
Policy: Assertion or assurance that is enforced about a 
collection or a dataset
Problem 
Computer actionable policies 
§ Enforce management, 
§ Automate administrative tasks, 
§ Validate assessment criteria, 
§ Automate scientific analyses 
§ etc. 
A generic set of policies that can be revised and adapted 
by user communities and site managers does not exist. 
§ Domain scientists who want to build-up a collection or 
a repository 
§ Data centers for automating policies
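As an illustration of what "computer actionable" could mean, the following Python sketch expresses an integrity policy as a named, executable check and enforces it over a small collection. It is not taken from the working group's templates or from iRODS or GPFS: the Policy class, the enforce function, the file names and the use of SHA-256 are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class Policy:
    """An assertion about a dataset, expressed as an executable check."""
    name: str
    check: Callable[[bytes, dict], bool]


def integrity_check(content: bytes, attributes: dict) -> bool:
    # Assumption: the stored checksum is a SHA-256 hex digest.
    return hashlib.sha256(content).hexdigest() == attributes["checksum"]


def enforce(policies, collection):
    """Return (object name, policy name) pairs for every violated policy."""
    violations = []
    for name, (content, attributes) in collection.items():
        for policy in policies:
            if not policy.check(content, attributes):
                violations.append((name, policy.name))
    return violations


# Hypothetical collection: one intact file, one with a wrong stored checksum.
collection = {
    "a.dat": (b"abc", {"checksum": hashlib.sha256(b"abc").hexdigest()}),
    "b.dat": (b"xyz", {"checksum": "0" * 64}),
}

violations = enforce([Policy("integrity", integrity_check)], collection)
```

Keeping the policy as data (a name plus a check) is what would let communities share, revise and adapt it, and swap the check for an implementation in their own data management system.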
Goals 
§ To bring together practitioners in policy making and 
policy implementation (nearly all RDA WG/IGs) 
§ To identify typical application scenarios for policies 
such as replication, preservation etc. 
§ To collect and to register practical policies 
§ To enable sharing, revising, adapting, and re-using of 
computer actionable policies
Survey of 30 Institutions for Highest Priority Policies 
Policy                   Importance 
Integrity                       217 
Preservation                    150 
Access control                  126 
Provenance                      108 
Data Management plans            99 
Publication                      75 
Replication                      66 
Data staging                     52 
Federation                       37 
Metadata sharing                 23 
Regulatory                       16 
Collection properties             7 
Identifiers                       7 
Data sharing                      7 
Versioning                        7 
Licensing                         6 
Format                            6 
Data Life Cycle                   6 
Arrangement                       5 
Processing                        5 
In close cooperation with the Engagement Group
[Figure: Collection-based Policies and the 11 important policy areas]
Results 
Identification of 11 important policy areas: 
§ Contextual metadata extraction 
§ Data access control 
§ Data backup 
§ Data format control 
§ Data retention 
§ Disposition 
§ Integrity (including replication) 
§ Notification 
§ Restricted searching 
§ Storage cost reports 
§ Use agreements
Results 
https://www.rd-alliance.org/filedepot?cid=104&fid=556 
Templates 
§ Interactions of policies and DO attributes 
§ Policy descriptions 
§ Technology independent 
§ Reviews of the provided policy areas in progress
Results 
https://www.rd-alliance.org/filedepot?cid=104&fid=553 
§ Examples for implementations: 
§ English language descriptions 
§ iRODS 
§ GPFS 
§ ~50 pages
Impact 
Result: List of policy categories and policies 
§ Improved data center administration 
§ By sharing policies, communities can interoperate and 
share data more effectively 
§ Transparency: basis of establishing trust 
§ Implemented policies: can be used as examples and be 
adapted to specific requirements and other data 
management systems
Adoption 
Target Communities: 
§ Groups managing data collections 
§ Data centers 
First adopters are the institutions/organizations who 
contributed to the results, e.g. RENCI, KIT, OSC, DARIAH, 
RZG, etc.: 
§ EUDAT 
§ CESNET 
§ (DataNet Federation Consortium, WDS ? )
Conclusions 
§ “Outcomes Policy Templates: Practical Policy Working 
Group, September 2014” 
https://www.rd-alliance.org/filedepot?cid=104&fid=556 
§ “Implementations: Practical Policy Working Group, 
September 2014” 
https://www.rd-alliance.org/filedepot?cid=104&fid=553 
§ Work in Progress: Reviews
Conclusions: Next Steps 
§ More interaction with other technical groups 
→ Data Fabric 
→ Publication policies 
§ More interaction with domain specific groups 
→ Adopters 
For information please contact 
§ Reagan Moore rwmoore@renci.org and 
§ Rainer Stotzka rainer.stotzka@kit.edu
WG Practical Policies 
Breakout Session: 
Tuesday September 23, 14:00 – 15:30 
Agenda: 
1. Introduction 
2. Presentation of deliverables 
3. David Antos & Petr Benedikt: “Policy implementations 
on GPFS” 
4. Discussions: 
§ Policy reviews 
§ Adding new policies 
§ Interoperability with other WG/IGs 
§ Adoption
47 
P5 and Adoption Day 
§ More groups will be presenting at P5 
§ Starting to see how different WG outputs can fit together 
§ Ex: Data Fabric 
§ Planning to have a major focus at P5 on adoption of WG 
outputs 
§ Also thinking through how best to accelerate adoption 
and support groups that want to integrate RDA outputs
48 
How you can help! 
§ Get involved in WGs, IGs to ensure outputs meet your 
needs and the needs of your organisation 
§ Encourage your organisation to become aware of RDA 
outputs and evaluate or trial them 
§ Look for places where RDA can make a difference
