Automating rights decisions
ELAG 2017, 08-06-2017
Jeffrey van der Hoeven, Rene Wiermer
info@kb.nl
The dream: Open access to everything for everybody!
In reality: Limited access due to copyright & contracts
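Examples of restrictions (1)
[Figure: timelines (Time ->) for digitized newspapers and digitized books. Axis labels: 1600, 1930, 1945, 1980, 2017 with segments "open" and "closed"; 1400, 1900, 1995, 2017 with segments "open" and "restricted"; one marking "no download".]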
Examples of restrictions (2)
[Figure: scientific articles and datasets. Rows: Publisher A, Publisher B, Publisher Z; axis: journal titles ->; access labels: open, API key, account, reading room only.]
Examples of restrictions (3)
Copyright infringement on photographs
[Figure: Newspaper X, Newspaper Y]
What can I do with this publication about quantum physics?
Do I have access to this?
What can I do with it?
Access to sensitive material
User interaction. Here: accepting terms of use
Needs 1: more information for the end user
- How do I get access?
- What can I do with it?
Improve UX with standardization of rights decisions
Needs 2: One system for multiple applications
- Several websites: Delpher, Geheugen van Nederland, Staten Generaal Digitaal
- Several APIs: URN-Resolver, OAI-PMH, Search services …
Centralize access decisions for better compliance, management and reporting
One change = immediately visible in each application
Needs 3: reducing our digitization backlog
- We have a lot of digital content that requires certain restrictions
- How can we make this accessible to anybody who is allowed to see it?
- We had an “on/off” infrastructure for most of our content
  - Either accessible for everybody or not at all
  - Not flexible enough, blocked workflows
Automation of rights decisions based on
- Metadata (publication date, authors, publisher, type of material, ...)
- Location (e.g. reading room)
- Type of user (e.g. researcher)
Simple approach: extra metadata field?
- For example
  - <rights> FREE|RESTRICTED|CLOSED|... </rights>
  - <license> CC0|CustomContract|... </license>
- Make decision based on the value of that field
- Probably works fine in a lot of scenarios
- But:
  - Does not scale with variation depending on context
    - “Free for users of type researcher and visitors to the reading room, but not outside of it”
  - Needs maintenance over time
  - Missing: why was this decision made?
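The limitation above can be illustrated with a small sketch. It contrasts a decision taken from a stored field alone with one that also looks at the request context. All names here (`StaticFieldSketch`, `permitByField`, `permitByContext`, the `"visitor"` and `"INTERNET"` values) are hypothetical, and the reading of the quoted rule is an assumption.

```java
// Hypothetical sketch: class, method, and value names are illustrative only.
final class StaticFieldSketch {

    // Static approach: the outcome depends only on the stored <rights> value.
    static boolean permitByField(String rights) {
        return "FREE".equals(rights);
    }

    // Context-aware approach: the same RESTRICTED object gets different
    // outcomes depending on who asks and from where.
    // ASSUMPTION: the quoted rule is read as "researchers and visitors, but
    // only while inside the reading room".
    static boolean permitByContext(String rights, String userType, String location) {
        if ("FREE".equals(rights)) {
            return true;
        }
        if ("RESTRICTED".equals(rights)) {
            boolean knownUser = "researcher".equals(userType) || "visitor".equals(userType);
            return knownUser && "READING_ROOM".equals(location);
        }
        return false; // CLOSED and unknown values stay closed
    }

    public static void main(String[] args) {
        System.out.println(permitByField("RESTRICTED"));
        System.out.println(permitByContext("RESTRICTED", "researcher", "READING_ROOM"));
        System.out.println(permitByContext("RESTRICTED", "researcher", "INTERNET"));
    }
}
```

A single field value cannot express the second function: the answer for one and the same object changes with the caller, which is what motivates deciding at request time.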
Instead: policies as code
- Policy: formalized set of rules regarding a collection of objects
- Decided at runtime -> decisions can change over time
- Follows general lines of thought of the organization: legal obligations, contracts with publishers, management decisions
Example: Simplest policy
Everything is freely accessible
return Decision.permit();
Still simple policy
Role-based access (from API key, username/password auth…)
if (context.roles.contains("DS_METADATA_DTS"))
    return Decision.permit();
Access based on publication date
static GregorianCalendar metadataFreeDate = new GregorianCalendar(1940, Calendar.JANUARY, 1);
if (attributes.getMetadata().getPublicationDate()?.before(metadataFreeDate.getTime())) {
    return Decision.permit();
}
Fallback
return Decision.denied();
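The three fragments above (role check, date check, fallback) can be put together into one function. The following is a minimal plain-Java sketch under stated assumptions, not the production policy: `SimplePolicySketch`, its `Decision` enum, and the `decide` signature are hypothetical, and `java.time.LocalDate` is swapped in for the `GregorianCalendar` used on the slide. The role name and the 1940 cut-off come from the slide.

```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch: types and signatures are illustrative, not KB code.
final class SimplePolicySketch {

    enum Decision { PERMIT, DENIED }

    // Cut-off from the slide: published before 1 January 1940.
    static final LocalDate METADATA_FREE_DATE = LocalDate.of(1940, 1, 1);

    // roles: roles resolved from the API key or login (assumed shape).
    // publicationDate: may be null when the metadata carries no date.
    static Decision decide(Set<String> roles, LocalDate publicationDate) {
        // 1. Role-based access
        if (roles.contains("DS_METADATA_DTS")) {
            return Decision.PERMIT;
        }
        // 2. Access based on publication date; a missing date never permits,
        //    mirroring the null-safe "?." call in the Groovy fragment.
        if (publicationDate != null && publicationDate.isBefore(METADATA_FREE_DATE)) {
            return Decision.PERMIT;
        }
        // 3. Fallback
        return Decision.DENIED;
    }

    public static void main(String[] args) {
        Set<String> noRoles = Collections.emptySet();
        System.out.println(decide(noRoles, LocalDate.of(1920, 5, 1)));
        System.out.println(decide(noRoles, LocalDate.of(1975, 5, 1)));
        System.out.println(decide(Collections.singleton("DS_METADATA_DTS"), null));
    }
}
```

The order of the checks matters: the first rule that matches decides, and anything that matches no rule falls through to the denial.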
Example: Books
Check for location
if (context.location.equals("READING_ROOM")) {
    ...
}
Demand measures to prevent downloads from frontend
if (attributes.listContainsValue("boeken-leeszaal-kopieerbeveiliging", "ppn",
        attributes.getMetadata().getPpn())) {
    return Decision.permit(new Obligation("DoNotDownload"), usageRights);
}
Check for death dates of all contributors
if (DateChecks.allAuthorsDeadLongerThan(attributes.getMetadata(), authorDeathDateLimit)) {
    return Decision.permit(usageRights);
}
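The slide calls `DateChecks.allAuthorsDeadLongerThan` without showing its body. One way such a check could work is sketched below. The signature (a list of death years, a limit in years, and a reference year) and the conservative treatment of unknown or missing contributors are assumptions; the real helper takes the metadata object and may behave differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a death-date check; not the KB implementation.
final class DateChecksSketch {

    // deathYears: one entry per contributor; null means the year is unknown.
    // Returns true only if every contributor died more than limitYears
    // before currentYear.
    static boolean allAuthorsDeadLongerThan(List<Integer> deathYears,
                                            int limitYears, int currentYear) {
        if (deathYears.isEmpty()) {
            return false; // ASSUMPTION: no known contributors, so do not open up
        }
        for (Integer year : deathYears) {
            // ASSUMPTION: an unknown death year blocks the permit.
            if (year == null || currentYear - year <= limitYears) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 70 is only an example limit, not a value taken from the slides.
        System.out.println(allAuthorsDeadLongerThan(Arrays.asList(1900, 1930), 70, 2017));
        System.out.println(allAuthorsDeadLongerThan(Arrays.asList(1900, 1960), 70, 2017));
    }
}
```

Failing closed on missing data keeps the check safe, at the cost of leaving some objects restricted until their metadata is completed.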
Decisions
Input: Identifier, Metadata, Location, Authorization
End result of a policy decision:
- PERMIT
- DENIED
- NOT APPLICABLE
Additional attributes:
- obligations: things the endpoint has to enforce
- advices: things the endpoint might need to improve UX
Ex: PERMIT (obligation: "DoNotDownload", advice: "OnlyInReadingRoom")
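The decision structure described above could be modelled as follows. This is a sketch with hypothetical names (`DecisionSketch`, `Result`, `mayServe`). The endpoint rule shown, serving only on PERMIT and only when every obligation is supported, follows the general XACML idea that obligations are mandatory; whether the KB endpoints behave exactly this way is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a decision with obligations and advices.
final class DecisionSketch {

    enum Result { PERMIT, DENIED, NOT_APPLICABLE }

    static final class Decision {
        final Result result;
        final List<String> obligations; // the endpoint has to enforce these
        final List<String> advices;     // optional hints to improve UX

        Decision(Result result, List<String> obligations, List<String> advices) {
            this.result = result;
            this.obligations = obligations;
            this.advices = advices;
        }
    }

    // Endpoint side. ASSUMPTION: anything other than PERMIT is not served,
    // and a PERMIT with an obligation the endpoint cannot enforce is refused.
    static boolean mayServe(Decision decision, Set<String> supportedObligations) {
        return decision.result == Result.PERMIT
                && supportedObligations.containsAll(decision.obligations);
    }

    public static void main(String[] args) {
        Decision d = new Decision(Result.PERMIT,
                Arrays.asList("DoNotDownload"),
                Arrays.asList("OnlyInReadingRoom"));
        System.out.println(mayServe(d, Collections.singleton("DoNotDownload")));
        System.out.println(mayServe(d, Collections.<String>emptySet()));
    }
}
```

Advices are deliberately ignored by `mayServe`: they inform the user interface but do not affect whether the object may be served.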
Diagram by David Brossard under a CC-BY 3.0 license
[Diagram labels: Enforce, Decide, Administer, Metadata, Context]
[Diagram: the same roles (Enforce, Decide, Administer, Metadata, Context) with the components Image server, OAI-PMH, Object store, PDP webservice, RDBMS, Metadata, HTTP Request, Admin/Reporting GUI, Policy Scripts (Groovy), Authorization (LDAP)]
Architecture: XACML (sort of)
- Attribute Based Access Control (ABAC)
- Follows XACML reference architecture
- … but not the language (cumbersome, slow and restricted)
Technology
- Write the policies in an embedded scripting language (Groovy)
  - Fast (in comparison to XACML language implementations)
  - Can be adopted/managed outside of the core development team
  - Still: reuse of the existing development toolchain
  - Automated testing!
- Deployed as central REST service
  - Serves multiple applications
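Because policies are ordinary code, they can be checked with ordinary tests. The sketch below shows one possible table-driven style: each case lists an input and the expected decision, and the runner reports mismatches. The policy, the `Case` type, and `runCases` are hypothetical; the slides do not show how the actual test suite is organised.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of table-driven policy tests; not the KB test suite.
final class PolicyTestSketch {

    // Example policy under test: open before the cut-off year, otherwise
    // only inside the reading room.
    static String decide(int publicationYear, String location) {
        if (publicationYear < 1940) return "PERMIT";
        if ("READING_ROOM".equals(location)) return "PERMIT";
        return "DENIED";
    }

    static final class Case {
        final int year;
        final String location;
        final String expected;

        Case(int year, String location, String expected) {
            this.year = year;
            this.location = location;
            this.expected = expected;
        }
    }

    // Runs every case and returns one message per mismatch.
    static List<String> runCases(List<Case> cases) {
        List<String> failures = new ArrayList<>();
        for (Case c : cases) {
            String actual = decide(c.year, c.location);
            if (!actual.equals(c.expected)) {
                failures.add(c.year + "/" + c.location + ": expected "
                        + c.expected + " but got " + actual);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<Case> cases = Arrays.asList(
                new Case(1920, "INTERNET", "PERMIT"),
                new Case(1975, "READING_ROOM", "PERMIT"),
                new Case(1975, "INTERNET", "DENIED"));
        System.out.println(runCases(cases));
    }
}
```

A table like this doubles as documentation that non-developers can review: each row states a situation and the outcome the organisation expects.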
Reporting and testing
[Figure labels: Collections, Policies, Digital Objects, Policies, Metadata]
Limitations
- Search filtering on access: hard to combine with dynamic decisions
  - Which objects am I allowed to use?
- Export of access information to other systems (e.g. WorldCat)
Possible mitigations
- Compromises on dynamic decisions (short term)
- Move from slow ETL to event-based architectures (longer term)
Current status & results
- Rolled out to production in steps since mid-2016
- New objects are becoming available
- Copyright claims are easier to handle
- Clearer insight into current status of collection
- Better insight into needs for partnership contracts
- Impulses for better metadata storage/access infrastructure
175M requests per month (+/- 6 million a day)
60+ million pages under control by access management
Any questions?
END
About
- Managing digital collections with multiple licenses and access policies
- Technical choices that fit our organisational needs
Not about
- DRM and copy protection
- Usage of closed proprietary systems
Motivation
- As a public service organisation we want: access as far as possible
- Limits to what is possible:
  - Licenses
  - Contractual obligations
  - Governmental and organisational policies
  - Copyright status
- A simple yes or no is not always enough; we need
  - a clear guideline for the user: what can I do with it and how do I get access?
  - automation of management: we want to be able to scale and still be compliant
Crossing the domains: communication
- Define your terms: Collection, policy, decision … make sure to communicate them clearly
- Make sure contracts and managerial decisions can be translated to the technical reality.
- Offer protection and guarantee options for future contracts
- Make compliance easier through monitoring + reporting
- Use of examples + flow diagrams
ONIX-PL: machine-readable contracts
Machine-readable, but not actionable
Our problems
- Multiple applications give access to collections
- ideally centralised decision making and reporting
- Decisions depend on context: user, location, time
- Flexible to allow for individual interventions
- Clearer insight needed into why things are hidden away