Fedora Commons in the CLARIN Infrastructure


This presentation gives an overview of: 1) Fedora Commons, 2) its current use by CLARIN B centres, and 3) the new TLA/FLAT setup that meets the CLARIN B centre requirements using the Fedora Commons/Islandora stack.

Published in: Technology
Fedora Commons in the CLARIN Infrastructure

  1. Fedora Commons in the CLARIN Infrastructure
     Menzo Windhouwer
     menzo.windhouwer@meertens.knaw.nl
     Meertens Institute, TLA, CLARIN ERIC
  2. Overview
     1. An overview of Fedora Commons (3.8.1)
     2. Current usage by CLARIN centres
     3. TLA-FLAT: a CLARIN compatible repository solution based on Fedora Commons and Islandora
  4. Fedora Commons
     • fedora-commons.org
     • 300 registered installations
     • 1997: started as a research project at Cornell University
     • Implemented as a Java servlet
     • 2009: joined the DSpace foundation (now DuraSpace)
     • 2014: Fedora Commons 4 released
       • More RDF-based
       • Not backward compatible in terms of functionality, e.g., APIs
       • Data migration utilities available
     • 2015: last Fedora Commons 3 release (3.8.1), the focus of this presentation
       • wiki.duraspace.org/display/FEDORA38/
       • github.com/fcrepo3
  5. Fedora Commons main features
     • Digital Objects
       • Content Model Architecture (FOXML)
       • Datastreams
       • Relationships between Digital Objects (RDF)
     • APIs (REST/SOAP)
       • Access
       • Management
     • Security (XACML)
       • Access control
       • Policies
     • Message queue
     • OAI-PMH
     • Replication & mirroring
     • Versioning
     • Checksums
  7. Digital Objects - Model
     “Fedora uses a "compound digital object" design which aggregates one or more content items into the same digital object. Content items can be of any format and can either be stored locally in the repository, or stored externally and just referenced by the digital object. The Fedora digital object model is simple and flexible so that many different kinds of digital objects can be created, yet the generic nature of the Fedora digital object allows all objects to be managed in a consistent manner in a Fedora repository.”
  8. Digital Objects – Content Model Architecture
     1. Data Object
        • “Data objects are what we normally think of when we imagine a repository storing digital collections. Data objects can represent such varied entities as images, books, electronic texts, learning objects, publications, datasets, and many other entities.”
     2. Content Model Object
        • “[A]cts as a container for the Content Model document which is a formal model that characterizes a class of digital objects.”
     3. Service Definition Object
     4. Service Deployment Object
  9. Digital Objects - Datastreams
     • “The content represented by a Datastream is treated as an opaque bit stream; it is up to the user to determine how to interpret the content (i.e. data or metadata).”
     • Where does this bit stream live?
       1. Internal XML Content: “the content is stored as XML in-line within the digital object XML file” (FOXML)
       2. Managed Content: “the content is stored in the repository and the digital object XML maintains an internal identifier that can be used to retrieve the content from storage”
       3. Externally Referenced Content: “the content is stored outside the repository and the digital object XML maintains a URL that can be dereferenced by the repository to retrieve the content from a remote location”
       4. Redirect Referenced Content: “the content is stored outside the repository and the digital object XML maintains a URL that is used to redirect the client when an access request is made”
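In FOXML, each datastream records where its bit stream lives in a one-letter CONTROL_GROUP attribute (X, M, E or R for the four options above). The following is a minimal Python sketch, not part of the original slides, that lists the datastreams of an object and decodes that attribute. The sample document and the `demo:1` PID are made up for illustration.

```python
import xml.etree.ElementTree as ET

FOXML_NS = "info:fedora/fedora-system:def/foxml#"

# One-letter CONTROL_GROUP codes and the storage option they stand for.
CONTROL_GROUPS = {
    "X": "Internal XML Content",
    "M": "Managed Content",
    "E": "Externally Referenced Content",
    "R": "Redirect Referenced Content",
}

# A cut-down, hypothetical FOXML document (illustration only).
SAMPLE_FOXML = f"""<foxml:digitalObject PID="demo:1" xmlns:foxml="{FOXML_NS}">
  <foxml:datastream ID="DC" STATE="A" CONTROL_GROUP="X"/>
  <foxml:datastream ID="TN" STATE="A" CONTROL_GROUP="E"/>
</foxml:digitalObject>"""


def list_datastreams(foxml_text):
    """Return (datastream ID, control group code, meaning) tuples."""
    root = ET.fromstring(foxml_text)
    result = []
    for ds in root.findall(f"{{{FOXML_NS}}}datastream"):
        code = ds.get("CONTROL_GROUP")
        result.append((ds.get("ID"), code, CONTROL_GROUPS.get(code, "unknown")))
    return result


for ds_id, code, meaning in list_datastreams(SAMPLE_FOXML):
    print(f"{ds_id}: {code} ({meaning})")
```

Such a listing can help when checking which datastreams are stored inside the repository and which are only referenced.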
  10. Digital Objects - Relations
      • Relationships between Digital Objects
        • Collections, compounds, cross references, …
        • Using the Fedora relationship ontology
        • Domain specific relationships
      • Encoded in RDF
        • RELS-EXT: relations from the DO to other DOs or external resources
        • RELS-INT: relations from datastreams in the DO to other resources
  11. Digital Objects - FOXML
      <foxml:digitalObject PID="lat:1839_00_0000_0000_0016_7E07_7" xmlns:foxml="info:fedora/fedora-system:def/foxml#" …>
        <foxml:objectProperties>
          <foxml:property NAME="info:fedora/fedora-system:def/model#state" VALUE="A"/>
          <foxml:property NAME="info:fedora/fedora-system:def/model#label" VALUE="deerhunt"/>
        </foxml:objectProperties>
        <foxml:datastream ID="DC" STATE="A" CONTROL_GROUP="X">
          <foxml:datastreamVersion ID="DC.0" FORMAT_URI="http://www.openarchives.org/OAI/2.0/oai_dc/" MIMETYPE="text/xml" LABEL="Dublin Core Record for this object">
            <foxml:xmlContent>
              <oai_dc:dc …>
                <dc:title>deerhunt story</dc:title>
                <dc:description xml:lang="eng">The text was recorded at Madison University in the 1960s. The text was recorded indoors.</dc:description>
                ...
              </oai_dc:dc>
            </foxml:xmlContent>
          </foxml:datastreamVersion>
        </foxml:datastream>
        <foxml:datastream ID="CMD" STATE="A" CONTROL_GROUP="X">
          <foxml:datastreamVersion ID="CMD.0" LABEL="CMD Record for this object" MIMETYPE="application/x-cmdi+xml" …>
            <foxml:xmlContent>
              <cmd:CMD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" CMDVersion="1.1" …>...</cmd:CMD>
            </foxml:xmlContent>
          </foxml:datastreamVersion>
        </foxml:datastream>
        …
  12. Digital Objects - FOXML
        …
        <foxml:datastream ID="RELS-EXT" STATE="A" CONTROL_GROUP="X" VERSIONABLE="true">
          <foxml:datastreamVersion ID="RELS-EXT.0" LABEL="RDF Statements about this object" MIMETYPE="text/xml">
            <foxml:xmlContent>
              <rdf:RDF xmlns:oai="http://www.openarchives.org/OAI/2.0/" xmlns:fedora="info:fedora/fedora-system:def/relations-external#" …>
                <rdf:Description rdf:about="info:fedora/lat:1839_00_0000_0000_0016_7E07_7">
                  <fedora:isMemberOfCollection rdf:resource="info:fedora/lat:1839_00_0000_0000_0016_7E41_8"/>
                  <fedora-model:hasModel rdf:resource="info:fedora/islandora:compoundCModel"/>
                  <fedora-model:hasModel rdf:resource="info:fedora/islandora:sp_cmdiCModel"/>
                  <oai:itemID xmlns="http://www.openarchives.org/OAI/2.0/">oai:flat.example.com.:lat:1839_00_0000_0000_0016_7E07_7</oai:itemID>
                </rdf:Description>
              </rdf:RDF>
            </foxml:xmlContent>
          </foxml:datastreamVersion>
        </foxml:datastream>
        <foxml:datastream ID="TN" STATE="A" CONTROL_GROUP="E">
          <foxml:datastreamVersion ID="TN.0" LABEL="icon.png" MIMETYPE="image/png">
            <foxml:contentLocation TYPE="URL" REF="file:/app/flat/icons/folder.png"/>
          </foxml:datastreamVersion>
        </foxml:datastream>
      </foxml:digitalObject>
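As an illustration of how the RELS-EXT statements above can be consumed, here is a small Python sketch (not from the slides) that reads the RDF/XML and returns the relationships as simple triples. It assumes the RELS-EXT content has already been extracted from the FOXML; the sample repeats the statements shown above with the elided namespace declarations written out, and literal-valued properties such as oai:itemID are skipped.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# RELS-EXT content modelled on the example above (namespaces written out).
SAMPLE_RELS_EXT = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:fedora="info:fedora/fedora-system:def/relations-external#"
    xmlns:fedora-model="info:fedora/fedora-system:def/model#">
  <rdf:Description rdf:about="info:fedora/lat:1839_00_0000_0000_0016_7E07_7">
    <fedora:isMemberOfCollection rdf:resource="info:fedora/lat:1839_00_0000_0000_0016_7E41_8"/>
    <fedora-model:hasModel rdf:resource="info:fedora/islandora:compoundCModel"/>
    <fedora-model:hasModel rdf:resource="info:fedora/islandora:sp_cmdiCModel"/>
  </rdf:Description>
</rdf:RDF>"""


def relationships(rels_ext_text):
    """Return (subject, predicate local name, object URI) triples."""
    root = ET.fromstring(rels_ext_text)
    triples = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for child in desc:
            target = child.get(f"{{{RDF}}}resource")
            if target is None:
                continue  # literal-valued properties are skipped in this sketch
            predicate = child.tag.split("}", 1)[1]  # drop the "{namespace}" part
            triples.append((subject, predicate, target))
    return triples


for subject, predicate, target in relationships(SAMPLE_RELS_EXT):
    print(predicate, "->", target)
```

In a running repository these relationships would normally be queried through the resource index; parsing the datastream directly is only meant to show what the RDF encodes.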
  13. APIs (REST/SOAP)
      • The ‘RESTful’ APIs provide easy HTTP URLs to access (API-A) objects and their datastreams:
        1. https://www.meertens.knaw.nl/flat/objects/lat:10744_1b9e0d44_ef4d_496c_8939_6129b5ee5b49/datastreams/CMD/content?asOfDateTime=2017-01-27T11:30:52.732Z
        2. https://www.meertens.knaw.nl/flat/objects/lat:10744_792194f7_d1fd_400c_ab2b_9b51f4fe3907/datastreams/OBJ/content?asOfDateTime=2017-01-27T11:31:01.207Z
        Used as redirect for a handle. Notice the use of a timestamp to refer to a specific version of the datastream.
      • API-M provides methods to update objects and their datastreams
      • Access to API-M can be limited using repository wide XACML policies
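The URL pattern in the two examples above can be assembled mechanically. The sketch below is an illustration rather than FLAT code: the helper name `datastream_content_url` is made up, and it only builds the URL without performing a request.

```python
from urllib.parse import quote, urlencode


def datastream_content_url(base_url, pid, ds_id, as_of=None):
    """Build an API-A URL for the content of one datastream.

    as_of is an optional timestamp that selects a specific version.
    """
    base = base_url.rstrip("/")
    pid_part = quote(pid, safe=":")  # keep the ':' of the PID readable
    url = f"{base}/objects/{pid_part}/datastreams/{quote(ds_id)}/content"
    if as_of:
        url += "?" + urlencode({"asOfDateTime": as_of}, safe=":")
    return url


print(datastream_content_url(
    "https://www.meertens.knaw.nl/flat",
    "lat:10744_1b9e0d44_ef4d_496c_8939_6129b5ee5b49",
    "CMD",
    as_of="2017-01-27T11:30:52.732Z",
))
```

Leaving out `as_of` yields a URL without the query string, i.e. without pinning a specific version.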
  14. Security (XACML)
      • eXtensible Access Control Markup Language (XACML) is an OASIS standard to encode access control policies
        “Each XACML policy defines: (1) a "target" describes what the policy applies to (by referring to attributes of users, operations, objects, datastreams, dates, and more), and (2) one or more "rules" to permit or deny access.”
        Rather cryptic and bloated language
      • Repository wide policies
        • Access to API-M (methods) by certain users/roles from certain IP addresses
        • …
      • Object specific policies
        • Which users can access which datastreams
        • …
      • User profiles
        • Plug in any authfilter in the application server
        • Hardcoded users
        • …
  15. Fedora Commons as a basis - extensions
      • Faceted search: gsearch (Solr)
        • Listens to the FC message queue
        • Runs an XSLT to create a SOLR document
      • OAI-PMH: Proai
        • Occasionally queries FC's resource index
        • Can deliver other metadata datastreams than the default Dublin Core
      • …
  16. Fedora Commons as a basis - frontends
      • Islandora
        • Drupal based
        • Large set of modules, relatively easy to extend
        • Still based on Fedora Commons 3
        • Ongoing experiments/development, e.g., CLAW for Islandora
      • Hydra
        • Ruby on Rails based
        • More hardcoded workflow and data models
      • …
      • Portland Common Data Model
        • Common data model (content models) so migration between front-ends/frameworks becomes easier
  17. Overview
      1. An overview of Fedora Commons (3.8.1)
      2. Current usage by CLARIN centres
      3. TLA-FLAT: a CLARIN compatible repository solution based on Fedora Commons and Islandora
  18. Repository solutions in use by CLARIN centres
      [Bar chart: number of B centres (axis 0 to 9) per repository solution: Fedora Commons, DSpace, custom, LAT, GIT, eSciDoc. Based on repository info on 20 B centres in the Centre registry.]
      Notes:
      • Meertens: custom -> Fedora Commons
      • MPI: LAT -> Fedora Commons
      • eSciDoc: Fedora Commons under the hood
      • Various C centres also run a Fedora Commons (based) repository
  19. How happy are these centres with Fedora Commons?
      • Sent out a questionnaire to 9 centres: 6 responses
      • Do you (still) consider Fedora Commons a sustainable repository solution for your center? [Chart: yes / no]
      • Would you advise new CLARIN centers to use Fedora Commons as (the basis for) their CLARIN-compatible repository solution? [Chart: yes / no / maybe]
      • Comments from respondents:
        • If you are member of CLARIN-D then you probably might want to choose Fedora, but if you're in another country you might want to take a closer look at other solutions (DSpace or TLA software). 👍🏻
        • Depends partly on available technical expertise
  20. Fedora Commons versions
      • Which version of Fedora Commons does your centre use in production? [Bar chart: number of centres (axis 0 to 2.5) per version: 3.6, 3.6.2, 3.7.1, 3.8.1, 4]
      • Do you plan a move to Fedora Commons 4? [Chart: yes / no / maybe]
      • Comments from respondents:
        • benefit from Linked Open Data approach; within next 2 years
        • We are migrating to version 4 right now. We also made major enhancements to our front-end. We are planning to go into production with it within the next months.
  21. Size of the centre’s repositories
      • # Digital Objects: ca. 150; 2,500; 3,038; 10,000; 33,000
      • # bytes: ca. 125M (metadata only); 5G; 16G; ca. 500G
      Both MPI and Meertens currently have over 100,000 CMD records in the VLO, which describe resources that take up several TB (and up to 1M DOs). Experiments did reveal problems in the FC area, but they can be repaired.
  22. Community support
      Each question was answered on the scale: not at all / somewhat / ok / very much. [Charts not preserved]
      • How helpful was/is the documentation available within the Fedora Commons community?
      • How helpful was/is the support by the Fedora Commons community?
      • How helpful was/is the documentation on Fedora Commons by the CLARIN community?
      • How helpful was/is the support for Fedora Commons within the CLARIN community?
      Comments from respondents:
      • Unfortunately there seem to be no more Fedora User Groups in Europe...
      • Being one of the first centers to use Fedora Commons, we did use the documentation available within the FC community. At that time there was not much CLARIN documentation. This blog entry was very useful for us: http://asingh.com.np/blog/fedora-commons-installation-and-configuration-guide/
      • an option for the case one has never made use of the support should have been included
  23. Frontends
      • Do you use a front-end, e.g., Islandora, Hydra or your own, next to Fedora Commons? [Bar chart: number of centres (axis 0 to 3.5) per answer: none, Islandora, custom]
      • Comments from respondents:
        • own front-end, based on Django (EulFedora) and MySQL
        • We developed our own, called Erdo
        • The built-in user interface is not adequate. You will need to replace it with something better.
  24. Additional advice
      • “Let Apache httpd (or Apache Tomcat) take care for most of the configuration (access control) and configure Fedora Commons to be "open". Take care what to store in Fedora and what not (it can be very unhandy to store too many data streams inside Fedora).”
      • “I consider the two offered RDF query languages (SPARQL, ITQL) by Fedora as insufficient, as both miss important features, e.g ITQL can't use regexp search and can't sort strings numerically and SPARQL can't use COUNT operator and also cannot sort strings numerically (at least in version 3.6.2).”
      • “For CMDI metadata, you also need the Proai OAI provider. Use the version customised for Fedora Commons.”
  25. Overview
      1. An overview of Fedora Commons (3.8.1)
      2. Current usage by CLARIN centres
      3. TLA-FLAT: a CLARIN compatible repository solution based on Fedora Commons and Islandora
  26. FLAT’s predecessors
      • The Language Archive (TLA) at the MPI for Psycholinguistics
        • long history in digital archiving, especially resources on endangered languages
        • home-built LAT (Language Archiving Technology)
        • 2014 – now: preparing to switch to a stack that is largely based on off-the-shelf software based on Fedora Commons + Islandora
        • choice made after an INNET repository workshop and several pilots
        • initial version based on scripts kindly provided by IDS
        • started as EasyLAT, now known as (TLA-)FLAT (Fedora Language Archiving Technology)
        • doing a lot of cleanup/curation along the way from LAT to FLAT
      • The Meertens Institute
        • collecting valuable (Dutch) (physical) humanities resources for over a century
        • digitization projects
        • digital born resources
        • KNAW participates in TLA and the Meertens Institute teamed up with the MPI to modernize its setup and develop FLAT
  27. FLAT’s predecessors
  28. TLA-FLAT base line
      • Meet the technical CLARIN B centre requirements
      • Meet the technical Data Seal of Approval (DSA) requirements
      • Meet organization specific requirements
      • Meet at least the CLARIN B centre and DSA requirements, as much as possible, with the Fedora Commons backend
        • frontend technologies come and go quickly
      • How far can we get using available components, configuration and a limited level of tailor made software?
        • Mainly to add support for CMDI
      • Start with Fedora Commons 3.8.x and Islandora 7.x-1.x, move along with the Islandora community to Fedora Commons 4
  29. Islandora 7.x-1.x
      • islandora.ca
        • An open-source software framework designed to help institutions and organizations and their audiences collaboratively manage and discover digital assets using a best-practices framework.
        • Islandora was originally developed by the University of Prince Edward Island's Robertson Library, but is now implemented and contributed to by an ever-growing international community.
        • Built on a base of Drupal (7.x), Fedora (3.x), and Solr, Islandora releases solution packs which empower users to work with data types (such as image, video, and pdf) and knowledge domains (such as Chemistry and the Digital Humanities). Solution packs also often provide integration with additional viewers, editors, and data processing applications.
      • wiki.duraspace.org/display/ISLANDORA/Islandora
      • github.com/Islandora
      • github.com/Islandora-Labs/islandora_awesome
      • github.com/discoverygarden
      • Digital Objects are not Drupal nodes; the Islandora modules interact with Fedora Commons via an intermediate (PHP) layer, Tuque
      • In CLAW, Digital Objects are Drupal nodes synchronized using Apache Camel
  30. CLARIN B centre requirements
      • [CLARIN-B-2] Centres need to adhere to the security guidelines, i.e. the servers need to have accepted certificates.
      • [CLARIN-B-3] Centres need to join the national identity federation where available and join the CLARIN service provider federation to support single identity and single sign-on operation based on SAML 2.0 and trust declarations.
      • [CLARIN-B-5] Centres need to offer component based metadata (CMDI) that make use of elements from accepted registries such as ISOcat in accordance with the CLARIN agreements, i.e. metadata needs to be harvestable via OAI-PMH.
      • [CLARIN-B-6] Centres need to associate PIDs records according to the CLARIN agreements with their objects and add them to the metadata record.
  31. DSA requirements
      • [DSA-10] The data repository enables the users to discover and use the data and refer to them in a persistent way.
      • [DSA-11] The data repository ensures the integrity of the digital objects and the metadata.
      • [DSA-12] The data repository ensures the authenticity of the digital objects and the metadata.
      • [DSA-13] The technical infrastructure explicitly supports the tasks and functions described in internationally accepted archival standards like OAIS.
  32. Meertens Institute & TLA requirements
      • [Home-1] The repository should support arbitrarily deep collection hierarchies.
      • [Home-2] The repository should support handles as persistent identifiers.
      • [Home-3] The repository should work with arbitrary CMDI profiles.
      • [Home-4] The repository should provide resource level access control.
      • [Home-5] The repository should allow collection management to review submissions before the resources are actually ingested.
      • [Home-6] The repository should allow system management to determine the location of resources on persistent storage, e.g., from fast access times to secure tape drives.
      • [Home-7] The repository should allow the storage of arbitrary relationships between data sets.
      • [Home-8] The repository should provide entry points for interaction with Virtual Research Environments.
      • [Home-9] The repository should allow for collection management oriented metadata, which might not be public.
  33. FLAT’s place at the Meertens Institute
      [Architecture diagram. Components shown: Drupal; Islandora; Fedora Commons; Deposition Service (DoorKeeper); SIP; AIP; Workspace (ownCloud); Virtual Research Environment; Persistent storage; SOLR (MTAS); Backups (EUDAT); Collection Management; Infrastructures (CLARIN); SWORD; CMDI SP; OAI-PMH]
  34. FLAT’s place at the MPI/TLA
      [Architecture diagram. Components shown: Drupal; Islandora; Fedora Commons; Deposition Service (DoorKeeper); SIP; AIP; Workspace (ownCloud); Deposition UI; Persistent storage; Backups (DANS); Infrastructures (CLARIN); SWORD; CMDI SP; OAI-PMH]
  35. FLAT modules
      • Core
        • Fedora Commons and Islandora setup
        • CMDI Solution Pack
        • CMD to FOXML conversion
        • Proai setup
      • Indexing (SOLR)
        • gsearch-based solution for CMDI
        • Meertens’ CMDI indexer
      • SWORD 2.0
        • Reuses the deposit-via-SWORD approach and implementation by DANS
      • DoorKeeper
      • Deposition UI
      • IMDI conversion
      • Shibboleth
        • Shibboleth setup is very server specific, so there is a module that illustrates the Drupal setup and can be combined with a test IdP.
  36. CMDI Solution Pack
      • Registers a metadata renderer in Islandora
      • Triggers when a Digital Object uses the CMDI content model and renders the CMD datastream
      • The default render XSLT can be overridden by profile specific XSLTs
      • Not FLAT specific, i.e., could be reused outside of FLAT
  37. Archival Information Package (AIP)
      [Diagram: a hierarchy of Digital Objects and their datastreams. Object types shown: Collection, Collection + CMDI, Collection + Compound + CMDI, Compound + CMDI, Image, Video. Datastreams shown: CMD, RELS-EXT, DC, OBJ. Links shown: isMemberOfCollection, isConstituentOf, contentLocation.]
      FLAT reuses a lot of Islandora’s content models so rendering is easy. And they can be easily taken along without Islandora.
  38. FLAT’s DoorKeeper
      • A configurable chain of actions that
        • Validate the CMDI, also according to centre specific requirements
        • Check the validity of resources against preferred formats (FITS)
        • Assess metadata quality
        • Offer the SIP for evaluation to collection management
        • Move new resources from a temporary workspace into persistent locations
        • Expand WebACL to XACML
        • Version management
        • Assign and create handles (EPIC)
        • Interact with Fedora Commons’ API-M
        • Trigger indexing
        • Create backup bags (for DANS or EUDAT)
      • Creates user and developer oriented logs
      • Interaction via a REST API or the command line
      • Uses dynamic class loading, i.e., easily extensible with centre specific actions
      • Not too FLAT specific, e.g., usable by other repository setups or replace Fedora by DSpace
      Actions are, in general, lean and mean, so it's relatively easy to implement one in Java.
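The idea of a configurable chain of actions can be sketched in a few lines. The example below is Python for brevity (the DoorKeeper itself is Java-based according to the slide) and is not the DoorKeeper's actual API: the `Sip` class, the action names and the accepted formats are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Sip:
    """Hypothetical stand-in for a Submission Information Package."""
    name: str
    metadata_valid: bool
    resource_formats: List[str]
    log: List[str] = field(default_factory=list)


# An action inspects or changes the SIP and reports success or failure.
Action = Callable[[Sip], bool]


def validate_metadata(sip: Sip) -> bool:
    sip.log.append("validate metadata")
    return sip.metadata_valid


def check_formats(sip: Sip) -> bool:
    sip.log.append("check resource formats")
    accepted = {"application/pdf", "text/plain"}  # hypothetical centre policy
    return all(fmt in accepted for fmt in sip.resource_formats)


def assign_handle(sip: Sip) -> bool:
    sip.log.append("assign handle")
    return True


def run_chain(sip: Sip, actions: List[Action]) -> bool:
    """Run the actions in order and stop at the first one that fails."""
    for action in actions:
        if not action(sip):
            sip.log.append(f"stopped at {action.__name__}")
            return False
    return True


chain = [validate_metadata, check_formats, assign_handle]
good = Sip("test-sip", True, ["application/pdf"])
print(run_chain(good, chain), good.log)
```

Because the chain is just an ordered list, a centre specific action can be added by inserting another function, which mirrors the extensibility claim above.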
  39. Submission Information Package (SIP)
      • A CMD record referring with
        • relative paths to resources within the package
        • absolute paths to resources already on the server
          • For example, in the user’s ownCloud data directory
          • (block access to system files!)
      • Additional files
        • Access control
        • License
        • …
      • When using the SWORD 2.0 interface these are put in a bag and zipped for upload
        • The SWORD interface allows upload in parts
      Example bag layout:
      +-test-sip/
        +-bag-info.txt
        +-bagit.txt
        +-data/
        | +-metadata/
        | | +-policy.n3
        | | +-record.cmdi
        | +-resources/
        |   +-my comic.pdf
        |   +-secret.txt
        +-manifest-md5.txt
        +-tagmanifest-md5.txt
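A bag carries its own fixity information: in the BagIt format, manifest-md5.txt lists one checksum and payload path per line. As a hedged illustration (not FLAT code), the Python sketch below builds a tiny demo bag with made-up file contents and verifies its payload against the manifest. The helper names `build_demo_bag` and `verify_manifest` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def build_demo_bag(root: Path) -> Path:
    """Create a minimal bag with two payload files and an MD5 manifest."""
    bag = root / "test-sip"
    payload = {
        "data/metadata/record.cmdi": b"<CMD/>",           # placeholder content
        "data/resources/my comic.pdf": b"not really a pdf",
    }
    lines = []
    for rel_path, content in payload.items():
        target = bag / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        lines.append(f"{hashlib.md5(content).hexdigest()}  {rel_path}")
    (bag / "manifest-md5.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return bag


def verify_manifest(bag: Path) -> list:
    """Return the payload paths whose checksum does not match the manifest."""
    failures = []
    manifest = (bag / "manifest-md5.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        # Split on the first run of whitespace only: paths may contain spaces.
        expected, rel_path = line.split(maxsplit=1)
        actual = hashlib.md5((bag / rel_path).read_bytes()).hexdigest()
        if actual != expected:
            failures.append(rel_path)
    return failures


with tempfile.TemporaryDirectory() as tmp:
    demo = build_demo_bag(Path(tmp))
    print("checksum failures:", verify_manifest(demo))
```

A complete check would also cover bagit.txt, the tag manifest and files missing from the manifest; this sketch only compares listed payload checksums.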
  40. Security
      • To hide the intricacies of XACML and design choices for content models we use WebACL to specify the access rules for a SIP

      @prefix acl: <http://www.w3.org/ns/auth/acl#> .
      @prefix foaf: <http://xmlns.com/foaf/0.1/> .

      # make a specific resource (identified by the ID of the ResourceProxy) in the SIP accessible to a specific user
      [acl:accessTo <sip#h1>; acl:mode acl:Read; acl:agent <#other1>].

      # a colleague
      <#other1> a foaf:Person ;
          foaf:account [foaf:accountServiceHomepage <#flat>; foaf:accountName "sarah@meertens.knaw.nl"].

      # give the owner read and write access
      [acl:accessTo <sip>; acl:mode acl:Read, acl:Write; acl:agent <#owner>].

      # the owner
      <#owner> a foaf:Person ;
          foaf:account [foaf:accountServiceHomepage <#flat>; foaf:accountName "bob@meertens.knaw.nl"].

      Slide annotations: shortcuts; Shibboleth EPPNs
  41. CMDI indexing for faceted search
      1. gsearch for CMDI
         • Based on an XSLT that processes the FOXML
         • FLAT generates an XSLT for the (internal) CMD datastream
           • Based on the profiles in your CMD records
           • And a VLO-like mapping
             • Facet = VLO facet
             • Facet = concept
             • Facet = hard coded XPath
         • Only the configured facets will be available
         • Can also be used for the required CMD to DC mapping
         • Allows to run FLAT for your CMD records out-of-the-box
      2. Meertens CMD indexer
         • Analyzes the profiles in your CMD records
         • Creates facets for all semantic paths it finds
         • Facet names based on concept links (plus context)
         • At runtime switch between facets for querying and rendering
      Includes indexing of collection and compound relationships. Islandora can use the SOLR for this instead of the resource index (by default Mulgara), which is needed in case of large collections/compounds. Replacing Mulgara by another triple store, e.g., Blazegraph, is even better, but requires all components to use SPARQL instead of ITQL.
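To make the "Facet = hard coded XPath" option concrete, here is a small Python sketch. It is not how FLAT does it (FLAT generates an XSLT for gsearch); it only illustrates the mapping from facet names to paths. The record, its element names and the facet configuration are hypothetical, and the CMD namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free CMD-like record (not a real CMDI profile).
SAMPLE_RECORD = """<CMD>
  <Components>
    <Session>
      <Name>deerhunt</Name>
      <Language>eng</Language>
      <Language>nld</Language>
    </Session>
  </Components>
</CMD>"""

# Facet name -> path, using the XPath subset that ElementTree supports.
FACET_PATHS = {
    "name": "./Components/Session/Name",
    "language": "./Components/Session/Language",
}


def extract_facets(record_text, facet_paths):
    """Return a dict mapping each configured facet to the values found."""
    root = ET.fromstring(record_text)
    return {
        facet: [el.text for el in root.findall(path) if el.text]
        for facet, path in facet_paths.items()
    }


print(extract_facets(SAMPLE_RECORD, FACET_PATHS))
```

As on the slide, only the configured facets are produced; values under paths that are not listed are ignored.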
  42. Deposition UI
      • Drupal/Islandora module
      • Create a project
        • Upload a CMD record
          • Or create a new one using a form
        • Upload resources
          • Via a project specific ownCloud data directory
          • dropbox-like functionality
          • possibility to link with other providers (dropbox, google drive, …)
          • no need to worry about uploading ‘big’ files
      • Freeze a project
        • Validate the SIP using the DoorKeeper (async)
      • Deposit a valid project
        • Validate and deposit the SIP using the DoorKeeper (async)
  43. New vs legacy data
      • New data goes via the DoorKeeper so it's checked against the centre's policies!
      • Legacy (meta)data can be bulk loaded into Fedora Commons:
        • Convert IMDI to CMDI (optional)
        • Create FOXML for CMD records and resources
          • ResourceProxies should contain the local paths to resources, e.g., via @lat:localURI
        • Bulk load into Fedora Commons
        • Index for faceted search
        • Update handles
          • EPICify (github.com/meertensinstituut/EPICify)
      Scripts available, but need to be generalized.
  44. Branding
      • Drupal has extensive facilities for styling and templating
      • Drupal has many modules and blocks for additional functionality
      • Islandora as well, and also offers solution packs
        • During FOXML creation resource specific content models can be used
        • Take care, after bulk import or via a DoorKeeper action, that needed derivatives are created
        • Enable solution pack specific viewers
        • Some experiments have been done
      • FLAT comes with a basic style, but the MPI/TLA and Meertens instances look very different
  45. Branding
  46. Where are we?
      • Set of Docker images that extend each other to build up a complete solution for:
        • Read only interface for bulk loaded existing (meta)data (master)
        • Upload of new data via the DoorKeeper (develop)
        • Update metadata resource proxies in the CMDI collection hierarchy
        • User audit trails and checksums for big files
        • Updating existing data via the DoorKeeper
        • Versioning
      • Ongoing cleanup and enrichment of (legacy) metadata and resources, e.g., controlled vocabularies, license information
      In production at the Meertens Institute (www.meertens.knaw.nl/flat), and we are continuously moving cleaned (meta)data from the old setup to FLAT. CLARIN B certification based on FLAT has started. FLAT is being connected to the Meertens Institute's questionnaire system at the moment.
      Docker: a containerization platform that allows easy development, testing and deployment.
  47. FLAT is moving
      • github.com/TheLanguageArchive/FLAT
        • Its birthplace, but FLAT is moving to
      • github.com/TLA-FLAT
        • Code can be more clearly split over multiple repositories
          • DoorKeeper
            • Bundles of actions
            • Servlet wrapper
          • CMDI Solution Pack
          • …
        • Docker setups
          • finer granularity
        • Place for cooperation on
          • code
          • configuration
          • actions
          • knowledge sharing
          • Q&A, issues
      A Dockerfile precisely describes what software to install and how to configure it to get a running system.
      Fedora Commons, Islandora and Drupal documentation is sometimes hard to find/read and the full stack has many layers and corners. We can share our experience CLARIN-wide.
  48. Let’s visit FLAT!
  49. Conclusions
      • Fedora Commons (3.8.1) provides much of the basic functionality needed by a CLARIN B centre
      • Fedora Commons has a proven record of being a stable and satisfactory repository solution for many existing CLARIN centres
      • Transition from version 3 to 4 is starting to happen
      • TLA-FLAT is a modular CLARIN-compliant Fedora Commons-based solution that is easy to step into and a platform to share knowledge on running a Fedora Commons repository and its context
  50. Thanks! Questions? Now or later
      menzo.windhouwer@meertens.knaw.nl
      Please visit
      • github.com/TheLanguageArchive/FLAT
      • github.com/TLA-FLAT
      TLA-FLAT team
      • MI: Marc Kemps-Snijders, Menzo Windhouwer, Rob Zeeman, Bas van der Veen
      • MPI: André Moreira, Daniel von Rhein, Paul Trilsbeek, Guilherme Silva
