Say our names and frame the topic we’re presenting: an overview of the work that’s been done on Sufia, Worthwhile, and PCDM over the past year.
Note that this work has been done by a large group of folks, not just by us two.
Talk about the intertwined history of ScholarSphere/Sufia and Curate -> Worthwhile.
In the beginning (fall 2012) was Penn State's ScholarSphere. (Briefly say more about what ScholarSphere was: a file-oriented, self-deposit repository with social features, RDF metadata, and the latest Hydra/Rails components.)
MIKE 1.a. Sufia was extracted from ScholarSphere because some in the community were looking to stand up applications similar to ScholarSphere (HydraDAM was the first) but didn't want all the Penn State-specific branding, LDAP assumptions, and so forth. Sufia was born as a community-developed and community-managed solution, not as a component owned or controlled by any particular organization.
And then Sufia's models were extracted from Sufia (click to the next slide to explain why).
Around that time (spring 2013), the team at Notre Dame was looking to build their own repository (Curate). They wanted file-oriented behaviors and didn't want to start from scratch, and Sufia had some code that fit their needs quite well. Sufia's opinions about upload and metadata workflow, on the other hand, were not a great fit for the application they were building. So Notre Dame extracted sufia-models from Sufia.
The Curate gem was born, and it made use of sufia-models for file-oriented behaviors.
The Worthwhile gem came along in the spring of 2014 to simplify or streamline some of the decisions made in Curate, and also to defer to Sufia wherever it could. Worthwhile is a somewhat simplified fork of Curate with updated dependencies. Among the changes, it strips out a small number of features and removes pre-built ActiveFedora models in favor of curation concerns (a term and concept invented by Jeremy Friesen in his work on Curate): models, along with views and controllers built around those models, that can be generated as you need them. Curation concerns allow you to start with a base model and then extend it with custom metadata, custom workflows, custom branding, etc.
This is where things stood around this time last year: Sufia and Worthwhile were using the same models, and the Hydra community voiced some concern about the resulting fragmentation and confusion, and about the sustainability of maintaining these separate but similar codebases.
At Hydra Connect 2014 there was a session about Sufia's future evolution, especially because Sufia users were asking for some features that were already implemented in Worthwhile, which risked even more confusion. The session focused on how further convergence could bring together the features of Sufia and Worthwhile, from the modeling layer up to end-user functionality. The discussions that started at last year's Hydra Connect ultimately led to the development of the Portland Common Data Model (PCDM), which provided an opportunity to bring more alignment between Sufia and Worthwhile. (PCDM is broader than Sufia and Worthwhile, and broader than just Hydra: it establishes a common data model in the Fedora 4 world, which may lead to increased interoperability between, for instance, Hydra repositories and Islandora repositories.)
JON 1.c A quick intro to PCDM
As Mike said, PCDM is the result of work within the Hydra, Islandora, and Fedora communities over the past year. Very briefly, PCDM is a deceptively small data model that consists of classes and predicates that can be used to model arbitrarily complex digital resources. The classes represent types of things we all have (Collections, Objects, and Files) and enforce what types of metadata and resources can be associated with them. You'll notice that two of the classes (Collection and Object) are recursive, leaving you plenty of rope; hence the arbitrary complexity. The predicates limit the ways in which these classes can be associated with one another. There is also an extension for ordering based on OAI-ORE. PCDM is stable, but as is often the case, boxes and skittles on a PowerPoint slide aren't an implementation. As our implementations and others' mature, feedback from those experiences will no doubt refine details within PCDM.
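To make the shape of the model concrete, here is a minimal in-memory Python sketch of the three classes and the two core PCDM relationships, hasMember and hasFile. It is an illustration only and assumes nothing about the actual Ruby API of Hydra::PCDM: the names `PcdmObject` and `count_files` are hypothetical, and the sketch leaves out RDF, the other PCDM predicates, and the ORE-based ordering extension.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class File:
    """Stand-in for pcdm:File: a leaf that carries bytes and has no members."""
    name: str
    content: bytes = b""


@dataclass
class PcdmObject:
    """Stand-in for pcdm:Object: recursive, since Objects can contain Objects."""
    title: str
    members: List["PcdmObject"] = field(default_factory=list)
    files: List[File] = field(default_factory=list)

    def add_member(self, member: "PcdmObject") -> None:
        # hasMember from an Object may only point at another Object.
        if not isinstance(member, PcdmObject):
            raise TypeError("an Object may only have Objects as members")
        self.members.append(member)

    def add_file(self, file: File) -> None:
        # hasFile links an Object to one of its Files.
        if not isinstance(file, File):
            raise TypeError("hasFile must point at a File")
        self.files.append(file)


@dataclass
class Collection:
    """Stand-in for pcdm:Collection: recursive, since Collections can contain Collections."""
    title: str
    members: List[Union["Collection", PcdmObject]] = field(default_factory=list)

    def add_member(self, member: Union["Collection", PcdmObject]) -> None:
        # hasMember from a Collection may point at a Collection or an Object.
        if not isinstance(member, (Collection, PcdmObject)):
            raise TypeError("a Collection may only have Collections or Objects as members")
        self.members.append(member)


def count_files(node: Union[Collection, PcdmObject]) -> int:
    """Follow hasMember links recursively and count every File reached via hasFile."""
    total = len(node.files) if isinstance(node, PcdmObject) else 0
    return total + sum(count_files(m) for m in node.members)


# Usage: a book Object whose pages are Objects, each holding one image File.
book = PcdmObject("Book")
for n in range(1, 4):
    page = PcdmObject(f"Page {n}")
    page.add_file(File(f"page{n}.tif"))
    book.add_member(page)

library = Collection("Rare Books")
library.add_member(Collection("Incunabula"))  # a Collection nested in a Collection
library.add_member(book)

print(count_files(library))  # 3
```

The type checks stand in for the way PCDM's predicates constrain which classes may be linked; the two recursive classes are what let a structure like book, pages, and page images nest to any depth.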
JON 1.c That's all we'll say, because there are two sessions on Wednesday dedicated to PCDM and its implementation in Hydra.
1 TO 2: We're going to talk about use cases now, and how institutions that appear to have very different use cases, for example PSU and Princeton, figured out that we could and should be working together to build a common foundation.
At Penn State… for both of our Sufia-based production applications, ScholarSphere and ArchiveSphere, we needed what Sufia already provides plus (1) a new workflow for multi-file objects, (2) object sub-typing (e.g., for type-specific metadata or workflows, or curation concerns), (3) collection hierarchies, and (4) complex nested objects.
In parallel, around the spring of this year, Penn State was eager to start implementing multi-file works in Sufia, so our priority in the spring was to implement PCDM support in the Hydra gems. The repository community had spent the past six months developing PCDM, so we felt it was mature and stable enough to test via implementation. Through conversations with fellow Hydra partners and community members, we learned that we weren't the only institution eager to start building multi-file works in Sufia using PCDM.
JON 2.b Princeton's use cases aren't that different from much of what you've already heard: we want(ed) a lot of the features and behaviors that Sufia's models could get us, but we have very different types of objects and workflow requirements than Sufia provides for, and we want to work with the community and not go it alone.
JON 3.a Princeton started work in April of this year, though we’d been framing our use cases for some time before that.
We had contracted for an extended engagement with Matt Zumwalt, and that's when we realized it didn't make sense to simply pick up Worthwhile and run with it: it wasn't completely working with Fedora 4; there wasn't a big developer community around it; and it was close to being a fork of Sufia, since reuse of Sufia in its current state had been taken as far as it could go. But, back to features: we knew we wanted a lot of what Sufia has, and that our use cases were not unique. So we chose to take a step back...
JON 3.a.i Matt spent a significant amount of time analyzing the SOURCE CODE (the modules) within Sufia and Worthwhile
JON 3.a.i And also analyzed the high-level features in existing gems, particularly Sufia, Worthwhile, and Hydra-Collections
The result of this exercise was a plan that re-imagines the Sufia / Curate / Worthwhile lineage using a bottom-up approach, uses PCDM as a foundation (new code), and results in a foundation on which many feature-rich Hydra heads, including Sufia, can be built, thus maximizing code reuse and our (the community's) ability to collaborate on shared solutions.
Transition to Mike: we took this plan to the community, via the sprint that PSU was organizing, for vetting and refinement.
It quickly became clear that this new solution would work for Princeton, for Penn State, and for the other institutions that signed up for a two-week open Hydra community code sprint in May. The May sprint produced a solid foundation on which to base further development, but there was a lot more work to do. A number of organizations involved in the May sprint continued working on this new stack over the summer, each dedicating developer cycles to a sustained nine-week sprint to see this work through. [Slide acknowledging the institutions that took part in the May and summer sprints: Princeton, Penn State, Data Curation Experts, Oregon State, Michigan, Stanford, UCSD, Alberta, Cincinnati, Virginia Tech, Indiana, and Cornell]
Jon mentioned Matt Zumwalt’s involvement and though you may not see his name here... Without his creativity, deep knowledge of Hydra, and eye for detail, we would likely not have built out the stack that we now have. Matt was responsible for the lion’s share of the early planning (like the spreadsheets you just saw) and also key architectural design discussions that led us to where we are now. So, thank you, Matt.
The ultimate result of our early planning work is CurationConcerns, which takes its name and, to some extent, its scope from the concept coined by Jeremy during his work on Curate.
Walk through this new alignment, but first:
If one were to set out to design an application from scratch, they would probably never come up with a set of gems and features organized in this way. This illustrates the strength of the community and the ways we've learned to collaborate. The idea is that one can build an application using any layer of this onion along with the layers beneath it: the more layers you use, the more features you get. Finally, I want to point out that there's not actually a ton of new code here. As reflected in the spreadsheet screenshots I showed earlier, this really is a redistribution of features into a more tightly scoped set of gems that we hope will maximize opportunities for reuse.
ActiveFedora (AF): it's been here since the beginning.
PCDM with no further opinions.
Hydra::Works is where it starts to get interesting. With Hydra::Works you start taking on more opinions, but with those opinions come features.
Characterization, making derivatives, virus checking, text extraction, and grouping files to create 'Works'.
JON 3.c. I want basic views, and more shared features: custom models, routing for Create-Read-Update-Delete, file auditing, versioning, single-use links, uploading file sets, access controls, and leases and embargoes.
JON 3.c. I want a fully functioning IR, with: proxy deposit, a user dashboard, user profiles, featured works and researchers, a contact form, WYSIWYG UI editing, activity streams, upload via cloud providers, and integration with Zotero.
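The onion can be sketched as an ordered list of layers, innermost first, where building on a given layer gets you its features plus everything beneath it. This is a hypothetical Python illustration, not code from any of the gems: the `STACK` table and `features_available` helper are invented for the example, the feature labels paraphrase the notes above, and the real placement of features across the gems is still being adjusted.

```python
from typing import List, Set, Tuple

# Hypothetical summary of the stack, innermost layer first. Feature labels
# paraphrase the talk; they are not identifiers from any of the gems.
STACK: List[Tuple[str, Set[str]]] = [
    ("ActiveFedora", {"ActiveRecord-like access to Fedora"}),
    ("Hydra::PCDM", {"PCDM collections, objects, and files"}),
    ("Hydra::Works", {"characterization", "derivatives", "virus checking",
                      "text extraction", "grouping files into works"}),
    ("CurationConcerns", {"custom models", "CRUD routing", "file auditing",
                          "versioning", "single-use links", "file set upload",
                          "access controls", "leases and embargoes"}),
    ("Sufia", {"proxy deposit", "user dashboard", "user profiles",
               "featured works and researchers", "contact form",
               "WYSIWYG UI editing", "activity streams",
               "upload via cloud providers", "Zotero integration"}),
]


def features_available(layer: str) -> Set[str]:
    """Return the features of `layer` plus those of every layer beneath it."""
    names = [name for name, _ in STACK]
    if layer not in names:
        raise ValueError(f"unknown layer: {layer}")
    depth = names.index(layer)
    available: Set[str] = set()
    for _, features in STACK[: depth + 1]:
        available |= features
    return available


print(len(features_available("CurationConcerns")))  # 15
```

Keeping the table ordered, rather than having each layer list everything it inherits, mirrors the intent of the redistribution: each feature is declared once, in the lowest layer where it makes sense, and moving it (as CurationConcerns 0.2 does with ingest features) means editing a single entry.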
We’re out of the “Google Docs” phase.
Outstanding tasks are understood and discrete enough that they can be real issue tickets.
Now that we have released versions of all the gems, which together contain a lot of functionality, we've turned our focus toward making sure each piece of functionality lives in the right gem. Work is now happening on the 0.2 release of CurationConcerns, which moves a number of ingest features (full-text extraction, file characterization, virus checking) down the stack into Hydra::Works. After that release, we'll turn our attention back to Sufia, updating it to run atop CC 0.2 and moving closer to the Sufia 7.0 release. When that can happen depends on how we collectively resource it. All of the work that needs to happen is ticketed on GitHub, and Jon and I (as product owners of CC and Sufia, respectively) would be happy to chat with you about how we can effectively work together to get Sufia 7.0 done.
Like we do with all the other gems in the Hydra ecosystem, we will be continually examining where features in this stack are implemented and tweaking where they live over time. Key to this is learning more about how you all want to be using PCDM, CurationConcerns, and Sufia.
The best way to surface issues and hone the distribution of features across the stack is to use it!
A more Worthwhile Sufia: Now with PCDM
Jon Stroop (Princeton University)
Mike Giarlo (Stanford University)
Introducing Hydra::Works: PCDM in Hydra
Portland Common Data Model for managers - what is it, should I use it?
Penn State Use Cases & Desiderata
Sufia + Multi-file Objects
New upload workflow in addition to existing
Built atop a common data model with community buy-in
Ability to add new object types over time, e.g.:
Princeton Use Cases & Desiderata
Have Multi-File objects of all sorts
Need to Support Distributed / Flexible Workflow
Want many of the Sufia/GenericFile Features:
Derivatives / Transcoding
Leases / Embargoes
Where to start?
Penn State University
Data Curation Experts
Oregon State University
University of Michigan
University of California, San Diego
University of Alberta
University of Cincinnati
I want to talk to Fedora in a way that feels like Rails / ActiveRecord
I want to build my own
I want to do things with files, like:
● Make Derivatives
● Virus Checking
● Text Extraction
● Group them to create ‘Works’
I want basic views, and more shared features:
● Custom Models
● Routing for CRUD
● File Auditing
● Single-Use links
● Upload File Sets
● Access Controls
● Leases and Embargoes
I want a fully functioning IR, with:
● Proxy Deposit
● User Dashboard
● User Profiles
● Featured Works & Researchers
● Contact Form
● WYSIWYG UI Editing
● Activity Streams
● Upload via Cloud Providers
● Integration with Zotero
Where Are We Now?
1. CurationConcerns 0.2
2. Sufia 7.0
3. New features up and down this stack
a. Mediated deposit
b. Administrative dashboard
c. See my lightning talk on the usage survey for more
How to get involved
Check out the code on GitHub
Hydra::PCDM, Hydra::Works, CurationConcerns, Sufia
Grab a ticket off Waffle
PCDM and Works, CurationConcerns and Sufia
Tell us about your needs, goals, and timelines
Start a thread on hydra-community
Propose a topic for a Hydra Tech call
Talk to folks in the community about how we can work together
Mike Giarlo, as Sufia product owner