
Documenting metadata application profiles and vocabularies


Presentation given in session at DCMI 2017

Published in: Technology


  1. 1. Paul Walk Director, Antleaf Managing Director, Dublin Core Metadata Initiative (DCMI) Web: Email: Twitter: @paulwalk Sharing profiles: Documenting profiles and vocabularies on the Web
  2. 2. is it more important that application profiles are machine-friendly, or user-friendly?
  3. 3. the specific challenge: how to manage & publish the Dublin Core technical documentation in a more efficient & sustainable way, making it as user-friendly as possible while maintaining its machine-readability
  4. 4. context • DCMI publishes important technical documentation (vocabularies, specifications, models) on the Web • until recently, this was managed in a sophisticated bespoke system: • sources edited as XML files • maintained in a Subversion repository • assembled & converted with shell scripts and 'Ant' • transferred by FTP to a 'staging server' • deployed to the live server by the server admin, on request • essentially a "closed" system
  5. 5. three technologies which make the difference 1. Git • stable, sophisticated, free version control technology which is ubiquitously supported • GitHub: global-scale infrastructure providing Git as a service • invite contribution by 'pull request' 2. Markdown • simple, parseable but easily readable plain text format 3. Static website generators • a new class of content management system where sources are managed locally and compiled into webpages which are then uploaded to a server (like we used to do it in the early 90s!) • supports distributed content management via Git • supports long-term preservation by requiring only simple text-based formats • supports use of desktop authoring tools - e.g. text editors
  6. 6. we are exploring how these three technologies: * Git/GitHub * Markdown (with metadata “front matter”) * static-site generators can be harnessed together to address our challenge
  7. 7. what are static site generators?
  8. 8. what are static site generators? • a different kind of web content management system, designed to publish content as static files to a bog-standard web server • content is processed during the publishing operation, rather than when the user requests content (although client-side JavaScript is still supported) • simple command-line application to generate content and serve pages • no database - content is held in semi-structured text files
  9. 9. components - standard to most systems 1. content-model • folder hierarchy, text files 2. content pages • (markdown, front-matter) • blog type content is also often supported 3. templates (& themes) • (with some level of basic scripting) 4. generator software • typically a command-line script or application 5. configuration file
  10. 10. 1. content-model • text files arranged in folder hierarchy • folder hierarchy relates to URL path structure • filename relates to URL
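
The relationship between folder hierarchy, filename and URL described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `content` root folder, the `.md` extension, the example paths, and the convention that an `index` file stands for its containing folder are all hypothetical here, and individual generators differ in the details.

```python
from pathlib import PurePosixPath


def url_for(source_path: str, content_root: str = "content") -> str:
    """Map a source file under the content folder to a published URL path.

    Assumption: an "index" file represents its containing folder.
    """
    relative = PurePosixPath(source_path).relative_to(content_root)
    # Drop the extension: the filename becomes the last URL segment.
    stem = relative.with_suffix("")
    if stem.name == "index":
        stem = stem.parent
    parts = stem.parts
    return "/" + "/".join(parts) + ("/" if parts else "")


# Hypothetical source files, for illustration only.
print(url_for("content/terms/creator.md"))
print(url_for("content/specifications/index.md"))
```
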
  11. 11. 2. content pages • "front-matter" metadata • often in YAML format like here • main body in Markdown, arbitrary HTML also accepted where necessary
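
As a rough sketch of how a content page divides into front matter and body, the Python function below splits a page on `---` delimiter lines and reads simple `key: value` pairs. It is deliberately not a YAML parser (a real generator would use one, and would handle lists, nesting and quoting properly); the function name, the sample page and its `status` field are assumptions made for illustration.

```python
def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a page into (front-matter metadata, Markdown body).

    Simplification: only flat "key: value" lines are understood.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter present
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, text  # opening delimiter without a closing one
    metadata: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return metadata, body


# Hypothetical page for a single vocabulary term.
page = '---\ntitle: "Creator"\nstatus: draft\n---\n\nAn entity primarily responsible for making the resource.\n'
metadata, body = split_front_matter(page)
print(metadata)
print(body)
```
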
  12. 12. 3. templates • can reference metadata (e.g. 'page title') from content page • can re-use 'partial' templates (e.g. a common 'header' & 'footer') • often in a common templating language such as HAML • (example below is in Go's templating syntax) = include partials/header.html . div.row-fluid div class="col-xs-12" {{if .Draft}}[**draft**]{{end}}{{.Title}} i {{}}, {{.Date.Format "Monday, January 02, 2006"}} {{.Content}} = include partials/share_buttons.html . = include _internal/disqus.html . = include partials/footer.html .
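
The two template ideas on this slide (referencing page metadata, and re-using partials such as a header and footer) can be illustrated with Python's standard `string.Template`. This is a swapped-in stand-in chosen for a self-contained example, not the Go templating shown on the slide; the partial names, the markup and the `site_name` parameter are assumptions.

```python
import html
from string import Template

# Reusable "partial" templates shared by every page (hypothetical markup).
PARTIALS = {
    "header": Template("<header>$site_name</header>"),
    "footer": Template("<footer>$site_name</footer>"),
}

# Page template: pulls in the partials plus metadata from the content page.
PAGE = Template("$header\n<h1>$draft_marker$title</h1>\n$content\n$footer")


def render_page(metadata: dict, content_html: str, site_name: str) -> str:
    """Render one page from its front-matter metadata and converted body."""
    return PAGE.substitute(
        header=PARTIALS["header"].substitute(site_name=site_name),
        footer=PARTIALS["footer"].substitute(site_name=site_name),
        # Mirrors the slide's draft check: mark the title when the page is a draft.
        draft_marker="[draft] " if metadata.get("draft") else "",
        title=html.escape(metadata["title"]),
        content=content_html,
    )


print(render_page({"title": "Creator", "draft": True}, "<p>Body</p>", "Example vocabulary"))
```
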
  13. 13. 4. generator software • used to generate new content • also used to run a local server to preview how the site will look
  14. 14. deployment options • SFTP • Rsync (over SSH) • git commit hooks (or GitHub webhooks) • requires the site to be built on the server, so a little more infrastructure (a simple CGI) is required
  15. 15. 436 known generators
  16. 16. workflow
  17. 17. ‘flipping’ the approach
  18. 18. old approach (single source file)
  19. 19. new approach (many source files, one per term)
  20. 20. pros and cons • old approach (source in XML file or similar) • pros: • easy to track source files (few in number) • easy to transform into other machine-readable formats • cons: • difficult to maintain the source - not user-friendly • poor support for extensive free text description • new approach (source in Markdown+YAML) • pros: • easier for humans to read and maintain • good support for extensive free text description • easy to re-use (partially/completely) • cons: • may not suit very complex vocabularies or profiles
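
One way to keep machine-readability with the one-file-per-term approach is to assemble the per-term metadata back into a single document at build time. The Python sketch below assumes the front matter of each term file has already been parsed into a dictionary; the function name, the `label` and `definition` field names, and the JSON output shape are hypothetical, not a DCMI format.

```python
import json


def build_vocabulary(term_pages: dict[str, dict]) -> str:
    """Combine per-term metadata (keyed by source filename) into one JSON document."""
    terms = []
    for filename in sorted(term_pages):  # stable order gives clean diffs in Git
        metadata = term_pages[filename]
        terms.append({
            "name": filename.rsplit(".", 1)[0],  # filename without its extension
            "label": metadata.get("label", ""),
            "definition": metadata.get("definition", ""),
        })
    return json.dumps({"terms": terms}, indent=2, sort_keys=True)


# Hypothetical parsed front matter for two term files.
print(build_vocabulary({
    "title.md": {"label": "Title"},
    "creator.md": {"label": "Creator", "definition": "An entity primarily responsible for making the resource."},
}))
```

The same loop could emit another serialisation in place of JSON; the point is only that many small human-friendly sources can still yield one machine-readable output.
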
  21. 21. simplifying curation and preservation • version control and redundancy • synchronised repositories & distributed version control via Git • active curation • ease of access and contribution to sources via Git • simple & readable plain text formats (Markdown) • "one click" deployment • minimal deployment infrastructure • standard web-server • text files, open formats, no database or server-side 'logic', static site generators • reduces broken websites
  22. 22. issues & challenges
  23. 23. 1. is this still too technical for some people who may need to maintain a metadata profile or vocabulary?
  24. 24. 2. will this approach be sophisticated enough to document the majority of candidate profiles/vocabularies?
  25. 25. 3. can we generalise this approach to provide a useful, re-usable tool kit for others to adopt?
  26. 26. 4. how do we handle versioning: by term, or by ‘collection’ (e.g. vocabulary or profile)?
  27. 27. versioning by term
  28. 28. Paul Walk Director, Antleaf Managing Director, Dublin Core Metadata Initiative (DCMI) Web: Email: Twitter: @paulwalk Thank you!