Verbose explanations about data for transcription
Disclaimer: I may gloss over or poorly represent tangential responsibilities in this message in order to
verbosely illustrate data object creation and transport. The JSON documents
are drawn from extant samples and may not mirror precisely what will be created, or they may be adjusted
for current standards. This is not intended as a specification or work document.
These are things that currently have to happen for a user to move from finding an image to
transcription practice:
1. Host images at a stable URI (or URL)
2. Honor requests for image resizing, manipulation, cropping, etc. at a predictable URI
pattern (IIIF)
3. Store all of the metadata associated with the imaged manuscripts and their folios
4. Query selected metadata (facets) for sets of matches among these collections
5. Provide endpoints for T-PEN to query for specific or sets of collections, manuscripts,
and folios (manifests and canvases)
Items 1 through 4 happen on the paleography site (within Islandora, I believe), but item 5 will have to be negotiated.
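Item 2 relies on the IIIF Image API's predictable URI pattern, `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The sketch below builds such request URLs following the 2.x syntax (where `full` is a valid size; IIIF 3.0 uses `max` instead). The base URL reuses the `paleoberry.org/iiif` host from the canvas example later in this document, but the identifier `Example Simple/100r` is a hypothetical placeholder, not a real Islandora identifier.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "full", rotation: int = 0,
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an IIIF Image API 2.x request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Percent-encode the identifier so a slash inside a shelfmark or
    # file name does not turn into an extra path segment.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical identifier; the base host comes from the canvas example.
full = iiif_image_url("http://paleoberry.org/iiif", "Example Simple/100r")

# Crop the top band of the image (pixel coordinates of the image, not of
# the canvas) and scale it to 800 pixels wide.
crop = iiif_image_url("http://paleoberry.org/iiif", "Example Simple/100r",
                      region="0,0,1579,400", size="800,")
```

Because every variant of an image is just a different URL, neither T-PEN nor the paleography site has to store derived copies; the image server does the work on request.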
Any data that is accessible in T-PEN will either need to be stored in T-PEN or resolved through
use of the transcription project identifier (specifically, a SharedCanvas Manifest URI).
Collections, Manuscripts, and Folios (at least) should all have a URI which resolves to a
descriptive JSON-LD document.
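A minimal sanity check for that requirement might look like the sketch below. It assumes the response body has already been fetched (no network code is shown) and checks only that the document is JSON, carries `@context`, `@id`, and `@type`, and identifies itself by the URI it was served from. The function name and the choice of required keys are illustrative assumptions, not an agreed contract between Islandora and T-PEN.

```python
import json

# Every dereferenceable description should at least say how to read it
# (@context), where it lives (@id) and what it is (@type).
REQUIRED_KEYS = ("@context", "@id", "@type")


def check_descriptive_doc(uri: str, body: str) -> dict:
    """Parse a JSON-LD body fetched from `uri` and sanity-check it (hypothetical helper)."""
    doc = json.loads(body)
    missing = [key for key in REQUIRED_KEYS if key not in doc]
    if missing:
        raise ValueError(f"{uri} is missing {', '.join(missing)}")
    # The document should describe the very URI it was served from.
    if doc["@id"] != uri:
        raise ValueError(f"{uri} describes itself as {doc['@id']}")
    return doc


sample = ('{"@context": "http://www.shared-canvas.org/ns/context.json", '
          '"@id": "http://t-pen.org/Example+Simple/canvas/100r", '
          '"@type": "sc:Canvas", "label": "100r"}')
doc = check_descriptive_doc("http://t-pen.org/Example+Simple/canvas/100r", sample)
```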
Let's take it in pieces:
sc:Manifest
{
"@context" : "http://www.shared-canvas.org/ns/context.json",
"@id" : "http://t-pen.org/Example+Simple/manifest.json",
"@type" : "sc:Manifest",
"label" : "Example Simple",
"metadata" : "http://paleoberry.org/Example+Simple/metadata",
"sequences" : [ {
"@id" : "http://t-pen.org/Example+Simple/sequence/normal",
"@type" : "sc:Sequence",
"label" : "Current Page Order",
"canvases": [
"http://t-pen.org/Example+Simple/canvas/100r",
"http://t-pen.org/Example+Simple/canvas/100v",
"http://t-pen.org/Example+Simple/canvas/101r"
]
} ]
}
At its core, a manifest is just a sequence of canvases. Each new arrangement is a unique
canvas. The IIIF manifest standard even says of the metadata field: "There are no semantics conveyed by
[metadata] information, and clients should not use it for discovery or other purposes." In T-PEN,
each project has its own sc:Manifest, even if several people are working on the same
manuscript. That said, there should be some URI a person or machine could dereference to see
a JSON-LD file of Newberry's curated arrangement of images with all available associated
metadata. Because the metadata field is just key-value pairs, it can itself be a URI: if Iter made
available something like newberry.org/IIIF/SHELFMARK/metadata.json returning JSON, any
authorized user, including T-PEN or any Open Annotation Store, could build a legitimate,
well-formed sc:Manifest.
Bottom line: Official transcription projects will have manuscript metadata in Islandora, but the
project sc:Manifest can be resolved via T-PEN and will include a reference to a metadata URI
(spec).
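The manifest above can be generated from just a project name, a metadata URI, and an ordered list of folio labels. The sketch below assumes the URI layout seen in the examples (`{base}/{project}/manifest.json`, `.../sequence/normal`, `.../canvas/{label}`); that layout is copied from the samples for illustration and is not a confirmed T-PEN convention. `build_manifest` is a hypothetical helper name.

```python
from typing import List
from urllib.parse import quote_plus


def build_manifest(base: str, project: str, metadata_uri: str,
                   folio_labels: List[str]) -> dict:
    """Assemble a minimal sc:Manifest with one sequence of canvas URIs (sketch)."""
    # quote_plus turns "Example Simple" into "Example+Simple", matching the samples.
    root = f"{base.rstrip('/')}/{quote_plus(project)}"
    return {
        "@context": "http://www.shared-canvas.org/ns/context.json",
        "@id": f"{root}/manifest.json",
        "@type": "sc:Manifest",
        "label": project,
        # Descriptive metadata stays with Islandora; the manifest only points to it.
        "metadata": metadata_uri,
        "sequences": [{
            "@id": f"{root}/sequence/normal",
            "@type": "sc:Sequence",
            "label": "Current Page Order",
            # The order of this list *is* the arrangement of the manuscript.
            "canvases": [f"{root}/canvas/{quote_plus(label)}" for label in folio_labels],
        }],
    }


manifest = build_manifest("http://t-pen.org", "Example Simple",
                          "http://paleoberry.org/Example+Simple/metadata",
                          ["100r", "100v", "101r"])
```

Keeping the metadata as a URI rather than inline key-value pairs means a correction in Islandora is picked up by every project manifest that references it, without rewriting those manifests.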
sc:Canvas
{
"@id" : "http://t-pen.org/Example+Simple/canvas/100r",
"@type" : "sc:Canvas",
"label" : "100r",
"height" : 1000,
"width" : 667,
"images" : [ {
"@type" : "oa:Annotation",
"motivation" : "sc:painting",
"resource" : {
"@id" : "http://paleoberry.org/iiif/Example+Simple/res/100r.jpg",
"@type" : "dctypes:Image",
"format" : "image/jpeg",
"height" : 2365,
"width" : 1579
},
"on" : "http://t-pen.org/Example+Simple/canvas/100r"
} ],
"otherContent" : [ {
"@id":"http://t-pen.org/Example+Simple/lines/100r",
"@type":"sc:AnnotationList",
"resources":[
"http://t-pen.org/Example+Simple/line/101083792",
"http://t-pen.org/Example+Simple/line/101083842",
"http://t-pen.org/Example+Simple/line/101083841" ...
]
} ]
}
An sc:Canvas URI does exist in the authoritative sc:Manifest, but as soon as a new project
is created in T-PEN, the canvas is copied so as not to pollute the original with annotations. Most,
if not all, of the metadata is already covered in the sc:Manifest, so the label is the most important
thing the canvas carries. In fact, even the image resource does not have to be included; it can
also be just a URI (IIIF). This canvas is the target of all new annotations (image and transcription) and is
stored and resolved through T-PEN.
Bottom line: The canvases and manifests are linked through @id URIs. The canvas in this case
embeds a fully described image resource, but the resource can be anything that results in an image
(likely a IIIF URI). A ‘service’ property can carry additional instructions if special access is needed.
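As a sketch of the "just a URI (IIIF)" variant, the canvas below points its painting annotation at a full-size IIIF request and attaches a `service` block so clients know they may ask the server for crops and resizes. The `@context` and `profile` values are the published IIIF Image API 2 URIs; the service URI `http://paleoberry.org/iiif/100r` and the helper name `build_canvas` are hypothetical.

```python
def build_canvas(canvas_uri: str, label: str, width: int, height: int,
                 image_service: str) -> dict:
    """Return an sc:Canvas whose image is served by an IIIF image service (sketch)."""
    return {
        "@id": canvas_uri,
        "@type": "sc:Canvas",
        "label": label,
        "height": height,
        "width": width,
        "images": [{
            "@type": "oa:Annotation",
            "motivation": "sc:painting",
            "resource": {
                # A full-size request; clients can derive other sizes from the service.
                "@id": f"{image_service}/full/full/0/default.jpg",
                "@type": "dctypes:Image",
                "format": "image/jpeg",
                # The service block advertises what the image server supports.
                "service": {
                    "@context": "http://iiif.io/api/image/2/context.json",
                    "@id": image_service,
                    "profile": "http://iiif.io/api/image/2/level1.json",
                },
            },
            # Painting annotations target the canvas they paint.
            "on": canvas_uri,
        }],
    }


canvas = build_canvas("http://t-pen.org/Example+Simple/canvas/100r", "100r",
                      width=667, height=1000,
                      image_service="http://paleoberry.org/iiif/100r")
```

Note that the canvas keeps its own coordinate space (667 by 1000 here) regardless of the image's pixel size; annotations are positioned against the canvas, and viewers scale the image to fit it.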
oa:Annotation
{
"@id" : "http://t-pen.org/Example+Simple/line/101083792",
"@type" : "oa:Annotation",
"motivation" : "tr:transcribing",
"resource" : {
"@type" : "cnt:ContentAsText",
"cnt:chars" : "Infesto Trinitatis"
},
"on" : "http://t-pen.org/Example+Simple/canvas/100r#xywh=148,60,409,18"
}
Because the region is rectangular, it is abbreviated in the ‘on’ value as “#xywh=” per the standard. The
content of the transcription is a ‘resource’, here a literal ‘cnt:chars’ string, but it can be
anything the standard allows (XML, HTML, JSON, MEI, OGG). In T-PEN, for simplicity, it is always a
plain UTF-8 string with double quotes (") escaped to avoid breaking JSON. Any authorized user can
dereference the ‘@id’ to retrieve this annotation as JSON and climb up to the canvas as well, if desired.
Bottom line: This is CDH territory, but there are things needed to get this far that may not be
instantly available.
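Two details from the annotation above can be made concrete. First, the ‘on’ value splits into a canvas URI and an `#xywh=x,y,w,h` fragment; the sketch handles only bare integers, although the Media Fragments syntax also allows `pixel:` and `percent:` prefixes. Second, a standard JSON serializer escapes embedded double quotes for you, so transcription text never has to be escaped by hand. `Region` and `parse_target` are hypothetical names, and the quoted sample text is invented for illustration.

```python
import json
from typing import NamedTuple


class Region(NamedTuple):
    canvas: str
    x: int
    y: int
    w: int
    h: int


def parse_target(on: str) -> Region:
    """Split '<canvas>#xywh=148,60,409,18' into the canvas URI and its box."""
    canvas, sep, fragment = on.partition("#xywh=")
    if not sep:
        raise ValueError(f"no #xywh= fragment in {on!r}")
    # Unpacking raises ValueError unless there are exactly four integers.
    x, y, w, h = (int(n) for n in fragment.split(","))
    return Region(canvas, x, y, w, h)


line = {
    "@type": "oa:Annotation",
    "motivation": "tr:transcribing",
    # Invented text containing double quotes, to show the escaping.
    "resource": {"@type": "cnt:ContentAsText", "cnt:chars": 'He wrote "amen"'},
    "on": "http://t-pen.org/Example+Simple/canvas/100r#xywh=148,60,409,18",
}
region = parse_target(line["on"])
# json.dumps emits \" for each embedded quote; ensure_ascii=False keeps UTF-8 text readable.
serialized = json.dumps(line, ensure_ascii=False)
```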
Accession
When the paleography site wants to include a new manuscript for transcribing, it will need to
create one of three links:
1. On the fly project creation
○ This path is only for creating the official transcriptions or for a user starting
work on an untranscribed but discoverable manuscript.
○ Send T-PEN a new sc:Manifest JSON object request that includes a label
(unless this is generated from metadata), a metadata URI, and an ordered list of
labelled images.
○ In a perfect world, this would be a skeletal, but otherwise well-formed sc:Manifest
object so that T-PEN can simply extend it and return it when further URIs are
minted.
○ The user is then passed into T-PEN with this new Manifest displayed as their
new project.
2. Copy a project for transcription practice
○ Most users will follow this path to get into transcription.
○ Send T-PEN an existing sc:Manifest URI (such as the official transcription) with a
copy request.
○ T-PEN creates a new manifest for the user's project with the same sequence,
labels, images, and lines, but without the transcription data.
○ Nothing is passed back to the paleography site.
○ The user starts a new project in T-PEN, related to the original in case they want
to check their transcription work.
3. Work on a public project
○ Newberry can decide how heavily this is encouraged, but it is easy to do and
may be a good way to crowd-source some transcriptions or line annotations.
○ Send T-PEN an existing sc:Manifest URI (such as the official transcription).
○ The user is authenticated and begins to transcribe on the public project as
permissions allow. (This will not break the official transcription.)
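The copy in path 2 can be sketched as a deep copy that moves every URI under a new project root and blanks the transcription text while keeping each line's region. This assumes the project-prefix URI layout of the samples and reuses the trailing line identifiers under the new root for simplicity; a real T-PEN copy would mint its own identifiers. `rebase`, `copy_for_practice`, the `copiedFrom` back-link key, and the `Example+Simple+Practice1` root are all hypothetical.

```python
import copy
from typing import List, Tuple


def rebase(uri: str, old_root: str, new_root: str) -> str:
    """Move a URI from one project root to another (hypothetical URI layout)."""
    if not uri.startswith(old_root):
        raise ValueError(f"{uri!r} is not under {old_root!r}")
    return new_root + uri[len(old_root):]


def copy_for_practice(manifest: dict, lines: List[dict], old_root: str,
                      new_root: str, label: str) -> Tuple[dict, List[dict]]:
    """Copy a project: same sequence, labels and line regions, no transcription text."""
    new_manifest = copy.deepcopy(manifest)  # never mutate the official project
    new_manifest["@id"] = rebase(manifest["@id"], old_root, new_root)
    new_manifest["label"] = label
    new_manifest["copiedFrom"] = manifest["@id"]  # hypothetical back-link for checking work
    for seq in new_manifest["sequences"]:
        seq["@id"] = rebase(seq["@id"], old_root, new_root)
        seq["canvases"] = [rebase(c, old_root, new_root) for c in seq["canvases"]]

    new_lines = []
    for ann in lines:
        blank = copy.deepcopy(ann)
        blank["@id"] = rebase(ann["@id"], old_root, new_root)
        blank["on"] = rebase(ann["on"], old_root, new_root)  # keeps the #xywh= region
        blank["resource"]["cnt:chars"] = ""  # drop the transcription, keep the line
        new_lines.append(blank)
    return new_manifest, new_lines


official = {
    "@id": "http://t-pen.org/Example+Simple/manifest.json",
    "label": "Example Simple",
    "sequences": [{"@id": "http://t-pen.org/Example+Simple/sequence/normal",
                   "canvases": ["http://t-pen.org/Example+Simple/canvas/100r"]}],
}
lines = [{"@id": "http://t-pen.org/Example+Simple/line/101083792",
          "resource": {"@type": "cnt:ContentAsText", "cnt:chars": "Infesto Trinitatis"},
          "on": "http://t-pen.org/Example+Simple/canvas/100r#xywh=148,60,409,18"}]
practice, practice_lines = copy_for_practice(
    official, lines, "http://t-pen.org/Example+Simple/",
    "http://t-pen.org/Example+Simple+Practice1/", "Example Simple (practice)")
```

Nothing in this operation touches the paleography site, which matches the point above that nothing is passed back to it.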
Bottom line: If T-PEN is the gatekeeper for the new sc:Manifest URIs, it will have to let Islandora
know what it has minted (or agree to a strict convention) and Islandora will have to mint a URI
for the manuscript metadata and construct something very manifest-esque to request its
creation. If Islandora stores the whole manifest, T-PEN will have to let it know what URIs have
been created for every new sc:Canvas in the sequence. There is no good case for canvases or
annotations being stored outside of T-PEN.
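The "strict convention" option could be as simple as a deterministic rule for canvas URIs that both systems apply. The sketch assumes the `{base}/{project}/canvas/{label}` pattern from the samples and unique folio labels per project; the returned label-to-URI mapping is also what T-PEN would report back to Islandora if no shared rule were agreed. `mint_canvas_uris` is a hypothetical name.

```python
from typing import Dict, List
from urllib.parse import quote_plus


def mint_canvas_uris(base: str, project: str, labels: List[str]) -> Dict[str, str]:
    """Mint one canvas URI per folio label under a shared, deterministic rule (sketch)."""
    # Labels double as URI path segments, so duplicates would collide.
    if len(set(labels)) != len(labels):
        raise ValueError("folio labels must be unique within a project")
    root = f"{base.rstrip('/')}/{quote_plus(project)}"
    # Dicts preserve insertion order, so the sequence order survives.
    return {label: f"{root}/canvas/{quote_plus(label)}" for label in labels}


minted = mint_canvas_uris("http://t-pen.org", "Example Simple", ["100r", "100v", "101r"])
```

If both sides run the same rule, Islandora can predict every canvas URI without a round trip; if not, sending `minted` back is the minimum T-PEN must report.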