Special Topics: Data Provenance (Transcript)

ROUGHLY EDITED COPY
ALA
Special Topics: Data Provenance
July 22, 2019
CART/CAPTIONING PROVIDED BY:
ALTERNATIVE COMMUNICATION SERVICES, LLC
WWW.CAPTIONFAMILY.COM
"This text is being provided in a rough draft format.
Communication Access Realtime Translation (CART) is provided in
order to facilitate communication accessibility and may not be a
totally verbatim record of the proceedings."
>> Hi, everyone, it's noon Eastern Time. We're going to
get started. Welcome to the second session of the RDA 3R
training special topic series. We're glad to have Thurstan
Young here today. I'm going to make a quick technical
introduction here. I'll keep it short. I know Thurstan has a
lot to cover. The first thing I want to point out is the chat
space on the lower right-hand side of your screen. Many of you
have introduced yourselves there already. Please feel
encouraged to use the chat space. There's no question too big
or too small for the chat space. Any time you have a question,
just go ahead and use that. If you need help, you can click on
the drop-down menu underneath the chat space labeled "To" and
select the user "Host." That'll send a private chat to the host.
We can answer your technical questions if you've got them.
We'll do Q&A at the end of the session, but you don't need
to hold your questions for the Q&A period. If you have a
question, put it in the chat space and we'll relay it to
Thurstan at the end. Ask your questions when you've got them.
If you're having audio problems, you can troubleshoot those by
going up to the top of your screen, clicking on communicate and
then audio connection. That'll give you the option to call in
or adjust the settings on your computer. If you hear an echo,
it's probably because you have two broadcasts open
simultaneously. If you close one, that should resolve right
away. If your audio quality is going in and out, the best way to
handle that is to disconnect and reconnect. You'd do that by
clicking Communicate, Audio Connection, and Disconnect Audio; you'll
immediately get an option to reconnect.

We are recording today's event. You'll get an e-mail
giving you full access to the archive, an audio/video recording
of what you see right now. We'll also send you the slides on
SlideShare. Everything will be recorded if you missed part of
this or need to hear it again. There are many additional
eLearning opportunities available at the ALA Store. Take a look
at ALAstore.ALA.org. We're glad to have Thurstan Young here
with us today. Thurstan is a collection metadata analyst at the
British Library and maintains the MARC21 bibliographic format
mappings within the RDA Toolkit. I'll turn things over to
Thurstan and we'll get started. Glad to have you here.
>> Thurstan Young: Hi, thanks very much. I'll give you a
quick rundown of the things that we'll be covering today.
We're going to have a look at the definition and scope of
data provenance as it relates to RDA, the guidance that's
available in the RDA Toolkit, the elements relating to data
provenance, vocabulary encoding schemes, recording methods for
various elements, the application of data provenance in RDA,
and also future developments. As I go through, I'll be
comparing coverage in the current toolkit with that in the new
toolkit.
Okay, so to start off, the definition of data provenance.
In the current toolkit, there's no glossary definition. Within
the new toolkit, the definition for data provenance is given as
follows: information about the metadata recorded in an element
or set of elements. Metadata about metadata, or meta-metadata.
In the current toolkit, guidance on recording elements in
relation to data provenance is dispersed among the different
entity chapters. Where data provenance relates to the choice of
sources for transcribed data, this is dealt with as part of the
guidance on identifying manifestations and items.
The new toolkit provides a guidance chapter which focuses
solely on data provenance. Besides covering data provenance as
it relates to existing entities, the guidance also encompasses
the new RDA entities nomen and timespan. It considers both the
choice of sources for transcribed data and the attributes and
relationships of that data.
It also offers guidance on the purpose, structure,
attributes and relationships of resource descriptions
themselves, or metadata works.

This change in the scope of guidance on data provenance
achieves the following outcomes: greater alignment with the
Library Reference Model, greater support for linked data
applications, greater support for local applications, and
greater granularity of applications.
I mentioned the new toolkit guidance on data provenance.
And... the new guidance chapter on data provenance specifies
that the purpose and value of data provenance is as follows.
Data provenance provides information about the metadata recorded
in an element or set of elements. This information can be used
to infer the context and quality of the metadata.
Moving on to structure. The high-level structure of this
metadata is set out as follows: metadata being described by
data provenance are treated as a metadata work that consists of
a metadata statement or a metadata description set.
The guidance on data provenance goes on to offer an
illustration of what a metadata work is and some of the
attributes associated with it.
For example, a descriptive catalog record is a set of
statements about a specific information resource that often
includes subsets of statements about the work, expression,
manifestation and item that constitute the resource. The catalog
record is a metadata work that is realized in a specific
encoding scheme and embodied in a carrier such as microfiche or
an online manifestation.
Now, a word on relationships. The guidance also notes
that a metadata work may be related to other entities. These
other entities may or may not be related to each other. A
metadata work may include statements about more than one entity,
and those entities may be related or unrelated.
Okay, so this term metadata work: I want to go a little
further into that. Again, there's no reference to this term in
the current toolkit. The new toolkit glossary defines a metadata
work as a work that is a metadata statement or a metadata
description set.
It then goes on to provide a definition for metadata
statement: a piece of metadata that assigns a value to an RDA
element that describes an individual instance of an RDA entity.

Lastly, metadata description set is defined as one or more
metadata statements that describe and relate individual
instances of one or more RDA entities.
So let's have a look at what that looks like in terms of a
diagrammatic representation. This diagram offers a basic
representation of the relationship between an RDA entity being
described and the metadata statements which collectively make up
the metadata description set.
Each statement is a metadata work in its own right, as is
the metadata description set which aggregates them. And here is
a representation of these relationships, using a manifestation
of The Tempest by William Shakespeare. The title proper and
statement of responsibility, transcribed from the manifestation,
each represent a metadata statement and a metadata work.
Together, they form a basic metadata description set, which is
also a metadata work.
Then we have an example using the person entity, William
Shakespeare, consisting of an appellation of person as one
statement and a related timespan of person, 1564-1616, as
another. When brought together, these two statements form a
metadata description set and a metadata work.
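To make that structure concrete, here is a minimal Python sketch of metadata statements aggregated into metadata description sets, using the two examples just described. The class and field names (MetadataStatement, MetadataDescriptionSet, entity, element, value) are hypothetical illustrations rather than RDA Toolkit terms or an API, and the string values are placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical model of the structure described in the talk; not an RDA API.

@dataclass(frozen=True)
class MetadataStatement:
    """Assigns one value to one RDA element describing one entity instance."""
    entity: str   # the RDA entity instance being described
    element: str  # the RDA element, e.g. "title proper"
    value: str    # placeholder value for illustration

@dataclass
class MetadataDescriptionSet:
    """One or more metadata statements; the set is itself a metadata work."""
    statements: list[MetadataStatement] = field(default_factory=list)

    def entities(self) -> set[str]:
        """Entity instances described by the set (there may be more than one)."""
        return {s.entity for s in self.statements}

tempest = MetadataDescriptionSet([
    MetadataStatement("manifestation", "title proper", "The Tempest"),
    MetadataStatement("manifestation", "statement of responsibility", "William Shakespeare"),
])

shakespeare = MetadataDescriptionSet([
    MetadataStatement("person", "appellation of person", "William Shakespeare"),
    MetadataStatement("person", "related timespan of person", "1564-1616"),
])

print(len(tempest.statements), tempest.entities())
```

Because each statement and each set can be a metadata work, provenance could in principle be attached at either level; this sketch keeps only the values.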
Now... specific guidance is available for common
requirements associated with recording data provenance. There's
a list of those requirements covered by the new toolkit guidance
chapter.
The guidance covers: recording an agent who publishes
metadata; recording an agent who records metadata; a content
standard used for metadata; language of description; scope of
validity of metadata; script of description; source of metadata,
whether the source of metadata is the manifestation that's being
described or is not the manifestation that's being described;
timespan for validity of metadata; timespan when metadata are
published; and transcription standard used for metadata.
I'm going to go into some of these categories a little
deeper. Each focuses on recording metadata about metadata, or
meta-metadata. Note that the guidance provided is optional in
each case, highlighting the emphasis on flexibility of
application in the new toolkit.
What we have here is the guidance for recording an agent
who records metadata. We have one option to record a metadata
work as an instance of the work entity, and we follow this with
another option to record the agent as an author agent of the
metadata work.
Another example: recording a content standard used for
metadata. We have the option to record a metadata work as an
instance of the work entity and to record the content standard
used for the metadata as a related manifestation of work of the
metadata work.
There's also an option to record the string encoding scheme
for a structured description as a related work of work.
So let's have a look at what that would mean in terms of
our visualization. Here we've got an example application,
looking at how to describe a manifestation of The Tempest by
William Shakespeare. The British Library, represented by the
MARC code UK, is recorded as an author agent of the metadata
description set for the manifestation being described.
Meanwhile, the MARC code rda is used to represent the
content standard applied. It's a related manifestation of work
of the metadata description set for the manifestation being
described.
These statements, like the title proper and statement of
responsibility, are also metadata works in their own right. Also
note that RDA allows us to create additional links from these new
statements to the individual statements for title proper and
statement of responsibility.
Remember, this is optional, and I'm not going to do so right
now.
And here is the same set of relationships I've just
described, but applied to our person description. In this case,
the MARC code DLC is recorded to represent the Library of
Congress as an author agent of the metadata description set. We
have rda there as the related manifestation of work for the
content standard.
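Here is a sketch, under the assumption that provenance is stored directly on the description set, of how the optional author agent and content standard might be attached to the two examples above. The class and attribute names are hypothetical; the codes UK, DLC and rda are the ones mentioned in the talk. Both attributes default to None because the guidance makes recording them optional.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical model: a metadata work carrying its own data provenance.
# Attribute names are illustrative, not RDA element IRIs.

@dataclass
class MetadataWork:
    statements: list[tuple[str, str]] = field(default_factory=list)  # (element, value)
    author_agent: Optional[str] = None  # agent who recorded the metadata (MARC org code)
    related_manifestation_of_work: Optional[str] = None  # content standard used

    def provenance(self) -> dict[str, str]:
        """Return only the provenance values actually recorded (all are optional)."""
        recorded = {
            "author agent": self.author_agent,
            "related manifestation of work": self.related_manifestation_of_work,
        }
        return {k: v for k, v in recorded.items() if v is not None}

manifestation_description = MetadataWork(
    statements=[("title proper", "The Tempest")],
    author_agent="UK",
    related_manifestation_of_work="rda",
)
person_description = MetadataWork(
    statements=[("appellation of person", "William Shakespeare")],
    author_agent="DLC",
    related_manifestation_of_work="rda",
)

print(manifestation_description.provenance())
# {'author agent': 'UK', 'related manifestation of work': 'rda'}
```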
Okay, so the guidance on sources in the new toolkit
begins with a section which contextualizes the source of
metadata within the resource description process and the
description of other related entities.

So again, we start with the option to record a metadata
work as an instance of the work entity, and the note that the
source of metadata may be the manifestation that's being
described or another manifestation.
A manifestation being described may carry textual content
that can be transcribed or otherwise used as a source of
information for a metadata work about the manifestation.
The manifestation may carry content that can be used as a
source of information for a metadata work about any RDA entity.
Going back to our visualization for the manifestation of
The Tempest, here we can see how source of metadata relates to
the manifestation being described. The manifestation is the
source of metadata, while its textual content provides the means
for recording a title proper and statement of responsibility.
And here we see how the textual content that produces our
statement of responsibility can be further used to generate
information which relates to an entity other than the
manifestation-level description. The statement of
responsibility relating to the title proper can be used as a
source for describing the person entity.
The guidance chapter goes on to list the RDA elements for
which a transcribed value may be recorded as an unstructured
description. This includes both the elements themselves and
their sub-elements and element subtypes.
We've got distribution statement, edition statement,
manifestation statement, manufacture statement, production
statement, publication statement, series statement, statement of
responsibility and title of manifestation.
So next, we come to the choice of sources for transcribed
elements. While guidance in the current toolkit doesn't
explicitly contextualize the resource description process, it
does offer detailed guidance on the [indiscernible] sources which
may be found within the manifestation being described. It also
provides guidance on the choice of sources within the
manifestation and what action to take when the manifestation
being described doesn't carry the source of information required
to identify it. This guidance is located in chapter two of the
current toolkit, which covers identifying manifestations and
items. Section 2.2 of the chapter is headed sources of
information and breaks down as follows. We get a section on
applicability; then preferred source of information, with general
guidelines and specific guidance covering manifestations
consisting of one or more pages, leaves, sheets or cards (or
images of one or more pages, leaves, sheets or cards),
manifestations consisting of moving images, and other
manifestations; then more than one preferred source of
information; and other sources of information.
Now, it should be said at this point that the guidance in the
current toolkit covers certain areas that are out of scope in the
new toolkit. While the current toolkit's guidance encompasses
accompanying material, the new toolkit's guidance on sources does
not; this is because, at a general level, this concept is dealt
with in the context of aggregates.
Likewise... the current toolkit's guidance on choice of
sources refers to comprehensive and analytical descriptions.
The new toolkit's guidance doesn't. Instead, different
categories of description are encompassed by the guidance on
resource description.
Conversely, the new toolkit does introduce a new element,
recording source, which can be used to record the choice of
source used for a transcribed element. This is referenced in the
new toolkit's guidance on data provenance, with an option to
record a value of this element. Under the heading recording a
source of metadata that is the manifestation being described, we
see the following condition, followed by an option. The
condition reads: the source of the metadata work is the
manifestation that is being described. And the option is to
record the source of information as a recording source.
If we go to the glossary, we find the definition of recording
source: the source of information for a metadata work that is
an unstructured description transcribed from a manifestation
being described.
Note that by unstructured description, the definition refers
to the information being transcribed from the resource, such as a
statement of responsibility, rather than the source from which
that information is taken. The source from which information is
taken can be recorded in an unstructured way, but also in a
structured way, using a controlled term, or as an identifier or
IRI.

So we find vocabulary terms for recording source, things
like title card, title frame and title page. As mentioned
before, the current toolkit provides guidance on the choice of
sources for different categories of manifestation.
Manifestations consisting of one or more pages, leaves, sheets
or cards, or images of one or more pages, leaves, sheets or
cards, are broken down as follows: modern printed resources,
early printed resources, reproductions, microforms and
computer-mediated text. In the new toolkit, these are dealt with
in the data provenance section on recording the source of
metadata that is the manifestation being described.
So in the new toolkit, we have a condition and option
combination which addresses the choice of sources for modern
printed text. This corresponds to guidance in the current
toolkit; however, it's noteworthy that in the new toolkit, the
order of preference is optional. In the current toolkit, it
forms part of the main instruction. The conditions: the source
of the metadata work is the manifestation being described, and
the manifestation consists of one or more pages, leaves, sheets
or cards. The option: prefer the following recording sources in
this order. A title page, title sheet or title card; a cover or
jacket issued with the manifestation; a caption; a masthead; and
a colophon.
Returning to our manifestation of The Tempest by William
Shakespeare, we have a title page as our choice of source. So we
record the title proper and statement of responsibility from
that, since it comes first in our order of preference.
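The order of preference above amounts to a first-match search over ranked sources. Here is a minimal Python sketch, assuming sources are represented as plain strings; the term spellings, the grouping of equal-rank sources, and the function name are hypothetical. Since the new toolkit makes the order optional, an application could equally decline to apply it.

```python
from typing import Optional

# Option for modern printed text (condition: the manifestation being described
# consists of one or more pages, leaves, sheets or cards), in the order above.
# Sources joined by "or" in the guidance are grouped as equal rank.
MODERN_PRINT_PREFERENCE = [
    {"title page", "title sheet", "title card"},
    {"cover", "jacket"},
    {"caption"},
    {"masthead"},
    {"colophon"},
]

def choose_recording_source(available: set[str],
                            preference: list[set[str]]) -> Optional[str]:
    """Return the first available source in order of preference, or None."""
    for rank in preference:
        matches = sorted(rank & available)  # sorting only makes ties deterministic
        if matches:
            return matches[0]
    return None  # nothing preferred found; other sources would be considered

print(choose_recording_source({"title page", "cover", "colophon"},
                              MODERN_PRINT_PREFERENCE))  # title page
```

Returning None rather than raising leaves room for the separate guidance on sources outside the manifestation.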
Note the optional nature of this new guidance. Where the
manifestation is an early printed resource, we have the option
to prefer the following sources in this order: a title page,
title sheet or title card; a colophon; a cover or jacket issued
with the manifestation; and a caption.
A similar pattern can be found in the guidance on choice of
sources for reproductions. Where the manifestation is a
reproduction of one or more pages, leaves, sheets or cards, we
have the option to prefer a title page, title sheet or title card
of the reproduction.
And the same condition and option combination applies to the
choice of sources for microforms. Where the manifestation is a
microreproduction of one or more pages, leaves, sheets or cards,
we have the option to prefer a readable label permanently affixed
to the manifestation as a source of information.

Okay, so the current toolkit's guidance on choice of
sources for moving image resources is broken down into the
following categories: tangible manifestations and online
resources. The new toolkit guidance is, again, in the section
on recording the source of metadata that is the manifestation
being described. It makes no high-level distinction between
different types of carrier, but provides guidance according to
whether the resource being described does or doesn't feature a
title frame, title screen or permanently affixed label.
Hence, we have the following condition and option
combinations for moving image resources. Condition: the
manifestation embodies moving images. We have the option to
prefer a title frame or title screen as a source of information,
and another option to prefer a label that is permanently affixed
to the manifestation as a source of information.
Then the following condition: the manifestation lacks a title
frame or title screen. Option: prefer the following sources in
this order: a label that is permanently affixed to the
manifestation, a container of the manifestation, a digital menu,
and embedded metadata of a digital manifestation.
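The two moving image combinations can be read as a two-step lookup: try a title frame or title screen first, and only if the manifestation lacks both, fall back to the stated order. A minimal Python sketch, assuming string labels for sources; the labels and function name are hypothetical, not controlled vocabulary terms.

```python
from typing import Optional

# Hypothetical source labels; not controlled vocabulary IRIs.
# The guidance treats title frame and title screen as alternatives, so the
# order within this tuple is arbitrary.
TITLE_FRAME_OR_SCREEN = ("title frame", "title screen")
FALLBACK_WHEN_NO_TITLE_FRAME = (
    "label permanently affixed to the manifestation",
    "container of the manifestation",
    "digital menu",
    "embedded metadata",
)

def moving_image_source(available: set[str]) -> Optional[str]:
    # Option: prefer a title frame or title screen when one is present.
    for source in TITLE_FRAME_OR_SCREEN:
        if source in available:
            return source
    # Condition: the manifestation lacks a title frame or title screen,
    # so prefer the remaining sources in the stated order.
    for source in FALLBACK_WHEN_NO_TITLE_FRAME:
        if source in available:
            return source
    return None

print(moving_image_source({"digital menu", "container of the manifestation"}))
# container of the manifestation
```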
The current toolkit's guidance on sources of information for
categories of manifestation not already dealt with again breaks
down in terms of tangible and online. The new toolkit makes no
high-level distinction between different types of carrier.
Instead, it offers specific guidance for manifestations which do
not consist of one or more pages, et cetera, or of moving
images.
So we have the following condition and option combination.
Conditions: the manifestation doesn't consist of one or more
pages, leaves, sheets or cards, and the manifestation doesn't
embody moving images. Our option is to prefer the following
recording sources that carry a title, in this order:
a textual source on the manifestation or a label that is
permanently affixed to the manifestation; embedded metadata in
textual form that contains a title; a container of the
manifestation; and another source on any part of the
manifestation itself, giving preference to sources in which the
information is formally presented.

Some manifestations call for different choices of source,
based on the presence of textual content in multiple languages
or scripts, or bearing different dates. The current toolkit
covers all these categories of manifestation, as does the new
toolkit. However, as with the other guidance on choice of
sources already covered, the new toolkit is not prescriptive.
Rather, the guidance is provided on an optional basis.
So we've got the following condition, which may be applied
to sources in more than one language or script: the
manifestation contains recording sources in more than one
language or script. And the following option: prefer recording
sources in this order.
A recording source in the language or script that corresponds
to the language or script of the content of the manifestation.
A recording source in the language or script that
corresponds to the predominant language or script of the content
of the manifestation.
A recording source in the language or script of
translation, if the manifestation embodies the same work in more
than one language or script and translation is known to be the
purpose of the manifestation.
A recording source in the original language or script of
the content, if the manifestation embodies the same content in
more than one language or script and the original language or
script can be identified.
A recording source that occurs first.
And a recording source in the language or script preferred by
an agent who creates metadata, if the resource is formatted
tête-bêche, as a head-to-head or head-to-tail bound volume.
The following condition and option may apply to sources bearing
more than one date. Condition: the manifestation provides two or
more preferred recording sources with different dates.
And we have the option to prefer the recording source with the
latest date.
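The date option is the simplest of these rules to express: among preferred sources bearing different dates, take the latest. A minimal Python sketch, assuming each source's date has already been reduced to a year; the class, field and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatedSource:
    name: str  # hypothetical label for a preferred recording source
    year: int  # assumption: the date on the source is already reduced to a year

def prefer_latest(sources: list[DatedSource]) -> DatedSource:
    """Option: when preferred sources bear different dates, prefer the latest."""
    if not sources:
        raise ValueError("no preferred recording sources supplied")
    # max() returns the first of any equally latest sources, so ties
    # resolve by input order.
    return max(sources, key=lambda s: s.year)

pages = [DatedSource("title page (first issue)", 2018),
         DatedSource("title page (reissue)", 2019)]
print(prefer_latest(pages).name)  # title page (reissue)
```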
Now, the current toolkit addresses another situation which
calls for a different choice of source: that in which a
manifestation provides both original and reproduced content.
This is another case in which the new toolkit makes no
distinction; the general guidance on choice of sources for
reproductions applies.

As a reminder, since the current toolkit guidance prefers
the reproduction over the original, applying the new toolkit's
general guidance on reproductions would result in the same
outcome as regards choice of sources.
Again, though, unlike in the current toolkit, the guidance
provided in the new toolkit is optional.
Now that we've dealt with situations in which the choice of
source pertains to the manifestation being described, let's move
on to other sources. The current toolkit breaks these down into
the following categories, in order of preference:
accompanying material, published descriptions of the
manifestation, a container not issued with the manifestation,
and other available sources. The new toolkit offers equivalent,
though optional, guidance, apart from accompanying material,
which, as previously mentioned, is covered by separate guidance.
The new toolkit guidance is situated under the general
heading recording the source of metadata.
And here's the relevant condition and optional order of
preference.
The condition reads: the manifestation that's being
described doesn't provide a source of information for an
element. Our option states: prefer the following external
sources of information in this order: a published description of
the manifestation, a container that is not issued with the
manifestation itself, and any other available source.
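A minimal Python sketch of this external-source option, again as a first-match search over string labels. The returned dictionary and its flag are a hypothetical way of pairing the choice with the new toolkit's option to indicate that a value is not taken from the manifestation being described; the labels and function name are likewise assumptions.

```python
from typing import Optional

# Condition: the manifestation being described doesn't provide a source
# of information for an element. Option: prefer these, in this order.
EXTERNAL_PREFERENCE = (
    "published description of the manifestation",
    "container not issued with the manifestation",
    "other available source",
)

def external_source(available: set[str]) -> Optional[dict]:
    """Pick an external source and flag that the value is not from the manifestation."""
    for source in EXTERNAL_PREFERENCE:
        if source in available:
            # Hypothetical record shape; not an RDA or MARC structure.
            return {"source consulted": source, "taken from manifestation": False}
    return None

print(external_source({"other available source",
                       "published description of the manifestation"}))
```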
Now, the new toolkit also offers guidance on how to
record sources of information when information is taken from
outside the resource being described.
Whereas the current toolkit refers to this as supplied data,
indicated by a note or other means, such as coding or square
brackets, the new toolkit's guidance is more specific in one way
and more generalized in another.
Rather than recording a note, the new toolkit guidance
provides an option for recording the element source consulted of
a metadata work, in the case where the information is not taken
from the manifestation being described. In cases where only a
potential source of information is found outside the
manifestation being described, it provides an option for
recording the element note on metadata work. However, for
information taken from outside the manifestation, the new toolkit
doesn't refer to coding, square brackets or any other means of
indicating this.
Here, we can see the condition and option combination.
Condition: the source of metadata is not the manifestation that
is being described.
Option: record the source of information as a source consulted
of the metadata work, with a further option to indicate that the
value of the metadata work is not taken from the manifestation
that is being described.
Here, we can see the condition and option combination for
a source of metadata that is not the manifestation being
described and that is only a potential source of metadata.
The option is to record the source of metadata as a
manifestation related to the metadata work and indicate that the
information is not found in the source.
So, moving on from the guidance chapter, we can consider
the list of elements which are specific to recording data
provenance information in the new toolkit.
These break down into two broad categories: general
meta-elements relating to the work entity, and nomen-related
properties. The recording methods are unstructured description,
structured description, identifier and IRI, though not all are
available for every element. Context of use can only be recorded
in an unstructured way. Undifferentiated name indicator can only
be recorded in a structured way. And scope of validity can be
recorded in both structured and unstructured ways, but not as an
identifier or IRI.
We already looked at recording source in the context of the
guidance on data provenance. Let's go on to address the element
source consulted.
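The recording-method constraints just described can be captured as a lookup table. This is a minimal Python sketch covering only the three elements named; the enum, dictionary and function names are hypothetical, and an unlisted element is treated as unknown (returns False) rather than unrestricted.

```python
from enum import Enum

class Method(Enum):
    UNSTRUCTURED = "unstructured description"
    STRUCTURED = "structured description"
    IDENTIFIER = "identifier"
    IRI = "IRI"

# Only the three elements whose methods are stated in the talk.
ALLOWED_METHODS = {
    "context of use": {Method.UNSTRUCTURED},
    "undifferentiated name indicator": {Method.STRUCTURED},
    "scope of validity": {Method.UNSTRUCTURED, Method.STRUCTURED},
}

def can_record(element: str, method: Method) -> bool:
    """True if the element may be recorded with the given method."""
    return method in ALLOWED_METHODS.get(element, set())

print(can_record("scope of validity", Method.IRI))  # False
```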
Source consulted is an element already present in the
current toolkit. It can be used as a means of identifying works
and expressions and, as referenced in chapter eight, as a means
of identifying agents.
It can also be used as a means to record relationships
between works, expressions, manifestations and items, and
relationships between agents.

The current toolkit glossary defines source consulted as
follows: the resource used in determining the name, title, or
other identifying attributes of an entity, or in determining
the relationship between entities.
The new toolkit defines source consulted as follows: a
manifestation in which there is evidence for a metadata work. And
it also has a reciprocal element, source consulted of, defined as
follows: a metadata work for which a manifestation provides
evidence.
Now, returning to our diagram for the person description, it
includes an example of a source consulted for the metadata
description set. For our person entity, William Shakespeare, we
have the source consulted Encyclopaedia Britannica. And we have
that represented as a metadata statement in its own right and as
part of the larger metadata description set.
Another meta-element in the current toolkit is cataloger's
note. It can be used as a means of identifying works and
expressions; reference to it can be found in chapter five. It
can also be used as a means of identifying agents, in chapter
eight.
In addition, this element can be used with relationships
between works, expressions and manifestations, as we saw before,
and relationships between agents, covered in chapter 29.
Another element in the current toolkit is explanation of
relationship. This performs a similar role to cataloger's note in
relating works and expressions, and agents.
In the current toolkit, cataloger's note is defined as an
annotation that clarifies the selection and recording of
identifying attributes, relationship data, or access points for
an entity.
Whilst explanation of relationship is defined as
information elaborating on or clarifying a relationship between
related entities. In the new toolkit, cataloger's note and
explanation of relationship are replaced by the new meta-element
note on metadata work.
If you look up cataloger's note and explanation of
relationship in the new toolkit glossary, you'll find see
references to note on metadata work.

Note on metadata work is defined as a broad, unstructured
description of one or more attributes of a metadata work.
Here, we see it in action: a note on metadata work is
included in the form of a "Formerly CIP" note, recorded by the
British Library.
The last meta-element to cover is scope of validity.
This is new to the toolkit: an unstructured or structured
description of a range of works for which the value of a
metadata work is valid.
Turning now to nomen-related properties. The current
toolkit lists the following elements, all used to identify
agents: scope of usage, date of usage, status of identification
and undifferentiated name indicator. These are all found in
chapter eight of the current toolkit. So let's have a look at
the definitions.
Scope of usage is defined as follows: a type or form of work
associated with the use of a name chosen as the preferred name
for an agent.
In the new toolkit, scope of usage is replaced by the
element context of use. Again, if you look for scope of usage in
the glossary, you'll find a see reference to context of use,
which is defined as the circumstances or situation in which an
appellation of an RDA entity is used. And again, here we have
a visualization of that. In our diagram, I've noted in the person
description, for the person William Shakespeare, the element
context of use, and by extension, a metadata statement in its own
right. In this case, the context of use for the agent
description is a machine-generated nonreference script.
Moving on, the next element to consider is date of usage.
This is defined in the current toolkit as follows: a date or
range of dates associated with the use of a name chosen as the
preferred name for a person. The new toolkit defines date of
usage as follows: a date or range of dates that is associated
with the use of an appellation of an RDA entity.
Next is status of identification, defined in the current
toolkit as an indication of the level of authentication of the
data identifying an entity. The new toolkit defines it as
follows: level of authentication of the nomen of an entity.
You can choose from the following list of controlled terms to
specify whether the status of identification is fully
established, provisional or preliminary.

And the terms are the same in the new toolkit.
Another element for which there's an equivalent in the new
toolkit is the undifferentiated name indicator, defined in the
current toolkit as a categorization indicating that the core
elements recorded are insufficient to differentiate between two
or more persons with the same name.
The new toolkit defines it as follows: a categorization
indicating that the nomen is insufficient to differentiate
between two or more entities.
We already mentioned reference source. Okay, reference
source is the name of a related element. The new toolkit defines
it as follows: the source in which there's [indiscernible]
elements. The use for source consulted reference should read
[indiscernible] source consulted.
And finally, we have two other elements to consider:
assigned by agent and assigner agent of. Assigned by agent is
defined as an agent who assigns a nomen to an entity, and
assigner agent of is defined as a nomen that is assigned to an
entity by an agent.
Now, having looked at elements, I'm going to look at the
means of recording data provenance in the current cataloguing
environment. RDA is, and will continue to be, agnostic of
encoding scheme.
However, most current implementations of RDA are in a
MARC21 context. Mappings are available for data
provenance-related elements in the current toolkit.
So here we see the authority format mappings for source
consulted in the current toolkit: the 670 field, covering source
data found, and also the 675 field, covering source data not
found, with the source citation.
Here are the mappings for the second element, cataloger's note:
688, covering application history note. Again, this relates to
mappings in the authority format.
Explanation of relationship maps to fields covering explanatory
text, again in the authority format.
And then scope of usage and date of usage both map to
667, nonpublic general note, in the authority format.
Status of identification relates to 008 character position 33,
level of establishment. Undifferentiated name indicator relates
to 008 character position 32, undifferentiated personal name.
So... all of the mappings mentioned so far are to the MARC21
authority format. There are no data provenance mappings at the
level of the bibliographic format, since these are not
considered in scope.
Now... in terms of the new toolkit, the majority of the
elements below do not yet have a MARC21 mapping assigned to
them; they represent a category which is not yet covered. It's
already apparent that MARC coding doesn't cover all the
relationships that are available.
Moreover, as we've seen, the new RDA allows data provenance to
be recorded at the level of the metadata description set and the
metadata statement. The statement level is limited in
MARC21 at present. It's available in some cases, but very
infrequently.
So... here is a MARC21 record for the manifestation of
The Tempest which we've been describing. I've highlighted the
various values which are being recorded.
So... we have subfields covering the modifying agency and
author agent, representing the British Library; the 040 content
standard, RDA; the person, covered in the 100 field subfields,
and the relationship of the agent to the manifestation being
described and, by extension, the work; the two [indiscernible]
covering The Tempest and the statement of responsibility
relating to title proper; and the 500 $a recording a metadata
work, with the agency responsible for recording that note, UK,
in subfield $5.
Here's a closer look at how those values relating to data
provenance in RDA are encoded in MARC. In the first example, the
values for author agent and related manifestation of work are
assigned to the record as a whole, or metadata description set.
In the second example, it's possible to assign an author agent
value at the metadata statement level, because sufficiently
granular MARC coding is available.
So... in this case, in the 040, the author agent is applied to
the metadata as a whole, as is the related manifestation of
work, RDA, but here, we know that the author agent responsible
for recording this 500 note is the British Library.
And here... is a MARC21 record for the person William
Shakespeare who we've been describing. I've highlighted the
various values which have been recorded. We have the 040 $d,
Library of Congress, the author agent; we have RDA again, as
the related manifestation. We have Shakespeare, William in the
100 $a. We have the related time span of person, 1564 to 1616,
recorded in subfield $d. We have the 667 field, covering the
context of use: Machine-derived non-Latin script reference
project... and we have the source consulted, Encyclopedia
Britannica. In the first example, the values for author agent
and related manifestation of work were assigned to the record as
a whole. As you can see, besides Library of Congress, a large
number of different author agents have contributed to the
content of this record, some more than once. It would be very
difficult to determine which organization made what
contribution. The same applies to the following examples. With
regard to the context of use, if this is recorded at the
description set level, there's no means of determining who
assigned this value.
Remember... it's not only a relationship to an agent that
can be established at the metadata statement level. Although
the examples provided so far do not illustrate it, it is now
also possible to establish relationships about the publisher of
metadata, the time span of its publication, the transcription
standard used for the metadata, as well as its scope and time
span of validity. It's also possible to express statement-level
relationships between other elements belonging to RDA.
Returning to our diagrammatic representation of the
description set for The Tempest, it shows an example of a
data provenance relationship which is not possible to record
using current MARC coding: William Shakespeare is described
as a person using a controlled term provided by RDA, but this
can only be recorded at the level of the larger metadata
description set.
A practical application of being able to record the
content standard at the statement level will be to address the
ambiguity presented by bibliographic data. Here's an example
of a data provenance relationship which currently cannot be
recorded in MARC: the value recorded for the related time
span of a person was taken from the Encyclopedia Britannica,
but the source consulted cannot be applied to an individual
statement, only at the level of a larger metadata description
set. We'll continue to address the conflation of agent and nomen
entities in the context of authority files.
In contrast to the current situation, expressing these
statement-level relationships in a linked data context would be
unproblematic. Here... we model this statement. The box used
to enclose the first statement illustrates that it has become
the subject of the second statement: [The Tempest, William
Shakespeare] has related manifestation of work RDA.
Again... another example: [William Shakespeare has related
time span of person 1564 to 1616] has source consulted
Encyclopedia Britannica.
So... to summarize... the new RDA Toolkit provides a more
flexible approach to recording data provenance. It provides a
broader range of data provenance elements to choose from. It
also provides greater specificity in relating data provenance to
RDA entities, elements, and their sub-elements. And... while
there's limited compatibility with MARC21, it's optimized for use
in linked data environments.
Thank you very much. And... are there any questions?
>> Hi, Thurstan, there certainly are. There's been a lot
of discussion. First question: I find it difficult to think of
a field in a catalog record as a work. Doesn't work imply some
sort of creativity?
>> An individual field within a catalog record requires a
degree of creativity, as does the record which contains it. One
of the features of the new RDA is to think of the record
creation process, or the field creation process, as being a
creative one in its own right. As a result of that, as
we've seen in the presentation, you can record various
attributes and relationships for that work, such as the
author agent who created that information, and also
relationships such as source consulted, used in the process of
creating that work.
>> Okay... next question... can you reiterate the
difference between sub-elements and subtypes again?
>> Okay... I didn't go into that in a great deal of detail,
but... if you think of something like a publication statement,
you would generally find that to be composed of a place of
publication, name of publisher, and date of publication. Those
are what we would regard as sub-elements. In terms of subtypes,
you could think of subtypes of title of manifestation: title
proper is a subtype of title of manifestation.
Likewise... statement of responsibility relating to title
proper is a subtype of statement of responsibility.
>> Okay... will there be a mapping developed between the
BIBFRAME data model and the new RDA standard for bibliographic
description?
>> Oh... that's a good question. In terms of the current
model in RDA, as many people know... there's a mismatch as
regards the entity structures. So... that is an issue
to consider. But... there are means for looking at
commonalities between them. Those of you who are familiar with
[indiscernible] will have seen some of the cloned vocabularies
on there for various sets of controlled terms which were taken
originally from RDA.
So... the issue of mapping from RDA to BIBFRAME has been on
many people's minds for several years now. And... hopefully
progress can be made on that front.
>> All right... if an option gives a certain order, are
communities free to change this order in an application profile
or with a policy statement, as it is only an option?
>> Absolutely, that reflects the flexibility of the new RDA
toolkit. One of the issues that we covered was the lists of
sources for things like modern printed resources as opposed to
early printed resources, et cetera, et cetera. As mentioned, in
the current toolkit you have a prescribed list which you are
told to follow. There are options, alternatives, and exceptions,
but there is still a core list or set of instructions which are
prescribed and from which there is no scope for moving away.
Whereas in the new toolkit, much more information is provided on
an optional basis.
>> Are options to be applied in the order of appearance?
>> Well... that's an option. In terms of situations where
you have a condition followed by more than one option, in the
context of data provenance, your first option is often to record
a metadata work. Basically, the decision you're making there is
whether the metadata work is a catalog record or an individual
field's content. A second option is then to record the value in
that record or in that field according to certain criteria,
whether that be a list of sources or using a particular RDA
element.
So... yes, in terms of that order, that does follow the
thought process which you would go down.
>> Will it be routine to always include the agent in a
cataloguing metadata field and the source from which the
information was taken, even with a fully vetted source of
information, such as the title page?
>> These are things that are optional. So... the
granularity and degree to which data provenance is recorded, on a
record-by-record or field-by-field, subfield-by-subfield
basis, is a matter of local application. There is no expectation
that one would record that information on a mandatory basis.
Like all the other elements, in terms of local application, I
would suggest that the thought process for using the toolkit
is: what is the use of this at the local level? As I mentioned
towards the end of my presentation, the ability to record data
provenance data at such a granular level is now beneficial in
terms of being able to contextualize information: who has
recorded what, and in terms of many other elements as well. The
source consulted, for example, may be of use to a particular
community. RDA is facilitating that.
>> Okay... this is, at least as of right now, the
last question on our list. Why did you use the element related
manifestation of work for the standard used for the metadata
work, RDA? Isn't that a work-to-work relationship?
>> In terms of the example used... following the data
provenance guidance, RDA is manifested within the RDA
Toolkit, so... the source can be described in that way.
Equally, the value that is recorded there can be recorded in a
number of ways, whether that be by unstructured description,
structured description, identifier, or IRI. In the case of the
value I recorded, it was using a vocabulary encoding scheme and
therefore is an example of a manifestation.
>> All right... well... that is the end of our question
list. I'm hoping we didn't miss anything. Oh... here's one
more that just came in. Can you talk about examples of
undifferentiated name indicators?
>> In terms of examples, I mentioned the MARC coding for
that particular element, 008 character position 32. As
regards providing examples within the toolkit itself, be
aware that those are currently not complete. That is an
ongoing task, and hopefully examples will be provided for
undifferentiated name indicator in the fullness of time.
>> She said, "I meant for things other than person."
>> In terms of the scope of undifferentiated name
indicator... so... yes... the new toolkit defines it as a
categorization indicating that the nomen is insufficient to
differentiate between two or more entities.
This is now scoped within the context of the nomen entity.
So... yes... it could be for a person and for the other
agent entities, and for the [indiscernible] entities as well.
In terms of MARC coding, the coding for undifferentiated name
indicator is, to some degree, currently tied to the definition
in the current toolkit, which is a categorization indicating
that the core elements recorded are insufficient to
differentiate between two or more persons with the same name.
So... that may have implications in terms of MARC mappings.
>> How does all this flexibility of entering data fit in
with ICP principle 2.8, dealing with consistency and
standardization?
>> Well... you could question how it deals with consistency
and standardization, given that so much of the guidance is
optional. You could say: if you make everything optional,
doesn't that lead unavoidably to inconsistency? I think it
goes to achieving interoperability. You have different
recording methods that are available on an element-by-element
basis, and depending on your implementation, that will drive
the choice you have to make in terms of recording methods.
The presentation that I've given has focused chiefly on
examples that are recorded in an unstructured or structured
way, because... that is what MARC21 supports. Many of you will
be aware that, increasingly over recent years, there has been a
drive towards supporting the encoding of URIs in the MARC
formats, using subfields $0 and $1, and, to some extent, the use
of those subfields to record URIs. URIs can themselves provide
information about data provenance.
The point is that apart from the recording methods, there's
also, using the distinction that can now be made between the
metadata description set and the metadata statement, the
ability to create RDA data at both an umbrella level and the
most detailed level available.
Both are acceptable, whatever the means of encoding by which an
individual agency records metadata.
So... consistency is achieved in that respect, in terms of
providing effective guidance that is more hospitable to a wider
range of applications.
>> Okay... will there be a test phase for creating catalog
records with the new toolkit?
>> Okay... as you'll be aware, the new toolkit is currently
in its beta phase. It will remain in that phase until the
RDA Board has decided that the clock can start running on
decommissioning the current toolkit. In terms of the process by
which we reach that status, one of the key issues to address is
the provision of policy statements and other user-contributed
content, which will be able to mediate between the toolkit
and... its application by cataloguers. And... in terms of the
test phase, I would anticipate that that will occur during the
period after the clock on the new toolkit starts running.
Having said that, to some extent, testing is already going
on with the new toolkit content. The process of providing new
policy statements, the provision of application profiles, and
the development of user documentation, a lot of what we refer
to in the current toolkit as workflow documentation, is raising
issues that we are looking at in terms of what impact they are
going to have on our current cataloguing practices, what
additional guidance we need to put into policy statements,
and... as I mentioned in the presentation, what would be
desirable changes to the MARC format itself in terms of
facilitating some of the new scope of RDA. As I said in terms
of nomen, that is with regard not only to feasibility but also
to desirability.
Take something like the manifestation statement, for example.
Being able to record a whole title page, all of its content in
one go, without having to separate that out into individual
elements, would be beneficial from the perspective of workflows
that use a light pen to scan in the information, as opposed to
ones where data is keyed in by a cataloger or other
administrative staff. There would be an immediate benefit to
supporting that kind of coding within the MARC format.
>> All right... well, I think we are ready to wrap up,
so... Thurstan, do you have any closing thoughts for us before
we wrap the event up?
>> I just want to thank everybody for attending today. I
hope that you have found it useful and informative in terms of
the content that I've covered. And... I look forward, myself,
to watching and listening to the rest of the special topics and
the new concepts series. Thank you very much.
>> Thank you, Thurstan, we really appreciate your wonderful
presentation, and thank you to our audience for all the fantastic
and active discussion. We'll hopefully see most of you back
next week for the next event. I hope everyone has a wonderful
day. Thanks very much.
>> Bye.
[Call concluded at 1:27 p.m. ET].