europeanasounds.eu
Metadata Ingestion Training
23-24 October 2014
NTUA, Athens
Metadata Ingestion Plan
Targets
Reporting progress
Andra Patterson
Metadata Manager, Europeana Sounds
Metadata Ingestion Plan
Takes into account:
• 4 main stages of aggregation
• Needs of data providers for scheduling
• Information from the rights and metadata ingestion survey
• Information from emails, phone calls, etc.
• Targets from the Description of Work (DoW)
The plan is flexible and may need to take into account:
• Changing needs of data providers during project
• Needs of Europeana Ingestion Team
Aggregation – 4 main stages
• Content selection
• Metadata preparation
• Metadata ingestion
• Metadata curation
Aggregation – Stage 1: Content selection
Select the objects for which you will provide metadata to Europeana Sounds
• According to the selection guidelines in D1.1 Content Selection Policy
• According to the figures in Table 0 of the DoW (Part B, pp. 22-27)
Establish the correct rights statements for the objects
• Use the Europeana Available Rights Statements
Aggregation – Stage 2: Metadata preparation
Prepare your metadata and export it as .xml or .csv
• Check that mandatory elements are included or can be added
• Check that the source metadata is well-formed
• Ensure that digital objects are accessible via links in the metadata
• Ensure that objects you make available for re-use meet the criteria of the Europeana Content Re-use Framework
– Criteria cover file quality and rights
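Two of the Stage 2 checks (well-formed source metadata, and a working link to each digital object) can be automated before upload. The Python sketch below assumes an XML export; the element names `record`, `title`, `rights` and `object_url` are hypothetical placeholders for your own source schema and are not EDM or MINT names.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names in a provider's own source XML (not EDM names).
REQUIRED_FIELDS = ("title", "rights", "object_url")


def check_export(xml_text):
    """Return a list of problems found in an XML export (empty list = OK)."""
    try:
        root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]

    problems = []
    for i, record in enumerate(root.iter("record"), start=1):
        for field in REQUIRED_FIELDS:
            value = record.findtext(field)
            if value is None or not value.strip():
                problems.append(f"record {i}: missing {field}")
        # The digital object must be reachable via a web link.
        url = record.findtext("object_url") or ""
        if url and not url.startswith(("http://", "https://")):
            problems.append(f"record {i}: object_url is not a web link")
    return problems


sample = """<records>
  <record><title>Folk song</title><rights>http://creativecommons.org/publicdomain/mark/1.0/</rights>
          <object_url>https://example.org/audio/1.mp3</object_url></record>
  <record><title></title><rights>x</rights><object_url>ftp-share/2.wav</object_url></record>
</records>"""
print(check_export(sample))
print(check_export("<records><record>"))
```

Which fields count as mandatory depends on your source format and on the mapping you plan in MINT. Adjust `REQUIRED_FIELDS` to suit.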
Aggregation – Stage 3: Metadata ingestion
Ingest your metadata records using the MINT tool
• MINT
– Web-based tool
– Developed by NTUA
– Used to map, ingest and deliver metadata to Europeana
• Map your metadata to the schema defined in D1.4 EDM Profile for Sound
Aggregation – Stage 4: Metadata curation
Enrich your metadata records using the MINT tool
• Normalise metadata
• Enrich metadata
• Add controlled vocabulary terms
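The slide does not define exactly what normalisation involves. As one illustration for text values, the Python sketch below applies two assumed rules: Unicode NFC composition and collapsing runs of whitespace. These rules are not a description of MINT's behaviour, and the function name is hypothetical.

```python
import re
import unicodedata


def normalise_value(value):
    """Tidy one metadata text value (assumed rules, for illustration only)."""
    # Compose characters so that "e" + combining acute accent becomes "é".
    value = unicodedata.normalize("NFC", value)
    # Collapse runs of spaces, tabs and newlines into one space, then trim.
    return re.sub(r"\s+", " ", value).strip()


print(normalise_value("  Cafe\u0301   concert\n recording. "))
```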
Targets
Table 0 Underlying Content (DoW Part B, pp. 22-27) sets out what we are contracted to achieve
Targets
Progress is measured against the Performance Monitoring Table (DoW Part B, p. 91)
Europeana definition of “available for re-use”: objects labelled PDM (Public Domain Mark), CC0, CC-BY or CC-BY-SA
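To report the re-use subset, you need to classify each record's rights statement. The Python sketch below assumes that rights are given as creativecommons.org URIs of the kind listed in the Europeana Available Rights Statements. The function name `is_reusable` is hypothetical.

```python
from urllib.parse import urlparse

# Europeana's "available for re-use" set: PDM, CC0, CC-BY, CC-BY-SA.
# Path prefixes as used on creativecommons.org (version numbers vary).
REUSABLE_PATHS = (
    "/publicdomain/mark/",  # Public Domain Mark
    "/publicdomain/zero/",  # CC0
    "/licenses/by/",        # CC-BY (does not match by-nc, by-nd, ...)
    "/licenses/by-sa/",     # CC-BY-SA
)


def is_reusable(rights_uri):
    """True if an edm:rights URI falls in the re-use subset."""
    parsed = urlparse(rights_uri.strip())
    if parsed.netloc not in ("creativecommons.org", "www.creativecommons.org"):
        return False  # e.g. Europeana's own rights statements
    path = parsed.path if parsed.path.endswith("/") else parsed.path + "/"
    return path.startswith(REUSABLE_PATHS)


uris = [
    "http://creativecommons.org/publicdomain/mark/1.0/",
    "http://creativecommons.org/licenses/by-nc/4.0/",
    "http://creativecommons.org/licenses/by-sa/3.0/",
    "http://www.europeana.eu/rights/rr-f/",
]
print(sum(is_reusable(u) for u in uris))  # 2
```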
Targets
Targets for each “metadata set”:
Set 1: October 2014-January 2015 (Milestone 5)
Set 2: February 2015-January 2016 (no formal milestone)
Set 3: February 2016-July 2016 (Milestone 6)
Milestones 5 and 6 are defined as: “Content and metadata ready for ingestion”
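If you track delivery dates internally, you can map each date to the metadata set it counts towards. The Python sketch below assumes that each window starts on the first day of its first month and ends on the last day of its last month. The function name is hypothetical.

```python
from datetime import date

# Windows for the three metadata sets (inclusive; whole months assumed).
METADATA_SETS = [
    (1, date(2014, 10, 1), date(2015, 1, 31)),
    (2, date(2015, 2, 1), date(2016, 1, 31)),
    (3, date(2016, 2, 1), date(2016, 7, 31)),
]


def metadata_set_for(day):
    """Return the metadata set number whose window contains `day`, or None."""
    for number, start, end in METADATA_SETS:
        if start <= day <= end:
            return number
    return None


print(metadata_set_for(date(2014, 10, 24)))  # 1
print(metadata_set_for(date(2015, 6, 15)))   # 2
print(metadata_set_for(date(2016, 8, 1)))    # None (after Set 3)
```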
Targets
[Chart showing required (minimum) metadata ingestion progress: Audio, Audio-related and Re-use subset; scale 0-800,000]
Reporting progress – what to count
• The DoW requires us to count digital objects
– Digital objects must be counted in the same way as in the DoW
• Audio objects
• Audio-related objects
• Objects “freely available for re-use”
– These are a subset of the total, not additional items
• Also count metadata records
– Useful for comparing what you have prepared for publication with what is actually published on Europeana
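The counting rules above, and the BL examples of sound recordings and printed scores, can be expressed as a small Python sketch. The `Record` fields are hypothetical. The idea that each page image of a score counts as a separate digital object is used only for illustration. Always count the same way as the DoW does.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One metadata record (hypothetical fields, for illustration)."""
    record_id: str
    category: str         # "audio" or "audio-related"
    digital_objects: int  # how many digital objects the record describes
    reusable: bool        # True if its rights statement is in the re-use set


def progress_report(records):
    """Count metadata records and digital objects per category."""
    report = {"metadata records": len(records), "audio": 0,
              "audio-related": 0, "re-use subset": 0}
    for rec in records:
        report[rec.category] += rec.digital_objects
        if rec.reusable:
            # A subset of the audio/audio-related totals, not extra items.
            report["re-use subset"] += rec.digital_objects
    return report


records = [
    Record("snd-001", "audio", 1, True),              # one record, one sound file
    Record("score-042", "audio-related", 24, False),  # one score, 24 page images
    Record("snd-002", "audio", 1, False),
]
print(progress_report(records))
# {'metadata records': 3, 'audio': 2, 'audio-related': 24, 're-use subset': 1}
```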
Counting BL digitised sound
One metadata record usually represents one digital object
[Screenshot: each line is a metadata record]
No duplicates, please!
Keep track internally of what you have already supplied to Europeana, both for this project and for other Europeana projects: no duplicates!
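One way to keep track internally is to maintain a register of the local identifiers of every record already supplied to Europeana, whatever the project. You can then check each new batch against that register. The Python sketch below assumes that each record has a stable local identifier. The identifiers and the function name are made up.

```python
def find_duplicates(new_ids, already_supplied):
    """Return IDs in `new_ids` that were supplied before or repeat within the batch."""
    seen = set(already_supplied)
    duplicates = []
    for record_id in new_ids:
        if record_id in seen:
            duplicates.append(record_id)
        seen.add(record_id)
    return duplicates


# Hypothetical IDs: previously sent via another Europeana project, plus a new batch.
supplied_earlier = {"bl-snd-0001", "bl-snd-0002"}
batch = ["bl-snd-0002", "bl-snd-0003", "bl-snd-0003"]
print(find_duplicates(batch, supplied_earlier))  # ['bl-snd-0002', 'bl-snd-0003']
```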
Counting BL digitised printed scores
One metadata record often represents many digital objects
[Screenshot: each line is a metadata record; callout: number of digital objects counted for DoW Table 0]
Reporting progress – how to record
• Record statistics in your Google or Excel spreadsheet
– See section 3.3.3 of the Europeana Sounds Manual for Data Providers for links to the Google spreadsheets (will be active next week!)
• Update your spreadsheet by the third Friday of each month
• Targets
– are based on Table 0, the Metadata Ingestion Survey and emails
– are distributed across the 3 metadata sets
– are the minimum required: feel free to do more!
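If you want a reminder of the monthly reporting deadline, the third Friday of a month is easy to compute. The Python sketch below uses only the standard library. The function name is our own.

```python
import calendar
from datetime import date


def third_friday(year, month):
    """Date of the third Friday of the given month."""
    first_weekday = date(year, month, 1).weekday()  # Monday=0 ... Sunday=6
    # Days from the 1st to the first Friday, then two more weeks.
    offset = (calendar.FRIDAY - first_weekday) % 7
    return date(year, month, 1 + offset + 14)


print(third_friday(2014, 10))  # 2014-10-17
print(third_friday(2014, 11))  # 2014-11-21
```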
Sample Google spreadsheet showing targets for BL – edit the orange cells!
Thank you for listening!
Metadata Ingestion Training
23-24 October 2014
NTUA, Athens
Metadata Quality
Meaningful metadata
Rights
Controlled vocabularies
Andra Patterson
Metadata Manager, Europeana Sounds
Metadata Quality
• The richer the metadata, the better for discovery by users
• Europeana Sounds gives us an opportunity to enhance our metadata and check its quality
• EDM mandatory elements ensure a minimum metadata standard
• Metadata Quality Task Force (late 2013 to mid-2014)
– Quality of metadata varies between institutions
– Fields need to contain meaningful information
Metadata Quality – Main Issues
• To aid discovery, metadata needs to provide context for the CHO (cultural heritage object)
– Include a meaningful title and/or description
• Metadata needs to be understandable to
– Humans (e.g. rich descriptions, rights information)
– Machines (e.g. UTF-8 encoding, xml:lang)
• Metadata needs to be standardised
– EDM-compliant
– Controlled vocabularies (edm:type, ebucore:hasGenre)
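The Python sketch below illustrates the machine-readable points. It writes a fragment of an EDM edm:ProvidedCHO with a language-tagged title (xml:lang), a controlled edm:type value, and UTF-8 serialisation. It is not a complete EDM record: the mandatory elements are omitted. The rdf:about URI is a placeholder.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the elements used below.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "edm": "http://www.europeana.eu/schemas/edm/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialises as xml:lang

for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

cho = ET.Element(f"{{{NS['edm']}}}ProvidedCHO",
                 {f"{{{NS['rdf']}}}about": "http://example.org/cho/1"})
title = ET.SubElement(cho, f"{{{NS['dc']}}}title", {XML_LANG: "el"})
title.text = "Ελληνικό δημοτικό τραγούδι"  # non-ASCII text, kept intact in UTF-8
ET.SubElement(cho, f"{{{NS['edm']}}}type").text = "SOUND"

xml_bytes = ET.tostring(cho, encoding="utf-8", xml_declaration=True)
print(xml_bytes.decode("utf-8"))
```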
Getting Rights Right!
• Establish the rights of your web resources
– You may need to discuss this with colleagues
– Use the information and resources from WP3
• It is important to use the most appropriate rights statement for your web resources
– It tells users what they can or can’t do with an object
– Web resources of Public Domain CHOs should be labelled as Public Domain; discuss any issues about this with Andra Patterson or Lisette Kalshoven
Rights – Public Domain Works
• Europeana Public Domain Charter
– “Digitisation of Public Domain content does not create new rights over it”
• Europeana Sounds Consortium Agreement
– “… where possible … content which is in the Public Domain … will be made available without any access restriction and will be labelled as being in the Public Domain …”
• Some data providers may encounter issues with this, e.g.
– Commercial re-use considered inappropriate
• Academic, artistic and private re-use is OK, but some commercial re-use is considered inappropriate; sponsorship funds were provided on this basis (ONB)
– Desire to refinance digitisation activities
• Government funding covers only basic needs; charging fees for high-quality images contributes to refinancing digitisation (ONB)
• However, non-profit institutions risk losing their non-profit status by earning too much from commercial users! (ONB)
– Legal issues
• Case law in the UK is so far inconclusive (BL)
Rights – EDM
edm:ProvidedCHO dc:rights
– Name of the rights holder of the CHO, or more general rights information
edm:WebResource dc:rights
– Name of the rights holder of a particular web resource, or more general rights information
edm:WebResource edm:rights (Strongly recommended)
– Formal rights statement for a particular web resource
– Overrides the statement in ore:Aggregation edm:rights (see below)
– Choose from http://pro.europeana.eu/available-rights-statements
ore:Aggregation edm:rights (Mandatory)
– Formal rights statement for a web resource that has no edm:rights of its own (see above)
– Formal rights statement for a group of web resources without their own edm:rights, when these are attached to one CHO
– Choose with care from http://pro.europeana.eu/available-rights-statements
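The precedence rule above decides which statement applies to each web resource. The Python sketch below shows that rule. It assumes that a missing edm:rights on a web resource is represented as `None` or an empty string. The function name, URLs and rights values are examples, not taken from a real record.

```python
def effective_rights(web_resource_rights, aggregation_rights):
    """Rights statement that applies to one web resource.

    A web resource's own edm:rights overrides the ore:Aggregation edm:rights;
    without its own statement, the aggregation's statement applies.
    """
    if not aggregation_rights:
        raise ValueError("ore:Aggregation edm:rights is mandatory")
    return web_resource_rights or aggregation_rights


# Hypothetical CHO with two web resources; only the first has its own edm:rights.
aggregation_rights = "http://www.europeana.eu/rights/rr-f/"
web_resources = {
    "https://example.org/audio/track1.mp3": "http://creativecommons.org/publicdomain/mark/1.0/",
    "https://example.org/images/sleeve.jpg": None,
}
for url, own_rights in web_resources.items():
    print(url, "->", effective_rights(own_rights, aggregation_rights))
```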
What is this?
Danish pastry
Wieneråtta
Wienerbrød
Kopenhagener Plunder
Dänischer Plunder
Danish
Controlled Vocabularies
• Enable users to search and navigate across different metadata sets
• Important in the Europeana Portal, where different data providers use different vocabularies
• Bring vocabularies together using linked data where possible
– LC Linked Data Service
– VIAF (Virtual International Authority File)
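The “What is this?” example shows the problem: the same pastry has many names. The Python sketch below illustrates the idea. It assumes that each local label has already been linked to one shared concept URI, so that a search for the concept finds records whatever label they use. The URI is a made-up placeholder; in practice it would come from a shared vocabulary or a linked data service.

```python
# Hypothetical concept URI standing in for a shared vocabulary concept.
DANISH_PASTRY = "http://example.org/concept/danish-pastry"

# Labels that different providers (and languages) use for the same thing.
LABELS = {
    "danish pastry": DANISH_PASTRY,
    "wieneråtta": DANISH_PASTRY,
    "wienerbrød": DANISH_PASTRY,
    "kopenhagener plunder": DANISH_PASTRY,
    "dänischer plunder": DANISH_PASTRY,
    "danish": DANISH_PASTRY,
}


def concept_for(label):
    """Map a free-text label to a shared concept URI, or None if unknown."""
    return LABELS.get(label.strip().casefold())


records = {"rec-1": "Wienerbrød", "rec-2": "Danish pastry", "rec-3": "Croissant"}
matches = [r for r, label in records.items() if concept_for(label) == DANISH_PASTRY]
print(matches)  # ['rec-1', 'rec-2']
```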
Controlled Vocabularies – Linked Data
VIAF Virtual International Authority File
Shared, Controlled Vocabularies
• EDM vocabularies
– edm:rights
• http://pro.europeana.eu/available-rights-statements
– edm:type
• TEXT, VIDEO, SOUND, IMAGE, 3D
• Europeana Sounds new vocabularies
– dcterms:medium
• Europeana Carrier Types Vocabulary
– ebucore:hasGenre
• Europeana Music Genre/Form Vocabulary
• Europeana Non-Music Genre/Form Vocabulary
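Because edm:type takes its values from a closed list, you can check it before upload. The Python sketch below assumes that values must match the list exactly, in capitals. The function name is hypothetical.

```python
# edm:type values allowed by EDM.
EDM_TYPES = {"TEXT", "VIDEO", "SOUND", "IMAGE", "3D"}


def check_edm_type(value):
    """Return a problem message for an edm:type value, or None if it is valid."""
    if value in EDM_TYPES:
        return None
    if value.strip().upper() in EDM_TYPES:
        return f"edm:type {value!r} should be written {value.strip().upper()!r}"
    return f"edm:type {value!r} is not one of {sorted(EDM_TYPES)}"


for value in ["SOUND", "Sound", "AUDIO"]:
    print(value, "->", check_edm_type(value))
```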
Europeana Vocabularies – Carrier Types
[Diagram: Europeana Carrier Types Vocabulary (dcterms:medium), shown with DISMARC dmFormats and RDA Carrier Types]
New Europeana Vocabularies – Genre/Form
[Diagram: Europeana Music Genre/Form Vocabulary and Europeana Non-Music (Generic) Genre/Form Vocabulary (ebucore:hasGenre), shown with DISMARC dmGenre, DBpedia, Freebase and the D1.1 Content Selection Policy broad categories]
Broad Genre/Form Concepts (Mandatory)
Europeana Music Genre/Form Vocabulary and Europeana Non-Music (Generic) Genre/Form Vocabulary (ebucore:hasGenre)
Broad genre (mandatory):
• Music
• Spoken word
• Radio
• Environment
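A broad genre is mandatory, so a pre-upload check can flag records that carry only specific genre terms. The Python sketch below assumes that genres are held as plain labels. In practice they would be vocabulary concept URIs, which the sketch does not model. "Folk music" is a made-up specific term, and the function name is hypothetical.

```python
# Broad genre concepts; at least one is mandatory in ebucore:hasGenre.
BROAD_GENRES = {"Music", "Spoken word", "Radio", "Environment"}


def has_broad_genre(genres):
    """True if at least one ebucore:hasGenre value is a broad genre concept."""
    return any(genre in BROAD_GENRES for genre in genres)


print(has_broad_genre(["Folk music", "Music"]))  # True
print(has_broad_genre(["Folk music"]))           # False: add "Music"
```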
More About Controlled Vocabularies
• Section 4.5 of the Europeana Sounds Manual for Data Providers has links to recommended vocabularies for:
– Genre/Form
– Subjects
– Places
– Carrier types
– Digital formats
– Medium of performance
– Names
– Roles
– Works
Thank you for listening!
Image: Friends of Music Society, Greece, CC-BY-NC