Drewry universal identifiers throughout production chain, overview and interoperability

MovieLabs Confidential
What Good is an Identifier anyway?
Raymond Drewry (MovieLabs/EIDR)
Richard Kroon (EIDR)

What this will cover
• Who we are
• What’s an identifier?
• The EIDR identifier system
• Example EIDR applications
• Questions

MOVIELABS FOCUS AREAS
• NEW TECH R&D
• PRODUCTION TECHNOLOGY
• NEXT GEN FORMATS
• CONTENT PROTECTION
• BUSINESS INTELLIGENCE
• DISTRIBUTION SUPPLY CHAIN
Drive innovation toward tangible solutions that help Hollywood transition to the next generation of production technologies,
distribution platforms and content experiences

EIDR focus areas
• Identifiers
• For audiovisual content
• For video service networks
• Helping people use those identifiers

What’s an Identifier? And what makes a good one?

What’s an identifier?
• It’s just a name for something
• Raven, ‘Fanny and Alexander’, Mercury
• Many (most?) things have more than one name
• Raven: Corvus corax, swan of battle
• ‘Fanny and Alexander’: ‘Fanny och Alexander’
• Some names are ambiguous
• Mercury: Roman god, planet, chemical element, a car made by Ford
• The name itself may not have enough information to know what the name is naming
• Tin’s periodic table name is Sn
• Names can be more or less precise, depending on circumstances
• ‘Dog’ vs ‘Poodle’ vs ‘Miniature poodle’
• People have been dealing with this for a long, long time
• Dictionaries and other reference works
• Binomial naming of species
• Periodic table
• Library call numbers

Why are identifiers so important now?
• The world is more connected via computer networks
• Efficiency makes computers faster and more reliable
• Ambiguity generates inefficiency and errors
• Examples in audiovisual
• Are ‘The Philosopher’s Stone’ and ‘Harry Potter 1’ the same?
• Is ‘The Simpsons: Working Mom’ the same as ‘S30E7’ or ‘Werking
Mom’?
• ‘My Neighbor Totoro’ has at least two different English dubs
• Computer-based or computer-mediated communication doesn’t
work well with this

What makes a good identifier?
• The identifier is unique within its context
• In that environment, one name for one thing
• The same identifier can mean something different in different contexts
• In one system, ‘42’ is ‘When Harry Met Sally’ and in another it’s ‘Love and Death’
• The identifier has the right granularity
• Name what you care about
• The identifier’s level of detail has to work for the applications that use it
• There are multiple versions of Blade Runner.
• In some systems they all share the same identifier, and in some they’re all different
• The identifier is resolvable
• Get underlying information that tells you what it is, not just its name
• Non-resolvable identifiers can be useful too
• The identifier should link to other identifiers
• Provide a way of working with systems that use something else
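
The first and last properties above (unique within a context, linked to other identifiers) can be sketched in a few lines of Python. This is only an illustration, not any real registry: the `Registry` class, the context names `system_a` and `system_b`, and the identifier `77` are hypothetical. Only the two meanings of `42` come from the example above.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (context, identifier)


class Registry:
    """Toy registry: within one context, one identifier names one thing."""

    def __init__(self) -> None:
        self._things: Dict[Key, str] = {}
        self._links: Dict[Key, Dict[str, str]] = {}

    def register(self, context: str, identifier: str, thing: str) -> None:
        key = (context, identifier)
        existing = self._things.get(key)
        if existing is not None and existing != thing:
            # Uniqueness is only enforced inside a single context.
            raise ValueError(
                f"{identifier!r} already names {existing!r} in {context!r}"
            )
        self._things[key] = thing

    def resolve(self, context: str, identifier: str) -> str:
        """Return what the identifier names, not just the name itself."""
        return self._things[(context, identifier)]

    def link(self, a: Key, b: Key) -> None:
        """Record that identifiers in two contexts name the same thing."""
        for key in (a, b):
            if key not in self._things:
                raise KeyError(key)
        self._links.setdefault(a, {})[b[0]] = b[1]
        self._links.setdefault(b, {})[a[0]] = a[1]

    def translate(self, context: str, identifier: str, target: str) -> Optional[str]:
        """Follow a recorded link into another context, if there is one."""
        return self._links.get((context, identifier), {}).get(target)


registry = Registry()
registry.register("system_a", "42", "When Harry Met Sally")
registry.register("system_b", "42", "Love and Death")
registry.register("system_b", "77", "When Harry Met Sally")  # hypothetical ID
registry.link(("system_a", "42"), ("system_b", "77"))
```

The same string `42` resolves to different films depending on the context, and `registry.translate("system_a", "42", "system_b")` follows the link to the other system's name for the same film.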

Is that enough?
• ‘Unique in context’ solves the one name for one thing problem
• ‘Right granularity’ solves the naming of versions, variants, and
related things
• ‘Resolvable’ solves the problem of finding out what the name
means
• ‘Link to other identifiers’ solves interacting with items that may
use some other naming scheme
• …so at the philosophical level, it works
• What about the practical world?

EIDR

What is EIDR? Technical view
• A unique identifier for audiovisual works and their versions
• With an API for accessing it
• Enough descriptive metadata to distinguish one work from another
• Different episodes of a Series, remakes, director’s cuts, dubbed languages,
etc
• Different requirements at different levels of the hierarchy
• Factual, not interpretive = no genres or plot summaries, e.g.
• Links to other EIDR IDs
• Containing Series, original abstract work, items included in a retail
compilation, etc
• We try pretty hard to make this concrete, not marketing or opinion
• External identifiers
• If they’re resolvable, provide the link too
• Optional relationship (SameAs, DerivedFrom, ContainsAllOf, etc)
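
As a rough picture of the pieces listed above, the sketch below models a record with minimal descriptive metadata, links to other EIDR IDs, and external identifiers that carry an optional relationship and an optional link. The class and field names are hypothetical and are not the EIDR schema; the external identifier values are placeholders. The EIDR ID is the one quoted later in this deck for the 1931 MGM film.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AlternateId:
    """An identifier from some other system (hypothetical field names)."""
    scheme: str                     # which external system issued the ID
    value: str
    relation: Optional[str] = None  # optional: SameAs, DerivedFrom, ContainsAllOf, ...
    url: Optional[str] = None       # present only if the external ID is resolvable


@dataclass
class Record:
    eidr_id: str
    title: str
    # Link type (e.g. a containing Series) -> another EIDR ID.
    links: Dict[str, str] = field(default_factory=dict)
    alternate_ids: List[AlternateId] = field(default_factory=list)

    def same_as(self, scheme: str) -> List[str]:
        """External IDs in one scheme that claim to name the same thing."""
        return [
            alt.value
            for alt in self.alternate_ids
            if alt.scheme == scheme and alt.relation == "SameAs"
        ]


record = Record(
    eidr_id="10.5240/DE76-BA99-3701-6237-6BCE-I",
    title="This Modern Age",
    alternate_ids=[
        # Placeholder external identifiers, for illustration only.
        AlternateId("example_catalogue", "cat-0001", relation="SameAs"),
        AlternateId("example_archive", "arc-0002"),  # no relation stated
    ],
)
```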

What is EIDR? Governance and social view
• A central online registry open to everyone
• Free to resolve for anyone
• Web UI or by machine
• Free and unlimited registrations for members
• Non-members can register through services provided by members
• Not for Profit, IP Neutral
• Globally, members include
• Studios
• Broadcasters
• Digital service providers
• Archives and academic institutions
• Metadata providers
• Standards bodies
• Infrastructure and service providers

Who is EIDR?
[EIDR uses curated, crowdsourced metadata from its trusted Participant
organizations.]

EIDR record structure

Episodic content
[Figure: episodic content hierarchy. Legend: Abstraction EIDR, Collection EIDR ID, Edit EIDR, Manifestation EIDR. Tree: Series → Season 1, Season 2, … Season n; under Season 2 → Episode 1, Episode 2, … Episode n; each episode has a Broadcast version and an EST UHD version, e.g. Episode 1 (Broadcast) and Episode 1 (EST UHD).]
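
The episodic hierarchy can be pictured as a chain of parent links. The sketch below is illustrative only: the `Node` class is hypothetical, and it assumes the boxes in the figure chain from Series down through Season, Episode, the Broadcast Edit, and the EST UHD Manifestation, which the extracted labels suggest but do not confirm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    kind: str                       # Series, Season, Episode, Edit, Manifestation
    label: str
    parent: Optional["Node"] = None

    def path(self) -> List[str]:
        """Labels from the top of the hierarchy down to this node."""
        labels: List[str] = []
        node: Optional[Node] = self
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return list(reversed(labels))


series = Node("Series", "Series")
season = Node("Season", "Season 2", parent=series)
episode = Node("Episode", "Episode 1", parent=season)
edit = Node("Edit", "Episode 1 (Broadcast)", parent=episode)
# Assumption: the Manifestation hangs off the Edit shown above it.
manifestation = Node("Manifestation", "Episode 1 (EST UHD)", parent=edit)
```

Calling `manifestation.path()` walks the parent links and returns every level above a given encoding, which is the kind of lookup a linked hierarchy makes cheap.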

EIDR Content IDs by Type

EIDR Identifiers and Alternate Identifiers

EIDR Identifier structure
• An ISO-standard Digital Object Identifier (DOI) Registry
[ISO 26324:2012 | www.doi.org]
• Content ID: A unique ID for audiovisual works, versions, and encodings
• Video Service ID: A unique ID for content delivery services
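
The Content IDs quoted in this deck all have the shape `10.5240/XXXX-XXXX-XXXX-XXXX-XXXX-C`: a DOI prefix, twenty hexadecimal digits in five groups, and a final check character. The sketch below validates that shape. It assumes the check character is an ISO 7064 Mod 37,36 character computed over the twenty hex digits only; that assumption reproduces the check characters of the IDs shown in these slides, but the EIDR ID format specification is the authority. The function names are hypothetical.

```python
import re

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Content ID shape as seen in this deck: prefix, five hex groups, check char.
EIDR_CONTENT_ID = re.compile(r"^10\.5240/((?:[0-9A-F]{4}-){5})([0-9A-Z])$")


def check_char(hex_digits: str) -> str:
    """ISO 7064 Mod 37,36 check character for a string of 0-9/A-Z characters."""
    p = 36
    for ch in hex_digits:
        # Add the character value modulo 36, mapping a zero result to 36.
        p = (p + ALPHABET.index(ch)) % 36 or 36
        p = (p * 2) % 37
    return ALPHABET[(37 - p) % 36]


def is_valid_content_id(eidr_id: str) -> bool:
    """True if the ID has the expected shape and a matching check character."""
    match = EIDR_CONTENT_ID.match(eidr_id)
    if match is None:
        return False
    digits = match.group(1).replace("-", "")
    return check_char(digits) == match.group(2)


def doi_url(eidr_id: str) -> str:
    """Resolution URL in the form used throughout these slides."""
    return "https://doi.org/" + eidr_id
```

For example, `is_valid_content_id("10.5240/E051-49A0-94DB-28CC-9F5F-Z")` accepts the ID, while changing the final `Z` to any other character makes it fail. A check like this catches most single-character typos before a network lookup is attempted.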

Using EIDR

Basics: Cataloguing and Connectivity
• Registering for an identifier forces you to think about your own
data and models
• Can find opportunities, mistakes, hidden gems
• A shared, linked identifier makes it easier to
• Get data in from other sources
• Send data and content out to other places
• Collaborate (commercially or otherwise)
• The following example is theatrical, but the paper describes a
broadcast example

This Modern Age
• The Rank Organisation’s answer (1946-1954) to The March of Time
(by Time, Inc.)
• Not the 1931 MGM film (that’s https://doi.org/10.5240/DE76-BA99-3701-6237-6BCE-I )
• ITV has most of the Rank catalogue; BFI has some of it; CITWF has metadata
for some of it
• All three had partial data
• Combining data from all the sources in the EIDR records gives better overall
information
• All parties can update their records when they want to
• And they can also talk to each other about possible collaborations
• Re-release with supplemental material from BFI, e.g.
• Should also make later researchers’ lives easier
• See https://doi.org/10.5240/E051-49A0-94DB-28CC-9F5F-Z for the results
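
The combining step can be sketched as a merge keyed on the shared identifier. This is an illustration under assumptions, not how the three organisations actually exchanged data: the function name, the field names, and every field value are placeholders, and only the identifier and the source names come from the slide. Each value keeps a note of which source supplied it, so disagreements stay visible instead of being silently overwritten.

```python
from typing import Dict, List, Tuple

Fields = Dict[str, str]
Merged = Dict[str, Dict[str, List[Tuple[str, str]]]]


def merge_by_id(sources: Dict[str, Dict[str, Fields]]) -> Merged:
    """Combine partial records from several sources, keyed by shared ID.

    Every value is kept as (value, source) so provenance is not lost.
    """
    merged: Merged = {}
    for source_name, records in sources.items():
        for shared_id, fields in records.items():
            combined = merged.setdefault(shared_id, {})
            for field_name, value in fields.items():
                combined.setdefault(field_name, []).append((value, source_name))
    return merged


RESULT_ID = "10.5240/E051-49A0-94DB-28CC-9F5F-Z"

# Placeholder values only; the real records are not reproduced here.
sources = {
    "ITV": {RESULT_ID: {"title": "<title from ITV>", "holdings": "<ITV holdings>"}},
    "BFI": {RESULT_ID: {"title": "<title from BFI>"}},
    "CITWF": {RESULT_ID: {"credits": "<credits from CITWF>"}},
}

merged = merge_by_id(sources)
```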

Using identifiers
• In the previous example, the act of getting the identifiers created
value
• That happens more often than you might think
• More commonly, an identifier’s value comes from the uses
• EIDR IDs are used in many kinds of applications…
• …which depend on EIDR following the rules of good identifiers

Digital Distribution
• Getting digital assets from a rights holder to a consumer is
complicated
• Using identifiers reduces or removes many problems
• Makes it easier to tie together extras, interactivity, etc
• MDDF is a MovieLabs-led standard way of doing this

MDDF

What does Linked Data mean (officially)?
• The official definition (from Tim Berners-Lee) is:
1. Use URIs to name (identify) things.
2. Use HTTP URIs so that these things can be looked up (interpreted, "dereferenced").
3. Provide useful information about what a name identifies when it's looked up, using open standards such as RDF, SPARQL, etc.
4. Refer to other things using their HTTP URI-based names when publishing data on the Web.
• This is forward-looking, and visionary, and all that good stuff
• The current world is different

Linked Data and Ontologies, workably
• But the core of the definition is
• Following /resolving identifiers to get to more stuff
• Returning data in a standard way
• So MovieLabs built an ontology
• Forced us to think about the model and the vocabulary
• Much more data than the basic stuff in EIDR
• Treat provenance as an essential item
• Heavy emphasis on which country or region the data applies to/came from
• Implementations
• Start with EIDR IDs
• Follow the alternate identifiers for more data
• Transform that data into the ontology’s structure
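
The three implementation steps above can be sketched as a small pipeline. Everything here is a stand-in: the lookup tables replace real network calls, the source names, alternate identifier values, field names, and the `ex:` predicate prefix are hypothetical, and the output is a plain tuple rather than the MovieLabs ontology. The point is only the shape of the flow, with provenance and region carried on every statement.

```python
from typing import Dict, List, NamedTuple, Tuple


class Statement(NamedTuple):
    subject: str    # HTTP URI naming the work
    predicate: str
    value: str
    source: str     # provenance: which data set said this
    region: str     # which country or region the data applies to


# Step 1 stand-in: alternate identifiers found on an EIDR record.
ALTERNATE_IDS: Dict[str, List[Tuple[str, str]]] = {
    "10.5240/DE76-BA99-3701-6237-6BCE-I": [
        ("source_a", "a-001"),
        ("source_b", "b-042"),
    ],
}

# Step 2 stand-in: what each external source returns for its own ID.
SOURCE_DATA: Dict[Tuple[str, str], dict] = {
    ("source_a", "a-001"): {"region": "US", "fields": {"title": "<title A>"}},
    ("source_b", "b-042"): {
        "region": "GB",
        "fields": {"title": "<title B>", "genre": "<genre B>"},
    },
}


def build_statements(eidr_id: str) -> List[Statement]:
    """Start with an EIDR ID, follow alternate IDs, reshape the results."""
    subject = "https://doi.org/" + eidr_id
    statements: List[Statement] = []
    for scheme, value in ALTERNATE_IDS.get(eidr_id, []):
        record = SOURCE_DATA.get((scheme, value))
        if record is None:
            continue  # this alternate ID is not resolvable here
        # Step 3: transform into one common structure.
        for name, field_value in sorted(record["fields"].items()):
            statements.append(
                Statement(subject, "ex:" + name, field_value, scheme, record["region"])
            )
    return statements


statements = build_statements("10.5240/DE76-BA99-3701-6237-6BCE-I")
```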

What can we use it for?
• Connecting lots of large data sets
• Some with specialized information
• A place to keep data that is generated in other ways
• Machine learning
• Annotations
• Scene-level connections
• Use the data
• We did an analysis of genre across four different genre sets
• Compared the sets to each other
• Did some machine learning on implicit vs explicit genre
• May analyse regional differences next
• Possible because the information is available in a standard form, so different sources
can be compared, used together, etc
• Other applications
• Business analytics
• Character tracking of ‘public’ characters (Sherlock Holmes, Dracula, Robin Hood, …)
• Insights into management of large franchises
• See also LUCID project in a couple of slides

And others
• TAXI/OBID
• SMPTE spec for binding identifiers at multiple levels of granularity to a stream
• Initial use for audience measurement, lots of other uses
• LUCID
• Project at UCL for rights determination based on machine learning
• Uses ontology based on EIDR and linked identifiers to find contributors
• Copyright Hub
• Uses EIDR and ARDI (DOI-based rights statement)
• Ties rights to content via identifier, video fingerprinting, etc
• Makes rights discoverable on the internet
• Academic citations
• Music cue sheets
• Use in other standards
• IMF, other SMPTE standards, MARC, …

THANK YOU
Questions?

A personal view (or, what I wanted)
• Coverage of a wide range of works from global sources (excluding user-generated content for now)
• Appropriate granularity of identification (covering the abstract concept
of an underlying work and all its many variations)
• Reliable, free access to the identifiers and their metadata (i.e. the
identifiers are resolvable, and anyone can use them)
• Connection to other data sources (information from multiple sources is
more powerful than information from any single source)
• A knowledgeable, engaged user community to help populate the
database (crowd-sourced with a curated crowd.)
• Ease of integration with and use by other pieces of software - databases
aren’t very useful if no one uses them. A UI is just another application.
• Economic viability – cheap is good, and persistence requires longevity,
which requires money

What is EIDR – spiffy marketing version
EIDR Technology Summary
• Interoperable, standards-based infrastructure
• Built on ISO Digital Object Identifier (DOI) standard
• Application integration through public APIs and schemas, freely available SDK for members
• Efficient infrastructure for new and existing applications
EIDR Purpose
• Make digital distribution competitive
• Help reduce costs
• Improve collaboration and automation across multiple application domains & platforms
• Enable new businesses and create new efficiencies
What EIDR is
• Global registry for unique identification of movie and TV content
• Designed for automated machine-to-machine communication
• Flexible data hierarchy down to the product & version level, incl. edits, clips, composites, encodings, and relationships
What EIDR is Not
• Profit-making
• Rich commercial metadata
• Ownership or rights information
• US-only

Getting it right – double-shot movies
• Some movies aren’t dubbed
• Scenes are re-shot at more or less the same time in a different
language, sometimes with some of the actors different
• These aren’t edits of the same movie – they meet the definition
of a separate work
• Common in the 1930s
• Still done for some Indian productions (Tamil/Hindi, for example)

Double-shot examples
• https://doi.org/10.5240/BD8D-8F89-8F75-FE28-7010-M
• Murder! 1930 GB Double-shot in English (this version) and German.
• https://doi.org/10.5240/C264-EC88-AFA1-2EC2-9B28-Z
• Mary 1931 GB DE Double-shot in English and German (this version).
• https://doi.org/10.5240/1388-8D7E-42D2-7147-D5DB-L
• S.O.S. Iceberg 1933 US DE Double-shot in German and English (this version).
• https://doi.org/10.5240/9162-6940-4DC3-ABF9-A67B-0
• S.O.S. Eisberg 1933 DE US Double-shot in English and German (this version).
• https://doi.org/10.5240/C97B-42DD-BF23-B3FF-4A8B-9
• Raavan 6/18/2010 IN Shot simultaneously with the Tamil-language version, Raavanan.
• https://doi.org/10.5240/F635-4E44-475B-158B-9FF4-Z
• Raavanan 6/18/2010 IN Shot simultaneously with the Hindi-language version, Raavan.
• https://doi.org/10.5240/13D5-090F-CA5A-A590-CE47-5
• Mumbai Express 4/15/2005 IN Double-shot in Hindi (this version) and Tamil.
• https://doi.org/10.5240/2492-A8ED-46AE-7631-40F1-H
• Mumbai Express 4/15/2005 IN Double-shot in Hindi and Tamil (this version).

CIMM TAXI / SMPTE OBID
● CIMM (Coalition for Innovative Media Measurement)
○ TAXI: Trackable Asset Cross-Platform Identification
● SMPTE (Society of Motion Picture & Television Engineers)
○ OBID: Open Binding of IDs (ST 2112-10)
■ EIDR Content IDs for programs
■ Ad-IDs for commercials
○ OBID-TLC: OBID-Time Labels to Content (ST 2112-20)
■ EIDR Video Service IDs for delivery channels

Layering OBID Watermarks
• For the same piece of content...
– Producers can add an EIDR Abstraction ID
– Distributors can add their EIDR Edit ID
– Broadcasters/Retailers can add their EIDR Manifestation ID plus EIDR Video Service ID
• All four EIDR IDs can be detected at playout
• Detection at device-level or acoustically
• Acoustic detection verified for ATSC 3.0
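
The layering idea can be pictured as each party adding only its own kind of identifier to the same piece of content. The sketch below is a conceptual model only: it is not the ST 2112 payload or watermark format, the class and role names are hypothetical, and the identifier values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict

# Which kinds of identifier each party may add, following the slide.
ALLOWED_KINDS = {
    "producer": {"abstraction"},
    "distributor": {"edit"},
    "broadcaster": {"manifestation", "video_service"},
}


@dataclass
class BoundIds:
    """Identifiers bound to one piece of content, one layer per kind."""
    layers: Dict[str, str] = field(default_factory=dict)

    def add(self, role: str, kind: str, value: str) -> None:
        if kind not in ALLOWED_KINDS[role]:
            raise ValueError(f"{role} does not add {kind} IDs")
        self.layers[kind] = value

    def detect(self) -> Dict[str, str]:
        """Everything a detector at playout would see, as a copy."""
        return dict(self.layers)


content = BoundIds()
content.add("producer", "abstraction", "<abstraction-id>")
content.add("distributor", "edit", "<edit-id>")
content.add("broadcaster", "manifestation", "<manifestation-id>")
content.add("broadcaster", "video_service", "<video-service-id>")
```

After the three parties have each added their layer, `content.detect()` returns all four identifiers together.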

Applications of TAXI/OBID
• Improve speed, accuracy, and accountability of measurement
• Better second screen integration
• Track assets across media platforms
