2. The French National Audiovisual Institute,
missions and audiovisual collections
Audiovisual archiving at Ina
Globo/IFTA seminar, Brazil, May 2016
3. Ina’s missions
1975: creation of Ina, the French National Audiovisual Institute
Missions
• Preservation of the audiovisual heritage of ORTF (the French radio and
television broadcasting authority, 1964-1974)
• Training, R&D
• Production, audiovisual creation
4. 127 million euros budget in 2016
972 employees (2015)
14 700 000 hours of TV and radio documents
1 200 000 preserved photographs
123 television channels and radio stations picked up under legal
deposit
1 232 096 hours of online archives available on inamediapro.com
3 000 professionals trained each year
156.7 million views on ina.fr
Bry sur Marne
Paris
Lille
Rennes
Strasbourg
Lyon
Marseille
Toulouse
5. Ina’s collections
The audiovisual heritage of ORTF (Office de radiodiffusion-télévision française)
• Public TV programs (from 1949)
• Public radio programs (from 1902)
1986: Ina is entitled to sell archives for new audiovisual productions
Professional archives database
1.2 M hours of radio and TV programs
Target audience: directors, journalists, institutions… needing footage
Use: commercial
Access:
Inamediapro
https://www.inamediapro.com/
Ina.fr (selection of media free of rights)
http://www.ina.fr
6. Ina’s collections
1995: application of a 1992 act on the legal deposit of
French TV and radio programs (public and private)
• 7 TV channels in 1995 > 100 channels in 2016
• 6 radio channels in 1995 > 20 channels in 2016 (soon 64)
Target audience: education and academic community
Use: heritage, scientific (research, studies)
Access: Ina THEQUE
http://www.inatheque.fr/index.html
Legal deposit database
1 M hours of radio and TV programs/year
7. Ina’s collections
2000: Ina started archiving private audiovisual resources
Centre Georges Pompidou
Ariana films (Afghanistan)
Newspaper
IOC – Olympic games
From:
• producers
• cultural institutions…
• individuals (directors, artists…)
• private companies
Uses: commercial and/or heritage
Legal deposit database or Professional archives database
Access:
• Inamediapro
• Ina THEQUE
9. Cataloguing and descriptive metadata
Metadata generated through different activities:
- 1/ Management of the daily inflow (metadata concerning TV and radio
programs from 120 channels): 2 sources of metadata
external:
• from private content providers (Plurimedia, Mediametrie,
Kantar Media…)
• from broadcasters (France Télévisions, Radio France…)
internal: segmentation and content description of informative
programs (news, magazines, documentaries…) by Ina’s archivists
10. Cataloguing and descriptive metadata
• Day -1: data import before broadcast (data about programs, contents, forecast broadcast times)
• Day +1: data import after broadcast (actual schedule)
• Data synchronisation
• Automatic creation of records in the Legal deposit database
• Quality control, alignment, enrichment and validation of the cataloguing data by cataloguers
• Link to media file
• Content description, segmentation and indexing by archivists
• Records of public TV and radio programs transferred to the « professional archives » database
11. 2/ Enhancement activities
Structuring of the collections
• Multimedia sets of archives about personalities or thematic issues
Contextualisation, editorialisation
• Preparation of content to be put online for special events
• Creation of thematic frescoes…
Professional archives database
- selection, annotation and indexing of full-length archives
- creation of clips
- media file segmentation
3/ Quality control operations
12. Different uses, different content descriptions
LEGAL DEPOSIT DATABASE (27 364 000* records)
Cataloguing
+ Thematic description: issues, people, places, historical periods, events…
+ Descriptors: issues, people, places
PROFESSIONAL ARCHIVES DATABASE (8 547 000* records)
Cataloguing
+ Thematic description: issues, people, places, historical periods, events…
+ Analytic description: shotlists, sounds, effects
+ Descriptors: issues, people, places, footage, sounds
* TV and radio
13. Applications and resources used for documentary tasks
14. Segmentation and annotation application
Mediascope
http://www.inatheque.fr/consultation/mediascope.html
18. Project of a new information system
19. Flowchart showing the current system: main data streams
(non-exhaustive schematic diagram)
Professional archives: Oracle databases [documentary and material data]
Legal Deposit: Oracle databases [documentary and material data]
Connected components:
• Importing of external metadata (Plurimédia, Mediametrie, Kantar, Edison, Lisa, Gilda, Sierra, TF1/LCI)
• Text archives (Gestion Document 4D)
• Technical applications (ARPP, CAPTN, etc.)
• Technical applications (Batchnum, Scandir, File Registration, Infos fichiers HSM, Info fichiers Radio)
• Consultation databases (exports)
• Customer management applications (Workflow Radio & TV, Gescom, InaMédiapro, Ina.fr)
• Legal applications (Adaje, Aida)
• Selection and transfer
Legend: data use; data creation / updating
20. Main data streams of the target information system
(non-exhaustive schematic diagram)
• Data lake
• Consultation and documentary and material metadata processing applications = Notilus
• Importing of external metadata (Plurimédia, Mediametrie, Kantar, Edison, Lisa, Gilda, Sierra, TF1/LCI)
• Technical applications (Batchnum, Scandir, File Registration, Infos fichiers HSM, Info fichiers Radio, Autres fonds, ARPP, CAPTN, Ossean, etc.)
• Customer management applications and databases (Workflow Radio & TV, Gescom, InaMédiapro, Ina.fr)
• Legal applications and databases (Adaje, Aida)
• Access environments (InaThèque, PCM)
Legend: data use; data creation / updating
21. Main data streams of the target information system
(non-exhaustive schematic diagram; same components as on slide 20)
22. Main data streams of the target information system
(non-exhaustive schematic diagram; same components as on slide 20)
23. A new data model
- To answer the respective needs of both documentary activities
in the same production and retrieval tool
- To allow better interoperability with other sources or collections
- To import and describe non-broadcast audiovisual resources
- To be able to adapt our documentary practices to new audiovisual
objects
The new model is currently being developed by the project team and IT
architects.
25. A new information system with services developed by the R&D department
- Speech to text with
• named entity extraction
• detection of quotations
• alignment of text and time codes
- Connection to external sources
(artistic works, regular events)
- Footage and sound analysis (detection of
faces, monuments, works of art, voices…)
http://recherche.ina.fr/eng
- New multimedia player
https://ina-foss.github.io/amalia.js/acmmm2015/
26. New information system, new prospects
Within legal constraints, new uses of our data could emerge:
- development of an open data policy (for data generated at Ina)
to encourage secondary uses by media specialists
- linked data: audiovisual works on Victor Hugo linked to Victor
Hugo’s literary works archived at the Bibliothèque nationale de
France (French National Library)?
27. The changing role of archivists
At Ina, the integration of external metadata has caused many changes in
documentary procedures over the last 10 years.
New challenges lie ahead with the big data policy and the development of
semi-automatic description:
guarantee the quality of the metadata
ensure consistency of the data in relation to its uses
use the very good knowledge of the collections:
• to structure massive amounts of data even more than before
• to develop new enhancement practices
• to support users in their research among more
and more data
28. More information about Ina
http://www.institut-national-audiovisuel.fr/en/home
acouteux@ina.fr
Anne Couteux
Introduction: project manager in documentary engineering at Ina.
Outline.
I am going to present:
Ina’s missions and collections
Workflows and descriptive metadata
IT applications and resources used for documentary work
The second part of this presentation will cover:
the multi-year project for a new information system
the new data model in a few words
new web services
And, as a conclusion, a few words on the future activities of information professionals at Ina.
Ina was created in 1975, when ORTF (the French radio and television broadcasting authority, 1964-1974) was wound up.
It is a public establishment of an industrial and commercial nature (EPIC).
It was set up to collect and conserve French audiovisual resources, primarily for the production needs of national broadcasters.
From its foundation, Ina was also positioned as a training, research and creative production organisation; I will not present these activities here.
There are 3 types of audiovisual resources at Ina.
The « historical » collection consists of the programs broadcast on public TV and radio (from the beginnings of TV and radio broadcasting until today).
In 1986, Ina was entitled to sell archives for new audiovisual productions (rights were transferred to Ina 1 year after broadcast; 6 months today).
That is why this collection is commonly called « Professional archives »: it is meant to be sold for audiovisual productions. It is described in a specific database.
Audiences
1/ These audiovisual resources are intended for any producer, director, journalist… looking for footage for new audiovisual works.
Use: commercial
Access: viewing, selection and purchase of extracts are available on the inamediapro online interface after registration.
OPEN INAMEDIAPRO (AFE85000552 or CPF04007161)
The services (archiving, preservation and marketing) associated with these resources require not only Ina’s archiving skills but also legal analysis of intellectual property rights, so as to enable marketing and royalty payments (an activity managed by the legal department).
2/ The general public can also access these archives on ina.fr.
OPEN INA.FR
A selection of content in the public domain (media free of rights), distributed freely online without breaching intellectual property rights.
On this site, the free distribution of extracts or full content is governed by an editorial selection policy. Content distributed in this way must be related to a news item (death of a personality, anniversary of a historic event, etc.).
Updated daily according to cultural, political and sports events.
In 1992, inspired by the legal deposit that has applied to various forms of printed material since the 16th century, the legislature extended it to French radio and television broadcasts, treated as forms of publication. Ina was chosen to take charge of this new mission.
The criterion is that all programmes produced or co-produced by French national broadcasters fall within the scope of legal deposit.
The law was passed in 1992 but took effect only in 1995.
In 1995, 7 TV and 6 radio channels were archived by Ina.
Initially, collection was based on physical recordings supplied by broadcasters, but the system changed in 2001 when Ina began to record direct digital streams from stations and channels.
This ended Ina’s dependency on broadcasters for content supply, and it also changed the very principle of collection for deposit: it went from selection at source based on statutory criteria to the comprehensive recording of broadcast content 24 hours a day.
The scope of collection was gradually extended, and today it takes in 100 television channels and 20 (soon 64) radio stations that are recorded continuously, generating 1 million hours of new content every year.
This spectacular increase influenced the way metadata would be produced, as we will see later.
The audience for these audiovisual resources is mainly composed of teachers and students carrying out research on the media coverage of specific topics (how the immigration issue is treated in TV series, for instance) or on the media themselves (the evolution of political debates on TV since the 1970s, for instance).
Uses: this second type of collection serves a heritage purpose.
Non-commercial archives, and no copying allowed (1992 act).
Access: inatheque.fr, where you can find the catalogue, but no media is available online.
OPEN INATHEQUE
The media can be watched or listened to at the National Library (Bibliothèque nationale de France), where Ina shares a research reception area dedicated to audiovisual collections.
To widen its range of audiovisual content, Ina started to sign agreements with private partners 15 years ago.
These private collections, about 400, come from cultural institutions (theatres, opera houses…) but also from private companies such as news agencies, or from individuals (directors, TV producers…).
This activity is constantly developing and, depending on the agreement between Ina and the partner, the aim is either preservation or communication of the archives.
Depending on the final use, records are stored in the professional database and available on inamediapro, or in the legal deposit database and available on inatheque.fr.
All the collections, whatever the audience or the uses, are searchable on the basis of cataloguing and descriptive metadata produced by Ina’s cataloguers and documentalists.
The following slides describe the main workflows for creating descriptive metadata.
These metadata are generated by several activities. The main one is the management of the daily inflow, for which there are 2 sources:
Over the last 10 years, the constant increase in the number of channels collected under legal deposit led us to acquire metadata from external sources, namely private content providers.
Plurimedia = a « news agency » specialised in providing information about TV and radio programs and, more broadly, cinema, cultural events and leisure. It supplies TV and radio program publishers.
Mediametrie = a company specialised in audience measurement and marketing studies of audiovisual and interactive media in France.
Kantar Media = a company dedicated to the analysis of media content.
About 80% of the metadata on the legal deposit collections come from external sources.
They are mainly managed by the team of cataloguers.
The other source of metadata is internal: it comes from documentary processing, with indexing, sequencing and the production of a summary.
The process is as follows:
The day before broadcast, we get data from Plurimedia. The day after, we get data from Mediametrie: as it reflects what was actually broadcast, we cross-check it against the earlier data and synchronise the information.
Once this is done, a record is automatically created in the legal deposit database.
After this first stage of processing, there is a cataloguing check consisting mainly in aligning the imported data with internal references.
At the same time, a link is made with the appropriate media file.
+ more detailed documentary processing with indexing and the production of a summary.
The data is available online on the Inatheque website.
The data concerning public TV and radio programs is then transferred to the professional database.
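To make the Day -1 / Day +1 cross-check concrete, here is a minimal Python sketch. It is an illustration only, not Ina's actual import code: the `ScheduleEntry` fields, the `reconcile` function and the 30-minute tolerance are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ScheduleEntry:
    """One line of a program schedule (hypothetical, simplified fields)."""
    channel: str
    title: str
    start: datetime


def reconcile(forecast, as_run, tolerance=timedelta(minutes=30)):
    """Pair each as-run entry (Day +1) with the closest forecast entry (Day -1).

    Entries are paired when channel and title agree and the start times
    differ by at most `tolerance`. The as-run start time is kept as the
    reference; entries with no forecast counterpart are flagged for review.
    """
    unmatched = list(forecast)
    records = []
    for actual in as_run:
        candidates = [
            f for f in unmatched
            if f.channel == actual.channel
            and f.title.casefold() == actual.title.casefold()
            and abs(f.start - actual.start) <= tolerance
        ]
        match = min(candidates, key=lambda f: abs(f.start - actual.start), default=None)
        if match is not None:
            unmatched.remove(match)
        records.append({
            "channel": actual.channel,
            "title": actual.title,
            "start": actual.start,  # what was actually broadcast wins
            "forecast_start": match.start if match else None,
            "needs_review": match is None,
        })
    return records
```

In this sketch each returned dictionary stands for the record that would be created automatically; anything flagged `needs_review` would be left to the cataloguers' quality control.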
This database is devoted to professional distribution services, which require specific processing on the data.
The structuring and contextualisation activities also generate descriptive metadata, through the selection and description of full archives and the creation, annotation and indexing of many extracts highlighting remarkable footage.
The structuring of the collections consists in building multimedia folders related to personalities (politicians, artists, athletes…) or thematic issues (environment, health, economy…). The objectives are:
- to facilitate rapid and relevant access to the archive sought within the large amount of data
- to offer ready-to-use archives
- to show archives that clients would not find by themselves (quotations)
- to help journalists avoid using the same archive all the time…
SHOW THE FRESCO Festival de Cannes
Quality control: concerns more specifically collections whose level of description does not meet Ina’s standard.
Content description, segmentation, indexing, links to appropriate media files
Scientific uses on one side and the sale of footage on the other affect the choices made in describing our collections: researchers, teachers and students mainly need a thematic description, whereas inamediapro clients need the description and indexing of footage.
2 uses, 2 databases, 2 types of description.
Example DA/DL
CPC97101061 (inamediapro)
Brésil, le carnaval des enfants
Faut pas rêver
Mediascope
Used for the annotation and sequencing of media content in parallel with viewing.
It also enables the capture of thumbnails that are themselves timecoded and indexed, for both quantitative and qualitative research.
Used by cataloguers and documentalists, but also by researchers at Ina THEQUE.
Application available online (can be downloaded).
This database is devoted to professional distribution services, which require specific editorial processing; this is facilitated by the first documentary stages.
This processing pattern (shown in figure 1) does not prevent data redundancy between the databases (since each one is connected with its own processing applications and dedicated consultation interfaces) or the risk of divergent changes in an originally shared piece of information since the two databases are not systematically co-synchronised. Even so, this model has the advantage of establishing communication between the databases and throwing light on the shared information they contain, a necessary preparation and even source of awareness prior to the redesign of these applications in a single shared environment.
Thesaurus: a vocabulary common to all uses and teams, but implemented separately in each database, which means double management!
4 languages : persons, organizations, places and concepts.
Translated into 3 languages: English, Spanish and Arabic
An application that gathers many detailed instructions and rules used for archive description. As the number of archivists and cataloguers is significant (about 180), we needed to give definitions and organise information so that the audiovisual collections would be described consistently.
How to write titles
How to describe sports collections
How to write the name of a person, of a musical group…
How to distinguish a report from a documentary?
Gives very detailed instructions for documentary processing so that the teams (cataloguers and archivists) apply the same rules for the same type of collection.
Working in silos as opposed to working across processes
A kind of "silo" management approach
As you have understood, over the last 20 years, Ina’s collections have been managed by two parallel systems designed to operate independently, each one having its own processing applications and dedicated consultation interfaces.
Each one plays a specific role:
- For one system, it is the distribution of a relatively limited volume of content for commercial and professional purposes (mainly the sale of extracts)
- For the other, it is the management of an ever-growing volume of content for uses such as trend studies applied to thousands or even tens of thousands of data items
However, links have gradually been established between these two collection systems and the two types of use.
But there are still two problems with this processing pattern:
1/ data redundancy between the databases. Some content can fall into both categories – for instance, public television news, covered by both legal deposit and the professional archives (according to the terms of the service agreement signed between Ina and national public broadcasters).
2/ risk of divergent changes on the same record since the two databases are not systematically co-synchronised
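As an illustration of the second problem, here is a minimal Python sketch of how divergent copies of a shared record could be spotted. It is not Ina's tooling: the dictionary layout, the field names and the function `find_divergent_fields` are hypothetical.

```python
def find_divergent_fields(legal_deposit, professional, shared_fields):
    """Compare records held in both stores and report fields that differ.

    Both arguments map a record identifier to a dict of field values.
    Returns {record_id: {field: (legal_deposit_value, professional_value)}}
    for every record present in both stores with at least one difference.
    """
    divergent = {}
    # Only records duplicated in both databases can diverge.
    for record_id in legal_deposit.keys() & professional.keys():
        left = legal_deposit[record_id]
        right = professional[record_id]
        diffs = {
            field: (left.get(field), right.get(field))
            for field in shared_fields
            if left.get(field) != right.get(field)
        }
        if diffs:
            divergent[record_id] = diffs
    return divergent
```

With two unsynchronised databases a check like this has to be run after the fact; in a single shared environment the duplicated record, and therefore the comparison, would no longer exist.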
Because of this fragmentation in data management, the documentary processes are complex and lack fluidity, and the metadata collected or created at Ina, especially under legal deposit, are under-exploited.
It was becoming necessary to redesign these applications in a single shared environment.
This is why, since 2014, the Institute has been remodeling its documentary IT, in close coordination with the broader construction of a "data lake", which will allow the merging of metadata from all enterprise applications (documentary, legal, commercial).
The adoption of a big data policy will help to develop new ways of using and linking metadata for internal needs (metadata linked to legal or commercial information for instance) first and perhaps later to other collections.
This rationalisation of metadata management should also help to build a policy for opening up the metadata, according to legal criteria still to be determined (such as their origin).
Alongside this, a new application is being developed to handle the ingest of third-party sources via a single, systematic gateway.
It concerns any descriptive or technical metadata related to any type of carrier or digital file.
This brick will contain mapping and alignment components that will guarantee conformity with Ina’s new standards.
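A mapping component of this kind could look roughly like the Python sketch below. It is a simplified assumption, not the design of the actual gateway: the provider names, the external field names and the internal field names are all invented for the example.

```python
# Hypothetical mapping tables: external field name -> internal field name.
FIELD_MAPS = {
    "provider_a": {"prog_title": "title", "chan": "channel", "start_time": "broadcast_start"},
    "provider_b": {"titre": "title", "chaine": "channel", "debut": "broadcast_start"},
}

# Internal fields that every ingested record is expected to carry (assumed).
REQUIRED_FIELDS = ("title", "channel", "broadcast_start")


def to_internal(source, payload):
    """Translate one external metadata payload into the internal field names.

    Returns the mapped record and the list of required fields that are
    missing or empty, so that incomplete records can be routed to review.
    """
    mapping = FIELD_MAPS[source]
    record = {
        internal: payload[external]
        for external, internal in mapping.items()
        if external in payload
    }
    missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
    return record, missing
```

The point of a single gateway is that each new provider only adds one entry to the mapping table, while the conformity check stays the same for all sources.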
Finally, a new documentary system is being designed:
to streamline the processing chain around a common tool
to rationalise metadata production (no more duplication)
to guarantee better documentary quality: a specific brick will deal with quality work on massive data
To answer the respective needs of both documentary activities in the same production and retrieval tool > more flexibility in the description
To allow better interoperability with other sources or collections > in the future, mainly for legal reasons
To be able to adapt our description to new audiovisual objects or practices > which could appear in the future (filmed radio, cross-media works, online television),
we needed to adopt a new data model. It is inspired by FRBR (Functional Requirements for Bibliographic Records) but is not quite the same: there is no work entity, only 3 main entities (instance, event and item), with different typologies for each of them.
Types of instance: unique program (documentary), episode, subject of the evening news…
Types of item: original film, copy, digital file…
Types of event: production, broadcast, shooting, recording, publication…
Each entity will have its own attributes: annotation (text, descriptors…), activities…
This is the new architecture in its simplest expression.
Instance, event and item will be the skeleton of the data model and, depending on the media described (TV, radio, photo, text…), on whether it is broadcast or not and on the level of description, we will be able to ensure the continuity of our current processes in a much more flexible way.
For instance, in the legal deposit database there was one record for each broadcast of a program (and many programs are broadcast several times on different channels and formats) > in the new system there will be one instance and as many broadcast events as there are broadcasts.
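The three entities can be sketched in Python as follows. This is only an illustration of the idea of one instance carrying several broadcast events: the class attributes, the way events and items are attached to an instance, and the example values are assumptions, not the model being drawn up by the project team.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """Something that happens to an instance (hypothetical attributes)."""
    kind: str          # e.g. "production", "broadcast", "shooting"
    date: str          # ISO date, kept as text for simplicity
    channel: str = ""  # only meaningful for broadcast events


@dataclass
class Item:
    """A physical or digital carrier of an instance."""
    kind: str          # e.g. "original film", "copy", "digital file"
    location: str = ""


@dataclass
class Instance:
    """The described content: a unique program, an episode, a news subject…"""
    kind: str
    title: str
    events: List[Event] = field(default_factory=list)
    items: List[Item] = field(default_factory=list)

    def broadcasts(self) -> List[Event]:
        """All broadcast events attached to this single instance."""
        return [event for event in self.events if event.kind == "broadcast"]


# One instance, two broadcasts: the old model would have needed two records.
# Title, dates and channels below are invented for the example.
documentary = Instance(kind="unique program", title="Example documentary")
documentary.events.append(Event(kind="production", date="2015-11-01"))
documentary.events.append(Event(kind="broadcast", date="2016-01-10", channel="Channel A"))
documentary.events.append(Event(kind="broadcast", date="2016-03-22", channel="Channel B"))
documentary.items.append(Item(kind="digital file"))
```

Annotations (text, descriptors…) would then be attached once to the instance rather than repeated on every broadcast record.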
In the new system, new applications will be implemented. They have been developed by the IT department over many years and are accessible online. They concern:
Speech to text:
to help documentalists in the work of description
to enrich the description of radio collections
Connection to external sources
to adopt standards from other authorities and facilitate future connections if objects are described in a homogeneous way
New player: Amalia.js
New functionalities compared to Mediascope and other players: better sequencing and footage timecoding
The challenge, in the future data lake, is the management of massive volumes of data.
An open data policy would be an opportunity to enhance our content and to open the way to new explorations of the data. But the legal framework protecting the private life of people appearing in the media has not yet been determined. Their identity, which is private, is mentioned in the record and would thus be accessible to anyone. This is not possible today under the current legal framework.
If our data were open, new collaborations could be imagined with partners like cultural or educational institutions to elaborate