This document discusses archive strategies for 2015-2020. It proposes that archives preserve all of their own productions in high definition, if economically sustainable, and increase harmonization, standardization, and centralization. The SRG Task Force will analyze what SRG content production will look like in 2020 and what role archives will have, develop archive policies that take account of the evolution of production and of technical and economic factors, and determine how to realize these policies through structures, tools, resources, and a roadmap. The document also discusses RSI's use of semantic analysis of audio and images to automatically categorize, catalogue, and extract metadata so that material is easier to retrieve. It concludes that archives should ask the industry to provide integrated standard solutions and should accept common workflows.
FIAT/IFTA MMC Seminar May 2015. Giant Steps: The 2020 Broadcaster Archive. Theo Mäusli, SRG SSR
1. Theo Mäusli (SRG SSR), with contributions from Sarah-Haye Aziz, Lorenzo Vassallo and Francesco Veri (RSI)
FIAT/IFTA Media Management Seminar 2015
Glasgow - May 21st – 22nd
2.
Agenda
0. Introduction, context
1. Selection
2. Dataflow
3. Traceability
4. Data mining
5. Artificial intelligence
6. The archivist is a coach
Conclusion: please, industry
3.
SRG Task Force on the realization of the archive strategies 2015-2020
• Archiving all of SRG's own production in high definition, if economically sustainable
• Harmonization, standardization and centralization
• Greater use, also by external users
• …
4.
Analysis in 4 steps:
1. What will SRG content production look like in 2020? What will be the role of the archives in such a context?
2. What will the archive policies be, considering the evolution of SRG production and the macro evolution of technical and economic factors?
3. How can these policies be realized (structures, instruments, technical and human resources)?
4. Scenarios, roadmap and cost evaluations
12.
Conclusion 1: this is not new
Many archives are already behaving as a 2020 archive, using dataflows, data mining and new skills (very dynamic experiences at RTS and SRF)
The case of RSI (updates to the last seminar contribution by Sarah, Lorenzo and Francesco)
13.
Updates on SIA at RSI
Sarah-Haye Aziz, Lorenzo Vassallo, Francesco Veri
May 2015
15.
…
Differences between Radio and TV
• Background music or noise does not help the transcription.
• Based only on silences and without key frames, the system creates too many sequences.
• Key frames help to locate a change of context.
• Speech rhythm and pauses differ between Radio and TV.
16.
Solution – Capitalize 24h Radio Logging
[Diagram. Labels: 24h Radio Logging (timeline 0 to 24 h); Automatic Cut + Editorial Text; Semantic Engine; Semantic Analysis: Categorisation; Catalogue; Editorial Text; Themes; Geo Ref; Credits]
17.
User download from 24h logging
• Audio material retrievable using metadata automatically acquired from the channel schedule
• Download from the 24 h Radio logging
18.
Capitalize editorial texts
• Automatic Semantic Analysis of the editorial text
• Automatic uploading of the editorial texts on CMM
• Terms not related to our thesaurus become keywords
19.
Further implementation
• This solution can be implemented on web pages
• Automatic semantic analysis of web content
• Automatic categorisation of web page content
20.
Hello everybody!
Contact us for more information!
Well done Theo!
francesco.veri@rsi.ch, sarah-haye.aziz@rsi.ch, lorenzo.vassallo@rsi.ch
21.
Conclusion: please, industry
• The features are not new, but they are not yet integrated into the archive systems
• It is not the job of the archivists to integrate them
• The archivist community should ask the industry to provide standard solutions
• The archive community has to accept common workflows and standards (learned yesterday!)
22.
Conclusion: … and collaboration
SRG SSR looks forward to collaborating and benchmarking with the FIAT/IFTA community
Thank you for your attention
Theo Mäusli, Lugano, Switzerland
Theo.maeusli@srgssr.ch
+41 91 803 51 28
Editor's Notes
Giant steps: the 2020 broadcaster's archive
In managing metadata efficiently, the way to 2020 will be covered in giant steps, building on concrete experience within broadcasters and archives as well as on existing prototypes and credible research and industry announcements. We believe that the relevant changes will happen mostly on six levels:
Selection: with the ever lower cost of storage, selection no longer happens at the level of what to keep, but of what to invest metadata in.
Dataflow: IT-based production workflows make it possible to converge all production data into metadata.
Traceability: the use of archive documents (and of parts of them) will also become dynamic metadata.
Data mining: speech-to-text, image and face recognition techniques make it possible to understand and find pertinent content. Big data analysis provides contextual information, if well filtered and organized.
Artificial intelligence: it provides the pertinent documents in response to users' requests, and suggests pertinent requests.
The new archivist is a coach: she or he will teach and train the archive systems to keep and emphasize the interesting information and to provide the right material.
We expect well-integrated products from the industry, so that these giant steps will be sure steps. FIAT/IFTA can motivate the industry to do so and coordinate the archivists' requests and ideas.
Welcome ….
Before I start, I would like to emphasize that this presentation is a brief update on the Automatic Indexing System used in the RSI radio archives. Two years ago, at the FIAT/IFTA conference in Amsterdam, we presented the implementation of the Automatic Indexing System in TV and its impact on the daily work of archivists.
At that time our Automatic Indexing System received good feedback from several institutions, which contacted us for further information. In particular they were interested in two things. The first was the synergy between IT staff and archivists during the tuning phase of the system, in which archivists play a central role in developing improvements such as using different colors to distinguish between human and automatic indexing. The second was the transformation of the archivist's work: the archivist can now choose between different paths of documentation, namely human documentation, automatic documentation, or combined automatic and human documentation.
For those who were not in Amsterdam two years ago, I would like to reiterate that in 2011 Radiotelevisione della Svizzera Italiana (RSI), the Swiss Italian broadcaster, introduced an Automatic Indexing System into its archive's multimedia catalogue, which is called CMM. The system consists of the automatic transcription of audio sources (speech to text) and automatic semantic analysis.
Today we would like to present how the Automatic Indexing System has developed in the context of radio.
From now on, when I refer to CMM I am referring to RSI's archiving database, and when I talk about the Automatic Indexing System I will use the Italian acronym SIA (Sistema d'Indicizzazione Automatica).
Slide 2: SIA's sources in TV and Radio
In 2011-2012 we implemented the SIA in both radio and TV. The automatic transcription for radio was based on the audio alone; for TV it was based on audio and images.
Slide 3: Differences between Radio and TV
We noticed that the automatic transcriptions of radio and TV were radically different, for two reasons.
First, radio has background noise or music, which interferes with the speech to text and produces inaccurate transcriptions.
Second, radio and TV are different types of media with different narrative languages, in which speech pauses carry a different significance during programs. The sequences generated by the SIA from radio were based on silences alone, which resulted in too many sequences.
By contrast, the SIA works well in TV because of the presence of images. The key frames in video help to locate a change of context and thus create logical sequences in TV documents.
For example, in a report with several interviewees, the change of key frame from one person to another helps the SIA to create a separate sequence for each person, so the system automatically puts the full interview with each person into one sequence.
In other words, we considered it too hazardous to use one common technical solution for TV and radio.
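The difference between the two segmentation behaviours can be illustrated with a minimal sketch. This is not the SIA's actual algorithm, which the presentation does not describe; the `Pause` type, the function names, the thresholds and the sample timings are all hypothetical and chosen only to show why silence-only cutting produces more sequences than cutting confirmed by key frames.

```python
from dataclasses import dataclass


@dataclass
class Pause:
    time: float      # position of the silence, in seconds from programme start
    duration: float  # length of the silence, in seconds


def cuts_from_silences(pauses, min_pause=0.5):
    """Radio-style cutting: every sufficiently long silence becomes a cut."""
    return [p.time for p in pauses if p.duration >= min_pause]


def cuts_with_key_frames(pauses, key_frames, min_pause=0.5, tolerance=1.0):
    """TV-style cutting: keep a silence cut only when a key-frame change
    lies within `tolerance` seconds of it (assumed confirmation rule)."""
    return [
        t for t in cuts_from_silences(pauses, min_pause)
        if any(abs(t - k) <= tolerance for k in key_frames)
    ]


# Hypothetical programme: four pauses, but only two changes of picture.
pauses = [Pause(4.0, 0.6), Pause(9.5, 0.8), Pause(15.2, 0.7), Pause(21.0, 0.9)]
key_frames = [9.8, 21.3]

radio_cuts = cuts_from_silences(pauses)            # four cuts
tv_cuts = cuts_with_key_frames(pauses, key_frames)  # two cuts
print(radio_cuts, tv_cuts)
```

With the same pauses, the silence-only variant cuts at every pause, while the key-frame variant keeps only the cuts that coincide with a change of picture.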
Slide 4: Solution – Capitalize 24h Radio Logging
To deal with radio's pitfalls, we now apply the SIA to pre-existing editorial texts and add these texts as attachments to our multimedia catalogue.
Slide 5: User download from 24h logging
From the user's point of view, we have implemented the metadata automation directly within the broadcast programming of the day (24h Radio Logging), which includes the editorial texts written by the journalists.
The system automatically retrieves the material using metadata acquired from the channel schedule (such as titles and playout dates).
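As an illustration of schedule-driven retrieval, the following sketch maps a schedule entry to a time range inside a 24-hour log recording. It assumes that one log file covers one day starting at midnight; the `ScheduleEntry` type, the function names and the programme titles are hypothetical and do not reflect RSI's actual system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ScheduleEntry:
    title: str
    start: datetime  # playout start taken from the channel schedule
    end: datetime    # playout end taken from the channel schedule


def find_by_title(schedule, text):
    """Return the schedule entries whose title contains `text` (case-insensitive)."""
    needle = text.casefold()
    return [e for e in schedule if needle in e.title.casefold()]


def log_offsets(entry, log_start):
    """Return (start, end) offsets in seconds of a programme inside the
    24h log recording that begins at `log_start`."""
    start = (entry.start - log_start).total_seconds()
    end = (entry.end - log_start).total_seconds()
    if not 0 <= start < end <= 24 * 3600:
        raise ValueError("programme lies outside this 24h log")
    return start, end


# Hypothetical schedule for one day of logging.
log_start = datetime(2015, 5, 21, 0, 0)
schedule = [
    ScheduleEntry("Morning news", datetime(2015, 5, 21, 7, 0), datetime(2015, 5, 21, 7, 30)),
    ScheduleEntry("Culture magazine", datetime(2015, 5, 21, 14, 0), datetime(2015, 5, 21, 15, 0)),
]

hits = find_by_title(schedule, "news")
offsets = log_offsets(hits[0], log_start)
print(hits[0].title, offsets)
```

The offsets could then be passed to whatever tool extracts the audio segment from the log file for download.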
Slide 6: Capitalize editorial texts
From the broadcast programming, the system automatically picks up the editorial texts and performs a semantic analysis of them, which consists of extracting credits, geographical terms and terms that are linked to our thesaurus. The result is automatically uploaded and attached to the respective audio file in our database.
Other terms that are not linked to our thesaurus are also extracted, and they appear as keywords. In effect, the Automatic Indexing System also tags the text automatically.
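The split between thesaurus terms, geographical terms and free keywords can be sketched as a simple lookup. This is a deliberately naive illustration, not the semantic engine used at RSI: the vocabularies, the stop-word list and the `analyse` function are hypothetical, and credit extraction is left out.

```python
import re

# Hypothetical controlled vocabularies; a real thesaurus would be far larger.
THESAURUS = {"election", "parliament"}
PLACES = {"lugano", "bellinzona"}
STOPWORDS = {"the", "in", "of", "a", "and", "on", "for"}


def analyse(text):
    """Sort the words of an editorial text into thesaurus terms,
    geographical terms and free keywords (everything else)."""
    result = {"thesaurus": [], "geo": [], "keywords": []}
    # Letters only: drops digits, punctuation and underscores.
    for token in re.findall(r"[^\W\d_]+", text.lower()):
        if token in STOPWORDS:
            continue
        if token in THESAURUS:
            bucket = "thesaurus"
        elif token in PLACES:
            bucket = "geo"
        else:
            bucket = "keywords"
        if token not in result[bucket]:
            result[bucket].append(token)
    return result


record = analyse("Election debate in Lugano: the parliament votes on the budget")
print(record)
```

In this sketch the terms that fall outside the thesaurus are kept rather than discarded, mirroring the idea that unlinked terms still become keywords on the catalogue record.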
Slide 8: Further implementation of the Automatic Semantic Analysis
Automatic semantic analysis has several implementation options, for example its use on web pages.
In this case, the system performs an automatic semantic analysis of the content of the web pages, which results in the automatic tagging of each web page. This will facilitate the categorization of web page content through keywords.
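A minimal sketch of what tagging a web page could look like, using only Python's standard `html.parser` module to pull out the visible text before matching it against category vocabularies. The `CATEGORIES` table, the class and function names and the sample page are hypothetical; a real semantic engine would do much more than word lookup.

```python
import re
from html.parser import HTMLParser

# Hypothetical category vocabularies used for tagging.
CATEGORIES = {
    "politics": {"election", "parliament"},
    "sport": {"football", "match"},
}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def tag_page(html):
    """Return the sorted categories whose vocabulary appears in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = set(re.findall(r"[^\W\d_]+", " ".join(parser.parts).lower()))
    return sorted(cat for cat, terms in CATEGORIES.items() if words & terms)


page = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Election night</h1>"
    "<p>The parliament meets after the football match.</p>"
    "<script>var match = 1;</script></body></html>"
)
tags = tag_page(page)
print(tags)
```

Skipping script and style content keeps code identifiers on the page from being mistaken for editorial vocabulary.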