Audiovisual archives benefit from fostering a ‘culture of innovation’ as a way to manage the ever-changing expectations of user groups effectively while making the most of the new opportunities offered by technology. In the context of managing digital assets, adapting the idea of “two-speed IT” helps build this culture. The core strategy accommodates two tracks simultaneously: one foundational but “slow”, the other innovative, flexible and “fast”. In the case of Sound and Vision, an off-the-shelf asset management system forms the foundation, alongside a more agile layer of tailor-made solutions for distinct functionalities, notably open source search and automatic metadata extraction. This is the layer where the output of research can be implemented in production workflows. In 2014, Sound and Vision followed this strategy to deploy speaker labeling successfully, followed in 2015 by the rollout of technology to extract so-called named entities from subtitle files. Both help to automate the annotation process. This presentation will highlight the choices behind the two-speed IT and discuss lessons learned over the past year in working with internal stakeholders and external parties such as software development agencies and researchers in academia.
DAS 2015 - Implementing R&D output in production systems
1. Johan Oomen
Head of Research
Netherlands Institute for Sound and Vision
Building smart, connected and open archives:
implementing R&D output in production systems
New York - May 8, 2015
digitalassetsymposium.com
@johanoomen
2.
3. My talk today
1. Envisioning the post-analogue archive
2. From R&D output to production systems
two-speed IT
3. Two case studies:
entity extraction
speaker identification
4. Film from 1898 onwards
Television from 1951
Advertising 1920
Cinema journals ‘22–’80
Radio from 1934
Dutch royal family collection
Dutch football league archive
National Music Archive
Objects related to media
Web video
Amateur film
Documentary film
Photographs
Websites
Visual art collections
…and much more.
a million hours
5. “We enable everyone to utilize the
collections to learn, experience and create.”
6.
7. “Images for the Future” digitisation programme (2007-2014)
137,200 hours video MXF SD (HD for film)
17,510 hours film (DPX and MXF)
123,900 hours audio WAV
1,200,000 photos TIFF
http://beeldenvoordetoekomst.nl/publicatie/
8. A
size Q2 2015:
~11 petabyte
+ backup
~11 petabyte
Visuals https://vimeo.com/51425368 - Sebastiaan ter Burg CC-BY
9. A
annual ingest:
8.000 hrs video
54.000 hrs radio
=
~1.5 petabyte
Visuals https://vimeo.com/51425368 - Sebastiaan ter Burg CC-BY
17. Low-level features
Multimedia
content analysis
today sniper fire disrupted the funeral of an
eleven year old ethnic albanian boy he was
killed yesterday while while chopping wood
his family blames serb police before his death
louisiana state police now say six workers
were killed after a natural gas well exploded
and caught fire about forty five miles east of
shreveport four others were injured in
yesterday's blast a police spokesman says the
derek started to melt in the intense heat the
Audio transcripts
Concept detectors
Speaker identification
Face recognition
Machine analysis
18. Technology transfer - the Accelerator Team
R&D A-team ICT
Daily production
Demonstrators
19. Technology transfer - the Accelerator Team
R&D A-team ICT
Products, not projects
Incremental development
Demo every two weeks
Daily production
Demonstrators
Multi-annual research agenda
Collaborative projects
Day-to-day maintenance
Contact with 3rd parties
20. Technology transfer - the Accelerator Team
R&D A-team ICT
Daily production
Demonstrators
spin-off SMEs
23. Two-speed IT
1. Solid foundation (MAM system)
2. Agile development
3. Open source software from/through R&D
4. Collaboration with spin-offs (customization,
processing, support)
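One way to picture the split between the two speeds is sketched below. It is a minimal illustration under stated assumptions, not a description of the Sound and Vision systems: `MamStore`, `MamRecord` and `run_enrichers` are hypothetical names, and the in-memory store only stands in for the MAM system. The slow layer exposes one narrow import call; the fast layer is a set of interchangeable functions that produce labels.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MamRecord:
    """A record in the slow, stable layer (stand-in for the MAM system)."""
    asset_id: str
    title: str
    annotations: Dict[str, List[str]] = field(default_factory=dict)


class MamStore:
    """Hypothetical narrow import interface exposed by the foundation layer."""

    def __init__(self) -> None:
        self._records: Dict[str, MamRecord] = {}

    def add(self, record: MamRecord) -> None:
        self._records[record.asset_id] = record

    def import_annotations(self, asset_id: str, source: str, labels: List[str]) -> None:
        # Labels are stored per source, so a fast-layer service can be
        # re-run or replaced without touching the rest of the record.
        self._records[asset_id].annotations[source] = sorted(set(labels))

    def get(self, asset_id: str) -> MamRecord:
        return self._records[asset_id]


# Fast layer: an enricher is any function from text to a list of labels.
Enricher = Callable[[str], List[str]]


def run_enrichers(store: MamStore, asset_id: str, text: str,
                  enrichers: Dict[str, Enricher]) -> None:
    """Run every fast-layer enricher and import its output into the store."""
    for name, enrich in enrichers.items():
        store.import_annotations(asset_id, name, enrich(text))
```

In this sketch a new extraction service is added by registering one more function in the `enrichers` dictionary; the store itself does not change.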
24. Case 1. Entity extraction
Extracting thesaurus keywords from subtitles
=> import into MAM system
Currently working with two services
x-TAS (University of Amsterdam)
TextRazor
Research partners Spin-off SME
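The basic idea of matching thesaurus terms in subtitles can be sketched as follows. This is a simple dictionary lookup written for illustration only; it is not how x-TAS or TextRazor work, and it does not call either service. The SRT-style input format, the function names and the thesaurus passed in are assumptions.

```python
import re
from typing import Dict, List

# Matches SRT-style timing lines such as "00:00:01,000 --> 00:00:03,000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3} --> ")


def subtitle_text(srt: str) -> str:
    """Drop cue numbers and timing lines, join the remaining text."""
    kept = []
    for line in srt.splitlines():
        line = line.strip()
        if not line or line.isdigit() or TIMING.match(line):
            continue
        kept.append(line)
    return " ".join(kept)


def extract_terms(text: str, thesaurus: Dict[str, str]) -> List[str]:
    """Return preferred terms whose surface form occurs as whole words.

    `thesaurus` maps a lower-case surface form to the preferred term.
    """
    lowered = text.lower()
    found = []
    for surface, preferred in thesaurus.items():
        pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.append(preferred)
    return sorted(set(found))
```

Joining the cue text before matching lets a multi-word term be found even when a subtitle line break falls in the middle of it. The resulting list of preferred terms is what would then be imported into the MAM system.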
25. Case 1. Entity extraction
Victor de Boer, Roeland J.F. Ordelman and Josefien Schuurman: ‘Practice-oriented Evaluation of Unsupervised Labeling of Audiovisual Content in an Archive Production Environment.’ (to appear)
26. Case 1. Entity extraction
Term Extraction Service
Dashboard
http://gtaa.beeldengeluid.nl/
text: http://www.volkskrant.nl/dossier-britse-verkiezingen/cameron-groot-brittannie-nog-groter-maken~a4008460/
27. Case 2. Speaker identification
Challenge:
Identify the different speakers and label who each one is.
Link to the correct thesaurus entry
=> import output in MAM system
Research partners Spin-off SME
28. Case 2. Speaker identification
Three-step process:
1. identify speech
2. cluster speakers
3. label clusters
…and a lot of tuning!
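The three steps can be illustrated with a toy pipeline. This is a sketch under heavy assumptions, not the production system: the `energy` score stands in for a real speech detector, the `features` vectors stand in for voice embeddings, the greedy cosine clustering is a swapped-in simplification, and both thresholds are arbitrary values of the kind that would need the tuning the slide mentions.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class Segment:
    start: float
    end: float
    energy: float     # stand-in for a speech/non-speech score
    features: Vector  # stand-in for a voice embedding


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(segments: List[Segment]) -> List[float]:
    columns = zip(*(s.features for s in segments))
    return [sum(col) / len(segments) for col in columns]


def identify_speech(segments: List[Segment], min_energy: float = 0.5) -> List[Segment]:
    """Step 1: keep only the segments that look like speech."""
    return [s for s in segments if s.energy >= min_energy]


def cluster_speakers(segments: List[Segment], threshold: float = 0.9) -> List[List[Segment]]:
    """Step 2: greedily group segments whose features are similar."""
    clusters: List[List[Segment]] = []
    for seg in segments:
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(seg.features, centroid(cluster))
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([seg])  # no cluster is close enough: new speaker
        else:
            best.append(seg)
    return clusters


def label_clusters(clusters: List[List[Segment]], references: Dict[str, Vector],
                   threshold: float = 0.9) -> List[Tuple[str, List[Segment]]]:
    """Step 3: name each cluster after the closest reference voice, if any."""
    labelled = []
    for cluster in clusters:
        centre = centroid(cluster)
        name, sim = max(((n, cosine(centre, v)) for n, v in references.items()),
                        key=lambda pair: pair[1], default=("unknown", 0.0))
        labelled.append((name if sim >= threshold else "unknown", cluster))
    return labelled
```

In an archive setting the keys of `references` would be thesaurus entries, so that a labelled cluster links directly to the correct person record before the output is imported into the MAM system.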
31. Summary
Invest in innovation = essential
smart, connected and open
From R&D to production
the accelerator team
two-speed IT
partnerships: researchers and SMEs
learn more:
labs.beeldengeluid.nl
32. Credits
Johan Oomen
Netherlands Institute for Sound and Vision
@johanoomen
& @benglabs
Many thanks to colleagues and collaborators:
- Bouke Huurnink
- Roeland Ordelman
- Marijn Huijbregts
- Josefien Schuurman
- Harm-Jan Triemstra