Johan Oomen
Head of Research
Netherlands Institute for Sound and Vision
Building smart, connected and open archives:
implementing R&D output in production systems
New York - May 8, 2015
digitalassetsymposium.com
@johanoomen
My talk today
1. Envisioning the post-analogue archive
2. From R&D output to production systems
two-speed IT
3. Two case studies:
entity extraction
speaker identification
Film from 1898 onwards
Television from 1951
Advertising 1920
Cinema journals ‘22–’80
Radio from 1934
Dutch royal family collection
Dutch football league archive
National Music Archive
Objects related to media
Web video
Amateur film
Documentary film
Photographs
Websites
Visual art collections
…and much more.
a million hours
“We enable everyone to utilize the
collections to learn, experience and create.”
“Images for the Future” digitisation programme (2007-2014)
137,200 hours video: MXF, SD (HD for film)
17,510 hours film: DPX and MXF
123,900 hours audio: WAV
1,200,000 photos: TIFF
http://beeldenvoordetoekomst.nl/publicatie/
size Q2 2015:
~11 petabyte
Visuals https://vimeo.com/51425368 - Sebastiaan ter Burg CC-BY
+ backup
~11 petabyte
annual ingest:
8,000 hrs video
54,000 hrs radio
=
~1.5 petabyte
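The size and ingest figures on these two slides can be combined into a rough growth estimate. The sketch below is only an illustration: it assumes the ingest stays constant at about 1.5 petabyte per year and that the ~11 petabyte figure excludes the backup copy. Neither assumption is stated on the slides, and the function names are hypothetical.

```python
import math

START_PB = 11.0           # archive size in Q2 2015, from the slide
INGEST_PB_PER_YEAR = 1.5  # annual ingest, from the slide (assumed constant)


def projected_size_pb(years, start_pb=START_PB, ingest=INGEST_PB_PER_YEAR):
    """Linear projection of archive size after a number of years."""
    return start_pb + ingest * years


def years_until(target_pb, start_pb=START_PB, ingest=INGEST_PB_PER_YEAR):
    """Whole years needed before the archive reaches target_pb."""
    return max(0, math.ceil((target_pb - start_pb) / ingest))


print(projected_size_pb(2))  # size two years after Q2 2015
print(years_until(22.0))     # years until the archive has doubled
```

With a backup of the same size, the storage actually needed would be twice the projected figure.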
Smart, Open, Connected
Designing the post-analogue archive
Sound and Vision - Channels
general public
media studies scholars
Sound and Vision - Channels
general public through 3rd party platforms
open collections: access through syndication
www.openimages.eu
labs.beeldengeluid.nl
broadcasts
QC and enrichments
Search engine
creative
industries
general
public
academics
QC and enrichments
Machine analysis Data gathering
general
public
academics
creative
industries
broadcasts
Wikis
Twitter
Subtitles
Importer
News
Crowdsourcing
Data gathering
Low-level features
Multimedia
content analysis
today sniper fire disrupted the funeral of an
eleven year old ethnic albanian boy he was
killed yesterday while while chopping wood
his family blames serb police before his death
louisiana state police now say six workers
were killed after a natural gas well exploded
and caught fire about forty five miles east of
shreveport four others were injured in
yesterday's blast a police spokesman says the
derek started to melt in the intense heat the
Audio transcripts Concept detectors
Speaker identification Face recognition
Machine analysis
Technology transfer - the Accelerator Team
R&D A-team ICT
Daily production
Demonstrators
Technology transfer - the Accelerator Team
R&D A-team ICT
Products, not
projects
Incremental
development
Demo every two
weeks
Daily production
Demonstrators
Multi-annual
research agenda
Collaborative
projects
Day-to-day
maintenance
Contact with 3rd
parties
Technology transfer - the Accelerator Team
R&D A-team ICT
Daily production
Demonstrators
spin-off SMEs
Two-speed IT
1. Solid foundation (MAM system)
2. Agile development
3. Open source software from/through R&D
4. Collaboration with spin-offs (customization,
processing, support)
Case 1. Entity extraction
Extracting thesaurus keywords from subtitles
=> import into MAM system
Currently working with two services
x-TAS (University of Amsterdam)
TextRazor
Research partners, spin-off SME
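The idea on this slide, matching subtitle text against a controlled thesaurus so that the resulting terms can be imported into the MAM system, can be sketched roughly as follows. This is a minimal illustration using plain dictionary lookup. It does not call xTAS or TextRazor, and the thesaurus entries, identifiers and function name are all invented for the example.

```python
import re

# Hypothetical thesaurus: label -> identifier (IDs are invented for this sketch).
THESAURUS = {
    "david cameron": "term:001",
    "united kingdom": "term:002",
    "elections": "term:003",
}


def extract_terms(subtitle_text, thesaurus, max_words=3):
    """Return (term_id, label, count) for each thesaurus label found in the text."""
    tokens = re.findall(r"[\w-]+", subtitle_text.lower())
    counts = {}
    # Slide a window of 1..max_words tokens over the text and look each phrase up.
    for n in range(1, max_words + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in thesaurus:
                counts[phrase] = counts.get(phrase, 0) + 1
    # Most frequent terms first; ties are ordered alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(thesaurus[label], label, count) for label, count in ranked]


subtitles = (
    "David Cameron wins the elections. "
    "After the elections, the United Kingdom waits."
)
for term_id, label, count in extract_terms(subtitles, THESAURUS):
    print(term_id, label, count)
```

A production service would also need to handle spelling variants, ambiguity and ranking, which exact matching like this ignores.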
Case 1. Entity extraction
Victor de Boer, Roeland J.F. Ordelman and Josefien Schuurman: ‘Practice-oriented
Evaluation of Unsupervised Labeling of Audiovisual Content in an Archive Production
Environment.’ (to appear)
Case 1. Entity extraction
Term Extraction Service
Dashboard
http://gtaa.beeldengeluid.nl/
text: http://www.volkskrant.nl/dossier-britse-verkiezingen/
cameron-groot-brittannie-nog-groter-maken~a4008460/
Case 2. Speaker identification
Challenge:
Identify the different speakers and label who each one is.
Link to the correct thesaurus entry
=> import output into MAM system
Research partners, spin-off SME
Case 2. Speaker identification
Three-step process:
1. identify speech
2. cluster speakers
3. label clusters
…and a lot of tuning!
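The three steps can be illustrated with a toy pipeline. This is a sketch under strong assumptions, not the system described in the talk: each segment carries a made-up energy value and a two-dimensional "voice print", speech is detected with a simple energy threshold, speakers are grouped by greedy distance-based clustering, and clusters are labeled by the nearest known speaker profile. All names, thresholds and thesaurus IDs are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # start time in seconds
    energy: float  # toy loudness value
    voice: tuple   # toy 2-D voice print


def identify_speech(segments, min_energy=0.5):
    """Step 1: keep only segments loud enough to count as speech."""
    return [s for s in segments if s.energy >= min_energy]


def centroid(cluster):
    """Mean voice print of all segments in a cluster."""
    dims = len(cluster[0].voice)
    return tuple(sum(s.voice[d] for s in cluster) / len(cluster) for d in range(dims))


def cluster_speakers(segments, max_dist=1.0):
    """Step 2: greedily add each segment to the first close-enough cluster."""
    clusters = []
    for seg in segments:
        for cluster in clusters:
            if math.dist(seg.voice, centroid(cluster)) <= max_dist:
                cluster.append(seg)
                break
        else:
            clusters.append([seg])
    return clusters


def label_clusters(clusters, profiles, max_dist=1.0):
    """Step 3: suggest (name, thesaurus_id) per cluster, or None if no match."""
    labels = []
    for cluster in clusters:
        center = centroid(cluster)
        best = min(profiles, key=lambda p: math.dist(center, p["voice"]))
        close = math.dist(center, best["voice"]) <= max_dist
        labels.append((best["name"], best["thesaurus_id"]) if close else None)
    return labels


segments = [
    Segment(0, 0.9, (0.0, 0.1)),
    Segment(5, 0.1, (9.0, 9.0)),    # too quiet: treated as non-speech
    Segment(10, 0.8, (5.0, 5.2)),
    Segment(15, 0.7, (0.2, 0.0)),
    Segment(20, 0.9, (20.0, 20.0)),  # unknown speaker
]
profiles = [
    {"name": "Speaker A", "thesaurus_id": "person:001", "voice": (0.0, 0.0)},
    {"name": "Speaker B", "thesaurus_id": "person:002", "voice": (5.0, 5.0)},
]
clusters = cluster_speakers(identify_speech(segments))
print(label_clusters(clusters, profiles))
```

The thresholds `min_energy` and `max_dist` stand in for the tuning the slide mentions: small changes to either can merge or split speakers.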
Speaker labeling
Dashboard
Suggested speaker including name and thesaurus ID
Remove speaker if suggestion is incorrect
Summary
Invest in innovation = essential
smart, connected and open
From R&D to production
the accelerator team
two-speed IT
partnerships: researchers and SMEs
learn more:
labs.beeldengeluid.nl
Credits
Johan Oomen
Netherlands Institute for Sound and Vision
@johanoomen
& @benglabs
Many thanks to colleagues and collaborators:
- Bouke Huurnink
- Roeland Ordelman
- Marijn Huijbregts
- Josefien Schuurman
- Harm-Jan Triemstra

DAS 2015 - Implementing R&D output in production systems
