Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

New life for old media - Investigations into Speech Synthesis and Deep Learning-based Colorization for Audiovisual Archive

961 views

Published on

By Rudy Marsman, Victor de Boer, Themistoklis Karavellas, Johan Oomen. Presented at NEM Summit 2017, 29/30 November 2017 in Madrid. Paper can be found here: http://publications.beeldengeluid.nl/pub/570

Published in: Science
  • Be the first to comment

New life for old media - Investigations into Speech Synthesis and Deep Learning-based Colorization for Audiovisual Archive

  1. 1. New Life for Old Media Investigations into Speech Synthesis and Deep Learning-based Colorization for Audiovisual Archive Rudy Marsman, Victor de Boer, Themistoklis Karavellas, Johan Oomen
  2. 2. Netherlands Institute for Sound and Vision (NISV)
  3. 3. 70% audio-visual heritage material More than 1.000.000 hrs of TV (public broadcasters) Radio, Music,Documentaries, Film, Commercials, etc. Photographs, objects, …
  4. 4. CC BY - SA as preferable license 3000 items “Internet Quality” Polygoon newsreels Supporting a National and European Audiovisual Commons Public outreach by embracing new technologies and ‘participatory culture’ Openbeelden.nl / openimages.eu
  5. 5. Explore AI techniques to enrich this archival material to allow for new types of engagement 1. Text-To-Speech engine based on limited single narrator 2. Colorization of old black-and-white video footage
  6. 6. Philip Bloemendal Famous anchorman Iconic voice tiny.cc/voiceNL (not a virus)
  7. 7. Limited Domain Speech Synthesis Can the current corpus of audio recordings of Bloemendal be used to construct a TTS engine? • Percentage of the Dutch language can be generated with the current corpus? • What can we do to improve? • How well is the text-to-speech engine recognizable as Philip Bloemendal? • How understandable are the constructed audio files?
  8. 8. Text: Audio: The Dutch football played Germany the.wav dutch.wav football.wav Spoken Language Elements Repository (35,000 words) team Slot-and-filler Text-to-speech 3,300 newsreels, speech recognition
  9. 9. How to expand the coverage of the index? •Many (contemporary) words have not been pronounced by Philip Bloemendal •Multiple strategies –Change format (Lowercase, diaeresis) –Numbers –Finding synonyms –Decompounding
  10. 10. Finding Synonyms • Open Dutch Wordnet Dutch lexical semantic database (Postma et al. 2016) • Yields synsets (e.g. Hoofdmeester -> Rector, Schoolhoofd) • Computationally expensive lookup
  11. 11. Decompounding • Dutch language allows for compounding words, each word is distinct in the corpus • Decompounding is computationally expensive (for large corpora, long words) • Constructed Bigrams and Trigrams School, hoofd -> Schoolhoofd Regen, water -> regenwater Staat, hoofd -> StaatShoofd
  12. 12. 4 corpora to test against •News articles (same domain, different time) | 50 articles, 2743 unique words •1970s news articles from the (same domain, time) | 50 articles | 16,191 words •E-books (different domain, various times) |6 books | 2,657 words •Tweets (different domain, different time) | 1000 tweets| 27,180 words • Evaluation – Number of distinct words – Number of sentences Evaluation
  13. 13. Results (words)Coverage
  14. 14. • 8 people tested the software • Philip was recognized (or ‘that news guy’) • Words with more consonants were easier to recognize • When user input their own sentences, more recognition • When sentences were demonstrated without subtitles, less • Speed of software / GUI limited testing capabilities How recognisable are sentences?
  15. 15. The use of Deep Neural Networks in colorizing video
  16. 16. Neural Networks Recent progress in computational power made implementation of Deep Neural Nets possible Neural Networks trained on large training set can accurately make predictions in real-world examples
  17. 17. Zhang et al. (2012) trained a neural net on over a million images for colorization http://richzhang.github.io/colorization/ Existing Literature
  18. 18. • Extract individual frames from video using FFMPEG • Colorize each individual frame • Re-compile video and attach original audio file Outcome Extract 200x200 frames 24fps (ffmpeg) Zhang et al. implemented in TensorFlow Combine into videos (ffmpeg) Implementation on Video
  19. 19. • Colorized videos are more ‘tangible’ and ‘alive’ than black/white • Showing colorized Polygoonjournaals can augment TTS engine • General positive responses on technology may increase attention to NISV collection Outcome
  20. 20. Outcome
  21. 21. • Each frame is considered independent and is colorized as such --> Artifacts appear between frames • Slow performance without use of Nvidia GPU • Low resolution • Predicted colors still far from perfect Challenges
  22. 22. www.openbeelden.nl/tags/ingekleurd Hosted on Openbeelden platform One of the colorized videos received 61,000+ views, 1,700 likes and was shared 521 times, illustrating the potential to engage new audiences. tiny.cc/colorNL
  23. 23. • Collection-specific TTS systems for audio-enrichments of archive material or multimedia applications. • Colorization of old media allows for a new view on existing images • NISV will continue investigating these emerging technologies to enable new types of interaction and to further engage new audiences with archival material in unexpected ways. – In the media museum – On its public-facing online channels. Take home
  24. 24. New Life for old Media: Investigations into Speech Synthesis and Deep Learning-based Colorization for Audiovisual Archive Rudy Marsman, Victor de Boer, Themistoklis Karavellas, Johan Oomen Thank you
  25. 25. Annex: Results (sentences) Dataset Unique sentences Unique sentences found After synsets After decompounding Contemporary news 1022 106 110 186 Old news 2626 183 190 301 Tweets 8937 174 181 296 Books 56106 9387 11385 18271

×