AIEMpro 2010: CONTENTUS: Technologies for Next Generation Multimedia Libraries

AIEMpro 2010 keynote speech by
Andreas Heß, German National Library
Jan Nandzik, Acosta Consult


Transcript of "AIEMpro 2010: CONTENTUS: Technologies for Next Generation Multimedia Libraries"

  1. CONTENTUS: Technologies for Next Generation Multimedia Libraries
     AIEMPro’10, Firenze
     Andreas Heß, German National Library
     Jan Nandzik, Acosta Consult
  5. Motivation
     • National libraries contain millions of printed media.
     • World film production hit 5,039 feature films in 2007 and represented 60% of audio-visual revenues.
     • More than 30 million hours of audio-visual content are stored in European archives.
  6. Next Generation Multimedia Archives (ca. 2015)
     » Everything is digital
       » Mass processing, administration and provision of multimedia content is day-to-day business
       » All media are available in high quality
     » Always accessible
       » Access from anywhere, at any time
       » Resources are not added to knowledge networks; they are created within them
     » Always up-to-date
       » Desired information finds the user
     » Knowledge Journeys
       » An interactive exploration of cultural and scientific collections
       » Largely replace traditional access (e.g. search engines)
  7. Present Generation Multimedia Archives (ca. 2010)
     » Still large amounts of non-digitised material
       » Mass digitisation is still expensive if quality is important
       » Problem: media deterioration
     » Restricted access
       » Only from the reading room, even if digital (legal reasons...)
     » Not necessarily up-to-date
       » Indexing requires manual and intellectual effort
     » Search is the paradigm
       » Users must know what they are looking for
       » Different search engines for different collections
       » Media discontinuity
       » Situation in present libraries/archives: search engines are often slow and outdated
  8. Problems on the Way
     • Deterioration
     • Digitisation
     • Metadata
     • Accessibility
  9. Deterioration
     • Improper storage and handling
     • Magnetic tape: drop-outs, wow and flutter, ...
     • Film: dirt, scratches, blotches, ...
     • Paper: bleaching, acid, mice, ...
     • Optical discs: coating decay, ...
  10. Digitisation
      • Often lack of quality awareness
      • Quality is crucial for preserving cultural heritage
  11. Digitisation - Problems
      • Causes for quality issues:
        ‣ Unsuitable hardware
        ‣ Unsuitable configuration
        ‣ Errors during digitisation
      • Goals:
        ‣ Automation and efficiency
        ‣ Continuous checks while the job is being processed
  12. Metadata - Problems
      • Not always present
      • Indexing and annotation are time-consuming and error-prone
      • Incompatibility of different sources of metadata
  13. Accessibility - Problems
      • Current search approaches are not really suitable for multimedia content
      • Search and consumption are separated
      • Physical presence of media
      • Data is nothing without metadata!
  17. Our Project
      • THESEUS
        ‣ Research initiative in the area of Internet-based technologies, focus: semantic technologies
        ‣ Funded by the German Federal Ministry of Economics and Technology
        ‣ Consortium of approx. 60 partners from academia and industry
        ‣ "Application scenario" and "basic technology" subprojects
      • CONTENTUS
        ‣ Application scenario subproject of THESEUS
        ‣ Concepts and technologies for multimedia libraries and archives
  18. The CONTENTUS Processing Chain
      [Diagram labels: Media-specific, Media-independent]
  19. Outline
      • Media-specific processing
        ‣ Print
        ‣ Video
        ‣ Audio
      • Media-independent processing
        ‣ Entity recognition and disambiguation
        ‣ Semantic Linking
        ‣ Semantic Multi-Media Search
  20. Print - Quality Assessment
  21. Content-aware optimisation
      [Image: Original]
  22. Content-aware optimisation
      [Image panels: Original; Otsu binarisation; Sauvola binarisation; CONTENTUS approach + content-specific optimisation]
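The slide above names Otsu binarisation as one of the baselines that the CONTENTUS approach is compared against. As a rough illustration of what that baseline does, here is a minimal pure-Python sketch of global Otsu thresholding. It is not the CONTENTUS method, which the slides do not describe. The function names `otsu_threshold` and `binarise` and the sample pixel values are assumptions made for this example.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximises between-class variance.

    Pixels with value <= t form the dark class (e.g. ink),
    pixels with value > t form the bright class (e.g. paper).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of grey levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break     # no bright pixels left
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarise(pixels, threshold):
    """Map every pixel to black (0) or white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]


# Hypothetical 8-pixel "scan": four dark ink pixels, four bright paper pixels.
sample = [20, 25, 30, 22, 210, 220, 215, 230]
t = otsu_threshold(sample)
print(t, binarise(sample, t))
```

A single global threshold like this tends to fail on unevenly lit or stained pages. Locally adaptive methods such as Sauvola, the second baseline on the slide, address that by computing a threshold per neighbourhood.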
  23. De-Warping
  24. Removal of Unwanted Objects
  25. Page Segmentation
      • Automatic identification of:
        ‣ Articles
        ‣ Headings
        ‣ Tables of contents
        ‣ Figures / pictures and captions
        ‣ Bullet points
  26. Video - Quality Assessment
      • Quality survey essential to determine value of content
      • Use perceptual no-reference quality metrics
      • Check for specific image artifacts
      • Based on human visual models
      • Use specific restoration modules
      Restoration
      • Current solutions require at least 4 hours of work per hour of video
      • Automation necessary
  27. Video - Restoration
  28. Video - Restoration
      • Scratches are automatically identified
      • Scratches are automatically removed
  29. Video - Segmentation
      [Figure: timeline of the news show "Tagesthemen" (22:35:29), split into segments labelled Report, Interview, Sum., Inter. and Speaker]
  30. Video - Annotation
      • Face detection and annotation
      • Prerequisite for further indexing
      • Text and logo detection
      [Figure labels: tagesthemen, Face detection, Logo detection, Text detection, OCR, Ulrich Wickert, 1st version]
  31. Audio
      • Segmentation speech / non-speech
      • Extraction of musical features
      • Speech transcription
      • Speaker recognition
      • Similarity search
  32. Media-Independent Processing
  33. Named Entity Disambiguation
      • Named entities are extracted from text
      • Context is used for disambiguation
      • Compare to reference text, e.g. from Wikipedia
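The idea of comparing a mention's context with reference texts can be sketched as follows. This is a minimal illustration using bag-of-words cosine similarity. The slides do not say which similarity measure CONTENTUS uses, so the measure, the function names and the two short reference snippets (stand-ins for Wikipedia articles) are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(context, candidates):
    """Pick the candidate whose reference text is most similar to the context."""
    ctx = bag_of_words(context)
    return max(candidates, key=lambda name: cosine(ctx, bag_of_words(candidates[name])))


# Hypothetical reference snippets standing in for encyclopedia entries.
CANDIDATES = {
    "Jaguar (animal)": "large cat species living in forests of South America",
    "Jaguar (car)": "British manufacturer of luxury cars and sports cars",
}

context = "Jaguar presented new sports cars at a motor show"
print(disambiguate(context, CANDIDATES))
```

With real reference texts, stop-word removal or TF-IDF weighting would usually be needed so that frequent function words do not dominate the score.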
  34. Semantic Linking
      [Diagram labels: Authority Files, Wikipedia, MusicBrainz]
  35. Semantic Multimedia Search
      provides
      • Seamless searching in multimedia
      • Query expansion / narrowing
      combines
      • Full-text search and Semantic Web Stack (RDF based ontology)
      integrates
      • Multiple media (Video, Audio, Text)
      • Clickable filter facets
      • Text classification
      • Similarity search
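How full-text search and clickable filter facets can work together over one index of mixed media is sketched below. This is a toy in-memory illustration only. The record fields, the three sample records and the function names are hypothetical and do not describe the actual CONTENTUS search engine or its RDF layer.

```python
from collections import Counter

# Hypothetical index entries; titles and texts are invented for illustration.
DOCS = [
    {"title": "Press clipping", "media": "Text",
     "text": "Hanns Eisler, composer of the GDR national anthem"},
    {"title": "Anthem recording", "media": "Audio",
     "text": "Recording of the national anthem composed by Hanns Eisler"},
    {"title": "News broadcast", "media": "Video",
     "text": "News report on a concert"},
]


def search(docs, query, filters=None):
    """Return docs containing every query term and matching every facet filter."""
    terms = query.lower().split()
    filters = filters or {}
    hits = []
    for doc in docs:
        text = doc["text"].lower()
        matches_text = all(term in text for term in terms)
        matches_facets = all(doc.get(k) == v for k, v in filters.items())
        if matches_text and matches_facets:
            hits.append(doc)
    return hits


def facet_counts(hits, field):
    """Count hits per facet value, e.g. to label clickable filters."""
    return Counter(doc[field] for doc in hits)


hits = search(DOCS, "eisler")
print(facet_counts(hits, "media"))               # counts shown next to each facet
print(search(DOCS, "eisler", {"media": "Audio"}))  # result after clicking "Audio"
```

Clicking a facet narrows the result set by adding a filter to the same query, so text, audio and video items can be explored through one search box.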
  36. The CONTENTUS Collection
      • Digitisation of the Music Information Center of the GDR
      • Now a special collection of the German Music Archive
        ‣ 1,600 books
        ‣ 200,000 press clippings
        ‣ 4,000 audio records
        ‣ 10,000 photos
      • Other media
        ‣ Newspapers (Neues Deutschland)
        ‣ News broadcasts
        ‣ Historical film material
  37. Demonstration
  38. SMMS Demo 1 - Who was Hanns Eisler?
      • Hanns Eisler, GDR composer
      • Composed the music of the GDR national anthem
      SMMS features:
      - Multimedia
      - One unified index
      - Faceted search
      - Roles of persons
      - Timeline
  39. Summary
      • Challenges and obstacles on the way to the Next Generation Multimedia Library
      • Our project
        ‣ High degree of automation during complete processing chain
        ‣ Semantic multimedia search engine
  40. Conclusion
      • The next generation library is not here yet
      • We're on the way...
      • We need your help!
  41. Thank you for your attention
      • Visit THESEUS: http://www.theseus-programm.de/
      • Mail us: a.hess@d-nb.de, jn@acosta-consult.de
