SlideShare a Scribd company logo
1 of 43
CONTENTUS
            Technologies for
   Next Generation Multimedia Libraries


AIEMPro’10, Firenze
Andreas Heß, German National Library
Jan Nandzik, Acosta Consult
Motivation




2
Motivation


             More than 30 millions
             hours of audio-visual
             content are stored in
             European archives




2
Motivation
             National Libraries contain
             millions of printed media


               More than 30 millions
               hours of audio-visual
               content are stored in
               European archives




2
Motivation
                                 National Libraries contain
                                 millions of printed media

World film production hit 5,039     More than 30 millions
feature films in 2007 and           hours of audio-visual
represented 60% of the audio-      content are stored in
visual revenues.                   European archives




      2
Next Generation Multimedia Archives (ca. 2015)

   Überschrift
» Everything is digital
   » Mass processing, administration and provision of
     multimedia content is day-to-day business
   • Bullet Points
   » All media are available in high quality

» Always accessible
   » Access from anywhere, at any time
   » Resources are not added to knowledge networks –
     they are created within them

» Always up-to-date
   » Desired information finds the user

» Knowledge Journeys
   » An interactive exploration of cultural and scientific collections
   » Largely replace traditional access (e.g. search engines)



     3
Present Generation Multimedia Archives (ca. 2010)

» Still large amounts of non-digitised material
   » Mass digitisation is still expensive, if quality is important
   » Problem: Media deterioration

» Restriced Access
   » Only from reading room, even if digital (legal reasons...)

» Not necessarily up-to-date
   » Indexing requires manual and intellectual effort

» Search is the paradigm
   » User must know what he/she is looking for
   » Different search engines for different collections
   » Media discontinuity
   » Situation in present libraries/archives:
     search engines often slow and outdated



     4
Problems on the Way
•       Deterioration
•       Digitisation
•       Metadata
•       Accessibility




    7
Deterioration
•       Improper storage and handling
•       Magnetic tape: drop-outs, wow and flutter, ...
•       Film: dirt, scratches, blotches, ...
•       Paper: bleaching, acid, mice, ...
•       Optical discs: coating decay, ...




    8
Digitisation
 •       Often lack of
         quality awareness

 •       Quality is crucial for
         preserving cultural
         heritage




     9
Digitisation - Problems
•    Causes for quality issues:
     ‣ Unsuitable hardware
     ‣ Unsuitable configuration
     ‣ Errors during digitisation

•    Goals:
     ‣ Automatisation and efficiency
     ‣ Continuous checks while job is
       being processed




    10
Metadata - Problems
•    Not always present
•    Indexing and annotation are time-consuming and error-prone
•    Incompatibility of different sources of metadata




    11
Accessibility - Problems
•    Current search approaches not really suitable for
     multi-media content
•    Search and consumption is separated
•    Physical presence of media
•    Data is nothing without meta-data!




    12
13
13
13
Our Project
•   THESEUS
    ‣ Research intiative in the area of Internet-based technologies,
      focus: semantic technologies
    ‣ funded by German Federal Ministry of Economics and Technology
    ‣ consortium of approx. 60 partners from academia and industry
    ‣ „application scenario“- and „basic technology“-subprojects

•   CONTENTUS
    ‣ application scenario-subproject of THESEUS
    ‣ concepts and technologies for multimedia-libraries and archives
The CONTENTUS Processing Chain




     Media-specific   Media-independent
Outline
•    Media-specific processing
     ‣ Print
     ‣ Video
     ‣ Audio

•    Media-independent processing
     ‣ Entity recognition and disambiguation
     ‣ Semantic Linking
     ‣ Semantic Multi-Media Search




    16
Print - Quality Assessment




17
Content-aware optimisation




     Original




18
Content-aware optimisation




                Otsu           Sauvola        CONTENTUS Approach +
     Original
                Binarisation   Binarisation   Content-specific optimisation




18
De-Warping




19
Removal of Unwanted Objects




20
Page Segmentation
•    Automatic identification of:
     ‣ Articles
     ‣ Headings
     ‣ Tables of content
     ‣ Figures / pictures and captions
     ‣ Bullet Points




    21
Video - Quality Assessment
•    Quality survey essential to determine value of content
•    Use perceptual no-reference quality metrics
•    Check for specific image artifacts
•    Based on human visual models
•    Use specific restoration modules


Restoration
•    Current solutions require at least 4 hours of work per hour of video
•    Automation necessary




    22
Video - Restoration




23
Video - Restoration
          Scratches are automatically identified




                                  Scratches are automatically removed




23
Video - Segmentation

                     News Show „Tagesthemen“ – 22:35:29




                                Report 1                                         Report 2


           Report   Interview              Report         Interview   Sum.             Inter.




 Speaker                                                                     Speaker




24
Video - Annotation
                                                                    tagesthemen
•    Face detection and annotation                 Face
                                                 detection
                                                                             Logo
•    Prerequisite for further indexing         1st version                 detection


•    Text and logo detection




                                         Text detection


                                                             OCR   Ulrich Wickert




    25
Audio
•    Segmentation speech / non-speech
•    Extraction of musical features
•    Speech transcription
•    Speaker recognition
•    Similarity search




    26
Media-Independent Processing




27
Named Entity Disambiguation
•    Named entities are
     extracted from text
•    Context is used for
     disambiguation
•    Compare to reference
     text, e.g. from Wikipedia




    28
Semantic Linking

                   Authority Files




                     Wikipedia




                    MusicBrainz




29
Semantic Multimedia Search
provides
• Seamless searching in multimedia
• Query expansion / narrowing

combines
• Full-text search and Semantic
   Web Stack (RDF based ontology)

integrates
• Multiple media (Video, Audio,
    Text)
• Clickable filter facets
• Text classification
• Similarity search




 30
The CONTENTUS-Collection
•    Digitisation of Music Information Center of the GDR
•    Now a special collection of the German Music Archive
     ‣ 1600 books
     ‣ 200.000 press clippings
     ‣ 4000 audio records
     ‣ 10.000 photos

•    Other media
     ‣ Newspapers (Neues Deutschland)
     ‣ News broadcasts
     ‣ Historical film material




    31
Demonstration




32
SMMS Demo 1 – Who was Hanns Eisler?
• Hanns Eisler, GDR-Composer
• Composed the music of the
  GDR national anthem


SMMS-Features:
  – Multimedia
  – One unified index
  – Facetted search
                               ?
  – Roles of persons
  – Timeline




 33
Summary
•    Challenges and obstacles on the way to the
     Next Generation Multimedia Library
•    Our project
     ‣ High degree of automation during complete
       processing chain
     ‣ Semantic multimedia search engine




    34
Conclusion
•    The next generation
     library is not here yet
•    We‘re on the way...
•    We need your! help




    35
Thank you for your attention
•    Visit THESEUS:
     http://www.theseus-programm.de/

•    Mail us:
     a.hess@d-nb.de
     jn@acosta-consult.de




    36

More Related Content

Similar to AIEMpro 2010: CONTENTUS: Technologies for Next Generation Multimedia Libraries

Digital projects best practices [xxxiii reunión nacional de archivos 201111]
Digital projects best practices [xxxiii reunión nacional de archivos 201111]Digital projects best practices [xxxiii reunión nacional de archivos 201111]
Digital projects best practices [xxxiii reunión nacional de archivos 201111]Frederick Zarndt
 
Of Communities and Practices: Digital Preservation Innovation & Research
Of Communities  and Practices: Digital Preservation Innovation & ResearchOf Communities  and Practices: Digital Preservation Innovation & Research
Of Communities and Practices: Digital Preservation Innovation & ResearchErwin Verbruggen
 
Using semantics to improve interactive information access
Using semantics to improve interactive information accessUsing semantics to improve interactive information access
Using semantics to improve interactive information accessLynda Hardman
 
Metadata for audiovisual heritage: Semiotic considerations
Metadata for audiovisual heritage: Semiotic considerationsMetadata for audiovisual heritage: Semiotic considerations
Metadata for audiovisual heritage: Semiotic considerationsIndrek Ibrus
 
ICIC 2013 Conference Proceedings Uwe Rosemann TIB
ICIC 2013 Conference Proceedings Uwe Rosemann TIBICIC 2013 Conference Proceedings Uwe Rosemann TIB
ICIC 2013 Conference Proceedings Uwe Rosemann TIBDr. Haxel Consult
 
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspective
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspectiveGIS Day 2015: Geoinformatics, Open Source and Videos - a library perspective
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspectivePeter Löwe
 
Quality and quantity: opening up the archives
Quality and quantity: opening up the archivesQuality and quantity: opening up the archives
Quality and quantity: opening up the archivesEUscreen
 
Presentation on the Warsaw Conference on National Bibliographies August 2012
Presentation on the Warsaw Conference on National Bibliographies August 2012Presentation on the Warsaw Conference on National Bibliographies August 2012
Presentation on the Warsaw Conference on National Bibliographies August 2012nw13
 
IBC 2010 Redefining Search
IBC 2010 Redefining SearchIBC 2010 Redefining Search
IBC 2010 Redefining Searchvrt-medialab
 
Facsimiles of Text and Music from Distributed Resources
Facsimiles of Text and Music from Distributed ResourcesFacsimiles of Text and Music from Distributed Resources
Facsimiles of Text and Music from Distributed Resourcesblalbritton
 
APIS. Digitale biographische Blütenlese
APIS. Digitale biographische BlütenleseAPIS. Digitale biographische Blütenlese
APIS. Digitale biographische Blütenleseeveline wandl-vogt
 
Towards a New Audiovisual Think Tank for Audiovisual Archivists & Cultural He...
Towards a New Audiovisual Think Tank for Audiovisual Archivists & Cultural He...Towards a New Audiovisual Think Tank for Audiovisual Archivists & Cultural He...
Towards a New Audiovisual Think Tank for Audiovisual Archivists & Cultural He...Sound and Vision R&D
 
Unrulyversity 27 feb2013_stefanhaefliger
Unrulyversity 27 feb2013_stefanhaefligerUnrulyversity 27 feb2013_stefanhaefliger
Unrulyversity 27 feb2013_stefanhaefligerStefan Haefliger
 
Ad Stijnman (ICN), Databases for art technological source research: some do's...
Ad Stijnman (ICN), Databases for art technological source research: some do's...Ad Stijnman (ICN), Databases for art technological source research: some do's...
Ad Stijnman (ICN), Databases for art technological source research: some do's...Cultural Heritage Agency of the Netherlands
 
Next 10 years slides v1
Next 10 years slides v1Next 10 years slides v1
Next 10 years slides v1FIAT/IFTA
 
2012.03.20 ihr farquhar v03
2012.03.20 ihr   farquhar v032012.03.20 ihr   farquhar v03
2012.03.20 ihr farquhar v03Digital History
 
Acquisition of audiovisual Scientific Technical Information from OSGeo: A wor...
Acquisition of audiovisual Scientific Technical Information from OSGeo: A wor...Acquisition of audiovisual Scientific Technical Information from OSGeo: A wor...
Acquisition of audiovisual Scientific Technical Information from OSGeo: A wor...Peter Löwe
 
Principles of New Media [Fall 2012 RTF 319 Session 02]
Principles of New Media [Fall 2012 RTF 319 Session 02]Principles of New Media [Fall 2012 RTF 319 Session 02]
Principles of New Media [Fall 2012 RTF 319 Session 02]William J. Moner
 

Similar to AIEMpro 2010: CONTENTUS: Technologies for Next Generation Multimedia Libraries (20)

Digital projects best practices [xxxiii reunión nacional de archivos 201111]
Digital projects best practices [xxxiii reunión nacional de archivos 201111]Digital projects best practices [xxxiii reunión nacional de archivos 201111]
Digital projects best practices [xxxiii reunión nacional de archivos 201111]
 
Of Communities and Practices: Digital Preservation Innovation & Research
Of Communities  and Practices: Digital Preservation Innovation & ResearchOf Communities  and Practices: Digital Preservation Innovation & Research
Of Communities and Practices: Digital Preservation Innovation & Research
 
Engage Your Community to Celebrate Your History
Engage Your Community to Celebrate Your HistoryEngage Your Community to Celebrate Your History
Engage Your Community to Celebrate Your History
 
Using semantics to improve interactive information access
Using semantics to improve interactive information accessUsing semantics to improve interactive information access
Using semantics to improve interactive information access
 
Metadata for audiovisual heritage: Semiotic considerations
Metadata for audiovisual heritage: Semiotic considerationsMetadata for audiovisual heritage: Semiotic considerations
Metadata for audiovisual heritage: Semiotic considerations
 
ICIC 2013 Conference Proceedings Uwe Rosemann TIB
ICIC 2013 Conference Proceedings Uwe Rosemann TIBICIC 2013 Conference Proceedings Uwe Rosemann TIB
ICIC 2013 Conference Proceedings Uwe Rosemann TIB
 
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspective
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspectiveGIS Day 2015: Geoinformatics, Open Source and Videos - a library perspective
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspective
 
Quality and quantity: opening up the archives
Quality and quantity: opening up the archivesQuality and quantity: opening up the archives
Quality and quantity: opening up the archives
 
Presentation on the Warsaw Conference on National Bibliographies August 2012
Presentation on the Warsaw Conference on National Bibliographies August 2012Presentation on the Warsaw Conference on National Bibliographies August 2012
Presentation on the Warsaw Conference on National Bibliographies August 2012
 
IBC 2010 Redefining Search
IBC 2010 Redefining SearchIBC 2010 Redefining Search
IBC 2010 Redefining Search
 
Text Mining : Experience
Text Mining : ExperienceText Mining : Experience
Text Mining : Experience
 
Facsimiles of Text and Music from Distributed Resources
Facsimiles of Text and Music from Distributed ResourcesFacsimiles of Text and Music from Distributed Resources
Facsimiles of Text and Music from Distributed Resources
 
APIS. Digitale biographische Blütenlese
APIS. Digitale biographische BlütenleseAPIS. Digitale biographische Blütenlese
APIS. Digitale biographische Blütenlese
 
Towards a New Audiovisual Think Tank for Audiovisual Archivists & Cultural He...
Towards a New Audiovisual Think Tank for Audiovisual Archivists & Cultural He...Towards a New Audiovisual Think Tank for Audiovisual Archivists & Cultural He...
Towards a New Audiovisual Think Tank for Audiovisual Archivists & Cultural He...
 
Unrulyversity 27 feb2013_stefanhaefliger
Unrulyversity 27 feb2013_stefanhaefligerUnrulyversity 27 feb2013_stefanhaefliger
Unrulyversity 27 feb2013_stefanhaefliger
 
Ad Stijnman (ICN), Databases for art technological source research: some do's...
Ad Stijnman (ICN), Databases for art technological source research: some do's...Ad Stijnman (ICN), Databases for art technological source research: some do's...
Ad Stijnman (ICN), Databases for art technological source research: some do's...
 
Next 10 years slides v1
Next 10 years slides v1Next 10 years slides v1
Next 10 years slides v1
 
2012.03.20 ihr farquhar v03
2012.03.20 ihr   farquhar v032012.03.20 ihr   farquhar v03
2012.03.20 ihr farquhar v03
 
Acquisition of audiovisual Scientific Technical Information from OSGeo: A wor...
Acquisition of audiovisual Scientific Technical Information from OSGeo: A wor...Acquisition of audiovisual Scientific Technical Information from OSGeo: A wor...
Acquisition of audiovisual Scientific Technical Information from OSGeo: A wor...
 
Principles of New Media [Fall 2012 RTF 319 Session 02]
Principles of New Media [Fall 2012 RTF 319 Session 02]Principles of New Media [Fall 2012 RTF 319 Session 02]
Principles of New Media [Fall 2012 RTF 319 Session 02]
 

Recently uploaded

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Recently uploaded (20)

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

AIEMpro 2010: CONTENTUS: Technologies for Next Generation Multimedia Libraries

  • 1. CONTENTUS Technologies for Next Generation Multimedia Libraries AIEMPro’10, Firenze Andreas Heß, German National Library Jan Nandzik, Acosta Consult
  • 3. Motivation More than 30 millions hours of audio-visual content are stored in European archives 2
  • 4. Motivation National Libraries contain millions of printed media More than 30 millions hours of audio-visual content are stored in European archives 2
  • 5. Motivation National Libraries contain millions of printed media World film production hit 5,039 More than 30 millions feature films in 2007 and hours of audio-visual represented 60% of the audio- content are stored in visual revenues. European archives 2
  • 6. Next Generation Multimedia Archives (ca. 2015) Überschrift » Everything is digital » Mass processing, administration and provision of multimedia content is day-to-day business • Bullet Points » All media are available in high quality » Always accessible » Access from anywhere, at any time » Resources are not added to knowledge networks – they are created within them » Always up-to-date » Desired information finds the user » Knowledge Journeys » An interactive exploration of cultural and scientific collections » Largely replace traditional access (e.g. search engines) 3
  • 7. Present Generation Multimedia Archives (ca. 2010) » Still large amounts of non-digitised material » Mass digitisation is still expensive, if quality is important » Problem: Media deterioration » Restriced Access » Only from reading room, even if digital (legal reasons...) » Not necessarily up-to-date » Indexing requires manual and intellectual effort » Search is the paradigm » User must know what he/she is looking for » Different search engines for different collections » Media discontinuity » Situation in present libraries/archives: search engines often slow and outdated 4
  • 8.
  • 9.
  • 10. Problems on the Way • Deterioration • Digitisation • Metadata • Accessibility 7
  • 11. Deterioration • Improper storage and handling • Magnetic tape: drop-outs, wow and flutter, ... • Film: dirt, scratches, blotches, ... • Paper: bleaching, acid, mice, ... • Optical discs: coating decay, ... 8
  • 12. Digitisation • Often lack of quality awareness • Quality is crucial for preserving cultural heritage 9
  • 13. Digitisation - Problems • Causes for quality issues: ‣ Unsuitable hardware ‣ Unsuitable configuration ‣ Errors during digitisation • Goals: ‣ Automatisation and efficiency ‣ Continuous checks while job is being processed 10
  • 14. Metadata - Problems • Not always present • Indexing and annotation are time-consuming and error-prone • Incompatibility of different sources of metadata 11
  • 15. Accessibility - Problems • Current search approaches not really suitable for multi-media content • Search and consumption is separated • Physical presence of media • Data is nothing without meta-data! 12
  • 16. 13
  • 17. 13
  • 18. 13
  • 19. Our Project • THESEUS ‣ Research intiative in the area of Internet-based technologies, focus: semantic technologies ‣ funded by German Federal Ministry of Economics and Technology ‣ consortium of approx. 60 partners from academia and industry ‣ „application scenario“- and „basic technology“-subprojects • CONTENTUS ‣ application scenario-subproject of THESEUS ‣ concepts and technologies for multimedia-libraries and archives
  • 20. The CONTENTUS Processing Chain Media-specific Media-independent
  • 21. Outline • Media-specific processing ‣ Print ‣ Video ‣ Audio • Media-independent processing ‣ Entity recognition and disambiguation ‣ Semantic Linking ‣ Semantic Multi-Media Search 16
  • 22. Print - Quality Assessment 17
  • 24. Content-aware optimisation Otsu Sauvola CONTENTUS Approach + Original Binarisation Binarisation Content-specific optimisation 18
  • 26. Removal of Unwanted Objects 20
  • 27. Page Segmentation • Automatic identification of: ‣ Articles ‣ Headings ‣ Tables of content ‣ Figures / pictures and captions ‣ Bullet Points 21
  • 28. Video - Quality Assessment • Quality survey essential to determine value of content • Use perceptual no-reference quality metrics • Check for specific image artifacts • Based on human visual models • Use specific restoration modules Restoration • Current solutions require at least 4 hours of work per hour of video • Automation necessary 22
  • 30. Video - Restoration Scratches are automatically identified Scratches are automatically removed 23
  • 31. Video - Segmentation News Show „Tagesthemen“ – 22:35:29 Report 1 Report 2 Report Interview Report Interview Sum. Inter. Speaker Speaker 24
  • 32. Video - Annotation tagesthemen • Face detection and annotation Face detection Logo • Prerequisite for further indexing 1st version detection • Text and logo detection Text detection OCR Ulrich Wickert 25
  • 33. Audio • Segmentation speech / non-speech • Extraction of musical features • Speech transcription • Speaker recognition • Similarity search 26
  • 35. Named Entity Disambiguation • Named entities are extracted from text • Context is used for disambiguation • Compare to reference text, e.g. from Wikipedia 28
  • 36. Semantic Linking Authority Files Wikipedia MusicBrainz 29
  • 37. Semantic Multimedia Search provides • Seamless searching in multimedia • Query expansion / narrowing combines • Full-text search and Semantic Web Stack (RDF based ontology) integrates • Multiple media (Video, Audio, Text) • Clickable filter facets • Text classification • Similarity search 30
  • 38. The CONTENTUS-Collection • Digitisation of Music Information Center of the GDR • Now a special collection of the German Music Archive ‣ 1600 books ‣ 200.000 press clippings ‣ 4000 audio records ‣ 10.000 photos • Other media ‣ Newspapers (Neues Deutschland) ‣ News broadcasts ‣ Historical film material 31
  • 40. SMMS Demo 1 – Who was Hanns Eisler? • Hanns Eisler, GDR-Composer • Composed the music of the GDR national anthem SMMS-Features: – Multimedia – One unified index – Facetted search ? – Roles of persons – Timeline 33
  • 41. Summary • Challenges and obstacles on the way to the Next Generation Multimedia Library • Our project ‣ High degree of automation during complete processing chain ‣ Semantic multimedia search engine 34
  • 42. Conclusion • The next generation library is not here yet • We‘re on the way... • We need your! help 35
  • 43. Thank you for your attention • Visit THESEUS: http://www.theseus-programm.de/ • Mail us: a.hess@d-nb.de jn@acosta-consult.de 36