€ 173 million over 7 years (term 2007–2014), with an earn-back obligation; € 19 million during the term
the context database also includes other information sources
The amount of footage for each festival year varies from only a summary to almost unabridged concert recordings, even including raw, unpublished footage
In contrast to domains like news video, where the number of visual concepts is unrestricted, the number of concepts that may appear in a concert is more or less fixed. A band plays on stage for an audience. Thus, major concepts are related to the roles of the band members, e.g., lead singer or guitarist, and the types of instruments they play, e.g., drums or keyboard. Although many instruments exist, most bands typically use guitars, drums, and keyboards. We chose 12 concert concepts based on frequency, visual detection feasibility, previous mention in the literature [3, 10], and expected utility for concert video users. For each concept we annotated several hundred examples using the annotation tool depicted in Figure 3. The 12 concert concepts are depicted in Figure 4.
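The text does not specify which detector is trained on these annotated examples. As a minimal illustration only, a nearest-centroid classifier over precomputed frame features could score frames for one concert concept; the `ConceptDetector` name, the feature representation, and the scoring formula are all assumptions, not the system's actual method:

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class ConceptDetector:
    """Toy detector for one concert concept (e.g. 'lead singer').

    Fitted on annotated positive/negative frame features; scores a new
    frame by its relative distance to the two class centroids.
    """
    def fit(self, positives, negatives):
        self.pos = centroid(positives)
        self.neg = centroid(negatives)
        return self

    def score(self, frame):
        dp = euclidean(frame, self.pos)
        dn = euclidean(frame, self.neg)
        # close to 1.0 near the positive centroid, close to 0.0 near the negative one
        return dn / (dp + dn + 1e-9)
```

In practice one detector per concept would be trained on the several hundred annotated examples mentioned above, with a far stronger visual feature and classifier than this sketch.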
Automatic speech recognition (ASR) technology was used to attach browsing functionality to the interview fragments in the collection. Speech transcripts were generated using the SHoUT ASR toolkit and post-processed to generate a filtered term frequency list that is most likely to represent the contents of the interviews, based on tf.idf statistics. This list was then used to create a time-synchronized term cloud. Each word in the cloud is clickable to enable users to jump to the part of the interview where a word is mentioned.
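The tf.idf filtering and time synchronization can be sketched as follows; the `term_cloud` function, the `(time, word)` transcript layout, and the background document frequencies are illustrative assumptions, not SHoUT's actual output format:

```python
import math
from collections import Counter, defaultdict

def term_cloud(transcript, doc_freq, n_docs, top_k=10):
    """Pick the top_k tf.idf terms from one interview transcript and map
    each term to the timestamps where it is spoken.

    transcript: list of (seconds, word) pairs from the ASR output
    doc_freq:   background document frequency per word
    n_docs:     size of the background corpus
    """
    tf = Counter(word for _, word in transcript)
    times = defaultdict(list)
    for t, word in transcript:
        times[word].append(t)
    # tf.idf: frequent-in-this-interview but rare-in-general words rank highest
    scored = {
        w: count * math.log(n_docs / (1 + doc_freq.get(w, 0)))
        for w, count in tf.items()
    }
    top = sorted(scored, key=scored.get, reverse=True)[:top_k]
    # each cloud term keeps its mention times, so a click can jump there
    return [(w, times[w]) for w in top]
```

Common function words score low because their background document frequency is high, so the surviving terms are the ones most characteristic of the interview.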
The main mode of user interaction with our video search engine is a timeline-based video player, see Figure 2. The player enables users to watch and navigate through a single concert video. Little colored dots on the timeline mark the location of an interesting fragment corresponding to an automatically derived label. To inspect the label and the duration of the fragment, users simply move their mouse cursor over the colored dot. By clicking the dot, the player instantly starts playback at that specific moment in the video. If needed, the user can manually select more concept labels in the panel on the left of the video player. If the timeline becomes too crowded as a result of multiple labels, the user may decide to zoom in on the timeline. Besides providing feedback on the automatically detected labels, we also allow our users to comment on the individual fragments, share a fragment through e-mail or Twitter, and embed the integrated video player, including the crowdsourcing mechanism, on other websites.
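The fragments behind the timeline dots can be sketched as a small index that keeps only the top-n fragments per selected concept label; the `timeline_dots` helper and the record fields are hypothetical, assuming each detected fragment carries a label, a time span, and a detector score:

```python
from collections import defaultdict

def timeline_dots(fragments, selected_labels, top_n=5):
    """Pick the fragments shown as colored dots on the timeline.

    fragments: list of dicts {label, start, end, score} produced by the
               automatic detectors (illustrative record layout)
    selected_labels: concept labels the user enabled in the left panel
    Returns, per label, the top_n fragments by detector score.
    """
    by_label = defaultdict(list)
    for f in fragments:
        if f["label"] in selected_labels:
            by_label[f["label"]].append(f)
    return {
        label: sorted(frags, key=lambda f: f["score"], reverse=True)[:top_n]
        for label, frags in by_label.items()
    }
```

Capping the dots at top-n per label is one way to keep the timeline readable before the user has to zoom.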
To balance an appealing user experience with maximized user participation, we motivate online users to participate by providing them with access to a selection of exclusive, full-length concert videos. The users watch the videos without interruption and are encouraged to provide their feedback via graphical overlays that appear on top of the video, see Figure 1. The threshold to participate is deliberately kept low. Users do not need to sign up and can provide their feedback simply by clicking buttons. With the thumbs-up button they indicate that they agree with the automatically detected label for the video fragment. If they press the thumbs-down button, they are asked to correct the label. Within a few clicks the user can select another pre-defined label or create a new label on demand. In addition, users are allowed to indicate whether the start or end of the fragment was inconsistent with the label. All user feedback is stored in a database together with the users' IP addresses and sessions.
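A minimal in-memory stand-in for the feedback store described above could look like this; the field names and the `FeedbackStore` class are illustrative, since the actual database schema is not given:

```python
import time

class FeedbackStore:
    """Toy substitute for the feedback database (schema is illustrative).

    Records thumbs-up/down clicks per fragment together with the user's
    IP address and session id, plus an optional corrected label and an
    optional start/end boundary complaint.
    """
    def __init__(self):
        self.records = []

    def add(self, fragment_id, verdict, ip, session,
            corrected_label=None, boundary_issue=None):
        assert verdict in ("up", "down")
        self.records.append({
            "fragment": fragment_id, "verdict": verdict,
            "ip": ip, "session": session,
            "corrected_label": corrected_label,  # only set after thumbs-down
            "boundary_issue": boundary_issue,    # "start", "end", or None
            "timestamp": time.time(),
        })

    def agreement(self, fragment_id):
        """Fraction of thumbs-up among all votes on a fragment."""
        votes = [r for r in self.records if r["fragment"] == fragment_id]
        ups = sum(r["verdict"] == "up" for r in votes)
        return ups / len(votes) if votes else None
```

An aggregate like `agreement` is one plausible way such feedback could later be turned into a quality signal for the automatic labels.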
Audiovisual content exploitation JTS2010
Audiovisual content exploitation in the networked information society<br />Crowdsourcing Rock ‘n Roll Multimedia Retrieval<br />Roeland Ordelman<br />Research & Development<br />Netherlands Institute for Sound and Vision<br />email@example.com<br />
contents<br />AV content exploitation, annotation technology and user needs<br />NISV context: digitization in Images of the Future<br />Annotation technology for enabling access<br />Annotation technology and user needs<br />Example: Crowdsourcing Rock ‘n Roll Multimedia Retrieval<br />
NISV context<br /><ul><li>over 700,000 hours of radio, television, documentaries, films and music, over 2 million photographs, and 20,000 objects such as cameras, televisions, radios, costumes and pieces of scenery
selection of (Dutch) AV content from the web</li></li></ul><li> LARGE DIGITIZATION PROGRAM<br />IMAGES of the future<br />
Images of the Future<br /><ul><li>Selection, restoration, digitization, encoding and storage of 137,000 hours of video, 20,000 hours of film, 124,000 hours of audio and more than three million photographs.
Creating socio-economic value (“unlock the social and economic potential of the collections”)
Innovation: new infrastructure for strengthening the knowledge economy</li></li></ul><li>INVESTMENTS / BUSINESS MODELS<br />The cultural heritage sector is challenged to re-evaluate its business models<br />
Business model<br /><ul><li>The total investment of this initiative amounts to 173 million euros
A strong business model is necessary to support this kind of investment and prove that such an investment will result in long-term socio-economic returns
The outcome of a Cost-Benefit analysis was positive: “The total balance of costs and returns of restoring, preserving and digitising audio-visual material (excluding costs of tax payments) will be between 20+ and 60+ million.”
conservation of culture, reinforcement of cultural awareness, reinforcement of democracy through the accessibility of information, increase in multimedia literacy and contribution to the Lisbon goals set by the EU</li></ul>http://www.prestoprime.org/project/public.en.html<br />
Content exploitation: from content is king ...<br />
linking of context data (web, program guide, production data)</li></li></ul><li>media professionals<br />journalists<br />researchers<br />educators<br />general public <br />disparity between technology and user needs<br />
User perspective<br /><ul><li>Rapidly evolving networked information society
search & interlink collections via centralized search
project goals: </li></ul>provide a demonstrator portal to show how technology could help researchers<br />acquire information on specific user requirements<br />search<br />collaboration<br />linking<br />privacy<br />dedicated work space<br />http://www.verteldverleden.org<br />
Goals<br /><ul><li>exploiting community tagging (tagging games, etc)
exploring the wisdom of crowds by hooking up with user communities (e.g., everyone-as-commentator, unexpected experts)
capturing relevant information from the internet and aligning this with archived items.
finding new ways for communities to interact with the data.</li></li></ul><li>Technology perspective<br />Technology:<br /><ul><li>provide anchor points for linking up with the ‘cloud’ (entity detection, segmentation, cross-collection SID, etc.): people, places, events, topics, quotes, etc.
synchronization of web-content/UGC with AV documents
users in the loop: UGC for adapting/training analysis tools
technology aided annotation: Documentalist Support System
provide documentalist/archivist with relevant context during manual annotation</li></li></ul><li>WEB-archiving<br />COLLECT CONTEXT DATA FROM THE WEB<br />
Web-archiving<br /><ul><li>extend Sound and Vision archive with audiovisual content from the internet
archive internet web content </li></ul>preserve broadcast related websites <br />to use as context information for audiovisual data in the Sound and Vision archive<br />
AUDIOVISUAL INTERNET CONTENT<br />iMMix<br />AV ARCHIVE<br />CONTEXT<br />CONTEXT<br />BROADCAST RELATED INTERNET CONTENT<br /> WEB-ARCHIVE<br />
Special Use Case: documentalist support<br /><ul><li>in the process of generating metadata for an archived AV item, a documentalist searches for relevant information on this item, for example on the internet
internet search might fail as such information is typically available only for a limited amount of time
the “internet archive” works as a “context database” for relevant internet context</li></li></ul><li>INTERNET CONTEXT THAT MAY “DISAPPEAR” BUT COULD BE USED AS INFORMATION FOR DESCRIBING TELEVISION BROADCASTS<br />
Crowdsourcing Rock N’ Roll Multimedia Retrieval<br />Netherlands Institute for Sound and Vision<br />University of Amsterdam – Visual Search (Cees Snoek)<br />University of Twente – Speech Recognition (Franciska de Jong)<br />VideoDock – User Interface (Bauke Freiburg)<br />
Background<br /><ul><li>40th anniversary of the popular annual Dutch rock festival Pinkpop
from only a summary to almost unabridged recordings, even including raw, unpublished footage as well as interviews
goal: build an application for showcasing the history of the festival in an attractive way using state-of-the-art technology</li></li></ul><li>Rationale<br /><ul><li>Use state-of-the-art visual analysis to allow browsing the collection on the basis of visual concert concepts
Use speech recognition for browsing interviews
Exploit popularity of festival to get rock ‘n roll enthusiasts community into the loop:
Fragment level concept detection<br /><ul><li>video fragments instead of more technically defined shots or keyframes
fragment algorithm finds the longest fragments with the highest average scores for a specific concert concept
Only the top-n fragments per concert concept are loaded in the video player</li></li></ul><li>Speech Recognition<br /><ul><li>Speech transcripts generated by the open-source speech recognition toolkit SHoUT, developed in the MultimediaN and CATCH projects
users provided feedback more than 4000 times.
We are currently investigating how this feedback can be exploited to improve automated multimedia analysis results</li></li></ul><li>Wrap up<br /><ul><li>value of archive is strongly related to access opportunities
access is to a large extent technology driven
but next to technology development we need to make a shift:
from a ‘laboratory view’ on users to drawing users and communities into the loop
NISV is aiming towards this two-way strategy: