
Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation


Within computer science, "Multimedia" is a field of research that investigates how computers can support people in communication, information finding, and knowledge/opinion building. Multimedia content is defined broadly. It includes not only video, but also images accompanied by text and other information (for example, a geo-location). It can be professionally produced or generated by users for online sharing. Computer scientists historically have a “love-hate” relationship with multimedia. They “love” it because of the richness of the data sources and the wealth of available data, which lead to interesting problems to tackle with machine learning. They “hate” it because multimedia is a diffuse and moving target: the interpretation of multimedia differs from person to person, and changes over time in the course of its use as a communication medium. This talk gives an overview of ongoing research on multimedia information retrieval algorithms, which help people find multimedia. We look at a series of topics that reveal how pattern recognition, text processing, and crowdsourcing tools are used in multimedia research, and discuss both their limitations and their potential.



  1. Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation Martha Larson Delft University of Technology and Radboud University Nijmegen 29 June 2016, Communication Science, Radboud University Nijmegen
  2. About me ● Where do I work? ○ TU Delft: Multimedia Computing Group ○ Radboud University: Multimedia Information Technology ● What do I do? ○ Background: Speech and language, ○ Research: Multimedia retrieval and recommender systems, ○ Emphasis: How people interpret and use multimedia. ● What am I doing today? ○ Sharing with you the potential and the open issues.
  3. Today’s topics ● Introducing intelligent information systems ○ Multimedia information retrieval (user is active) ○ Recommender systems (user is passive) ● Computer Science and Multimedia ○ The “love” relationship: lots of data ○ The “hate” relationship: people’s interpretation of media is not “neat”! ● How to move forward? ○ Benchmarking challenges
  4. Intelligent Information Systems ● Connect users with information, ● Information: digital content, facts, products, services, ● Include search engines and recommender systems, ● Success is judged by satisfaction of user needs.
  5. Information retrieval Definition: Information retrieval (IR) is finding material of an unstructured nature that satisfies an information need from within large collections. http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html
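A minimal sketch of this definition in code, assuming an invented three-document collection (TF-IDF weighting with cosine similarity; a real engine would add tokenization, an inverted index, and collection-scale storage):

import math
from collections import Counter

docs = {  # toy stand-in for a "large collection"
    "d1": "how to build a garden koi pond",
    "d2": "koi pond song music video",
    "d3": "soft music playlist for work",
}
df = Counter(t for text in docs.values() for t in set(text.split()))

def tfidf(text):
    # term frequency * inverse document frequency; terms unseen in the collection are dropped
    tf = Counter(text.split())
    return {t: c * math.log(len(docs) / df[t]) for t, c in tf.items() if t in df}

def cosine(a, b):
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def search(query):
    qv = tfidf(query)
    return sorted(docs, key=lambda d: cosine(qv, tfidf(docs[d])), reverse=True)

print(search("koi pond"))  # d1 and d2 rank above d3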
  6. Recommender Systems Definition: A recommender system tries to identify sets of items that are likely to be of interest to a certain user given some information from that user’s profile.
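A minimal sketch of this definition using user-based collaborative filtering, where a user’s profile is their past ratings and unseen items are scored by similar users’ opinions (all users and ratings invented):

import math

ratings = {  # user -> {item: rating}, toy profiles
    "alice": {"video1": 5, "video2": 3},
    "bob":   {"video1": 4, "video3": 5},
    "carol": {"video2": 4, "video3": 2},
}

def sim(u, v):
    # cosine similarity over co-rated items
    shared = set(ratings[u]) & set(ratings[v])
    if not shared:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in shared)
    nu = math.sqrt(sum(ratings[u][i] ** 2 for i in shared))
    nv = math.sqrt(sum(ratings[v][i] ** 2 for i in shared))
    return dot / (nu * nv)

def recommend(user):
    scores = {}
    for other in ratings:
        if other == user:
            continue
        for item, r in ratings[other].items():
            if item not in ratings[user]:  # score only unseen items
                scores[item] = scores.get(item, 0.0) + sim(user, other) * r
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("alice"))  # ['video3']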
  7. “Multimedia Clues” for the computer scientist ● Text: Things people write about images and videos. ● User interactions: What people click on, how long they watch. ● Pixel statistics: Colors, lines, textures, shot change patterns. ● Concept detection: Entities that can be detected in images and videos (faces can be detected well). ● Speech recognition: What is said in a video. ● Sound detection: Sounds that can be detected (laughter and gunshots can be detected well).
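To make the simplest clue concrete, here is a hedged sketch of a pixel statistic: a coarse color histogram that could feed a nearest-neighbor image matcher. It assumes the Pillow library; "photo.jpg" below is a placeholder path.

from PIL import Image

def color_histogram(path, bins=4):
    # downsample, then bucket each pixel's RGB values into a coarse 4x4x4 grid
    img = Image.open(path).convert("RGB").resize((64, 64))
    hist = [0] * bins ** 3
    step = 256 // bins
    for r, g, b in img.getdata():
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
    total = sum(hist)
    return [count / total for count in hist]  # normalized, so differently sized images compare

# e.g. similarity of two images by histogram intersection:
# sum(min(a, b) for a, b in zip(color_histogram("photo.jpg"), color_histogram("other.jpg")))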
  8. Visual Geo-location prediction ● Combine evidence from multiple images (e) taken in an area (E_g). ● Upweight elements that are distinctive for that particular area (W_Geo). Xinchao Li, Alan Hanjalic, Martha Larson. Geo-distinctive Visual Element Matching for Location Estimation of Images, Under review. http://arxiv.org/pdf/1601.07884v1.pdf
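A hedged sketch of the idea, not the paper’s algorithm: score each candidate area by its matched visual elements, upweighting elements that are rare across areas with an IDF-style geo weight (all data invented).

import math

# area -> visual elements observed in images from that area (invented)
area_elements = {
    "eiffel_tower": ["lattice", "arch", "tree"],
    "beach_front":  ["sand", "wave", "tree"],
}

def geo_weight(element):
    # rare across areas -> distinctive for its area -> high weight
    df = sum(element in els for els in area_elements.values())
    return math.log((len(area_elements) + 1) / (df + 1)) + 1

def predict_area(query_elements):
    scores = {area: sum(geo_weight(e) for e in query_elements if e in els)
              for area, els in area_elements.items()}
    return max(scores, key=scores.get)

print(predict_area(["lattice", "tree"]))  # 'eiffel_tower': "lattice" is distinctive, "tree" is not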
  9. Good match: Lots of what’s unique
  10. Visual Geo-location prediction Xinchao Li, Alan Hanjalic, Martha Larson. Geo-distinctive Visual Element Matching for Location Estimation of Images, Under review. http://arxiv.org/pdf/1601.07884v1.pdf
  11. Conventional search engine finds “what” Alan Hanjalic, Christoph Kofler, and Martha Larson. 2012. Intent and its discontents: the user at the wheel of the online video search engine. In Proceedings of the 20th ACM international conference on Multimedia (MM '12). ACM, New York, NY, USA, 1239-1248. I want a song called “koi pond”. I’m interested in garden koi ponds.
  12. Intent-aware search responds to “why” Alan Hanjalic, Christoph Kofler, and Martha Larson. 2012. Intent and its discontents: the user at the wheel of the online video search engine. In Proceedings of the 20th ACM international conference on Multimedia (MM '12). ACM, New York, NY, USA, 1239-1248. I am interested in the significance of koi ponds. I want to build a koi pond.
  13. User intent in video search Our study identified five major reasons why people search for videos online: ● Information (declarative knowledge) ● Experience for Learning (performative knowledge) ● Experience for Exposure (“being there”) ● Affect (change of mood) ● Object (video as video) Alan Hanjalic, Christoph Kofler, and Martha Larson. 2012. Intent and its discontents: the user at the wheel of the online video search engine. In Proceedings of the 20th ACM international conference on Multimedia (MM '12). ACM, New York, NY, USA, 1239-1248.
  14. Why are video moments important? R. Vliegendhart, M. Larson, B. Loni and A. Hanjalic, "Exploiting the Deep-Link Commentsphere to Support Non-Linear Video Access," in IEEE Transactions on Multimedia, vol. 17, no. 8, pp. 1372-1384, Aug. 2015.
  15. Viewer Expressive Reactions R. Vliegendhart, M. Larson, B. Loni and A. Hanjalic, "Exploiting the Deep-Link Commentsphere to Support Non-Linear Video Access," in IEEE Transactions on Multimedia, vol. 17, no. 8, pp. 1372-1384, Aug. 2015. Expressive reactions are not emotional in the classic sense. They are also not completely personal... but...
  16. The way people take a picture reflects what they are taking a picture of. Pixel statistics reveal very simple information on how people take pictures. We need people to judge if the computer guesses right. Michael Riegler, Martha Larson, Mathias Lux, and Christoph Kofler. 2014. How 'How' Reflects What's What: Content-based Exploitation of How Users Frame Social Images. In Proceedings of the 22nd ACM international conference on Multimedia (MM '14). Fashion and framing
  17. Characterize the trend...
  18. Jacket types are already very difficult for computers!
  19. Crowdsourcing People interpret images in exchange for micropayments. Example: Amazon Mechanical Turk
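Since individual crowd answers are noisy, labels from several workers are usually aggregated; a minimal majority-vote sketch over invented worker data:

from collections import Counter

worker_labels = {  # image -> labels from three workers (invented)
    "img1": ["koi pond", "koi pond", "fountain"],
    "img2": ["jacket", "coat", "jacket"],
}

consensus = {img: Counter(votes).most_common(1)[0][0]
             for img, votes in worker_labels.items()}
print(consensus)  # {'img1': 'koi pond', 'img2': 'jacket'}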
  20. MediaEval 2016 Multimedia Benchmark Initiative moving forward with benchmarking
  21. MediaEval Multimedia Evaluation Benchmark ● offers tasks on multimedia access and retrieval, ● exploits features derived from multiple modalities: speech, audio, visual content, tags, users, context, ● solutions may or may not involve machine learning. multimediaeval.org This year: the MediaEval workshop is right after ACM Multimedia 2016 in Amsterdam.
  22. Example MediaEval Tasks ● Predicting Media Interestingness: Infer interesting frames and segments of movies (using audio, visual features, text). ● Retrieving Diverse Social Images: Diversify image results lists (text, visual features). ● Context of Multimedia Experience: Predict multimedia content suitable for watching in stressful situations. ● Person Discovery: Find people in broadcast content. ● Placing: Estimate the geo-location of social multimedia. multimediaeval.org
  23. Publications arising from MediaEval http://www.citeulike.org/group/16499
  24. 2015 Workshop Participants 80 participants from 25 countries multimediaeval.org
  25. MediaEval Proceedings Papers multimediaeval.org
  26. What sets MediaEval apart? • … emphasizes the "multi" in multimedia: speech, audio, visual content, tags, users, context. • … innovates new tasks and techniques focusing on the human and social aspects of multimedia content. • … is community driven. multimediaeval.org
  27. Predicting Media Interestingness Task Automatically select frames or portions of movies which are the most interesting for a common viewer. ● Goal: Make use of the visual, audio and text content (features provided). ● Data: consists of ca. 100 movie trailers, together with human annotations. ● Metric: System performance is evaluated using standard Mean Average Precision.
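A minimal sketch of the metric named above, mean average precision (MAP) with binary relevance, over invented segment IDs:

def average_precision(ranked, relevant):
    hits, precisions = 0, []
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)  # precision at each relevant hit
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    # runs: (ranked_list, relevant_set) pairs, one per trailer/query
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# one trailer whose truly interesting segments are s2 and s4:
print(average_precision(["s2", "s1", "s4"], {"s2", "s4"}))  # (1/1 + 2/3) / 2 ≈ 0.83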
  28. Predicting Media Interestingness Task http://multimediaeval.org
  29. Retrieving Diverse Social Images Task This task addresses the problem of image search result diversification in the context of social media: ● Goal: refine a ranked list of Flickr photos retrieved with general-purpose multi-topic queries, using provided visual, textual and user tagging credibility information. ● Metrics: results are evaluated with respect to their relevance to the query and the diversity with which they represent it. ● Data: ~40k images, social metadata, text models, CNN descriptors, a user tagging credibility dataset, etc. Three data sets have been published at the MMSys dataset track.
  30. Retrieving Diverse Social Images Task (cont.) [Figure: initial retrieval results vs. diversified results]
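One standard way to turn an initial list into a diversified one is maximal marginal relevance (MMR) reranking; a hedged sketch, not the task’s prescribed method (rel maps items to relevance scores, sim returns visual similarity in [0, 1]; both are assumed inputs):

def mmr(candidates, rel, sim, k=10, lam=0.7):
    # greedily trade relevance off against similarity to items already picked
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def score(i):
            redundancy = max((sim(i, j) for j in selected), default=0.0)
            return lam * rel[i] - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected

With lam = 1.0 this reproduces the initial relevance ranking; lowering lam pushes near-duplicate photos down the list.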
  31. Context of Multimedia Experience Task Develops multimodal techniques for automatic prediction of multimedia suitable for a particular consumption context. ● Goal: Predict movies that are suitable to watch on airplanes. ● Data: Input to the prediction methods consists of movie trailers and metadata from IMDb, Rotten Tomatoes and Metacritic. ● Metric: Output is evaluated using the Weighted F1 score, with expert labels as ground truth. This year: Task is offered at the MediaEval workshop and at a joint-challenge workshop at http://www.icpr2016.org
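A minimal sketch of the Weighted F1 score used here: per-class F1 averaged with weights proportional to class support (toy labels invented):

from collections import Counter

def weighted_f1(y_true, y_pred):
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += (n / len(y_true)) * f1  # class weight = share of the support
    return total

print(weighted_f1(["suitable", "suitable", "not"], ["suitable", "not", "not"]))  # ≈ 0.67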
  32. Context of Multimedia Experience Task Different contexts can lead to different preferences... on a plane, people like to watch different movies than they would at home or in the cinema.
  33. Multimodal Person Discovery in Broadcast TV Task ● Goal: Given raw TV broadcasts, each shot must be automatically tagged with the name(s) of people who can be both seen and heard in the shot. ● The list of people is not known a priori and their names must be discovered in an unsupervised way from provided text overlay or speech transcripts. ● Data: Multilingual corpus from INA (French), DW (German & English) and UPC (Catalan). ● Metric: standard information retrieval metrics based on a posteriori collaborative annotation of the corpus by the participants themselves.
  34. Person Discovery Task Person names must be discovered in speech track and/or sub-titles. Models cannot be trained on external data. Slide credit: Johann Poignant, Hervé Bredin, Claude Barras, Person Discovery Task Organizers MediaEval 2015
  35. Tackling the Person Discovery Task Slide credit: Johann Poignant, Hervé Bredin, Claude Barras, Person Discovery Task Organizers MediaEval 2015
  36. Wrap Up ● We want to connect users with information, in order to satisfy information needs. ● CS Love: Lots of data! ● CS Hate: How do people really see multimedia, and what do they want? ● Way forward: Continue to define new challenges and build algorithms to address them.
  37. Beyond the user-item matrix CrowdRec project ● Exploiting multiple sources of information, ● Leveraging the Crowd (crowdworkers, users, curators), ● Evaluating at large scale. Context-driven Recommender systems: “People have more in common with other people in the same situation than they do with past versions of themselves” Roberto Pagano, Paolo Cremonesi, Martha Larson, Balazs Hidasi, Domonkos Tikk, Alexandros Karatzoglou, and Massimo Quadrana. The Contextual Turn: from Context-aware to Context-driven recommender systems. ACM RecSys 2016, to appear.
  38. Turn from personalization • Context has been taken into account by coupling it with personalization, in context-aware recommender systems. • However, being aware of the context is not enough for some domains: recommendations should be driven by the context. In traditional recsys, the Immutable Preference paradigm (ImP) assumes: • User tastes do not evolve • Goals and needs are static • Item catalog is static • Trendiness, seasonality, capacity and life-cycle are addressed by tweaks to existing models. Slide credit: Roberto Pagano
  39. Music I usually like heavy metal music, but now I have to work and I want to listen to some soft music. Recommended for you: Slide credit: Roberto Pagano
  40. Jaeyoung Choi, Eungchan Kim, Martha Larson, Gerald Friedland, and Alan Hanjalic. 2015. Evento 360: Social Event Discovery from Web-scale Multimedia Collection. ACM Multimedia 2015, pp. 193-196.
  41. Thank you Mohammad Soleymani, Guillaume Gravier, Bogdan Ionescu, Gareth Jones, Claire-Helene Demarty, Ngoc Duong, Frédéric Lefebvre, Yu-Gang Jiang, Mats Sjöberg, Hanli Wang, Toan Do, Richard Sutcliffe, Chris Fox, Richard Lewis, Tom Collins, Eduard Hovy, Deane L. Root, Igor Szoke, Xavier Anguera, Claude Barras, Hervé Bredin, Camille Guinaudeau, Jean Carrive, Yannick Estève, Javier Hernando, Juliette Kahn, Nam Le, Sylvain Meignier, Ramon Morros, Johann Poignant, Satoshi Tamura, Bart Thomee, Olivier Van Laere, Claudia Hauff, Jaeyoung Choi, Emmanuel Dellandréa, Liming Chen, Yoann Baveye, Christina Boididou, Symeon Papadopoulos, Stuart E. Middleton, Michael Riegler, Duc-Tien Dang-Nguyen, Giulia Boato, Andreas Petlund, Concetto Spampinato, Alexandru Lucian Gînscă, Maia Zaharieva, Mihai Lupu, Henning Müller, Adrian Popescu, Bogdan Boteanu, Alan Woodley, Shlomo Geva, Timothy Chappell, Richi Nayak, Gabi Constantin, Roberto Pagano, Paolo Cremonesi, Martha Larson, Balazs Hidasi, Domonkos Tikk, Alexandros Karatzoglou, Massimo Quadrana, Xinchao Li, Alan Hanjalic, Andreas Lommatzsch, Benjamin Kille, Fabian Abel, Daniel Kohlsdorf, Jonas Seiler, Róbert Pálovics, Andras Benczur...
  42. Links ● Challenges (Benchmarks) ○ MediaEval Multimedia Evaluation (http://multimediaeval.org), ○ CLEF NewsREEL News Recommendation challenge (http://www.clef-newsreel.org), ○ ACM RecSys 2016 Job Recommendation challenge (http://2016.recsyschallenge.com). ● Acknowledgements ○ Multimedia Commons (http://www.multimediacommons.org), ○ EC-funded CrowdRec project (http://crowdrec.eu).
