Part I – Concepts, challenges, and state of the art
Part II – Medical image retrieval
Part III – Mobile visual search
Part IV – Where is image search headed?
Visual Information Retrieval: Advances, Challenges and Opportunities – Oge Marques
Visual Information Retrieval: Advances, Challenges and Opportunities discusses advances and challenges in visual information retrieval. Key points include:
- Visual information retrieval aims to find relevant images/videos based on visual and text queries, addressing the "semantic gap" between low-level features and high-level meanings.
- Advances include improved text-based, content-based, and mixed search methods, as well as applications in medical image retrieval and mobile visual search.
- Ongoing challenges include capturing image similarity, addressing various representation gaps, understanding user intentions, and developing broad domain solutions.
This document discusses advances in image search and retrieval. It begins with an overview of visual information retrieval and its challenges, including the semantic gap between low-level visual features and high-level semantics. It then covers recent techniques like Google image search and similarity search. The document outlines core concepts like capturing similarity, large datasets, and user needs. It also revisits a 2000 paper on the challenges still facing the field, including the unsolved semantic gap and need for standardized evaluation benchmarks.
Image Processing and Computer Vision in iOS – Oge Marques
- Image processing and computer vision applications are becoming more common on mobile devices like the iPhone and iPad. There are many opportunities to build successful apps that can improve how users work with images and videos.
- The talk provided an overview of developing image and computer vision apps for iOS, including recommended tools like Core Image and OpenCV. It also offered advice on focusing an app idea on solving a specific problem and being aware of competition and market timing.
- Mobile image processing and computer vision have a promising future, and there is a need for good solutions to specific problems in this area that developers can work on building.
Oge Marques (FAU) – invited talk at WISMA 2010 (Barcelona, May 2010)
- Image search and retrieval remains a challenging problem with many open issues, even ten years after the field was declared to be past its early years.
- While progress has been made in areas like datasets, benchmarks and interfaces, core problems around similarity, semantics, and bridging the semantic gap between low-level visual features and high-level concepts remain largely unsolved.
- Narrowing domains and combining content-based techniques with metadata and user involvement through tagging and feedback may provide more successful solutions going forward.
Recent advances in visual information retrieval (Marques, KLU, June 2010) – Oge Marques
The document summarizes key points from a 2010 presentation on visual information retrieval (VIR). It revisits conclusions from a 2000 paper on challenges facing content-based image retrieval (CBIR). While some predictions were accurate, like increased data sizes and interaction options, others were not, like solving image understanding. Significant progress was made on benchmarks and datasets but less on similarity metrics. Medical image retrieval poses new challenges to understand but offers opportunities if VIR methods can adapt to new domains.
(1) Portable scanners, scanning pens, smartphones, and tablets allow students quick access to electronic documents and provide flexibility. They are useful for students with physical disabilities or learning disabilities.
(2) Files can be converted to text for use with Kurzweil 3000 software, which reads documents aloud. The KESI Virtual Printer or OCR software can convert scanned images or other files into text files compatible with Kurzweil 3000.
(3) These scanning and conversion tools can benefit students with physical, communication, or learning disabilities by providing accessible electronic texts and notes. They allow independence and flexibility, but schools must consider the costs of devices and software.
Multimedia Information Retrieval: What is it, and why isn't ... – webhostingguy
The document discusses opportunities and challenges in video search. It begins with an introduction to video search and outlines key market trends driving growth in online video. It then explores opportunities in leveraging metadata, community contributions, and large datasets. However, it also notes challenges including developing theoretical frameworks for video search and addressing the complexity of video content analysis.
This document provides an overview of the first class in an Information Architecture course. The class covers introductions and an overview of the course schedule and assignments. It also begins to define what information architecture is, noting there is no single agreed upon definition but providing some examples. The instructor introduces himself and his background, as well as the goals and philosophy of the course.
Volumetric medical images contain an enormous amount of visual information, which can discourage the exhaustive use of local descriptors for image analysis, comparison and retrieval. The distinctive features and patterns that need to be analyzed for finding diseases are most often local or regional, confined to very small parts of the image. Separating out the large amount of image data that contains little important information is therefore an important task, as it could reduce the current information overload of physicians and make clinical work more efficient. In this paper a novel method for detecting key regions is introduced as a way of extending the concept of keypoints often used in 2D image analysis. Computation is also reduced, as important visual features are only extracted from the detected key regions.
The region detection method is integrated into a platform-independent, web-based graphical interface for medical image visualization and retrieval in three dimensions. This web-based interface is easy to deploy on existing infrastructures in both small and large-scale clinical environments.
By including the region detection method in the interface, manual annotation is reduced and time is saved, making it possible to integrate the presented interface and methods into clinical routines and workflows, analyzing image data at a large scale.
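The core idea of the abstract (detect key regions first, then compute descriptors only inside them) can be sketched as follows; the blockwise-variance detector and the two-value descriptor are illustrative placeholders, not the paper's actual method:

```python
import numpy as np

def detect_key_regions(volume, threshold=0.8, box=8):
    """Toy key-region detector: return boxes around high-variance blocks.
    Stand-in for the paper's detector, purely illustrative."""
    regions = []
    z, y, x = volume.shape
    for i in range(0, z - box + 1, box):
        for j in range(0, y - box + 1, box):
            for k in range(0, x - box + 1, box):
                block = volume[i:i+box, j:j+box, k:k+box]
                if block.std() > threshold:
                    regions.append((i, j, k, box))
    return regions

def describe(block):
    """Toy local descriptor: mean and standard deviation of intensities."""
    return np.array([block.mean(), block.std()])

rng = np.random.default_rng(0)
vol = np.zeros((32, 32, 32))
vol[8:16, 8:16, 8:16] = rng.normal(0, 2, (8, 8, 8))  # one "interesting" region

# Descriptors are computed only inside detected regions,
# never over the full volume, which is where the savings come from.
feats = [describe(vol[i:i+s, j:j+s, k:k+s])
         for i, j, k, s in detect_key_regions(vol)]
```

With a whole-volume keypoint approach every block would be described; here only the single high-variance region yields a descriptor.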
- The document discusses radiology workflow and Picture Archiving and Communication System (PACS) storage levels.
- There are three main PACS storage levels: Short Term Storage (STS) for online access, Near Line Storage/Archive (NLS/A) for reasonable retrieval speeds, and Long Term Storage (LTS) which requires user intervention for offline access.
- The sizes of STS and NLS are calculated based on factors like the number of exams, growth rate, and need to cover exams for a certain period. LTS can be both online and offline using technologies like tape libraries and optical discs.
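The tier-sizing logic described above can be sketched as a back-of-envelope calculation. All figures below (exam volume, size per exam, growth rate, coverage periods) are illustrative assumptions, not values from the document:

```python
def storage_needed_tb(exams_per_year, gb_per_exam, years_covered, annual_growth):
    """Capacity for a storage tier that must hold `years_covered` years of exams,
    with exam volume growing each year."""
    total_gb = 0.0
    exams = exams_per_year
    for _ in range(years_covered):
        total_gb += exams * gb_per_exam
        exams *= 1 + annual_growth
    return total_gb / 1024  # TB

# Hypothetical hospital: 50,000 exams/year, 100 MB/exam, 10% annual growth.
# STS keeps ~1 year online; NLS must cover ~5 years.
sts_tb = storage_needed_tb(50_000, 0.1, 1, 0.10)
nls_tb = storage_needed_tb(50_000, 0.1, 5, 0.10)
```

Under these assumptions the STS needs roughly 5 TB and the NLS roughly 30 TB; anything older would spill to LTS on tape or optical media.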
Presentation by Prof. Dr. Henning Müller.
Overview:
- Medical image retrieval projects
- Image analysis and 3D texture modeling
- Data science evaluation infrastructures (ImageCLEF, VISCERAL, EaaS – Evaluation as a Service)
- What comes next?
This document summarizes key aspects of fossils and the geologic column according to a young-earth creationist perspective. It describes how fossils are formed, different types of fossils, issues with dating methods, and anomalies in the fossil record that are puzzling from an evolutionary viewpoint but align with a global flood model. The document casts doubt on assumptions of deep time and presents alternative explanations for the fossil record based on a literal interpretation of Genesis.
Presented by Adrien Depeursinge, PhD, at MICCAI 2015 Tutorial on Biomedical Texture Analysis (BTA), Munich, Oct 5 2015.
Texture-based imaging biomarkers complement focal, invasive, biopsy-based biomarkers by providing information on tissue structure over broad regions, non-invasively and repeatedly across multiple time points. Texture has been used to predict patient survival, tissue function, disease subtypes and genomics (imagenomics and radiogenomics). Nevertheless, several challenges remain, such as: the lack of an appropriate framework for multi-scale, multi-spectral analysis in 2D and 3D; localization uncertainty of texture operators; validation; and translation to routine clinical applications.
This document summarizes a presentation on the OpenNLP toolkit. OpenNLP is an open-source Java toolkit for natural language processing. It provides common NLP features like tokenization, sentence segmentation, part-of-speech tagging, and named entity extraction. The presentation discusses how these features work using pre-trained models for different languages. An example is also given showing how OpenNLP could be used to extract tags from a website and display them in a tag cloud. The presentation concludes by providing contact information for the presenter.
Latent semantic analysis (LSA) is a natural language processing technique for analyzing relationships between documents and terms by deriving a set of latent concepts that relate them. LSA assumes that words with similar meanings occur in similar texts; it builds a documents-terms matrix and applies singular value decomposition to discover hidden concepts and to represent words and documents as vectors in a semantic vector space. Apache OpenNLP is a machine learning toolkit that can be used for various natural language processing tasks like part-of-speech tagging and parsing, and LSA can be seen as part of natural language processing.
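The documents-terms matrix and SVD step described above can be sketched with a toy example (the vocabulary and counts are invented for illustration):

```python
import numpy as np

# Toy term-document count matrix: rows = terms, columns = documents.
# Terms: "image", "retrieval", "fossil", "rock"
A = np.array([
    [2, 1, 0],
    [1, 2, 0],
    [0, 0, 3],
    [0, 0, 2],
], dtype=float)

# Truncated SVD: A is approximated by U_k @ diag(s_k) @ Vt_k.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]       # terms in the latent concept space
doc_vecs = Vt[:k, :].T * s[:k]     # documents in the latent concept space

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# "image" and "retrieval" co-occur, so their latent vectors are close;
# "image" and "fossil" never co-occur, so they are nearly orthogonal.
print(cosine(term_vecs[0], term_vecs[1]))  # high
print(cosine(term_vecs[0], term_vecs[2]))  # near zero
```

This is the whole mechanism in miniature: co-occurrence structure in the count matrix becomes proximity in the low-rank concept space.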
There are three main types of fossils: preserved organisms, mineral replacements, and impression fossils. Preserved organism fossils occur when the soft body parts of an animal are frozen in time with minimal decay. Mineral replacement fossils form when the hard parts of an animal decay and are replaced by minerals over time, eventually becoming stone. Impression fossils show detailed outlines or carbon deposits left behind when thin plants or small animals die and decay in sediment.
The document describes October monthly specials with reduced pricing on various products like market bags, totes, coolers, and more made from materials like recycled cotton, organic cotton, and polyester. Key details include product colors, sizes, original prices, and new sale prices valid through October 31, 2011 while supplies last. Additional fees may apply.
The European Union has proposed a new package of sanctions against Russia that includes an embargo on Russian oil. The embargo would be phased in over six months for crude oil and eight months for refined products. This sanctions package requires the unanimous approval of all 27 EU member states.
Murals are wall paintings created by artist Tanja van Achterberg. She specializes in painting murals on walls in homes and businesses. Her murals bring color and visual interest to bare walls through artistic depictions of nature, abstract designs, and other themes.
The document discusses the results of a study on the effects of exercise on memory and thinking abilities in older adults. The study found that regular exercise led to improvements in memory performance and helped reduce declines in thinking abilities that often occur with age. Exercising for just 30 minutes three times a week was enough to produce these cognitive benefits in adults aged 60-79.
The document discusses the results of a study on the impact of COVID-19 lockdowns on air pollution. The study found that lockdowns led to short-term reductions in nitrogen dioxide and fine particulate matter concentrations globally. However, the decreases in air pollution were temporary and not sufficient to significantly improve air quality or public health in the long run without systemic changes to reduce fossil fuel use and other polluting activities.
This document summarizes and evaluates various news filtering applications on the web. It provides a table listing common news filtering applications and their URLs. It then analyzes the features and drawbacks of each application, noting that while many rely on community recommendations to surface relevant stories, the systems themselves are limited in their ability to do so implicitly without direct user feedback. The document concludes with suggestions for possible enhancements, such as free previews, tutorials, related content feeds, statistics tracking, and improved personalized recommendation capabilities.
Email and Social Media Marketing Synergies – Responsys Leadership Forum – Edmund Wong
This is a presentation I gave at Responsys Leadership Forum on May 14, 2009 in SF. It is a brief discussion of ways to integrate email marketing and social media. Developed in conjunction with Alisa Hansen and Mark Beekman.
The document summarizes renewable energy developments in India. It states that the central government has asked all states to develop solar energy policies to help meet national targets. It also mentions that India's total renewable grid capacity has increased over three times between April 2014 and February 2015. Key targets mentioned include installing 100 GW of solar and 60 GW of wind power by 2022.
This document is an email from Dan Reifsnyder at the State Department to Phil Cooney at the White House Council on Environmental Quality. It forwards a link to an article from the Competitive Enterprise Institute announcing that the institute has filed a petition to prevent distribution of a flawed White House climate report. The full text of the forwarded article is included.
The document discusses starting a shift change at a facility. Staff are instructed to begin transitioning duties to the next shift team and exchanging important updates. Standard safety protocols should be followed during the handoff process to ensure smooth transfer of responsibilities between shifts.
R&B artists such as Rihanna, Amy Winehouse, Beyoncé, Norah Jones, Lauryn Hill, and Alicia Keys are listed along with some of their popular albums from the 2000s and early 2010s. The document encourages the reader to listen to music from these artists and keep an open mind to different genres, stating that while your eyes can be closed, your ears cannot. It suggests being open to music as a way to find friendship.
This document discusses several issues related to the Intergovernmental Panel on Climate Change (IPCC) negotiations on the Third Assessment Report:
1. It outlines recommendations for restructuring the US attendance at upcoming IPCC meetings to replace representatives from the Clinton/Gore administration with scientists skeptical of climate change risks.
2. It raises issues with deferring portions of the Third Assessment Report to allow more input from the new Bush Administration on the reports and conclusions.
3. It discusses disagreements between climate model projections of increased tropospheric warming and satellite temperature data showing less warming than surface temperatures, calling the ability to correctly model tropospheric temperature changes critically important.
This document discusses research at the intersection of human and computer vision, with a focus on objects in context. It provides background on visual perception and challenges in object and scene recognition. Context is important for human vision but difficult for computers. Representative work by Renninger and Malik shows that early scene identification can be explained by a simple texture model, demonstrating the value of interdisciplinary research between human and computer vision. The document concludes by discussing the author's experiences with interdisciplinary collaboration between psychology and computer science.
Mapping the use of digital sources amongst Humanities scholars in the Netherl... – MaxKemman
1) The document reports on a survey of 294 Dutch and Belgian academics regarding their use of digital sources and databases.
2) It finds that text is the most commonly used digital medium, and Google is the dominant search tool and platform. Younger academics are more confident in using audiovisual search tools.
3) Disciplines like history and literature most commonly use images and digitized objects, while fields like social studies and linguistics make more use of video, audio, and statistical data.
4) The study has implications for how to increase awareness, appeal and adoption of digital humanities approaches through user-focused design and inclusion in education.
Diagramming, Figures, and Imagery (2D): Think Visual in Online LearningShalin Hai-Jew
Learners will…
define “visual thinking” and “visual cognition”
describe some dimensions of visuals in online learning
describe some ways to create visuals in online learning
consider some uses of visuals in online learning
explore legal considerations related to online learning visuals
consider going open-source for visuals
think about signatures and styles in terms of online visuals (and sharing broadly)
contemplate common errors in visualizations for online learning
review ways to think visually
The document discusses strategies for social enterprises to tackle challenges with content, engagement, and scale on social media. It focuses on leveraging curation to create scalable content by organizing information from various sources and adding context. Various curation tools and techniques are examined, from basic options like email alerts, hashtags and lists, to more advanced visual tools like Scoop.it and Flipboard. Guidelines for effective curation are provided. The document also explores strategies for creating compelling original content, such as using animations, unique presentations, free images and templates, and repurposing content across multiple channels and touchpoints.
Invited Talk OAGM Workshop Salzburg, May 2015dermotte
There is a gap between a user's information need and the queries they submit, known as the "intention gap". Bridging this gap is challenging due to the difficulty of translating intentions into search queries. Researchers have studied user intentions in various contexts like search, media production and sharing. However, fully understanding intentions is difficult as people have trouble expressing their own intentions and judging those of others. Future work should develop new techniques to relate content-based image retrieval to user intentions and take an interdisciplinary approach to better model intentions across domains.
The document outlines Gráinne Conole's presentation on design thinking, learning design, and creativity. It discusses technological trends in learning like mobile learning, games-based learning, and the Internet of things. It then covers learning design frameworks like the 7Cs model and socio-cultural perspectives on design. Finally, it discusses approaches like design-based research and e-pedagogies that integrate technology and pedagogy for learning.
Avatars' in teaching the early experiences of [autosaved]newMartin Rieser
This document discusses the author's experiences using avatars in teaching as a non-technologist. It describes some of the challenges faced, including computer hardware issues and learning new software. The author explains using iClone software to create interactive avatar-based simulations. Potential benefits of avatars discussed are their ability to hold student interest, run different simulations, and be familiar technology to students. Examples are provided of using avatars to present concepts in unusual ways, for role-playing scenarios, and blending with film clips.
1) The document discusses the use of scientific imagery in higher education. Visuals can engage people and aid memory and recall compared to text alone.
2) Different types of images are classified, including static images, illustrations, photographs, animations, videos and more. Images serve instructional functions like informing, engaging, and bridging print and digital media.
3) The university's image library provides images for educational use, selecting from Creative Commons, free royalty, and rights-managed sources. Examples show how 3D imagery, graphs, and photographs can be used for learning.
This document provides an overview of a course on computer vision called CSCI 455: Intro to Computer Vision. It acknowledges that many of the course slides were modified from other similar computer vision courses. The course will cover topics like image filtering, projective geometry, stereo vision, structure from motion, face detection, object recognition, and convolutional neural networks. It highlights current applications of computer vision like biometrics, mobile apps, self-driving cars, medical imaging, and more. The document discusses challenges in computer vision like viewpoint and illumination variations, occlusion, and local ambiguity. It emphasizes that perception is an inherently ambiguous problem that requires using prior knowledge about the world.
This document discusses cognitive learning theory and several theorists who contributed to its development. It explains that cognitive learning results from listening, watching, or touching, and involves learning through experiences. Theorists discussed include Allan Paivio, who discovered dual coding theory explaining how people process information through images and language. Robert Gagne identified nine events of instruction and five areas of learning outcomes. Charles Reigeluth created the Elaboration Theory, which proposes starting with basic information and adding more details later. The document also provides examples of how teachers and students can apply cognitive learning theory.
This document outlines a group project on smartphones. It discusses the project workflow including planning, data collection from libraries and surveys, designing web pages, and distributing tasks among group members. The group's objective was to investigate how people use smartphones and educational apps. Their website presented survey results and analysis on the usefulness and variety of educational apps. The summary concludes that the website was informative but could be improved with more entertainment and interactivity, such as games.
This document discusses images in learning and provides information on finding, editing, and creating images. It defines an image as a two-dimensional representation and notes images can be visual or mental. The document recommends royalty-free sites for finding images and lists tools like Pixlr, Paint.net, GIMP and Photoshop for editing and creating images. Copyright considerations are also mentioned when
This presentation discusses Information Architecture (IA), how to achieve it on your team, and how to apply IA concepts to your technical documentation.
e-Learning for Radiation Oncology: What, Why & How?adrianaberlanga
The document discusses e-learning and its potential applications for radiation oncology education. It begins by defining e-learning and explaining its benefits, such as increased access to learning materials, lower costs, and flexibility. Examples are provided of how e-learning has been used for medical education through interactive simulations, quick updates of materials, and remote guidance. The document then outlines various e-learning tools and resources like videos, virtual patients, webinars, online repositories, and e-activities. It also describes some e-courses and e-master programs that have been developed. In the future, the document suggests e-learning could increasingly integrate social learning and connect learners to ideas, interests, and each other through technologies like augmented
CBMI 2013 Presentation: User Intentions in Multimediadermotte
This document discusses user intentions in visual information retrieval and multimedia information systems. It begins by introducing query by example search and different low-level visual features that work better for some domains than others. It then discusses how determining the right features and defining visual similarity is challenging. The document defines context and intention, and discusses how a user's intention relates to their information need. It reviews taxonomies of user intentions in web search and proposes intentions in multimedia may include search, production, sharing, archiving. The document proposes several open PhD theses around developing a general model of user intentions in multimedia, using games and human computation to infer intentions, bringing context to queries, and creating adaptable applications based on user intentions.
The document discusses infographics and provides guidance on creating them. It defines infographics as visual representations of data that allow information to be seen rather than read. The document outlines best practices for infographic design, such as prioritizing data and showing rather than telling. It also discusses properly attributing images using Creative Commons licenses. Finally, it introduces four free infographic tools - Wordle, Infogr.am, Piktochart, and Easel.ly - and suggests when each might be useful.
The document summarizes a student project on smartphones and educational apps. It outlines the project workflow including planning, data collection from libraries and online surveys, designing web pages using software like Flash and Dreamweaver, and distributing tasks among the student team. It also presents the project contents like an introduction, mind map, survey results, slideshow, analysis, and references. The conclusion reflects on strengths and weaknesses, difficulties, and suggestions that educational apps are useful but lack promotion and variety.
Teaching Visual Literacy Skills in a One-Shot Sessionmollyjschoen
Just as one-shot information literacy sessions can be implemented in college classes to improve students’ research capabilities, similarly-styled sessions on image research can increase their visual literacy skills. While most students interact with images daily, capturing photos on their mobile devices, reading picture-heavy articles on websites, and reposting images from social media pages, such activities do not transform them into critical viewers and users of visual media. To be considered visually literate, as defined by the Visual Literacy Competency Standards for Higher Education by the Association of College and Research Libraries, an individual must “effectively find, interpret, evaluate, use, and create images and visual media.”
A wide range of research and critical thinking strategies may be introduced through these instructional sessions. Locating trustworthy sources online, evaluating the content and quality of images, scrutinizing manipulated images, understanding the implications of copyright, and creating an effective system to store digital files and manage citations are among the recommended topics for presentation. Teaching strategies for image research sessions include using live web searches in both scholarly and open access resources to highlight their relative strengths and weaknesses, using real life examples of image use scenarios to provide context, and structuring presentations based around the specific class in which it will be taught. The desired outcome of teaching an instructional session is to provide students with the tools and confidence they need to effectively use high-quality visual materials in their undergraduate years and beyond.
Similar to Advances and Challenges in Visual Information Search and Retrieval (WVC 2012 - Goiania-GO, Brazil) (20)
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...Fwdays
Direct losses from downtime in 1 minute = $5-$10 thousand dollars. Reputation is priceless.
As part of the talk, we will consider the architectural strategies necessary for the development of highly loaded fintech solutions. We will focus on using queues and streaming to efficiently work and manage large amounts of data in real-time and to minimize latency.
We will focus special attention on the architectural patterns used in the design of the fintech system, microservices and event-driven architecture, which ensure scalability, fault tolerance, and consistency of the entire system.
From Natural Language to Structured Solr Queries using LLMsSease
This talk draws on experimentation to enable AI applications with Solr. One important use case is to use AI for better accessibility and discoverability of the data: while User eXperience techniques, lexical search improvements, and data harmonization can take organizations to a good level of accessibility, a structural (or “cognitive” gap) remains between the data user needs and the data producer constraints.
That is where AI – and most importantly, Natural Language Processing and Large Language Model techniques – could make a difference. This natural language, conversational engine could facilitate access and usage of the data leveraging the semantics of any data source.
The objective of the presentation is to propose a technical approach and a way forward to achieve this goal.
The key concept is to enable users to express their search queries in natural language, which the LLM then enriches, interprets, and translates into structured queries based on the Solr index’s metadata.
This approach leverages the LLM’s ability to understand the nuances of natural language and the structure of documents within Apache Solr.
The LLM acts as an intermediary agent, offering a transparent experience to users automatically and potentially uncovering relevant documents that conventional search methods might overlook. The presentation will include the results of this experimental work, lessons learned, best practices, and the scope of future work that should improve the approach and make it production-ready.
Essentials of Automations: Exploring Attributes & Automation ParametersSafe Software
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
"What does it really mean for your system to be available, or how to define w...Fwdays
We will talk about system monitoring from a few different angles. We will start by covering the basics, then discuss SLOs, how to define them, and why understanding the business well is crucial for success in this exercise.
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
This talk will cover ScyllaDB Architecture from the cluster-level view and zoom in on data distribution and internal node architecture. In the process, we will learn the secret sauce used to get ScyllaDB's high availability and superior performance. We will also touch on the upcoming changes to ScyllaDB architecture, moving to strongly consistent metadata and tablets.
Must Know Postgres Extension for DBA and Developer during MigrationMydbops
Mydbops Opensource Database Meetup 16
Topic: Must-Know PostgreSQL Extensions for Developers and DBAs During Migration
Speaker: Deepak Mahto, Founder of DataCloudGaze Consulting
Date & Time: 8th June | 10 AM - 1 PM IST
Venue: Bangalore International Centre, Bangalore
Abstract: Discover how PostgreSQL extensions can be your secret weapon! This talk explores how key extensions enhance database capabilities and streamline the migration process for users moving from other relational databases like Oracle.
Key Takeaways:
* Learn about crucial extensions like oracle_fdw, pgtt, and pg_audit that ease migration complexities.
* Gain valuable strategies for implementing these extensions in PostgreSQL to achieve license freedom.
* Discover how these key extensions can empower both developers and DBAs during the migration process.
* Don't miss this chance to gain practical knowledge from an industry expert and stay updated on the latest open-source database trends.
Mydbops Managed Services specializes in taking the pain out of database management while optimizing performance. Since 2015, we have been providing top-notch support and assistance for the top three open-source databases: MySQL, MongoDB, and PostgreSQL.
Our team offers a wide range of services, including assistance, support, consulting, 24/7 operations, and expertise in all relevant technologies. We help organizations improve their database's performance, scalability, efficiency, and availability.
Contact us: info@mydbops.com
Visit: https://www.mydbops.com/
Follow us on LinkedIn: https://in.linkedin.com/company/mydbops
For more details and updates, please follow up the below links.
Meetup Page : https://www.meetup.com/mydbops-databa...
Twitter: https://twitter.com/mydbopsofficial
Blogs: https://www.mydbops.com/blog/
Facebook(Meta): https://www.facebook.com/mydbops/
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready and for which client coverage is growing and scaling and performance aspects are life and death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, firstly, we will analyze scaling approaches and then select the proper ones for our system.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
At this talk we will discuss DDoS protection tools and best practices, discuss network architectures and what AWS has to offer. Also, we will look into one of the largest DDoS attacks on Ukrainian infrastructure that happened in February 2022. We'll see, what techniques helped to keep the web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on Ukraine experience
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Astute Business Solutions | Oracle Cloud Partner |
Advances and Challenges in Visual Information Search and Retrieval (WVC 2012 - Goiania-GO, Brazil)
1. Advances and Challenges in Visual Information Search and Retrieval
Oge Marques
Florida Atlantic University
Boca Raton, FL - USA
VIII Workshop de Visão Computacional (WVC) 2012
May 27–30, 2012, Goiania, GO - Brazil
2. Take-home message
Visual Information Retrieval (VIR) is a fascinating research field with many open challenges and opportunities which have the potential to impact the way we organize, annotate, and retrieve visual data (images and videos).
3. Disclaimer #1
• Visual Information Retrieval (VIR) is a highly interdisciplinary field, but …
[Diagram: Visual Information Retrieval at the intersection of Image and Video Processing, (Multimedia) Database Systems, Information Retrieval, Machine Learning, Computer Vision, Data Mining, Human Visual Perception, and visual data modeling and representation]
4. Disclaimer #2
• There are many things that I believe…
• … but cannot prove
5. Background and Motivation
“What is it that we’re trying to do, and why is it so difficult?”
– Taking pictures and storing, sharing, and publishing them has never been so easy and inexpensive.
– If only we could say the same about finding the images we want and retrieving them…
6. Background and Motivation
The “big mismatch”:
– Taking, storing, publishing, and sharing pictures: easy and cheap
– Organizing, annotating, finding, and retrieving pictures: expensive and difficult
7. Background and Motivation
• Q: What do you do when you need to find an image (on the Web)?
• A1: Google (image search), of course!
8. Background and Motivation
Google image search results for “sydney opera house”
Source: Google Image Search (http://images.google.com/)
9. Background and Motivation
Google image search results for “opera”
Source: Google Image Search (http://images.google.com/)
10. Background and Motivation
• Q: What do you do when you need to find an image (on the Web)?
• A2: Other (so-called specialized) image search engines:
• http://images.search.yahoo.com/
• http://pictures.ask.com
• http://www.bing.com/images
14. Background and Motivation
• Q: What do you do when you need to find an image (on the Web)?
• A3: Search directly on large photo repositories:
– Flickr
– Webshots
– Shutterstock
19. Background and Motivation
• Back to our original (two-part) question:
– What is it that we’re trying to do?
– We're trying to create automated solutions to the problem of finding and retrieving visual information, from (large, unstructured) repositories, in a way that satisfies search criteria specified by users, relying (primarily) on the visual contents of the media.
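The definition above can be made concrete with a minimal query-by-example sketch (a toy illustration of my own, not from the talk; it assumes NumPy and uses a global color histogram as a stand-in for whatever visual signature a real system would extract):

```python
import numpy as np

def color_histogram(image, bins=8):
    """Global color histogram: a simple low-level visual signature.
    `image` is an (H, W, 3) uint8 array."""
    hist, _ = np.histogramdd(
        image.reshape(-1, 3), bins=(bins,) * 3, range=((0, 256),) * 3)
    hist = hist.ravel()
    return hist / hist.sum()  # normalize so images of any size compare

def retrieve(query, repository, top_k=3):
    """Rank repository images by histogram distance to the query."""
    q = color_histogram(query)
    dists = [np.linalg.norm(q - color_histogram(img)) for img in repository]
    return np.argsort(dists)[:top_k]  # indices of the best matches

# Toy repository of random "images"
rng = np.random.default_rng(0)
repo = [rng.integers(0, 256, (32, 32, 3), dtype=np.uint8) for _ in range(10)]
ranking = retrieve(repo[4], repo)
print(ranking[0])  # the query image is its own best match: 4
```

In a real system the linear scan would be replaced by an index and the histogram by a much richer descriptor; the shape of the loop — extract a signature, rank by distance — is the part that carries over.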
20. Background and Motivation
• Why is it so difficult?
• There are many challenges, among them:
– The elusive notion of similarity
– The semantic gap
– Large datasets and broad domains
– Combination of visual and textual information
– The users (and how to make them happy)
21. Outline
• Part I – Concepts, challenges, and state of the art
• Part II – Medical image retrieval
• Part III – Mobile visual search
• Part IV – Where is image search headed?
23. The elusive notion of similarity
• Are these two images similar?
Source: Eidenberger, H., Introduction: Visual Information Retrieval, “Habilitation thesis”, Vienna University of Technology, 2004. Available at http://www.ims.tuwien.ac.at/~hme/papers/habil-full.pdf
24. The elusive notion of similarity
• Are these two images similar?
Source: Eidenberger, H., Introduction: Visual Information Retrieval, “Habilitation thesis”, Vienna University of Technology, 2004. Available at http://www.ims.tuwien.ac.at/~hme/papers/habil-full.pdf
25. The elusive notion of similarity
• Is the second or the third image more similar to the first?
Source: Eidenberger, H., Introduction: Visual Information Retrieval, “Habilitation thesis”, Vienna University of Technology, 2004. Available at http://www.ims.tuwien.ac.at/~hme/papers/habil-full.pdf
26. The elusive notion of similarity
• Which image fits better to the first two: the third or the fourth?
Source: Eidenberger, H., Introduction: Visual Information Retrieval, “Habilitation thesis”, Vienna University of Technology, 2004. Available at http://www.ims.tuwien.ac.at/~hme/papers/habil-full.pdf
27. The semantic gap
• The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation.
• “The pivotal point in content-based retrieval is that the user seeks semantic similarity, but the database can only provide similarity by data processing. This is what we called the semantic gap.” [Smeulders et al., 2000]
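A toy demonstration of the point (my own sketch, not from the talk; assumes NumPy): shuffling the pixels of an image destroys everything a user would call its meaning, yet leaves a global color histogram — a typical "similarity by data processing" — exactly unchanged:

```python
import numpy as np

def color_histogram(image, bins=8):
    """Global color histogram over an (H, W, 3) uint8 image."""
    hist, _ = np.histogramdd(
        image.reshape(-1, 3), bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist.ravel() / (image.shape[0] * image.shape[1])

rng = np.random.default_rng(42)
scene_a = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)

# Shuffle the pixels: same color content, a completely different "scene".
pixels = scene_a.reshape(-1, 3).copy()
rng.shuffle(pixels, axis=0)
scene_b = pixels.reshape(64, 64, 3)

# By data processing the two are identical; semantically they need not be.
d = np.linalg.norm(color_histogram(scene_a) - color_histogram(scene_b))
print(d)  # 0.0
```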
36. How I see it…
• The semantic gap problem has not been solved (and maybe never will be…)
• What are the alternatives?
– Treat visual similarity and semantic relatedness differently
• Examples: Alipr, Google (or Bing) similarity search, etc.
– Improve both (text-based and visual) search methods independently
– Combine visual and textual information in a meaningful way
– Engage the user
• Collaborative filtering, crowdsourcing, games.
37. • But, wait… There are other gaps!
– Just when you thought the semantic gap was your only problem…
Source: [Deserno, Antani, and Long, 2009]
38. Large datasets and broad domains
• Large datasets bring additional challenges in all aspects of the system:
– Storage requirements: images, metadata, and “visual signatures”
– Computational cost of indexing, searching, retrieving, and displaying images
– Network and latency issues
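To make the storage point concrete (back-of-the-envelope figures of my own, not from the talk): even compact visual signatures add up to terabytes at web scale, before the images and metadata themselves are counted:

```python
# Hypothetical numbers for illustration: 512-dimensional float32
# signatures for one billion images.
n_images = 1_000_000_000
dims = 512
bytes_per_float = 4

total_bytes = n_images * dims * bytes_per_float
print(total_bytes / 1e12)  # ~2 TB for the signatures alone
```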
39. Large datasets and broad domains
Source: Smeulders et al., “Content-based image retrieval at the end of the early years”, IEEE Transactions on PAMI, Vol. 22, Issue 12, Dec 2000
40. Challenge: users’ needs and intentions
• Users and developers have quite different views
• Cultural and contextual information should be
taken into account
• User intentions are hard to infer
– Privacy issues
– Users themselves don’t always know what they want
– Who misses the MS Office paper clip?
41. Challenge: users’ needs and intentions
• The user’s perspective
– What do they want?
– Where do they want to search?
– In what form do they express their query?
Source: R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image Retrieval: Ideas, Influences, and Trends of the New Age”, ACM Computing Surveys, April 2008.
42. Challenge: users’ needs and intentions
• The image retrieval system should be mindful of:
– How users wish the results to be presented
– Where users desire to search
– The nature of user input/interaction
Source: R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image Retrieval: Ideas, Influences, and Trends of the New Age”, ACM Computing Surveys, April 2008.
43. Challenge: users’ needs and intentions
• Each application has different users (with different intent, needs, background, cultural bias, etc.) and different visual assets.
44. Challenge: growing up (as a field)
• It’s been 10 years since the “end of the early years”
– Are the challenges from 2000 still relevant?
– Are the directions and guidelines from 2000 still
appropriate?
– Have we grown up (at all)?
– Let’s revisit the ‘Concluding Remarks’ from that paper…
45. Revisiting [Smeulders et al. 2000]
What they said:
• Driving forces – “[…] content-based image retrieval (CBIR) will continue to grow in every direction: new audiences, new purposes, new styles of use, new modes of interaction, larger data sets, and new methods to solve the problems.”
How I see it:
• Yes, we have seen many new audiences, new purposes, new styles of use, and new modes of interaction emerge.
• Each of these usually requires new methods to solve the problems that they bring.
• However, not too many researchers see them as a driving force (as they should).
46. Revisiting [Smeulders et al. 2000]
What they said:
• Heritage of computer vision – “An important obstacle to overcome […] is to realize that image retrieval does not entail solving the general image understanding problem.”
How I see it:
• I’m afraid I have bad news…
– Computer vision hasn’t made so much progress during the past 10 years.
– Some classical problems (including image understanding) remain unresolved.
– Similarly, CBIR from a pure computer vision perspective didn’t work too well either.
47. Revisiting [Smeulders et al. 2000]
What they said:
• Influence on computer vision – “[…] CBIR offers a different look at traditional computer vision problems: large data sets, no reliance on strong segmentation, and revitalized interest in color image processing and invariance.”
How I see it:
• The adoption of large data sets became standard practice in computer vision.
• No reliance on strong segmentation (still unresolved) led to new areas of research, e.g., automatic ROI extraction and RBIR.
• Color image processing and color descriptors became incredibly popular, useful, and (to some degree) effective.
• Invariance is still a huge problem – but it’s cheaper than ever to have multiple views.
48. Revisiting [Smeulders et al. 2000]
What they said:
• Similarity and learning
– “We make a pledge for the importance of human-based similarity rather than general similarity. Also, the connection between image semantics, image data, and query context will have to be made clearer in the future.”
– “[…] in order to bring semantics to the user, learning is inevitable.”
How I see it:
• The authors were pointing in the right direction (human in the loop, role of context, benefits from learning, …)
• However:
– Similarity is a tough problem to crack and model. Even our understanding of how humans judge image similarity is very limited.
– Machine learning is almost inevitable… but sometimes it can be abused.
49. Revisiting [Smeulders et al. 2000]
What they said:
• Interaction – better visualization options, more control to the user, ability to provide feedback […]
How I see it:
• Significant progress on visualization interfaces and devices.
• Relevance feedback: still a very tricky tradeoff (effort vs. perceived benefit), but more popular than ever (rating, thumbs up/down, etc.)
50. Revisiting [Smeulders et al. 2000]
What they said:
• Need for databases – “The connection between CBIR and database research is likely to increase in the future. […] problems like the definition of suitable query languages, efficient search in high-dimensional feature space, search in the presence of changing similarity measures are largely unsolved […]”
How I see it:
• Very little progress.
– Image search and retrieval has benefited much more from document information retrieval than from database research.
51. Revisiting [Smeulders et al. 2000]
What they said:
• The problem of evaluation
– CBIR could use a reference standard against which new algorithms could be evaluated (similar to TREC in the field of text retrieval).
– “A comprehensive and publicly available collection of images, sorted by class and retrieval purposes, together with a protocol to standardize experimental practices, will be instrumental in the next phase of CBIR.”
How I see it:
• Significant progress on benchmarks, standardized datasets, etc.:
– ImageCLEF
– Pascal VOC Challenge
– MSRA dataset
– SIMPLIcity dataset
– UCID dataset and ground truth (GT)
– Accio / SIVAL dataset and GT
– Caltech 101, Caltech 256
– LabelMe
52. Revisiting [Smeulders et al. 2000]
What they said:
• Semantic gap and other sources – “A critical point in the advancement of CBIR is the semantic gap, where the meaning of an image is rarely self-evident. […] One way to resolve the semantic gap comes from sources outside the image by integrating other sources of information about the image in the query.”
How I see it:
• The semantic gap problem has not been solved (and maybe never will be…)
• But the idea about using other sources was right on the spot!
– Geographical context
– Social networks
– Tags
54. Medical image retrieval
• Challenges
– We’re entering a new country…
• How much can we bring?
• Do we speak the language?
• Do we know their culture?
• Do they understand us and where we come from?
• Opportunities
– They use images (extensively)
– They have expert knowledge
– Domains are narrow (almost by definition)
– Fewer clients, but potentially more $$
55. Medical image retrieval
• Selected challenges:
– Different terminology
– Standards
– Modality dependencies
• Other challenges:
– Equipment dependencies
– Privacy issues
– Proprietary data
56. Different terminology
• Be prepared for:
– New acronyms
• CBMIR (Content-Based Medical Image Retrieval)
• PACS (Picture Archiving and Communication System)
• DICOM (Digital Imaging and COmmunication in Medicine)
• Hospital Information Systems (HIS)
• Radiological Information Systems (RIS)
– New phrases
• Imaging informatics
– Lots of technical medical terms
57. Standards
• DICOM (http://medical.nema.org/)
– Global IT standard, created in 1993, used in virtually all
hospitals worldwide.
– Designed to ensure the interoperability of different
systems and manage related workflow.
– Will be required by all EHR systems that include imaging
information as an integral part of the patient record.
– 750+ technical and medical experts participate in 20+
active DICOM working groups.
– Standard is updated 4-5 times per year.
– Many available tools! (see http://www.idoimaging.com/)
58. Medical image modalities
• The IRMA code [Lehmann et al., 2003]
– 4 axes with 3 to 4 positions each, in {0,...,9,a,...,z}, where ‘0’ denotes ‘unspecified’ and marks the end of a path along an axis.
• Technical code (T) describes the imaging modality
• Directional code (D) models body orientations
• Anatomical code (A) refers to the body region examined
• Biological code (B) describes the biological system
examined.
59. Medical image modalities
• The IRMA code [Lehmann et al., 2003]
– The entire code results in a string of 13 characters (IRMA: TTTT – DDD – AAA – BBB).
– Example: “x-ray, projection radiography, analog, high energy – sagittal, left lateral decubitus, inspiration – chest, lung – respiratory system, lung”
Source: [Lehmann et al., 2003]
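The axis structure above lends itself to a tiny parser. The sketch below splits a TTTT-DDD-AAA-BBB string into its four axes and truncates an axis at the first ‘0’ (unspecified); the axis names come from the slide, but the example code value is hypothetical and not taken from [Lehmann et al., 2003].

```python
# Minimal sketch of decoding an IRMA code string (TTTT-DDD-AAA-BBB).
# The example code value is invented for illustration.

AXES = [("technical", 4), ("directional", 3),
        ("anatomical", 3), ("biological", 3)]

def parse_irma(code):
    """Split a 13-character IRMA code into its four axes."""
    parts = code.split("-")
    assert [len(p) for p in parts] == [n for _, n in AXES], "malformed code"
    return {name: part for (name, _), part in zip(AXES, parts)}

def truncate_axis(axis_value):
    """'0' means unspecified and ends the path along an axis."""
    prefix = ""
    for ch in axis_value:
        if ch == "0":
            break
        prefix += ch
    return prefix

code = parse_irma("1121-127-720-500")   # hypothetical example
print(code["technical"])                 # '1121'
print(truncate_axis(code["biological"])) # '5' (rest of path unspecified)
```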
60. Medical image modalities
• The IRMA code [Lehmann et al., 2003]
– The companion tool…
Source: [Lehmann et al., 2004]
61. CBMIR vs. text-based MIR
• Most current retrieval systems in clinical use rely on
text keywords such as DICOM header information to
perform retrieval.
• CBIR has been widely researched in a variety of
domains and provides an intuitive and expressive
method for querying visual data using features, e.g.
color, shape, and texture.
• However, current CBIR systems:
– are not easily integrated into the healthcare environment;
– have not been widely evaluated using a large dataset; and
– lack the ability to perform relevance feedback to refine
retrieval results.
Source: [Hsu et al., 2009]
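The keyword-based retrieval that most clinical systems rely on can be sketched as filtering over DICOM header fields. The records below are invented stand-ins for parsed headers (a real system would read them with a DICOM toolkit); the field names `Modality`, `BodyPartExamined`, and `StudyDescription` are standard DICOM attributes.

```python
# Minimal sketch of text-based MIR: filtering on DICOM header fields.
# The three records are hypothetical parsed headers.

records = [
    {"id": "s001", "Modality": "CR", "BodyPartExamined": "CHEST",
     "StudyDescription": "chest x-ray, pneumonia follow-up"},
    {"id": "s002", "Modality": "MR", "BodyPartExamined": "BRAIN",
     "StudyDescription": "brain MRI with contrast"},
    {"id": "s003", "Modality": "CR", "BodyPartExamined": "SPINE",
     "StudyDescription": "lumbar spine x-ray"},
]

def search(records, modality=None, keyword=None):
    """Return record ids matching a modality and/or a free-text keyword."""
    hits = []
    for r in records:
        if modality and r["Modality"] != modality:
            continue
        text = (r["StudyDescription"] + " " + r["BodyPartExamined"]).lower()
        if keyword and keyword.lower() not in text:
            continue
        hits.append(r["id"])
    return hits

print(search(records, modality="CR"))                   # ['s001', 's003']
print(search(records, modality="CR", keyword="spine"))  # ['s003']
```

The limitations listed above follow directly: this search sees only what the header text says, never what the image shows.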
62. Who are the main players?
• USA
– NIH (National Institutes of Health)
• NIBIB - National Institute of Biomedical Imaging and
Bioengineering
• NCI - National Cancer Institute
• NLM – National Library of Medicine
– Several universities and hospitals
• Europe
– Aachen University (Germany)
– Geneva University (Switzerland)
• Big companies (Siemens, GE, etc.)
63. Medical image retrieval systems: examples
• IRMA (Image Retrieval in Medical Applications)
– Aachen University (Germany)
• http://ganymed.imib.rwth-aachen.de/irma/
– 3 online demos:
• IRMA Query demo: allows the evaluation of CBIR on several
databases.
• IRMA Extended Query Refinement demo: CBIR from the IRMA
database (a subset of 10,000 images).
• Spine Pathology and Image Retrieval Systems (SPIRS) designed by the
NLM/NIH (USA): holds information of ~17,000 spine x-rays.
64. Medical image retrieval systems: examples
• MedGIFT (GNU Image Finding Tool)
– Geneva University (Switzerland)
• http://www.sim.hcuge.ch/medgift/
– Large effort, including projects such as:
• Talisman (lung image retrieval)
• Case-based fracture image retrieval system
• Onco-Media: medical image retrieval + grid computing
• ImageCLEF: evaluation and validation
• medSearch
65. Medical image retrieval systems: examples
• WebMIRS
– NIH / NLM (USA)
• http://archive.nlm.nih.gov/proj/webmirs/index.php
– Query by text + navigation by categories
– Uses datasets and related x-ray images from the
National Health and Nutrition Examination Survey
(NHANES)
66. Medical image retrieval systems: examples
• SPIRS (Spine Pathology Image Retrieval System):
Web-based image retrieval system for large
biomedical databases
– NIH / UCLA (USA)
– Representative case study on highly specialized CBMIR
Source: [Hsu et al., 2009]
67. Medical image retrieval systems: examples
• National Biomedical Imaging Archive (NBIA)
– NCI / NIH (USA)
• https://imaging.nci.nih.gov/
– Search based on metadata (DICOM fields)
– 3 search options:
• Simple
• Advanced
• Dynamic
68. Medical image retrieval systems: examples
• ARRS GoldMiner
– American Roentgen Ray Society (USA)
• http://goldminer.arrs.org/
– Query by text
– Results can be filtered by:
• Modality
• Age
• Sex
69. Evaluation: ImageCLEF Medical Image Retrieval
• http://www.imageclef.org/2011/medical
– Dataset: 77,000+ images from articles published in
medical journals including text of the captions and link
to the html of the full text articles.
– 3 types of tasks:
• Modality Classification: given an image, return its modality
• Ad-hoc retrieval: classic medical retrieval task, with 3
“flavors”: textual, mixed and semantic queries
• Case-based retrieval: retrieve cases including images that
might best suit the provided case description.
70. Medical Image Retrieval: promising directions
• Better user interfaces (responsive, highly interactive,
and capable of supporting relevance feedback)
• New applications of CBMIR, including:
– Teaching
– Research
– Diagnosis
– PACS and Electronic Patient Records
• CBMIR evaluation using medical experts
• Integration of local and global features
• New visual descriptors
73. Mobile visual search: driving factors
• Age of mobile computing
http://60secondmarketer.com/blog/2011/10/18/more-mobile-phones-than-toothbrushes/
74. Mobile visual search: driving factors
• Why do I need a camera? I have a smartphone…
(22 Dec 2011)
http://www.cellular-news.com/story/52382.php
75. Mobile visual search: driving factors
• Powerful devices
– 1 GHz ARM Cortex-A9 processor, PowerVR SGX543MP2, Apple A5 chipset
http://www.apple.com/iphone/specs.html
http://www.gsmarena.com/apple_iphone_4s-4212.php
76. Mobile visual search: driving factors
• Powerful devices
http://europe.nokia.com/PRODUCT_METADATA_0/Products/Phones/8000-series/808/Nokia808PureView_Whitepaper.pdf
http://www.nokia.com/fr-fr/produits/mobiles/808/
77. Mobile visual search: driving factors
• Social networks and mobile devices (May 2011)
http://jess3.com/geosocial-universe-2/
78. Mobile visual search: driving factors
• Social networks and mobile devices
– Motivated users: image taking and image sharing are huge!
http://www.onlinemarketing-trends.com/2011/03/facebook-photo-statistics-and-insights.html
79. Mobile visual search: driving factors
• Instagram:
– 50 million registered users (35 M in the last four months)
– 7 employees
– A growing ecosystem based on it!
• Search
• Send postcards
• Manage your photos
• Build a poster
• etc.
– Sold to Facebook (for $1 billion!) earlier this year
http://thenextweb.com/apps/2011/12/07/instagram-hits-15m-users-and-has-2-people-working-on-an-android-app-right-now/
http://www.nuwomb.com/instagram/
80. Mobile visual search: driving factors
• Legitimate (or not quite…) needs and use cases
http://www.slideshare.net/dtunkelang/search-by-sight-google-goggles
https://twitter.com/#!/courtanee/status/14704916575
81. Mobile visual search: driving factors
• A natural use case for CBIR with QBE (at last!)
– The example is right in front of the user!
[FIG1] A snapshot of an outdoor mobile visual search system being used. The system augments the viewfinder with information about the objects it recognizes in the image taken with a camera phone.
Source: Girod et al., IEEE Multimedia, 2011
82. MVS: technical challenges
• How to ensure low latency (and interactive
queries) under constraints such as:
– Network bandwidth
– Computational power
– Battery consumption
• How to achieve robust visual recognition in spite
of low-resolution cameras, varying lighting
conditions, etc.
• How to handle broad and narrow domains
83. MVS: Pipeline for image retrieval
Source: Girod et al., IEEE Multimedia, 2011
85. MVS: descriptor extraction
• Interest point detection
• Feature descriptor computation
Source: Girod et al., IEEE Multimedia, 2011
86. Interest point detection
• Numerous interest-point detectors have been proposed in
the literature:
– Harris Corners (Harris and Stephens 1988)
– Scale-Invariant Feature Transform (SIFT) Difference-of-Gaussian
(DoG) (Lowe 2004)
– Maximally Stable Extremal Regions (MSERs) (Matas et al. 2002)
– Hessian affine (Mikolajczyk et al. 2005)
– Features from Accelerated Segment Test (FAST) (Rosten and
Drummond 2006)
– Hessian blobs (Bay, Tuytelaars and Van Gool 2006)
• Different tradeoffs in repeatability and complexity
• See (Mikolajczyk and Schmid 2005) for a comparative
performance evaluation of local descriptors in a common
framework.
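The oldest detector in the list above, Harris corners, can be sketched in a few lines: build the local structure tensor from image gradients and score each pixel with R = det(M) − k·trace(M)². The tiny synthetic image and the window/k values below are illustrative; real detectors add Gaussian smoothing, non-maximum suppression, and scale handling.

```python
# Minimal, pure-Python sketch of the Harris corner measure
# (Harris and Stephens 1988) on a tiny synthetic image.

def gradients(img):
    """Central-difference gradients of a 2D list of floats."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return gx, gy

def harris_response(img, y, x, win=1, k=0.04):
    """Corner response R = det(M) - k*trace(M)^2 at (y, x)."""
    gx, gy = gradients(img)  # recomputed per call for brevity
    sxx = sxy = syy = 0.0
    for dy in range(-win, win + 1):
        for dx in range(-win, win + 1):
            ix, iy = gx[y + dy][x + dx], gy[y + dy][x + dx]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace

# Synthetic 8x8 image: a bright square whose top-left corner is at (3, 3).
img = [[1.0 if (y >= 3 and x >= 3) else 0.0 for x in range(8)]
       for y in range(8)]

corner = harris_response(img, 3, 3)  # at the corner: large positive R
edge = harris_response(img, 5, 3)    # along an edge: negative R
print(corner > edge)  # True
```

This illustrates the repeatability/complexity tradeoff mentioned above: the measure is cheap, but it fires only where gradients vary in two directions.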
Source: Girod et al., IEEE Signal Processing Magazine, 2011
87. Feature descriptor computation
• After interest-point detection, we compute a
visual word descriptor on a normalized patch.
• Ideally, descriptors should be:
– robust to small distortions in scale, orientation, and
lighting conditions;
– discriminative, i.e., characteristic of an image or a small
set of images;
– compact, due to typical mobile computing constraints.
Source: Girod et al., IEEE Signal Processing Magazine, 2011
88. Feature descriptor computation
• Examples of feature descriptors in the literature:
– SIFT (Lowe 1999)
– Speeded-Up Robust Features (SURF) (Bay et al. 2008)
– Gradient Location and Orientation Histogram (GLOH)
(Mikolajczyk and Schmid 2005)
– Compressed Histogram of Gradients (CHoG)
(Chandrasekhar et al. 2009, 2010)
• See (Winder and Brown, CVPR 2007), (Winder, Hua, and Brown, CVPR 2009), and (Mikolajczyk and Schmid, PAMI 2005) for comparative performance evaluations of different descriptors.
Source: Girod et al., IEEE Signal Processing Magazine, 2011
89. Feature descriptor computation
• What about compactness?
– Option 1: Compress off-the-shelf descriptors.
• Result: poor rate-constrained image-retrieval
performance.
– Option 2: Design a descriptor with compression in mind.
• Example: CHoG (Compressed Histogram of Gradients) (Chandrasekhar et al. 2009, 2010)
Source: Girod et al., IEEE Signal Processing Magazine, 2011
90. CHoG: Compressed Histogram of Gradients
• Pipeline: patch → gradients (dx, dy) → spatial binning → gradient distributions for each bin → histogram compression → CHoG descriptor (a compact bit string).
Source: Chandrasekhar et al., CVPR 2009, 2010; Bernd Girod, “Mobile Visual Search”
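The CHoG-style pipeline just outlined can be sketched without its compression stage: divide the patch into spatial cells, histogram the quantized gradient directions inside each cell, and concatenate the normalized histograms. The grid/bin sizes and the all-horizontal gradient field below are simplifying assumptions; real CHoG uses carefully designed bin layouts and entropy-codes each histogram.

```python
# Simplified, uncompressed sketch of a per-cell gradient histogram
# descriptor in the spirit of CHoG. Toy gradient fields; the
# compression step of the real descriptor is omitted.
import math

def gradient_histogram_descriptor(dx, dy, grid=2, bins=4):
    """Concatenate per-cell histograms of quantized gradient directions."""
    h, w = len(dx), len(dx[0])
    cell_h, cell_w = h // grid, w // grid
    descriptor = []
    for cy in range(grid):
        for cx in range(grid):
            hist = [0] * bins
            for y in range(cy * cell_h, (cy + 1) * cell_h):
                for x in range(cx * cell_w, (cx + 1) * cell_w):
                    angle = math.atan2(dy[y][x], dx[y][x])  # [-pi, pi]
                    b = int((angle + math.pi) / (2 * math.pi) * bins) % bins
                    hist[b] += 1
            total = sum(hist)
            descriptor.extend(v / total for v in hist)  # cell distribution
    return descriptor

# Hypothetical 4x4 gradient fields: every gradient points along +x.
dx = [[1.0] * 4 for _ in range(4)]
dy = [[0.0] * 4 for _ in range(4)]
d = gradient_histogram_descriptor(dx, dy)
print(len(d))  # 16 values: 2x2 spatial cells x 4 direction bins
```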
91. CHoG: Compressed Histogram of Gradients
• Performance evaluation: classification accuracy (%) vs. query size (Kbytes)
– Sending CHoG features matches the accuracy of sending the image as JPEG or as uncompressed SIFT descriptors, at query sizes an order of magnitude smaller.
[Figure 7] Comparison of different schemes with regard to classification accuracy and query size. CHoG descriptor data is an order of magnitude smaller compared to JPEG images or uncompressed SIFT descriptors.
Source: Girod et al., IEEE Multimedia, 2011
92. MVS: feature indexing and matching
• Goal: produce a data structure that can quickly return a short
list of the database candidates most likely to match the query
image.
– The short list may contain false positives as long as the correct match
is included.
– Slower pairwise comparisons can be subsequently performed on just
the short list of candidates rather than the entire database.
• Example of a technique: Vocabulary Tree (VT)-Based Retrieval
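The shortlist idea above can be sketched with a flat codebook: quantize each descriptor to its nearest “visual word,” keep an inverted index from word to image IDs, and rank candidates by shared words. The 2D codebook and descriptors are made-up toy data; a vocabulary tree replaces the linear nearest-centroid scan with a coarse-to-fine tree walk over a much larger vocabulary.

```python
# Minimal sketch of visual-word indexing with an inverted index.
# Codebook and descriptors are invented 2D toy data.
from collections import defaultdict

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # 4 visual words

def quantize(desc):
    """Index of the nearest codebook centroid (linear scan for clarity)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(codebook[i], desc)))

def build_index(database):
    """word -> set of image IDs containing at least one such feature."""
    index = defaultdict(set)
    for image_id, descriptors in database.items():
        for d in descriptors:
            index[quantize(d)].add(image_id)
    return index

def shortlist(index, query_descriptors):
    """Rank database images by the number of query words they share."""
    votes = defaultdict(int)
    for d in query_descriptors:
        for image_id in index[quantize(d)]:
            votes[image_id] += 1
    return sorted(votes, key=votes.get, reverse=True)

database = {
    "imgA": [(0.1, 0.1), (0.9, 0.1)],
    "imgB": [(0.1, 0.9), (0.9, 0.9)],
}
index = build_index(database)
print(shortlist(index, [(0.05, 0.0), (1.0, 0.05)]))  # ['imgA']
```

As the slide notes, false positives on the shortlist are acceptable: the slower pairwise (geometric) comparison that follows only has to process these candidates.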
Source: Girod et al., IEEE Multimedia, 2011
93. MVS: geometric verification
• Goal: use location information of features in
query and database images to confirm that the
feature matches are consistent with a change in
view-point between the two images.
Source: Girod et al., IEEE Multimedia, 2011
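This verification step can be sketched in RANSAC style: hypothesize a transform from a random correspondence, count how many other matches agree, and keep the largest consistent set. For brevity the motion model below is a pure 2D translation (one correspondence per hypothesis) rather than the affine mapping or homography used in practice, and the point correspondences are invented toy data.

```python
# Minimal RANSAC-style sketch of geometric verification with a
# translation-only model. Matches are ((qx, qy), (dx, dy)) toy pairs.
import random

def verify(matches, trials=50, tol=1.0, seed=0):
    """Return the largest set of matches consistent with one translation."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(trials):
        (qx, qy), (mx, my) = rng.choice(matches)  # hypothesis from 1 match
        tx, ty = mx - qx, my - qy
        inliers = [m for m in matches
                   if abs(m[1][0] - m[0][0] - tx) <= tol
                   and abs(m[1][1] - m[0][1] - ty) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# Three matches consistent with a (+10, +5) shift, plus one outlier.
matches = [((0, 0), (10, 5)), ((2, 1), (12, 6)),
           ((5, 5), (15, 10)), ((1, 1), (40, -3))]
print(len(verify(matches)))  # 3 inliers survive verification
```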
94. MVS: geometric verification
• Method: perform pairwise matching of feature descriptors and evaluate the geometric consistency of the correspondences.
• Techniques:
– The geometric transform between the query and database image is usually estimated using robust regression techniques such as:
• Random sample consensus (RANSAC) (Fischler and Bolles 1981)
• Hough transform (Lowe 2004)
– The transformation is often represented by an affine mapping or a homography.
• Note: GV is computationally expensive, which is why it is only applied to a subset of images selected during the feature-matching stage.
[FIG4] In the GV step, we match feature descriptors pairwise and find feature correspondences that are consistent with a geometric transformation.
Source: Girod et al., IEEE Multimedia, 2011
95. Datasets for MVS research
• Stanford Mobile Visual Search Data Set
(http://web.cs.wpi.edu/~claypool/mmsys-dataset/2011/stanford/)
– Key characteristics:
• rigid objects
• widely varying lighting conditions
• perspective distortion
• foreground and background clutter
• realistic ground-truth reference data
• query data collected from heterogeneous low and high-end
camera phones.
Source: Chandrasekhar et al., ACM MMSys 2011
96. SMVS Data Set: categories and examples
• DVD covers
http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/dvd_covers.html
97. SMVS Data Set: categories and examples
• CD covers
http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/cd_covers.html
98. SMVS Data Set: categories and examples
• Museum paintings
http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/museum_paintings.html
99. Other MVS data sets
Source: ISO/IEC JTC1/SC29/WG11/N12202, July 2011, Torino, IT
100. MPEG Compact Descriptors for Visual Search (CDVS)
• Objective
– Define a standard that enables efficient
implementation of visual search functionality on mobile
devices
• Scope
– bitstream of descriptors
– parts of the descriptor extraction process (e.g., key-point detection) needed to ensure interoperability
– Additional info:
• https://mailhost.tnt.uni-hannover.de/mailman/listinfo/cdvs
• http://mpeg.chiariglione.org/meetings/geneva11-1/geneva_ahg.htm (Ad hoc groups)
Source: Bober, Cordara, and Reznik (2010)
101. MPEG CDVS
• Summarized timeline (Table 1. Timeline for development of the MPEG standard for visual search):
– March 2011: Call for Proposals published (registration deadline: 11 July 2011; proposals due: 21 November 2011)
– December 2011: Evaluation of proposals
– February 2012: 1st Working Draft – first specification and test software model that can be used for subsequent improvements
– July 2012: Committee Draft – essentially complete and stabilized specification
– January 2013: Draft International Standard – complete specification; only minor editorial changes allowed after DIS
– July 2013: Final Draft International Standard – finalized specification, submitted for approval and publication as an International Standard
Source: Girod et al., IEEE Multimedia, 2011
102. Examples
• Google Goggles
• SnapTell
• oMoby (and the IQ Engines API)
• pixlinQ
• Moodstocks
103. Examples of commercial MVS apps
• Google Goggles
– Android and iPhone
– Narrow-domain search and retrieval
http://www.google.com/mobile/goggles
104. SnapTell
• One of the earliest (ca. 2008) MVS apps for iPhone
– Eventually acquired by Amazon (A9)
• Proprietary technique (“highly accurate and robust algorithm for image matching: Accumulated Signed Gradient (ASG)”).
http://www.snaptell.com/technology/index.htm
105. oMoby (and the IQ Engines API)
– iPhone app
http://omoby.com/pages/screenshots.php
106. oMoby (and the IQ Engines API)
• The IQ Engines API: “vision as a service”
http://www.iqengines.com/applications.php
107. pixlinQ
• A “mobile visual search solution that enables you to link users to digital content whenever they take a mobile picture of your printed materials.”
– Powered by image recognition from LTU technologies
http://www.pixlinq.com/home
108. pixlinQ
• Example app (La Redoute)
http://www.youtube.com/watch?v=qUZCFtc42Q4
109. Moodstocks: overview
• Offline image recognition thanks to smart image-signature synchronization
http://www.youtube.com/watch?v=tsxe23b12eU
110. Moodstocks: technology
• Unique features:
– offline image recognition thanks to smart image-signature synchronization,
– QR Code decoding,
– EAN 8/13 decoding,
– online image recognition as a fallback for very large image databases,
– simultaneous run of image recognition and barcode decoding,
– seamless scans logging in the background.
• Cross-platform (iOS / Android) client-side SDK and HTTP API
available: https://github.com/Moodstocks
• JPEG encoder used within their SDK also publicly
available: https://github.com/Moodstocks/jpec
111. Moodstocks
• Many successful apps for different platforms
http://www.moodstocks.com/gallery/
112. MVS: concluding thoughts
• Mobile Visual Search (MVS) is coming of age.
• This is not a fad and it can only grow.
• Still a good research topic
– Many relevant technical challenges
– MPEG efforts have just started
• Infinite creative commercial possibilities
114. Where is image search headed?
• Advice for [young] researchers
– In this last part, I’ve compiled bits and pieces of advice that I believe might help researchers who are entering the field.
– They focus on research avenues that I personally
consider to be the most promising.
115. Advice for [young] researchers
• LOOK
• THINK
• UNDERSTAND
• CREATE
116. Advice for [young] researchers
• LOOK…
– at yourself (how do you search for images and videos?)
– around (related areas and how they have grown)
– at Google (and other major players)
117. Advice for [young] researchers
• THINK…
– mobile devices
– new devices and services
– social networks
– games
118. Advice for [young] researchers
• UNDERSTAND…
– human intentions and emotions
– the context of the search
– user’s preferences and needs
119. Advice for [young] researchers
• CREATE…
– better interfaces
– better user experience
– new business opportunities (added value)
120. Concluding thoughts
– I believe (but cannot prove…) that successful VIR
solutions will:
• combine content-based image retrieval (CBIR) with
metadata (high-level semantic-based image retrieval)
• only be truly successful in narrow domains
• include the user in the loop
– Relevance Feedback (RF)
– Collaborative efforts (tagging, rating, annotating)
• provide friendly, intuitive interfaces
• incorporate results and insights from cognitive science,
particularly human visual attention, perception, and
memory
125. Concluding thoughts
• “Image search and retrieval” is not a problem, but
rather a collection of related problems that look like
one.
• There is a great need for good solutions to specific
problems.
• 10 years after “the end of the early years”, research in
visual information retrieval still has many open
problems, challenges, and opportunities.