This presentation shares experiences and selected talks from the International Computer Vision Summer School (ICVSS 2011), attended by Angel Cruz and Andrea Rueda of the BioIngenium Research Group at Universidad Nacional de Colombia.
This document discusses a motion graphic exploration project created by Kim Chambers. It contains images and a final video from the motion graphic exploration project. The project involved creating motion graphics and compiling them into a final video presentation.
This document summarizes a tutorial on large-scale visual recognition. It begins by outlining common visual recognition tasks like image retrieval and classification. It then describes several benchmark datasets used for these tasks and how they have increased in scale over time. The document discusses how approaches to collecting large classification datasets have evolved from relying on manual post-processing to using automated methods or crowdsourcing. It notes a convergence between techniques originally developed for image retrieval and classification. The goals of the tutorial are described as providing tools to handle large-scale datasets and showing this convergence between retrieval and classification approaches.
This document discusses techniques for instance search using convolutional neural network features. It presents two papers by the author on this topic. The first paper uses bags-of-visual-words to encode convolutional features for scalable instance search. The second paper explores using region-level features from Faster R-CNN models for instance search and compares different fine-tuning strategies. The document outlines the methodology, experiments on standard datasets, and conclusions from both papers.
This document summarizes a project on using multiple low-resolution images to generate a single high-resolution image using super-resolution techniques. It describes registering the low-resolution images using motion estimation to align them, then projecting the images onto a high-resolution grid using interpolation methods like the Papoulis-Gerchberg algorithm. The document presents results on synthetic and real-world images, showing the super-resolution technique can generate higher quality images than interpolation, but requires accurate registration of the low-resolution inputs.
This document discusses the design of benchmark imagery for validating facility annotation algorithms. It notes that previous benchmarks are inadequate and that developing benchmarks requires considering many intrinsic and extrinsic factors. The authors propose using real imagery annotated by experts, composite imagery blending models into real scenes, and synthetic imagery. Creating these benchmarks comes with costs for data acquisition, modeling, rendering, and expert annotation. Future work includes automated synthetic facility generation and developing representations for encoding facility knowledge.
This document discusses the design of benchmark imagery for validating facility annotation algorithms. It notes that previous benchmarks are inadequate and that developing benchmarks requires considering many intrinsic and extrinsic factors. The authors propose using real imagery annotated by experts, composite imagery blending models into real scenes, and synthetic imagery. Creating these benchmarks comes with costs for data acquisition, modeling, rendering, and expert annotation. Future work includes developing ontologies to represent facility components and automated generation of synthetic facilities. The document emphasizes that designing comprehensive benchmarks for geospatial algorithm validation is challenging due to the many factors involved and the complexity of real-world scenes.
1) The document discusses using data in deep learning models, including understanding the limitations of data and how it is acquired.
2) It describes techniques for image matching using multi-view geometry, including finding corresponding points across images and triangulating them to determine camera pose.
3) Recent works aim to improve localization of objects in images using multiple instance learning approaches that can learn without full supervision or through more stable optimization methods like linearizing sampling operations.
ICCV2009 recognition and learning object categories p2 c03 - objects and an... (zukun)
The document discusses research at the intersection of vision and language processing in the human brain. It describes how different areas of the brain are involved in processing vision and language, including areas responsible for object recognition (LOC) and face recognition (FFA). It also discusses early work using simple images to understand how humans can quickly summarize a visual scene in a sentence after only brief exposures.
Searching Images: Recent research at Southampton (Jonathon Hare)
Intelligence, Agents, Multimedia Seminar series. University of Southampton. 7th March 2011.
Southampton has a long history of research in the areas of multimedia information analysis. This talk will focus on some of the recent work we have been involved with in the area of image search. The talk will start by looking at how image content can be represented in ways analogous to textual information and how techniques developed for indexing text can be adapted to images. In particular, the talk will introduce ImageTerrier, a research platform for image retrieval that is built around the University of Glasgow's Terrier text retrieval software. The talk will also cover some of our recent work on image classification and image search result diversification.
This document discusses camera models and image formation. It begins by describing the pinhole camera model and how a pinhole creates an image by blocking most light rays. A lens is then introduced to focus light rays onto film. Projection using the pinhole camera is modeled mathematically using homogeneous coordinates. Perspective projection is introduced, along with the camera projection matrix which models the camera's intrinsics and extrinsics. Distortion effects like radial distortion are also covered.
Searching Images: Recent research at Southampton (Jonathon Hare)
Knowledge Media Institute seminar series. The Open University. 23rd March 2011.
Southampton has a long history of research in the areas of multimedia information analysis. This talk will focus on some of the recent work we have been involved with in the area of image search. The talk will start by looking at how image content can be represented in ways analogous to textual information and how techniques developed for indexing text can be adapted to images. In particular, the talk will introduce ImageTerrier, a research platform for image retrieval that is built around the University of Glasgow's Terrier text retrieval software. The talk will also cover some of our recent work on image classification and image search result diversification.
This document provides an introduction to visual search using neural networks. It discusses using a triplet loss function to learn an embedding space where similar images are closer together. The document outlines training a convolutional neural network (CNN) on triplets of images to learn the embedding function. It also discusses approaches for visual search with limited resources, such as fine-tuning a pre-trained CNN like Inception-V3 and evaluating nearest neighbors in the embedding space.
Modelling User Interaction utilising Information Foraging Theory (and a bit o... (Ingo Frommholz)
The document discusses modelling user interaction with information using Information Foraging Theory and quantum theory. It summarizes research applying IFT to content-based image recommendation and query auto-completion in image search. A quantum-inspired model is proposed to represent user interaction as a state change in a Hilbert space, where subspaces represent queries, images, and image patches. User feedback projects the information need vector onto relevant subspaces, updating probabilities. This provides a framework for multimodal query auto-completion accounting for user interaction.
This document discusses using large image datasets and context to understand scenes and objects. It proposes using millions of internet images to generate proposals for image completion and labeling based on nearest visual neighbors. Location metadata from geotagged images can provide context without object labels. Event prediction and video synthesis is demonstrated by retrieving relevant images from large collections to construct new videos based on a text query. Overall it argues that large internet-scale image collections provide rich context that can be leveraged for computer vision tasks through data-driven approaches rather than explicit modeling.
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
Information Visualisation (Multimedia 2009 course) (Joris Klerkx)
This document summarizes research on information visualization techniques for exploring abstract data. It discusses how visualization can amplify cognition by using interactive visual representations. Several techniques are examined, including tree maps and node-link graphs for displaying hierarchical and network data. Case studies describe prototypes that apply these techniques to domains like learning object repositories and social bookmarking data. The prototypes are evaluated through expert reviews and user studies to assess their effectiveness and usability. Pointers to related libraries and readings are also provided.
Blind Verification of Digital Image Originality: A Statistical Approach (Lar21)
This paper presents a statistical approach to handling information noise in databases of unguaranteed images. In other words, the approach can identify which image fingerprints belong to a given device and which fingerprints were generated by software, i.e., whether the image was modified by software. Accordingly, the approach can determine which images are original, a critical task for forensic methods that require large-scale collections of reliable data.
The document describes the creation of a logo recognition system using two approaches: a bag of features model and convolutional neural networks. It outlines extracting features from images using SURF, clustering the features into visual words with k-means, building an inverted file index for classification, and re-ranking results. It also discusses using pre-trained CNN models like AlexNet and GoogLeNet for fine-tuning to the logo datasets. The goal is to build an app that can recognize logos in images and provide company information.
The document provides an overview of deep learning and reinforcement learning. It discusses the current state of artificial intelligence and machine learning, including how deep learning algorithms have achieved human-level performance in various tasks such as image recognition and generation. Reinforcement learning is introduced as learning through trial-and-error interactions with an environment to maximize rewards. Examples are given of reinforcement learning algorithms solving tasks like playing Atari games.
CVML2011: human action recognition (Ivan Laptev) (zukun)
This document provides an overview of a lecture on human action recognition. It discusses the historic motivation for studying human motion from early studies in art and biomechanics to modern applications in motion capture and video editing. It also covers challenges in human pose estimation and recent advances in appearance-based, motion-based, and space-time methods for recognizing human actions in images and videos. The lecture focuses on key approaches like pictorial structures, motion history images, and space-time features.
Speaker: Kwang Moo Yi (Professor, University of Victoria)
Date: July 2018
Local features are one of the core building blocks of Computer Vision, used in various tasks such as Image Retrieval, Visual Tracking, Image Registration, and Image Matching. Especially for geometric applications, that is, finding the pose of the camera from images, they remain the state of the art, even in the era of Deep Learning.
In this talk, I will introduce our recent works on using local features, as well as a recent self-supervised pipeline that learns, from scratch, the keypoints used to match images. I will first talk about how to learn to find good correspondences, then about how to train through losses based on eigendecomposition, which is essential when retrieving the camera pose. I will also introduce our latest local feature pipeline, which is self-supervised and inspired by reinforcement learning.
Fake It While We Make It (Data-Driven Prototyping) (Ryan LaBouve)
“Can you make this?” Our manager slides over a screenshot from Tron/Oblivion/Any Sci-fi movie with a sticky note that says “our data here” and pitches us an idea.
We are stuck between a cool idea and tons of unknowns. We know that the availability, accuracy, and nature of our data will directly affect the quality and outcome of our project. So how do we begin developing? We fake it while we make it.
This is what the talk is about: the importance of developing with fake data, the types of data we can fake, and some useful strategies for getting up and running with fake data.
http://www.okcruby.org/blog/2015/02/05/february-2015-meeting/
Spot the Dog: An overview of semantic retrieval of unannotated images in the ... (Jonathon Hare)
This document discusses using computational techniques to semantically retrieve unannotated images by enabling textual search of imagery without metadata. It describes:
1) Using exemplar image/metadata pairs to learn relationships between visual features and metadata, then projecting this to retrieve unannotated images.
2) Representing images as "visual terms" like words in text.
3) Creating a multidimensional "semantic space" where related images, terms and keywords are placed closely together based on training. This allows retrieving unannotated images that lie near descriptive keywords.
4) Experimental retrieval results on a Corel dataset, showing the approach works better for keywords associated with colors than for others. The approach shows progress, but significant challenges remain.
This document describes a content-based video retrieval system. It discusses how the system works in two phases: a database population phase that detects shot boundaries, selects key frames, and extracts low-level features; and an image retrieval phase that allows querying by example, color histograms, shape histograms, and combined categories. Some challenges are also noted, such as handling multi-layer images and improving human-guided relevance feedback. The overall goal of the system is to enable more accurate searching of growing video archives online through analysis of visual content rather than just text descriptions.
ICVSS2011 Selected Presentations
1. ICVSS 2011 Steven Seitz Lorenzo Torresani Guillermo Sapiro Shmuel Peleg
ICVSS 2011: Selected Presentations
Angel Cruz and Andrea Rueda
BioIngenium Research Group, Universidad Nacional de Colombia
August 25, 2011
2. Outline
1 ICVSS 2011
2 A Trillion Photos - Steven Seitz
3 Efficient Novel Class Recognition and Search - Lorenzo Torresani
4 The Life of Structured Learned Dictionaries - Guillermo Sapiro
5 Image Rearrangement & Video Synopsis - Shmuel Peleg
4. ICVSS 2011
International Computer Vision Summer School
15 speakers, from the USA, France, the UK, Italy, the Czech Republic (Prague), and Israel
5. ICVSS 2011
International Computer Vision Summer School
[image-only slide]
6. ICVSS 2011
International Computer Vision Summer School
[image-only slide]
8. A Trillion Photos
Steve Seitz
University of Washington
Google
Sicily Computer Vision Summer School
July 11, 2011
9. Facebook
>3 billion photos uploaded each month
~1 trillion photos taken each year
10. What do you do with a trillion photos?
Digital Shoebox
(hard drives, iPhoto, Facebook...)
29. Reconstructing Rome
In a day...
From ~1M images
Using ~1000 cores
Sameer Agarwal, Noah Snavely, Rick Szeliski, Steve Seitz
http://grail.cs.washington.edu/rome
49. Problem statement: novel object-class search
• Given: an image database (e.g., 1 million photos) + user-provided images of an object class
• Want: database images of this class
• Constraints: no text/tags available; the query images may represent a novel class
50. Application: Web-powered visual search in unlabeled personal photos
Goal: find "soccer camp" pictures on my computer
1. Search the Web for images of "soccer camp"
2. Find images of this visual class on my computer
52. Relation to other tasks
novel class search vs. image retrieval and object categorization
analogies (with image retrieval): large databases, efficient indexing, compact representation
differences: simple notions of visual relevancy (e.g., near-duplicate, same object instance, same spatial layout)
[Figure residue: RBM predicted-label examples and retrieval results from [Nister and Stewenius, '07], [Philbin et al., '07], and [Torralba et al., '08], with captions on retrieval performance over a 6376-image ground-truth database and on the percentage of query images reaching the top of the ranking]
53. Relation to other tasks
Novel class search sits between image retrieval and object classification.
analogies with image retrieval: large databases; efficient indexing; compact representation
differences from image retrieval: simple notions of visual relevancy (e.g., near-duplicate, same object instance, same spatial layout)
analogies with object classification: recognition of object classes from a few examples
differences from object classification: classes to recognize are defined a priori; training and recognition time is unimportant; storage of features is not an issue
54. Technical requirements of novel-class search
• The object classifier must be learned on the fly from few examples
• Recognition in the database must have low computational cost
• Image descriptors must be compact to allow storage in memory
55. State-of-the-art in object classification
Winning recipe: many features + non-linear classifiers (e.g., [Gehler and Nowozin, CVPR'09])
[Figure: multiple feature channels combined by a non-linear decision boundary]
56. Model evaluation on Caltech256
[Plot: accuracy (%) vs. number of training examples (0-30) for linear models on individual features: gist, phog, phog2pi, ssim, bow5000]
57. Model evaluation on Caltech256
[Plot: as above, adding a linear combination of the features, which outperforms every individual feature]
58. Model evaluation on Caltech256
[Plot: as above, adding a non-linear feature combination (LP-β multiple-kernel learner, [Gehler and Nowozin '09]), which performs best]
59. Multiple kernel combiners
Classification output is obtained by combining many features via non-linear kernels:
h(x) = Σ_{f=1}^{F} β_f Σ_{n=1}^{N} k_f(x, x_n) α_n + b
where the outer sum runs over features and the inner sum over training examples.
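To make the formula concrete, here is a minimal numpy sketch of evaluating such a combiner on one test image. The RBF kernel and all names are illustrative assumptions; the slides do not prescribe a particular kernel.

```python
import numpy as np

def rbf_kernel(x, X, gamma=1.0):
    """k(x, x_n) = exp(-gamma * ||x - x_n||^2) for every training example."""
    return np.exp(-gamma * np.sum((X - x) ** 2, axis=1))

def kernel_combiner(x_feats, train_feats, alpha, beta, b, gamma=1.0):
    """h(x) = sum_f beta_f * sum_n k_f(x, x_n) * alpha_n + b.
    x_feats[f] is the f-th feature vector of the test image;
    train_feats[f] is an (N, d_f) array of that feature for the N
    training examples. alpha (N,) is shared across kernels, as on
    the slide; beta (F,) holds the per-kernel weights."""
    h = b
    for xf, Xf, bf in zip(x_feats, train_feats, beta):
        h += bf * (rbf_kernel(xf, Xf, gamma) @ alpha)
    return h
```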
60. Methods: Multiple Kernel Learning (MKL)
Another approach is to perform kernel combination during the training phase of the algorithm [Bach et al., 2004; Sonnenburg et al., 2006; Varma and Ray, 2007], where each kernel k_f measures similarity with respect to a single image feature. One prominent instance of this class is MKL: learn a non-linear SVM by jointly optimizing over a linear combination of kernels
k*(x, x') = Σ_{f=1}^{F} β_f k_f(x, x')
and the SVM parameters α ∈ R^N and b ∈ R. In order to obtain sparse, interpretable coefficients, MKL restricts β_f ≥ 0 and imposes the constraint Σ_{f=1}^{F} β_f = 1. The objective function can be written as
min_{α, β, b} (1/2) Σ_{f=1}^{F} β_f α^T K_f α + C Σ_{n=1}^{N} L(y_n, b + Σ_{f=1}^{F} β_f K_f(x_n)^T α)
subject to Σ_{f=1}^{F} β_f = 1 and β_f ≥ 0 for f = 1, ..., F, where L(y, t) = max(0, 1 - yt) is the hinge loss and K_f(x) = [k_f(x, x_1), k_f(x, x_2), ..., k_f(x, x_N)]^T.
61. LP-β: a two-stage approach to MKL [Gehler and Nowozin, 2009]
• Classification output of traditional MKL:
h_MKL(x) = Σ_{f=1}^{F} β_f Σ_{n=1}^{N} k_f(x, x_n) α_n + b
• Classification function of LP-β:
h(x) = Σ_{f=1}^{F} β_f h_f(x), with h_f(x) = Σ_{n=1}^{N} k_f(x, x_n) α_{fn} + b_f
Two-stage training procedure (a code sketch follows below):
1. train each h_f(x) independently → traditional SVM learning
2. optimize over β → a simple linear program
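The second stage can indeed be posed as a small linear program. The sketch below is a hedged reconstruction, not necessarily Gehler and Nowozin's exact formulation: it picks β ≥ 0 with Σ_f β_f = 1 to minimize the average hinge loss of the combined score on held-out data, which becomes linear once slack variables are introduced; scipy.optimize.linprog and all variable names are my own scaffolding.

```python
import numpy as np
from scipy.optimize import linprog

def fit_beta(H, y):
    """Stage 2 of LP-beta (sketch). H is (M, F): H[m, f] = h_f(x_m), the
    score of the f-th per-feature SVM on validation example m; y in {-1, +1}.
    Choose beta >= 0, sum(beta) = 1 minimizing the mean hinge loss of the
    combined score sum_f beta_f * h_f(x), written as an LP with slacks xi.
    LP variables: z = [beta (F), xi (M)]."""
    M, F = H.shape
    c = np.concatenate([np.zeros(F), np.ones(M) / M])       # minimize mean xi
    # hinge: xi_m >= 1 - y_m * (H[m] @ beta)
    #   <=>  -y_m * (H[m] @ beta) - xi_m <= -1
    A_ub = np.hstack([-(y[:, None] * H), -np.eye(M)])
    b_ub = -np.ones(M)
    A_eq = np.concatenate([np.ones(F), np.zeros(M)])[None]  # sum(beta) = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (F + M))
    return res.x[:F]                                        # kernel weights
```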
62. LP-β for novel-class search?
The LP-β classifier:
h(x) = Σ_{f=1}^{F} β_f (Σ_{n=1}^{N} k_f(x, x_n) α_{fn} + b_f)
(sum over features, sum over training examples)
Unsuitable for our needs due to:
• large storage requirements (typically over 20K bytes/image)
• costly evaluation (requires query-time kernel distance computation for each test image)
• costly training (1+ minute for O(10) training examples)
63. Classemes: a compact descriptor for efficient recognition [Torresani et al., 2010]
Key idea: represent each image x in terms of its "closeness" to a set of basis classes ("classemes"):
Φ(x) = [φ_1(x), ..., φ_C(x)]^T
φ_c(x) = h_classeme_c(x) = Σ_{f=1}^{F} β_f^c Σ_{n=1}^{N} k_f(x, x_n^c) α_n^c + b^c
i.e., the output of a pre-learned LP-β classifier for the c-th basis class.
Query-time learning: train a linear classifier on Φ(x) from the training examples of the novel class, over the database descriptors Φ(x_1), ..., Φ(x_N):
g_duck(Φ(x); w^duck) = Φ(x)^T w^duck = Σ_{c=1}^{C} w_c^duck (Σ_{f=1}^{F} β_f^c Σ_{n=1}^{N} k_f(x, x_n^c) α_n^c + b^c)
(the inner LP-β classifiers are trained before the creation of the database; only w^duck is trained at query time)
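A minimal sketch of the query-time step, assuming the classeme scores Φ have been precomputed offline for every image (the expensive LP-β evaluations never run at query time). The use of scikit-learn's LinearSVC and the negative-sampling choice are my assumptions, not details from the slides.

```python
import numpy as np
from sklearn.svm import LinearSVC

def novel_class_search(Phi_db, Phi_pos, Phi_neg, top_k=25):
    """Phi_db: (N, C) precomputed classeme descriptors of the database.
    Phi_pos / Phi_neg: classeme vectors of the few query-class examples
    and of some generic negatives (e.g., random database images)."""
    X = np.vstack([Phi_pos, Phi_neg])
    y = np.concatenate([np.ones(len(Phi_pos)), -np.ones(len(Phi_neg))])
    clf = LinearSVC(C=1.0).fit(X, y)        # learned on the fly, in well under a second
    scores = Phi_db @ clf.coef_.ravel() + clf.intercept_[0]
    return np.argsort(-scores)[:top_k]      # indices of top-ranked images
```

On precomputed descriptors, the whole query (training plus scoring) reduces to a couple of matrix operations, which is what makes on-the-fly class learning feasible.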
64. How this works...
• Accurate: Table 1 of Efficient Object Category Recognition Using Classemes lists the five classemes with the highest LP-β weights for the retrieval experiment, for a selection of Caltech 256 categories.
• Semantic labels are not required to make semantic sense: the goal is simply a useful feature vector, not semantic labeling; detectors may instead capture specific patterns of texture, color, shape, etc. The somewhat peculiar classeme labels reflect the ontology used as a source of base categories.
[Table 1: example categories with their five highest-weighted classemes; the entries are garbled in the transcript]
Large-scale recognition benefits from a compact descriptor for each image, for example allowing databases to be stored in memory rather than on disk.
65. Related work
• Attribute-based recognition [Lampert et al., CVPR'09; Farhadi et al., CVPR'09]
[Figure residue from Lampert et al.: animal classes (otter, polar bear, zebra) described by high-level attributes such as black, white, brown, stripes, water, eats fish; a description by high-level attributes allows transfer of knowledge between object categories, so classes without training images can be detected based on which attribute description a test image fits best]
• requires hand-specified attribute-class associations
• attribute classifiers must be trained with human-labeled examples
66. Method overview
1. Classeme learning: train one classifier per basis class, e.g. φ_"body of water"(x), ..., φ_"walking"(x)
2. Using the classemes for recognition and retrieval: from training examples of the novel class, learn g_duck(Φ(x)) = Σ_{c=1}^{C} w_c^duck φ_c(x) over the database descriptors Φ(x_1), ..., Φ(x_N)
67. Classeme learning: choosing the basis classes
• Classeme label desiderata: they must be visual concepts, and they should span the entire space of visual classes
• Our selection: concepts defined in the Large Scale Ontology for Multimedia [LSCOM] to be "useful, observable and feasible for automatic detection": 2659 classeme labels, after manual elimination of plurals, near-duplicates, and inappropriate concepts
68. Classeme learning: gathering the training data
• We downloaded the top 150 images returned by Bing Images for each classeme label
• For each of the 2659 classemes, a one-versus-the-rest training set was formed to learn a binary classifier φ_"walking"(x) (yes / no)
69. Classeme learning: training the classifiers
• Each classeme classifier is an LP-β kernel combiner [Gehler and Nowozin, 2009]:
φ(x) = Σ_{f=1}^{F} β_f Σ_{n=1}^{N} k_f(x, x_n) α_{f,n} + b_f
(a linear combination of feature-specific SVMs)
• We use 13 kernels based on spatial pyramid histograms computed from the following features:
- color GIST [Oliva and Torralba, 2001]
- oriented gradients [Dalal and Triggs, 2005]
- self-similarity descriptors [Shechtman and Irani, 2007]
- SIFT [Lowe, 2004]
70. A dimensionality reduction view of classemes
The raw feature vector x (color GIST, self-similarity descriptors, oriented gradients, SIFT) is mapped by Φ to [φ_1(x), ..., φ_2659(x)]^T.
Raw features x: non-linear kernels are needed for good classification; 23K bytes/image
Classemes Φ(x): near state-of-the-art accuracy with linear classifiers; can be quantized down to 200 bytes/image with almost no recognition loss
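The quantization claim can be illustrated with a one-bit-per-dimension binarization, matching the (Φ(x) > 0) "binarized classemes" variant used in the experiments below. This sketch packs 2659 bits into 333 bytes; reaching ~200 bytes/image would additionally require dropping dimensions, which is assumed away here.

```python
import numpy as np

def binarize_classemes(Phi):
    """One bit per classeme dimension: 2659 floats -> 333 packed bytes.
    Thresholding at 0 follows the binarized (Phi(x) > 0) variant."""
    return np.packbits((Phi > 0).astype(np.uint8), axis=1)

def hamming_similarity(packed_query, packed_db):
    """Similarity via XOR + popcount: higher = fewer differing bits."""
    xor = np.bitwise_xor(packed_db, packed_query[None, :])
    return -np.unpackbits(xor, axis=1).sum(axis=1)
```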
71. Experiment 1: multiclass recognition on Caltech256
[Plot: accuracy (%) vs. number of training examples (0-50) for:
- LPbeta: LP-β as in [Gehler and Nowozin, 2009], using 39 kernels
- LPbeta13: LP-β with our 13 kernels on the raw features x
- MKL
- Csvm (our approach): linear SVM on the classemes Φ(x)
- Cq1svm: linear SVM on binarized classemes, i.e. (Φ(x) > 0)
- Xsvm: linear SVM on the raw features x]
72. Computational cost comparison
[Bar charts comparing LPbeta and Csvm: training time (LPbeta: 23 hours; Csvm: 9 minutes) and per-image testing time in milliseconds]
73. Accuracy vs. compactness
[Plot: compactness (images per MB, log scale) vs. accuracy (%), for:
- Cq1svm: 188 bytes/image
- Csvm: 2.5K bytes/image
- LPbeta13: 23K bytes/image
- nbnn [Boiman et al., 2008]: 128K bytes/image
- emk [Bo and Sminchisescu, 2008]
- Xsvm
Lines link performance at 15 and 30 training examples.]
74. Experiment 2: object class retrieval
[Fig. 4 of Efficient Object Category Recognition Using Classemes: precision (%) @ 25 vs. number of training images (0-50) for Csvm, Cq1Rocchio (β=1, γ=0), Cq1Rocchio (β=0.75, γ=0.15), Bowsvm, BowRocchio (β=1, γ=0), BowRocchio (β=0.75, γ=0.15); percentage of the top 25 in a 6400-document set which match the query class]
• Random performance is 0.4%
• Training Csvm takes 0.6 sec with 5*256 training examples
75. Analogies with text retrieval
• Classeme representation of an image:
presence/absence of visual attributes
• Bag-of-words representation of a text document:
presence/absence of words
76. Related work
• Prior work (e.g., [Sivic Zisserman, 2003; Nister Stewenius, 2006;
Philbin et al., 2007]) has exploited a similar analogy for
object-instance retrieval by representing images as bags of visual words
[Figure: the bag-of-visual-words pipeline — detect interest patches, compute SIFT descriptors [Lowe, 2004], quantize the descriptors into codewords, and represent the image as a sparse histogram of visual word frequencies (sketched after this slide)]
• To extend this methodology to object-class retrieval we need:
- to use a representation more suited to object class recognition
(e.g. classemes as opposed to bag of visual words)
- to train the ranking/retrieval function for every new query-class
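For reference, a compact sketch of the quantization/histogram steps of the bag-of-visual-words pipeline above, assuming local SIFT descriptors and a k-means codebook have already been computed:

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize local descriptors (e.g. SIFT, (M, 128)) against a k-means
    codebook ((K, 128) visual words) and build a normalized histogram."""
    # squared distance from every descriptor to every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)                      # nearest codeword per patch
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)             # sparse frequency histogram
```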
84. Efficient retrieval via
inverted index
Inverted index:
w: [1.5  −2  0  −5  0  3  −2  0]
     f0   f1  f2  f3  f4  f5  f6  f7
Each feature fi is associated with an inverted list of the images whose classeme vector is non-zero in that dimension (e.g. f0 → I0, I2, I3, I4, I6, I8). Scoring a query traverses only the lists of the features with non-zero weight, so the zero entries of w (here f2, f4, f7) cost nothing.
Cost of scoring is linear in the sum of the lengths of the inverted lists associated with the non-zero weights. A sketch follows.
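A small Python sketch of such an inverted index (illustrative, not the implementation behind these slides): images are stored as sparse classeme vectors, and scoring touches only the lists of features carrying non-zero weight.

```python
from collections import defaultdict

def build_inverted_index(database):
    """database: dict image_id -> sparse classeme vector {feature: value}.
    The inverted index maps each feature to the images where it is non-zero."""
    index = defaultdict(list)
    for image_id, features in database.items():
        for f, value in features.items():
            index[f].append((image_id, value))
    return index

def score_query(w, index):
    """Score every image touched by a non-zero weight of w; cost is linear
    in the total length of the traversed inverted lists."""
    scores = defaultdict(float)
    for f, weight in w.items():
        if weight == 0.0:
            continue                  # zero weights skip their lists entirely
        for image_id, value in index[f]:
            scores[image_id] += weight * value
    return sorted(scores.items(), key=lambda kv: -kv[1])
```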
85. Improve efficiency via
sparse weight vectors
Key idea: force w to contain as many zeros as possible
Learning objective (Φn = classeme vector of example n, yn = label of example n):
E(w) = R(w) + (C/N) Σ_{n=1}^{N} L(w; Φn, yn)
where R is the regularizer and L the loss function
• L2-SVM: R(w) = wᵀw, L(w; Φn, yn) = max(0, 1 − yn wᵀΦn)
• Since |wi| ≤ wi² for large wi and |wi| ≥ wi² for small wi, choosing R(w) = Σi |wi| will tend to produce a small number of larger weights and more zero weights
[Figures: the penalties |w| vs. w² as a function of w; a tomographic inversion with wavelet penalization, comparing the solution w with d = AWᵀw of smallest ℓ2-norm against that of smallest ℓ1-norm; the ℓ2-ball w1² + w2² = constant vs. the ℓ1-ball |w1| + |w2| = constant]
86. Improve efficiency via
sparse weight vectors
Key idea: force w to contain as many zeros as possible
Learning objective (Φn = classeme vector of example n, yn = label of example n):
E(w) = R(w) + (C/N) Σ_{n=1}^{N} L(w; Φn, yn)
where R is the regularizer and L the loss function
• L2-SVM: R(w) = wᵀw, L(w; Φn, yn) = max(0, 1 − yn wᵀΦn)
• L1-LR: R(w) = Σi |wi|, L(w; Φn, yn) = log(1 + exp(−yn wᵀΦn))
• FGM (Feature Generating Machine) [Tan et al., 2010]: R(w) = wᵀw, L(w; Φn, yn) = max(0, 1 − yn (w ⊙ d)ᵀΦn), s.t. 1ᵀd ≤ B, d ∈ {0, 1}^D, where ⊙ is the elementwise product
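As a quick illustration of how the ℓ1 penalty induces sparsity, this sketch trains the L1-LR objective on synthetic data with scikit-learn; note that scikit-learn's C is an inverse regularization strength, playing the role of C in the objective above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for classeme descriptors: Phi is (N, D), y in {-1, +1}
rng = np.random.default_rng(0)
Phi = rng.standard_normal((200, 2659))
y = np.where(Phi[:, 0] + 0.1 * rng.standard_normal(200) > 0, 1, -1)

# L1-regularized logistic regression (the L1-LR objective above);
# a smaller C means stronger regularization and hence a sparser w
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(Phi, y)
w = clf.coef_[0]
print("non-zero weights:", np.count_nonzero(w), "of", w.size)
```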
87. Performance evaluation on
ImageNet (10M images)
[Rastegari et al., 2011]
[Plot: Precision @ 10 (%) vs. search time per query (seconds, 20–140), comparing full inner product evaluation against inverted-index scoring, each with L2-SVM and L1-LR weight vectors.]
• Performance averaged over 400 object classes used as queries
• 10 training examples per query class
• Database includes 450 images of the query class and 9.7M images of other classes
• Prec@10 of a random classifier is 0.005%
Each curve is obtained by varying sparsity through C in the training objective
E(w) = R(w) + (C/N) Σ_{n=1}^{N} L(w; Φn, yn)
88. Top-k ranking
• Do we need to rank the entire database?
- users only care about the top-ranked images
• Key idea (see the sketch below):
- for each image, iteratively update an upper bound and a lower bound on the score
- gradually prune images that cannot rank in the top-k
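The sketch below shows one possible realization of this pruning idea, under the simplifying assumption that feature values lie in [0, 1]; the bounds used by the actual TkP algorithm may differ.

```python
import numpy as np

def topk_pruned(w, X, k):
    """Top-k pruning sketch: process dimensions in descending |w_i|, maintain
    upper/lower bounds on each image's final score, and drop images whose
    upper bound falls below the k-th best lower bound.
    Assumes feature values in [0, 1]. w: (D,) weights, X: (M, D) database."""
    order = np.argsort(-np.abs(w))
    w_sorted = w[order]
    # best/worst possible contribution of the not-yet-processed dimensions
    pos_tail = np.concatenate([np.cumsum(np.maximum(w_sorted, 0)[::-1])[::-1], [0.0]])
    neg_tail = np.concatenate([np.cumsum(np.minimum(w_sorted, 0)[::-1])[::-1], [0.0]])
    alive = np.arange(len(X))          # candidate images still in the running
    partial = np.zeros(len(X))         # exact score over processed dimensions
    for t, d in enumerate(order):
        if len(alive) <= k:
            break                      # nothing left to prune
        partial[alive] += w[d] * X[alive, d]
        upper = partial[alive] + pos_tail[t + 1]
        lower = partial[alive] + neg_tail[t + 1]
        thresh = np.partition(lower, -k)[-k]   # k-th best lower bound
        alive = alive[upper >= thresh]         # prune hopeless images
    exact = X[alive] @ w               # finish the survivors exactly
    return alive[np.argsort(-exact)][:k]
```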
95. Distribution of weights and
pruning rate
[Figure 2: (a) Distribution of absolute weight values for the different classifiers (L1-LR, L2-SVM, FGM), after sorting the weight magnitudes; features are considered in descending order of |wi|. (b) Pruning rate of TkP (% of images pruned vs. number of iterations d) for the various classification models and different values of k (k = 10, 3000).]
TkP runs faster with sparse, highly skewed weight values. A smaller value of k allows the method to eliminate more images from consideration at a very early stage.
96. Performance evaluation on
ImageNet (10M images)
[Rastegari et al., 2011]
[Plot: Precision @ 10 (%) vs. search time per query (seconds, 0–150), comparing TkP against inverted-index scoring, each with L1-LR and L2-SVM weight vectors; k = 10.]
• Performance averaged over 400 object classes used as queries
• 10 training examples per query class
• Database includes 450 images of the query class and 9.7M images of other classes
• Prec@10 of a random classifier is 0.005%
Each curve is obtained by varying sparsity through C in the training objective
E(w) = R(w) + (C/N) Σ_{n=1}^{N} L(w; Φn, yn)
97. Alternative search strategy:
approximate ranking
• Key idea: approximate the score function with a measure that can be computed (more) efficiently (related to approximate NN search: [Shakhnarovich et al., 2006; Grauman and Darrell, 2007; Chum et al., 2008])
• Approximate ranking via vector quantization:
wᵀΦ ≈ wᵀq(Φ)
where q(·) is a quantizer returning the cluster centroid nearest to Φ
• Problem:
- to approximate the score well we need a fine quantization
- the dimensionality of our space is D = 2659: too large to enable a fine quantization using k-means clustering
98. Product quantization
Product quantization for nearest neighbor search [Jegou et al., 2011]
• Split the feature vector Φ into v subvectors: Φ = [Φ1 | Φ2 | ... | Φv]
• The subvectors are quantized separately:
q(Φ) = [q1(Φ1) | q2(Φ2) | ... | qv(Φv)]
where each qi(·) is learned by k-means in a space of dimensionality D/v, with a limited number of centroids
• Example from [Jegou et al., 2011]: a 128-dimensional vector is split into 8 subvectors of dimension 16; each subvector is quantized against 2^8 = 256 centroids, so each qi(Φi) is an 8-bit index and q(Φ) is a 64-bit quantization index
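A minimal product-quantization sketch following the scheme above, using scikit-learn's KMeans; the function names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_pq(data, v, r):
    """Learn one k-means quantizer per subvector (D must be divisible by v).
    data: (N, D) training vectors; r centroids per sub-quantizer."""
    blocks = np.split(data, v, axis=1)                 # v blocks of size D/v
    return [KMeans(n_clusters=r, n_init=4, random_state=0).fit(b)
            for b in blocks]

def pq_encode(x, quantizers):
    """Encode one vector as v centroid indices (v=8, r=256 -> 64 bits total)."""
    subs = np.split(x, len(quantizers))
    codes = [q.predict(s[None, :])[0] for q, s in zip(quantizers, subs)]
    return np.array(codes, dtype=np.uint8)             # assumes r <= 256
```

With v = 8 sub-blocks and r = 256 centroids, each image is stored in 8 bytes, as in the example above.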
99. Efficient approximate scoring
wᵀΦ ≈ wᵀq(Φ) = Σ_{j=1}^{v} wjᵀ qj(Φj)
1. Filling the look-up table: split w into v sub-blocks [w1 | w2 | ... | wv], matching the v subvectors of Φ. The inner products sjk = wjᵀ cjk between each sub-block wj and each of the r centroids cjk of quantizer qj can be precomputed and stored in a v × r look-up table.
104. Efficient approximate scoring
2. Score each quantized vector q(Φ) in the database using the look-up table (rows s_j1 ... s_jr, one per sub-block j):
wᵀq(Φ) = w1ᵀ q1(Φ1) + w2ᵀ q2(Φ2) + ... + wvᵀ qv(Φv)
Each term wjᵀ qj(Φj) is a single table look-up: only v additions per image!
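Continuing the product-quantization sketch from slide 98, the two steps above might look as follows (illustrative): the look-up table stores wjᵀcjk for every sub-block and centroid, and scoring a PQ-encoded database then costs v look-ups and additions per image.

```python
import numpy as np

def build_lookup_table(w, quantizers):
    """Step 1: precompute s[j, k] = w_j^T c_jk for every sub-block j and
    every centroid c_jk of quantizer q_j. Returns a (v, r) table."""
    w_sub = np.split(w, len(quantizers))
    return np.array([q.cluster_centers_ @ wj
                     for q, wj in zip(quantizers, w_sub)])

def approximate_scores(codes, table):
    """Step 2: w^T q(Phi) = sum_j s[j, codes[j]].
    codes: (M, v) centroid indices; only v look-ups/additions per image."""
    v = table.shape[0]
    return table[np.arange(v), codes].sum(axis=1)
```

For a database of M images this replaces a full D'-dimensional inner product per image with v additions, matching the claim above.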
105. Choice of parameters
[Rastegari et al., 2011]
• Dimensionality is first reduced with PCA from D = 2659 to D' ≪ D
• How do we choose D', v (the number of sub-blocks), and r (the number of centroids per sub-block)?
• Effect of parameter choices on a database of 150K images:
[Plot: Precision @ 10 (%) vs. search time per query (seconds, 0–0.3) for D' ∈ {128, 256, 512} and (v, r) combinations with v ∈ {16, 32, 64, 128, 256} and r ∈ {2^6, 2^8}.]