This document presents techniques for large-scale visual recognition and learning. Stochastic gradient descent (SGD) is discussed as an efficient approach for learning linear classifiers on large datasets: it can handle new incoming samples and has low memory requirements. Explicit embedding of the data into a higher-dimensional space is described as a way to perform non-linear classification with linear classifiers, with dedicated techniques for additive and exponential kernels. Finally, max-pooling is shown to be superior to average pooling for linear classifiers on bag-of-visual-words representations, as it reduces the variance due to differing proportions of foreground and background information.
2. Motivation
The most popular image classification pipeline: BOV + SVMs
→ see the results of the PASCAL VOC challenge:
http://pascallin.ecs.soton.ac.uk/challenges/VOC/
But can we scale SVMs to a dataset such as ImageNet…
… at a reasonable computational cost?
3. Outline
Efficient learning of linear classifiers with SGD
Non-linear classification with explicit embedding
An embedding view of the BOV and FV
Putting everything together
5. Learning with SGD
Acknowledgement
The following slides are largely based on material available on Léon Bottou's webpage:
http://leon.bottou.org/projects/sgd
See also the excellent code available on the same webpage.
6. Learning with SGD
Historical note
Stochastic Gradient Descent (SGD) was the standard for back-propagation training of Multi-Layer Perceptrons (MLPs).
LeCun, Bottou, Orr, Müller, “Efficient BackProp”, Neural Networks: Tricks of the Trade, Springer, 1998.
With the rise of large-margin classification algorithms (e.g. SVMs), MLPs fell somewhat into disuse… and so did SGD.
Then it was shown that SGD can be applied to large-margin classifiers with excellent properties:
Shalev-Shwartz, Singer, Srebro, “Pegasos: Primal Estimated sub-GrAdient Solver for SVM”, ICML’07.
Bottou, Bousquet, “The Tradeoffs of Large Scale Learning”, NIPS’07.
Shalev-Shwartz, Srebro, “SVM optimization: inverse dependence on training set size”, ICML’08.
→ today SGD is very popular for large-scale learning
7. Learning with SGD
Objective functions
We consider objective functions of the form:
L(w) = (1/N) Σi ℓ(zi, w)
• the zi's correspond to the training samples
• w is the set of parameters to be learned
Linear two-class SVM example (primal formulation):
L(w) = (λ/2)‖w‖² + (1/N) Σi max(0, 1 − yi w·xi), with zi = (xi, yi) and yi ∈ {−1, +1}
8. Learning with SGD
Batch Gradient Descent (BGD)
Iterative optimization algorithm:
w ← w − ηt · (1/N) Σi ∇w ℓ(zi, w)
where ηt is the step size (e.g. set by line search)
− assumes all samples are available in a large batch
− disk / memory requirements in O(N)
− slow convergence (more later)
9. Learning with SGD
Stochastic Gradient Descent (SGD)
Iterative optimization algorithm:
• pick a random sample zt
• (noisy) update: w ← w − ηt ∇w ℓ(zt, w)
where ηt is the step size (e.g. a fixed schedule)
+ disk / memory requirements in O(1)
+ can handle new incoming samples
+ faster convergence
10. Learning with SGD
Convergence: BGD vs SGD
Imagine you have 100 training samples and you copy them 10 times:
→ 10 × 100 = 1,000 training samples
If we do one pass over the data:
• BGD: the gradient over 100 samples = the gradient over 1,000 samples
→ 10× the computational cost for the exact same result
• SGD: better results with the 1,000 samples than with the 100 samples
→ SGD exploits the redundancy in the training data
11. Learning with SGD
Choosing the step size
To guarantee convergence to the optimum:
• ηt should decrease fast enough: Σt ηt² < ∞ …
• … but not too fast: Σt ηt = ∞
A common schedule is ηt = a / (t + t0) → important to tune (a, t0) correctly
However, a fixed schedule can also work well in practice:
Bai, Weston, Grangier, Collobert, Chapelle, Weinberger, “Supervised semantic indexing”, CIKM’09.
Weston, Bengio, Usunier, “Large-scale image annotation: Learning to rank with joint word-image embeddings”, ECML’10.
Perronnin, Akata, Harchaoui, Schmid, “Towards good practice in large-scale learning for image classification”, CVPR’12.
12. Learning with SGD
Linear two-class SVM example
We have L(w) = (λ/2)‖w‖² + (1/N) Σi max(0, 1 − yi w·xi), with zi = (xi, yi)
Iteratively:
• pick a random sample zt = (xt, yt)
• update: w ← w − ηt ∇w [ (λ/2)‖w‖² + max(0, 1 − yt w·xt) ]
with ∇w = λw − yt xt if yt w·xt < 1, and λw otherwise
→ extremely simple to implement: a few lines of code (see the sketch below)
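A minimal Python / NumPy sketch of this update (illustrative names; the reference implementation is the code on Léon Bottou's page):

import numpy as np

def sgd_svm(X, y, lam=1e-4, a=1.0, t0=10.0, epochs=5):
    # SGD for a linear two-class SVM (hinge loss + L2 regularization).
    # X: (N, D) features; y: (N,) labels in {-1, +1}.
    # Step schedule eta_t = a / (t + t0) as on slide 11; (a, t0) must be tuned.
    N, D = X.shape
    w, t = np.zeros(D), 0
    for _ in range(epochs):
        for i in np.random.permutation(N):      # pick a random sample z_t
            eta = a / (t + t0)
            margin = y[i] * X[i].dot(w)
            grad = lam * w - (y[i] * X[i] if margin < 1 else 0.0)
            w -= eta * grad                     # (noisy) update
            t += 1
    return w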
13. Learning with SGD
Beyond one-vs-rest
SGD is applicable to many optimization problems, including:
• K-means clustering
• metric learning
• learning to rank
• multiclass learning
• structured learning
• etc.
14. Learning with SGD
Extensions
Mini-batch SGD: sample K training samples per update
• if K = 1 → the traditional SGD setting
• if K = N → BGD
Second-order SGD: w ← w − ηt Ht⁻¹ ∇w ℓ(zt, w), where Ht is an approximation of the Hessian
Bordes, Bottou, Gallinari, “SGD-QN: careful quasi-Newton stochastic gradient descent”, JMLR’09.
Averaging SGD: return the average w̄T = (1/T) Σt wt instead of the last iterate
Polyak, Juditsky, “Acceleration of stochastic approximation by averaging”, SIAM J. Control and Optimization’92.
Xu, “Towards optimal one pass large-scale learning with averaged stochastic gradient descent”, arXiv, 2011.
15. Learning with SGD
Summary
A very generic principle, applicable to many algorithms
Very simple to implement…
… but getting the step size right is not always trivial
Other online learning algorithms are closely related to SGD, e.g. Passive-Aggressive algorithms:
Crammer, Dekel, Keshet, Shalev-Shwartz, Singer, “Online passive-aggressive algorithms”, JMLR’06.
Best practices for large-scale learning of image classifiers:
Perronnin, Akata, Harchaoui, Schmid, “Towards good practice in large-scale learning for image classification”, CVPR’12.
→ a key conclusion: one-vs-rest is competitive with more complex multiclass / ranking SVM formulations
→ see also the code available at http://lear.inrialpes.fr/src/jsgd/
16. Outline
Efficient learning of linear classifiers with SGD
Non-linear classification with explicit embedding
An embedding view of the BOV and FV
Putting everything together
17. Explicit embedding
Kernels for the BOV
Consider two families of non-linear kernels on BOV histograms:
Additive kernels: K(x, z) = Σd k(xd, zd), e.g. the intersection and χ² kernels
Exponential kernels: exponentials of additive kernels, e.g. the RBF-χ² kernel (see slide 40)
18. Explicit embedding
Non-linear vs linear
We have just discussed how to train linear classifiers efficiently…
… yet non-linear classifiers perform significantly better on the BOV (PASCAL VOC 2007):
• dense SIFT
• codebook with 4k visual words
• [1]: Léon Bottou's SGD
• [2]: Chang and Lin's LIBSVM
From: Perronnin, Sánchez and Liu, “Large-scale image categorization with explicit data embedding”, CVPR’10.
Can we get the accuracy of non-linear at the cost of linear?
19. Explicit embedding
SVM formulations
Dual formulation:
+ avoids explicitly mapping the data into a high-dimensional space
− training cost (e.g. SMO): between quadratic and cubic in the # of training samples
− runtime cost: linear in the # of support vectors
→ difficult to scale to very large image sets (e.g. ImageNet)
Primal formulation:
− requires an explicit embedding of the data in a higher-dimensional space
+ training cost (e.g. SGD): linear in the # of training samples
+ runtime cost: independent of the # of support vectors
→ is explicit embedding such a high price to pay at large scale?
20. Explicit embedding
Primal formulation
A positive definite kernel corresponds to an implicit embedding φ: K(x, z) = ⟨φ(x), φ(z)⟩
The score becomes s(x) = ⟨w, φ(x)⟩ + b, so:
non-linear classification in the original space = linear classification in the embedding space
21. Explicit embedding
Principle
Non-linear classification with explicit embedding:
• perform an explicit (approximate) embedding of the data with ψ ≈ φ
• learn linear classifiers in the new space (e.g. with efficient SGD)
Our focus: given a kernel, find ψ
Generally a two-step procedure:
• find the exact embedding function φ
• find a good (i.e. accurate and efficient) approximation ψ
22. Outline
Efficient learning of linear classifiers with SGD
Non-linear classification with explicit embedding:
• general case
• additive kernels
• exponential kernels
An embedding view of the BOV and FV
Putting everything together
24. Explicit embedding
A generic solution: kPCA
Problem: given an (unlabeled) training set of M samples (with M > E), find an E-dimensional embedding ψ
Kernel Principal Component Analysis (kPCA):
• compute the M×M kernel matrix K
• compute its eigenvalues / eigenvectors: K = U Λ Uᵀ
Embed a vector z with the Nyström approximation:
• project z: k(z) = [K(z, x1), …, K(z, xM)]ᵀ
• embed: ψ(z) = Λ_E^(−1/2) U_Eᵀ k(z), keeping the top E eigenvectors
Schölkopf, Smola and Müller, “Non-linear component analysis as a kernel eigenvalue problem”, Neural Computation’98.
Williams and Seeger, “Using the Nyström method to speed up kernel machines”, NIPS’02.
25. Explicit embedding
A generic solution: kPCA (cost)
Embedding cost:
− kernel matrix computation: O(M²D)
− eigendecomposition (incomplete Cholesky): O(M²E)
− embedding one sample: O(M(D+E))
If we have N labeled training samples with which to learn our SVMs:
− kPCA embedding cost at training time: O(M(M+N)(D+E))
− to be compared to the direct kernel computation: O(N²D)
+ works well if the data lies on a low-dimensional manifold (e.g. faces, digits)
− does not work so well on complex, high-dimensional visual data
→ find specific solutions for additive and exponential kernels
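For reference, a minimal NumPy sketch of this generic kPCA / Nyström embedding (function names and the χ² helper are illustrative, not from the original code):

import numpy as np

def chi2_kernel(A, B, eps=1e-12):
    # additive chi-square kernel, a common choice for BOV histograms
    A, B = A[:, None, :], B[None, :, :]
    return (2.0 * A * B / (A + B + eps)).sum(-1)

def kpca_fit(X, kernel, E):
    K = kernel(X, X)                                    # M x M kernel matrix: O(M^2 D)
    vals, vecs = np.linalg.eigh(K)                      # eigendecomposition (ascending order)
    vals, vecs = vals[::-1][:E], vecs[:, ::-1][:, :E]   # keep the top-E components
    proj = vecs / np.sqrt(np.clip(vals, 1e-10, None))   # U_E Lambda_E^{-1/2}
    return X, proj

def kpca_embed(Z, anchors, proj, kernel):
    # Nystrom approximation: psi(z) = Lambda^{-1/2} U^T k(z), O(M(D+E)) per sample
    return kernel(Z, anchors).dot(proj)

# usage: <psi(x), psi(z)> approximates K(x, z)
X = np.random.dirichlet(np.ones(64), size=200)          # M = 200 unlabeled histograms
anchors, proj = kpca_fit(X, chi2_kernel, E=32)
psi = kpca_embed(X[:2], anchors, proj, chi2_kernel)
print(psi[0] @ psi[1], chi2_kernel(X[:1], X[1:2])[0, 0])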
26. Outline
Efficient learning of linear classifiers with SGD
Non-linear classification with explicit embedding:
• general case
• additive kernels
• exponential kernels
An embedding view of the BOV and FV
27. Explicit embedding
Additive kernels
Kernels of the form: K(x, z) = Σd kd(xd, zd)
The score of the kernel classifier then becomes s(x) = Σd fd(xd) + b, with fd(u) = Σi αi yi kd(xi,d, u)
→ a Generalized Additive Model (GAM)
The 1D functions fd can be approximated by piecewise constant / linear functions
→ 1D functions enable efficient classification
Maji, Berg and Malik, “Classification using intersection kernel support vector machines is efficient”, CVPR’08.
Hastie and Tibshirani, “Generalized additive models”, Statistical Science’86.
28. Explicit embedding
Additive kernels
Embedding the 1D functions: find ψd such that kd(xd, zd) ≈ ⟨ψd(xd), ψd(zd)⟩
The final embedding function is the concatenation ψ(x) = [ψ1(x1), …, ψD(xD)]
Maji and Berg, “Max-margin additive classifiers for detection”, ICCV’09.
29. Explicit embedding
Bhattacharyya / Hellinger kernel
K(x, z) = Σd √(xd · zd): the explicit mapping ψ(x) = √x (element-wise) is trivial and exact
Perronnin, Sánchez and Liu, “Large-scale image categorization with explicit data embedding”, CVPR’10.
Vedaldi and Zisserman, “Efficient additive kernels via explicit feature maps”, CVPR’10.
Intuitive interpretation: square-rooting reduces the effect of bursty visual words
Jégou, Douze and Schmid, “On the burstiness of visual elements”, ICCV’09.
Theoretical interpretation: a variance-stabilizing transform
Winn, Criminisi and Minka, “Object categorization by learned universal visual vocabulary”, ICCV’05.
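Since the map is exact, the embedding is a one-liner; a sketch:

import numpy as np

def hellinger_embed(bov):
    # exact explicit map: dot(sqrt(x), sqrt(z)) = sum_d sqrt(x_d z_d) = K(x, z);
    # square-rooting also compresses large counts, i.e. bursty visual words
    return np.sqrt(bov)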
30. Explicit embedding
Intersection kernel
Introducing the indicator function 1[t ≤ x], we have:
min(x, z) = ∫₀¹ 1[t ≤ x] · 1[t ≤ z] dt
How can we approximate this ∞-dimensional feature map with a finite one?
→ replace the integral by a sum
[Figure: the indicator functions of x and z on [0, 1]]
31. Explicit embedding
Intersection kernel
Split [0, 1] into N bins (e.g. N = 10) and encode each value by its filled bins (with a residual term for the partially filled bin); then:
⟨ψ(x), ψ(z)⟩ ≈ min(x, z)
Maji and Berg, “Max-margin additive classifiers for detection”, ICCV’09.
[Figure: the bin encodings of x = 0.33 and z = 0.54 on [0, 1]]
33. "" As accurate as original kernel classifier
"" Embedding is (almost) costless
"" Embedded vectors can be sparsified: encode index (and residual)
"" Training/classification uses simple look-up table accesses
!! Look-up table accesses can be much slower than standard arithmetic
operations (e.g. addition, multiplication)
!! Embedding space is much larger than original space (e.g. x30) and linear
classifiers are not sparse
See also: Wang, Hoiem and Forsyth, “Learning image similarity from Flickr groups using SIKMA”, ICCV’09.
Explicit embedding
Intersection kernel
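A dense NumPy sketch of a piecewise-linear map in this spirit (the sparse look-up-table encoding of Maji & Berg ’09 differs in its details; names are illustrative):

import numpy as np

def intersection_embed(x, n_bins=10):
    # For each histogram entry v in [0, 1], emit n_bins features such that
    # dot(psi(v), psi(u)) approximates min(v, u) up to O(1/n_bins) error:
    # filled bins contribute 1, the current bin holds the residual.
    x = np.asarray(x, dtype=float)
    j = np.arange(n_bins)
    codes = np.clip(n_bins * x[..., None] - j, 0.0, 1.0)
    return (codes / np.sqrt(n_bins)).reshape(*x.shape[:-1], -1)

# sanity check: the dot product is ~ min(0.33, 0.54)
a = intersection_embed(np.array([0.33]), n_bins=100)
b = intersection_embed(np.array([0.54]), n_bins=100)
print(a @ b)    # ~ 0.33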
34. Explicit embedding
Additive homogeneous kernels
Homogeneous kernels: k(cx, cz) = c · k(x, z) for all c ≥ 0
For any homogeneous kernel there exists a symmetric non-negative measure κ such that:
k(x, z) = √(xz) ∫ κ(λ) e^(−iλ(log x − log z)) dλ
This can be rewritten as k(x, z) = ⟨Ψ(x), Ψ(z)⟩, with Ψλ(x) = √(x κ(λ)) e^(−iλ log x)
→ closed-form formulas for κ exist for standard computer-vision kernels (e.g. κ(λ) = sech(πλ) for the χ² kernel)
Hein and Bousquet, “Hilbertian metrics and positive definite kernels on probability measures”, AISTATS’05.
Vedaldi and Zisserman, “Efficient additive kernels via explicit feature maps”, CVPR’10.
35. Explicit embedding
Additive homogeneous kernels
How can we approximate the ∞-dimensional feature map? → by sampling
Sample λ at the 2n+1 regularly spaced frequencies 0, ±L, ±2L, …, ±nL, with L the sampling step; per input dimension this yields one constant feature and a cos / sin pair per frequency (see the sketch below)
Comparison with [Maji’09]:
+ higher accuracy for the same dimensionality
From: Vedaldi and Zisserman, “Efficient additive kernels via explicit feature maps”, CVPR’10.
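A NumPy sketch of the sampled feature map for the χ² kernel, assuming κ(λ) = sech(πλ); the parameter values below are illustrative (VLFeat's vl_homkermap is the reference implementation):

import numpy as np

def chi2_feature_map(x, n=2, L=0.65, eps=1e-12):
    # Each input dimension maps to 2n+1 dimensions such that
    # dot(psi(x), psi(z)) approximates sum_d 2 x_d z_d / (x_d + z_d).
    x = np.asarray(x, dtype=float)
    logx = np.log(np.maximum(x, eps))
    feats = [np.sqrt(L * x)]                    # j = 0 term: kappa(0) = sech(0) = 1
    for j in range(1, n + 1):
        kap = 1.0 / np.cosh(np.pi * j * L)      # kappa(jL) = sech(pi j L)
        amp = np.sqrt(2.0 * L * x * kap)
        feats += [amp * np.cos(j * L * logx), amp * np.sin(j * L * logx)]
    return np.concatenate(feats, axis=-1)

# sanity check against the exact chi-square kernel
x = np.random.dirichlet(np.ones(16))
z = np.random.dirichlet(np.ones(16))
print(chi2_feature_map(x) @ chi2_feature_map(z), np.sum(2 * x * z / (x + z)))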
36. Explicit embedding
Additive kernels: general case
Going back to kPCA, we can:
• perform kPCA independently in each dimension
• concatenate the dimension-wise embeddings
→ additive kPCA (addkPCA)
Embedding cost:
− computing the D kernel matrices: O(M²D)
− eigendecomposition (Cholesky): O(M²E)
− embedding one sample: O(M(D+E))
→ The same cost as kPCA? No, because M can be much smaller for addkPCA!
+ exploits the data distribution: a variable number of output embedding dimensions per input dimension
Perronnin, Sánchez and Liu, “Large-scale image categorization with explicit data embedding”, CVPR’10.
37. Explicit embedding
addkPCA on the χ² kernel
Perronnin, Sánchez and Liu, “Large-scale image categorization with explicit data embedding”, CVPR’10.
38. Outline
Efficient learning of linear classifiers with SGD
Non-linear classification with explicit embedding:
• general case
• additive kernels
• exponential kernels
An embedding view of the BOV and FV
Putting everything together
39. Explicit embedding
Shift-invariant kernels
Shift-invariant kernels: K(x, z) = k(x − z)
Bochner's theorem: a shift-invariant kernel is positive definite if and only if it is the Fourier transform of a non-negative measure p
Exact embedding: K(x, z) = ∫ p(ω) e^(−iωᵀ(x−z)) dω = ⟨Φ(x), Φ(z)⟩, with Φω(x) = √(p(ω)) e^(−iωᵀx)
Approximate embedding by sampling: draw ω1, …, ωE i.i.d. from p and use the random features
ψ(x) = √(2/E) [cos(ω1ᵀx + b1), …, cos(ωEᵀx + bE)], with the bj drawn uniformly from [0, 2π]
Rahimi and Recht, “Random features for large-scale kernel machines”, NIPS’07.
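A minimal NumPy sketch of these random Fourier features for the RBF kernel exp(−‖x−z‖² / (2σ²)), for which p is a Gaussian over the frequencies (names and parameters are illustrative):

import numpy as np

def rff_fit(dim, n_features, sigma, rng):
    W = rng.normal(0.0, 1.0 / sigma, size=(dim, n_features))   # omegas drawn i.i.d. from p
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)         # random phases
    return W, b

def rff_embed(X, W, b):
    # psi(x) = sqrt(2/E) cos(W^T x + b); dot(psi(x), psi(z)) approximates K(x, z)
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

# sanity check
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 32))
W, b = rff_fit(32, 4096, sigma=4.0, rng=rng)
Z = rff_embed(X, W, b)
print(Z[0] @ Z[1], np.exp(-np.sum((X[0] - X[1]) ** 2) / (2 * 4.0 ** 2)))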
40. Explicit embedding
Exponential kernels
Example: the RBF kernel → p is a Gaussian distribution
But the RBF kernel on BOV vectors does not perform well…
However, if we have an explicit embedding ψA for an additive kernel A, i.e. A(x, z) = ⟨ψA(x), ψA(z)⟩, then we can write the exponential kernel as a shift-invariant kernel in the embedded space:
exp(−γ (A(x, x) + A(z, z) − 2 A(x, z))) = exp(−γ ‖ψA(x) − ψA(z)‖²)
Two-stage embedding process:
• embed the additive kernel with any of the previous methods
• embed the exponential kernel with random projections
Perronnin, Sánchez and Liu, “Large-scale image categorization with explicit data embedding”, CVPR’10.
Vempati, Vedaldi, Zisserman and Jawahar, “Generalized RBF feature maps for efficient detection”, BMVC’10.
42. Explicit embedding
Summary
Explicit embedding works: from hours / days of training and testing to seconds / minutes!
In more detail:
• square-rooting the BOV already brings large improvements!
• additive kernels bring an additional improvement at an affordable cost
• exponential kernels bring an additional gain at a significant cost
See also this recent study: Gong and Lazebnik, “Comparing data-dependent and data-independent embeddings for classification and ranking of Internet images”, CVPR’11.
Is explicit embedding the only way to make linear classifiers work on BOV vectors? No!
43. Explicit embedding
Max-pooling
The pooling strategy has a significant impact: max-pooling of local statistics is superior to average pooling with linear classifiers.
Why? It reduces the variance due to different proportions of foreground vs background information.
Example on Caltech 101 (accuracy, standard deviation in parentheses):

                     Avg pooling               Max pooling
                     LK           IK           LK           IK
Hard quantization    51.4 (0.9)   64.2 (1.0)   64.3 (0.9)   64.3 (0.9)
Soft quantization    57.9 (1.5)   66.1 (1.2)   69.0 (0.8)   70.6 (1.0)
Sparse coding        61.3 (1.3)   70.3 (1.3)   71.5 (1.1)   71.8 (1.0)

From [Boureau, CVPR’10]. LK = linear kernel, IK = intersection kernel.
Yang, Yu, Gong and Huang, “Linear spatial pyramid matching using sparse coding for image classification”, CVPR’09.
Wang, Yang, Yu, Lv, Huang and Gong, “Locality-constrained linear coding for image classification”, CVPR’10.
Boureau, Bach, LeCun and Ponce, “Learning mid-level features for recognition”, CVPR’10.
See also: Boureau, Ponce and LeCun, “A theoretical analysis of feature pooling in visual recognition”, ICML’10.
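The two pooling operators compared above, in a short NumPy sketch (the coding step producing the per-descriptor codes is assumed given):

import numpy as np

def pool_codes(codes, mode="max"):
    # codes: (T, K) matrix of codes for T local descriptors over K codewords
    if mode == "avg":
        return codes.mean(axis=0)   # BOV-style average: sensitive to the
                                    # foreground / background proportions
    return codes.max(axis=0)        # max: was codeword k strongly activated anywhere?

# e.g. soft-assignment codes for 500 descriptors over a 4k-word codebook
codes = np.random.rand(500, 4096)
img_avg = pool_codes(codes, "avg")
img_max = pool_codes(codes, "max")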
44. Outline
Efficient learning of linear classifiers with SGD
Non-linear classification with explicit embedding
An embedding view of the BOV and FV
Putting everything together
45. Embedding view of the BOV and FV
Embedding view of the BOV:
To obtain high-dimensional, linearly separable BOV representations, we:
• embed the local descriptors individually
• average → BOV histogram
• embed the BOV histogram
→ Aren't two rounds of embedding suboptimal?
Fisher kernels can directly perform an embedding in a much higher dimension:
→ works well in combination with linear classifiers
47. Embedding view of the BOV and FV
BOV:
FV:
A linear classifier on these representations induces, in the descriptor space:
• in the BOV case: a piecewise-constant decision function
• in the FV case: a piecewise linear / quadratic decision function
→ the FV leads to more complex decision functions for the same vocabulary size
[Figure: decision functions over the descriptor space, with codewords µ1, µ2, …, µi, µi+1 and per-cell weights w1, w2, …, wi, wi+1]
Csurka and Perronnin, “An efficient approach to semantic segmentation”, IJCV’10.
51. Outline
Efficient learning of linear classifiers with SGD
Non-linear classification with explicit embedding
An embedding view of the BOV and FV
Putting everything together:
• large-scale pipelines
• large-scale results
53. BOV-based large-scale classification
BOV + explicit embedding + linear classifier
Memory / storage issue: the embedding increases the feature dimension, e.g. ×3 in [Vedaldi, CVPR’10] or [Perronnin, CVPR’10]
If the embedding cost is low enough, efficient learning with SGD is possible:
• pick a random sample in the original (low-dimensional) space
• embed the sample on the fly into the new (high-dimensional) space
• feed the embedded sample to the SGD routine
• discard the embedded sample
54. FV-based large-scale classification
FV + linear classifier
Memory / storage issue: the FV is high-dimensional (e.g. ½M dimensions) and only weakly sparse
→ example using 4-byte floating-point arithmetic:
• ILSVRC2010 ≈ 2.8 TB
• ImageNet ≈ 23 TB
… per low-level descriptor type
+ However, we can significantly compress FVs, e.g. using Product Quantization (PQ)
Jégou, Douze and Schmid, “Product quantization for nearest neighbor search”, TPAMI’11.
Jégou, Douze, Schmid and Pérez, “Aggregating local descriptors into a compact image representation”, CVPR’10.
→ 64-fold compression with zero accuracy loss in categorization
Sánchez and Perronnin, “High-dimensional signature compression for large-scale image classification”, CVPR’11.
55. FV-based large-scale classification
FV + PQ (at training time) + linear classifier
Efficient SGD learning algorithm (sketched below):
• pick a random compressed sample
• decompress it on the fly into the high-dimensional space
• feed the uncompressed sample to the SGD routine
• discard the uncompressed sample
→ only one decompressed sample alive in RAM at a time
Sánchez and Perronnin, “High-dimensional signature compression for large-scale image classification”, CVPR’11.
It is also possible to learn classifiers without decompression:
Vedaldi, Zisserman, “Sparse kernel approximations for efficient classification and detection”, CVPR’12.
Note: test samples do not need to be compressed
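A NumPy sketch of this loop for a two-class hinge loss (the PQ encoder and codebook learning are assumed given; names are illustrative):

import numpy as np

def pq_decode(codes, codebooks):
    # decompress one PQ-encoded vector: each byte indexes a 256-entry codebook
    # for one sub-vector; codes: (n_sub,) uint8, codebooks: (n_sub, 256, d_sub)
    return np.concatenate([codebooks[m][codes[m]] for m in range(len(codes))])

def sgd_on_compressed(all_codes, codebooks, y, w, lam=1e-5, a=1.0, t0=10.0):
    # one SGD epoch over PQ-compressed FVs: a single decompressed sample in RAM
    for t, i in enumerate(np.random.permutation(len(all_codes))):
        x = pq_decode(all_codes[i], codebooks)      # decompress on the fly
        eta = a / (t + t0)
        if y[i] * x.dot(w) < 1:                     # hinge-loss subgradient step
            w = (1.0 - eta * lam) * w + eta * y[i] * x
        else:
            w = (1.0 - eta * lam) * w
        # x goes out of scope here: the sample is discarded
    return w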
56. Outline
Efficient learning of linear classifiers with SGD
Non-linear classification with explicit embedding
An embedding view of the BOV and FV
Putting everything together:
• large-scale pipelines
• large-scale results
57. Large-scale results
Evaluation
How should accuracy be measured when dealing with a large number of classes? One cannot expect all images to be labeled with all possible classes.
→ measure a top-K loss (e.g. K = 5 in ILSVRC)
Given a test image labeled with ground truth G, predict the labels L1, …, LK:
loss = mink d(G, Lk), with d(x, y) = 0 if x = y and 1 otherwise
(i.e. the flat loss is 0 as soon as one of the top K predictions is correct)
Not all mistakes are equally bad
→ measure a hierarchical loss:
d(x, y) = height of the lowest common ancestor of x and y / maximum possible height
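A short NumPy sketch of the flat metric (illustrative; the hierarchical distance additionally requires the class tree):

import numpy as np

def top_k_loss(scores, gt, k=5):
    # 0 if the ground-truth class is among the k highest-scoring predictions
    topk = np.argsort(scores)[::-1][:k]
    return 0.0 if gt in topk else 1.0

# e.g. one test image scored against 1K classes (ILSVRC2010)
scores = np.random.rand(1000)
print(top_k_loss(scores, gt=42, k=5))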
58. Results on ILSVRC2010
ImageNet Large-Scale Visual Recognition Challenge 2010:
• 1K classes (leaves only)
• 1.2M training images + 50K validation images + 150K test images
• measures top-5 accuracy (flat and hierarchical)
Berg, Deng and Fei-Fei, “Large-scale visual recognition challenge 2010”
The top 2 systems employed higher-order statistics
59. Results on ILSVRC2010 (continued)
No compression: training with Hadoop map-reduce (≈ 100 mappers)
With PQ compression: training on a single server
60. Results on ILSVRC2011
ImageNet Large-Scale Visual Recognition Challenge 2011:
• 1K classes (leaves only)
• 1.2M training images + 50K validation images + 100K test images
• measures top-5 accuracy (flat and hierarchical)
Berg, Deng and Fei-Fei, “Large-scale visual recognition challenge 2011”
Compared systems: FV + PQ compression + SGD learning vs BOV + non-linear SVM in the dual (GPU computing)
61. Results on ImageNet10K
• 10,184 classes (leaves and internal nodes)
• ≈ 9M images: ½ for training / ½ for testing
• accuracy measured as % top-1 correct
SIFT + BOV (21K-dim) + explicit embedding + linear SVM (SGD):
• accuracy = 6.4%
• training time ≈ 6 CPU years
Deng, Berg, Li and Fei-Fei, “What does classifying more than 10,000 image categories tell us?”, ECCV’10.
SIFT + FV (130K-dim) + PQ compression + linear SVM (SGD):
• accuracy = 19.1%
• training time ≈ 1 CPU year (trick: do not sample all the negatives)
Perronnin, Akata, Harchaoui, Schmid, “Towards good practice in large-scale learning for image classification”, CVPR’12.